Rastream doesn't rotate properly when daemonied?
Carter Bullard
carter at qosient.com
Wed May 22 10:44:46 EDT 2013
Hey Matt,
Hmmm, so your emails are waaayyyyy too dense for dealing with bugs / problems.
Lets focus on one thing at a time.
Your racluster() is segfaulting. Why is it doing that? Can you run racluster()
against the file and get good results ? Do you have multiple versions of the
clients installed on your machine, so that there is ambiguity in which racluster()
or rastream() is actually running? We should clear that up before proceeding
any further.
Carter
On May 21, 2013, at 1:31 PM, Matt Brown <matthewbrown at gmail.com> wrote:
> Hello Carter,
>
> Thanks for getting back to me quickly.
>
> I've tried the following to attempt to troubleshoot the problem, but it doesn't appear that anything's outstanding.
>
> Apologies for the verbosity…
>
>
> # cd ./argus-clients-3.0.7.10
> # touch .devel .debug
> # ./configure
> # make clean && make
> # kill -2 $(pgrep rastream)
> # ./bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -D4
> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455541 ArgusNewQueue () returning 0x8609548
> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455612 ArgusNewHashTable (65536) returning 0x8609bc0
> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455632 ArgusNewList () returning 0x860a010
> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455695 ArgusNewQueue () returning 0x8611878
> rastream[29500]: 2013-05-21 10:54:31.455729 started
>
> syslog reports:
> May 21 10:54:31 ny-sentinel rastream[29500]: 2013-05-21 10:54:31.455729 started
> May 21 11:00:18 ny-sentinel racluster[29544]: 2013-05-21 11:00:18.635754 open '/var/opt/argus/2013-05-21/argus_10:00:00': No such file or directory
> May 21 11:00:18 ny-sentinel kernel: racluster[29544]: segfault at 46 ip 001d9877 sp bfb9f200 error 4 in libc-2.12.so[179000+190000]
>
> With /usr/local/bin/rastream.sh executing `racluster -M replace -r $FILES`, using the racluster in $PATH (as it is originally).
>
> Note that '/var/opt/argus/2013-05-21/argus_10:00:00' did exist at the time [but that doesn't mean that racluster can read it ?].
>
>
>
> This is interesting:
> I modifed /usr/local/bin/rastream.sh to execute `[path]/argus-clients-3.0.7.10/bin/racluster -D4 -M replace -r $FILES` instead of just using the $PATHed racluster…
>
> # kill -2 $(pgrep rastream)
> # ./argus-clients-3.0.7.10/bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 10m -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -D4
> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.528909 ArgusNewQueue () returning 0x9e77550
> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.528982 ArgusNewHashTable (65536) returning 0x9e77bc8
> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.529008 ArgusNewList () returning 0x9e78018
> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.529072 ArgusNewQueue () returning 0x9e7f880
> rastream[30131]: 2013-05-21 11:26:53.529102 started
>
> syslog reports:
> May 21 11:26:53 ny-sentinel rastream[30131]: 2013-05-21 11:26:53.529102 started
>
> …and nothing from racluster or anything else at the 10 minute mark, even though the rotation (and problem) still occur:
>
> # ll --full-time /var/opt/argus/2013-05-21/ | tail -n 5
> -rw-r--r--. 1 root 159764 2013-05-21 11:30:59.466174549 -0400 argus_11:20:00
> -rw-r--r--. 1 root 384117 2013-05-21 11:30:17.619561000 -0400 argus_11:20:00.gz
> -rw-r--r--. 1 root 181472 2013-05-21 11:41:00.721865476 -0400 argus_11:30:00
> -rw-r--r--. 1 root 394783 2013-05-21 11:40:17.511247000 -0400 argus_11:30:00.gz
> -rw-r--r--. 1 root 1421440 2013-05-21 11:48:55.328587722 -0400 argus_11:40:00
>
>
>
> Reverting rastream.sh to execute racluster in the $PATH (as original) and changing back to 'time 1h' as:
> # kill -2 $(pgrep rastream)
> # ./argus-clients-3.0.7.10/bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -D4
>
> results in syslog:
> May 21 12:05:37 ny-sentinel rastream[30905]: 2013-05-21 12:05:37.950988 started
>
> …and nothing from racluster (why and how?)), but the problem still exists:
>
> # ll --full-time /var/opt/argus/2013-05-21/ | tail -n 3
> -rw-r--r--. 1 root 145612 2013-05-21 13:01:00.562634873 -0400 argus_12:00:00
> -rw-r--r--. 1 root 2303564 2013-05-21 13:00:18.216161000 -0400 argus_12:00:00.gz
> -rw-r--r--. 1 root 315520 2013-05-21 13:02:35.126692699 -0400 argus_13:00:00
>
> It's quite obvious that the size of the argus data has increased a very large amount (due to the debugging).
>
>
> Does anything here indicate the problem?
>
> Is there anything else that I can do to further troubleshoot the issue?
>
>
>
>
> Here's some freaky stuff:
>
> querying the file argus_12:00:00 for ltime doesn't correlate with the last modified time of the file:
>
> # ra -nr argus_12\:00\:00 -s stime ltime -w - | rasort -m stime -s stime ltime | head -n 3
> StartTime LastTime
> 2013-05-21 12:59:03.164195 2013-05-21 12:59:06.417385
> 2013-05-21 12:59:03.369926 2013-05-21 12:59:03.501171
>
> # ra -nr argus_12\:00\:00 -s stime ltime -w - | rasort -m ltime -s stime ltime | tail -n 2
> 2013-05-21 12:59:17.117235 2013-05-21 13:00:00.000000
> 2013-05-21 12:59:28.896645 2013-05-21 13:00:00.000000
>
> # ra -nr argus_12\:00\:00 -s stime ltime | head -n 3
> StartTime LastTime
> 2013-05-21 12:05:38.036319 2013-05-21 09:52:57.532792 # this is weird.
> 2013-05-21 12:59:27.970584 2013-05-21 12:59:36.325653
>
>
>
> querying the file argus_12:00:00.gz:
>
> sort by stime:
> # ra -nr argus_12\:00\:00.gz -s stime ltime -w - | rasort -m stime -s stime ltime | head -n 3
> StartTime LastTime
> 2013-05-21 11:59:44.064216 2013-05-21 12:34:17.996512
> 2013-05-21 11:59:44.509320 2013-05-21 11:59:59.639822
>
> # ra -nr argus_12\:00\:00.gz -s stime ltime -w - | rasort -m stime -s stime ltime | tail -n 2
> 2013-05-21 12:59:02.980166 2013-05-21 12:59:02.981078
> 2013-05-21 12:59:02.980171 2013-05-21 12:59:02.981120
>
>
> sort by ltime:
> # ra -nr argus_12\:00\:00.gz -s ltime -w - | rasort -m ltime -s stime ltime | head -n 3
> StartTime LastTime
> 2013-05-21 11:59:46.214566 2013-05-21 11:59:46.314022
> 2013-05-21 11:59:51.860594 2013-05-21 11:59:51.887852
>
> # ra -nr argus_12\:00\:00.gz -s ltime -w - | rasort -m stime ltime -s stime ltime | tail -n 2
> 2013-05-21 12:00:00.000000 2013-05-21 13:00:00.000000
> 2013-05-21 12:38:12.914597 2013-05-21 13:00:00.000000
>
>
>
>
> Thanks again,
>
> Matt
>
> On May 21, 2013, at 10:49 AM, Carter Bullard <carter at qosient.com> wrote:
>
>> Yes you have to compile in debug support. We should turn on symbols, etc... as well. In the clients root directory, try this:
>>
>> % touch .devel .debug
>> % ./configure
>> % make clean; make
>>
>> And use the resulting binaries for the testing.
>> Carter
>>
>> On May 21, 2013, at 10:08 AM, Matt Brown <matthewbrown at gmail.com> wrote:
>>
>>> Carter,
>>>
>>> Do settings need to be changed at compile time in order to get debugging features? I have started with -D4 and only the start time and forked time are logged. I have confirmed that rastream is appending to the proper file.
>>>
>>>
>>> Just to let you know *latest* is still 3.0.7.9, not 3.0.7.10.
>>>
>>>
>>> Thanks very much!
>>>
>>> Matt
>>>
>>> On May 21, 2013, at 9:55 AM, Carter Bullard <carter at qosient.com> wrote:
>>>
>>>> Try the newest client distribution, just to see if any of the changes we
>>>> made to date affect the problem. This may be a tough one to debug,
>>>> but if I can recreate the issue, then we're on our way to happiness.
>>>>
>>>> We'll need you to turn on debug on your daemon, the output should
>>>> go to the syslog, so testing that briefly may help a bit. Run your rastream
>>>> with the "-d -D4" options, and see if you get any debug info in your syslog.
>>>>
>>>> Carter
>>>>
>>>> On May 21, 2013, at 9:48 AM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>
>>>>> Good morning Carter!
>>>>>
>>>>>
>>>>> I wrote an init script and had some issues when starting rastream with the script targetting ~/.rarc even if I set $ARGUSHOME or $HOME within the script (on CentOS) (due to service's use of /bin/env).
>>>>>
>>>>>
>>>>> So, the line I use within the init script is:
>>>>>
>>>>> /usr/local/bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh
>>>>>
>>>>> The same problem with timing occurs as had previously occur when relying on the full ~/.rarc.
>>>>>
>>>>>
>>>>> Here is an etherpad with relevant information:https://etherpad.mozilla.org/dmoSdfQ9H4
>>>>>
>>>>> # rastream --version
>>>>> Rastream Version 3.0.7.9
>>>>>
>>>>>
>>>>> Thanks very much Carter!
>>>>>
>>>>> Matt
>>>>>
>>>>> On May 21, 2013, at 6:44 AM, Carter Bullard <carter at qosient.com> wrote:
>>>>>
>>>>>>
>>>>>> Hey Matt,
>>>>>> So.......any chance you are setting the timezine in a rarc file somewhere, like /etc/rarc or in your yme directory ?
>>>>>> What version are you running ?
>>>>>>
>>>>>> Carter
>>>>>>
>>>>>> On May 20, 2013, at 2:42 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>>
>>>>>>> Hello All:
>>>>>>>
>>>>>>>
>>>>>>> I am having a problem with rastream that's manifested itself when using the -f "shell script executor" argument to rotate files at 'time 1h'.
>>>>>>>
>>>>>>>
>>>>>>> If I run rastream as a daemon, then the script seems to run before the "hour" is over (and the "hour" is over at the incorrect time):
>>>>>>> # rastream -d -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh
>>>>>>>
>>>>>>> A few hours' files look like:
>>>>>>>
>>>>>>> # ls --full-time /var/opt/argus/2013-05-18
>>>>>>> total 3728
>>>>>>> -rw-r--r--. 1 root 18752 2013-05-18 01:00:59.556839459 -0400 argus_00:00:00
>>>>>>> -rw-r--r--. 1 root 160607 2013-05-18 01:00:17.793286000 -0400 argus_00:00:00.gz
>>>>>>> -rw-r--r--. 1 root 12068 2013-05-18 02:00:59.619364943 -0400 argus_01:00:00
>>>>>>> -rw-r--r--. 1 root 163409 2013-05-18 02:00:17.943700000 -0400 argus_01:00:00.gz
>>>>>>> -rw-r--r--. 1 root 9032 2013-05-1803:01:00.579907536 -0400 argus_02:00:00
>>>>>>> -rw-r--r--. 1 root 122920 2013-05-18 03:00:17.834317000 -0400 argus_02:00:00.gz
>>>>>>> -rw-r--r--. 1 root 22092 2013-05-18 04:01:00.698357771 -0400 argus_03:00:00
>>>>>>> -rw-r--r--. 1 root 122002 2013-05-18 04:00:17.835675000 -0400 argus_03:00:00.gz
>>>>>>> -rw-r--r--. 1 root 17704 2013-05-18 05:01:00.450618851 -0400 argus_04:00:00
>>>>>>> -rw-r--r--. 1 root 133212 2013-05-18 05:00:17.742040000 -0400 argus_04:00:00.gz
>>>>>>> -rw-r--r--. 1 root 14592 2013-05-18 06:00:54.886285774 -0400 argus_05:00:00
>>>>>>> -rw-r--r--. 1 root 160523 2013-05-18 06:00:17.562776000 -0400 argus_05:00:00.gz
>>>>>>>
>>>>>>> It looks like the gzipped file is last modified before the hour file, which leads me to believe that rastream isn't finished writing to the argus file before -f[] is executed.
>>>>>>>
>>>>>>>
>>>>>>> If I run rastream as follows, I have no problem:
>>>>>>> # nohup rastream -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh &
>>>>>>>
>>>>>>> A few hours' files look like:
>>>>>>>
>>>>>>> # ls --full-time /var/opt/argus/2013-05-20
>>>>>>> total 5372
>>>>>>> -rw-r--r--. 1 root 217245 2013-05-20 10:00:17.573908000 -0400 argus_09:00:00.gz
>>>>>>> -rw-r--r--. 1 root 6377 2013-05-2011:00:17.762140000 -0400 argus_10:00:00.gz
>>>>>>> -rw-r--r--. 1 root 9269 2013-05-2011:38:08.879810000 -0400 argus_11:00:00.gz
>>>>>>> -rw-r--r--. 1 root 313 2013-05-2013:00:17.170958000 -0400 argus_12:00:00.gz
>>>>>>> -rw-r--r--. 1 root 8965 2013-05-2014:00:17.540889000 -0400 argus_13:00:00.gz
>>>>>>>
>>>>>>> (early day over there)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I have verified that the system time is correct and ntpd is running properly.
>>>>>>>
>>>>>>> Clients are 3.0.7.9.
>>>>>>>
>>>>>>>
>>>>>>> Carter, do you have any ideas?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20130522/f17378f1/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 6837 bytes
Desc: not available
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20130522/f17378f1/attachment.bin>
More information about the argus
mailing list