Rewrite of rastream

Carter Bullard carter at qosient.com
Mon Jun 10 14:01:05 EDT 2013


OK, so I've gone through all of this and it looks to me like its working.

The files all have data that conform to the time range in the filename,
(except that single entry, but you didn't print enough information, as it
could be a management record, and while that would be weird, it could be
ok).

This is all behaving as if your sensor is sending data after the " -B 15s "
time buffer has expired.  So either your argus is using an ARGUS_FAR_STATUS_INTERVAL
longer than 15s, or your sensor looks to be out of sync, sending data that
is timestamped about 1 minute behind your collector.

If argus sends data that should go into file argus_11.00.00, which is
configured to hold records between 11-12am, after 12:00:15, then rastream()
is going to appear to screwed up.  Normally we would throw the late data
away, but right now we will recreate the 11-12 file, and then put the
late data into it.

rastream(), like rasplit(), splits records, so that they are aligned to
the time boundaries you defined with the "-M time 1h".  In your case, records
that have a start time before 12:00:00 and an ending time after 11:00:00, will
be cut into 2 records, one that will be written into a 11-12 file, and
one that will be written in the 12-13 file.  When this happens, if the
11-12 file has already been processed, and the 11-12 file is gone, then
rastream() will recreate the 11-12 file.

The collector is running the script right at 15-16 seconds after the end of the
hour, which is how you've configured it.  But, I think the source is
sending records late.  Because the new rastream() isn't rejecting these
records, we'll write them into regenerated hourly file.

I cannot reproduce this, I think, because my probes and collectors
are time synced, so that they all have the same time… I have no idea
why the -d option is getting in the way…

So let me put the record rejection logic in, and go through one more
time with a fine toothed comb, and we'll see what we get.

Carter



On Jun 5, 2013, at 2:21 PM, Matt Brown <matthewbrown at gmail.com> wrote:

> Hello again,
> 
> 
> I am seeing the following:
>  
> # ll --full-time /var/opt/argus/2013-06-05/ | tail -n 8
> -rw-r--r--. 1 root 3040945 2013-06-05 11:00:16.550353000 -0400 argus_10:00:00.gz
> -rw-r--r--. 1 root  114816 2013-06-05 12:01:00.027313065 -0400 argus_11:00:00
> -rw-r--r--. 1 root 3280470 2013-06-05 12:00:16.362839000 -0400 argus_11:00:00.gz
> -rw-r--r--. 1 root  159872 2013-06-05 13:00:58.831520433 -0400 argus_12:00:00
> -rw-r--r--. 1 root 3102872 2013-06-05 13:00:16.461063000 -0400 argus_12:00:00.gz
> -rw-r--r--. 1 root  315520 2013-06-05 14:00:59.629755798 -0400 argus_13:00:00
> -rw-r--r--. 1 root 3135652 2013-06-05 14:00:16.742423000 -0400 argus_13:00:00.gz
> -rw-r--r--. 1 root 3653760 2013-06-05 14:15:29.035682712 -0400 argus_14:00:00
>  
>  
> I have verified (using ra) that argus_11:00:00and argus_11:00:00.gz do not contain the same data (seehttps://etherpad.mozilla.org/RxWI2TPDE0 for output).
>  
>  
> I have the -B parameter set to 15s.  How do I verify that the (single instance of) argus I am running locally is time synced?  I am, for sure, processing argus data.
>  
> Your definition of the -B parameter seems a little odd to me: "we're going to wait [N seconds] after [the -M time N(time)], and then close and process the file."
> Can you confirm that the -f script is executing after or before this data is flushed to the files?
>  
>  
> You can review the stime and ltime output of files produced with the beta rastream() on:https://etherpad.mozilla.org/RxWI2TPDE0
> You can review the stime and ltime output of files produced with rastream() 3.0.7.10 on:http://thread.gmane.org/gmane.network.argus/9354/focus=9361 (search for "Here's some freaky stuff").
>  
> Both of those support that anything after `-M time N(time)` is being dropped (ltime is NEVER over the `-M time N(time)`).  Does this mean the argus is time synced?
>  
>  
> I'm still on the line of thinking that rastream is writing data to the "old file" while it executes the -f script.  I think the stime and ltimes contained in the data files, as well as the last modified time of the data file support this.
>  
>  
> If you can not reproduce the issue with the -d and -f combo (where -f is the script ../support/Config/rastream.sh), then what other system info can I gather for you that you believe would be useful in further troubleshooting?
>  
> Otherwise, I have no problem just nohupping and &ing the rastream process.  I'm happy to continue troubleshooting though.
>  
>  
>  
> Thanks much,
>  
> Matt
> 
> On Jun 5, 2013, at 6:28 AM, Carter Bullard <carter at qosient.com> wrote:
> 
>> In the new rastream(), I haven't turned on the time buffer bounds rejection, yet.
>> 
>> So are you seeing two files of data for certain time periods?  Assuming that your moving or compressing the target file, in your script, are you getting one compressed and one not compressed?
>> 
>> When you use the "-B <secs>" option, we're going to wait so many secs, after some time boundary, (like 10 seconds after 2:00pm), and then close and process the file.   Any subsequent flows that would arrive that would then need to go into the target file, must be thrown away.   They are late.
>> 
>> If this is the case, then either you're "-B <secs>" value is too small, or your argi are not time sync'd, or you're not processing argus data. 
>> The first two are fixable, the 3rd is not.
>> 
>> Or does your output file have data that is outside the time range for the target file?  That would be a bug, and I can fix that.
>> 
>> Carter
>> 
>> 
>> 
>> On Jun 4, 2013, at 9:51 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>> 
>>> Carter,
>>> 
>>> I meant to say "generated."  It is the same problem that had always occurred.  rastream must be writing a file while/after the original file is gzipped.
>>> 
>>> I can check last modified times tomorrow, but I do believe it will be exactly the same as previously mentioned.
>>> 
>>> What do you think?
>>> 
>>> 
>>> It's very interesting that you can't reproduce the problem, as the same exact problem occurs on multiple systems for me.  Regardless, the problem doesn't occur if I run it with nohup and drop it to the background with &.
>>> 
>>> 
>>> Thanks again,
>>> 
>>> Matt
>>> 
>>> 
>>> On Tue, Jun 4, 2013 at 1:43 PM, Carter Bullard <carter at qosient.com> wrote:
>>> Sounds like some progress.  That's good.
>>> Not sure that I know what you mean by "the file with excess data is still generate".
>>> What do you expect to happen, and what is it doing?
>>> 
>>> Carter 
>>> 
>>> On Jun 4, 2013, at 1:34 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>> 
>>>> Carter,
>>>> 
>>>> -d no longer causes the script to execute every second.  However the file with excess data is still generate when the script is executed.  racluster has not yet logged a segfault in syslog.
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Matt
>>>> 
>>>> On Jun 3, 2013, at 11:17 AM, Carter Bullard <carter at qosient.com> wrote:
>>>> 
>>>>> Hey Matt,
>>>>> Don't forget to give the new rastream a run.  I have had a stability issue with 
>>>>> this new rastream(), on one of my systems, so please give this one a try, and
>>>>> yell back if you get what you want, or not !!!!  I'll want to release argus-clients-3.0.8
>>>>> soon after this is done.
>>>>> 
>>>>> Hope all is most excellent,
>>>>> 
>>>>> Carter
>>>>> 
>>>>> On May 31, 2013, at 2:07 AM, Carter Bullard <carter at qosient.com> wrote:
>>>>> 
>>>>>> Hey Matt,
>>>>>> OK, so here is a complete rewrite of rastream() that should fix your problem,
>>>>>> but if it doesn't holler.  Just replace the file ./examples/rastream/rastream.c
>>>>>> with the included file and recompile.
>>>>>> 
>>>>>> There might be a minor memory leak, but I'll do that testing tomorrow.
>>>>>> 
>>>>>> Carter
>>>>>> 
>>>>>> <rastream.c>
>>>>>> 
>>>>>> 
>>>>>> On May 30, 2013, at 4:15 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>> 
>>>>>>> Well, that is a new twist, and will explain all your problems, running
>>>>>>> the script every second, …, that isn't correct.  rastream() doesn't do this on
>>>>>>> my machine, but I'll have to pay attention to test that …...
>>>>>>> 
>>>>>>> Just so you will know, when we, or anyone for that matter, does the
>>>>>>> UNIX daemon thing,  they must change their directory to a known
>>>>>>> place in the filesystem, which traditionally is "/", so the " -d " operation
>>>>>>> always has an effect on the environment, and that is about the only
>>>>>>> thing that is different.
>>>>>>> 
>>>>>>> Carter
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On May 30, 2013, at 3:53 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hello Carter,
>>>>>>>>  
>>>>>>>> Thanks for persistently looking into this issue.
>>>>>>>>  
>>>>>>>> I'm happy to a guinea pig for the new version, just advise where I can locate the test version.
>>>>>>>>  
>>>>>>>> Unforunately, I seriously just started using argus four weeks ago (after my short stint with it last year), so I can't make any serious recommendations at this time.  Although, having some functionality of ralabel would be nice (like cco)?  But maybe that's the point of client separation.
>>>>>>>>  
>>>>>>>>  
>>>>>>>> I hate to be so stubborn :) but I don't think there's anything wrong with the environment.
>>>>>>>>  
>>>>>>>> The contents of the script are that of ../clients/support/Config/rastream.sh  No more, no less.
>>>>>>>>  
>>>>>>>> It _is_ worth noting (again) that this does only happen when I used the daemon forking mechanism (-d) to start rastream.
>>>>>>>>  
>>>>>>>> It sounds like you hit whatever nail on whatever head, and I look forward to testing the new version!
>>>>>>>>  
>>>>>>>>  
>>>>>>>> So, fine!  Even though I don't believe it's the environment, I've added the following lines to rastream.sh befow the `racluster -M replace -r $FILES`:
>>>>>>>>  
>>>>>>>> echo $(date) " which racluster: " $(which racluster) &>> /var/log/rastreamtest.log
>>>>>>>> echo $(date) " \$FILES: " $FILES &>> /var/log/rastreamtest.log
>>>>>>>>  
>>>>>>>>  
>>>>>>>> What I see mildly astounds me:
>>>>>>>>  
>>>>>>>> /usr/local/bin/rastream -d -S 127.0.0.1:561 -B 15s -M time 1m -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -F /etc/rastream.conf
>>>>>>>>  
>>>>>>>> There is a log entries in /var/log/rastreamtest.log for every single second.
>>>>>>>>  
>>>>>>>> Thu May 30 15:43:21 EDT 2013  which racluster:  /usr/local/bin/racluster
>>>>>>>> Thu May 30 15:43:21 EDT 2013  $FILES:  /var/opt/argus/2013-05-30/argus_15:42:00
>>>>>>>> gzip: /var/opt/argus/2013-05-30/argus_15:42:00.gz already exists;       not overwritten
>>>>>>>> Thu May 30 15:43:22 EDT 2013  which racluster:  /usr/local/bin/racluster
>>>>>>>> Thu May 30 15:43:22 EDT 2013  $FILES:  /var/opt/argus/2013-05-30/argus_15:42:00
>>>>>>>> gzip: /var/opt/argus/2013-05-30/argus_15:42:00.gz already exists;       not overwritten
>>>>>>>>  
>>>>>>>>  
>>>>>>>>  
>>>>>>>> nohup /usr/local/bin/rastream -S 127.0.0.1:561 -B 15s -M time 1m -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -F /etc/rastream.conf &
>>>>>>>>  
>>>>>>>> There is a log entry as expected, once a minute:
>>>>>>>> Thu May 30 15:47:17 EDT 2013  which racluster:  /usr/local/bin/racluster
>>>>>>>> Thu May 30 15:47:17 EDT 2013  $FILES:  /var/opt/argus/2013-05-30/argus_15:46:00
>>>>>>>> Thu May 30 15:48:17 EDT 2013  which racluster:  /usr/local/bin/racluster
>>>>>>>> Thu May 30 15:48:17 EDT 2013  $FILES:  /var/opt/argus/2013-05-30/argus_15:47:00
>>>>>>>>  
>>>>>>>>  
>>>>>>>> So regardless, the -d logic and the -f logic do not like each other one bit.
>>>>>>>>  
>>>>>>>> Let me know what you think, and I'll await the new version to test!
>>>>>>>>  
>>>>>>>>  
>>>>>>>> Thanks!
>>>>>>>>  
>>>>>>>> Matt
>>>>>>>> 
>>>>>>>> On May 30, 2013, at 11:33 AM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>> 
>>>>>>>>> Hey Matt,
>>>>>>>>> OK, so I've found a pretty bad bug in rastream() and rasqlinsert(), based on
>>>>>>>>> your descriptions, but I'm sorry that what I've found is not supportive of
>>>>>>>>> your thesis.  This problem needs to be fixed, but existing implementations
>>>>>>>>> can continue if you are not experiencing problems.
>>>>>>>>> 
>>>>>>>>> OK, the bug is centered around the core tech in rastream(), which was
>>>>>>>>> modeled after rabins(), rather than rasplit().  rabins() has been reworked,
>>>>>>>>> and its working very well, but rastream() is a little behind those changes.
>>>>>>>>> What I'm seeing is flows not being split correctly at the end of a time
>>>>>>>>> period, so that a days data can encroach into the next day by a few
>>>>>>>>> seconds.  This doesn't work for me, at all, so I need to work this out.
>>>>>>>>> 
>>>>>>>>> As a result, I am doing a complete re-write to get rastream() back to
>>>>>>>>> its original design, and should be done Fri/Mon.  I suspect that it will
>>>>>>>>> solve any problems you are having, so are you, will you be willing to
>>>>>>>>> be a guinea pig for this?
>>>>>>>>> 
>>>>>>>>> Since I'm rewriting it based on rasplit(), now would be the time to put
>>>>>>>>> some new feature requests if there are any, and any discussions as
>>>>>>>>> to what sucks and what works in rastream() would be great !!!!!
>>>>>>>>> 
>>>>>>>>> We still need to figure out what is wrong with your environment,
>>>>>>>>> as I think that is at the root of some of what you're seeing.  You don't have
>>>>>>>>> any options that are causing your rastream() to chroot(), like a
>>>>>>>>> "-c" option?
>>>>>>>>> 
>>>>>>>>> Carter
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On May 29, 2013, at 3:23 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hey Matt,
>>>>>>>>>> Hmmmmmmmm, well, we need to get on the same page.  Based on the last
>>>>>>>>>> set of information, I don't think that you are running the correct code, as you
>>>>>>>>>> are reporting segfault information without symbols, which means you're running
>>>>>>>>>> the wrong set of software.  If you are running the wrong software there, not
>>>>>>>>>> sure that you are running the right software anywhere.  I can give racluster()
>>>>>>>>>> bad filenames, files without permissions, etc…. and it doesn't segfault…..
>>>>>>>>>> 
>>>>>>>>>> So we have to clean things up, then we can hypothesis what the real problem is.
>>>>>>>>>> 
>>>>>>>>>> Now, I have been digging into rastream() and rasqlinsert() code and I've found
>>>>>>>>>> an issue that may be related, but I need to have some certainty with regard to
>>>>>>>>>> your setup, before I take the leap to say that I have a strategy…...
>>>>>>>>>> 
>>>>>>>>>> So, we need to understand what is going on with your rastream() script.
>>>>>>>>>> One thing to do is to simplify the script, have it do some extensive logging
>>>>>>>>>> to a file in /tmp, or /var/tmp or somewhere you like, so we can see what its
>>>>>>>>>> doing……  Like printing out its environment, what it thinks is its root directory,
>>>>>>>>>> its path, etc…...
>>>>>>>>>> 
>>>>>>>>>> Carter
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On May 29, 2013, at 1:43 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hello carter,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for replying.
>>>>>>>>>>> 
>>>>>>>>>>> Rastream and racluster are 3.0.7.10.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Are you able to reproduce the issue when using the -d switch with rastream?   I can reproduce it on other systems.
>>>>>>>>>>> 
>>>>>>>>>>> What do you think of my theory?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hope this helps,
>>>>>>>>>>> 
>>>>>>>>>>> Matt
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On May 29, 2013, at 1:34 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hey Matt,
>>>>>>>>>>>> So where are we on this?  Its clear that you are not running the version of code
>>>>>>>>>>>> that you think you are running, so we need to clear that up before proceeding
>>>>>>>>>>>> any further ?
>>>>>>>>>>>> 
>>>>>>>>>>>> Carter
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On May 22, 2013, at 4:07 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Sorry about that, I was trying to be verbose in case you spotted something interesting that indicated a clear problem.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1) racluster segfaulting:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> racluster is executed via the script assigned by the -f parameter of the rastream instance (/usr/local/bin/rastream.sh).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When racluster segfaults, its reporting the following to syslog:
>>>>>>>>>>>>> May 22 01:00:18 ny-sentinel kernel: racluster[5558]: segfault at 46 ip 002e1877 sp bfb54e10 error 4 in libc-2.12.so[281000+190000]
>>>>>>>>>>>>> May 22 02:00:18 ny-sentinel racluster[6044]: 02:00:18.586910 open '/var/opt/argus/2013-05-22/argus_01:00:00': No such file or directory
>>>>>>>>>>>>> May 22 02:00:18 ny-sentinel kernel: racluster[6044]: segfault at 46 ip 0020d877 sp bf854d50 error 4 in libc-2.12.so[1ad000+190000]
>>>>>>>>>>>>> May 22 03:00:18 ny-sentinel racluster[6520]: 03:00:18.534326 open '/var/opt/argus/2013-05-22/argus_02:00:00': No such file or directory
>>>>>>>>>>>>> May 22 03:00:18 ny-sentinel kernel: racluster[6520]: segfault at 46 ip 0019c877 sp bfaf2b30 error 4 in libc-2.12.so[13c000+190000]
>>>>>>>>>>>>> May 22 06:00:18 ny-sentinel racluster[8015]: 06:00:18.786627 open '/var/opt/argus/2013-05-22/argus_05:00:00': No such file or directory
>>>>>>>>>>>>> May 22 06:00:18 ny-sentinel kernel: racluster[8015]: segfault at 46 ip 002a2877 sp bfa74270 error 4 in libc-2.12.so[242000+190000]
>>>>>>>>>>>>> May 22 07:00:18 ny-sentinel racluster[8517]: 07:00:18.386497 open '/var/opt/argus/2013-05-22/argus_06:00:00': No such file or directory
>>>>>>>>>>>>> May 22 07:00:18 ny-sentinel kernel: racluster[8517]: segfault at 46 ip 0073c877 sp bfaeee10 error 4 in libc-2.12.so[6dc000+190000]
>>>>>>>>>>>>> May 22 08:00:18 ny-sentinel racluster[9006]: 08:00:18.591348 open '/var/opt/argus/2013-05-22/argus_07:00:00': No such file or directory
>>>>>>>>>>>>> May 22 08:00:18 ny-sentinel kernel: racluster[9006]: segfault at 46 ip 0023e877 sp bfe6f110 error 4 in libc-2.12.so[1de000+190000]
>>>>>>>>>>>>> May 22 10:00:19 ny-sentinel racluster[10009]: 10:00:19.991831 open '/var/opt/argus/2013-05-22/argus_09:00:00': No such file or directory
>>>>>>>>>>>>> May 22 10:00:19 ny-sentinel kernel: racluster[10009]: segfault at 46 ip 00627877sp bfd4cf50 error 4 in libc-2.12.so[5c7000+190000]
>>>>>>>>>>>>> May 22 13:00:20 ny-sentinel racluster[11556]: 13:00:20.585851 open '/var/opt/argus/2013-05-22/argus_12:00:00': No such file or directory
>>>>>>>>>>>>> May 22 13:00:20 ny-sentinel kernel: racluster[11556]: segfault at 46 ip 002bb877 sp bfda4480 error 4 in libc-2.12.so[25b000+190000]
>>>>>>>>>>>>> May 22 14:00:20 ny-sentinel racluster[12106]: 14:00:20.585760 open '/var/opt/argus/2013-05-22/argus_13:00:00': No such file or directory
>>>>>>>>>>>>> May 22 14:00:20 ny-sentinel kernel: racluster[12106]: segfault at 46 ip 00c3b877 sp bfbebb40 error 4 in libc-2.12.so[bdb000+190000]
>>>>>>>>>>>>> May 22 14:00:20 ny-sentinel racluster[12109]: 14:00:20.997232 open '/var/opt/argus/2013-05-22/argus_13:00:00': No such file or directory
>>>>>>>>>>>>> May 22 14:00:20 ny-sentinel kernel: racluster[12109]: segfault at 46 ip 001ba877 sp bf838800 error 4 in libc-2.12.so[15a000+190000]
>>>>>>>>>>>>> May 22 15:00:19 ny-sentinel racluster[12619]: 15:00:19.898971 open '/var/opt/argus/2013-05-22/argus_14:00:00': No such file or directory
>>>>>>>>>>>>> May 22 15:00:19 ny-sentinel kernel: racluster[12619]: segfault at 46 ip 001d4877 sp bf865ab0 error 4 in libc-2.12.so[174000+190000]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Note that there is no easily identifiable pattern in which files cause the segfault and which do not, but the problem occurs with every file.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I can only guess that it is segfaulting because it can't open '/var/opt/argus/N/argus_N' (No such file or directory).
>>>>>>>>>>>>> It's worth noting that in the instance mentioned in the previous email, '/var/opt/argus/2013-05-21/argus_10:00:00' did exist at the time racluster is executed.  I am concluding this because after racluster is executed, gzip compresses '/var/opt/argus/2013-05-21/argus_10:00:00' successfully.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don't believe the segfault is the problem.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2) Can I run racluster against the file and get good results?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, I can run racluster against both the gzipped resultant file, and the file that is generated by rastream that wasn't gzipped.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 3) Multiple client versions:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When I mention modifying which racluster is executed by the shell script (executed by rastream's -f argument), I simply was attempting to have it use the racluster binary that I had compiled with support for debugging and symbols.  My reasoning was so that it would produce more verbose output to syslog, which it didn't.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There are only two sets of binaries located on the system.  One that has been compiled with debugging support and one that hasn't:
>>>>>>>>>>>>> # find / -iname racluster
>>>>>>>>>>>>> /root/argus-clients-3.0.7.10/bin/racluster #debugging support
>>>>>>>>>>>>> /usr/local/bin/racluster #no debugging support
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /usr/local/bin/rastream is executed and is writing the files I am concerned with (and is causing the problem).
>>>>>>>>>>>>> /usr/local/bin/racluster is currently executed by /usr/local/bin/rastream.sh due to its location in the $PATH (and is the one that is segfaulting occasionally).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 4) How I see the problem:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don't want to make any assumptions, or create a red herring, since I am not familiar with the code of rastream, but I believe that the following is happening:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1) `/usr/local/bin/rastream -d -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh` is executed, and then (due to -d) the daemonizing method is called and forks another rastream process and starts writing to /var/opt/argus/Y-m-d/argus_T1. (which is good)
>>>>>>>>>>>>> 2) At `time 1h`, `/usr/local/bin/rastream.sh` is forked, without blocking `rastream` from writing. (good)
>>>>>>>>>>>>> 3) While #2 occurs, rastream is _still_ writing to /var/opt/argus/Y-m-d/argus_T1 (the file from the previous `time 1h` time span, not the new `time 1h` time span). (which is bad)
>>>>>>>>>>>>> 4) `/usr/local/bin/rastream.sh` executes `racluster -M replace -r $FILES` against the file /var/opt/argus/Y-m-d/argus_T1. (bad)
>>>>>>>>>>>>> 5) `/usr/local/bin/rastream.sh` executes `gzip $FILES` and is able to add /var/opt/argus/Y-m-d/argus_T1 (bad)
>>>>>>>>>>>>> 6) `rastream` begins writing /var/opt/argus/Y-m-d/argus_T2 (the new `time 1h` file) (bad)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So, the daemonized fork of rastream doesn't start writing to the new file (/var/opt/argus/Y-m-d/argus_T2) until _after_ it executes the -f designated script `/usr/local/bin/rastream.sh`.
>>>>>>>>>>>>> Instead, what should be happening is: the daemonized fork of rastream should start writing to the new file, and _then_ execute the -f designated script `/usr/local/bin/rastream.sh`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It would seem that the way that rastream is daemonized affects the later -f logic.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hope this helps.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for your time,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Matt
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 22, 2013, at 10:45 AM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey Matt,
>>>>>>>>>>>>>> Hmmm, so your emails are waaayyyyy too dense for dealing with bugs / problems.
>>>>>>>>>>>>>> Lets focus on one thing at a time.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Your racluster() is segfaulting.  Why is it doing that?  Can you run racluster()
>>>>>>>>>>>>>> against the file and get good results ?  Do you have multiple versions of the
>>>>>>>>>>>>>> clients installed on your machine, so that there is ambiguity in which racluster()
>>>>>>>>>>>>>> or rastream() is actually running?  We should clear that up before proceeding
>>>>>>>>>>>>>> any further.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Carter
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 21, 2013, at 1:31 PM, Matt Brown <matthewbrown at gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hello Carter,
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Thanks for getting back to me quickly.
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> I've tried the following to attempt to troubleshoot the problem, but it doesn't appear that anything's outstanding.
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Apologies for the verbosity…
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # cd ./argus-clients-3.0.7.10
>>>>>>>>>>>>>>> # touch .devel .debug
>>>>>>>>>>>>>>> # ./configure
>>>>>>>>>>>>>>> # make clean && make
>>>>>>>>>>>>>>> # kill -2 $(pgrep rastream)
>>>>>>>>>>>>>>> # ./bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -D4
>>>>>>>>>>>>>>> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455541 ArgusNewQueue () returning 0x8609548
>>>>>>>>>>>>>>> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455612 ArgusNewHashTable (65536) returning 0x8609bc0
>>>>>>>>>>>>>>> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455632 ArgusNewList () returning 0x860a010
>>>>>>>>>>>>>>> rastream[29500.c0567db7]: 2013-05-21 10:54:31.455695 ArgusNewQueue () returning 0x8611878
>>>>>>>>>>>>>>> rastream[29500]: 2013-05-21 10:54:31.455729 started
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> syslog reports:
>>>>>>>>>>>>>>> May 21 10:54:31 ny-sentinel rastream[29500]: 2013-05-21 10:54:31.455729 started
>>>>>>>>>>>>>>> May 21 11:00:18 ny-sentinel racluster[29544]: 2013-05-21 11:00:18.635754 open '/var/opt/argus/2013-05-21/argus_10:00:00': No such file or directory
>>>>>>>>>>>>>>> May 21 11:00:18 ny-sentinel kernel: racluster[29544]: segfault at 46 ip 001d9877 sp bfb9f200 error 4 in libc-2.12.so[179000+190000]
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> With /usr/local/bin/rastream.sh executing `racluster -M replace -r $FILES`, using the racluster in $PATH (as it is originally).
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Note that '/var/opt/argus/2013-05-21/argus_10:00:00' did exist at the time [but that doesn't mean that racluster can read it ?].
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> This is interesting:
>>>>>>>>>>>>>>> I modifed /usr/local/bin/rastream.sh to execute `[path]/argus-clients-3.0.7.10/bin/racluster -D4 -M replace -r $FILES` instead of just using the $PATHed racluster…
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # kill -2 $(pgrep rastream)
>>>>>>>>>>>>>>> # ./argus-clients-3.0.7.10/bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 10m -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -D4
>>>>>>>>>>>>>>> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.528909 ArgusNewQueue () returning 0x9e77550
>>>>>>>>>>>>>>> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.528982 ArgusNewHashTable (65536) returning 0x9e77bc8
>>>>>>>>>>>>>>> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.529008 ArgusNewList () returning 0x9e78018
>>>>>>>>>>>>>>> rastream[30131.c0f67cb7]: 2013-05-21 11:26:53.529072 ArgusNewQueue () returning 0x9e7f880
>>>>>>>>>>>>>>> rastream[30131]: 2013-05-21 11:26:53.529102 started
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> syslog reports:
>>>>>>>>>>>>>>> May 21 11:26:53 ny-sentinel rastream[30131]: 2013-05-21 11:26:53.529102 started
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> …and nothing from racluster or anything else at the 10 minute mark, even though the rotation (and problem) still occur:
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # ll --full-time /var/opt/argus/2013-05-21/ | tail -n 5
>>>>>>>>>>>>>>> -rw-r--r--. 1 root  159764 2013-05-21 11:30:59.466174549 -0400 argus_11:20:00
>>>>>>>>>>>>>>> -rw-r--r--. 1 root  384117 2013-05-21 11:30:17.619561000 -0400 argus_11:20:00.gz
>>>>>>>>>>>>>>> -rw-r--r--. 1 root  181472 2013-05-21 11:41:00.721865476 -0400 argus_11:30:00
>>>>>>>>>>>>>>> -rw-r--r--. 1 root  394783 2013-05-21 11:40:17.511247000 -0400 argus_11:30:00.gz
>>>>>>>>>>>>>>> -rw-r--r--. 1 root 1421440 2013-05-21 11:48:55.328587722 -0400 argus_11:40:00
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Reverting rastream.sh to execute racluster in the $PATH (as original) and changing back to 'time 1h' as:
>>>>>>>>>>>>>>> # kill -2 $(pgrep rastream)
>>>>>>>>>>>>>>> # ./argus-clients-3.0.7.10/bin/rastream -d -F /etc/rastream.conf -S 127.0.0.1:561 -B 15s -M time 1h -w /var/opt/argus/%Y-%m-%d/argus_%T -f /usr/local/bin/rastream.sh -D4
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> results in syslog:
>>>>>>>>>>>>>>> May 21 12:05:37 ny-sentinel rastream[30905]: 2013-05-21 12:05:37.950988 started
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> …and nothing from racluster (why and how?)), but the problem still exists:
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # ll --full-time /var/opt/argus/2013-05-21/ | tail -n 3
>>>>>>>>>>>>>>> -rw-r--r--. 1 root  145612 2013-05-21 13:01:00.562634873 -0400 argus_12:00:00
>>>>>>>>>>>>>>> -rw-r--r--. 1 root 2303564 2013-05-21 13:00:18.216161000 -0400 argus_12:00:00.gz
>>>>>>>>>>>>>>> -rw-r--r--. 1 root  315520 2013-05-21 13:02:35.126692699 -0400 argus_13:00:00
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> It's quite obvious that the size of the argus data has increased a very large amount (due to the debugging).
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Does anything here indicate the problem?
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Is there anything else that I can do to further troubleshoot the issue?
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> Here's some freaky stuff:
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> querying the file argus_12:00:00 for ltime doesn't correlate with the last modified time of the file:
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # ra -nr argus_12\:00\:00 -s stime ltime -w - | rasort -m stime -s stime ltime | head -n 3
>>>>>>>>>>>>>>>                  StartTime                   LastTime
>>>>>>>>>>>>>>> 2013-05-21 12:59:03.164195 2013-05-21 12:59:06.417385
>>>>>>>>>>>>>>> 2013-05-21 12:59:03.369926 2013-05-21 12:59:03.501171
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # ra -nr argus_12\:00\:00 -s stime ltime -w - | rasort -m ltime -s stime ltime | tail -n 2
>>>>>>>>>>>>>>> 2013-05-21 12:59:17.117235 2013-05-21 13:00:00.000000
>>>>>>>>>>>>>>> 2013-05-21 12:59:28.896645 2013-05-21 13:00:00.000000
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # ra -nr argus_12\:00\:00 -s stime ltime | head -n 3
>>>>>>>>>>>>>>>                  StartTime                   LastTime
>>>>>>>>>>>>>>> 2013-05-21 12:05:38.036319 2013-05-21 09:52:57.532792 # this is weird.
>>>>>>>>>>>>>>> 2013-05-21 12:59:27.970584 2013-05-21 12:59:36.325653
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> querying the file argus_12:00:00.gz:
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> sort by stime:
>>>>>>>>>>>>>>> # ra -nr argus_12\:00\:00.gz -s stime ltime -w - | rasort -m stime -s stime ltime | head -n 3
>>>>>>>>>>>>>>>                  StartTime                   LastTime
>>>>>>>>>>>>>>> 2013-05-21 11:59:44.064216 2013-05-21 12:34:17.996512
>>>>>>>>>>>>>>> 2013-05-21 11:59:44.509320 2013-05-21 11:59:59.639822
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> # ra -nr argus_12\:00\:00.gz -s stime ltime -w - | rasort -m stime -s stime ltime | tail -n 2
>>>>>>>>>>>>>>> 2013-05-21 12:59:02.980166 2013-05-21 12:59:02.981078
>>>>>>>>>>>>>>> 2013-05-21 12:59:02.980171 2013-05-21 12:59:02.981120
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> sort by ltime:
>>>>>>>>>>>>>>> # ra -nr argus_12\:00\:00.gz -s ltime -w - | rasort -m ltime -s stime ltime | head -n 3

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20130610/18f89770/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 6837 bytes
Desc: not available
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20130610/18f89770/attachment.bin>


More information about the argus mailing list