rabins issue, maybe order related

Carter Bullard carter at qosient.com
Wed Jun 20 13:29:38 EDT 2012


Hey Mark,
Yes, all ra* programs will take "-R ." file names and sort them to figure
out what files to open in what order.    I don't think the -r filenames are
a part of that algorithm, so you can push files in front of others by using
-r instead of -R.
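
As a quick check of the path comparison described in the original report, here is
a small sketch that sorts the two pairs of paths byte-wise.  It only shows how the
strings compare; that the ra* programs use exactly this comparison is an assumption,
and sort_paths is a made-up helper name, not part of any argus tool.

```shell
# Hypothetical helper: print the given paths in byte-wise (C locale) order.
sort_paths() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Original run: the archive directory sorts ahead of the summary file.
sort_paths /usr/local/argus/summary/20120601 \
           /usr/local/argus/archive/2012/06/20

# Symlinked run: /tmp/20120601 sorts ahead of anything under /usr/local.
sort_paths /tmp/20120601 \
           /usr/local/argus/archive/2012/06/20
```

In both cases the path that sorts first matches the file that strace showed being
opened first.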

Currently, rabins() would like its input to be in some time order, as it uses the
stream to estimate what the starting bin time is, and how many bins to process.
rabins() will discard input records that are before its notion of the "epoch" of time.
If you use "-D 2", rabins() will print out an error message when it throws
a record away.  Something like this:

  rabins[xxx] xxxxxxxx.xxxxxxx ArgusInsertRecord(0x…., 0x,…) array too short ind %d index %d

rabins() generally uses the first record it reads to give it some sense of what
the "epoch" of time should be.  I think it allows for a large number of seconds before
the stream start time, as "wobble", but it doesn't appear to be enough for your
situation.
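
To make the effect concrete, here is a toy model of that behavior.  It is not the
rabins() source code: timestamps are plain day numbers (1 for June 1, 20 for June 20),
and WOBBLE and bin_records are hypothetical names with a made-up tolerance of 2 days.

```shell
# Toy model only, NOT rabins source code.
# The first record read sets the "epoch"; any later record that starts
# before (epoch - WOBBLE) is thrown away.
WOBBLE=2    # hypothetical tolerance, in days

bin_records() {
    epoch=""
    kept=""
    dropped=""
    for t in "$@"; do
        # The first record defines where the bin array starts.
        if [ -z "$epoch" ]; then
            epoch=$(( $t - $WOBBLE ))
        fi
        if [ "$t" -lt "$epoch" ]; then
            dropped="$dropped $t"
        else
            kept="$kept $t"
        fi
    done
    echo "kept:${kept:- none} dropped:${dropped:- none}"
}

bin_records 20 1    # June 20 read first; prints: kept: 20 dropped: 1
bin_records 1 20    # June 1 read first;  prints: kept: 1 20 dropped: none
```

The same two records give different results depending only on which one is read first,
which matches what you are seeing.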

You can solve this problem by giving rabins() a time filter that spans the
time of your processing.  This tells rabins() to preallocate bins for the time
span, and you are guaranteed that your data will be processed, regardless
of what order it comes in.  Since you're doing day bins, you can probably
get away with an entire year as your filter, "-t -12M+12M".  If you were doing
seconds bins, you'd want to scale that back a bit.
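
As a sketch, this is your Step 1 command with the time filter added.  Only the
"-t -12M+12M" argument is new, and where it sits among the other options is an
assumption.  The snippet prints the command for review rather than running it, since
it needs the argus clients and your dtm-c1 script to execute.

```shell
# The Step 1 command from the original report, plus the suggested time filter.
# Placement of "-t -12M+12M" among the other options is an assumption.
set -- rabins -L-1 -u -m proto saddr -M time 1d \
    -t -12M+12M \
    -r /usr/local/argus/summary/20120601 \
    -R /usr/local/argus/archive/2012/06/20 \
    -s stime saddr bytes:12 trans:18 - ip

# Show the assembled command line.
echo "$@"

# To run it for real (requires rabins and the dtm-c1 script):
#   "$@" | dtm-c1
```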

There is an option for rabins() to process all the data twice: a first pass to
read in all the data and get the time range, and a second pass to
actually process the data.  I'll have to re-enable that, if it's needed.

Hope this helps !!!

Carter



On Jun 20, 2012, at 12:57 PM, Mark E. Mallett wrote:

> Hi,
> 
> I'm having an odd issue with rabins.  I'm trying to process
> multiple files and directories with a mix of "-r" and "-R"
> options. If the response at this point is "don't do that,"
> then that can be that and you can stop reading :)
> 
> However, other ra* tools seem happy with the mix.
> 
> I'm not sure I can construct a test with your sample data, so I'd just
> like to present the alleged problem in narrative form first. I can
> probably provide data if necessary.
> 
> Say I have an argus file at /usr/local/argus/summary/20120601 .
> As you might imagine, it contains records generated on June 1, 2012.
> 
> Say I have a directory at /usr/local/argus/archive/2012/06/20 .  There
> are argus files in this directory (or even just one file) that
> contain records generated on June 20, 2012.
> 
> 
> Step 1.
> 
> I run rabins to look at simple daily data from those files. The actual
> command line is more elaborate but I reduced it to this and the issue
> is still present, so:
> 
> $ rabins -L-1 -u -m proto saddr -M time 1d \
>    -r /usr/local/argus/summary/20120601 \
>    -R /usr/local/argus/archive/2012/06/20 \
>    -s stime saddr bytes:12 trans:18 - ip |\
>    dtm-c1
> 
> where 'dtm-c1' is a simple script to convert the epoch time in the
> first column to a date format that I am interested in.
> 
> This invocation only shows me records from June 20, i.e. from the -R
> directory. There's nothing from June 1 (the -r file).
> 
> 
> Step 2.
> 
> I somehow discovered that if I make a symbolic link:
>  ln -s /usr/local/argus/summary/20120601 /tmp/20120601
> and rerun it:
> 
> $ rabins -L-1 -u -m proto saddr -M time 1d \
>    -r /tmp/20120601 \
>    -R /usr/local/argus/archive/2012/06/20 \
>    -s stime saddr bytes:12 trans:18 - ip |\
>    dtm-c1
> 
> I get all the records. Likewise if I copy the file instead of link it.
> And likewise if, instead of using -R for the June 20 directory, I
> use one or more -r options for individual files in that directory. In
> all those cases I get all records like I hope to.
> 
> 
> Step 3. Observation.
> 
> I ran strace against the various invocations and compared what was
> happening. The only significant thing that I can see is that the order
> of the opening of the files changes.  I note that /usr/local/argus/archive/
> comes alphabetically before /usr/local/argus/summary/ and that
> /tmp/20120601 comes alphabetically before /usr/local/anything - I'm
> guessing that there's some sorting going on, and it appears that when I
> move the -r file or the file reference to /tmp, it gets opened before
> the -R directory does, whereas in the original command, they sort
> in the opposite order and are opened in the opposite order.
> 
> The -r file is always being opened and read; it just seems to matter what
> order it's opened in.
> 
> I hope that this is reasonably clear and that it might even make sense.
> I can probably just use '-r' everywhere (expand the -R directories) but
> it makes for much longer command lines, and there are other boring reasons
> that I want to use -R as well.  But this does make me wonder if I need
> to open files in date order (or in bin order, at least). I scoured the
> rabins man page to see if I could find a mention of that but didn't see
> one.  I note that '-r' is documented as opening files in the order
> specified, but I don't see any mention of order of multiple -R
> directories or for a mix of -r and -R.  The command line I build actually
> does specify all files and directories in date order.
> 
> Yours,
> -mm-
