Big O Impact of Filters

Jason dn1nj4 at gmail.com
Thu May 22 11:57:36 EDT 2014


Two additional questions here:

3. Why are both all.bin and all2.bin so much larger than the original 375MB
of data, when I thought I was using racluster to aggregate it?

4. Memory utilization throughout the tests looks to be almost 5-10x the
argus file size (125MB argus -> 1.8GB RAM, 375MB argus -> 2.2GB RAM and
625MB argus -> 3.3GB RAM <-- additional test). Do you have a feel for how
this utilization scales?

I'm trying to cluster ~50GB of argus files at a time and that's where my
problem began...
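For what it's worth, a quick back-of-the-envelope fit on the three data points
above, assuming memory grows roughly linearly with input size (which these
numbers only loosely support), looks like this. The figures and the 50GB
extrapolation are purely illustrative, not a claim about racluster internals:

```python
# Least-squares linear fit of RAM use (GB) vs. argus file size (MB),
# using the three measurements quoted above.
sizes_mb = [125, 375, 625]
ram_gb = [1.8, 2.2, 3.3]

n = len(sizes_mb)
mean_x = sum(sizes_mb) / n
mean_y = sum(ram_gb) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_mb, ram_gb)) \
        / sum((x - mean_x) ** 2 for x in sizes_mb)
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.4f} GB of RAM per MB of input")          # ~0.0030
print(f"extrapolated RAM for 50 GB input: "
      f"{intercept + slope * 50 * 1024:.0f} GB")                 # ~155 GB
```

If that linear trend held, clustering 50GB in one pass would need on the
order of 150GB of RAM, which is why splitting the input or reducing the
cardinality of the aggregation key matters here.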


Sorry for all the questions.
Jason



On Thu, May 22, 2014 at 10:32 AM, Jason <dn1nj4 at gmail.com> wrote:

> (resending to the list)
>
> Carter,
>
> Still trying to find a workable solution for my environment...  I have
> tried the rasqlinsert route, but in my dev environment, on a very small
> data set, what takes racluster 5.8 seconds to cluster takes me 66 seconds
> to "rasqlinsert -M cache" and then read the data back out for a report.  I
> am worried that this solution won't be fast enough to keep up with my
> production data volume.
>
> Today I thought I would test clustering small files together and checking
> the results.  What I found has confused me further.  For three argus files
> (a.bin, b.bin & c.bin) that are each 125MB:
>
> racluster -w all.bin -M rmon -m saddr proto sport dport -r a.bin b.bin
> c.bin
>
> Takes ~30 seconds and generates a 485M all.bin.
>
> Attempting a staged approach, like this:
>
> racluster -w p1.bin -M rmon -m saddr proto sport dport -r a.bin b.bin
> racluster -w all2.bin -M rmon -m saddr proto sport dport -r p1.bin c.bin
>
> Takes 17s for the first run and 32s for the second and generates a 565M
> all2.bin.
>
> I guess my confusion is:
>
> 1. Why is there such a discrepancy between the resulting file sizes (485M
> vs 565M)?
> 2. Why does the second staged run (all2.bin) still take 32 seconds even
> though I've already clustered a.bin and b.bin together?
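One way to see why the staged run doesn't save time: a hash aggregator still
has to parse and hash every record in its input, so the second pass pays for
all of p1.bin plus c.bin regardless of the earlier clustering. A toy sketch
(pure Python, nothing argus-specific; the record counts are made up):

```python
# Toy hash-based flow aggregation: the cost is dominated by the number
# of input records, not by how pre-merged they are.
from collections import defaultdict

def cluster(records):
    """Aggregate (key, nbytes) records; return the merged table and work done."""
    table = defaultdict(int)
    work = 0
    for key, nbytes in records:
        table[key] += nbytes   # one hash lookup per input record
        work += 1
    return table, work

# Three inputs with mostly non-overlapping flow keys, like real traffic
# aggregated on saddr/proto/sport/dport.
a = [(("a", i), 1) for i in range(1000)]
b = [(("b", i), 1) for i in range(1000)]
c = [(("c", i), 1) for i in range(1000)]

# One-shot run: 3000 records processed.
_, one_shot = cluster(a + b + c)

# Staged run: the second pass re-reads the entire merged output of the
# first pass plus c, so it is not free.
p1, w1 = cluster(a + b)
_, w2 = cluster(list(p1.items()) + c)

print(one_shot, w1, w2)   # 3000 2000 3000
```

When the keys barely collide (here they never do), p1 is as large as its
inputs, so the staged approach touches more records in total, which is
consistent with 17s + 32s staged versus ~30s one-shot. I'd hedge on the
exact file-size difference, though; already-aggregated records can carry
different DSRs than raw ones, so all2.bin growing past all.bin may come
down to record contents rather than record counts.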
>
> Thanks again,
> Jason
>
>
> On Mon, May 19, 2014 at 12:46 PM, Carter Bullard <carter at qosient.com>wrote:
>
>> Ahhhhhh progress … can you replicate the error running racluster against
>> just the file, or the last two files?
>>
>> Any chance you can share the file ??
>> Carter
>>
>> On May 19, 2014, at 12:29 PM, Jason <dn1nj4 at gmail.com> wrote:
>>
>> Carter,
>>
>> The patch did not appear to change the error.  Removing the final file,
>> however, did (both with and without the patch).  So the implication here is
>> perhaps the file is corrupted?
>>
>> Jason
>>
>>
>> On Mon, May 19, 2014 at 11:27 AM, Carter Bullard <carter at qosient.com>wrote:
>>
>>> Hey Jason,
>>> Sorry for the barrage of email.
>>> Could you try this patch ??  Seems that it may help a little here.
>>>
>>> ==== //depot/argus/clients/clients/racluster.c#87 -
>>> /Volumes/Users/carter/argus/clients/clients/racluster.c ====
>>> 308c308
>>> <          if (ArgusSorter != NULL)
>>> ---
>>> >          if (ArgusSorter != NULL) {
>>> 309a310,311
>>> >             ArgusSorter = NULL;
>>> >          }
>>>
>>>
>>> Carter
>>>
>>>
>>> On May 19, 2014, at 11:21 AM, Carter Bullard <carter at qosient.com> wrote:
>>>
>>> > Hey Jason,
>>> > If you run with debug level 1, you’ll see the files as they are being
>>> > processed, and that can show you which file is the culprit.
>>> > If it is the last one, which it looks like it is, it may be that one
>>> > of the threads has shutdown / deleted a construct that is needed,
>>> > like the memory manager.  This is a threads issue, so I’m going down
>>> > that path to solve this problem.
>>> >
>>> > If you see anything that suggests otherwise, like it's not the last
>>> file,
>>> > send a note, if you have the time …
>>> >
>>> > Thanks for all the help !!!
>>> > Carter
>>> >
>>> > On May 19, 2014, at 9:25 AM, Carter Bullard <carter at qosient.com>
>>> wrote:
>>> >
>>> >> Hey Jason,
>>> >> Is it always the same file?  Any chance it would fail on just that
>>> file ??
>>> >> If you run with "-M ind", does the problem go away ??  This option
>>> forces aggregation to be limited to each file...
>>> >>
>>> >> Carter
>>> >>
>>> >> Carter Bullard, QoSient, LLC
>>> >> 150 E. 57th Street Suite 12D
>>> >> New York, New York 10022
>>> >> +1 212 588-9133 Phone
>>> >> +1 212 588-9134 Fax
>>> >>
>>> >> On May 19, 2014, at 2:40 AM, Jason <dn1nj4 at gmail.com> wrote:
>>> >>
>>> >>> #0  0x00007ffff7349e08 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> >>> #1  0x00007ffff734b496 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> >>> #2  0x00007ffff734df95 in malloc () from
>>> /lib/x86_64-linux-gnu/libc.so.6
>>> >>> #3  0x000000000044c276 in ArgusMalloc (bytes=24328) at
>>> ./argus_util.c:21779
>>> >>> #4  0x000000000049388a in ArgusSortQueue (sorter=0x1c7feb40,
>>> queue=0xfdf250) at ./argus_client.c:15390
>>> >>> #5  0x0000000000404820 in RaParseComplete (sig=0) at
>>> ./racluster.c:277
>>> >>> #6  0x0000000000407cf4 in main (argc=66, argv=0x7fffffffd828) at
>>> ./argus_main.c:390
>>> >>>
>>> >>>
>>> >>> On Fri, May 16, 2014 at 10:20 AM, Carter Bullard <carter at qosient.com>
>>> wrote:
>>> >>> Hey Jason,
>>> >>> Thanks for testing this.  Any chance you can run using gdb to see
>>> where it's
>>> >>> running into trouble ???  To compile with symbols:
>>> >>>
>>> >>>    % touch .devel
>>> >>>    % ./configure
>>> >>>    % make clean
>>> >>>    % make
>>> >>>
>>> >>> If it breaks in with the same error, then type
>>> >>>
>>> >>>    (gdb) where
>>> >>>
>>> >>> That should be very helpful !!!!
>>> >>>
>>> >>> Carter
>>> >>>
>>> >>> On May 16, 2014, at 8:42 AM, Jason <dn1nj4 at gmail.com> wrote:
>>> >>>
>>> >>>> I'm testing 3.0.7.27 now.  Duplicating the original test in
>>> this thread produced much more reasonable results.  When I run against a
>>> larger test data set though (around 40 input files), I am getting the
>>> following error:
>>> >>>>
>>> >>>> *** glibc detected *** racluster: corrupted double-linked list:
>>> 0x000000001e900470 ***
>>> >>>>
>>> >>>> The error is the same each time I run the test.
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Thu, May 15, 2014 at 10:40 PM, Carter Bullard <
>>> carter at qosient.com> wrote:
>>> >>>> Hey Jason,
>>> >>>> So I uploaded argus-clients-3.0.7.27 that has a complete fix in
>>> >>>> for the problem you reported.  FYI, the problem was that we were
>>> >>>> calling the queue timeout management routines on every flow,
>>> >>>> which, interestingly, really crushed the routine when the idle
>>> >>>> timers and status timers were both turned on, both not in the
>>> >>>> same order of magnitude, and a specific filter gets a large
>>> >>>> number of hits in a short period of time...
>>> >>>>
>>> >>>> That of course is / was really stupid, not really a bug, but kind
>>> of a bug.
>>> >>>>
>>> >>>> OK, the fix that is now in has independent logic for managing the
>>> idle
>>> >>>> and status timeouts. Each filter entry gets a complete aggregation
>>> >>>> engine and processing queue, so we can use an efficient idle
>>> timeout
>>> >>>> processing strategy, but we need to process the status timeouts
>>> >>>> independently, which we now do once every second.
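Carter's description above can be sketched abstractly: if the timeout queue
is scanned on every incoming flow record, the work is O(records x queue
length), whereas scanning once per second of stream time is O(seconds x
queue length). A purely illustrative Python sketch; none of these names
exist in the argus clients:

```python
# Contrast per-record timeout scanning with once-per-interval scanning.
def per_record_scan(n_records, queue_len):
    # Timeout management runs for every flow record seen.
    return n_records * queue_len          # comparisons performed

def periodic_scan(n_records, queue_len, records_per_sec):
    # Timeout management runs once per second of stream time.
    seconds = n_records // records_per_sec
    return seconds * queue_len

n, q, rate = 1_000_000, 10_000, 10_000
print(per_record_scan(n, q))              # 10000000000
print(periodic_scan(n, q, rate))          # 1000000
```

With a busy filter and a deep queue, that is a four-orders-of-magnitude
gap, which lines up with the 2m42s vs 1s timings reported at the start of
this thread.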
>>> >>>>
>>> >>>> Hopefully things are working better for you now.
>>> >>>>
>>> >>>> Carter
>>> >>>>
>>> >>>> On May 15, 2014, at 11:30 AM, Carter Bullard <carter at qosient.com>
>>> wrote:
>>> >>>>
>>> >>>>> Hey Jason,
>>> >>>>> Could you give this version of racluster() a run to see if it does
>>> >>>>> what you want ???  The principal difference is that the output of
>>> >>>>> this new racluster() will have records a bit more out of order
>>> >>>>> than the other version.
>>> >>>>>
>>> >>>>> With streaming data, you may not get status reports timely (like
>>> >>>>> within 0.25 seconds of the status timer expiration) but you will
>>> >>>>> get correct status record reporting, driven by the idle timeout
>>> >>>>> period.  I’ll improve this behavior later today.
>>> >>>>>
>>> >>>>> Sorry for any inconvenience, and thanks for pushing on this !!!!
>>> >>>>>
>>> >>>>> Carter
>>> >>>>>
>>> >>>>> <racluster.c>
>>> >>>>>
>>> >>>>> On May 15, 2014, at 10:20 AM, Carter Bullard <carter at qosient.com>
>>> wrote:
>>> >>>>>
>>> >>>>>> Hey Jason,
>>> >>>>>> Found the problem, and it's a poor design assumption on my part.
>>> >>>>>> It's a kind of thrash between the status timer and the idle
>>> timer.
>>> >>>>>> This does not affect rabins() or radium(), just racluster().
>>> >>>>>>
>>> >>>>>> Fixing it now.
>>> >>>>>>
>>> >>>>>> Carter
>>> >>>>>>
>>> >>>>>> On May 14, 2014, at 5:53 PM, Jason <dn1nj4 at gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>> Hi Carter,
>>> >>>>>>>
>>> >>>>>>> So I asked a very similar question last year (
>>> http://comments.gmane.org/gmane.network.argus/9110), but I can't seem
>>> to find a response.  I apologize if I'm just missing something or have just
>>> forgotten.
>>> >>>>>>>
>>> >>>>>>> I am trying once again to understand why there is such a
>>> significant impact on the length of time it takes to run racluster when
>>> leveraging filters.  Here is the racluster.conf file I am testing:
>>> >>>>>>>
>>> >>>>>>> filter="udp and port domain" model="saddr daddr proto sport
>>> dport" status=600 idle=10
>>> >>>>>>> filter="udp" model="saddr daddr proto sport dport" status=600
>>> idle=60
>>> >>>>>>> filter="" model="saddr daddr proto sport dport" status=600
>>> idle=600
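As an aside for anyone reading the archive: each line in a racluster.conf
like the one above defines a filter plus its own aggregation model and
timers, and a record lands in the first entry whose filter it matches. A
rough Python model of that dispatch (purely illustrative; note that a real
BPF "port domain" matches either sport or dport, which this simplification
reduces to dport only):

```python
# Records fall through to the first matching filter entry; each entry
# keeps its own aggregation table and timeout settings.
entries = [
    # filter="udp and port domain"  idle=10
    {"match": lambda r: r["proto"] == "udp" and r["dport"] == 53,
     "idle": 10, "table": {}},
    # filter="udp"  idle=60
    {"match": lambda r: r["proto"] == "udp",
     "idle": 60, "table": {}},
    # filter=""  (catch-all)  idle=600
    {"match": lambda r: True,
     "idle": 600, "table": {}},
]

def dispatch(record):
    for entry in entries:
        if entry["match"](record):
            key = (record["saddr"], record["daddr"], record["proto"],
                   record["sport"], record["dport"])
            entry["table"][key] = entry["table"].get(key, 0) + record["bytes"]
            return entry

r = {"saddr": "10.0.0.1", "daddr": "10.0.0.2", "proto": "udp",
     "sport": 5353, "dport": 53, "bytes": 120}
hit = dispatch(r)
print(hit["idle"])   # 10 -- matched the "udp and port domain" entry
```

Every record pays the filter-matching cost on the way in, but per the fix
described above, the dominant cost was the timeout management, not the
matching itself.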
>>> >>>>>>>
>>> >>>>>>> And here are two runs against a single argus file.  The only
>>> difference is whether or not I'm using the racluster.conf:
>>> >>>>>>>
>>> >>>>>>> $ time racluster -f racluster.conf -r infile.bin -w outfile.bin
>>> -M rmon -u -c "," -m saddr proto sport dport -L0 -Z s -s stime saddr proto
>>> sport dport sbytes runtime dbytes trans state - not arp
>>> >>>>>>>
>>> >>>>>>> real    2m42.935s
>>> >>>>>>> user    2m39.274s
>>> >>>>>>> sys     0m3.288s
>>> >>>>>>>
>>> >>>>>>> $ time racluster -r infile.bin -w outfile.bin -M rmon -u -c ","
>>> -m saddr proto sport dport -L0 -Z s -s stime saddr proto sport dport sbytes
>>> runtime dbytes trans state - not arp
>>> >>>>>>>
>>> >>>>>>> real    0m1.054s
>>> >>>>>>> user    0m0.944s
>>> >>>>>>> sys     0m0.108s
>>> >>>>>>>
>>> >>>>>>> Why does the filtered run take over 150x longer?
>>> >>>>>>>
>>> >>>>>>> Thanks!
>>> >>>>>>> Jason
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >
>>>
>>>
>>
>>
>