Big O Impact of Filters
Carter Bullard
carter at qosient.com
Thu May 22 11:58:52 EDT 2014
Jason,
I’m very happy that you’re doing all this testing. I am trying
to release the new code as stable, so all this testing is great.
Just to recount where we are: you were seeing very large clustering
times, and that is now fixed?
The rasqlinsert() issues you are discovering are great, and
I’m fixing them now, as I would like to release ASAP. The default
engine when using the -X option has been fixed. The changes I made
to flush the queues when rasqlinsert() has finished reading all its
input files are broken (zero metric counts), and I’m fixing that now.
OK, bugs aside, performance is never a straightforward thing,
and you need to understand how things work before you worry
about how to make them faster.
The database will slow things down. We use it when we want lots of
programs to get concurrent access to the same data, and using it
as a backing store to get around RAM limitations works great, but
it is slower than running everything in memory.
Now, your racluster() command is completely broken: you can’t
use the “-M rmon” option and process bidirectional objects.
So get rid of the “dport” in your aggregation mask; it doesn’t
make any sense there. Things will go much faster, as there will
be fewer aggregates in the end.
The -M rmon option duplicates the input record stream so
that it can swap the identifiers and metrics; if you use it
incorrectly you will screw things up completely. Running
“-M rmon” against data that has already been “rmon’d”
double-counts everything, so don’t do that !!!!!
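A toy model of the rmon transform (plain Python, just to illustrate the duplication; the real racluster logic is in C) shows why applying it twice double-counts:

```python
# Toy rmon: split each bidirectional record into two unidirectional
# ones, swapping the endpoints and metrics in the copy.
def rmon(records):
    out = []
    for src, dst, sbytes, dbytes in records:
        out.append((src, dst, sbytes, dbytes))
        out.append((dst, src, dbytes, sbytes))
    return out

flows = [("10.0.0.1", "10.0.0.2", 100, 200)]

once = rmon(flows)    # 2 records, as intended
twice = rmon(once)    # 4 records: rmon of already-rmon'd data

def total_src_bytes(records):
    return sum(sbytes for _, _, sbytes, _ in records)

print(total_src_bytes(once))   # 300
print(total_src_bytes(twice))  # 600, everything counted twice
```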
So do this:
racluster -M rmon -m smac saddr sport -r a.bin b.bin c.bin
and compare with
racluster -M rmon -m smac saddr sport -r a.bin
racluster -M rmon -m smac saddr sport -r b.bin
racluster -M rmon -m smac saddr sport -r c.bin
To stage the processing:
racluster -M rmon -m smac saddr sport -r a.bin -w t
racluster -M rmon -m smac saddr sport -r b.bin -w p1
racluster -m smac saddr sport -r t p1 -w t2; mv t2 t
racluster -M rmon -m smac saddr sport -r c.bin -w p1
racluster -m smac saddr sport -r t p1 -w t2; mv t2 t
Note that the merge steps do not use -M rmon, since the
intermediate files have already been rmon’d.
To understand each stage, look at what its output actually
contains: run racount() on the intermediate files, so you can
see what kind of loads you’re dealing with.
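The staging works because per-file aggregates, merged on the same key, give the same totals as one pass over everything. A minimal sketch (plain Python with invented records, not the argus code):

```python
from collections import Counter

def cluster(records):
    """Aggregate byte counts by a (saddr, sport) key."""
    agg = Counter()
    for saddr, sport, nbytes in records:
        agg[(saddr, sport)] += nbytes
    return agg

# Hypothetical contents of a.bin, b.bin, and c.bin.
a = [("10.0.0.1", 53, 100), ("10.0.0.2", 80, 50)]
b = [("10.0.0.1", 53, 25)]
c = [("10.0.0.2", 80, 75)]

one_pass = cluster(a + b + c)

# Staged: cluster each file, then fold each partial result
# into the running total (Counter.update adds counts by key).
staged = cluster(a)
for part in (cluster(b), cluster(c)):
    staged.update(part)

print(staged == one_pass)  # True
```

This only holds when the merge uses the same aggregation key as the per-file runs, which is why the -m mask should match across stages.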
Carter
On May 22, 2014, at 10:32 AM, Jason <dn1nj4 at gmail.com> wrote:
> (resending to the list)
>
> Carter,
>
> Still trying to find a workable solution for my environment... I have tried the rasqlinsert route, but in my dev environment, on a very small data set, what takes racluster 5.8 seconds to cluster takes me 66 seconds to "rasqlinsert -M cache" and then read the data back out for a report. I am worried that this solution won't be fast enough to keep up with my production data volume.
>
> Today I thought I would test clustering small files together and checking the results. What I found has confused me further. For three argus files (a.bin, b.bin & c.bin) that are each 125MB:
>
> racluster -w all.bin -M rmon -m saddr proto sport dport -r a.bin b.bin c.bin
>
> Takes ~30 seconds and generates a 485M all.bin.
>
> Attempting a staged approach, like this:
>
> racluster -w p1.bin -M rmon -m saddr proto sport dport -r a.bin b.bin
> racluster -w all2.bin -M rmon -m saddr proto sport dport -r p1.bin c.bin
>
> Takes 17s for the first run and 32s for the second and generates a 565M all2.bin.
>
> I guess my confusion is:
>
> 1. Why is there such a discrepancy between the resulting file sizes (485M vs 565M)?
> 2. Why does the second staged run (all2.bin) still take 32 seconds even though I've already clustered a.bin and b.bin together?
>
> Thanks again,
> Jason
>
>
> On Mon, May 19, 2014 at 12:46 PM, Carter Bullard <carter at qosient.com> wrote:
> Ahhhhhh, progress … can you replicate the error running racluster against
> just that file, or the last two files?
>
> Any chance you can share the file ??
> Carter
>
> On May 19, 2014, at 12:29 PM, Jason <dn1nj4 at gmail.com> wrote:
>
>> Carter,
>>
>> The patch did not appear to change the error. Removing the final file, however, did (both with and without the patch). So the implication here is perhaps the file is corrupted?
>>
>> Jason
>>
>>
>> On Mon, May 19, 2014 at 11:27 AM, Carter Bullard <carter at qosient.com> wrote:
>> Hey Jason,
>> Sorry for the barrage of email.
>> Could you try this patch ?? Seems that it may help a little here.
>>
>> ==== //depot/argus/clients/clients/racluster.c#87 - /Volumes/Users/carter/argus/clients/clients/racluster.c ====
>> 308c308
>> < if (ArgusSorter != NULL)
>> ---
>> > if (ArgusSorter != NULL) {
>> 309a310,311
>> > ArgusSorter = NULL;
>> > }
>>
>>
>> Carter
>>
>>
>> On May 19, 2014, at 11:21 AM, Carter Bullard <carter at qosient.com> wrote:
>>
>> > Hey Jason,
>> > If you run with debug level 1, you’ll see the files as they are being
>> > processed, and that can show you which file is the culprit.
>> > If it is the last one, which it looks like it is, it may be that one
>> > of the threads has shut down / deleted a construct that is needed,
>> > like the memory manager. This is a threads issue, so I’m going down
>> > that path to solve this problem.
>> >
>> > If you see anything that suggests otherwise, like it’s not the last file,
>> > send a note, if you have the time …
>> >
>> > Thanks for all the help !!!
>> > Carter
>> >
>> > On May 19, 2014, at 9:25 AM, Carter Bullard <carter at qosient.com> wrote:
>> >
>> >> Hey Jason,
>> >> Is it always the same file? Any chance it would fail on just that file ??
>> >> If you run with "-M ind", does the problem go away ?? This option limits aggregation to each individual file...
>> >>
>> >> Carter
>> >>
>> >> Carter Bullard, QoSient, LLC
>> >> 150 E. 57th Street Suite 12D
>> >> New York, New York 10022
>> >> +1 212 588-9133 Phone
>> >> +1 212 588-9134 Fax
>> >>
>> >> On May 19, 2014, at 2:40 AM, Jason <dn1nj4 at gmail.com> wrote:
>> >>
>> >>> #0 0x00007ffff7349e08 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> >>> #1 0x00007ffff734b496 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> >>> #2 0x00007ffff734df95 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
>> >>> #3 0x000000000044c276 in ArgusMalloc (bytes=24328) at ./argus_util.c:21779
>> >>> #4 0x000000000049388a in ArgusSortQueue (sorter=0x1c7feb40, queue=0xfdf250) at ./argus_client.c:15390
>> >>> #5 0x0000000000404820 in RaParseComplete (sig=0) at ./racluster.c:277
>> >>> #6 0x0000000000407cf4 in main (argc=66, argv=0x7fffffffd828) at ./argus_main.c:390
>> >>>
>> >>>
>> >>> On Fri, May 16, 2014 at 10:20 AM, Carter Bullard <carter at qosient.com> wrote:
>> >>> Hey Jason,
>> >>> Thanks for testing this. Any chance you can run using gdb to see where it’s
>> >>> running into trouble ??? To compile with symbols:
>> >>>
>> >>> % touch .devel
>> >>> % ./configure
>> >>> % make clean
>> >>> % make
>> >>>
>> >>> If it breaks with the same error, then type
>> >>>
>> >>> (gdb) where
>> >>>
>> >>> That should be very helpful !!!!
>> >>>
>> >>> Carter
>> >>>
>> >>> On May 16, 2014, at 8:42 AM, Jason <dn1nj4 at gmail.com> wrote:
>> >>>
>> >>>> I'm testing the 3.0.2.27 now. Duplicating the original test in this thread produced much more reasonable results. When I run against a larger test data set though (around 40 input files), I am getting the following error:
>> >>>>
>> >>>> *** glibc detected *** racluster: corrupted double-linked list: 0x000000001e900470 ***
>> >>>>
>> >>>> The error is the same each time I run the test.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Thu, May 15, 2014 at 10:40 PM, Carter Bullard <carter at qosient.com> wrote:
>> >>>> Hey Jason,
>> >>>> So I uploaded argus-clients-3.0.7.27 that has a complete fix in
>> >>>> for the problem you reported. FYI, the problem was that we were
>> >>>> calling the queue timeout management routines on every flow,
>> >>>> which, interestingly, really crushed the routine when the idle
>> >>>> timers and status timers were both turned on but not in the
>> >>>> same order of magnitude, and a specific filter got a large
>> >>>> number of hits in a short period of time...
>> >>>>
>> >>>> That, of course, was really stupid; not exactly a bug, but kind of a bug.
>> >>>>
>> >>>> OK, the fix that is now in has independent logic for managing the idle
>> >>>> and status timeouts. Each filter entry gets a complete aggregation
>> >>>> engine and processing queue, so we can use an efficient idle timeout
>> >>>> processing strategy, but we need to process the status timeouts
>> >>>> independently, which we now do once every second.
>> >>>>
>> >>>> Hopefully things are working better for you now.
>> >>>>
>> >>>> Carter
>> >>>>
>> >>>> On May 15, 2014, at 11:30 AM, Carter Bullard <carter at qosient.com> wrote:
>> >>>>
>> >>>>> Hey Jason,
>> >>>>> Could you give this version of racluster() a run to see if it does
>> >>>>> what you want ??? The principal difference is that the output of
>> >>>>> this new racluster() will have records a bit more out of order
>> >>>>> than the other version.
>> >>>>>
>> >>>>> With streaming data, you may not get status reports promptly (like
>> >>>>> within 0.25 seconds of the status timer expiration), but you will
>> >>>>> get correct status record reporting, driven by the idle timeout
>> >>>>> period. I’ll improve this behavior later today.
>> >>>>>
>> >>>>> Sorry for any inconvenience, and thanks for pushing on this !!!!
>> >>>>>
>> >>>>> Carter
>> >>>>>
>> >>>>> <racluster.c>
>> >>>>>
>> >>>>> On May 15, 2014, at 10:20 AM, Carter Bullard <carter at qosient.com> wrote:
>> >>>>>
>> >>>>>> Hey Jason,
>> >>>>>> Found the problem, and it’s a poor design assumption on my part.
>> >>>>>> It’s a kind of thrash between the status timer and the idle timer.
>> >>>>>> This does not affect rabins() or radium(), just racluster().
>> >>>>>>
>> >>>>>> Fixing it now.
>> >>>>>>
>> >>>>>> Carter
>> >>>>>>
>> >>>>>> On May 14, 2014, at 5:53 PM, Jason <dn1nj4 at gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hi Carter,
>> >>>>>>>
>> >>>>>>> So I asked a very similar question last year (http://comments.gmane.org/gmane.network.argus/9110), but I can't seem to find a response. I apologize if I'm just missing something or have just forgotten.
>> >>>>>>>
>> >>>>>>> I am trying once again to understand why there is such a significant impact on the length of time it takes to run racluster when leveraging filters. Here is the racluster.conf file I am testing:
>> >>>>>>>
>> >>>>>>> filter="udp and port domain" model="saddr daddr proto sport dport" status=600 idle=10
>> >>>>>>> filter="udp" model="saddr daddr proto sport dport" status=600 idle=60
>> >>>>>>> filter="" model="saddr daddr proto sport dport" status=600 idle=600
>> >>>>>>>
>> >>>>>>> And here are two runs against a single argus file. The only difference is whether or not I'm using the racluster.conf:
>> >>>>>>>
>> >>>>>>> $ time racluster -f racluster.conf -r infile.bin -w outfile.bin -M rmon -u -c "," -m saddr proto sport dport -L0 -Z s -s stime saddr proto sport dport sbytes runtime dbytes trans state - not arp
>> >>>>>>>
>> >>>>>>> real 2m42.935s
>> >>>>>>> user 2m39.274s
>> >>>>>>> sys 0m3.288s
>> >>>>>>>
>> >>>>>>> $ time racluster -r infile.bin -w outfile.bin -M rmon -u -c "," -m saddr proto sport dport -L0 -Z s -s stime saddr proto sport dport sbytes runtime dbytes trans state - not arp
>> >>>>>>>
>> >>>>>>> real 0m1.054s
>> >>>>>>> user 0m0.944s
>> >>>>>>> sys 0m0.108s
>> >>>>>>>
>> >>>>>>> Why does the filtered run take orders of magnitude longer?
>> >>>>>>>
>> >>>>>>> Thanks!
>> >>>>>>> Jason
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >
>>
>>
>
>