Fwd: Big O Impact of Filters

Jason dn1nj4 at gmail.com
Thu May 22 12:58:39 EDT 2014


Let me clarify and provide a bit more context...  I expect the following
flows:

1.2.3.4:23456 -> 5.6.7.8:34567
1.2.3.4:45678 -> 6.7.8.9:34567

To result in the following output data:

1.2.3.4 23456 34567
1.2.3.4 45678 34567
5.6.7.8 34567 23456
6.7.8.9 34567 45678

((in addition to various other stats aggregated with the saddr,sport,dport
fields as the key))

I'm then taking the above data and doing simplistic port groupings, such as
"34567 is (typically) part of the app1 port group" (think 80, 8000, 8080 as
typically "web").  Then I generate a report that says:

1.2.3.4, client to the app1 port group, X bytes from this client, Y bytes
to this client, Z connections from this client

5.6.7.8, server for the app1 port group, X bytes from this server, Y bytes
to this server, Z connections to this server

6.7.8.9, server for the app1 port group, X bytes from this server, Y bytes
to this server, Z connections to this server

This is a gross oversimplification, but is there a better way to do the
above?
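
The grouping step itself is a simple lookup over that output; as a rough
illustration (the port table and field order here are made up, not my real
config):

   racluster -M rmon -m saddr proto sport dport -r flows.bin -u -c "," \
      -s saddr sport dport sbytes dbytes trans - not arp | \
   awk -F, '{ grp = ($3==80 || $3==8000 || $3==8080) ? "web" : "other";
              sb[$1","grp] += $4; db[$1","grp] += $5; tr[$1","grp] += $6 }
            END { for (k in sb) print k, sb[k], db[k], tr[k] }'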

Thanks!
Jason

On May 22, 2014, at 12:08 PM, Jason <dn1nj4 at gmail.com> wrote:



 On Thu, May 22, 2014 at 11:58 AM, Carter Bullard <carter at qosient.com> wrote:

> Jason,
> I’m very happy that you’re doing all this testing…  I am trying
> to release the new code as stable, so all this testing is great.
>

Hah.  Glad to help!


> Just to recount where we are.  You were seeing very large clustering
> times, and that is now fixed?
>

The clustering times were excessive when leveraging filters.  Yes, that
appears to be fixed.

> Now, your racluster() command is completely broken.  You can’t
> use the “ -M rmon “ option and process bidirectional objects.
> So get rid of the “ dport “ in your aggregation mask.  It doesn’t
> make any sense.  Things will go much faster, as there will be fewer
> aggregates in the end.
>

Completely understand about rmon and bidirectional objects.  However, this
isn't the end state of the data.  I'm leveraging racluster as an
intermediate step to get the data into a particular format.  I need saddr,
sport and dport in the aggregation in order to build the final data report.



> The -M rmon option duplicates the input record stream so
> that it can swap the identifiers and metrics; if you use it
> incorrectly, you will screw things up completely.  So,
> running “-M rmon” against data that has already been “rmon’d”
> is a bad thing.  So, don’t do that !!!!!
>

Got it.  Will chew on this and continue working towards a solution.

Thanks!
Jason

Carter

On May 22, 2014, at 10:32 AM, Jason <dn1nj4 at gmail.com> wrote:

(resending to the list)

Carter,

Still trying to find a workable solution for my environment...  I have
tried the rasqlinsert route, but in my dev environment, on a very small
data set, what takes racluster 5.8 seconds to cluster takes me 66 seconds
to "rasqlinsert -M cache" and then read the data back out for a report.  I
am worried that this solution won't be fast enough to keep up with my
production data volume.
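
(For reference, the rasqlinsert run being compared looked roughly like
this; the mysql:// URL, database, and table names are placeholders:)

   rasqlinsert -M cache -m saddr proto sport dport -r infile.bin \
      -w mysql://user@localhost/argusdb/flows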

Today I thought I would test clustering small files together and checking
the results.  What I found has confused me further.  For three argus files
(a.bin, b.bin & c.bin) that are each 125MB:

racluster -w all.bin -M rmon -m saddr proto sport dport -r a.bin b.bin c.bin

Takes ~30 seconds and generates a 485M all.bin.

Attempting a staged approach, like this:

racluster -w p1.bin -M rmon -m saddr proto sport dport -r a.bin b.bin
racluster -w all2.bin -M rmon -m saddr proto sport dport -r p1.bin c.bin

Takes 17s for the first run and 32s for the second and generates a 565M
all2.bin.

I guess my confusion is:

1. Why is there such a discrepancy between the resulting file sizes (485M
vs 565M)?
2. Why does the second staged run (all2.bin) still take 32 seconds even
though I've already clustered a.bin and b.bin together?

Thanks again,
Jason


On Mon, May 19, 2014 at 12:46 PM, Carter Bullard <carter at qosient.com> wrote:

> Ahhhhhh progress … can you replicate the error running racluster against
> just that file, or the last two files?
>
> Any chance you can share the file ??
> Carter
>
> On May 19, 2014, at 12:29 PM, Jason <dn1nj4 at gmail.com> wrote:
>
> Carter,
>
> The patch did not appear to change the error.  Removing the final file,
> however, did (both with and without the patch).  So the implication here
> is perhaps that the file is corrupted?
>
> Jason
>
>
> On Mon, May 19, 2014 at 11:27 AM, Carter Bullard <carter at qosient.com>
> wrote:
>
>> Hey Jason,
>> Sorry for the barrage of email.
>> Could you try this patch ??  Seems that it may help a little here.
>>
>> ==== //depot/argus/clients/clients/racluster.c#87 -
>> /Volumes/Users/carter/argus/clients/clients/racluster.c ====
>> 308c308
>> <          if (ArgusSorter != NULL)
>> ---
>> >          if (ArgusSorter != NULL) {
>> 309a310,311
>> >             ArgusSorter = NULL;
>> >          }
>>
>>
>> Carter
>>
>>
>> On May 19, 2014, at 11:21 AM, Carter Bullard <carter at qosient.com> wrote:
>>
>> > Hey Jason,
>> > If you run with debug level 1, you’ll see the files as they are being
>> > processed, and that can show you which file is the culprit.
>> > If it is the last one, which it looks like it is, it may be that one
>> > of the threads has shut down / deleted a construct that is needed,
>> > like the memory manager.  This is a threads issue, so I’m going down
>> > that path to solve this problem.
>> >
>> > If you see anything that suggests otherwise, like it's not the last file,
>> > send a note, if you have the time …
>> >
>> > Thanks for all the help !!!
>> > Carter
>> >
>> > On May 19, 2014, at 9:25 AM, Carter Bullard <carter at qosient.com> wrote:
>> >
>> >> Hey Jason,
>> >> Is it always the same file?   Any chance it would fail on just that file ??
>> >> If you run with "-M ind", does the problem go away ??  This option
>> >> forces aggregation to be limited to each file...
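>> >>
>> >> For example (an illustrative invocation):
>> >>
>> >>    racluster -M ind -m saddr proto sport dport -r *.bin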
>> >>
>> >> Carter
>> >>
>> >> Carter Bullard, QoSient, LLC
>> >> 150 E. 57th Street Suite 12D
>> >> New York, New York 10022
>> >> +1 212 588-9133 Phone
>> >> +1 212 588-9134 Fax
>> >>
>> >> On May 19, 2014, at 2:40 AM, Jason <dn1nj4 at gmail.com> wrote:
>> >>
>> >>> #0  0x00007ffff7349e08 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> >>> #1  0x00007ffff734b496 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> >>> #2  0x00007ffff734df95 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
>> >>> #3  0x000000000044c276 in ArgusMalloc (bytes=24328) at ./argus_util.c:21779
>> >>> #4  0x000000000049388a in ArgusSortQueue (sorter=0x1c7feb40, queue=0xfdf250) at ./argus_client.c:15390
>> >>> #5  0x0000000000404820 in RaParseComplete (sig=0) at ./racluster.c:277
>> >>> #6  0x0000000000407cf4 in main (argc=66, argv=0x7fffffffd828) at ./argus_main.c:390
>> >>>
>> >>>
>> >>> On Fri, May 16, 2014 at 10:20 AM, Carter Bullard <carter at qosient.com> wrote:
>> >>> Hey Jason,
>> >>> Thanks for testing this.  Any chance you can run using gdb to see
>> >>> where it's running into trouble ???  To compile with symbols:
>> >>>
>> >>>    % touch .devel
>> >>>    % ./configure
>> >>>    % make clean
>> >>>    % make
>> >>>
>> >>> If it breaks in with the same error, then type
>> >>>
>> >>>    (gdb) where
>> >>>
>> >>> That should be very helpful !!!!
>> >>>
>> >>> Carter
>> >>>
>> >>> On May 16, 2014, at 8:42 AM, Jason <dn1nj4 at gmail.com> wrote:
>> >>>
>> >>>> I'm testing the 3.0.7.27 now.  Duplicating the original test in this
>> >>>> thread produced much more reasonable results.  When I run against a
>> >>>> larger test data set though (around 40 input files), I am getting the
>> >>>> following error:
>> >>>>
>> >>>> *** glibc detected *** racluster: corrupted double-linked list: 0x000000001e900470 ***
>> >>>>
>> >>>> The error is the same each time I run the test.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Thu, May 15, 2014 at 10:40 PM, Carter Bullard <carter at qosient.com> wrote:
>> >>>> Hey Jason,
>> >>>> So I uploaded argus-clients-3.0.7.27 that has a complete fix in
>> >>>> for the problem you reported.  FYI, the problem was that we were
>> >>>> calling the queue timeout management routines on every flow,
>> >>>> which, interestingly, really crushed the routine when the idle
>> >>>> timers and status timers were both turned on but not within the
>> >>>> same order of magnitude, and a specific filter got a large
>> >>>> number of hits in a short period of time...
>> >>>>
>> >>>> That of course is / was really stupid, not really a bug, but kind of a bug.
>> >>>>
>> >>>> OK, the fix that is now in has independent logic for managing the idle
>> >>>> and status timeouts.  Each filter entry gets a complete aggregation
>> >>>> engine and processing queue, so we can use an efficient idle timeout
>> >>>> processing strategy, but we need to process the status timeouts
>> >>>> independently, which we now do once every second.
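>> >>>>
>> >>>> The once-per-second status sweep amounts to a clock check inside the
>> >>>> record loop, something like this stand-alone sketch (illustrative
>> >>>> only, not the actual racluster source):
>> >>>>
>> >>>>    #include <stdio.h>
>> >>>>    #include <time.h>
>> >>>>
>> >>>>    int main(void) {
>> >>>>       time_t last_sweep = 0;
>> >>>>       for (long i = 0; i < 100000000; i++) { /* stand-in for the record loop */
>> >>>>          /* per-record work: match a filter, update that filter's queue */
>> >>>>          time_t now = time(NULL);
>> >>>>          if (now - last_sweep >= 1) {        /* status timeouts: once per second */
>> >>>>             printf("status sweep at %ld\n", (long)now);
>> >>>>             last_sweep = now;
>> >>>>          }
>> >>>>       }
>> >>>>       return 0;
>> >>>>    }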
>> >>>>
>> >>>> Hopefully things are working better for you now.
>> >>>>
>> >>>> Carter
>> >>>>
>> >>>> On May 15, 2014, at 11:30 AM, Carter Bullard <carter at qosient.com> wrote:
>> >>>>
>> >>>>> Hey Jason,
>> >>>>> Could you give this version of racluster() a run to see if it does
>> >>>>> what you want ???  The principal difference is that the output of
>> >>>>> this new racluster() will have records a bit more out of order
>> >>>>> than the other version.
>> >>>>>
>> >>>>> With streaming data, you may not get status reports in a timely
>> >>>>> fashion (like within 0.25 seconds of the status timer expiration),
>> >>>>> but you will get correct status record reporting, driven by the
>> >>>>> idle timeout period.  I’ll improve this behavior later today.
>> >>>>>
>> >>>>> Sorry for any inconvenience, and thanks for pushing on this !!!!
>> >>>>>
>> >>>>> Carter
>> >>>>>
>> >>>>> <racluster.c>
>> >>>>>
>> >>>>> On May 15, 2014, at 10:20 AM, Carter Bullard <carter at qosient.com> wrote:
>> >>>>>
>> >>>>>> Hey Jason,
>> >>>>>> Found the problem, and it's a poor design assumption on my part.
>> >>>>>> It's a kind of thrash between the status timer and the idle timer.
>> >>>>>> This does not affect rabins() or radium(), just racluster().
>> >>>>>>
>> >>>>>> Fixing it now.
>> >>>>>>
>> >>>>>> Carter
>> >>>>>>
>> >>>>>> On May 14, 2014, at 5:53 PM, Jason <dn1nj4 at gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hi Carter,
>> >>>>>>>
>> >>>>>>> So I asked a very similar question last year
>> >>>>>>> (http://comments.gmane.org/gmane.network.argus/9110), but I can't
>> >>>>>>> seem to find a response.  I apologize if I'm just missing something
>> >>>>>>> or have just forgotten.
>> >>>>>>>
>> >>>>>>> I am trying once again to understand why there is such a significant
>> >>>>>>> impact on the length of time it takes to run racluster when
>> >>>>>>> leveraging filters.  Here is the racluster.conf file I am testing:
>> >>>>>>>
>> >>>>>>> filter="udp and port domain" model="saddr daddr proto sport dport" status=600 idle=10
>> >>>>>>> filter="udp" model="saddr daddr proto sport dport" status=600 idle=60
>> >>>>>>> filter="" model="saddr daddr proto sport dport" status=600 idle=600
>> >>>>>>>
>> >>>>>>> And here are two runs against a single argus file.  The only
>> >>>>>>> difference is whether or not I'm using the racluster.conf:
>> >>>>>>>
>> >>>>>>> $ time racluster -f racluster.conf -r infile.bin -w outfile.bin -M rmon -u -c "," -m saddr proto sport dport -L0 -Z s -s stime saddr proto sport dport sbytes runtime dbytes trans state - not arp
>> >>>>>>>
>> >>>>>>> real    2m42.935s
>> >>>>>>> user    2m39.274s
>> >>>>>>> sys     0m3.288s
>> >>>>>>>
>> >>>>>>> $ time racluster -r infile.bin -w outfile.bin -M rmon -u -c "," -m saddr proto sport dport -L0 -Z s -s stime saddr proto sport dport sbytes runtime dbytes trans state - not arp
>> >>>>>>>
>> >>>>>>> real    0m1.054s
>> >>>>>>> user    0m0.944s
>> >>>>>>> sys     0m0.108s
>> >>>>>>>
>> >>>>>>> Why does the filtered run take two orders of magnitude longer?
>> >>>>>>>
>> >>>>>>> Thanks!
>> >>>>>>> Jason
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >
>>
>