Unique, complete flows only

Tue Jul 28 17:19:53 EDT 2009

So I should double check my emails.

    The rasqlinsert() example in my earlier response needs an additional
option: "-M cache". This tells rasqlinsert() to use the database table as
a backing store for the aggregation system. That way rasqlinsert() can
time out flows while it's running, which minimizes the memory needed to
run. If the "-M cache" option is not used, rasqlinsert() deletes entries
from the database table as it deletes them from its local cache, so the
table ends up looking like a ratop() screen.

    With the "-M cache" option, if rasqlinsert() receives a flow record
that is not in its local cache, it first queries the db to see if the
record is there. If so, it reads the record in from the table and uses it
as the cache entry. If not, it creates a new cache entry from the incoming
record and then schedules a write to the db for the new entry.
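
To illustrate the lookup order described above, here is a minimal Python sketch. It is not rasqlinsert()'s code: the FlowCache class, the "flows" table, its flow_key and pkts columns, and the use of SQLite in place of MySQL are all assumptions made for the example, and a real flow record carries far more than a packet count.

```python
import sqlite3


class FlowCache:
    """Toy read-through cache; the SQLite table stands in for the MySQL table."""

    def __init__(self, db):
        self.db = db
        self.cache = {}     # flow_key -> packet count held in memory
        self.dirty = set()  # keys with a write to the db scheduled
        db.execute(
            "CREATE TABLE IF NOT EXISTS flows "
            "(flow_key TEXT PRIMARY KEY, pkts INTEGER NOT NULL)"
        )

    def update(self, flow_key, pkts):
        """Merge one incoming flow record into the aggregate for flow_key."""
        if flow_key not in self.cache:
            # Cache miss: ask the table first.
            row = self.db.execute(
                "SELECT pkts FROM flows WHERE flow_key = ?", (flow_key,)
            ).fetchone()
            # Found: the stored row becomes the cache entry. Not found: start at 0.
            self.cache[flow_key] = row[0] if row else 0
        self.cache[flow_key] += pkts
        self.dirty.add(flow_key)  # schedule a write for this entry

    def flush(self):
        """Carry out the scheduled writes."""
        for flow_key in sorted(self.dirty):
            self.db.execute(
                "INSERT OR REPLACE INTO flows (flow_key, pkts) VALUES (?, ?)",
                (flow_key, self.cache[flow_key]),
            )
        self.db.commit()
        self.dirty.clear()

    def time_out(self, flow_key):
        """Drop an idle entry from memory; its row stays in the table."""
        self.flush()
        self.cache.pop(flow_key, None)


db = sqlite3.connect(":memory:")
flows = FlowCache(db)
flows.update("10.0.0.1->10.0.0.2", 3)
flows.time_out("10.0.0.1->10.0.0.2")   # memory freed, row kept in the table
flows.update("10.0.0.1->10.0.0.2", 2)  # miss: reloaded from the table, then merged
print(flows.cache["10.0.0.1->10.0.0.2"])  # 5
```

Because a timed-out entry can always be reloaded from the table, the in-memory cache only has to hold the flows that are currently active, which is the memory saving the option is meant to provide.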

Sorry for the incomplete answer earlier,

Carter

On Jul 28, 2009, at 2:56 PM, Carter Bullard wrote:

> Hey Harry,
> Pipe the output into rasplit.
>    rabins -S tinderbox -M hard time 10s -B 20s -w - | rasplit -M time 10s -w "test.%Y%m%d-%H.%M.%S"
>
> Not sure that you want 10-second files.  That will generate a lot of
> files.
>
> And I'm not sure what you're trying to achieve.  There may be a better
> way if we knew what you wanted to do.
>
> A rabins() strategy works as long as you have memory to hold all the  
> aggregations, because
> rabins won't spit out any records until its time period expires.  If  
> the bins are
> very long (1d) you may run out of memory before rabins() outputs  
> anything.
>
> Use rasqlinsert() in this case (where the time period is large)
>    rasqlinsert -S tinderbox -M time 1d -B 20s -w mysql://user@localhost/argusdb/test_%Y_%m_%d_%H_%M_%S \
>         -s +1srcid -d
>
> The rasqlinsert() method allows you to access the flow records while  
> they are being aggregated.
>    rasql -M time 1d -r mysql://user@localhost/argusdb/test_%Y_%m_%d_%H_%M_%S
>
> So you don't have to wait all day for the results.
>
> Or if you want very large periods aggregated, you can use rastream()  
> to split the records
> into a file, and then after the time period expires, you have  
> rastream() run racluster() or
> rasqlinsert() against the whole file, as a script.
>
> Yes, if you say 1d, it will align to midnight; if you say 24h, it
> will align to the hour when you started.
>
> Carter
>
>
> On Jul 28, 2009, at 2:20 PM, Harry Bock wrote:
>
>> Great... rabins seems to do EXACTLY what I need, thanks!
>>
>> My only problem now is that rabins does not seem to process the  
>> output prefix properly - when invoked as follows,
>> $ rabins -S tinderbox -M hard time 10s -B 20s -w "test.%Y%m%d-%H.%M.%S" - ip
>> rabins creates a file test.%Y%m%d-%H.%M.%S, and continually  
>> overwrites it - it does not create new files with names processed  
>> with strftime,
>> as stated in the manual page.  Am I invoking this wrong, or is this  
>> a bug? I'm still basing off argus-clients-3.0.2-beta8, should I try  
>> updating to beta10?
>>
>> Also, when using time splitting with rabins, does it always align  
>> to the earliest time in that period (i.e., when you select 1 day or  
>> 1 week, does the time period start at midnight the current day or  
>> right now, or Sunday of this week?)? It looks like it does from  
>> your brief 10s test, which is what I'm looking for, but I just want  
>> to make sure :)
>>
>> Thanks!!
>> Harry
>>
>> On Mon, Jul 27, 2009 at 3:02 PM, Carter Bullard  
>> <carter at qosient.com> wrote:
>> Hey Harry,
>>    racluster -r input -w output
>>
>> will aggregate flows using the default 6-tuple key of "srcid saddr  
>> daddr proto sport dport".
>> The "-M norep" option is pretty much obsolete.  It caused  
>> racluster() to not report the AGR dsr,
>> which caused problems for some client programs, like rahisto().   
>> The option was "no report"
>> of aggregation statistics.  If you want to ignore the agr dsr, all  
>> client programs now support
>> input dsr filtering:
>>
>>    ra -r output.file -M dsr="-agr"
>>
>> Rmon is a completely different thing altogether.  The "-M rmon"
>> option indicates to racluster()
>> that you want to generate IETF RMON style statistics, which  
>> reference single objects, rather
>> than the two object stats that argus records generate.  So, if you  
>> wanted stats for a single IP
>> address, or a single port, you would use RMON, and then choose a  
>> specific object for aggregation,
>> such as:
>>    racluster -M rmon -m smac saddr -r input -w output
>>
>> This would generate RMON "In" and "Out" stats for single mac/ip  
>> address pairs.
>>
>> Yes, rabins() is an important program to use, as it does all the  
>> hard work under the covers.
>> And you can use rabins() on live feeds.  Try running this:
>>
>>    rabins -S argus.stream -M time 10s -m srcid matrix/16 -B 20s -s stime dur srcid saddr daddr spkts dpkts - ip
>>
>> Wait a little while (about 30 seconds), and you should get, every 10
>> seconds, an aggregated matrix report for that time period.
>> If you want the reported time for the flow reports to reflect the  
>> time period, add the "hard" mode:
>>
>>   rabins -S argus.stream -M hard time 10s -m srcid matrix/16 -B 20s -s stime dur srcid saddr daddr spkts dpkts - ip
>>
>> Carter
>>
>> On Jul 27, 2009, at 2:42 PM, Harry Bock wrote:
>>
>>> Hi all,
>>>
>>> I was curious as to how Argus client users aggregate their traffic  
>>> data using racluster/rabins.  What I'm looking for is essentially  
>>> completed/timed-out flows only, aggregated so that there is only  
>>> one record per flow.  It seems like this can be achieved for IP  
>>> transactions with something along the lines of:
>>>
>>> racluster -r input -m saddr daddr proto sport dport -M norep
>>>
>>> What's the difference between norep and rmon in terms of  
>>> aggregation? Judging by the description in the manual page, norep  
>>> seems to be what I'm looking for.  For general IPv4/6 traffic,  
>>> would the above aggregation objects be sufficient for uniqueness?
>>>
>>> The program I'm writing right now does its own transaction stream  
>>> time-splitting, but the more I look at rabins, the more it seems  
>>> like it would make things much easier to just bin remote data with  
>>> rabins() and then perform data processing locally on the output  
>>> files.
>>>
>>> Harry
>>>
>>> -- 
>>> Harry Bock
>>> Software Developer, Package Maintainer
>>> OSHEAN, Inc.
>>> Email: harry at oshean.org
>>> PGP Key ID: 546CC353
>>
>
> Carter Bullard
> CEO/President
> QoSient, LLC
> 150 E 57th Street Suite 12D
> New York, New York  10022
>
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
>
>
