Difference between record and trans?

Thu Jan 12 11:03:23 EST 2017

Hey Jesse,

racount.1 just counts actual argus records, both flow and management records.  It’s a convenience tool for peeking into an argus data file.  Many of the newer functions, like counting IP address types, are add-ons that extend this simple record counting function.  It’s fast in its default form because it doesn’t actually process the flow records; it just accumulates simple counters.  It isn’t building a flow cache, it’s not sorting records, and it’s not looking anything up.  Add the -M proto or -M addr options and it will slow down a bit, because it then has to count based on some key flow record contents, but it’s still not tracking flow history.

So, racount.1 will count just the records.

racluster.1, on the other hand, processes flow records, establishes memory caches, and tracks key identifiers.  When it counts, it actually merges argus records into persistent flow cache memory, and performs lots of operations on every field that is in an argus record.  So it’s really busy when it processes records.  racluster.1 goes faster when the number of caches is low, so using the “-m” option to minimize the key length will change the performance quite a bit.
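To make the contrast concrete, here is a minimal Python sketch of the two styles of processing.  It is a toy model, not the actual racount or racluster code: the record layout, the field names and the functions count_records and cluster_records are all made up for illustration.

```python
from collections import Counter

# Hypothetical, heavily simplified flow records (real argus records
# carry many more fields than these).
records = [
    {"proto": "udp", "saddr": "10.0.0.1", "daddr": "10.0.0.2", "pkts": 4, "bytes": 400},
    {"proto": "udp", "saddr": "10.0.0.1", "daddr": "10.0.0.2", "pkts": 2, "bytes": 200},
    {"proto": "udp", "saddr": "10.0.0.3", "daddr": "10.0.0.2", "pkts": 1, "bytes": 100},
    {"proto": "tcp", "saddr": "10.0.0.1", "daddr": "10.0.0.4", "pkts": 10, "bytes": 5000},
]

def count_records(records):
    """Counter style: bump a few totals, keep no per-flow state."""
    counters = Counter()
    for rec in records:
        counters["records"] += 1
        counters["pkts"] += rec["pkts"]
        counters["bytes"] += rec["bytes"]
    return counters

def cluster_records(records, key_fields):
    """Cache style: merge every record into the cache entry chosen by its key."""
    cache = {}
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        entry = cache.setdefault(key, {"pkts": 0, "bytes": 0})
        entry["pkts"] += rec["pkts"]
        entry["bytes"] += rec["bytes"]
    return cache

print(count_records(records))
print(len(cluster_records(records, ["proto"])))                    # short key, few caches
print(len(cluster_records(records, ["proto", "saddr", "daddr"])))  # longer key, more caches
```

The shorter the key, the fewer cache entries there are to create and maintain, which is the intuition behind minimizing the key with “-m”.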

The “trans” field is a bit more complicated than just the record count for the current run.  It is the value of the agr.count field in the argus record.  The “agr” data elements are the numbers that result from aggregation.  Whenever you aggregate argus records, a history of the aggregation is stored in an “agr” data element that is inserted into each record.  This data element contains things like the number of records used to create this record, which is the “trans” field (trans for transactions).  When you aggregate this record again, the “trans” value is preserved and continues to accumulate.
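A small Python sketch of that accumulation, under stated assumptions: records are plain dicts, the “agr” element is modeled as a nested dict with a single count, a record without an “agr” element is assumed to count as one transaction, and the merge function and its strip_agr flag are hypothetical names, not argus APIs.

```python
def merge(records, strip_agr=False):
    """Merge records into one; the result's count is the sum of the inputs' counts."""
    total = 0
    for rec in records:
        agr = None if strip_agr else rec.get("agr")
        # Assumption: a record with no agr element counts as one transaction.
        total += agr["count"] if agr else 1
    return {"agr": {"count": total}}

raw = [{} for _ in range(5)]                # five primitive records, no agr element
cooked = [merge(raw[:3]), merge(raw[3:])]   # two aggregated ("cooked") records

print(len(cooked))                                     # what a plain record counter sees
print(merge(cooked)["agr"]["count"])                   # trans keeps accumulating
print(merge(cooked, strip_agr=True)["agr"]["count"])   # trans once agr is removed on input
```

In this toy model the cooked file holds two records, re-aggregating them reports five transactions, and dropping the “agr” element on input brings the transaction count back in line with the record count.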

Because you are processing “cooked_data_tag.argus”, I suspect that file was generated by an argus record aggregator, such as racluster.1.  So I think what you’re getting in your “trans” counter is the number of all the flow records that were processed to generate the cooked data set.  To get rid of this residual value, you need to remove the “agr” data element as you read the flow records into racluster.1.  If that is the case, to get the numbers from racount.1 and racluster.1 to agree a little better, run your racluster this way:

   # time racluster -M dsrs="-agr" -m proto -r cooked_data_tag.argus -s fields

Hopefully all will seem more like what you expect.
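For reference, the size of the gap can be worked out from the numbers in the output quoted below.  This is only arithmetic on those figures; the variable names are mine.

```python
# "Trans" column from the quoted racluster output.
trans = {"udp": 3191659, "tcp": 297920, "icmp": 31380}
# "records" column from the quoted racount Protocol Summary.
records = {"udp": 2747152, "tcp": 193869, "icmp": 24813}

for proto in trans:
    diff = trans[proto] - records[proto]
    print(f"{proto:>5}  trans={trans[proto]:>8}  records={records[proto]:>8}  diff={diff:>7}")

print("total trans:  ", sum(trans.values()))
print("total records:", sum(records.values()))
```

The per-protocol record counts add up to one less than the 2965835 on the racount “sum” line.  That would be consistent with a single non-flow record in the file, though that is a guess.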

Yes, you are right, racount.1 doesn’t have much in the way of format support.  It wasn’t really intended as a processor, but as a convenience.  Adding that is pretty easy.  If you are using our commercial set of tools, then we can add CSV, XML, JSON and possibly NEWICK data formats to racount.1 pretty easily, almost overnight.

Hope all is most excellent,
Carter

> On Jan 11, 2017, at 8:59 PM, Jesse Bowling via Argus-info <argus-info at lists.andrew.cmu.edu> wrote:
> 
> Hi,
> 
> I was working with racount, with the intention of using it to generate summary statistics that could later be aggregated, and found something odd. First I found that apparently at least some of the ra options for formatting aren’t effective with racount (specifically, I wanted to generate “CSV” formatted data, and the client appears to ignore the options provided in ./support/Config/excel.rc ). While then comparing the performance of racount vs. racluster (which does respect formatting options) I found an odd inconsistency:
> 
> # time racluster -m proto -r cooked_data_tag.argus -s proto trans:20 pkts:20 spkts:20 dpkts:20 bytes:20 sbytes:20 dbytes:20
> Proto                Trans              TotPkts              SrcPkts              DstPkts             TotBytes             SrcBytes             DstBytes
>   udp              3191659              7003372              3499666              3503706           1389726491            380807954           1008918537
>   tcp               297920             21915099              7519746             14395353          17408556156           1823768894          15584787262
>  icmp                31380                69180                34807                34373              5328628              2666022              2662606
> 
> real	0m5.328s
> user	0m5.207s
> sys	0m0.118s
> # time racount -M proto -r cooked_data_tag.argus
> racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
>    sum   2965835     28987651       11054219       17933432       18803611275        2207242870         16596368405
> Protocol Summary
>   icmp   24813       69180          34807          34373          5328628            2666022            2662606
>    tcp   193869      21915099       7519746        14395353       17408556156        1823768894         15584787262
>    udp   2747152     7003372        3499666        3503706        1389726491         380807954          1008918537
> 
> real	0m2.716s
> user	0m2.592s
> sys	0m0.122s
> #
> 
> While most of the data agrees between these two clients, the "records" field of racount does not agree with the "trans" field of racluster/ra. Which leads me to ask the questions: is this expected, and if it is, how are these fields calculated (what do they represent)? How does racount arrive at its data so much more quickly than racluster, and what options might tune racluster to perform similarly? How difficult would it be to add support to racount for the formatting options available in ra? :)
> 
> Cheers,
> 
> Jesse
> 
> --
> Jesse Bowling
> 
