Difference between record and trans?
Carter Bullard via Argus-info
argus-info at lists.andrew.cmu.edu
Thu Jan 12 11:03:23 EST 2017
Hey Jesse,
racount.1 just counts actual argus records, both flow and management records. It’s a convenience tool to peek into an argus data file. Many of the new functions, like counting IP address types, are add-ons that extend this simple record counting function. It’s fast in its default form because it doesn’t actually process the flow records; it just accumulates simple counters. Note that it isn’t developing a flow cache, it’s not sorting records, and it’s not looking anything up. Add the -M proto or -M addr options and it will slow down a bit, so that it can count based on some key flow record contents, but it’s still not tracking flow history.
So, racount.1 will count just the records.
racluster.1, on the other hand, processes flow records, establishes memory caches, and tracks key identifiers. When it counts, it actually merges argus records into persistent flow cache memory, and performs lots of operations on every field that is in an argus record. So it’s really busy when it processes records. racluster.1 will go faster when the number of caches is low, so using the “-m” option to minimize the key length will change the performance quite a bit.
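The contrast between the two tools can be sketched with a toy model. This is not Argus code: the record layout, the field names, and the `count_by_proto` and `cluster` helpers are hypothetical stand-ins, chosen only to show why plain counting keeps no per-flow state while clustering builds one cache entry per distinct key.

```python
from collections import Counter

# Hypothetical, simplified flow records; real argus records carry many more fields.
records = [
    {"saddr": "10.0.0.1", "daddr": "10.0.0.2", "proto": "tcp", "pkts": 10},
    {"saddr": "10.0.0.1", "daddr": "10.0.0.2", "proto": "tcp", "pkts": 4},
    {"saddr": "10.0.0.3", "daddr": "10.0.0.2", "proto": "udp", "pkts": 2},
    {"saddr": "10.0.0.4", "daddr": "10.0.0.5", "proto": "udp", "pkts": 1},
]

def count_by_proto(recs):
    """Counting style: bump one counter per record, keep no per-flow state."""
    counts = Counter()
    for r in recs:
        counts[r["proto"]] += 1
    return counts

def cluster(recs, key_fields):
    """Clustering style: merge each record into the cache entry selected by its key."""
    cache = {}
    for r in recs:
        key = tuple(r[f] for f in key_fields)
        entry = cache.setdefault(key, {"pkts": 0, "merged": 0})
        entry["pkts"] += r["pkts"]
        entry["merged"] += 1
    return cache

proto_counts = count_by_proto(records)
wide_cache = cluster(records, ("saddr", "daddr", "proto"))
narrow_cache = cluster(records, ("proto",))

print(dict(proto_counts))
print(len(wide_cache), len(narrow_cache))
```

In this sketch the three-field key produces three cache entries and the one-field key produces two, which is the effect a shorter “-m” key has on the number of caches, at a much smaller scale.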
The “trans” field is a bit more complicated than just the current run record count. It is the value of the agr.count field in the argus record. The “agr” data elements are the numbers that result from aggregation. Whenever you aggregate argus records, a history of the aggregation is stored in an “agr” data element that is inserted into each record. This data element contains things like the number of records used to create this record, which is the “trans” field (trans for transactions). When you aggregate this record again, the “trans” field is preserved and continues to accumulate.
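A small model may make the accumulation easier to see. It is only a sketch: the `aggregate` helper and the dictionary layout are hypothetical, and treating a record with no “agr” element as one transaction is an assumption of the toy, not a statement about the argus record format.

```python
def aggregate(records, key_fields):
    """Merge records that share a key, carrying their agr counts forward."""
    merged = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Toy assumption: a record with no agr element stands for one transaction.
        incoming = rec.get("agr", {}).get("count", 1)
        if key not in merged:
            merged[key] = {f: rec[f] for f in key_fields}
            merged[key]["agr"] = {"count": 0}
        merged[key]["agr"]["count"] += incoming
    return list(merged.values())

# Five hypothetical raw records, none of them aggregated yet.
raw = [
    {"saddr": "10.0.0.1", "proto": "tcp"},
    {"saddr": "10.0.0.1", "proto": "tcp"},
    {"saddr": "10.0.0.1", "proto": "tcp"},
    {"saddr": "10.0.0.2", "proto": "tcp"},
    {"saddr": "10.0.0.3", "proto": "udp"},
]

cooked = aggregate(raw, ("saddr", "proto"))   # first pass writes a "cooked" set
by_proto = aggregate(cooked, ("proto",))      # second pass reads the cooked set
trans = {r["proto"]: r["agr"]["count"] for r in by_proto}

print(len(cooked), trans)
```

The cooked set holds three records, but the second pass reports four transactions for tcp and one for udp, five in total, because each cooked record brings along the count from the pass that created it.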
Because you are processing “cooked_data_tag.argus”, I suspect that file was generated by an argus record aggregator, such as racluster.1. So I think your “trans” counter is reporting the number of all the flow records that were processed to generate the cooked data set. To get rid of this residual value, you need to remove the “agr” data element as you read the flow records into your racluster.1. If so, to get the numbers from racount.1 and racluster.1 to agree a little bit better, run your racluster this way:
# time racluster -M dsrs="-agr" -m proto -r cooked_data_tag.argus -s fields
Hopefully all will seem more like what you expect.
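For reference, the size of the gap in question can be worked out from the racluster and racount output quoted below. The figures in this sketch are copied from that output; the variable names are just labels for illustration.

```python
# Figures copied from the racluster and racount output quoted in this thread.
racluster_trans = {"udp": 3191659, "tcp": 297920, "icmp": 31380}
racount_records = {"udp": 2747152, "tcp": 193869, "icmp": 24813}
racount_sum_row = 2965835  # the "sum" line printed by racount

# How far each protocol's trans value exceeds the number of records in the file.
excess = {p: racluster_trans[p] - racount_records[p] for p in racluster_trans}
total_trans = sum(racluster_trans.values())
total_records = sum(racount_records.values())

for proto, extra in excess.items():
    print(f"{proto:5s} trans={racluster_trans[proto]:>8d} "
          f"records={racount_records[proto]:>8d} excess={extra:>7d}")
print("total trans:", total_trans)
print("total records by protocol:", total_records)
print("sum row minus per-protocol total:", racount_sum_row - total_records)
```

The packet and byte columns already agree between the two tools, so only the trans column carries the residual aggregation history. The per-protocol racount rows add up to one less than the sum row; the thread does not say why, though racount counting management records as well as flow records would be consistent with it.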
Yes, you are right, racount.1 doesn’t have much in the way of format support. It wasn’t really intended as a processor, but as a convenience. Adding that is pretty easy. If you are using our commercial set of tools, then we can add CSV, XML, JSON and possibly NEWICK data formats to racount.1 pretty easily, almost overnight.
Hope all is most excellent,
Carter
> On Jan 11, 2017, at 8:59 PM, Jesse Bowling via Argus-info <argus-info at lists.andrew.cmu.edu> wrote:
>
> Hi,
>
> I was working with racount, with the intention of using it to generate summary statistics that could later be aggregated, and found something odd. First I found that apparently at least some of the ra options for formatting aren’t effective with racount (specifically, I wanted to generate “CSV” formatted data, and the client appears to ignore the options provided in ./support/Config/excel.rc ). While then comparing the performance of racount vs. racluster (which does respect formatting options) I found an odd inconsistency:
>
> # time racluster -m proto -r cooked_data_tag.argus -s proto trans:20 pkts:20 spkts:20 dpkts:20 bytes:20 sbytes:20 dbytes:20
> Proto    Trans   TotPkts  SrcPkts   DstPkts     TotBytes    SrcBytes     DstBytes
> udp    3191659   7003372  3499666   3503706   1389726491   380807954   1008918537
> tcp     297920  21915099  7519746  14395353  17408556156  1823768894  15584787262
> icmp     31380     69180    34807     34373      5328628     2666022      2662606
>
> real 0m5.328s
> user 0m5.207s
> sys 0m0.118s
> # time racount -M proto -r cooked_data_tag.argus
> racount   records  total_pkts  src_pkts  dst_pkts  total_bytes   src_bytes    dst_bytes
> sum       2965835    28987651  11054219  17933432  18803611275  2207242870  16596368405
> Protocol Summary
> icmp        24813       69180     34807     34373      5328628     2666022      2662606
> tcp        193869    21915099   7519746  14395353  17408556156  1823768894  15584787262
> udp       2747152     7003372   3499666   3503706   1389726491   380807954   1008918537
>
> real 0m2.716s
> user 0m2.592s
> sys 0m0.122s
> #
>
> While most of the data agrees between these two clients, the "records" field of racount does not agree with the "trans" field of racluster/ra. Which leads me to ask the questions: is this expected, and if it is, how are these fields calculated (what do they represent)? How does racount arrive at its data so much more quickly than racluster, and what options might tune racluster to perform similarly? How difficult would it be to add support to racount for the formatting options available in ra? :)
>
> Cheers,
>
> Jesse
>
> --
> Jesse Bowling
>