[ARGUS] Questions about rabins

Thu Jan 14 09:36:34 EST 2021

Hey Kolja,
Answers below ...
Carter

> On Jan 14, 2021, at 3:26 AM, Kolja Straub via Argus-info <argus-info at lists.andrew.cmu.edu> wrote:
> 
> Hi all,
> 
> First of all a happy new year. 
> 
> My current goal is to aggregate certain netflows that have the same source IP, destination IP, destination port and protocol, and that occur within the same 10 minute time interval. For this I used rabins to aggregate the flows.
> The command I used is "rabins -r input.biargus -M time 10m hard -m saddr daddr dport proto -P stime -F ra.conf > output.netflow".
> However, I have some questions regarding rabins and the features it can generate, and I hope you can help me:
> 
> 1. Is it possible to get the number of different source ports used? After aggregating the flows with different source ports, the resulting source port of the bin was only "0". However, a list of the used ports or the number of different ports that were used would be helpful. I observed such a behavior with the Label feature. After applying rabins, flows with different Labels were aggregated together and the resulting bin still had all labels as a comma separated list.

Rabins, like other aggregators such as racluster.1, supports different aggregation models, and these can be used to generate aggregate counts of just about anything … but you may have to condition the data to generate the numbers that you want.

When you use the ‘-m saddr daddr proto dport’ model, you are merging any records that match on these 4 attributes (features) … if the value of another field doesn’t match across the merged records, in many cases it is zeroed out (no match) … for some fields like saddr and daddr, there are complex algorithms used to try to report an ‘aggregate’ value (for IP addresses it’s ‘longest prefix match’) … sports don’t have an aggregate value, so in argus the field comes up zero … Labels are merged together … etc.

When records are merged together, argus will generate a new TLV structure ‘agr’ that contains general aggregate values such as mean, stddev, max and min of a value in the record.  This by default is the “dur” field, as it is the most common aggregate metric for network traffic … but you can change that field using a racluster.conf file to specify what field the aggregate will be calculated on.  The “agr” TLV (we call them DSRs) contains the number of records that were merged together to generate the aggregate, and you print that value using the “trans” output field.

OK, so if you want the number of unique sports in a group of flow records, and print the value using the “trans” field, you need to run rabins twice … once to get the data set so that all the key objects are unique, and then again without the sport …

   % rabins -r argus.file -M time 10m -m saddr daddr proto sport dport -w - | rabins -M time 10m hard dsrs="-agr" -m saddr daddr proto dport -F ra.conf -s +trans

So what is this doing … aggregate all the records using the default 5-tuple flow first … then run rabins again with the same parameters, but with 2 fundamental changes … define a 4-tuple flow model and remove the “agr” DSR generated by the first run, so that the next run will report the number of records that matched the 4-tuple model (which is the number of unique sports).  Add the ‘trans’ field to your output and there you are …
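
To make the logic of the two passes concrete, here is a small Python sketch of what ends up being counted. It does not call any argus tools. The record layout, the sample values and the helper name count_unique_sports are made up for illustration, and start times are assumed to be epoch seconds.

```python
from collections import defaultdict

BIN_SECONDS = 600  # 10 minute bins, mirroring "-M time 10m"


def count_unique_sports(records):
    """Count distinct source ports per (bin, saddr, daddr, proto, dport)."""
    sports = defaultdict(set)
    for stime, saddr, daddr, proto, sport, dport in records:
        # Snap the start time to the beginning of its 10 minute bin.
        bin_start = int(stime // BIN_SECONDS) * BIN_SECONDS
        # Using a set plays the role of the first pass: each sport counts once.
        sports[(bin_start, saddr, daddr, proto, dport)].add(sport)
    # The set size plays the role of "trans" after the second pass.
    return {key: len(ports) for key, ports in sports.items()}


# Hypothetical flow records: (stime, saddr, daddr, proto, sport, dport)
records = [
    (1200, "10.0.0.1", "10.0.0.2", "tcp", 50001, 443),
    (1230, "10.0.0.1", "10.0.0.2", "tcp", 50002, 443),
    (1300, "10.0.0.1", "10.0.0.2", "tcp", 50001, 443),  # repeated sport
    (1900, "10.0.0.1", "10.0.0.2", "tcp", 50003, 443),  # falls in the next bin
]

counts = count_unique_sports(records)
for key, n in sorted(counts.items()):
    print(key, n)
```

The repeated sport in the first bin is only counted once, which is the reason the first rabins pass keeps sport in the key.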

> 
> 2. Is it possible to get a list of the time differences of the start times of two consecutive flows as a feature of the bin. I only know about the packet interarrival time. However, it would be nice to be able to calculate the absolute difference between two consecutive flows to analyze the periodicity.

Rabins.1 creates its bins as completely independent data sets, so it doesn’t carry over any data between bins … it actually modifies flow records to create that independence, by splitting records to create the bins … within the bin you get all sorts of packet dynamic timings, as well as interflow dynamics, but that is generally for flows that match … tools like ra.1 can print the differences in time values between consecutive records, which may be useful for you … I would recommend that you generate argus data as your output rather than text, and continue to process the output file to get some of your answers …

   rabins -r input.biargus -M time 10m hard -m saddr daddr dport proto -P stime -F ra.conf -w output.argus

Then work with output.argus to build up additional data features.
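
As one example of such a derived feature, the gaps between consecutive flow start times can be computed once the start times have been pulled out of the data. The sketch below assumes the start times for one key are already available as epoch seconds (how they are extracted from output.argus is not shown); the function name start_time_gaps and the sample values are hypothetical.

```python
def start_time_gaps(stimes):
    """Return the differences between consecutive start times, in seconds."""
    ordered = sorted(stimes)
    # Pair each start time with the one that follows it.
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical start times (epoch seconds) for flows sharing one key.
stimes = [100.0, 160.0, 220.5, 280.5]

gaps = start_time_gaps(stimes)
print(gaps)
```

Nearly constant gaps would point to periodic behavior; a list with fewer than two start times yields no gaps at all.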

> 
> 3. As I mentioned in question 1, when using rabins, the labels of the flows are aggregated to a comma separated list. Since I used "," as the separator by default, this led to an error when reading the file as CSV with pandas. Currently I am using another separator, but is it possible to wrap the values of the columns in quotation marks so that parsing could still work?

RA_FIELD_QUOTED=["single" | "double"]
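
On the reading side, a quoted field with embedded commas parses cleanly once the reader is told the quote character. The sketch below uses Python's csv module on a made-up sample; the column names and the exact layout of the quoted output are assumptions, not captured ra output. pandas.read_csv takes a quotechar parameter that serves the same purpose.

```python
import csv
import io

# Hypothetical quoted output: every field is wrapped in double quotes,
# and the label field itself contains commas.
text = (
    '"StartTime","SrcAddr","DstAddr","Label"\n'
    '"1610613000","10.0.0.1","10.0.0.2","scan,bot,dns"\n'
)

rows = list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))
header, first = rows

# The merged labels stay in one column and can be split afterwards.
labels = first[header.index("Label")].split(",")
print(len(first), labels)
```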

> 
> 4. Lastly, I am not sure if I understand the direction feature correctly. There are directions like "?>", "<?" and "<?>" in addition to "->", "<-" and "<->". Do the directions with "?" indicate that the origin of the flow is not entirely certain, or do they have another meaning?

You got it … question marks are reserved for connection-oriented flows where argus didn’t see any of the connection setup packets (SYN, SYNACK, ACK) …
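
If you want to carry that distinction into your analysis, a simple flag derived from the direction string is enough. This is only a sketch based on the rule above; the function name setup_observed is hypothetical.

```python
def setup_observed(direction):
    """Return False when the direction string carries a '?' marker."""
    # Strip padding in case the field was printed with surrounding spaces.
    return "?" not in direction.strip()


for d in ["->", "<-", "<->", "?>", "<?", "<?>"]:
    print(d, setup_observed(d))
```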

Hope this is helpful !!!!

> 
> Thank you in advance for your effort.
> 
> Best regards,
> Kolja
> 
> 
> Sent with ProtonMail <https://protonmail.com/> Secure Email.
> 
> _______________________________________________
> argus mailing list
> argus at qosient.com
> https://pairlist1.pair.net/mailman/listinfo/argus
