UNSW-NB15 Feature Extraction

Carter Bullard carter at qosient.com
Fri Aug 21 09:09:51 EDT 2020


Hey Jonas,
The features for UNSW-NB15 are a mix of simple argus features and some ‘cooked’ values from argus or from Bro (Zeek), so there is some processing that you need to do to generate the complete list.   UNSW-NB15 has, what, 49 features listed.  From this link:
   https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/NUSW-NB15_features.csv

Features 1-35, except for 25, are straight argus features, and can be extracted from any argus record, either in a file or from a live stream.  Feature number 14, service, is a field that is provided in the label by the raservices.1 program.  25, trans_depth, is not an argus feature, unless it references argus’s trans field after the records have been aggregated, which is how I interpreted it.

To get the features into the argus records, you will have to configure argus to generate all the values that it can generate, including ARGUS_CAPTURE_DATA_LEN, which I recommend setting to around 64-128 bytes.  This data is needed to determine the service in the flow, and to extract the GET, POST and PUT keywords from HTTP traffic flow records.

If you look at the uses of UNSW-NB15, it runs off of IP flow data, but it doesn’t have to, as argus does provide non-IP network flows.

To create UNSW-NB15 features 1-35, minus 25, from argus records that are in, say, argus.file, run:

   ra -r argus.file -s saddr sport daddr dport proto state dur sbytes dbytes sttl dttl sloss dloss sload dload spkts dpkts swin dwin stcpb dtcpb smeansz dmeansz trans appbytes sjit djit stime ltime sintpkt dintpkt tcprtt synack ackdat - ip
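
If you want to drive that from Python, here is a minimal sketch, not a tested tool.  It builds the same ra command line (with the destination port written as dport and the source TTL as sttl) and parses delimited output into dictionaries.  The assumptions are mine: that ra is on your PATH, that -c sets the output column delimiter, that -S is given a live argus source, and that the header line, if your ra configuration prints one, can be skipped with the hypothetical skip_header flag.

```python
import csv
import io

# ra field names from the command above, in the same order.
RA_FIELDS = [
    "saddr", "sport", "daddr", "dport", "proto", "state", "dur",
    "sbytes", "dbytes", "sttl", "dttl", "sloss", "dloss", "sload", "dload",
    "spkts", "dpkts", "swin", "dwin", "stcpb", "dtcpb", "smeansz", "dmeansz",
    "trans", "appbytes", "sjit", "djit", "stime", "ltime",
    "sintpkt", "dintpkt", "tcprtt", "synack", "ackdat",
]


def build_ra_command(source, live=False, delimiter=","):
    """Return the ra argument list for a file (-r) or a live source (-S)."""
    source_flag = "-S" if live else "-r"
    # Assumption: "-c <char>" makes ra print delimiter-separated columns.
    return ["ra", source_flag, source, "-c", delimiter,
            "-s", *RA_FIELDS, "-", "ip"]


def parse_ra_output(text, delimiter=",", skip_header=False):
    """Turn delimited ra output into a list of dicts keyed by RA_FIELDS."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for index, values in enumerate(reader):
        if skip_header and index == 0:
            continue
        if len(values) != len(RA_FIELDS):
            continue  # blank or malformed line; ignore it in this sketch
        rows.append(dict(zip(RA_FIELDS, (v.strip() for v in values))))
    return rows
```

You would pass the list from build_ra_command() to subprocess.run() or subprocess.Popen() and feed the captured stdout to parse_ra_output().  All values come back as strings, so convert the numeric ones before handing them to your model.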

A few features are aggregates based on state or content.  These, I believe, were generated using Bro, but can be generated by simple argus data processing scripts, as the data is in the argus record.  These are features 36-40.
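
As a rough illustration of what such a script could look like, here is a Python sketch for two of them.  It is based on my reading of the UNSW-NB15 feature list, where is_sm_ips_ports flags flows whose source and destination address and port are equal, and ct_flw_http_mthd counts flows carrying an HTTP method.  The record layout is an assumption: each record is a dict with saddr, sport, daddr and dport, plus a hypothetical "payload" key holding the captured user data as raw bytes.

```python
# Keywords to look for at the start of the captured user data.
HTTP_METHODS = (b"GET", b"POST", b"PUT")


def http_method(payload):
    """Return the HTTP method the payload starts with, or None."""
    for method in HTTP_METHODS:
        if payload.startswith(method + b" "):
            return method.decode("ascii")
    return None


def is_sm_ips_ports(record):
    """1 if source and destination address and port are equal, else 0."""
    same_addr = record["saddr"] == record["daddr"]
    same_port = record["sport"] == record["dport"]
    return int(same_addr and same_port)


def count_http_method_flows(records):
    """Count records whose (hypothetical) payload starts with a method."""
    return sum(
        1 for record in records
        if http_method(record.get("payload", b"")) is not None
    )
```

How the user data actually reaches your script depends on how you print or export it from the argus records, so treat the "payload" key as a placeholder.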

For features 41-47, you can generate the numbers from argus data, but you have to aggregate the argus records using rabins.1.  If anyone asked me, I would recommend that these 7 features be based on time, rather than on number of connections.  It seems that these metrics assume that there is a lot of traffic to monitor, and that generating percentages every 100 connections would generate some data, but I think time is a bit more manageable.  Still report percentages …

Anyway, you can generate 41-47 using rabins.1 in ‘Load based bin mode’ with many different aggregation strategies, and then using the ‘trans’ field to report the number of connections.
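
To make the time-based idea concrete, here is a small Python sketch that counts, for each new record, how many records with the same key were seen in the last N seconds.  This is plain Python standing in for the rabins.1 aggregation, not a description of what rabins.1 does internally.  The assumptions are mine: records are dicts, ltime is a numeric timestamp in seconds, records arrive in ltime order, and the key fields and the 60 second window are only examples.

```python
from collections import Counter, deque


class TimeWindowCounter:
    """Count records sharing a key within a sliding time window."""

    def __init__(self, key_fields, window_seconds=60.0):
        self.key_fields = tuple(key_fields)
        self.window = float(window_seconds)
        self.events = deque()    # (timestamp, key), oldest first
        self.counts = Counter()  # key -> records currently in the window

    def update(self, record):
        """Add one record and return the count for its key, itself included."""
        now = float(record["ltime"])
        key = tuple(record[field] for field in self.key_fields)
        # Drop everything that has aged out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]
        self.events.append((now, key))
        self.counts[key] += 1
        return self.counts[key]
```

One counter per feature, each with its own key, for example TimeWindowCounter(("saddr", "service")) for a count of connections with the same service and source address.  Dividing the returned count by the number of records currently in the window gives the percentage.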

48 seems to be the result of UNSW-NB15’s labeling of their data, and their results, which you would provide in your ML against the data.

Hope this helps to get you going.  If you make any progress, or have any insights / ideas / opinions / reactions or comments, don’t hesitate to send email !!!

Carter


 

> On Aug 21, 2020, at 8:29 AM, jonas.b.kunze at stud.h-da.de wrote:
> 
> Hey Carter,
> 
> I am trying to implement a Network Anomaly Detection System in live mode and would like to use the UNSW-NB15 set as feature set, which was extracted using argus back then. Do you by any chance have an argus/Python script that can extract these features?
> 
> Thanks a lot
> 
> Jonas
