yet another kdd cup question

Oğuz Yarımtepe oguzyarimtepe at gmail.com
Wed Oct 2 07:29:06 EDT 2013


Hi,

I worked on this a bit. A line from the KDD Cup data set, showing the value
of each attribute, indeed gave me the idea.

0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.


Let's check the first attributes I am interested in, and then the last ones
I am also interested in.

duration (continuous): length (number of seconds) of the connection
protocol_type (discrete): type of the protocol, e.g. tcp, udp, etc.
service (discrete): network service on the destination, e.g., http, telnet, etc.
src_bytes (continuous): number of data bytes from source to destination
dst_bytes (continuous): number of data bytes from destination to source
flag (discrete): normal or error status of the connection
land (discrete): 1 if connection is from/to the same host/port; 0 otherwise
wrong_fragment (continuous): number of ``wrong'' fragments
urgent (continuous): number of urgent packets
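
To make the mapping between the sample record and these attributes concrete, here is a minimal Python sketch that names the first nine fields and the label. It assumes the field order seen in the sample record itself, where the flag (SF) comes before the byte counts; the function name `parse_basic_features` is only illustrative.

```python
# Names for the first nine fields, in the order they appear in the sample
# record (assumption: the flag, "SF", precedes the byte counts).
BASIC_FIELDS = [
    "duration", "protocol_type", "service", "flag",
    "src_bytes", "dst_bytes", "land", "wrong_fragment", "urgent",
]
DISCRETE = {"protocol_type", "service", "flag", "land"}


def parse_basic_features(line):
    """Return the basic features and the label of one KDD Cup record."""
    parts = line.strip().split(",")
    record = {}
    for name, raw in zip(BASIC_FIELDS, parts):
        # Discrete features stay as strings, continuous ones become numbers.
        record[name] = raw if name in DISCRETE else float(raw)
    # The label is the last field and ends with a trailing dot.
    record["label"] = parts[-1].rstrip(".")
    return record


sample = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
          "0.00,0.00,0.00,0.00,normal.")
basic = parse_basic_features(sample)
print(basic)
```

The remaining fields are left unnamed here on purpose, since only the first and last groups matter for the questions below.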

Looking at these attributes, I think Argus will calculate many of them. I
checked the ra documentation and saw that it is possible to display the
flags attribute, which includes an urgent flag value. The wrong_fragment
attribute is the one I am not sure about. Any idea how I can calculate it?

And now the more ambiguous ones

*feature name* (*type*): *description*

count (continuous): number of connections to the same host as the current
connection in the past two seconds.
Since it is calculated for the current connection, I think it is the number
of connections whose source and destination IP addresses are the same as
those of the current connection within the two seconds prior to it.

*Note: The following features refer to these same-host connections.*

serror_rate (continuous): % of connections that have ``SYN'' errors.
I found some information in the Bro-IDS documentation. This is how they
display the status of a connection/flow in conn.log:

   - *S0*: Connection attempt seen, no reply.
   - *S1*: Connection established, not terminated.
   - *SF*: Normal establishment and termination. Note that this is the same
   symbol as for state S1. You can tell the two apart because for S1 there
   will not be any byte counts in the summary, while for SF there will be.
   - *REJ*: Connection attempt rejected.
   - *RSTO*: Connection established, originator aborted (sent a RST).
   - *RSTR*: Established, responder aborted.

So RSTO and RSTR could be SYN errors. REJ is the one mentioned for
rerror_rate: a connection attempt is made but it is rejected. Is there a
flag to see this event?

rerror_rate (continuous): % of connections that have ``REJ'' errors.

same_srv_rate (continuous): % of connections to the same service.
This is the percentage. By service I assume the port number. So, within the
two-second window, the number of connections or connection attempts to the
same port divided by the count calculated above will give the percentage, I
think.

diff_srv_rate (continuous): % of connections to different services.
This will be 1 - same_srv_rate, I think.

srv_count (continuous): number of connections to the same service as the
current connection in the past two seconds.
It is already calculated above.

*Note: The following features refer to these same-service connections.*

srv_serror_rate (continuous): % of connections that have ``SYN'' errors.
These will be calculated by looking at the port number within the
two-second period.

srv_rerror_rate (continuous): % of connections that have ``REJ'' errors.

srv_diff_host_rate (continuous): % of connections to different hosts.

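
The window-based features above could be sketched in Python as follows. This is only a rough illustration under the interpretations guessed in this message: "same host" means the same source and destination address, "service" means the destination port, RSTO and RSTR count as SYN errors, and the current connection itself is not counted. The `Flow` record, its field names and the state sets are hypothetical, not Argus output; S0 ("connection attempt seen, no reply") might belong in the SYN error set as well, which is why the sets are kept as constants that are easy to change. Rates are returned as fractions between 0 and 1, as in the sample record.

```python
from collections import namedtuple

# Hypothetical flow record; field names are illustrative, not Argus fields.
Flow = namedtuple("Flow", "time src dst dport state")

# Assumed groupings, following the guesses in this message; adjust as needed.
SYN_ERROR_STATES = frozenset({"RSTO", "RSTR"})
REJ_ERROR_STATES = frozenset({"REJ"})


def rate(flows, predicate):
    """Fraction of flows satisfying predicate (0.0 for an empty list)."""
    if not flows:
        return 0.0
    return sum(1 for f in flows if predicate(f)) / len(flows)


def is_syn_error(flow):
    return flow.state in SYN_ERROR_STATES


def is_rej_error(flow):
    return flow.state in REJ_ERROR_STATES


def window_features(current, earlier, window=2.0):
    """Same-host and same-service features for `current`.

    Only flows in `earlier` that started at most `window` seconds before
    `current` are considered.
    """
    recent = [f for f in earlier if 0 <= current.time - f.time <= window]
    same_host = [f for f in recent
                 if (f.src, f.dst) == (current.src, current.dst)]
    same_srv = [f for f in recent if f.dport == current.dport]
    return {
        "count": len(same_host),
        "serror_rate": rate(same_host, is_syn_error),
        "rerror_rate": rate(same_host, is_rej_error),
        "same_srv_rate": rate(same_host, lambda f: f.dport == current.dport),
        # Equals 1 - same_srv_rate whenever count is non-zero.
        "diff_srv_rate": rate(same_host, lambda f: f.dport != current.dport),
        "srv_count": len(same_srv),
        "srv_serror_rate": rate(same_srv, is_syn_error),
        "srv_rerror_rate": rate(same_srv, is_rej_error),
        "srv_diff_host_rate": rate(same_srv, lambda f: f.dst != current.dst),
    }


history = [
    Flow(5.0, "10.0.0.1", "10.0.0.9", 80, "SF"),    # outside the window
    Flow(10.0, "10.0.0.1", "10.0.0.9", 80, "SF"),
    Flow(10.5, "10.0.0.1", "10.0.0.9", 80, "REJ"),
    Flow(11.0, "10.0.0.1", "10.0.0.9", 22, "RSTO"),
    Flow(11.2, "10.0.0.2", "10.0.0.8", 80, "SF"),
]
current = Flow(11.5, "10.0.0.1", "10.0.0.9", 80, "SF")
features = window_features(current, history)
print(features)
```

The linear scan over `earlier` is fine for a sketch; on GBit traffic one would probably keep only the last two seconds of flows in a queue instead.
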
My plan was to listen on a mirrored port and save the calculated data to a
db. I am not sure whether I should calculate all properties in one pass and
save them to the db. What do you suggest? Should I first capture the GBit
traffic and save it in Argus format, and then work on it with the Argus
commands and save the results to the db?

Or should I directly save to the db whatever I can calculate with ra, and
then run some other scripts to calculate the percentages and the two-second
features? But saving to the db will use a 1 minute time interval by
default, I guess, and I would need to do something about the two-second
window. I am not sure, really. What do you suggest?
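
As an illustration of the second option (store the per-flow values first, derive the two-second features afterwards), here is a small Python sketch using an in-memory SQLite database. It assumes that each stored record keeps its own start timestamp, so the two-second window can be applied at query time regardless of how the records were batched on the way in. The table and column names are hypothetical and are not meant to match any ra output format.

```python
import sqlite3

# Hypothetical schema; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows ("
    " id INTEGER PRIMARY KEY, stime REAL, saddr TEXT, daddr TEXT,"
    " dport INTEGER, state TEXT)"
)
conn.execute("CREATE INDEX flows_stime ON flows (stime)")

rows = [
    (10.0, "10.0.0.1", "10.0.0.9", 80, "SF"),
    (10.5, "10.0.0.1", "10.0.0.9", 80, "REJ"),
    (11.0, "10.0.0.1", "10.0.0.9", 22, "SF"),
    (14.0, "10.0.0.1", "10.0.0.9", 80, "SF"),
]
conn.executemany(
    "INSERT INTO flows (stime, saddr, daddr, dport, state)"
    " VALUES (?, ?, ?, ?, ?)",
    rows,
)

# For every flow, count the other flows between the same two hosts that
# started at most two seconds before it.
query = """
SELECT cur.id, COUNT(prev.id) AS cnt
FROM flows AS cur
LEFT JOIN flows AS prev
  ON prev.saddr = cur.saddr
 AND prev.daddr = cur.daddr
 AND prev.id <> cur.id
 AND prev.stime <= cur.stime
 AND prev.stime >= cur.stime - 2.0
GROUP BY cur.id
ORDER BY cur.id
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

The same join, with a condition on `dport` instead of the host pair, would give the same-service count; the rates are then ratios of such counts.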

I am not dying to use these attributes, but unfortunately it is a dataset
that is still in use. Just in case, it is better to have some solution for
my problem.

Thank you.

-- 
Oğuz Yarımtepe
http://about.me/oguzy