yet another kdd cup question
Oğuz Yarımtepe
oguzyarimtepe at gmail.com
Wed Oct 2 07:29:06 EDT 2013
Hi,
I figured out a bit more. Looking at a line from the KDD Cup data set,
which shows the value of each attribute, gave me the idea.
0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.
Let's check the first attributes I am interested in, and then the last
ones I am also interested in.
duration        length (number of seconds) of the connection      continuous
protocol_type   type of the protocol, e.g. tcp, udp, etc.         discrete
service         network service on the destination, e.g., http,
                telnet, etc.                                      discrete
src_bytes       number of data bytes from source to destination   continuous
dst_bytes       number of data bytes from destination to source   continuous
flag            normal or error status of the connection          discrete
land            1 if connection is from/to the same host/port;
                0 otherwise                                       discrete
wrong_fragment  number of "wrong" fragments                       continuous
urgent          number of urgent packets                          continuous
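As a minimal sketch, the sample line above can be split into these first nine attributes. The sample has SF in the fourth position, before the byte counts, so the sketch follows the order in the data rather than the order of the list. The names BASIC_FIELDS, NUMERIC and parse_basic are illustrative only.

```python
# Names of the first nine columns, in the order they appear in the sample line.
BASIC_FIELDS = [
    "duration", "protocol_type", "service", "flag",
    "src_bytes", "dst_bytes", "land", "wrong_fragment", "urgent",
]
# Columns whose values are integers in the sample line.
NUMERIC = {"duration", "src_bytes", "dst_bytes", "land", "wrong_fragment", "urgent"}


def parse_basic(line):
    """Return the first nine attributes and the trailing label of one record."""
    values = line.strip().split(",")
    record = {}
    for name, raw in zip(BASIC_FIELDS, values):
        record[name] = int(raw) if name in NUMERIC else raw
    record["label"] = values[-1]  # the last column is the class label
    return record


SAMPLE = (
    "0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
    "0.00,0.00,0.00,0.00,normal."
)
basic = parse_basic(SAMPLE)
print(basic)
```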
Looking at these attributes, I think Argus can calculate many of them.
I am not sure about the number of wrong fragments. I checked the ra
documentation and saw that it is possible to display the flags attribute,
which includes an Urgent flag value. But wrong_fragment is the one I am
not sure about. Any idea how I could calculate it?
And now the more ambiguous ones (feature name, description, type):

count: number of connections to the same host as the current connection
in the past two seconds
Since it is calculated for the current connection, I think it is the
number of connections whose source IP address and destination IP address
are the same as those of the current connection, within the two seconds
prior to this connection.
Type: continuous
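The reading above can be sketched as a sliding window over connection start times. This is only an illustration under assumptions: connections arrive in time order, "same host" means the same source and destination IP pair (the reading given above), and the current connection is not counted. HostWindow and its method are hypothetical names.

```python
from collections import deque

WINDOW_SECONDS = 2.0


class HostWindow:
    """Remember the connections seen during the last WINDOW_SECONDS."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.recent = deque()  # (timestamp, src_ip, dst_ip), oldest first

    def count(self, timestamp, src_ip, dst_ip):
        """Count earlier connections with the same address pair in the window."""
        # Drop connections that started before the window opened.
        while self.recent and self.recent[0][0] < timestamp - self.window:
            self.recent.popleft()
        same = sum(1 for _, s, d in self.recent if s == src_ip and d == dst_ip)
        # Remember the current connection for later calls.
        self.recent.append((timestamp, src_ip, dst_ip))
        return same


w = HostWindow()
counts = [
    w.count(0.0, "10.0.0.5", "10.0.0.1"),
    w.count(1.0, "10.0.0.5", "10.0.0.1"),
    w.count(1.5, "10.0.0.5", "10.0.0.2"),
    w.count(2.5, "10.0.0.5", "10.0.0.1"),
    w.count(5.0, "10.0.0.5", "10.0.0.1"),
]
print(counts)
```

If the current connection should be included in count, returning same + 1 would be the only change.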
*Note: The following features refer to these same-host connections.*
serror_rate: % of connections that have "SYN" errors
I found some information in the Bro IDS documentation. This is how it
displays the state of a connection/flow in conn.log:
- *S0*: Connection attempt seen, no reply.
- *S1*: Connection established, not terminated.
- *SF*: Normal establishment and termination. Note that this is the same
symbol as for state S1. You can tell the two apart because for S1 there
will not be any byte counts in the summary, while for SF there will be.
- *REJ*: Connection attempt rejected.
- *RSTO*: Connection established, originator aborted (sent a RST).
- *RSTR*: Established, responder aborted.
So RSTO and RSTR could be SYN errors. REJ is the case mentioned: a
connection attempt is made but it is rejected. Is there a flag to see this
event?
Type: continuous

rerror_rate: % of connections that have "REJ" errors
Type: continuous

same_srv_rate: % of connections to the same service
This is a percentage, and by service I assume the port number. So, within
the two-second window, the number of connections (or connection attempts)
to the same port divided by the count calculated above should give the
percentage, I think.
Type: continuous

diff_srv_rate: % of connections to different services
This will be 1 - same_srv_rate, I think.
Type: continuous

srv_count: number of connections to the same service as the current
connection in the past two seconds
It is already calculated above.
Type: continuous
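The same-host rates discussed above could be sketched as follows, given the list of same-host connections from the past two seconds. Everything here is an assumption for illustration: the function name, the dict keys "flag" and "dst_port", and above all the set of states treated as "SYN" errors, which is passed in as a parameter so that a different reading (for example RSTO and RSTR) can be tried instead.

```python
def same_host_rates(current, window, syn_error_flags, rej_flags):
    """Rates over `window`, the same-host connections of the past two seconds.

    Each connection is a dict with the hypothetical keys "flag" and "dst_port".
    """
    total = len(window)  # this is the `count` feature
    if total == 0:
        # No earlier same-host connection: report zeros instead of dividing by 0.
        return {"serror_rate": 0.0, "rerror_rate": 0.0,
                "same_srv_rate": 0.0, "diff_srv_rate": 0.0}
    syn_errors = sum(1 for c in window if c["flag"] in syn_error_flags)
    rej_errors = sum(1 for c in window if c["flag"] in rej_flags)
    same_srv = sum(1 for c in window if c["dst_port"] == current["dst_port"])
    same_srv_rate = same_srv / total
    return {
        "serror_rate": syn_errors / total,
        "rerror_rate": rej_errors / total,
        "same_srv_rate": same_srv_rate,
        "diff_srv_rate": 1.0 - same_srv_rate,
    }


# Assumption: which states count as "SYN" errors is a guess, not a confirmed rule.
SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}
REJ_FLAGS = {"REJ"}

window = [
    {"flag": "S0", "dst_port": 80},
    {"flag": "SF", "dst_port": 80},
    {"flag": "REJ", "dst_port": 22},
    {"flag": "SF", "dst_port": 80},
]
rates = same_host_rates({"dst_port": 80}, window, SYN_ERROR_FLAGS, REJ_FLAGS)
print(rates)
```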
*Note: The following features refer to these same-service connections.*
srv_serror_rate: % of connections that have "SYN" errors
These will be calculated by looking at the port number within the
two-second period.
Type: continuous

srv_rerror_rate: % of connections that have "REJ" errors
Type: continuous

srv_diff_host_rate: % of connections to different hosts
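For the same-service group, a rough sketch first narrows the two-second window down to connections on the current port and then takes fractions of that subset. It assumes, as above, that service means destination port; the dict keys, the function names and the error-state sets are hypothetical.

```python
def fraction(connections, predicate):
    """Share of `connections` for which `predicate` holds (0.0 if empty)."""
    if not connections:
        return 0.0
    return sum(1 for c in connections if predicate(c)) / len(connections)


def same_service_features(current, recent, syn_error_flags, rej_flags):
    """`recent` holds all connections of the past two seconds, for any host."""
    # Keep only connections to the same service (assumed: same destination port).
    srv = [c for c in recent if c["dst_port"] == current["dst_port"]]
    return {
        "srv_count": len(srv),
        "srv_serror_rate": fraction(srv, lambda c: c["flag"] in syn_error_flags),
        "srv_rerror_rate": fraction(srv, lambda c: c["flag"] in rej_flags),
        "srv_diff_host_rate": fraction(
            srv, lambda c: c["dst_ip"] != current["dst_ip"]),
    }


recent = [
    {"flag": "S0", "dst_port": 80, "dst_ip": "10.0.0.1"},
    {"flag": "REJ", "dst_port": 80, "dst_ip": "10.0.0.2"},
    {"flag": "SF", "dst_port": 80, "dst_ip": "10.0.0.1"},
    {"flag": "SF", "dst_port": 80, "dst_ip": "10.0.0.1"},
    {"flag": "SF", "dst_port": 22, "dst_ip": "10.0.0.1"},
]
current = {"dst_port": 80, "dst_ip": "10.0.0.1"}
# Assumption: the error-state sets below are placeholders, not a confirmed rule.
srv_features = same_service_features(
    current, recent, {"S0", "S1", "S2", "S3"}, {"REJ"})
print(srv_features)
```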
My plan was to listen on a mirrored port and save the calculated data to
a db. I am not sure whether I should calculate all the properties in one
pass and save them to the db. What do you suggest? Should I first capture
the GBit traffic, save it in Argus format, and then process it with Argus
commands and save the results to the db? Or should I save directly to the
db whatever I can calculate with ra, and then run some other scripts to
calculate the percentages and the two-second features? But saving to the
db will use a 1 minute time interval by default, I guess, so I would have
to do something about the two-second window. I am really not sure. What
do you suggest?
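One way to picture the second option is to store one row per flow with its own start time and derive the two-second count at query time, so the window does not depend on how often rows are written. The sketch below uses Python's sqlite3 with an in-memory database; the table layout, the column names and the sample rows are invented for illustration and are not an Argus schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per flow record.
conn.execute(
    """CREATE TABLE flows (
           id INTEGER PRIMARY KEY,
           start_time REAL,
           src_ip TEXT,
           dst_ip TEXT,
           dst_port INTEGER,
           state TEXT
       )"""
)
flows = [
    (0.0, "10.0.0.5", "10.0.0.1", 80, "SF"),
    (1.0, "10.0.0.5", "10.0.0.1", 80, "SF"),
    (1.5, "10.0.0.5", "10.0.0.1", 22, "REJ"),
    (4.0, "10.0.0.5", "10.0.0.1", 80, "SF"),
]
conn.executemany(
    "INSERT INTO flows (start_time, src_ip, dst_ip, dst_port, state) "
    "VALUES (?, ?, ?, ?, ?)",
    flows,
)

# For every flow, count earlier flows between the same address pair that
# started within the two seconds before it.
QUERY = """
SELECT cur.id, COUNT(prev.id) AS count_2s
FROM flows AS cur
LEFT JOIN flows AS prev
       ON prev.src_ip = cur.src_ip
      AND prev.dst_ip = cur.dst_ip
      AND prev.start_time >= cur.start_time - 2.0
      AND prev.start_time < cur.start_time
GROUP BY cur.id
ORDER BY cur.id
"""
window_counts = conn.execute(QUERY).fetchall()
print(window_counts)
```

The self-join is quadratic in the worst case, so on GBit traffic an index on start_time or a streaming window would likely be needed.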
I am not dying to use these attributes, but unfortunately this dataset is
still in use. Just in case, it is better to have some solution for my
problem.
Thank you.
--
Oğuz Yarımtepe
http://about.me/oguzy