Argus-info Digest, Vol 77, Issue 4

Wed Jan 11 15:01:03 EST 2012

Thanks Carter,
The rasqltimeindex() tool sounds interesting. It would be nice to have the HowTo for it. I'll also try with rasplit.1 and see what I can get.

Cheers, 
-Manaf

________________________________
 From: "argus-info-request at lists.andrew.cmu.edu" <argus-info-request at lists.andrew.cmu.edu>
To: argus-info at lists.andrew.cmu.edu 
Sent: Wednesday, January 11, 2012 8:33 AM
Subject: Argus-info Digest, Vol 77, Issue 4

Today's Topics:

   1.  Argus ralabel (CS Lee)
   2. Re:  (no subject) (Bruce Hawkins)
   3.  Clustering flows within a specific time interval
      (manaf gharaibeh)
   4. Re:  Clustering flows within a specific time interval
      (Carter Bullard)

----------------------------------------------------------------------

Message: 1
Date: Wed, 11 Jan 2012 10:37:29 +0800
From: CS Lee <geek00l at gmail.com>
Subject: [ARGUS] Argus ralabel
To: Argus <argus-info at lists.andrew.cmu.edu>
Message-ID:
    <CABWd2irzQbO9QSu96FpOc=CTmHEMZk3X2qJM92isZD6TtS731A at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

hi Carter,

MaxMind has released an IPv6-to-AS mapping, which you can find here:

http://geolite.maxmind.com/download/geoip/database/asnum/

Will you add IPv6-to-AS support to ralabel? That would be good
to have!

Cheers!

-- 
Best Regards,

CS Lee<geek00L[at]gmail.com>

http://geek00l.blogspot.com
http://defcraft.net

------------------------------

Message: 2
Date: Tue, 10 Jan 2012 23:47:41 -0600
From: Bruce Hawkins <keta144 at msn.com>
Subject: Re: [ARGUS] (no subject)
To: <adam at funkstarr.com>, <aishaterux3 at yahoo.com>, <aelahi at umd.edu>,
    <aelahi at mail.umd.edu>, <amie745 at hotmail.com>,
    <argus-info at lists.andrew.cmu.edu>, <usnavygirl82 at aol.com>
Message-ID: <SNT112-W50C16E0765F2D039C27A9AEF9E0 at phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

http://www.quikly.com.ar/january.php?opozyt=74&ywam=614&umjvifygyr=74


------------------------------

Message: 3
Date: Wed, 11 Jan 2012 00:31:19 -0800 (PST)
From: manaf gharaibeh <manafhgh at yahoo.com>
Subject: [ARGUS] Clustering flows within a specific time interval
To: "argus-info at lists.andrew.cmu.edu"
    <argus-info at lists.andrew.cmu.edu>
Message-ID:
    <1326270679.37971.YahooMailNeo at web33807.mail.mud.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I have huge Argus files (each with records of flows for an entire day). I am trying to gather statistics like the number of flows, number of different sources, or source packets that target the same destination within a given interval of time like 1 minute. I use the following command line within a Perl script to cluster flows based on destination then sort the result of that based on the number of source packets to destinations:
`racluster -nw - @arglist -m daddr -t @timeIneterval |rasort -u -m spkts -s daddr stime ltime dur spkts srate -c, > spktsSorted.dat`;

where @arglist contains the user's command-line options, mainly the name of the input Argus file, and @timeIneterval contains a time interval in a form like i1293864155+60s. The result is saved to the file spktsSorted.dat in comma-separated format.

Now here is my problem: the Argus files I have are sorted by the ending time of each flow rather than its starting time. So when I run the racluster command, it has no way of knowing where in the file the flows that fall within the specified interval are. It simply searches through the whole Argus file, which is very expensive with huge files like the ones I'm working with. I used the -N option to limit the number of flows that racluster should find, and that reduced the time needed by the command significantly. But this is not a good solution, since I might lose some flows. And if the integer given to -N is larger than the number of flows that satisfy the specified constraints, I am back to the original expensive exhaustive search.

So the question is: how can I cluster flows based on destination host IP within a specific time interval in a reasonable time, that is to cluster flows that were active during an interval that starts at x and ends at y based on their destination IP addresses?

-Manaf

------------------------------

Message: 4
Date: Wed, 11 Jan 2012 10:33:20 -0500
From: Carter Bullard <carter at qosient.com>
Subject: Re: [ARGUS] Clustering flows within a specific time interval
To: manaf gharaibeh <manafhgh at yahoo.com>
Cc: "argus-info at lists.andrew.cmu.edu"
    <argus-info at lists.andrew.cmu.edu>
Message-ID: <60E5372D-346C-4381-AC5C-97C5AB5D1FEA at qosient.com>
Content-Type: text/plain; charset="iso-8859-1"

Hey Manaf,
The tool for this is rasqltimeindex(), but it is poorly documented.  This program uses
MySQL and builds "Filename" and "Seconds" tables that hold the byte offsets of
argus data records for the start of every second in the file.  rasql(), with a time filter,
then accesses the tables to find the records from the specified time range.
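
To illustrate the idea of a per-second byte-offset index, here is a minimal Python sketch. It is not rasqltimeindex()'s actual schema or code: the function names, the in-memory lists standing in for the "Seconds" table, and the (timestamp, byte_offset) input are all assumptions made for illustration. It also assumes the records appear in the file in order of the timestamp being indexed.

```python
import bisect

def build_second_index(records):
    """Build a per-second index from (timestamp_seconds, byte_offset) pairs.

    Assumes the pairs arrive in file order and are sorted by timestamp.
    Returns two parallel lists: the seconds seen, and the byte offset of
    the first record in each of those seconds.
    """
    seconds, offsets = [], []
    for ts, offset in records:
        sec = int(ts)
        # Record only the first offset encountered for each new second.
        if not seconds or sec > seconds[-1]:
            seconds.append(sec)
            offsets.append(offset)
    return seconds, offsets

def offset_range(seconds, offsets, start, end):
    """Return (begin, stop) byte offsets for records with start <= time < end.

    stop is None when the range runs to the end of the file.
    Returns None when no indexed second falls inside the range.
    """
    lo = bisect.bisect_left(seconds, start)
    hi = bisect.bisect_left(seconds, end)
    if lo == len(seconds) or lo == hi:
        return None
    begin = offsets[lo]
    stop = offsets[hi] if hi < len(offsets) else None
    return begin, stop
```

With an index like this, a reader can seek straight to `begin` and stop at `stop` instead of scanning the whole file. Note that this only works for the timestamp the file is ordered by, so a file sorted by flow end time would need the index built on end times.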

This program is designed to work with standard argus archives, where the files are
persistent, and so the tools allow for finding data pretty quickly in very large repositories,
but it could be used in a more dynamic way.

I'm not sure that it's usable in its current state without some dialog.  I will try to put
together a "HowTo" description on how to use it before I get back from FloCon.

Until then, most sites use rasplit.1 to divide the large data files into more manageable
time periods. rasplit.1 is well documented, so it may be the best approach for you.
I split all of my data streams into 5 minute files, and then my Perl scripts take the
"-t timerangefilter" and find the files that need to be processed to find the data.

Let me improve the rasqltimeindex() approach so that it can be useful for you.

Carter



------------------------------
