Huge argus files and racluster
Carter Bullard
carter at qosient.com
Wed Feb 8 06:57:07 EST 2012
Are you using the latest client programs?
http://qosient.com/argus/dev/argus-clients-latest.tar.gz
Carter
On Feb 8, 2012, at 5:57 AM, Marco <listaddr at gmail.com> wrote:
> Thanks, that was very useful. Now I'm running into another couple of issues.
>
> The first one is that rabin (invoked from ragraph) segfaults if the
> filter I specify does not match any flow. I can provide sample data if
> needed. In this specific case, the command line I'm using is
>
> $ ragraph sbytes dbytes -M 10s -n -r sample.argus -m proto -w
> mygraph.png -title "abc" - dst host 10.192.1.138
> sh: line 1: 19240 Segmentation fault /usr/bin/rabins -M hard zero
> -p6 -GL0 -s ltime sbytes dbytes -M 10s -n -r sample.argus -m proto -
> dst host 10.192.1.138 > /tmp/filepZbhuC
> usage: /usr/bin/ragraph metric (srcid | proto [daddr] | dport) [-title
> "title"] [ra-options]
> /usr/bin/ragraph: unable to create `/tmp/filepZbhuC.rrd': start time:
> unparsable time:
>
> The file sample.argus happens to not have any flow where the
> destination IP is 10.192.1.138.
>
> The second one is more of a philosophical issue, and now I'm wondering
> whether argus is really the tool I need for the task. Basically, since
> I need to determine incoming/outgoing bandwidth usage, I'm using
> "ragraph sbytes dbytes" to produce the graphics. If every flow is
> initiated from the LAN being monitored (say, 192.168.44.0/24), then
> "sbytes" effectively indicates the amount of physically outgoing data,
> and "dbytes" effectively indicates the amount of physically incoming
> data, so the resulting graph is an accurate representation of in/out
> bandwidth usage over time. But if there are externally-initiated flows
> (as in my case) along with internally-initiated ones, then "sbytes"
> will aggregate a mixture of data leaving and entering the network,
> depending on who initiated the flow being considered. The same happens
> for "dbytes". What this means is that "ragraph sbytes dbytes" doesn't
> really represent bandwidth usage in the two directions (the
> aggregate value "bytes" is still correct, but doesn't give any detail
> about what's outgoing and what's incoming).
> So obviously this isn't argus' fault, but I'm wondering whether
> there's a way to do what I'm looking for (with argus or another tool).
> It's also possible that I'm missing something obvious.
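One way to recover true in/out totals is to reorient each flow's counters relative to the monitored LAN before summing. A minimal Python sketch of that idea follows; the field names mirror argus columns (saddr, daddr, sbytes, dbytes), but the flow tuples are made-up illustrations, not data from this thread:

```python
# Sketch: reorient flow counters relative to a monitored network so that
# "out" always means bytes leaving the LAN and "in" means bytes entering it,
# regardless of which side initiated the flow.
import ipaddress

LOCAL_NET = ipaddress.ip_network("192.168.44.0/24")  # the monitored LAN

def in_out_bytes(flows):
    """flows: iterable of (saddr, daddr, sbytes, dbytes) tuples."""
    out_bytes = in_bytes = 0
    for saddr, daddr, sbytes, dbytes in flows:
        if ipaddress.ip_address(saddr) in LOCAL_NET:
            # internally initiated: src->dst traffic leaves the LAN
            out_bytes += sbytes
            in_bytes += dbytes
        else:
            # externally initiated: src->dst traffic enters the LAN
            in_bytes += sbytes
            out_bytes += dbytes
    return in_bytes, out_bytes

flows = [
    ("192.168.44.10", "8.8.8.8", 1000, 5000),     # LAN-initiated
    ("10.192.1.138", "192.168.44.20", 300, 700),  # externally initiated
]
print(in_out_bytes(flows))  # (5300, 1700)
```

The same reorientation could be done on argus data itself before graphing, e.g. by splitting the records with two filters ("src net 192.168.44.0/24" and "not src net 192.168.44.0/24") and summing sbytes/dbytes with the roles swapped in the second case.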
>
> Thanks again for the quick replies and your patience.
>
>
> On 7 February 2012 19:26, Carter Bullard <carter at qosient.com> wrote:
>> Hey Marco,
>> Argus is very good at not over- or undercounting packets, so don't
>> worry about the aggregation model and how it affects accuracy; that
>> has been worked over very well.
>>
>> Since you are interested in making sense of it all, you should run
>> racount.1 first:
>>
>> racount -r files -M proto addr
>>
>> You should be doing some very large aggregations, such as:
>>
>> racluster -m matrix/16 -r files -s stime dur saddr daddr pkts bytes - ip
>>
>> This will show you which CIDR /16 networks are talking to whom.
>>
>> If you want to know the list of IP addresses that are active:
>>
>> racluster -M rmon -m saddr -r files -w addrs.out - ip
>>
>> Then you can aggregate for the networks, the countries, or whatever:
>> racluster -r addrs.out -m saddr/24 -r files -s stime dur saddr spkts
>> dpkts sbytes dbytes - ip
>>
>> If you want to aggregate based on the country code, you need to use ralabel
>> to set the
>> country codes. Check out 'man ralabel' and 'man 5 ralabel' to see how to do
>> that, and you can
>> do that with the IP address file you created above:
>>
>> ralabel -f ralabel.country.code.conf -r addrs.out -w - | racluster -m
>> sco -w - | \
>> rasort -m sco -v -s stime dur sco spkts dpkts sbytes dbytes
>>
>>
>> There are also the perl scripts:
>>
>> rahosts -r files
>> raports -r files
>>
>> These are pretty informative and will serve you well.
>> That should get you started.
>>
>> Carter
>>
>>
>> On Feb 7, 2012, at 11:38 AM, Marco wrote:
>>
>> Thanks for the detailed answer. I suppose a bit more of background on
>> what I'm trying to do is in order here. Basically, I've been handed
>> that 50GB pcap monster and been told to "make sense of it".
>> Essentially, it contains all the traffic to and from the Internet seen
>> on a particular LAN.
>> "making sense of it" basically means, in simple terms, finding out:
>>
>> - global bandwidth usage (incoming, outgoing)
>> - bandwidth usage by protocol (http, smtp, dns, etc.), again incoming
>> and outgoing
>> - traffic between specific source/destination hosts (possibly
>> including detailed protocol usage within that specific traffic)
>>
>> Ideally, I'd like to graph some or all of that information, but for
>> now I'm ok with running some command line query using racluster/rasort
>> to get textual tabular output.
>>
>> So, based on what I read, the first thing I tried was to summarize
>> the pcap data into an argus file to use as a starting point. That
>> file should ideally include exactly one entry per flow (where
>> flow == saddr daddr proto sport dport), because otherwise (if I
>> understand correctly) packets, bytes, etc. belonging to a specific
>> flow would be counted multiple times, which is not what I want (it's
>> entirely possible that I'm misunderstanding how argus works, though).
>> Note that I'm mostly interested in aggregated numbers here rather than
>> detailed flow analysis. For example: I'd like to get all flows where
>> the protocol is TCP and dport is 80, then obtain aggregated sbytes and
>> dbytes for all those flows. Same for other well-known destination
>> ports.
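The per-port aggregation described here is what `racluster -m proto dport` performs; the bookkeeping amounts to summing counters under a (proto, dport) key. A Python sketch with made-up flow records standing in for argus output:

```python
# Sketch: aggregate sbytes/dbytes per (proto, dport), mirroring what
# "racluster -m proto dport" does. Flow records are illustrative.
from collections import defaultdict

flows = [
    ("tcp", 80, 1200, 45000),
    ("tcp", 80, 800, 30000),
    ("tcp", 443, 500, 9000),
    ("udp", 53, 70, 150),
]

# totals maps (proto, dport) -> [sbytes, dbytes]
totals = defaultdict(lambda: [0, 0])
for proto, dport, sbytes, dbytes in flows:
    totals[(proto, dport)][0] += sbytes
    totals[(proto, dport)][1] += dbytes

print(totals[("tcp", 80)])  # [2000, 75000]
```

With argus itself, the equivalent one-liner would be something along the lines of `racluster -r file -m proto dport -s proto dport sbytes dbytes - tcp and dst port 80` (exact field list per the racluster and ra man pages).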
>>
>> As it's probably clear by now, I'm a novice to argus, so any help
>> would be appreciated (including pointers to examples or other material
>> to study). Thanks for your help.
>>
>> 2012/2/7 Carter Bullard <carter at qosient.com>:
>>
>> Hey Marco,
>>
>> Regardless of what time range you work with, there will always be
>> a flow that extends beyond that range. You have to figure out what
>> you are trying to say with the data to decide if you need to count
>> every connection only once.
>>
>> If 5-, 10-, or 15-minute files aren't attractive, racluster.1 provides
>> configuration options so you can efficiently track long-term flows, but
>> it is based on finding an effective idle timeout that will make
>> persistent tracking work within your memory limits. See racluster.5.
>> Most flows finish in less than a second, so keeping all of those
>> flows in memory is a waste. Figuring out a good idle timeout
>> strategy, however, is an art.
>>
>>
>> By default, racluster's idle timeout is "infinite", so it holds each
>> flow in memory until the end of processing. If you decide that 600
>> seconds of idle time is sufficient to declare a flow done (120 works
>> for most, except Windows boxes, which can send TCP resets for
>> connections that have been closed for well over 300 seconds), then
>> a simple racluster.conf file of:
>>
>> racluster.conf
>> filter="" model="saddr daddr proto sport dport" status=0 idle=600
>>
>> may keep you from running out of memory. If a flow hasn't seen any
>> activity in 600 seconds, racluster.1 will report the flow and release
>> its memory.
>>
>> racluster -f racluster.conf -r your.files -w single.output.file
>>
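The idle-timeout eviction racluster performs can be modeled in a few lines. This is an illustrative Python sketch of the logic, not argus code; the 5-tuples and counters are invented:

```python
# Sketch: flows keyed by the 5-tuple are held in a cache and flushed
# ("reported") once they have been idle longer than the configured timeout,
# modeling racluster's idle=600 behavior.
IDLE_TIMEOUT = 600  # seconds, as in the racluster.conf example

cache = {}      # 5-tuple key -> [pkts, bytes, last_seen]
reported = []   # flows flushed from memory

def update(key, pkts, nbytes, now):
    # Evict any flow idle past the timeout before processing the new record.
    for k in [k for k, v in cache.items() if now - v[2] > IDLE_TIMEOUT]:
        reported.append((k, cache.pop(k)))
    if key in cache:
        rec = cache[key]
        rec[0] += pkts
        rec[1] += nbytes
        rec[2] = now
    else:
        cache[key] = [pkts, nbytes, now]

k1 = ("10.0.0.1", "10.0.0.2", "tcp", 1234, 80)
update(k1, 3, 1500, 0)
update(k1, 2, 900, 30)   # still active: counters merge into one flow record
update(("10.0.0.3", "10.0.0.2", "tcp", 5678, 80), 1, 60, 700)
# k1 has now been idle 670 s > 600 s, so it was reported and released.
print(len(reported))  # 1
```

The trade-off the text describes falls out directly: a longer timeout keeps more entries in `cache` (more memory), while a shorter one risks splitting a genuinely long-lived flow into several reported records.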
>>
>> Improving on the aggregation model would include protocol- and
>> port-specific idle time strategies, such as:
>>
>> racluster.better.conf
>> filter="udp and port domain" model="saddr daddr proto sport dport" status=0 idle=10
>> filter="udp" model="saddr daddr proto sport dport" status=0 idle=60
>> filter="" model="saddr daddr proto sport dport" status=0 idle=600
>>
>> The output data stream of this type of processing will be semi-sorted
>> in last-time-seen order rather than start-time order, so that may be a
>> consideration for you. Sorting is currently a memory hog, so don't
>> expect to sort these records after you generate the single output
>> file without some strategy, like using rasplit.1.
>>
>>
>> Using state, such as TCP closing state, to declare that a flow is done
>> is an attractive approach, but it has huge problems, and I don't
>> recommend it.
>>
>> rasqlinsert.1 is the tool of choice if you really would like to have
>> one flow record per flow and you're running out of resources.
>> Using argus-clients-3.0.5.31 from the developers' thread of code,
>> use rasqlinsert.1 with the caching option:
>>
>> rasqlinsert -M cache -r your.files -w mysql://user@localhost/db/raOutfile
>>
>> This causes rasqlinsert.1 to use a database table as its flow cache.
>> It's pretty efficient: it won't do a database transaction per record
>> when there is aggregation to be done, so you do get some wins.
>> When it's finished processing, create your single file with:
>>
>> rasql -r mysql://user@localhost/db/raOutfile -w single.output.file
>>
>>
>>
>> There are problems with any approach that aggregates over long
>> periods of time, because systems reuse the 5-tuple flow attributes
>> that make up a flow key much faster than you would think. This
>> results in many situations where multiple independent sessions will
>> be reported as a single very long-lived flow. This is particularly
>> evident with DNS: if you aggregate over months, you find that you get
>> fewer and fewer DNS transactions (they tend to approach somewhere
>> around 32K) between host and server, and instead of lasting around
>> 0.025 seconds, they seem to last for months.
>>
>> I like 5-minute files, and if I need to understand what is going on
>> just at the edge of two 5-minute boundaries, I read them both and
>> focus on the edge time boundary. Anything longer than that is another
>> type of time domain, and there are lots of processing strategies for
>> developing data at that scale that may be useful.
>>
>>
>> Carter
>>
>>
>>
>> On Feb 7, 2012, at 9:45 AM, Marco wrote:
>>
>> Thanks. But what about long-lived flows that last more than 5 minutes?
>> Will they be merged, or will they appear once per 5-minute file in the
>> result? The whole point of clustering is having a single entry for
>> each of them, AFAIK.
>>
More information about the argus mailing list