Huge argus files and racluster
Carter Bullard
carter at qosient.com
Wed Feb 8 06:57:07 EST 2012
Are you using the latest client programs?
http://qosient.com/argus/dev/argus-clients-latest.tar.gz
Carter
On Feb 8, 2012, at 5:57 AM, Marco <listaddr at gmail.com> wrote:
> Thanks, that was very useful. Now I'm running into another couple of issues.
>
> The first one is that rabin (invoked from ragraph) segfaults if the
> filter I specify does not match any flow. I can provide sample data if
> needed. In this specific case, the command line I'm using is
>
> $ ragraph sbytes dbytes -M 10s -n -r sample.argus -m proto -w
> mygraph.png -title "abc" - dst host 10.192.1.138
> sh: line 1: 19240 Segmentation fault /usr/bin/rabins -M hard zero
> -p6 -GL0 -s ltime sbytes dbytes -M 10s -n -r sample.argus -m proto -
> dst host 10.192.1.138 > /tmp/filepZbhuC
> usage: /usr/bin/ragraph metric (srcid | proto [daddr] | dport) [-title
> "title"] [ra-options]
> /usr/bin/ragraph: unable to create `/tmp/filepZbhuC.rrd': start time:
> unparsable time:
>
> The file sample.argus happens to not have any flow where the
> destination IP is 10.192.1.138.
>
> The second one is more of a philosophical issue, and now I'm wondering
> whether argus is really the tool I need for the task. Basically, since
> I need to determine incoming/outgoing bandwidth usage, I'm using
> "ragraph sbytes dbytes" to produce the graphics. If every flow is
> initiated from the LAN being monitored (say, 192.168.44.0/24), then
> "sbytes" effectively indicates the amount of physically outgoing data,
> and "dbytes" effectively indicates the amount of physically incoming
> data, so the resulting graph is an accurate representation of in/out
> bandwidth usage over time. But if there are externally-initiated flows
> (as in my case) along with internally-initiated ones, then "sbytes"
> will aggregate a mixture of data leaving and entering the network,
> depending on who initiated the flow being considered. The same happens
> for "dbytes". What this means is that "ragraph sbytes dbytes" doesn't
> really represent bandwidth usage in the two directions (the
> aggregate value "bytes" is still correct, but doesn't give any detail
> about what's outgoing and what's incoming).
> So obviously this isn't argus' fault, but I'm wondering whether
> there's a way to do what I'm looking for (with argus or another tool).
> It's also possible that I'm missing something obvious.
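One way to recover true in/out totals is to reorient each flow's counters relative to the monitored LAN before summing. A minimal Python sketch of that idea follows; the field names mirror argus columns (saddr, daddr, sbytes, dbytes), but the flow tuples are made-up illustrations, not data from this thread:

```python
# Sketch: reorient flow counters relative to a monitored network so that
# "out" always means bytes leaving the LAN and "in" means bytes entering it,
# regardless of which side initiated the flow.
import ipaddress

LOCAL_NET = ipaddress.ip_network("192.168.44.0/24")  # the monitored LAN

def in_out_bytes(flows):
    """flows: iterable of (saddr, daddr, sbytes, dbytes) tuples."""
    out_bytes = in_bytes = 0
    for saddr, daddr, sbytes, dbytes in flows:
        if ipaddress.ip_address(saddr) in LOCAL_NET:
            # internally initiated: src->dst traffic leaves the LAN
            out_bytes += sbytes
            in_bytes += dbytes
        else:
            # externally initiated: src->dst traffic enters the LAN
            in_bytes += sbytes
            out_bytes += dbytes
    return in_bytes, out_bytes

flows = [
    ("192.168.44.10", "8.8.8.8", 1000, 5000),     # LAN-initiated
    ("10.192.1.138", "192.168.44.20", 300, 700),  # externally initiated
]
print(in_out_bytes(flows))  # (5300, 1700)
```

The same reorientation could be done on argus data itself before graphing, e.g. by splitting the records with two filters ("src net 192.168.44.0/24" and "not src net 192.168.44.0/24") and summing sbytes/dbytes with the roles swapped in the second case.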
>
> Thanks again for the quick replies and your patience.
>
>
> On 7 February 2012 19:26, Carter Bullard <carter at qosient.com> wrote:
>> Hey Marco,
>> Argus is very good at not over- or undercounting packets, so don't
>> worry about the aggregation model and how it affects accuracy; that
>> has been worked over very well.
>>
>> Since you are interested in making sense of it all, you should run
>> racount.1 first:
>>
>> racount -r files -M proto addr
>>
>> You should be doing some very large aggregations, such as:
>>
>> racluster -m matrix/16 -r files -s stime dur saddr daddr pkts bytes - ip
>>
>> This will show you which CIDR /16 networks are talking to whom.
>>
>> If you want to know the list of IP addresses that are active:
>>
>> racluster -M rmon -m saddr -r files -w addrs.out - ip
>>
>> Then you can aggregate for the networks, the countries, or whatever:
>> racluster -r addrs.out -m saddr/24 -r files -s stime dur saddr spkts
>> dpkts sbytes dbytes - ip
>>
>> If you want to aggregate based on the country code, you need to use ralabel
>> to set the
>> country codes. Check out 'man ralabel' and 'man 5 ralabel' to see how to do
>> that, and you can
>> do that with the IP address file you created above:
>>
>> ralabel -f ralabel.country.code.conf -r addrs.out -w - | racluster -m
>> sco -w - | \
>> rasort -m sco -v -s stime dur sco spkts dpkts sbytes dbytes
>>
>>
>> There are also the perl scripts:
>>
>> rahosts -r files
>> raports -r files
>>
>> These are pretty informative and will serve you well.
>> That should get you started.
>>
>> Carter
>>
>>
>> On Feb 7, 2012, at 11:38 AM, Marco wrote:
>>
>> Thanks for the detailed answer. I suppose a bit more of background on
>> what I'm trying to do is in order here. Basically, I've been handed
>> that 50GB pcap monster and been told to "make sense of it".
>> Essentially, it contains all the traffic to and from the Internet seen
>> on a particular LAN.
>> "making sense of it" basically means, in simple terms, finding out:
>>
>> - global bandwidth usage (incoming, outgoing)
>> - bandwidth usage by protocol (http, smtp, dns, etc.), again incoming
>> and outgoing
>> - traffic between specific source/destination hosts (possibly
>> including detailed protocol usage within that specific traffic)
>>
>> Ideally, I'd like to graph some or all of that information, but for
>> now I'm ok with running some command line query using racluster/rasort
>> to get textual tabular output.
>>
>> So, based on what I read, the first thing I tried was to summarize
>> the pcap data into an argus file to use as a starting point. That
>> file should ideally include exactly one entry per flow (where
>> flow == saddr daddr proto sport dport), because otherwise (if I
>> understand correctly) packets, bytes, etc. belonging to a specific
>> flow would be counted multiple times, which is not what I want (it's
>> entirely possible that I'm misunderstanding how argus works, though).
>> Note that I'm mostly interested in aggregated numbers here rather than
>> detailed flow analysis. For example: I'd like to get all flows where
>> the protocol is TCP and dport is 80, then obtain aggregated sbytes and
>> dbytes for all those flows. Same for other well-known destination
>> ports.
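The per-port aggregation described here is what `racluster -m proto dport` performs; the bookkeeping amounts to summing counters under a (proto, dport) key. A Python sketch with made-up flow records standing in for argus output:

```python
# Sketch: aggregate sbytes/dbytes per (proto, dport), mirroring what
# "racluster -m proto dport" does. Flow records are illustrative.
from collections import defaultdict

flows = [
    ("tcp", 80, 1200, 45000),
    ("tcp", 80, 800, 30000),
    ("tcp", 443, 500, 9000),
    ("udp", 53, 70, 150),
]

# totals maps (proto, dport) -> [sbytes, dbytes]
totals = defaultdict(lambda: [0, 0])
for proto, dport, sbytes, dbytes in flows:
    totals[(proto, dport)][0] += sbytes
    totals[(proto, dport)][1] += dbytes

print(totals[("tcp", 80)])  # [2000, 75000]
```

With argus itself, the equivalent one-liner would be something along the lines of `racluster -r file -m proto dport -s proto dport sbytes dbytes - tcp and dst port 80` (exact field list per the racluster and ra man pages).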
>>
>> As it's probably clear by now, I'm a novice to argus, so any help
>> would be appreciated (including pointers to examples or other material
>> to study). Thanks for your help.
>>
>> 2012/2/7 Carter Bullard <carter at qosient.com>:
>>
>> Hey Marco,
>>
>> Regardless of what time range you work with, there will always be
>> a flow that extends beyond that range. You have to figure out what
>> you are trying to say with the data to decide if you need to count
>> every connection only once.
>>
>> If 5-, 10-, or 15-minute files aren't attractive, racluster.1 provides
>> configuration options so you can efficiently track long-term flows, but
>> it is based on finding an effective idle timeout that will make
>> persistent tracking work within your memory limits. See racluster.5.
>> Most flows finish in less than a second, so keeping all of those
>> flows in memory is a waste. Figuring out a good idle timeout
>> strategy, however, is an art.
>>
>>
>> By default, racluster's idle timeout is "infinite", so it holds each
>> flow in memory until the end of processing. If you decide that 600
>> seconds of idle time is sufficient to declare a flow done (120 works
>> for most, except Windows boxes, which can send TCP resets for
>> connections that have been closed for well over 300 seconds), then
>> a simple racluster.conf file of:
>>
>> racluster.conf
>> filter="" model="saddr daddr proto sport dport" status=0 idle=600
>>
>> may keep you from running out of memory. If a flow hasn't seen any
>> activity in 600 seconds, racluster.1 will report the flow and release
>> its memory.
>>
>> racluster -f racluster.conf -r your.files -w single.output.file
>>
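The idle-timeout eviction racluster performs can be modeled in a few lines. This is an illustrative Python sketch of the logic, not argus code; the 5-tuples and counters are invented:

```python
# Sketch: flows keyed by the 5-tuple are held in a cache and flushed
# ("reported") once they have been idle longer than the configured timeout,
# modeling racluster's idle=600 behavior.
IDLE_TIMEOUT = 600  # seconds, as in the racluster.conf example

cache = {}      # 5-tuple key -> [pkts, bytes, last_seen]
reported = []   # flows flushed from memory

def update(key, pkts, nbytes, now):
    # Evict any flow idle past the timeout before processing the new record.
    for k in [k for k, v in cache.items() if now - v[2] > IDLE_TIMEOUT]:
        reported.append((k, cache.pop(k)))
    if key in cache:
        rec = cache[key]
        rec[0] += pkts
        rec[1] += nbytes
        rec[2] = now
    else:
        cache[key] = [pkts, nbytes, now]

k1 = ("10.0.0.1", "10.0.0.2", "tcp", 1234, 80)
update(k1, 3, 1500, 0)
update(k1, 2, 900, 30)   # still active: counters merge into one flow record
update(("10.0.0.3", "10.0.0.2", "tcp", 5678, 80), 1, 60, 700)
# k1 has now been idle 670 s > 600 s, so it was reported and released.
print(len(reported))  # 1
```

The trade-off the text describes falls out directly: a longer timeout keeps more entries in `cache` (more memory), while a shorter one risks splitting a genuinely long-lived flow into several reported records.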
>>
>> Improving on the aggregation model would include protocol- and
>> port-specific idle time strategies, such as:
>>
>> racluster.better.conf
>> filter="udp and port domain" model="saddr daddr proto sport dport" status=0 idle=10
>> filter="udp" model="saddr daddr proto sport dport" status=0 idle=60
>> filter="" model="saddr daddr proto sport dport" status=0 idle=600
>>
>> The output data stream of this type of processing will be semi-sorted
>> in last-time-seen order rather than start-time order, so that may be a
>> consideration for you. Sorting is currently a memory hog, so don't
>> expect to sort these records after you generate the single output
>> file without some strategy, like using rasplit.1.
>>
>>
>> Using state, such as TCP closing state, to declare that a flow is done
>> is an attractive approach, but it has huge problems, and I don't
>> recommend it.
>>
>> rasqlinsert.1 is the tool of choice if you really would like to have
>> one flow record per flow and you're running out of resources.
>> Using argus-clients-3.0.5.31 from the developers' thread of code,
>> use rasqlinsert.1 with the caching option:
>>
>> rasqlinsert -M cache -r your.files -w mysql://user@localhost/db/raOutfile
>>
>> This causes rasqlinsert.1 to use a database table as its flow cache.
>> It's pretty efficient: it won't do a database transaction per record
>> when there is aggregation to be done, so you do get some wins.
>> When it's finished processing, create your single file with:
>>
>> rasql -r mysql://user@localhost/db/raOutfile -w single.output.file
>>
>>
>>
>> There are problems with any approach that aggregates over long
>> periods of time, because systems reuse the 5-tuple flow attributes
>> that make up a flow key much faster than you would think. This
>> results in many situations where multiple independent sessions will
>> be reported as a single very long-lived flow. This is particularly
>> evident with DNS: if you aggregate over months, you find that you get
>> fewer and fewer DNS transactions (they tend to approach somewhere
>> around 32K) between host and server, and instead of lasting around
>> 0.025 seconds, they seem to last for months.
>>
>> I like 5-minute files, and if I need to understand what is going on
>> just at the edge of two 5-minute boundaries, I read them both and
>> focus on the edge time boundary. Anything longer than that is another
>> type of time domain, and there are lots of processing strategies for
>> developing data at that scale that may be useful.
>>
>>
>> Carter
>>
>>
>>
>> On Feb 7, 2012, at 9:45 AM, Marco wrote:
>>
>> Thanks. But what about long-lived flows that last more than 5 minutes?
>> Will they be merged, or will they appear once per 5-minute file in the
>> result? The whole point of clustering is having a single entry for
>> each of them, AFAIK.
>>
More information about the argus mailing list