ra reads argus file very slow
Zi Hu
zihu at usc.edu
Thu Oct 31 17:45:24 EDT 2013
On Thu, Oct 31, 2013 at 5:37 AM, Carter Bullard <carter at qosient.com> wrote:
> Hey Zi,
> The only thing that ra() is doing is parsing the record stream and
> converting the fields to ascii. Converting the time can be very expensive
> on some machines, as we use strftime() to get the format.
>
> Does running with the '-u' option make a big difference?
>
No.
I just repeated the test without the "-u" option, and it doesn't make much
difference: "ra" is still very slow.
-Zi
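
One way to get a feel for how much timestamp conversion could cost per record, independent of ra, is to time the two output styles directly. The following is only a rough sketch in Python, not argus code: the function names are hypothetical, and Python's time.strftime() adds interpreter overhead on top of the C strftime() that ra uses, so the absolute numbers will not match ra. It compares an epoch-style timestamp (like the -u output quoted in this thread) with a strftime()-based calendar format.

```python
import time


def format_epoch(ts: float) -> str:
    """Epoch-style timestamp, seconds.microseconds (like the -u output)."""
    return f"{ts:.6f}"


def format_calendar(ts: float) -> str:
    """Calendar-style timestamp in UTC built with strftime()."""
    # Work in integer microseconds so the fractional part cannot round up to 1000000.
    whole, micros = divmod(round(ts * 1_000_000), 1_000_000)
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(whole)) + f".{micros:06d}"


def seconds_per_call(fn, ts: float, n: int = 100_000) -> float:
    """Average wall-clock seconds per call of fn(ts) over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        fn(ts)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    sample = 1378009800.024648  # first StartTime value quoted in this thread
    for fn in (format_epoch, format_calendar):
        cost_us = seconds_per_call(fn, sample) * 1e6
        print(f"{fn.__name__}: {cost_us:.2f} us per call")
```

Multiplying the per-call difference by the record count gives a rough bound on what time formatting alone could add to a run.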
>
> Carter
>
> Carter Bullard, QoSient, LLC
> 150 E. 57th Street Suite 12D
> New York, New York 10022
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
> On Oct 31, 2013, at 3:41 AM, Zi Hu <zihu at usc.edu> wrote:
>
> On Wed, Oct 30, 2013 at 9:39 PM, Carter Bullard <carter at qosient.com> wrote:
>
>> Hey Zi,
>> Well, based on the performance of racount(), I'd say the subject line is
>> a little off, in that we can read the file and decode all the records
>> pretty quickly: 10.29 seconds. It looks like the ra* programs can process
>> the file faster than you can cat() it, so I'd say the problem is in writing
>> to the disk. Maybe you have some disk errors? Did you check your system
>> logs?
>>
>>
> Hi, Carter,
> Thanks for your comments, but I didn't see any disk errors in the system
> logs.
> Moreover, I don't think disk errors are the cause, since I ran "cat" and "ra"
> on the same file on the same machine. If there were disk errors, both
> should be slow. Besides, I also copied the 2G argus file to another
> machine, and it still takes more than 80 minutes to read the file with "ra".
>
> I did another test:
> I made another ~2G argus file and ran "ra" on it. This time it was much
> faster (about 12 minutes), although still slow compared to "cat"
> (about 24 seconds).
> zihu at proton:~$ time ra -r tmp/201320d-060000.argus -u > temp.dat
>
> real 11m55.636s
> user 11m16.636s
> sys 0m38.653s
>
> zihu at proton:~$ time cat tmp/201320d-060000.argus > temp.dat
>
> real 0m24.298s
> user 0m0.009s
> sys 0m3.747s
>
> zihu at proton:~$ time racount -r tmp/201320d-060000.argus
> racount   records    total_pkts   src_pkts    dst_pkts    total_bytes    src_bytes      dst_bytes
>     sum   18357344   814467265    557563621   256903644   937278498435   620800235862   316478262573
>
> real 0m10.753s
> user 0m9.902s
> sys 0m0.832s
>
>
> To me, it looks like "ra" runs fast on some files but becomes slow
> on certain others.
> Do you know why "ra" performs differently on different files?
> Could this be a bug in "ra"?
> By the way, it is still not clear to me why the memory keeps
> growing when I run the "ra" command.
>
> -Zi
>
>
>
>> Carter
>>
>> On Oct 30, 2013, at 8:43 PM, Zi Hu <zihu at usc.edu> wrote:
>>
>>
>> Thanks for your reply, Carter.
>>
>> On Wed, Oct 30, 2013 at 4:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>>> Hey Zi,
>>> The only time I’ve seen ra() have problems reading and writing
>>> data, to the level you report, is when one tries to do DNS
>>> lookups to get the names of the IP addresses, instead of
>>> dotted decimal notation.
>>>
>>>
>> By default, "ra" won't perform DNS lookups, right? If this is true, given
>> the command line I used in my experiment, I don't think it does DNS
>> lookups.
>> Besides, I also tried the -nn option; it didn't make much difference.
>>
>>
>>> I can read about 2G of flow data in about 65 secs, on a
>>> standard machine, but I can cat() that file in about
>>> 2.5 secs, so your machine may not be performing as well
>>> as you would want.
>>>
>>> What version of argus and clients are you using??
>>>
>>
>> 3.0.6
>>
>>
>>> Do you have a .rarc file in your home directory?
>>>
>>
>> I don't see a .rarc file in my home directory.
>>
>>
>>
>>> What does a line of ra() output look like ?
>>>
>>>
>> zihu at proton:~$ ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u | head
>> StartTime Flgs Proto SrcAddr Sport Dir DstAddr Dport TotPkts TotBytes State
>> 1378009800.024648 e tcp 129.82.228.28.11021 <?> 74.125.142.131.xmpp-* 2 144 CON
>> 1378009800.000000 e d tcp 129.82.97.104.63194 -> 129.82.224.179.https 10 1154 CON
>> 1378009800.132037 e tcp 129.82.12.68.57547 -> 75.130.96.44.59943 2 1414 CON
>> 1378009800.131337 e udp 129.82.12.66.44115 <-> 131.254.208.196.44295 2 234 CON
>> 1378009800.000000 e d tcp 129.82.227.103.ica <?> 129.82.97.52.49341 11 882 CON
>> 1378009800.173511 e udp 129.82.12.66.44115 <-> 211.69.207.154.38275 2 215 CON
>> 1378009800.619227 e icmp 129.82.12.68.0x0303 -> 143.215.131.247.0xd782 1 102 URP
>> 1378009800.623714 e icmp 129.82.12.68.0x0303 -> 143.215.131.247.0xd882 1 102 URP
>> 1378009800.719767 e icmp 192.43.217.17.0x000b -> 129.82.12.68.0x0000 1 70 TXD
>>
>>
>>
>> Besides, here is some information about the 2G argus file on my
>> machine; I'm not sure whether it will help you diagnose the issue.
>> zihu at proton:~$ time racount -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus
>> racount   records    total_pkts   src_pkts   dst_pkts   total_bytes    src_bytes     dst_bytes
>>     sum   20327732   127070924    81280364   45790560   108939377747   66625641107   42313736640
>>
>> real 0m10.297s
>> user 0m9.478s
>> sys 0m0.780s
>>
>>
>> thanks
>> -Zi
>>
>>
>>
>>> Carter
>>>
>>>
>>>
>>> On Oct 30, 2013, at 6:34 PM, Zi Hu <zihu at usc.edu> wrote:
>>>
>>> Hi, Carter,
>>>
>>> In my application, I need a simple tool to read what is in the argus
>>> file and then output certain fields that I am interested in, in ascii
>>> format, such as srcip, dstip, sport, dport, protocol, ...
>>>
>>> I thought the command "ra" was what I need. However, I find it is very
>>> slow at reading the argus data. I did a small experiment: dump the
>>> same argus file (about 2G) with both "ra" and "cat".
>>> Using the "ra" command, it took me about 87 minutes to read the file,
>>> while it took only 40 seconds to dump it with "cat". I also noticed
>>> that the memory keeps growing while "ra" is running.
>>>
>>> zihu at proton:~$ time cat 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus > temp.dat
>>>
>>> real 0m39.490s
>>> user 0m0.027s
>>> sys 0m4.204s
>>> zihu at proton:~$ time ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u > temp.dat
>>>
>>> real 87m40.973s
>>> user 86m42.397s
>>> sys 0m56.256s
>>> zihu at proton:~$
>>>
>>>
>>>
>>> So I guess "ra" does more than just reading the argus file, formatting,
>>> and outputting the result. Does "ra" keep track of flows in memory, so that
>>> the memory keeps growing?
>>>
>>> If "ra" is not the right choice for my application, then what is the
>>> right command for this simple application? Or, if there is no such tool,
>>> I am thinking of writing one myself. Could you point me to where to start?
>>> Any suggestions are welcome.
>>>
>>>
>>> Thanks
>>> -Zi
>>>
>>>
>>>
>>
>
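
For comparison, the timings quoted in this thread can be turned into records-per-second figures. The sketch below is only arithmetic on the numbers reported above (the record counts from racount and the "real" values from time); the helper names are hypothetical. On those numbers, "ra -u" works out to roughly 3,900 records per second on the lander4 file and roughly 25,700 on the second file, while racount handles between 1.7 and 2.0 million records per second on both.

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time` value such as '87m40.973s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def records_per_second(records: int, real: str) -> float:
    """Records processed per wall-clock second."""
    return records / parse_real(real)


# (label, record count reported by racount, "real" time reported by `time`)
RUNS = [
    ("lander4 file, ra -u", 20327732, "87m40.973s"),
    ("lander4 file, racount", 20327732, "0m10.297s"),
    ("second file, ra -u", 18357344, "11m55.636s"),
    ("second file, racount", 18357344, "0m10.753s"),
]

if __name__ == "__main__":
    for label, records, real in RUNS:
        rate = records_per_second(records, real)
        print(f"{label:24s} {rate:12.0f} records/s")
```

The two files hold a similar number of records, so the difference in "ra" run time between them is not explained by record count alone.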