ra reads argus file very slow

Zi Hu zihu at usc.edu
Thu Oct 31 17:45:24 EDT 2013


On Thu, Oct 31, 2013 at 5:37 AM, Carter Bullard <carter at qosient.com> wrote:

> Hey Zi,
> The only thing that ra() is doing is parsing the record stream and
> converting the fields to ascii.  Converting the time can be very expensive
> on some machines, as we use strftime() to get the format.
>
> Does running with the '-u' make a big difference ????
>

No.
I just repeated the test without the "-u" option, and it doesn't make much
difference: "ra" is still very slow.

-Zi
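
To make the strftime() point above concrete, here is a rough Python sketch
that compares rendering an epoch timestamp as a calendar string with printing
it as a plain number (which is roughly what '-u' asks for). It is only an
analogy, not ra's C code: the function names are made up for illustration,
the date layout is an assumed one, and the relative cost inside ra may be
quite different.

```python
import time
import timeit


def format_strftime(ts: float) -> str:
    """Render an epoch timestamp as a UTC calendar string with microseconds."""
    seconds, _, micros = ("%.6f" % ts).partition(".")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(int(seconds)))
    return stamp + "." + micros


def format_raw(ts: float) -> str:
    """Render an epoch timestamp as a plain decimal number."""
    return "%.6f" % ts


def compare(n: int = 20000, ts: float = 1378009800.024648):
    """Return (strftime_seconds, raw_seconds) for n formatting calls each."""
    slow = timeit.timeit(lambda: format_strftime(ts), number=n)
    fast = timeit.timeit(lambda: format_raw(ts), number=n)
    return slow, fast


if __name__ == "__main__":
    slow, fast = compare()
    print("strftime path: %.4f s, raw path: %.4f s" % (slow, fast))
```

If the two paths differ only by a small factor, time formatting alone would
not explain a gap as large as the one reported here, which matches the result
of the test without "-u".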


>
> Carter
>
> Carter Bullard, QoSient, LLC
> 150 E. 57th Street Suite 12D
> New York, New York 10022
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
> On Oct 31, 2013, at 3:41 AM, Zi Hu <zihu at usc.edu> wrote:
>
> On Wed, Oct 30, 2013 at 9:39 PM, Carter Bullard <carter at qosient.com> wrote:
>
>> Hey Zi,
>> Well, based on the performance of racount(), I'd say the subject line is
>> a little off, in that we can read the file, and decode all the records
>> pretty quickly,... 10.29 seconds.   Looks like the ra* programs can process
>> the file faster than you can cat() it, so I'd say the problem is in writing
>> to the disk.  Maybe you have some disk errors??  Did you check your system
>> logs ???
>>
>>
> Hi, Carter,
> Thanks for your comments, but I didn't see any disk errors in the system
> logs.
> Moreover, I don't think disk errors are the cause, since I ran "cat" and "ra"
> on the same file on the same machine. If there were disk errors, both
> should be slow.  Besides, I also copied the 2G argus file to another
> machine, and it still takes more than 80 minutes to read the file with "ra".
>
> I did another test:
> I made another ~2G argus file and ran "ra" on it. This time it is much
> faster (about 12 minutes), although it is still slow compared to "cat"
> (about 24 seconds).
> zihu at proton:~$  time ra -r tmp/201320d-060000.argus -u > temp.dat
>
> real    11m55.636s
> user    11m16.636s
> sys     0m38.653s
>
> zihu at proton:~$ time cat tmp/201320d-060000.argus > temp.dat
>
> real    0m24.298s
> user    0m0.009s
> sys     0m3.747s
>
> zihu at proton:~$ time racount -r tmp/201320d-060000.argus
> racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
>     sum   18357344    814467265      557563621      256903644      937278498435       620800235862       316478262573
>
> real    0m10.753s
> user    0m9.902s
> sys     0m0.832s
>
>
> To me, it looks like "ra" runs fast on some files but becomes slow
> on certain others.
> Do you know why "ra" would perform differently on different files?
> Could this be a bug in "ra"?
> By the way, it is still not quite clear to me why the memory keeps
> growing when I run the "ra" command.
>
> -Zi
>
>
>
>> Carter
>>
>> On Oct 30, 2013, at 8:43 PM, Zi Hu <zihu at usc.edu> wrote:
>>
>>
>> Thanks for your reply, Carter.
>>
>> On Wed, Oct 30, 2013 at 4:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>>> Hey Zi,
>>> The only time I’ve seen ra() have problems reading and writing
>>> data, to the level you report, is when one tries to do DNS
>>> lookups to get the names of the IP addresses, instead of
>>> dotted decimal notation.
>>>
>>>
>> By default, "ra" won't perform DNS lookups, right? If this is true, then given
>> the command line I used in my experiment, I don't think it does DNS
>> lookups.
>> Besides, I also tried the -nn option; it doesn't make much difference.
>>
>>
>>> I can read about 2G of flow data in about 65 secs, on a
>>> standard machine, but I can cat() that file in about
>>> 2.5 secs, so your machine may not be performing as well
>>> as you would want.
>>>
>>> What version of argus and clients are you using??
>>>
>>
>> 3.0.6
>>
>>
>>> Do you have a .rarc file in your home directory?
>>>
>>
>> I don't see a .rarc file in my home directory.
>>
>>
>>
>>> What does a line of ra() output look like ?
>>>
>>>
>> zihu at proton:~$ ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u | head
>>          StartTime      Flgs  Proto            SrcAddr  Sport   Dir            DstAddr  Dport  TotPkts   TotBytes State
>>  1378009800.024648  e           tcp      129.82.228.28.11021    <?>     74.125.142.131.xmpp-*        2        144   CON
>>  1378009800.000000  e d         tcp      129.82.97.104.63194     ->     129.82.224.179.https        10       1154   CON
>>  1378009800.132037  e           tcp       129.82.12.68.57547     ->       75.130.96.44.59943         2       1414   CON
>>  1378009800.131337  e           udp       129.82.12.66.44115    <->    131.254.208.196.44295         2        234   CON
>>  1378009800.000000  e d         tcp     129.82.227.103.ica      <?>       129.82.97.52.49341        11        882   CON
>>  1378009800.173511  e           udp       129.82.12.66.44115    <->     211.69.207.154.38275         2        215   CON
>>  1378009800.619227  e          icmp       129.82.12.68.0x0303    ->    143.215.131.247.0xd782        1        102   URP
>>  1378009800.623714  e          icmp       129.82.12.68.0x0303    ->    143.215.131.247.0xd882        1        102   URP
>>  1378009800.719767  e          icmp      192.43.217.17.0x000b    ->       129.82.12.68.0x0000        1         70   TXD
>>
>>
>>
>> Besides, the following is some information about the 2G argus file on my
>> machine, not sure if this can help you to diagnose the issue.
>> zihu at proton:~$ time racount -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus
>> racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
>>     sum   20327732    127070924      81280364       45790560       108939377747       66625641107        42313736640
>>
>> real 0m10.297s
>> user 0m9.478s
>> sys 0m0.780s
>>
>>
>> thanks
>> -Zi
>>
>>
>>
>>>  Carter
>>>
>>>
>>>
>>> On Oct 30, 2013, at 6:34 PM, Zi Hu <zihu at usc.edu> wrote:
>>>
>>> Hi, Carter,
>>>
>>> In my application, I need a simple tool to read what is in the argus
>>> file and output certain fields that I am interested in, in ASCII format,
>>> such as srcip, dstip, sport, dport, protocol, ....
>>>
>>> I thought the command "ra" was what I needed. However, I find it is very
>>> slow to read the argus data with "ra".  I did a small experiment: dump the
>>> same argus file (about 2G) with both "ra" and "cat".
>>> Using the "ra" command, it took about 87 minutes to read the file,
>>> while it took only 40 seconds to dump it with "cat".  I also noticed
>>> that the memory keeps growing while "ra" is running.
>>>
>>> zihu at proton:~$ time cat 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus > temp.dat
>>>
>>> real    0m39.490s
>>> user    0m0.027s
>>> sys     0m4.204s
>>> zihu at proton:~$ time ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u > temp.dat
>>>
>>> real    87m40.973s
>>> user    86m42.397s
>>> sys     0m56.256s
>>> zihu at proton:~$
>>>
>>>
>>>
>>> So I guess "ra" does more than just read the argus file, format the
>>> records, and output the result.  Does "ra" keep track of flows in memory,
>>> so that the memory keeps growing?
>>>
>>> If "ra" is not the right choice for my application, then what is the
>>> right command for this simple task? Or, if there is no such tool,
>>> I am thinking of writing one myself. Could you point me to where to start?
>>>  Any suggestions are welcome.
>>>
>>>
>>> Thanks
>>> -Zi
>>>
>>>
>>>
>>
>
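
The timings quoted in this thread can be turned into per-record rates, which
makes the gap easier to see. The sketch below is a small Python helper, not
part of the argus clients. The helper names and run labels are made up for
illustration; the record counts and "real" times are copied from the racount
and time output above.

```python
import re


def parse_real(value: str) -> float:
    """Convert a shell `time` value such as '87m40.973s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unexpected time format: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


def records_per_second(records: int, real: str) -> float:
    """Average number of records handled per wall-clock second."""
    return records / parse_real(real)


# (label, record count from racount, 'real' time from the quoted runs)
RUNS = [
    ("lander4 file, ra -u", 20327732, "87m40.973s"),
    ("lander4 file, racount", 20327732, "0m10.297s"),
    ("second file, ra -u", 18357344, "11m55.636s"),
    ("second file, racount", 18357344, "0m10.753s"),
]

if __name__ == "__main__":
    for label, records, real in RUNS:
        rate = records_per_second(records, real)
        print("%-24s %12.0f records/s" % (label, rate))
```

The two files hold a similar number of records and racount reads both in about
the same time, yet the "ra" runs differ by roughly a factor of seven. That
suggests the extra cost is per record in the printing path and depends on the
content of the records, not on reading the file.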