ra reads argus file very slow

Jesper Skou Jensen jesper.skou.jensen at uni-c.dk
Thu Oct 31 10:16:57 EDT 2013


What about avoiding writing to disk at all and redirecting to /dev/null 
instead? That way only the reading is an issue.

This should be very fast, as fast as your HDD can read.
time cat tmp/201320d-060000.argus > /dev/null

Using ra is much more CPU intensive, and will take considerably longer.
These two should take almost the same time as each other.
time ra -r tmp/201320d-060000.argus > /dev/null
time ra -nr tmp/201320d-060000.argus > /dev/null
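
The three measurements above can also be wrapped in a small helper. This 
is only a minimal sketch, assuming a POSIX shell, a date(1) that supports 
+%s, and the argus clients on the PATH. elapsed_seconds is a hypothetical 
helper name, not part of the argus clients, and it only has whole-second 
resolution, so time is still the better tool for short runs.

```shell
#!/bin/sh
# elapsed_seconds (hypothetical helper): run a command with stdout sent to
# /dev/null and print the wall-clock time it took, in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Example usage (assumes the argus clients are installed and the file exists):
#   elapsed_seconds cat tmp/201320d-060000.argus
#   elapsed_seconds ra -r tmp/201320d-060000.argus
#   elapsed_seconds ra -nr tmp/201320d-060000.argus
```

Because the output goes to /dev/null, the numbers reflect reading and 
processing only, not writing.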

On my system, a quick test shows that the ra commands take about 430 
times longer than the cat.
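
For comparison, the same kind of ratio can be worked out from the "real" 
times quoted further down in this thread. Note that those runs wrote to 
temp.dat rather than /dev/null, so they are not directly comparable to my 
test. A minimal sketch, assuming awk is available; to_seconds and slowdown 
are hypothetical helper names:

```shell
#!/bin/sh
# to_seconds (hypothetical helper): convert a shell `time` value such as
# 11m55.636s into plain seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# slowdown (hypothetical helper): ratio of two `time` values, first / second.
slowdown() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# "real" times for ra and cat as quoted in the thread below.
slowdown 11m55.636s 0m24.298s   # second test file
slowdown 87m40.973s 0m39.490s   # first test file
```

With the quoted numbers this works out to roughly 29 times slower for the 
second file and roughly 133 times slower for the first.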


--
Regards
Jesper

On 31-10-2013 14:04, Mike Slifcak wrote:
> Hi Zi.
>
> Have you ruled out operations on filesystems that might be 
> synchronized across machines?
> In other words, have you traced the directories above your working 
> directory to ensure that the working directory is wholly contained on 
> an internal disk drive, and not NFS mounted and not part of a 
> networked disk cluster?
>
> Is it possible that some other administrative process is using 'rsync' 
> or other process to synchronize remote disks, and that your working 
> directory is on a volume that is being synchronized to a remote system?
>
> Best regards,
> -Mike Slifcak, Expansion Unlimited LLC
>
> On 10/31/2013 08:37 AM, Carter Bullard wrote:
>> Hey Zi,
>> The only thing that ra() is doing is parsing the record stream and
>> converting the fields to ASCII.  Converting the time can be very
>> expensive on some machines, as we use strftime() to get the format.
>>
>> Does running with '-u' make a big difference?
>>
>> Carter
>>
>> Carter Bullard, QoSient, LLC
>> 150 E. 57th Street Suite 12D
>> New York, New York 10022
>> +1 212 588-9133 Phone
>> +1 212 588-9134 Fax
>>
>> On Oct 31, 2013, at 3:41 AM, Zi Hu <zihu at usc.edu> wrote:
>>
>>> On Wed, Oct 30, 2013 at 9:39 PM, Carter Bullard <carter at qosient.com> wrote:
>>>
>>>     Hey Zi,
>>>     Well, based on the performance of racount(), I'd say the subject
>>>     line is a little off, in that we can read the file and decode all
>>>     the records pretty quickly: 10.29 seconds.  Looks like the ra*
>>>     programs can process the file faster than you can cat() it, so I'd
>>>     say the problem is in writing to the disk.  Maybe you have some
>>>     disk errors?  Did you check your system logs?
>>>
>>> Hi, Carter,
>>> Thanks for your comments, but I didn't see any disk errors in the
>>> system logs.  Moreover, I don't think disk errors are the cause, since
>>> I ran "cat" and "ra" on the same file on the same machine.  If there
>>> were disk errors, both should be slow.  Besides, I also copied the 2G
>>> argus file to another machine, and it still takes more than 80 minutes
>>> to read the file with "ra".
>>>
>>> I did another test:
>>> I made another ~2G argus file and ran "ra" on it.  This time it is much
>>> faster (about 12 minutes), although it is still slow compared to "cat"
>>> (about 24 seconds).
>>> zihu at proton:~$  time ra -r tmp/201320d-060000.argus -u > temp.dat
>>>
>>> real    11m55.636s
>>> user    11m16.636s
>>> sys     0m38.653s
>>>
>>> zihu at proton:~$ time cat tmp/201320d-060000.argus > temp.dat
>>>
>>> real    0m24.298s
>>> user    0m0.009s
>>> sys     0m3.747s
>>>
>>> zihu at proton:~$ time racount -r tmp/201320d-060000.argus
>>> racount   records    total_pkts   src_pkts    dst_pkts    total_bytes    src_bytes      dst_bytes
>>>     sum   18357344   814467265    557563621   256903644   937278498435   620800235862   316478262573
>>>
>>> real    0m10.753s
>>> user    0m9.902s
>>> sys     0m0.832s
>>>
>>>
>>> To me, it looks like "ra" runs fast on some files but becomes slow on
>>> certain others.  Do you know why "ra" performs differently on different
>>> files?  Could this be a potential bug in "ra"?
>>> By the way, it is still not quite clear to me why the memory keeps
>>> growing when I run the "ra" command.
>>>
>>> -Zi
>>>
>>>     Carter
>>>
>>>     On Oct 30, 2013, at 8:43 PM, Zi Hu <zihu at usc.edu> wrote:
>>>
>>>>
>>>>     Thanks for your reply, Carter.
>>>>
>>>>     On Wed, Oct 30, 2013 at 4:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>
>>>>         Hey Zi,
>>>>         The only time I’ve seen ra() have problems reading and writing
>>>>         data, to the level you report, is when one tries to do DNS
>>>>         lookups to get the names of the IP addresses, instead of
>>>>         dotted decimal notation.
>>>>
>>>>
>>>>     By default, "ra" won't perform DNS lookups, right?  If this is
>>>>     true, given the command line I used in my experiment, I don't
>>>>     think it does DNS lookups.  Besides, I also tried the -nn option;
>>>>     it doesn't make much difference.
>>>>
>>>>         I can read about 2G of flow data in about 65 secs, on a
>>>>         standard machine, but I can cat() that file in about
>>>>         2.5 secs, so your machine may not be performing as well
>>>>         as you would want.
>>>>
>>>>         What version of argus and clients are you using??
>>>>
>>>>
>>>>     3.0.6
>>>>
>>>>         Do you have a .rarc file in your home directory?
>>>>
>>>>
>>>>     I don't see a .rarc file in my home directory.
>>>>
>>>>         What does a line of ra() output look like ?
>>>>
>>>>
>>>>     zihu at proton:~$ ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u | head
>>>>     StartTime          Flgs  Proto  SrcAddr Sport         Dir  DstAddr Dport           TotPkts  TotBytes  State
>>>>     1378009800.024648  e     tcp    129.82.228.28.11021   <?>  74.125.142.131.xmpp-*   2        144       CON
>>>>     1378009800.000000  e d   tcp    129.82.97.104.63194   ->   129.82.224.179.https    10       1154      CON
>>>>     1378009800.132037  e     tcp    129.82.12.68.57547    ->   75.130.96.44.59943      2        1414      CON
>>>>     1378009800.131337  e     udp    129.82.12.66.44115    <->  131.254.208.196.44295   2        234       CON
>>>>     1378009800.000000  e d   tcp    129.82.227.103.ica    <?>  129.82.97.52.49341      11       882       CON
>>>>     1378009800.173511  e     udp    129.82.12.66.44115    <->  211.69.207.154.38275    2        215       CON
>>>>     1378009800.619227  e     icmp   129.82.12.68.0x0303   ->   143.215.131.247.0xd782  1        102       URP
>>>>     1378009800.623714  e     icmp   129.82.12.68.0x0303   ->   143.215.131.247.0xd882  1        102       URP
>>>>     1378009800.719767  e     icmp   192.43.217.17.0x000b  ->   129.82.12.68.0x0000     1        70        TXD
>>>>
>>>>
>>>>
>>>>     Besides, the following is some information about the 2G argus
>>>>     file on my machine; not sure if this can help you diagnose the
>>>>     issue.
>>>>     zihu at proton:~$ time racount -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus
>>>>     racount   records    total_pkts   src_pkts   dst_pkts   total_bytes    src_bytes     dst_bytes
>>>>         sum   20327732   127070924    81280364   45790560   108939377747   66625641107   42313736640
>>>>
>>>>     real    0m10.297s
>>>>     user    0m9.478s
>>>>     sys     0m0.780s
>>>>
>>>>
>>>>     thanks
>>>>     -Zi
>>>>
>>>>         Carter
>>>>
>>>>
>>>>
>>>>         On Oct 30, 2013, at 6:34 PM, Zi Hu <zihu at usc.edu> wrote:
>>>>
>>>>>         Hi, Carter,
>>>>>
>>>>>         In my application, I need a simple tool to read what is in
>>>>>         the argus file, then output certain fields that I am
>>>>>         interested in, in ASCII format, such as srcip, dstip, sport,
>>>>>         dport, protocol, ....
>>>>>
>>>>>         I thought the command "ra" was what I need.  However, I find
>>>>>         it is very slow to read the argus data with "ra".  I did a
>>>>>         small experiment: dump the same argus file (about 2G) with
>>>>>         both "ra" and "cat".  Using the "ra" command, it took me about
>>>>>         87 minutes to read the file, while it took only 40 seconds to
>>>>>         dump it with "cat".  I also noticed that the memory keeps
>>>>>         growing while "ra" is running.
>>>>>
>>>>>         zihu at proton:~$ time cat 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus > temp.dat
>>>>>
>>>>>         real    0m39.490s
>>>>>         user    0m0.027s
>>>>>         sys     0m4.204s
>>>>>         zihu at proton:~$ time ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u > temp.dat
>>>>>
>>>>>         real    87m40.973s
>>>>>         user    86m42.397s
>>>>>         sys     0m56.256s
>>>>>         zihu at proton:~$
>>>>>
>>>>>
>>>>>         So I guess "ra" does more than just reading the argus file,
>>>>>         formatting, and outputting the result.  Does "ra" keep track
>>>>>         of flows in memory, so that the memory keeps growing?
>>>>>
>>>>>         If "ra" is not the right choice for my application, then
>>>>>         what's the right command for this simple application?  Or if
>>>>>         there is no such tool, I am thinking of writing one myself.
>>>>>         Could you point me to where to start?  Any suggestions are
>>>>>         welcome.
>>>>>
>>>>>
>>>>>         Thanks
>>>>>         -Zi
>>>>
>>>>
>>>



