ra reads argus file very slow
Mike Slifcak
slif at bellsouth.net
Thu Oct 31 09:04:14 EDT 2013
Hi Zi.
Have you ruled out operations on filesystems that might be synchronized across machines?
In other words, have you traced the directories above your working directory to ensure
that the working directory is wholly contained on an internal disk drive, and not NFS
mounted and not part of a networked disk cluster?
Is it possible that some other administrative process is using 'rsync' or another tool to
synchronize remote disks, and that your working directory is on a volume that is being
synchronized to a remote system?
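
One quick way to check this is sketched below. It assumes GNU coreutils 'stat' (the -f and -c options behave differently on BSD and macOS), and the list of network filesystem type names in classify_fs, a helper name made up for this sketch, is only illustrative and not exhaustive.

```shell
#!/bin/sh
# Classify a filesystem type name as "network", "local", or "unknown".
# The patterns below are an illustrative guess at common network types.
classify_fs() {
    case "$1" in
        nfs*|cifs|smb*|afs|ceph|lustre|fuse.sshfs) echo network ;;
        unknown|"") echo unknown ;;
        *) echo local ;;
    esac
}

# GNU stat: -f queries the filesystem holding ".", -c %T prints its type name.
fstype=$(stat -f -c %T . 2>/dev/null || echo unknown)
echo "$(pwd): $fstype ($(classify_fs "$fstype"))"
```

Run it from the working directory that holds the argus file. A "network" result would suggest trying the same test on a local disk.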
Best regards,
-Mike Slifcak, Expansion Unlimited LLC
On 10/31/2013 08:37 AM, Carter Bullard wrote:
> Hey Zi,
> The only thing that ra() is doing is parsing the record stream and converting the fields
> to ascii. Converting the time can be very expensive on some machines, as we use
> strftime() to get the format.
>
> Does running with the '-u' make a big difference ????
>
> Carter
>
> Carter Bullard, QoSient, LLC
> 150 E. 57th Street Suite 12D
> New York, New York 10022
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
> On Oct 31, 2013, at 3:41 AM, Zi Hu <zihu at usc.edu> wrote:
>
>> On Wed, Oct 30, 2013 at 9:39 PM, Carter Bullard <carter at qosient.com> wrote:
>>
>> Hey Zi,
>> Well, based on the performance of racount(), I'd say the subject line is a little
>> off, in that we can read the file, and decode all the records pretty quickly,...
>> 10.29 seconds. Looks like the ra* programs can process the file faster than you
>> can cat() it, so I'd say the problem is in writing to the disk. Maybe you have some
>> disk errors?? Did you check your system logs ???
>>
>> Hi, Carter,
>> Thanks for your comments, but I didn't see any disk errors in the system logs.
>> Moreover, I don't think disk errors are the cause, since I ran "cat" and "ra" on the same
>> file on the same machine. If there were disk errors, both should be slow. Besides, I
>> also copied the 2G argus file to another machine, and it still takes more than 80 minutes
>> to read the file with "ra".
>>
>> I did another test:
>> I made another ~2G argus file and ran "ra" on it. This time it was much faster (about 12
>> minutes), although it is still slow compared to "cat" (about 24 seconds).
>> zihu@proton:~$ time ra -r tmp/201320d-060000.argus -u > temp.dat
>>
>> real 11m55.636s
>> user 11m16.636s
>> sys 0m38.653s
>>
>> zihu@proton:~$ time cat tmp/201320d-060000.argus > temp.dat
>>
>> real 0m24.298s
>> user 0m0.009s
>> sys 0m3.747s
>>
>> zihu@proton:~$ time racount -r tmp/201320d-060000.argus
>> racount  records   total_pkts  src_pkts   dst_pkts   total_bytes   src_bytes     dst_bytes
>> sum      18357344  814467265   557563621  256903644  937278498435  620800235862  316478262573
>>
>> real 0m10.753s
>> user 0m9.902s
>> sys 0m0.832s
>>
>>
>> To me, it looks like "ra" runs fast on some files but becomes slow on certain others.
>> Do you know why "ra" performs differently on different files? Could this be a
>> potential bug in "ra"?
>> By the way, it is still not quite clear to me why the memory keeps growing when I run
>> the "ra" command.
>>
>> -Zi
>>
>> Carter
>>
>> On Oct 30, 2013, at 8:43 PM, Zi Hu <zihu at usc.edu> wrote:
>>
>>>
>>> Thanks for your reply, Carter.
>>>
>>> On Wed, Oct 30, 2013 at 4:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>>
>>> Hey Zi,
>>> The only time I’ve seen ra() have problems reading and writing
>>> data, to the level you report, is when one tries to do DNS
>>> lookups to get the names of the IP addresses, instead of
>>> dotted decimal notation.
>>>
>>>
>>> By default, "ra" won't perform DNS lookups, right? If so, given the
>>> command line I used in my experiment, I don't think it does DNS lookups.
>>> Besides, I also tried the -nn option, and it doesn't make much difference.
>>>
>>> I can read about 2G of flow data in about 65 secs, on a
>>> standard machine, but I can cat() that file in about
>>> 2.5 secs, so your machine may not be performing as well
>>> as you would want.
>>>
>>> What version of argus and clients are you using??
>>>
>>>
>>> 3.0.6
>>>
>>> Do you have a .rarc file in your home directory?
>>>
>>>
>>> I don't see a .rarc file in my home directory.
>>>
>>> What does a line of ra() output look like ?
>>>
>>>
>>> zihu@proton:~$ ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u | head
>>> StartTime          Flgs  Proto  SrcAddr Sport         Dir  DstAddr Dport           TotPkts  TotBytes  State
>>> 1378009800.024648  e     tcp    129.82.228.28.11021   <?>  74.125.142.131.xmpp-*         2       144  CON
>>> 1378009800.000000  e d   tcp    129.82.97.104.63194   ->   129.82.224.179.https         10      1154  CON
>>> 1378009800.132037  e     tcp    129.82.12.68.57547    ->   75.130.96.44.59943            2      1414  CON
>>> 1378009800.131337  e     udp    129.82.12.66.44115    <->  131.254.208.196.44295         2       234  CON
>>> 1378009800.000000  e d   tcp    129.82.227.103.ica    <?>  129.82.97.52.49341           11       882  CON
>>> 1378009800.173511  e     udp    129.82.12.66.44115    <->  211.69.207.154.38275          2       215  CON
>>> 1378009800.619227  e     icmp   129.82.12.68.0x0303   ->   143.215.131.247.0xd782        1       102  URP
>>> 1378009800.623714  e     icmp   129.82.12.68.0x0303   ->   143.215.131.247.0xd882        1       102  URP
>>> 1378009800.719767  e     icmp   192.43.217.17.0x000b  ->   129.82.12.68.0x0000           1        70  TXD
>>>
>>>
>>>
>>> Besides, the following is some information about the 2G argus file on my machine.
>>> I am not sure whether it can help you diagnose the issue.
>>> zihu@proton:~$ time racount -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus
>>> racount  records   total_pkts  src_pkts   dst_pkts   total_bytes   src_bytes     dst_bytes
>>> sum      20327732  127070924   81280364   45790560   108939377747  66625641107   42313736640
>>>
>>> real 0m10.297s
>>> user 0m9.478s
>>> sys 0m0.780s
>>>
>>>
>>> thanks
>>> -Zi
>>>
>>> Carter
>>>
>>>
>>>
>>> On Oct 30, 2013, at 6:34 PM, Zi Hu <zihu at usc.edu> wrote:
>>>
>>>> Hi, Carter,
>>>>
>>>> In my application, I need a simple tool that reads what is in the argus file
>>>> and then outputs the fields I am interested in, in ASCII format, such as
>>>> srcip, dstip, sport, dport, protocol, ....
>>>>
>>>> I thought the command "ra" was what I needed. However, I find it very slow at
>>>> reading the argus data. I did a small experiment: dump the same argus
>>>> file (about 2G) with both "ra" and "cat".
>>>> With the "ra" command, it took about 87 minutes to read the file, while it
>>>> took only 40 seconds to dump it with "cat". I also noticed that the memory
>>>> keeps growing while "ra" is running.
>>>>
>>>> zihu@proton:~$ time cat 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus > temp.dat
>>>>
>>>> real 0m39.490s
>>>> user 0m0.027s
>>>> sys 0m4.204s
>>>> zihu@proton:~$ time ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u > temp.dat
>>>>
>>>> real 87m40.973s
>>>> user 86m42.397s
>>>> sys 0m56.256s
>>>> zihu@proton:~$
>>>>
>>>>
>>>> So I guess "ra" does more than just read the argus file, format the records, and
>>>> output the result. Does "ra" keep track of flows in memory, so that the
>>>> memory keeps growing?
>>>>
>>>> If "ra" is not the right choice for my application, then what is the right
>>>> command for this simple task? If there is no such tool, I am
>>>> thinking of writing one myself. Could you point me to where to start? Any
>>>> suggestions are welcome.
>>>>
>>>>
>>>> Thanks
>>>> -Zi
>>>
>>>
>>