ra reads argus file very slow
Jesper Skou Jensen
jesper.skou.jensen at uni-c.dk
Thu Oct 31 10:16:57 EDT 2013
What about not writing to disk at all and redirecting to /dev/null
instead? That way only the reading is being measured.
This should be very fast, as fast as your HDD can read:
time cat tmp/201320d-060000.argus > /dev/null
Using ra is much more CPU-intensive and will take considerably longer.
These two should take almost the same time:
time ra -r tmp/201320d-060000.argus > /dev/null
time ra -nr tmp/201320d-060000.argus > /dev/null
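
If it helps, the three timings can be collected in one run. This is only a
sketch, assuming bash (it relies on the `time` keyword and TIMEFORMAT); the
`elapsed` helper is a name made up here, and the default file path is simply
the one used in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: time several readers of one file with all output discarded,
# so that only reading and processing are measured, not writing.

LC_ALL=C    # keep "." as the decimal separator in bash's `time` output

# elapsed CMD [ARGS...]: print the wall-clock seconds the command took.
# The command's own stdout and stderr go to /dev/null.
elapsed() {
    local TIMEFORMAT=%R      # make the `time` keyword print real seconds only
    { time "$@" > /dev/null 2>&1; } 2>&1
}

file=${1:-tmp/201320d-060000.argus}

if [ -r "$file" ] && command -v ra > /dev/null 2>&1; then
    t_cat=$(elapsed cat "$file")
    t_ra=$(elapsed ra -r "$file")
    t_nr=$(elapsed ra -nr "$file")
    printf 'cat     %ss\nra -r   %ss\nra -nr  %ss\n' "$t_cat" "$t_ra" "$t_nr"
else
    echo "skipping: need a readable argus file and ra on the PATH" >&2
fi
```

The first command also warms the page cache, so it is worth running the
script twice and comparing the second set of numbers.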
On my system a quick test shows that it takes about 430 times longer to
process the ra commands than the cat.
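
To turn a pair of `time` results into a "times longer" figure, the real
values can be converted to seconds and divided. A minimal sketch in plain sh
and awk; `to_seconds` and `slowdown` are hypothetical helper names, and the
sample values are the real times quoted further down in this thread (those
runs wrote to temp.dat rather than to /dev/null).

```shell
#!/bin/sh
# to_seconds VALUE: convert a `time`-style value such as 11m55.636s
# into seconds with three decimals.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# slowdown SLOW FAST: print how many times longer SLOW took than FAST,
# rounded to a whole number.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { if (b > 0) printf "%.0f\n", a / b; else print "n/a" }'
}

slowdown 87m40.973s 0m39.490s    # first 2G file: ra vs cat
slowdown 11m55.636s 0m24.298s    # second 2G file: ra vs cat
```

With those figures it prints 133 and 29, so the ratio clearly depends a lot
on the file and on the machine.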
--
Regards
Jesper
On 31-10-2013 14:04, Mike Slifcak wrote:
> Hi Zi.
>
> Have you ruled out operations on filesystems that might be
> synchronized across machines?
> In other words, have you traced the directories above your working
> directory to ensure that the working directory is wholly contained on
> an internal disk drive, and not NFS mounted and not part of a
> networked disk cluster?
>
> Is it possible that some other administrative process is using 'rsync'
> or other process to synchronize remote disks, and that your working
> directory is on a volume that is being synchronized to a remote system?
>
> Best regards,
> -Mike Slifcak, Expansion Unlimited LLC
>
> On 10/31/2013 08:37 AM, Carter Bullard wrote:
>> Hey Zi,
>> The only thing that ra() is doing is parsing the record stream and
>> converting the fields
>> to ascii. Converting the time can be very expensive on some
>> machines, as we use
>> strftime() to get the format.
>>
>> Does running with '-u' make a big difference?
>>
>> Carter
>>
>> Carter Bullard, QoSient, LLC
>> 150 E. 57th Street Suite 12D
>> New York, New York 10022
>> +1 212 588-9133 Phone
>> +1 212 588-9134 Fax
>>
>> On Oct 31, 2013, at 3:41 AM, Zi Hu <zihu at usc.edu> wrote:
>>
>>> On Wed, Oct 30, 2013 at 9:39 PM, Carter Bullard <carter at qosient.com> wrote:
>>>
>>> Hey Zi,
>>> Well, based on the performance of racount(), I'd say the subject
>>> line is a little
>>> off, in that we can read the file, and decode all the records
>>> pretty quickly,...
>>> 10.29 seconds. Looks like the ra* programs can process the
>>> file faster than you
>>> can cat() it, so I'd say the problem is in writing to the disk.
>>> Maybe you have some
>>> disk errors?? Did you check your system logs ???
>>>
>>> Hi, Carter,
>>> Thanks for your comments, but I didn't see any disk errors in the
>>> system logs.
>>> Moreover, I don't think disk errors are the cause, since I ran "cat" and
>>> "ra" on the same file on the same machine. If there were disk errors,
>>> both should be slow. Besides, I also copied the 2G argus file to another
>>> machine, and it still takes more than 80 minutes to read the file with
>>> "ra".
>>>
>>> I did another test:
>>> I made another ~2G argus file and ran "ra" on it. This time it is
>>> much faster (about 12 minutes), although it is still slow compared to
>>> "cat" (about 24 seconds).
>>> zihu at proton:~$ time ra -r tmp/201320d-060000.argus -u > temp.dat
>>>
>>> real 11m55.636s
>>> user 11m16.636s
>>> sys 0m38.653s
>>>
>>> zihu at proton:~$ time cat tmp/201320d-060000.argus > temp.dat
>>>
>>> real 0m24.298s
>>> user 0m0.009s
>>> sys 0m3.747s
>>>
>>> zihu at proton:~$ time racount -r tmp/201320d-060000.argus
>>> racount  records   total_pkts  src_pkts   dst_pkts   total_bytes   src_bytes     dst_bytes
>>> sum      18357344  814467265   557563621  256903644  937278498435  620800235862  316478262573
>>>
>>> real 0m10.753s
>>> user 0m9.902s
>>> sys 0m0.832s
>>>
>>>
>>> To me, it looks like "ra" runs fast on some files but slow on
>>> others.
>>> Do you know of a reason why "ra" would perform differently on different
>>> files? Could this be a bug in "ra"?
>>> By the way, it is still not clear to me why the memory keeps
>>> growing when I run the "ra" command.
>>>
>>> -Zi
>>>
>>> Carter
>>>
>>> On Oct 30, 2013, at 8:43 PM, Zi Hu <zihu at usc.edu> wrote:
>>>
>>>>
>>>> Thanks for your reply, Carter.
>>>>
>>>> On Wed, Oct 30, 2013 at 4:27 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>
>>>> Hey Zi,
>>>> The only time I’ve seen ra() have problems reading and writing
>>>> data, to the level you report, is when one tries to do DNS
>>>> lookups to get the names of the IP addresses, instead of
>>>> dotted decimal notation.
>>>>
>>>>
>>>> By default, "ra" won't perform DNS lookups, right? If that is
>>>> true, then given the command line I used in my experiment, I don't
>>>> think it does DNS lookups.
>>>> Besides, I also tried the -nn option; it doesn't make much difference.
>>>>
>>>> I can read about 2G of flow data in about 65 secs, on a
>>>> standard machine, but I can cat() that file in about
>>>> 2.5 secs, so your machine may not be performing as well
>>>> as you would want.
>>>>
>>>> What version of argus and clients are you using??
>>>>
>>>>
>>>> 3.0.6
>>>>
>>>> Do you have a .rarc file in your home directory?
>>>>
>>>>
>>>> I don't see a .rarc file in my home directory.
>>>>
>>>> What does a line of ra() output look like ?
>>>>
>>>>
>>>> zihu at proton:~$ ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u | head
>>>> StartTime Flgs Proto SrcAddr Sport Dir DstAddr Dport TotPkts TotBytes State
>>>> 1378009800.024648 e tcp 129.82.228.28.11021 <?> 74.125.142.131.xmpp-* 2 144 CON
>>>> 1378009800.000000 e d tcp 129.82.97.104.63194 -> 129.82.224.179.https 10 1154 CON
>>>> 1378009800.132037 e tcp 129.82.12.68.57547 -> 75.130.96.44.59943 2 1414 CON
>>>> 1378009800.131337 e udp 129.82.12.66.44115 <-> 131.254.208.196.44295 2 234 CON
>>>> 1378009800.000000 e d tcp 129.82.227.103.ica <?> 129.82.97.52.49341 11 882 CON
>>>> 1378009800.173511 e udp 129.82.12.66.44115 <-> 211.69.207.154.38275 2 215 CON
>>>> 1378009800.619227 e icmp 129.82.12.68.0x0303 -> 143.215.131.247.0xd782 1 102 URP
>>>> 1378009800.623714 e icmp 129.82.12.68.0x0303 -> 143.215.131.247.0xd882 1 102 URP
>>>> 1378009800.719767 e icmp 192.43.217.17.0x000b -> 129.82.12.68.0x0000 1 70 TXD
>>>>
>>>>
>>>>
>>>> Besides, the following is some information about the 2G argus
>>>> file on my machine,
>>>> not sure if this can help you to diagnose the issue.
>>>> zihu at proton:~$ time racount -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus
>>>> racount  records   total_pkts  src_pkts  dst_pkts  total_bytes   src_bytes    dst_bytes
>>>> sum      20327732  127070924   81280364  45790560  108939377747  66625641107  42313736640
>>>>
>>>> real 0m10.297s
>>>> user 0m9.478s
>>>> sys 0m0.780s
>>>>
>>>>
>>>> thanks
>>>> -Zi
>>>>
>>>> Carter
>>>>
>>>>
>>>>
>>>> On Oct 30, 2013, at 6:34 PM, Zi Hu <zihu at usc.edu> wrote:
>>>>
>>>>> Hi, Carter,
>>>>>
>>>>> In my application, I need a simple tool that reads what is
>>>>> in the argus file and then outputs the fields I am interested
>>>>> in, in ascii format, such as srcip, dstip, sport, dport,
>>>>> protocol, ....
>>>>>
>>>>> I thought the "ra" command was what I needed. However, I find
>>>>> it is very slow to read the argus data with "ra". I did a small
>>>>> experiment: dump the same argus file (about 2G) with both "ra"
>>>>> and "cat".
>>>>> Using the "ra" command, it took about 87 minutes to read the
>>>>> file, while it took only 40 seconds to dump it with "cat". I
>>>>> also noticed that the memory keeps growing while "ra" is running.
>>>>>
>>>>> zihu at proton:~$ time cat 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus > temp.dat
>>>>>
>>>>> real 0m39.490s
>>>>> user 0m0.027s
>>>>> sys 0m4.204s
>>>>> zihu at proton:~$ time ra -r 2013-09-01-0700/temp/20130831-223000-hWukIYC-lander4.argus -u > temp.dat
>>>>>
>>>>> real 87m40.973s
>>>>> user 86m42.397s
>>>>> sys 0m56.256s
>>>>> zihu at proton:~$
>>>>>
>>>>>
>>>>> So I guess "ra" does more than just read the argus file,
>>>>> format it, and output the result. Does "ra" keep track of flows
>>>>> in memory, so that the memory keeps growing?
>>>>>
>>>>> If "ra" is not the right choice for my application, then
>>>>> what is the right command for this simple task? Or, if there is
>>>>> no such tool, I am thinking of writing one myself. Could you
>>>>> point me to where to start? Any suggestions are welcome.
>>>>>
>>>>>
>>>>> Thanks
>>>>> -Zi
>>>>
>>>>
>>>