BUG - No immediate exit when pipeing ra output

elof2 at sentor.se elof2 at sentor.se
Wed May 18 07:57:25 EDT 2011


Hi Carter!
It seems that the bugfix to detect closure of the pipe still have issues, 
at least on linux.

I've compiled argus-clients 3.0.5.10 on both linux and FreeBSD.

On FreeBSD, ra immediately terminates on pipe closure.
On linux however, ra keep processing the entire input file(s) even though 
the pipe has closed.

Examples:

FreeBSD:
time ra -r argus.log | grep -q "^."
real    0m0.015s
user    0m0.009s
sys     0m0.007s

Linux:
time ra -r argus.log | grep -q "^."
real    0m6.555s
user    0m6.284s
sys     0m0.284s



As a comparison, here is the time to process the entire argus.log:

FreeBSD:
time ra -r argus-.log > /dev/null
real    0m5.909s
user    0m5.804s
sys     0m0.108s

Linux:
time ra -r argus.log > /dev/null
real    0m6.461s
user    0m6.204s
sys     0m0.256s


Conclusion:
Both FreeBSD and Linux take approx 6 seconds to parse the entire logfile.
On FreeBSD, pipe closure is detected correctly and ra terminates after 
approx 0.015s. Good.
On linux, pipe closure seem not to be detected at all, and the *entire* 
file is processed. Ra terminates after approx 6s.
(if the logfile is huge, it takes a loooong time before the ra process 
terminates)


Please let me know if I can help with any additional details to solve 
this.

/Elof



On Wed, 11 May 2011, Carter Bullard wrote:

> On my laptop, I am getting about 250K records per sec throughput with ra(), which is a lot of records per day, but, it's not 1M records per sec, so there is room for improvement.
>
> There is something in initialization that is very slow, and I'll look at it.  strace() may show what it is.
>
> Carter
>
> On May 11, 2011, at 4:47 AM, elof2 at sentor.se wrote:
>
>>
>> On Tue, 10 May 2011, Carter Bullard wrote:
>>> Not sure where the 288 files == 24 hours?
>>
>> For example I have a directory with 288 argus-files. Each file hold 5 minutes worth of data. I.e. 288*5min=1440min=24h.
>> That was just an example to illustrate that if I want to perform some action on the argus data for this day, the shortest execution time even if I use -qN1 will be 0.40s using ra v3, where ra v2 take 0.008s.
>>
>> Even though 0.40s is not a long time, it is 50 times longer than 0.008s.
>> This is not a biggie, I just wanted to inform you about the big difference between v2 and v3.
>>
>>
>>> ra() is threaded, and may read most of the file in before it starts to write any output.  You could remove the threads and see if it picks up.
>>>  % ./configure --without-threads; make clean; make
>>
>> I tried this just now. No change. The unthreaded ra still take 0.40s just to start and exit.
>>  time ra -qN1 -r argus1.log ... argus288.log
>>  real    0m0.406s
>>  user    0m0.220s
>>  sys     0m0.192s
>>
>> As I said, not a biggie. Just FYI.
>>
>>
>>> I'll checkout why we may be processing all the input files if the output pipe closes.
>>
>> Thanks! In my opinion this is a biggie.
>>
>> /Elof
>>
>>
>>
>>> On May 10, 2011, at 11:43 AM, elof2 at sentor.se wrote:
>>>
>>>>
>>>> Hi Carter!
>>>>
>>>> It seems that the bugfix to detect closure of the pipe still have issues.
>>>>
>>>> In ra v2.x I was used to having the ra command terminate in 0.004s.
>>>> With the new ra 3.0.5.8, it won't terminate as swiftly.
>>>>
>>>> ra v3.0.5.8:
>>>> time ra -nr argus.log - > /dev/null
>>>> real    0m4.499s
>>>> user    0m4.288s
>>>> sys     0m0.212s
>>>> Time to parse through 100% of this test-file is apparently ~4.5s.
>>>>
>>>>
>>>> Now lets see the time when grep or head terminates directly on the first line:
>>>> time ra -nr argus.log - | grep -q "."
>>>> real    0m4.562s
>>>> user    0m4.392s
>>>> sys     0m0.176s
>>>> time ra -nr argus.log - | head -1
>>>> real    0m4.588s
>>>> user    0m4.428s
>>>> sys     0m0.168s
>>>>
>>>> Both commands take the full 4.5s (plus a little more) to finish.
>>>>
>>>> If I add another (or the same) logfile, it will double the execution time:
>>>> time ra -nr argus.log argus.log - | head -1
>>>> real    0m9.002s
>>>> user    0m8.617s
>>>> sys     0m0.392s
>>>>
>>>> Conclusion:
>>>> ra 3.0.5.8 don't immediately terminate upon pipe closure.
>>>>
>>>>
>>>>
>>>> Speaking of latency and execution times...
>>>>
>>>> Even when using the builtin -N1 option, ra v2 is way faster than v3:
>>>> ra v2.x:
>>>> time ra -N1 -nr argus.log -
>>>> real    0m0.003s
>>>> user    0m0.000s
>>>> sys     0m0.000s
>>>>
>>>> ra 3.0.5.8:
>>>> real    0m0.024s
>>>> user    0m0.000s
>>>> sys     0m0.021s
>>>>
>>>> v3 is 8 times slower than v2 when it comes to swift termination.
>>>> Can this be improved?
>>>>
>>>>
>>>> Also, no matter how many files I add to ra v2, it still terminates after 0.003 seconds where as ra v3 will increase its 0.024s by approx 0.002 per file:
>>>> ra v2.x:
>>>> time ra -N1 -nr argus.log argus.log argus.log argus.log argus.log argus.log argus.log argus.log argus.log argus.log -
>>>> real    0m0.003s  <-  still 0.003s
>>>> user    0m0.000s
>>>> sys     0m0.002s
>>>>
>>>> ra 3.0.5.8:
>>>> real    0m0.032s  <-  increases more and more for every file added
>>>> user    0m0.007s
>>>> sys     0m0.024s
>>>>
>>>> If you have 288 argus files in a log dir and use ra -nr argus*.log to do some simple processing, you will always have a noticeable lag just for the sake of the 288 files, even if you use -N1. Can this be improved to be as swift as v2?
>>>> (288 files á 5 minutes = 24 hours)
>>>>
>>>> /Elof
>>>>
>>>>
>>>> On Fri, 29 Apr 2011, Carter Bullard wrote:
>>>>
>>>>> Hey Elof,
>>>>> I have the closed piped issue fixed now.  Its a small fix in every client, so you should wait to get a new client release.
>>>>
>>>>> Carter
>>>>
>>>>
>>>>
>>>>>>>> Two examples using the old ra v2.x :
>>>>>>>>
>>>>>>>> 1)
>>>>>>>> time ra -Zb -nr argus.log - ip | head -1
>>>>>>>> 04-27-11 23:54:00.925018  s        tcp             10.10.10.10.46382 ->   20.20.20.20.1503   5467  5550  967561   2559850     PA_PA
>>>>>>>>
>>>>>>>> real    0m0.004s
>>>>>>>> user    0m0.000s
>>>>>>>> sys     0m0.000s
>>>>>>>>
>>>>>>>>
>>>>>>>> 2)
>>>>>>>> time ra -Zb -nr argus-20110428.log - ip | grep -q "PA_PA$"
>>>>>>>>
>>>>>>>> real    0m0.004s
>>>>>>>> user    0m0.000s
>>>>>>>> sys     0m0.000s
>>>>>>>>
>>>>>>>>
>>>>>>>> Both examples only take 0.004s to execute.
>>>>>>>> Good!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Now I try the same using ra v3 :
>>>>>>>>
>>>>>>>> 1)
>>>>>>>> time ra -Zb -nr argus.log - ip | head -1
>>>>>>>> 04-27-11 23:54:00.925018  s        tcp             30.30.30.30.32451 ->   40.40.40.40.25    52  97  1151   12274     PA_PA
>>>>>>>>
>>>>>>>> real    0m50.043s
>>>>>>>> user    0m48.171s
>>>>>>>> sys     0m1.892s
>>>>>>>>
>>>>>>>>
>>>>>>>> 2)
>>>>>>>> time ra -Zb -nr argus-20110429.13.log - ip | grep -q "PA_PA$"
>>>>>>>>
>>>>>>>> real    0m50.161s
>>>>>>>> user    0m48.103s
>>>>>>>> sys     0m2.068s
>>>>>>>>
>>>>>>>>
>>>>>>>> The commands take 50 seconds (!!!) to complete when they should have taken approx 0.004 seconds.
>>>>>>>>
>>>>>>>> The argus.log file is 96 MB and hold 828 372 rows of data.
>>>>>>>> (The larger the file, the longer I have to wait until the command terminates. If I have an old argus.log holding 24 hours of data, it will take ~30 minutes to parse it (even longer if it is bzipped))
>>>>>>>>
>>>>>>>>
>>>>>>>> It takes 50 seconds just to parse through 100% of the file:
>>>>>>>> time ra -nr argus.log - ip > /dev/null
>>>>>>>>
>>>>>>>> real    0m50.257s
>>>>>>>> user    0m48.599s
>>>>>>>> sys     0m1.596s
>>>
>


More information about the argus mailing list