new clients rc.62 on the server - description of rastream()

Terry Burton tez at terryburton.co.uk
Thu Nov 1 21:57:33 EDT 2007


On Nov 1, 2007 5:39 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Terry,
> OK, so looking at your graph and the valgrind output and all
> information so far, the system is not hurting for memory.  I'm working
> on the potential leak and may have found some things to clean up, but
> I'm not thinking that its the cause of your issues.   It maybe that we
> are running too many concurrent processes, and the first complaint by
> fork() (EAGAIN) just maps to an error messages that sez there is not
> enough memory.  I'm going to change the script scheduling and patch up
> the memory issues, and we'll try again.

Hi Carter,

After performing some more basic tests I have found some information
that may help to find the leak. I'm not sure whether this correlates
with your current thinking on the problem or not...

I run the following collectors:

/opt/argus/sbin/argus -X -d -A -i eth2 -P 561
/opt/argus/sbin/radium -X -d -C -S 1006 -P 564
/opt/argus/sbin/radium -X -d -C -S 1007 -P 565

I have another process that aggregates these:

/opt/argus/sbin/radium -X -d -S localhost:561 -S localhost:564 \
    -S localhost:565 -P 569

Connecting to the SPAN feed does not appear to leak (at least not
significantly enough for me to have noticed after one hour):

/opt/argus/bin/rastream -X -S localhost:561 -M time 5m -B 10s -f /bin/true \
    -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg

Connecting to either the aggregated feed or any of the individual
NetFlow feeds leaks rapidly (up to ~10MB/min per NetFlow feed):

/opt/argus/bin/rastream -X -S localhost:569 -M time 5m -B 10s -f /bin/true \
    -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
/opt/argus/bin/rastream -X -S localhost:564 -M time 5m -B 10s -f /bin/true \
    -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
/opt/argus/bin/rastream -X -S localhost:565 -M time 5m -B 10s -f /bin/true \
    -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
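
Growth figures like the ones above can be collected by sampling a
process's resident set size at a fixed interval. The following is only a
minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a
VmRSS line; watch_rss is a hypothetical helper name and is not part of
the argus distribution.

```shell
#!/bin/sh
# Print "<elapsed-seconds> <rss-in-kB>" for one process at a fixed interval.
# Hypothetical helper (not part of argus); assumes Linux /proc.
# Usage: watch_rss PID [INTERVAL_SECONDS] [SAMPLES]
watch_rss() {
    pid=$1
    interval=${2:-60}   # default: one sample per minute
    samples=${3:-15}    # default: 15 samples
    n=0
    while [ "$n" -lt "$samples" ]; do
        # VmRSS is reported in kB; empty output means the process is gone.
        rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
        [ -n "$rss" ] || return 1
        echo "$((n * interval)) $rss"
        n=$((n + 1))
        if [ "$n" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

Running it against the process ID of each rastream instance in turn
gives one column of figures per feed, which makes a steady climb on one
feed easy to tell apart from a flat line on another.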

So it would appear to be a NetFlow related problem, possibly with
some memory allocated through the call path main -> ArgusReadStream ->
ArgusReadStreamSocket -> ArgusHandleDatum -> RaScheduleRecord ->
RaProcessRecord -> RaProcessThisRecord -> ArgusAlignRecord ->
ArgusCopyRecordStruct never being freed, as hinted at by the following
section of the valgrind output:

==23957== 2,388 bytes in 3 blocks are possibly lost in loss record 12 of 17
==23957==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==23957==    by 0x806B4F9: ArgusCalloc (argus_util.c:15011)
==23957==    by 0x80838C8: ArgusCopyRecordStruct (argus_client.c:3493)
==23957==    by 0x8083FC8: ArgusAlignRecord (argus_client.c:7137)
==23957==    by 0x804C7E7: RaProcessThisRecord (rastream.c:894)
==23957==    by 0x804CC50: RaProcessRecord (rastream.c:872)
==23957==    by 0x8077FD4: RaScheduleRecord (argus_util.c:860)
==23957==    by 0x807820D: ArgusHandleDatum (argus_util.c:930)
==23957==    by 0x808C8BE: ArgusReadStreamSocket (argus_client.c:1622)
==23957==    by 0x808D35E: ArgusReadStream (argus_client.c:1997)
==23957==    by 0x80502BC: main (argus_main.c:359)

Does this appear to be along the right lines?

What is frustrating (from a debugging point of view) is that valgrind
gives me consistently different results depending on whether or not I
compile with CFLAGS="-O -g -fno-inline". The above trace (with the
CFLAGS amendments) differs from the one in my previous posting (without
them): it "possibly" loses ~2KB rather than "definitely" losing 1MB
over similar 15min runs. Also, with the CFLAGS amendments I get this new
"definite" leak from a different allocation path:

==23957== 275,838 (275,736 direct, 102 indirect) bytes in 349 blocks are definitely lost in loss record 16 of 17
==23957==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
==23957==    by 0x806B4F9: ArgusCalloc (argus_util.c:15011)
==23957==    by 0x8075022: setArgusWfile (argus_util.c:18486)
==23957==    by 0x804F1FD: ArgusParseArgs (argus_main.c:1193)
==23957==    by 0x804F9D3: ArgusMainInit (argus_main.c:729)
==23957==    by 0x804FA6F: main (argus_main.c:131)
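
To compare runs built with and without the CFLAGS amendments, it can
help to total the bytes reported per loss category in each valgrind log.
This is only a rough sketch: leak_totals is a hypothetical helper, and
it assumes each loss record header sits on a single line of the log
(as valgrind writes it, before any mail client wraps it).

```shell
#!/bin/sh
# Sum the bytes valgrind reports per loss category in memcheck output.
# Hypothetical helper; reads the named log files, or stdin if none given.
# Usage: leak_totals valgrind.log
leak_totals() {
    awk '
        / blocks are (definitely|indirectly|possibly) lost in loss record / {
            bytes = $2
            gsub(/,/, "", bytes)            # "275,838" -> "275838"
            for (i = 3; i <= NF; i++)       # category is the word before "lost"
                if ($i == "lost") kind = $(i - 1)
            total[kind] += bytes
        }
        END {
            for (k in total) printf "%s %d\n", k, total[k]
        }
    ' "$@" | sort
}
```

Fed the two loss record headers quoted in this message, it prints
"definitely 275838" and "possibly 2388", so two logs from similar runs
can be compared side by side.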

Anyhow, I greatly appreciate your efforts on this and do not want you
to take any of this feedback as me pressing you for a quick fix -
that's not my intention at all, as there is no great urgency for me to
get this working.

Let me know if there is anything that you would like me to do by way
of testing for this problem or anything else.


Hope this all helps,

Tez


