new clients rc.62 on the server - description of rastream()

Carter Bullard carter at qosient.com
Thu Nov 1 22:32:41 EDT 2007


Hey Terry,
We're very close to releasing argus-3.0, and it's going to be
difficult to say the code is good if we have a known memory
leak in a key component, so it's important to me to get this fixed.

So the question is, "is there a .threads file in your root directory?"
If so, try removing it and doing the "./configure;make clean;make"
again, to see if that makes a difference.

There is some clutter that valgrind will report on that is
not critical, such as the port names hash table memory, or
a few strdup'd strings that are left behind.  I'm not worried
about these.  But real memory leaks that keep you from
running these programs for a year at a time are very
important to fix, so thanks for helping me out on this.
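
For sorting the clutter from the real leaks, a small script can total valgrind's loss records by kind. This is only a sketch: it assumes the loss-record line format that appears in the valgrind output quoted below, and the name summarize_losses is made up for illustration; it is not part of argus or valgrind.

```python
import re
import sys
from collections import defaultdict

# Matches loss-record header lines such as:
#   ==23957== 2,388 bytes in 3 blocks are possibly lost in loss record 12 of 17
# The optional "(N direct, M indirect)" part is skipped.
LOSS_RE = re.compile(
    r"^==\d+==\s+([\d,]+)(?:\s+\([^)]*\))?\s+bytes in ([\d,]+) blocks? "
    r"(?:are|is) (definitely lost|possibly lost|indirectly lost|still reachable) "
    r"in loss record"
)


def summarize_losses(lines):
    """Return a dict mapping each loss kind to its total byte count."""
    totals = defaultdict(int)
    for line in lines:
        match = LOSS_RE.match(line.strip())
        if match:
            totals[match.group(3)] += int(match.group(1).replace(",", ""))
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize.py valgrind.log
    with open(sys.argv[1]) as log:
        for kind, total in sorted(summarize_losses(log).items()):
            print(f"{kind}: {total} bytes")
```

A "definitely lost" total that grows with the length of the run is the kind worth chasing; a small, fixed "still reachable" total is more likely the clutter described above.
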


Carter

On Nov 1, 2007, at 9:57 PM, Terry Burton wrote:

> On Nov 1, 2007 5:39 PM, Carter Bullard <carter at qosient.com> wrote:
>> Hey Terry,
>> OK, so looking at your graph and the valgrind output and all
>> information so far, the system is not hurting for memory.  I'm working
>> on the potential leak and may have found some things to clean up, but
>> I'm not thinking that it's the cause of your issues.  It may be that
>> we are running too many concurrent processes, and the first complaint
>> by fork() (EAGAIN) just maps to an error message that says there is
>> not enough memory.  I'm going to change the script scheduling and
>> patch up the memory issues, and we'll try again.
>
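
The fork() point above can be illustrated with a short sketch that reports EAGAIN and ENOMEM separately instead of folding both into an out-of-memory message. It is written in Python for brevity rather than in the C the argus clients use, and the helper names explain_fork_failure and run_script are hypothetical; they are not argus functions.

```python
import errno
import os


def explain_fork_failure(err):
    """Turn a fork() errno into a message that names the actual cause."""
    if err == errno.EAGAIN:
        # Typically a process-count limit, not a shortage of memory.
        return "fork failed (EAGAIN): too many processes, try again later"
    if err == errno.ENOMEM:
        return "fork failed (ENOMEM): not enough memory"
    return os.strerror(err)


def run_script(argv):
    """Fork, exec argv in the child, and return the child's wait status."""
    try:
        pid = os.fork()
    except OSError as exc:
        raise RuntimeError(explain_fork_failure(exc.errno)) from exc
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # does not return on success
        finally:
            os._exit(127)  # exec failed: leave the child immediately
    _, status = os.waitpid(pid, 0)  # reap the child so it does not linger
    return status
```

Reaping each child with waitpid() matters here as well, since children that are never waited for still count toward the process limit that produces EAGAIN.
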
> Hi Carter,
>
> After performing some more basic tests I have found some information
> that may help to find the leak. I'm not sure whether this correlates
> with your current thinking on the problem or not...
>
> I run the following collectors:
>
> /opt/argus/sbin/argus -X -d -A -i eth2 -P 561
> /opt/argus/sbin/radium -X -d -C -S 1006 -P 564
> /opt/argus/sbin/radium -X -d -C -S 1007 -P 565
>
> I have another process that aggregates these:
>
> /opt/argus/sbin/radium -X -d -S localhost:561 -S localhost:564 -S localhost:565 -P 569
>
> Connecting to the SPAN feed does not appear to leak (at least not
> significantly enough for me to have noticed after one hour):
>
> /opt/argus/bin/rastream -X -S localhost:561 -M time 5m -B 10s -f /bin/true -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
>
> Connecting to either the aggregated feed or any of the individual
> NetFlow feeds leaks rapidly (up to ~10MB/min per NetFlow):
>
> /opt/argus/bin/rastream -X -S localhost:569 -M time 5m -B 10s -f /bin/true -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
> /opt/argus/bin/rastream -X -S localhost:564 -M time 5m -B 10s -f /bin/true -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
> /opt/argus/bin/rastream -X -S localhost:565 -M time 5m -B 10s -f /bin/true -w /srv/argus/archive/%Y-%m-%d/\$srcid-%H:%M:%S.arg
>
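
A rate like the ~10MB/min above can be estimated by sampling the resident set size of the rastream process at intervals. This is a rough sketch that assumes a Linux /proc filesystem; the function names are invented for illustration.

```python
import time


def read_rss_kb(pid):
    """Return VmRSS in kB from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def growth_mb_per_min(samples):
    """Average growth between the first and last (seconds, rss_kb) samples."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples spread over time")
    return (rss1 - rss0) / 1024.0 / ((t1 - t0) / 60.0)


def watch(pid, interval_s=60, count=5):
    """Sample the process `count` times and return its growth in MB/min."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_rss_kb(pid)))
        if i < count - 1:
            time.sleep(interval_s)  # wait only between samples
    return growth_mb_per_min(samples)
```

Running watch() against the rastream reading the SPAN feed and against one reading a NetFlow feed would put a number on each for comparison.
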
> So it would appear to be a NetFlow-related problem, possibly with
> some memory allocated through the call path main -> ArgusReadStream ->
> ArgusReadStreamSocket -> ArgusHandleDatum -> RaScheduleRecord ->
> RaProcessRecord -> RaProcessThisRecord -> ArgusAlignRecord ->
> ArgusCopyRecordStruct never being freed, as hinted at by the following
> section of the valgrind output:
>
> ==23957== 2,388 bytes in 3 blocks are possibly lost in loss record 12 of 17
> ==23957==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
> ==23957==    by 0x806B4F9: ArgusCalloc (argus_util.c:15011)
> ==23957==    by 0x80838C8: ArgusCopyRecordStruct (argus_client.c:3493)
> ==23957==    by 0x8083FC8: ArgusAlignRecord (argus_client.c:7137)
> ==23957==    by 0x804C7E7: RaProcessThisRecord (rastream.c:894)
> ==23957==    by 0x804CC50: RaProcessRecord (rastream.c:872)
> ==23957==    by 0x8077FD4: RaScheduleRecord (argus_util.c:860)
> ==23957==    by 0x807820D: ArgusHandleDatum (argus_util.c:930)
> ==23957==    by 0x808C8BE: ArgusReadStreamSocket (argus_client.c:1622)
> ==23957==    by 0x808D35E: ArgusReadStream (argus_client.c:1997)
> ==23957==    by 0x80502BC: main (argus_main.c:359)
>
> Does this appear to be along the right lines?
>
> What is frustrating (from the point of view of debugging) is that I
> seem to get consistently differing results from valgrind depending
> upon whether I compile with or without CFLAGS="-O -g -fno-inline". The
> above trace (with CFLAGS amendments) differs from my previous posting
> by "possibly" losing ~2KB rather than "definitely" losing 1MB
> (without CFLAGS mods) over similar 15min runs. Also with the CFLAGS
> amendments I get this new "definite" leak from a different allocation
> path:
>
> ==23957== 275,838 (275,736 direct, 102 indirect) bytes in 349 blocks are definitely lost in loss record 16 of 17
> ==23957==    at 0x401C6CA: calloc (vg_replace_malloc.c:279)
> ==23957==    by 0x806B4F9: ArgusCalloc (argus_util.c:15011)
> ==23957==    by 0x8075022: setArgusWfile (argus_util.c:18486)
> ==23957==    by 0x804F1FD: ArgusParseArgs (argus_main.c:1193)
> ==23957==    by 0x804F9D3: ArgusMainInit (argus_main.c:729)
> ==23957==    by 0x804FA6F: main (argus_main.c:131)
>
> Anyhow, I greatly appreciate your efforts on this and do not want you
> to take any of this feedback as though I am pressing you for a
> quick fix - that's not my intention at all, as there is no great
> urgency on me to get this working.
>
> Let me know if there is anything that you would like me to do by way
> of testing for this problem or anything else.
>
>
> Hope this all helps,
>
> Tez
>


