new clients rc.62 on the server - description of rastream()

Carter Bullard carter at qosient.com
Fri Nov 2 10:33:49 EDT 2007


Hey Terry,
Well, the tool of choice for this situation is rasplit(), with an
independent cron job processing the files an hour or so after the
fact.  rasplit() is rastream(), but without the buffering or task
spawning.
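
Something along these lines should work (the hostname, archive path,
and post-processing script name below are just placeholders; adjust
the split interval and cron schedule to suit):

   rasplit -S argus.example.com:561 -M time 1h \
           -w /archive/%Y/%m/%d/argus.%H.%M.%S

   # crontab: handle the previous hour's files at five past each hour
   5 * * * * /usr/local/bin/process-argus-hour.sh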

rastream() is really trying to bring this processing into a
near-realtime time frame, and netflow data isn't really appropriate
for that, as it is not well scheduled.

I have found the bug in rastream(), and will have a new clients
release, rc.63, up in about an hour or so.

Give rasplit() a try; you shouldn't have any memory leak problems
with it, as it has essentially no memory requirements.

Carter


On Nov 2, 2007, at 8:48 AM, Terry Burton wrote:

> On Nov 2, 2007 3:24 AM, Carter Bullard <carter at qosient.com> wrote:
>> Ok, one thing that I've discovered in my tests with fprobe()
>> as a netflow record source is that the hold time for rastream
>> may need to be very large, possibly on the order of 2-5 minutes
>> rather than 10 seconds.  This is because of netflow's very poor
>> cache management strategies.
>
> Hi Carter,
>
> Is my understanding correct that rastream buffers all records in
> memory and only writes them to the output files in strict time
> ordering after the "-B interval" has elapsed?
>
> This could pose severe memory usage limitations for NetFlow sources
> from which records may arrive up to 32 minutes in the past.  From
> http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.1E/native/configuration/guide/nde.html#wp1046346:
> "For flows that remain continuously active, flow entries in the MLS
> cache expire every 32 minutes to ensure periodic reporting of active
> flows."
>
> Listening to our NetFlow streams does indeed show records whose
> timestamps are more than 30 minutes in the past.  Even if we were to
> amend the routers' active export period to something much lower,
> rastream could still consume widely varying amounts of memory.
>
> If the above understanding is correct, then the problem could be
> solved with an option that puts rastream in a mode where it does not
> concern itself with strict time ordering of records, but only ensures
> that records are placed in their correct "time bin/file", with those
> files held open for the "-B interval", thus avoiding buffering of the
> records.  The records in each file could always be sorted in the "-f
> script" if this were required.
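>
> As a rough, untested sketch (assuming rastream hands the closed
> file's name to the script as its first argument), the "-f script"
> could be as simple as:
>
>     #!/bin/sh
>     # re-sort a closed bin file by record start time using rasort
>     rasort -m stime -r "$1" -w "$1.sorted" && mv "$1.sorted" "$1"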
>
>> If a record comes in that is outside the range of the "-B secs"
>> option, rastream() will toss it.  To test, compile the clients with
>> debug support ("touch .devel .debug; ./configure; make clean; make")
>> and run rastream() with -D2 to see if it complains about the range
>> of the input records.  I did find a leak where some of these
>> out-of-range records were dropped without being deallocated, so
>> that may have been our problem.
>
> That sounds quite likely to be the source of the problem.  I'll
> perform some additional testing along these lines.
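>
> Roughly, I expect to rebuild with debug support and re-run rastream
> with -D2 and a hold time of a few minutes, something like the
> following (the source and output options are just whatever we
> normally run with):
>
>     cd argus-clients
>     touch .devel .debug; ./configure; make clean; make
>     rastream -D2 -B 300 <our usual source, -w and -f options>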
>
>> I could have rastream() adjust its range timer to accommodate
>> records that come in way out of range, but I'm not comfortable
>> with these types of dynamic behaviors, as you find after some
>> time that rastream() has stopped outputting records and is getting
>> huge, because the hold time has increased to some ridiculous value,
>> like 1.5 years (not good).
>
> Understood. I too think that it is better to deal with absolute
> limits, rather than tune the system after records are already lost.
>
>
> Warm regards,
>
> Tez
>


