FW: rasqlinsert - was - IPFIX support timeline.

David Edelman via Argus-info argus-info at lists.andrew.cmu.edu
Fri Feb 5 11:51:15 EST 2016


That should be 3.0.8.2.rc1

 

 

 

From: Argus-info [mailto:argus-info-bounces+dedelman=iname.com at lists.andrew.cmu.edu] On Behalf Of David Edelman via Argus-info
Sent: Friday, February 5, 2016 11:18 AM
To: 'Argus' <argus-info at lists.andrew.cmu.edu>
Subject: [ARGUS] FW: rasqlinsert - was - IPFIX support timeline.

 

Copying the list.

 

From: David Edelman [mailto:dedelman at iname.com] 
Sent: Friday, February 5, 2016 11:17 AM
To: 'Carter Bullard' <carter at qosient.com>
Cc: 'Richard Rothwell' <Richard.Rothwell at aarnet.edu.au>
Subject: RE: [ARGUS] rasqlinsert - was - IPFIX support timeline.

 

Carter,

 

Thanks for the response. I am seeing it in 3.0.8 and 3.0.3.2.rc1.

 

The segfault is in a library call; I’ll spin up an instance under GDB and pull the full details.

 

--Dave

 

From: Carter Bullard [mailto:carter at qosient.com] 
Sent: Friday, February 5, 2016 9:42 AM
To: dedelman at iname.com
Cc: Richard Rothwell <Richard.Rothwell at aarnet.edu.au>
Subject: Re: [ARGUS] IPFIX support timeline.

 

Hey David,

There are lots of options, but by default rabins.1 will split the record to conform to the time boundary.

Think of rabins as a rasplit + racluster, into a memory buffer bin, and after the -B sec delay, it cranks out the cache.

Once it’s cranked out a time slot, it rejects records that would have gone into the slot.
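
As a rough illustration of the default split behaviour described above, here is a minimal Python sketch. It is not the rabins implementation: the function name `split_flow`, the tuple layout, and the proportional apportioning of byte counts are all assumptions made for illustration.

```python
def split_flow(start, end, nbytes, bin_size):
    """Split one flow (start/end in seconds) at time-bin boundaries.

    Hypothetical helper, not Argus code. Bytes are apportioned in
    proportion to the time the flow spends in each bin (an assumption).
    Returns a list of (bin_start, piece_start, piece_end, bytes) tuples.
    """
    if end <= start:
        # Zero-duration flow: it falls entirely into one bin.
        return [(start - start % bin_size, start, end, nbytes)]
    pieces = []
    duration = end - start
    cursor = start
    while cursor < end:
        bin_start = cursor - cursor % bin_size
        # A piece ends at the flow end or the bin boundary, whichever is first.
        piece_end = min(end, bin_start + bin_size)
        share = nbytes * (piece_end - cursor) / duration
        pieces.append((bin_start, cursor, piece_end, share))
        cursor = piece_end
    return pieces
```

With 60-second bins, a flow running from second 55 to second 65 comes back as two pieces, one in the bin starting at 0 and one in the bin starting at 60, each carrying half of the bytes.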

 

What version of rasqlinsert are you using? We do have some problems with it in 3.0.8, and I’m working on a problem that I’ve seen in 3.0.8.1. I think the problem is in the ping-ponging that happens when you get just past the time boundary. Records start coming in that are split and need to be inserted into both tables. rasqlinsert.1 is trying to be smart, clustering up inserts, updates and deletes to get some performance. With 3.0.8, when we changed tables, we locked everything, flushed out all the queues and started working on the different table. With each record getting split, we would ping-pong between the last table and the new table a couple of hundred, even a thousand, times, depending on load.
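
To make the cost of that ping-ponging concrete, here is a small Python sketch that only counts flushes. It does not reflect the rasqlinsert source: both function names are hypothetical, and the per-table queue variant is just one possible alternative assumed for comparison, not the fix adopted in argus-clients.

```python
from collections import defaultdict


def flushes_on_table_change(targets):
    """Count flushes when every table switch forces a full flush.

    `targets` is the sequence of table names that inserts are aimed at.
    """
    flushes = 0
    current = None
    for table in targets:
        if current is not None and table != current:
            flushes += 1  # lock, flush all queues, switch tables
        current = table
    # One final flush for whatever is still queued.
    return flushes + (1 if current is not None else 0)


def flushes_with_per_table_queues(targets, batch_size):
    """Count flushes when each table keeps its own queue of batch_size."""
    pending = defaultdict(int)
    flushes = 0
    for table in targets:
        pending[table] += 1
        if pending[table] == batch_size:
            flushes += 1
            pending[table] = 0
    # Flush whatever is left in each non-empty queue.
    return flushes + sum(1 for n in pending.values() if n)
```

For 200 inserts that alternate strictly between two tables, the first scheme flushes on every insert, while the second, with a batch size of 50, flushes four times in total.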

 

If you believe that it’s the splitting that is causing the problem, try running rasqlinsert (or rabins) with the:

   -M nosplit

option.  This should eliminate the ping ponging and focus us on the right issue.

 

Where does it segfault?

 

Carter

 

On Feb 4, 2016, at 11:01 PM, David Edelman <dedelman at iname.com> wrote:

 

Carter,

 

How does rabins.1 handle a flow that spans two bins?

I’m in the process of looking at a problem with rasqlinsert.1 when the source is netflow digested by radium configured to use a classifier file for some heavy-duty labeling. I think that I can reproduce the segfault by dropping the tables to be one minute long. It seems that when the routine to apportion the flow record that spans two tables is invoked, the process segfaults. I know that with my normal daily tables, the segfault occurs right around the time that the new table is created.

 

For my more normal collecting, where the source is an Argus.8 collector listening to an interface, I don’t see this problem, so I’m narrowing it down to the combination of netflow (or maybe just minimal data DSRs) and table transition.

 

I’m using the very short table turnover times to increase the incidence of the fault, and I guess that it’s time for GDB.

 

Any suggestion for the -D value?

 

--Dave

 

From: Argus-info [mailto:argus-info-bounces+dedelman=iname.com at lists.andrew.cmu.edu] On Behalf Of Carter Bullard via Argus-info
Sent: Thursday, February 4, 2016 9:55 PM
To: Richard Rothwell <Richard.Rothwell at aarnet.edu.au>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] IPFIX support timeline.

 

Seems to me that you’ve had good luck; it’s a bug in rabins.1.

Can you share the file that kills rabins?

Carter

 

On Feb 4, 2016, at 9:45 PM, Richard Rothwell <Richard.Rothwell at aarnet.edu.au> wrote:

 

Hi Carter,

 

I have followed up on your suggestions. No luck. And the problem is broader than IPFIX handling.

 

It seems radium can handle the NetFlow 9 records I am throwing at it.

No problems there. The Argus records output file produced by the -w option has sensible contents.

 

FYI I am currently using nfreplay to convert a collection of IPFIX records in files, to a NetFlow 9 network stream and sending that to radium.

This produces a 1.6 GB Argus records file.

 

However, rabins falls over whether it is taking records directly from radium or indirectly via the Argus records file produced by radium.

Adjusting the -B option to 300s causes rabins to fall over, but without producing any output at all.

 

The commands I am using are:

 

sudo /usr/local/sbin/radium -S cisco://any:9995 -d -P 562

With

sudo /usr/local/bin/rabins -S localhost:562 -M time 10s -B 10s -w '/mnt/hgfs/centos_shared/rabins_radium.out'

 

OR 

 

sudo /usr/local/sbin/radium -S cisco://any:9995 -d -P 562 -w '/mnt/hgfs/centos_shad/radium_100_10s.out'

With

sudo /usr/local/bin/rabins -r '/mnt/hgfs/centos_shared/radium_100_10s.out' -M time 100s -B 100s -w '/mnt/hgfs/centos_shared/rabins_infile_100_100s_100s.out'

 

Etc

 

Regards

 

 

 

From: Carter Bullard <carter at qosient.com>
Date: Friday, 5 February 2016 at 7:39 AM
To: Site License <Richard.Rothwell at aarnet.edu.au>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] IPFIX support timeline.

 

Hey Richard, 

We have preliminary support in argus-clients now for IPFIX UDP and TCP.  That needs debugging and additional support as new IEs are used.  It would be reasonable to read the IPFIX data with radium, and have rabins connect to radium to get the converted data.  That way we can figure out if any bugs are in IPFIX conversion or in record processing later on.

 

rabins.1 has some very specific issues with flow data coming way out of time order. We’re going to report on time period t1-t2, and if IPFIX sends data late, rabins throws it away. Could it be that the memory leak relates to data out of bounds? If so, you need to add a bit of buffering using the -B option, so that rabins doesn’t flush out a time bin when more IPFIX data is coming. With some implementations, you may need a “-B 300s” to make sure the data is OK. But if you can get some guarantees from IPFIX, then the -B can be shorter.

 

If you have a bug report for rabins, please send it to the list. Try using radium to convert IPFIX to argus format, check to see how out of order the flow records are, then adjust using a ‘-B delay’ option to give the IPFIX data time to show up, and then let’s see if you have blow-ups or memory leaks.
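
One way to put a number on "how out of order" the records are is sketched below in Python. It assumes the flow start times have already been extracted, in arrival order, as seconds; how they are extracted is left out, and the function name `max_lateness` is hypothetical rather than part of any Argus tool.

```python
def max_lateness(start_times):
    """Return the largest lag, in seconds, of any record behind the
    newest start time seen before it (records taken in arrival order)."""
    newest = None
    worst = 0.0
    for t in start_times:
        if newest is None or t > newest:
            newest = t  # this record advances the time frontier
        else:
            worst = max(worst, newest - t)  # this record arrived late
    return worst
```

A hold time passed to -B that is comfortably larger than this figure would be a reasonable starting point, on the assumption that the sample is representative of the feed.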

 

Gloriad.org <http://gloriad.org/>, an NSF IRE service provider, has a great argus -> ELK system they have said they will share. Not sure of the status of that.

 

Carter

 

On Feb 4, 2016, at 12:20 AM, Richard Rothwell via Argus-info <argus-info at lists.andrew.cmu.edu> wrote:

 

Hi list,

 

I am investigating all of the bits needed to get network monitoring up and running for AARNET.

Front-end most likely would involve the ELK stack in some way with Argus providing the probes.

 

However we are interested in getting our data from the routers rather than network interfaces.

But we have settled on IPFIX. Feeding IPFIX flows into the Argus rabins client seems to work, sort of.

 

There are 3 issues I need to address.

1.	When will proper IPFIX support be available?
2.	What are the limitations of feeding IPFIX flows into the front end of rabins when it expects NetFlow 9? (I’m just the programmer, not the network expert.)
3.	Feeding IPFIX data into rabins causes it to blow up pretty quickly with a major memory leak. I have studied this with heaptracker, but have no definite conclusion yet.

Regards from Richard

 
