[Ntop-misc] Direction and IP/TCP timeout settings

Craig Merchant cmerchant at responsys.com
Mon Jul 22 20:14:42 EDT 2013


Hey, Carter...

I ran a search on the last 200,000 records that had a "?" in the direction field and only about 7% of them had a "g" in the flags.  If gaps in the packets were the problem - whether from an overloaded port, driver, or asymmetric flows (we are using a pair of Cisco VSS switches, but the NetOps team swears that the SPAN port sees all traffic from both switches) - wouldn't we expect that number to be a lot higher?
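
For anyone repeating this check, here is a minimal Python sketch of the calculation: among records whose direction contains "?", what share also carry a "g" in the flags. It assumes the records were exported as comma-separated text with a header row, and that the columns are named `flgs` and `dir`. The column names, the helper `gap_share`, and the sample rows are illustrative assumptions, not output from a real sensor.

```python
import csv
import io


def gap_share(rows, dir_field="dir", flags_field="flgs"):
    """Return (unknown_count, unknown_with_gap_count, fraction).

    rows is an iterable of dicts, one per flow record.
    """
    unknown = 0
    with_gap = 0
    for row in rows:
        if "?" in row[dir_field]:
            unknown += 1
            if "g" in row[flags_field]:
                with_gap += 1
    fraction = with_gap / unknown if unknown else 0.0
    return unknown, with_gap, fraction


# Made-up sample data, only to exercise the function.
SAMPLE = """flgs,dir,saddr,daddr
 e g,<?>,10.0.0.1,10.0.0.2
 e,<?>,10.0.0.3,10.0.0.4
 e,->,10.0.0.5,10.0.0.6
 e,<?>,10.0.0.7,10.0.0.8
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
unknown, with_gap, fraction = gap_share(rows)
print(f"{with_gap} of {unknown} unknown-direction records have a gap flag ({fraction:.0%})")
```

A low fraction, like the 7% reported above, would suggest that random packet loss alone does not explain the unknown directions.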

In your example "ra -S argus.source -M xml - man", can ra read from radium or can it only read from a file?  I presume both are supported since you used the -S switch instead of -r, but when I run it against my radium instance, the command never exits or displays any results.  Do I need to specify an interval for ra to connect?

While I'm doing this testing, I'm running one host with pf_ring and one with the normal Intel ixgbe driver and the directional issues are pretty much even across both hosts.  I've tried connecting my raclients to the argus instances directly (and thus not using radium), but the results are pretty much the same.

When you refer to modifying "sleep timeouts", what configuration option are you referring to?  Is that the IP/TCP timeouts in argus.conf?  I looked through argus.conf, radium.conf, and rarc.conf for "sleep" and didn't find anything...

As for hard-coding destination ports...  Any kind of CSV file or iana-formatted file that you use for ralabel would be easy for me to work with.

Did you have a chance to look at the tcpdump I sent you and see how well Argus picks out the direction from the flows?

Thx.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Monday, July 22, 2013 1:40 PM
To: Craig Merchant
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

Hey Craig,
So, the whole point to this exercise has been to determine if
you are not getting all the packets from the wire, because
you think you are seeing too many " ? " in your TCP direction
field.

When the sensor doesn't see all the packets that it can,
the most important indicator is a " g " in the flgs field.
This indicates that there are packet gaps that the flow
modeler has detected, which are sequence numbers never seen.
You should be seeing " g "s if random packet loss from
the wire to argus is occurring.

If this is the case, then changing the sleep timeouts should
help a great deal in reducing the occurrence of " g "s, and the
mystery of the apparent lack of SYNs and SYN_ACKs would be solved.

If not, but argus is still not reporting all the direction
that you think it should, then selective loss of the SYN
and SYN_ACK packets is a possibility.

pf_ring would be a most natural place to point the finger, in this case.

The argus "man" record reports libpcap packet drop stats,
which count the number of packets that were received and
ready for processing, but were not read.  You can print that
number like this:

   ra -S argus.source -M xml - man

And you will get something like this:

<?xml version ="1.0" encoding="UTF-8"?>
<!--Generated by ra(3.0.7.12) QoSient, LLC-->
<ArgusDataStream
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation = "http://qosient.com/argus/Xml/ArgusRecord.3.0.xsd"
  BeginDate = "2013-07-15T13:56:43.109557" CurrentDate = "2013-07-22T16:33:37.086812"
  MajorVersion = "3" MinorVersion = "0" InterfaceType = "DLT_NULL" InterfaceStatus = "Up"
  ArgusSourceId = "192.168.0.68"  NetAddr = "0.0.0.0"  NetMask = "0.0.0.0">

 <ArgusManagementRecord  StartTime = "2013-07-22T16:33:36.982927" Duration = "614213.875000" Flags = "         " Proto = "man" PktsRcvd = "0" Records = "0" BytesRcvd = "0" PktsDropped = "0" State = "STA"></ArgusManagementRecord>
 <ArgusManagementRecord  StartTime = "2013-07-22T16:33:43.194437" Duration = "60.101017" Flags = "         " Proto = "man" PktsRcvd = "52114" Records = "57" BytesRcvd = "47541540" PktsDropped = "0" State = "CON"></ArgusManagementRecord>

The PktsDropped value is something to look for.
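
As a rough illustration of how the PktsDropped values could be totalled from XML like the output above, here is a minimal Python sketch. It assumes the XML has been saved and is well formed (the live stream above has no closing ArgusDataStream tag, so it would need closing or incremental parsing first). The helper name `drop_summary` and the third sample record with a non-zero drop count are invented for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed sample; the last record is made up to show a drop.
SAMPLE = """<ArgusDataStream>
 <ArgusManagementRecord Proto="man" PktsRcvd="0" PktsDropped="0" State="STA"></ArgusManagementRecord>
 <ArgusManagementRecord Proto="man" PktsRcvd="52114" PktsDropped="0" State="CON"></ArgusManagementRecord>
 <ArgusManagementRecord Proto="man" PktsRcvd="48000" PktsDropped="120" State="CON"></ArgusManagementRecord>
</ArgusDataStream>
"""


def drop_summary(xml_text):
    """Return (total_received, total_dropped, records_with_drops)."""
    root = ET.fromstring(xml_text)
    received = 0
    dropped = 0
    records_with_drops = 0
    for rec in root.iter("ArgusManagementRecord"):
        received += int(rec.get("PktsRcvd", "0"))
        rec_dropped = int(rec.get("PktsDropped", "0"))
        dropped += rec_dropped
        if rec_dropped > 0:
            records_with_drops += 1
    return received, dropped, records_with_drops


received, dropped, records_with_drops = drop_summary(SAMPLE)
print(f"received={received} dropped={dropped} records_with_drops={records_with_drops}")
```

Any non-zero total would point at packets being lost between the capture layer and argus.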

If there is still a mystery, flows with the " ? " will exist naturally.
Flows that are long lived, with idle periods longer than the TCP timeout
period, will present with the " ? ".  Also, when there is asymmetry, such
as load balancing, you may miss the SYN and SYN_ACK completely.
You get what you get, in that case.
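
To make the idle-timeout case concrete, here is a toy Python model of a flow cache. It is not how argus is implemented; the class `ToyFlowCache`, its fields, and the 60-second timeout are assumptions chosen only to show why a long-lived flow can reappear with unknown direction once its state has expired.

```python
class ToyFlowCache:
    """Tracks, per flow key, when it was last seen and whether a SYN was seen."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.flows = {}  # key -> (last_seen, direction_known)

    def packet(self, key, now, syn=False):
        """Record a packet and return "->" if direction is known, else "?"."""
        entry = self.flows.get(key)
        if entry is not None and now - entry[0] > self.idle_timeout:
            entry = None  # idle too long: the earlier state is forgotten
        if entry is None:
            known = syn  # a new flow only has a direction if it starts with a SYN
        else:
            known = entry[1] or syn
        self.flows[key] = (now, known)
        return "->" if known else "?"


cache = ToyFlowCache(idle_timeout=60)
first = cache.packet("A-B", now=0, syn=True)   # handshake seen
second = cache.packet("A-B", now=30)           # still within the timeout
third = cache.packet("A-B", now=200)           # idle for 170 s, state expired
print(first, second, third)
```

The third packet belongs to the same TCP connection, but the model has already discarded the state that recorded the SYN, so the direction is reported as unknown.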

We provide some means to control the direction when it's unknown.
If you want to propose other client-based mechanisms, holler away.

Carter


On Jul 21, 2013, at 1:15 AM, Craig Merchant <cmerchant at responsys.com> wrote:


Just an FYI...  Apparently the DNA/libzero drivers from NTOP support pcap_stats().  But I have absolutely no idea how to access those stats...

From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
Sent: Saturday, July 20, 2013 4:03 AM
To: ntop-misc at listgateway.unipi.it
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

Hi Craig
yes, libpcap over dna cluster queue provides pcap_stats() support.

Alfredo

On Jul 18, 2013, at 9:01 PM, Craig Merchant <cmerchant at responsys.com> wrote:



Alfredo,

I ran both pfcount -i dnacluster:10@28 (the queue argus monitors) and pfcount -i dna0 (when pfdnacluster_master wasn't running).  Both of them showed a 0.1% packet loss.

What about this question that Carter had:

Does the pfdnacluster_master queue provide standard pcap_stats() ?
We should be able to look at the MARs, which will tell us how
many packets the interface dropped.

I'm not familiar with what pcap_stats() are...

Thanks.

Craig

From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
Sent: Thursday, July 18, 2013 12:44 AM
To: ntop-misc at listgateway.unipi.it
Subject: Re: [Ntop-misc] FW: [ARGUS] Direction and IP/TCP timeout settings

Hi Craig
what do you mean by "Pfcount says that the queue that argus is running on is only dropping 0.1% of packets"? You should look at the stats on the queue argus is using.
Select/poll are not supported by the cluster, since in our experience usleep behaves better than the poll implementation in this case.

Alfredo

On Jul 16, 2013, at 1:51 AM, Craig Merchant <cmerchant at responsys.com> wrote:




I'm trying to troubleshoot some issues with the argus netflow tool running on top of pfdnacluster_master.  Pfcount says that the queue that argus is running on is only dropping 0.1% of packets, yet argus can't figure out the direction of about 60% of the flows.  That means for some reason it isn't seeing the SYN and SYN_ACK of a lot of flows.

The argus developer had a couple questions about the pfdnacluster_master that I can't answer...  They are below.

Thanks.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Monday, July 15, 2013 3:13 PM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Direction and IP/TCP timeout settings

Hey Craig,
If radium doesn't keep up, the argi will drop the connections,
so unless you see radium losing its connection and
then re-establishing, I don't think it's radium.  We can measure
all of this, so it's not going to be hard to track down, I don't
think.

If argus is generating the same number of flows, then it's probably
seeing the same traffic.  So, it seems that we are not getting all
the packets, and it doesn't appear to be due to argus running
out of cycles.  Are we running out of memory?  How does vmstat look
on the machine?  Not swapping out?

To understand this issue, I need to know if the pfdnacluster_master queue
is a selectable packet source, or not.  We want to use select() to get
packets, so that we can leverage select()'s timeout feature to wake
us up periodically, so we can do some background maintenance, like queue
timeouts, etc...

When we can't select(), we have to poll the interface, and if
there isn't anything there, we could fall into a nanosleep() call,
waiting for packets.  That may be a very bad thing, causing us to
lose packets.
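
The select()-with-timeout pattern described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not argus source code: the function `run_loop`, the socket pair standing in for a packet source, and the 50 ms timeout are all assumptions made for the example.

```python
import select
import socket


def run_loop(sock, rounds, timeout=0.05):
    """Wait on sock with a timeout; do maintenance whenever the wait expires.

    Returns (packets_read, maintenance_runs).
    """
    packets_read = 0
    maintenance_runs = 0
    for _round in range(rounds):
        readable, _writable, _errored = select.select([sock], [], [], timeout)
        if readable:
            data = sock.recv(65535)
            if data:
                packets_read += 1
        else:
            # Timeout expired with nothing to read: a natural point for
            # background work such as expiring idle flows.
            maintenance_runs += 1
    return packets_read, maintenance_runs


# A connected socket pair stands in for a selectable packet source.
reader, writer = socket.socketpair()
writer.send(b"pkt1")
result = run_loop(reader, rounds=3)
reader.close()
writer.close()
print(result)
```

With a selectable source the process wakes as soon as data arrives and otherwise wakes on the timeout. A source that cannot be selected forces a poll-then-sleep loop, where packets arriving during the sleep have to wait in the queue.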

Does the pfdnacluster_master queue provide standard pcap_stats()?
We should be able to look at the MARs, which will tell us how
many packets the interface dropped.

Not sure that I understand the problem with multiple argus processes.
You can run 24 copies of argus, and have radium connect to them
all to recreate the single argus data stream, if that is something
you would like to do.

Let's focus on this new interface.  It could be we have to do something
special to get the best performance out of it.

Carter


_______________________________________________
Ntop-misc mailing list
Ntop-misc at listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
