[Ntop-misc] Direction and IP/TCP timeout settings

Craig Merchant cmerchant at responsys.com
Thu Jul 25 14:29:24 EDT 2013


I didn't adjust the nanosleep parameter.  Those machines have 32 cores.  If argus being more aggressive about sleeping seems to be making it more accurate, throwing some CPU at the problem is fine with me.

Sorry I wasn't more clear before...  You're correct:  argus -> radium -> rastream -> racluster -> Splunk

The chain gets screwed up with radium.  If I run ralabel against argus directly the labels are fine.

I didn't use an rarc file because (at the time) that just seemed like one more config file to learn and I was under the gun to get it working.  I use the "+0ltime" to move that field to the front of the results.  Splunk wasn't recognizing the unix timestamp in the events, so it was easier to force it to use the first field of a new line.

I didn't know there was a -d option.  It's not in the online documentation or the --help output.  I'll give it a shot.

I tried converting that tcpdump pcap file with the latest release of Argus and I got the same crash again.  Where do I find that tool that you wanted me to run argus inside of to capture more data about the fault?

Thanks.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Thursday, July 25, 2013 10:11 AM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

Hey Craig,
Did you get a chance to change the timeout value for nanosleep() to see if
that helped your CPU utilization ??
All is well and happy ????

Carter

On Jul 23, 2013, at 3:24 PM, Craig Merchant <cmerchant at responsys.com> wrote:

I've successfully compiled the 3.0.7.4 version of argus on both my sensors.  I added the ARGUS_FAR_STATUS_INTERVAL=5 to /etc/argus.conf.  I checked the /root/argus-3.0.7.4/support/Config/argus.conf file for the ARGUS_FAR_STATUS_INTERVAL (and any other new config options), but it wasn't in the file.  Argus started up just fine.

The percentage of flows that Argus can't determine the direction of is about 20%, which is dramatically better than the 40-60% it was doing with previous versions.  The CPU utilization is still really high (90-100% most of the time).  Are there any changes to the ARGUS_FAR_STATUS_INTERVAL that you think would improve it further?

I downloaded the 3.0.7.12 version of the clients and ran configure:  ./configure --with-GeoIP=yes

When I ran make, I got the following error:

In file included from ./raclient.c:48:
./rasqlinsert.h:87:31: error: readline/readline.h: No such file or directory
./raclient.c: In function 'RaProcessEventRecord':
./raclient.c:1717: error: 'Bytef' undeclared (first use in this function)
./raclient.c:1717: error: (Each undeclared identifier is reported only once
./raclient.c:1717: error: for each function it appears in.)
./raclient.c:1717: error: expected expression before ')' token
make[2]: *** [raclient.o] Error 1
make[2]: Leaving directory `/root/argus-clients-3.0.7.12/examples/ramysql'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/root/argus-clients-3.0.7.12/examples'
make: *** [all] Error 2

Thanks.

Craig


From: Carter Bullard [mailto:carter at qosient.com]
Sent: Monday, July 22, 2013 6:50 PM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

Well you should have used the ./support/Config/argus.conf file as a starter
configuration, and it has that variable.  The default is 5 seconds.

You should definitely grab argus-3.0.7.4 and try that.
Grab the current argus-latest.tar.gz.

Carter

On Jul 22, 2013, at 9:43 PM, Craig Merchant <cmerchant at responsys.com> wrote:

I do...  And it looks like the majority of them have direction problems...

My argus.conf doesn't have that setting in it - and neither does /root/argus-3.0.7.3/support/Config/argus.conf.  Is that a configuration option new to the release you just posted today?  I haven't had a chance to download and install it yet.

What about the ARGUS_ENV="PCAP_MEMORY=300000" setting?  I see it's disabled in the default argus.conf file.  If I want to use pf_ring, is there any way that setting could be impacting things?

Thx.

C

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Monday, July 22, 2013 6:12 PM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

The newest version of argus on the dev server fixes the bug you reported where argus seg faults on your packet file.  The bug was introduced when we added larger timeout values trying to fix your problem.

Do any of your records have " dur gt 5 ", assuming your ARGUS_FAR_STATUS_INTERVAL is 5 seconds?

Carter

On Jul 22, 2013, at 8:14 PM, Craig Merchant <cmerchant at responsys.com> wrote:
Hey, Carter...

I ran a search on the last 200,000 records that had a "?" in the direction field and only about 7% of them had a "g" in the flags.  If gaps in the packets were the problem - whether from an overloaded port, driver, or asymmetric flows (we are using a pair of Cisco VSS switches, but the NetOps team swears that the SPAN port sees all traffic from both switches) - wouldn't we expect that number to be a lot higher?
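
A minimal sketch of the measurement described above, assuming the records have already been exported as (dir, flgs) text pairs (for example as two delimited columns from an ra client; the exact export command and layout are assumptions, and the function name is hypothetical). It counts the flows whose direction field contains " ? " and reports the share of them that also carry a " g " in flgs:

```python
def gap_share_of_unknown_direction(rows):
    """Return the fraction of '?'-direction flows that also have a 'g' flag.

    rows: iterable of (dir, flgs) string pairs, e.g. ("?>", " e g").
    This input layout is a hypothetical export format, not a fixed ra format.
    """
    unknown = [flgs for direction, flgs in rows if "?" in direction]
    if not unknown:
        return 0.0  # no unknown-direction flows at all
    with_gaps = sum(1 for flgs in unknown if "g" in flgs)
    return with_gaps / len(unknown)


if __name__ == "__main__":
    # Made-up sample rows: three unknown-direction flows, one with a gap.
    sample = [("?>", " e g"), ("?>", " e"), ("->", " e g"), ("<?>", " e")]
    print(gap_share_of_unknown_direction(sample))
```

Following the reasoning about " g " flags elsewhere in this thread, a low share suggests that random packet loss (which shows up as gaps) is not the main reason the direction is unknown.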

In your example "ra -S argus.source -M xml - man", can ra read from radium or can it only read from a file?  I presume both are supported since you used the -S switch instead of -r, but when I run it against my radium instance, the command never exits or displays any results.  Do I need to specify an interval for ra to connect?

While I'm doing this testing, I'm running one host with pf_ring and one with the normal Intel ixgbe driver and the directional issues are pretty much even across both hosts.  I've tried connecting my raclients to the argus instances directly (and thus not using radium), but the results are pretty much the same.

When you refer to modifying "sleep timeouts", what configuration option are you referring to?  Is that the IP/TCP timeouts in argus.conf?  I looked through argus.conf, radium.conf, and rarc.conf for "sleep" and didn't find anything...

As for hard-coding destination ports...  Any kind of CSV file or iana-formatted file that you use for ralabel would be easy for me to work with.

Did you have a chance to look at the tcpdump I sent you and see how well Argus picks out the direction from the flows?

Thx.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Monday, July 22, 2013 1:40 PM
To: Craig Merchant
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

Hey Craig,
So, the whole point to this exercise has been to determine whether
you are getting all the packets from the wire, because
you think you are seeing too many " ? " in your TCP direction
field.

When the sensor doesn't see all the packets that it can,
the most important indicator is a " g " in the flgs field.
This indicates that there are packet gaps that the flow
modeler has detected, which are sequence numbers never seen.
You should be seeing " g "s if random packet loss from
the wire to argus is occurring.

If this was/is the case, then changing the sleep timeouts should
help a great deal in reducing the occurrence of " g "s and the
mystery of the apparent lack of SYN and SYN_ACKs would be solved.

If not, but argus is still not reporting all the direction
that you think it should, then selective loss of the SYN
and SYN_ACK packets is a possibility.

pf_ring would be a most natural place to point the finger, in this case.

The argus "man" record reports libpcap packet drop stats,
which count the number of packets that were received and
ready for processing, but were not read.  You can print that
number like this:

   ra -S argus.source -M xml - man

And you will get something like this:

<?xml version ="1.0" encoding="UTF-8"?>
<!--Generated by ra(3.0.7.12) QoSient, LLC-->
<ArgusDataStream
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation = "http://qosient.com/argus/Xml/ArgusRecord.3.0.xsd"
  BeginDate = "2013-07-15T13:56:43.109557" CurrentDate = "2013-07-22T16:33:37.086812"
  MajorVersion = "3" MinorVersion = "0" InterfaceType = "DLT_NULL" InterfaceStatus = "Up"
  ArgusSourceId = "192.168.0.68"  NetAddr = "0.0.0.0"  NetMask = "0.0.0.0">

 <ArgusManagementRecord  StartTime = "2013-07-22T16:33:36.982927" Duration = "614213.875000" Flags = "         " Proto = "man" PktsRcvd = "0" Records = "0" BytesRcvd = "0" PktsDropped = "0" State = "STA"></ArgusManagementRecord>
 <ArgusManagementRecord  StartTime = "2013-07-22T16:33:43.194437" Duration = "60.101017" Flags = "         " Proto = "man" PktsRcvd = "52114" Records = "57" BytesRcvd = "47541540" PktsDropped = "0" State = "CON"></ArgusManagementRecord>

The PktsDropped value is something to look for.
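
A minimal sketch for pulling that value out of the XML above, assuming the ra output has been captured one record per line as in the example. It uses only Python's standard XML parser, keeps StartTime, PktsRcvd and PktsDropped from each ArgusManagementRecord, and skips the prolog and stream header; the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# The two management records from the sample output above.
SAMPLE = [
    '<ArgusManagementRecord  StartTime = "2013-07-22T16:33:36.982927" Duration = "614213.875000" Flags = "         " Proto = "man" PktsRcvd = "0" Records = "0" BytesRcvd = "0" PktsDropped = "0" State = "STA"></ArgusManagementRecord>',
    '<ArgusManagementRecord  StartTime = "2013-07-22T16:33:43.194437" Duration = "60.101017" Flags = "         " Proto = "man" PktsRcvd = "52114" Records = "57" BytesRcvd = "47541540" PktsDropped = "0" State = "CON"></ArgusManagementRecord>',
]


def drop_stats(lines):
    """Return (StartTime, PktsRcvd, PktsDropped) for each management record line."""
    stats = []
    for line in lines:
        line = line.strip()
        if not line.startswith("<ArgusManagementRecord"):
            continue  # skip the XML prolog, ArgusDataStream header, blank lines
        rec = ET.fromstring(line)
        stats.append((rec.get("StartTime"),
                      int(rec.get("PktsRcvd")),
                      int(rec.get("PktsDropped"))))
    return stats


def records_with_drops(stats):
    """Keep only the records whose PktsDropped counter is non-zero."""
    return [s for s in stats if s[2] > 0]


if __name__ == "__main__":
    for start, rcvd, dropped in drop_stats(SAMPLE):
        print(start, rcvd, dropped)
```

In practice the lines could come from saving the output of the ra command above to a file and reading it line by line.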

If there is still a mystery, flows with the " ? " will exist naturally.
Flows that are long lived, with idle periods longer than the TCP timeout
period, will present with the " ? ".  Also when there is asymmetry, such
as load balancing, you may miss the SYN and SYN_ACK completely.
You get what you get, in that case.
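
A simplified sketch of the idle-timeout effect described above. It assumes a flow cache that closes a record once the connection has been idle longer than a timeout, and that direction can only be assigned when a record begins with a SYN or SYN_ACK; both are simplifications for illustration, not argus's actual flow modeler. The function name, packet list and 60-second timeout are made up:

```python
def split_into_records(packets, idle_timeout):
    """Split one TCP connection's packets into flow records.

    packets: time-ordered list of (timestamp_seconds, tcp_flags) pairs,
             with flags written as letters like "S", "SA", "A", "PA".
    Returns (start_time, direction) pairs, where direction is "known" if
    the record starts with a SYN or SYN_ACK and "?" otherwise.
    """
    records = []
    last_seen = None
    for ts, flags in packets:
        if last_seen is None or ts - last_seen > idle_timeout:
            # First packet, or the cache entry expired while idle:
            # a new record starts with whatever packet arrives next.
            records.append((ts, "known" if "S" in flags else "?"))
        last_seen = ts
    return records


if __name__ == "__main__":
    # Handshake, a little traffic, then a 99-second idle gap.
    conn = [(0.0, "S"), (0.1, "SA"), (0.2, "A"), (1.0, "PA"), (100.0, "PA"), (101.0, "A")]
    for start, direction in split_into_records(conn, idle_timeout=60):
        print(start, direction)
```

With a 60-second timeout the connection comes out as two records, and only the first one saw the handshake; a longer timeout keeps it as a single record with a known direction.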

We provide some means to control the direction, when it's unknown.
If you want to propose other client based mechanisms, holler away.

Carter


On Jul 21, 2013, at 1:15 AM, Craig Merchant <cmerchant at responsys.com> wrote:

Just an FYI...  Apparently the DNA/libzero drivers from NTOP support pcap_stats().  But I have absolutely no idea how to access those stats...

From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
Sent: Saturday, July 20, 2013 4:03 AM
To: ntop-misc at listgateway.unipi.it
Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings

Hi Craig
yes, libpcap over dna cluster queue provides pcap_stats() support.

Alfredo

On Jul 18, 2013, at 9:01 PM, Craig Merchant <cmerchant at responsys.com> wrote:

Alfredo,

I ran both pfcount -i dnacluster:10@28 (the queue argus monitors) and pfcount -i dna0 (when pfdnacluster_master wasn't running).  Both of them showed a 0.1% packet loss.

What about this question that Carter had:

Does the pfdnacluster_master queue provide standard pcap_stats() ?
We should be able to look at the MARs, which will tell us how
many packets the interface dropped.

I'm not familiar with what pcap_stats() are...

Thanks.

Craig

From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
Sent: Thursday, July 18, 2013 12:44 AM
To: ntop-misc at listgateway.unipi.it
Subject: Re: [Ntop-misc] FW: [ARGUS] Direction and IP/TCP timeout settings

Hi Craig
what do you mean by "Pfcount says that the queue that argus is running on is only dropping 0.1% of packets"? You should look at the stats on the queue argus is using.
Select/poll are not supported by the cluster as we experienced that using usleep behaves better than the poll implementation in this case.

Alfredo

On Jul 16, 2013, at 1:51 AM, Craig Merchant <cmerchant at responsys.com> wrote:

I'm trying to troubleshoot some issues with the argus netflow tool running on top of pfdnacluster_master.  Pfcount says that the queue that argus is running on is only dropping 0.1% of packets, yet argus can't figure out the direction of about 60% of the flows.  That means for some reason it isn't seeing the SYN and SYNACK of a lot of flows.

The argus developer had a couple questions about the pfdnacluster_master that I can't answer...  They are below.

Thanks.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Monday, July 15, 2013 3:13 PM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Direction and IP/TCP timeout settings

Hey Craig,
If radium doesn't keep up, the argi will drop the connections,
so unless you see radium losing its connection and
then re-establishing, I don't think it's radium.  We can measure
all of this, so it's not going to be hard to track down, I don't
think.

If argus is generating the same number of flows, then it's probably
seeing the same traffic.  So, it seems that we are not getting all
the packets, and it doesn't appear to be due to argus running
out of cycles.  Are we running out of memory? How does vmstat look
on the machine ??  Not swapping out ?

To understand this issue, I need to know if the pfdnacluster_master queue
is a selectable packet source, or not.  We want to use select() to get
packets, so that we can leverage select()'s timeout feature to wake
us up, periodically, so we can do some background maintenance, like queue
timeouts, etc...

When we can't select(), we have to poll the interface, and if
there isn't anything there, we could fall into a nanosleep() call,
waiting for packets.  That may be a very bad thing, causing us to
lose packets.
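
A minimal sketch of the select()-with-timeout pattern described above, written in Python for illustration on a Unix-like system. A datagram socket stands in for a selectable packet source (an assumption; a real sensor would select on its capture descriptor), and capture_loop, on_packet and on_maintenance are hypothetical names. Because select() returns either when data is ready or when the timeout expires, the loop never sleeps blindly while packets are queued, yet housekeeping still runs on schedule:

```python
import select
import socket
import time


def capture_loop(src, interval, run_for, on_packet, on_maintenance):
    """Read from a selectable source; run housekeeping every `interval` seconds.

    src: any object select() accepts (here a socket standing in for a
         packet capture descriptor).
    run_for: how long to run, in seconds, so the sketch terminates.
    """
    start = time.monotonic()
    next_maint = start + interval
    while time.monotonic() - start < run_for:
        # Block at most until the next housekeeping deadline, but wake
        # immediately if a packet is ready.
        wait = max(0.0, next_maint - time.monotonic())
        readable, _, _ = select.select([src], [], [], wait)
        if readable:
            on_packet(src.recv(65535))
        if time.monotonic() >= next_maint:
            on_maintenance()  # e.g. queue timeouts and other background work
            next_maint += interval


if __name__ == "__main__":
    rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    tx.send(b"pkt1")
    tx.send(b"pkt2")
    packets, ticks = [], []
    capture_loop(rx, 0.05, 0.3, packets.append, lambda: ticks.append(time.monotonic()))
    print(packets, len(ticks))
```

The polling alternative would be a non-blocking read followed by a fixed sleep whenever nothing is there; any packets that arrive during that sleep have to sit in the capture buffer, which is where the risk of loss comes from.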

Does the pfdnacluster_master queue provide standard pcap_stats() ?
We should be able to look at the MARs, which will tell us how
many packets the interface dropped.

Not sure that I understand the problem with multiple argus processes?
You can run 24 copies of argus, and have radium connect to them
all to recreate the single argus data stream, if that is something
you would like to do.
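
As a rough picture of recombining several per-process streams into one, here is a time-ordered merge using Python's heapq.merge. It is only a sketch that assumes each stream is already sorted by record start time; it is not a description of how radium itself combines its inputs, and merge_streams and the (start_time, label) tuples are hypothetical stand-ins for argus records:

```python
import heapq


def merge_streams(*streams):
    """Merge per-sensor record lists, each sorted by start time, into one list."""
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))


if __name__ == "__main__":
    argus_a = [(1.0, "a1"), (3.0, "a2")]
    argus_b = [(2.0, "b1"), (4.0, "b2")]
    print(merge_streams(argus_a, argus_b))
```

heapq.merge consumes its inputs lazily, so the same idea would also work on long-running iterators rather than lists.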

Let's focus on this new interface.  It could be we have to do something
special to get the best performance out of it.

Carter


_______________________________________________
Ntop-misc mailing list
Ntop-misc at listgateway.unipi.it
http://listgateway.unipi.it/mailman/listinfo/ntop-misc

