[Ntop-misc] Direction and IP/TCP timeout settings

Carter Bullard carter at qosient.com
Tue Jul 23 17:29:20 EDT 2013


Hey Craig,
I'm glad that we fixed your concern.

The variable is ARGUS_FLOW_STATUS_INTERVAL.  Sorry about that.
I grep for INTERVAL and find it every time.

You can now change your ARGUS_TCP_TIMEOUT to see if you can get
the 20% down a bit, but regardless, about 10% of your connections will present
with " ? " in the dir field, as they are long lived flows with idle times that exceed
any practical timeout you would be willing to use, or they are discovery packets,
where people are trying to stimulate your systems into responding.

Nothing you can do about most of those conditions.  " rasqlinsert -M cache " style
clustering can be used to match the initial records to correct the direction, but
there are still flows whose idle times will exceed 24-48 hours, which is the
time horizon of those techniques.
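
As a rough illustration of the matching idea (not how rasqlinsert implements it), the sketch below keeps a cache of flows whose direction was established by an earlier record and uses it to orient later records that arrive with " ? ". The FlowRecord layout, the field names and the 48-hour HORIZON_SECONDS constant are assumptions made for this example only.

```python
from dataclasses import dataclass, replace

# Assumed cache horizon, taken from the upper end of the 24-48 hour figure above.
HORIZON_SECONDS = 48 * 3600


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical, simplified flow record; real argus records carry far more."""
    start: float     # record start time, seconds since epoch
    src: str         # "addr:port" as reported in the record
    dst: str
    direction: str   # "->" when known, "?" when unknown


def correct_directions(records):
    """Orient '?' records using an earlier record of the same flow with a known direction."""
    cache = {}  # frozenset({src, dst}) -> (initiator, responder, last_seen)
    out = []
    for rec in sorted(records, key=lambda r: r.start):
        key = frozenset((rec.src, rec.dst))
        if rec.direction != "?":
            # Direction is known: remember who initiated this flow.
            cache[key] = (rec.src, rec.dst, rec.start)
            out.append(rec)
            continue
        hit = cache.get(key)
        if hit is not None and rec.start - hit[2] <= HORIZON_SECONDS:
            initiator, responder, _ = hit
            cache[key] = (initiator, responder, rec.start)
            out.append(replace(rec, src=initiator, dst=responder, direction="->"))
        else:
            # No usable earlier record (never seen, or idle past the horizon).
            out.append(rec)
    return out
```

A flow that stays idle longer than the horizon falls out of reach of the cache, which is the residual case described above.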

Your CPU utilization is high because we are only sleeping 5 uSec when there
aren't packets to process.  Increase the nanosleep() timer on line 3825 in ./argus/ArgusSource.c
from 5000 to something like 50000, and see if that isn't helpful.
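
For a sense of scale, the arithmetic behind that suggestion can be sketched as below. The function name is made up for this example, and the result is only a ceiling: it assumes the loop does nothing but sleep, and it ignores kernel timer granularity, which usually makes real sleeps longer than requested.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def max_idle_wakeups_per_second(sleep_ns: int) -> float:
    """Upper bound on how often an idle poll loop wakes up for a given sleep length."""
    if sleep_ns <= 0:
        raise ValueError("sleep_ns must be positive")
    return NANOSECONDS_PER_SECOND / sleep_ns


if __name__ == "__main__":
    for sleep_ns in (5_000, 50_000):
        print(sleep_ns, "ns ->", max_idle_wakeups_per_second(sleep_ns), "wakeups/s at most")
```

With these definitions, a 5000 ns sleep allows up to 200,000 idle wakeups per second, while 50000 ns caps it at 20,000.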

Carter


On Jul 23, 2013, at 3:24 PM, Craig Merchant <cmerchant at responsys.com> wrote:

> I’ve successfully compiled the 3.0.7.4 version of argus on both my sensors.  I added the ARGUS_FAR_STATUS_INTERVAL=5 to /etc/argus.conf.  I checked the /root/argus-3.0.7.4/support/Config/argus.conf file for the ARGUS_FAR_STATUS_INTERVAL (and any other new config options), but it wasn’t in the file.  Argus started up just fine.
>  
> The percentage of flows that Argus can’t determine the direction of is about 20%, which is dramatically better than the 40-60% it was doing with previous versions.  The CPU utilization is still really high (90-100% most of the time).  Are there any changes to the ARGUS_FAR_STATUS_INTERVAL that you think would improve it further?
>  
> I downloaded the 3.0.7.12 version of the clients and ran configure:  ./configure --with-GeoIP=yes
>  
> When I ran make, I got the following error:
>  
> In file included from ./raclient.c:48:
> ./rasqlinsert.h:87:31: error: readline/readline.h: No such file or directory
> ./raclient.c: In function ‘RaProcessEventRecord’:
> ./raclient.c:1717: error: ‘Bytef’ undeclared (first use in this function)
> ./raclient.c:1717: error: (Each undeclared identifier is reported only once
> ./raclient.c:1717: error: for each function it appears in.)
> ./raclient.c:1717: error: expected expression before ‘)’ token
> make[2]: *** [raclient.o] Error 1
> make[2]: Leaving directory `/root/argus-clients-3.0.7.12/examples/ramysql'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/root/argus-clients-3.0.7.12/examples'
> make: *** [all] Error 2
>  
> Thanks.
> 
> Craig
>  
>  
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, July 22, 2013 6:50 PM
> To: Craig Merchant
> Cc: Argus (argus-info at lists.andrew.cmu.edu)
> Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings
>  
> Well you should have used the ./support/Config/argus.conf file as a starter
> configuration, and it has that variable.  The default is 5 seconds.
>  
> You should definitely grab argus-3.0.7.4 and try that.
> Grab the current argus-latest.tar.gz.
>  
> Carter
>  
> On Jul 22, 2013, at 9:43 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> 
> I do…  And it looks like the majority of them have direction problems…
>  
> My argus.conf doesn’t have that setting in it – and neither does /root/argus-3.0.7.3/support/Config/argus.conf.  Is that a configuration option new to the release you just posted today?  I haven’t had a chance to download and install it yet. 
>  
> What about the ARGUS_ENV="PCAP_MEMORY=300000" setting?  I see it’s disabled in the default argus.conf file.  If I want to use pf_ring, is there any way that setting could be impacting things?
>  
> Thx.
> 
> C
>  
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, July 22, 2013 6:12 PM
> To: Craig Merchant
> Cc: Argus (argus-info at lists.andrew.cmu.edu)
> Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings
>  
> The newest version of argus on the dev server fixes the bug you reported where argus seg faults on your packet file.  The bug was introduced when we added larger timeout values trying to fix your problem.
>  
> Do any of your records have a " dur gt 5 "  assuming your ARGUS_FAR_STATUS_INTERVAL is 5 seconds ?
> 
> Carter
> 
> On Jul 22, 2013, at 8:14 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> Hey, Carter…
>  
> I ran a search on the last 200,000 records that had a “?” in the direction field and only about 7% of them had a “g” in the flags.  If gaps in the packets were the problem – whether from an overloaded port, driver, or asymmetric flows (we are using a pair of Cisco VSS switches, but the NetOps team swears that the SPAN port sees all traffic from both switches) – wouldn’t we expect that number to be a lot higher?
>  
> In your example “ra -S argus.source -M xml - man”, can ra read from radium or can it only read from a file?  I presume both are supported since you used the -S switch instead of -r, but when I run it against my radium instance, the command never exits or displays any results.  Do I need to specify an interval for ra to connect?
>  
> While I’m doing this testing, I’m running one host with pf_ring and one with the normal Intel ixgbe driver and the directional issues are pretty much even across both hosts.  I’ve tried connecting my raclients to the argus instances directly (and thus not using radium), but the results are pretty much the same.
>  
> When you refer to modifying “sleep timeouts”, what configuration option are you referring to?  Is that the IP/TCP timeouts in argus.conf?  I looked through argus.conf, radium.conf, and rarc.conf for “sleep” and didn’t find anything…
>  
> As for hard-coding destination ports…  Any kind of CSV file or iana-formatted file that you use for ralabel would be easy for me to work with.
>  
> Did you have a chance to look at the tcpdump I sent you and see how well Argus picks out the direction from the flows?
>  
> Thx.
> 
> Craig
>  
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, July 22, 2013 1:40 PM
> To: Craig Merchant
> Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings
>  
> Hey Craig,
> So, the whole point to this exercise has been to determine if
> you are not getting all the packets from the wire, because
> you think you are seeing too many " ? " in your TCP direction
> field.
>  
> When the sensor doesn't see all the packets that it can,
> the most important indicator is a " g " in the flgs field.
> This indicates that there are packet gaps that the flow
> modeler has detected, which are sequence numbers never seen.
> You should be seeing " g "s if random packet loss from
> the wire to argus is occurring.
>  
> If this was/is the case, then changing the sleep timeouts should
> help a great deal in reducing the occurrence of " g "s and the
> mystery of the apparent lack of SYN and SYN_ACKs would be solved.
>  
> If not, but argus is still not reporting all the direction
> that you think it should, then selective loss of the SYN
> and SYN_ACK packets is a possibility.  
>  
> pf_ring would be a most natural place to point the finger, in this case.
>  
> The argus "man" record reports libpcap packet drop stats,
> which count the number of packets that were received and
> ready for processing, but were not read.  You can print that
> number like this:
>  
>    ra -S argus.source -M xml - man
>  
> And you will get something like this:
>  
> <?xml version ="1.0" encoding="UTF-8"?>
> <!--Generated by ra(3.0.7.12) QoSient, LLC-->
> <ArgusDataStream
>   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" 
>   xsi:noNamespaceSchemaLocation = "http://qosient.com/argus/Xml/ArgusRecord.3.0.xsd"
>   BeginDate = "2013-07-15T13:56:43.109557" CurrentDate = "2013-07-22T16:33:37.086812"
>   MajorVersion = "3" MinorVersion = "0" InterfaceType = "DLT_NULL" InterfaceStatus = "Up"
>   ArgusSourceId = "192.168.0.68"  NetAddr = "0.0.0.0"  NetMask = "0.0.0.0">
>  
>  <ArgusManagementRecord  StartTime = "2013-07-22T16:33:36.982927" Duration = "614213.875000" Flags = "         " Proto = "man" PktsRcvd = "0" Records = "0" BytesRcvd = "0" PktsDropped = "0" State = "STA"></ArgusManagementRecord>
>  <ArgusManagementRecord  StartTime = "2013-07-22T16:33:43.194437" Duration = "60.101017" Flags = "         " Proto = "man" PktsRcvd = "52114" Records = "57" BytesRcvd = "47541540" PktsDropped = "0" State = "CON"></ArgusManagementRecord>
>  
> The PktsDropped value is something to look for.
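
A minimal sketch of pulling those counters out of saved output, assuming the stream has been captured as a complete, well-formed XML document (the live output shown above is still open-ended). The attribute names PktsRcvd and PktsDropped come from the sample; the function name and the trimmed SAMPLE string are made up for this example. Whether the received count already includes dropped packets depends on the platform's pcap_stats() behaviour, so the ratio here is only indicative.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the output above, closed so that it parses.
SAMPLE = """<ArgusDataStream>
 <ArgusManagementRecord StartTime = "2013-07-22T16:33:36.982927" Proto = "man" PktsRcvd = "0" PktsDropped = "0" State = "STA"></ArgusManagementRecord>
 <ArgusManagementRecord StartTime = "2013-07-22T16:33:43.194437" Proto = "man" PktsRcvd = "52114" PktsDropped = "0" State = "CON"></ArgusManagementRecord>
</ArgusDataStream>"""


def drop_summary(xml_text):
    """Sum PktsRcvd and PktsDropped over all management records in the document."""
    root = ET.fromstring(xml_text)
    received = dropped = 0
    for mar in root.iter("ArgusManagementRecord"):
        received += int(mar.get("PktsRcvd", "0"))
        dropped += int(mar.get("PktsDropped", "0"))
    # Dropped relative to received; 0.0 when nothing was received.
    ratio = dropped / received if received else 0.0
    return received, dropped, ratio


if __name__ == "__main__":
    print(drop_summary(SAMPLE))
```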
>  
> If there is still a mystery, flows with the " ? " will exist naturally.
> Flows that are long lived, with idle periods longer than the TCP timeout
> period, will present with the " ? ".  Also when there is asymmetry, such
> as load balancing, you may miss the SYN and SYN_ACK completely.
> You get what you get, in that case.
>  
> We provide some means to control the direction, when it's unknown.
> If you want to propose other client based mechanisms, holler away.
>  
> Carter
>  
>  
> On Jul 21, 2013, at 1:15 AM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> Just an FYI…  Apparently the DNA/libzero drivers from NTOP support pcap_stats().  But I have absolutely no idea how to access those stats…
>  
> From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
> Sent: Saturday, July 20, 2013 4:03 AM
> To: ntop-misc at listgateway.unipi.it
> Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings
>  
> Hi Craig
> yes, libpcap over dna cluster queue provides pcap_stats() support.
>  
> Alfredo
>  
> On Jul 18, 2013, at 9:01 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> Alfredo,
>  
> I ran both pfcount -i dnacluster:10@28 (the queue argus monitors) and pfcount -i dna0 (when pfdnacluster_master wasn’t running).  Both of them showed a 0.1% packet loss.
>  
> What about this question that Carter had:
>  
> Does the pfdnacluster_master queue provide standard pcap_stats() ?
> We should be able to look at the MARs, which will tell us  how
> many packets the interface dropped.
>  
> I’m not familiar with what pcap_stats() are…
>  
> Thanks.
>  
> Craig
>  
> From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
> Sent: Thursday, July 18, 2013 12:44 AM
> To: ntop-misc at listgateway.unipi.it
> Subject: Re: [Ntop-misc] FW: [ARGUS] Direction and IP/TCP timeout settings
>  
> Hi Craig
> what do you mean with "Pfcount says that the queue that argus is running  on is only dropping 0.1% of packets"? You should look at the stats on the queue argus is using.
> Select/poll are not supported by the cluster as we experienced that using usleep behaves better than the poll implementation in this case.
>  
> Alfredo
>  
> On Jul 16, 2013, at 1:51 AM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> I’m trying to troubleshoot some issues with the argus netflow tool running on top of pfdnacluster_master.  Pfcount says that the queue that argus is running  on is only dropping 0.1% of packets, yet argus can’t figure out the direction of about 60% of the flows.  That means for some reason it isn’t seeing the SYN and SYNACK of a lot of flows.
>  
> The argus developer had a couple questions about the pfdnacluster_master that I can’t answer…  They are below.
>  
> Thanks.
> 
> Craig
>  
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, July 15, 2013 3:13 PM
> To: Craig Merchant
> Cc: Argus (argus-info at lists.andrew.cmu.edu)
> Subject: Re: [ARGUS] Direction and IP/TCP timeout settings
>  
> Hey Craig,
> If radium doesn't keep up, the argi will drop the connections,
> so unless you see radium losing its connection and 
> then re-establishing, I don't think it's radium.  We can measure
> all of this, so it's not going to be hard to track down, I don't
> think.
>  
> If argus is generating the same number of flows, then it's probably
> seeing the same traffic.  So, it seems that we are not getting all
> the packets, and it doesn't appear to be due to argus running
> out of cycles.  Are we running out of memory? How does vmstat look
> on the machine ??  Not swapping out ?
>  
> To understand this issue, I need to know if the pfdnacluster_master queue
> is a selectable packet source, or not.  We want to use select() to get
> packets, so that we can leverage the select() timeout feature to wake
> us up, periodically, so we can do some background maintenance, like queue
> timeouts, etc…
>  
> When we can't select(), we have to poll the interface, and if
> there isn't anything there, we could fall into a nanosleep() call,
> waiting for packets.  That may be a very bad thing, causing us to
> lose packets.
>  
> Does the pfdnacluster_master queue provide standard pcap_stats() ?
> We should be able to look at the MARs, which will tell us  how
> many packets the interface dropped.
>  
> Not sure that I understand the problem with multiple argus processes?
> You can run 24 copies of argus, and have radium connect to them
> all to recreate the single argus data stream, if that is something
> you would like to do.
>  
> Let's focus on this new interface.  It could be we have to do something
> special to get the best performance out of it.
>  
> Carter
>  
> 
>  
> _______________________________________________
> Ntop-misc mailing list
> Ntop-misc at listgateway.unipi.it
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc
>  
