[Ntop-misc] Direction and IP/TCP timeout settings

Carter Bullard carter at qosient.com
Mon Jul 22 21:12:07 EDT 2013


The newest version of argus on the dev server fixes the bug you reported where argus seg faults on your packet file.  The bug was introduced when we added larger timeout values trying to fix your problem.

Do any of your records have a " dur gt 5 "  assuming your ARGUS_FAR_STATUS_INTERVAL is 5 seconds ?

Carter

On Jul 22, 2013, at 8:14 PM, Craig Merchant <cmerchant at responsys.com> wrote:

> Hey, Carter…
>  
> I ran a search on the last 200,000 records that had a “?” in the direction field and only about 7% of them had a “g” in the flags.  If gaps in the packets were the problem – whether from an overloaded port, driver, or asymmetric flows (we are using a pair of Cisco VSS switches, but the NetOps team swears that the SPAN port sees all traffic from both switches) – wouldn’t we expect that number to be a lot higher?
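
The check described above (what share of the " ? " records also carry a " g " flag) can be sketched roughly as follows. This is only an illustration: the record layout (a dict with "dir" and "flgs" strings) and the sample values are assumptions, not actual ra output.

```python
def gap_fraction(records):
    """Return the share of unknown-direction records that also carry a gap flag."""
    # A record counts as "unknown direction" when its dir field contains "?".
    unknown = [r for r in records if "?" in r["dir"]]
    if not unknown:
        return 0.0
    with_gaps = sum(1 for r in unknown if "g" in r["flgs"])
    return with_gaps / len(unknown)


# Hypothetical sample records, made up for illustration only.
sample = [
    {"dir": "<?>", "flgs": " e g     "},
    {"dir": "<?>", "flgs": " e       "},
    {"dir": " ->", "flgs": " e       "},
    {"dir": " ?>", "flgs": " e       "},
]

fraction = gap_fraction(sample)
print(f"{fraction:.1%} of unknown-direction records show gaps")
```

A low share here suggests that sequence-number gaps alone do not account for the unknown directions.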
>  
> In your example “ra -S argus.source -M xml - man”, can ra read from radium or can it only read from a file?  I presume both are supported since you used the -S switch instead of -r, but when I run it against my radium instance, the command never exits or displays any results.  Do I need to specify an interval for ra to connect?
>  
> While I’m doing this testing, I’m running one host with pf_ring and one with the normal Intel ixgbe driver and the directional issues are pretty much even across both hosts.  I’ve tried connecting my raclients to the argus instances directly (and thus not using radium), but the results are pretty much the same.
>  
> When you refer to modifying “sleep timeouts”, what configuration option are you referring to?  Is that the IP/TCP timeouts in argus.conf?  I looked through argus.conf, radium.conf, and rarc.conf for “sleep” and didn’t find anything…
>  
> As for hard-coding destination ports…  Any kind of CSV file or iana-formatted file that you use for ralabel would be easy for me to work with.
>  
> Did you have a chance to look at the tcpdump I sent you and see how well Argus picks out the direction from the flows?
>  
> Thx.
> 
> Craig
>  
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, July 22, 2013 1:40 PM
> To: Craig Merchant
> Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings
>  
> Hey Craig,
> So, the whole point to this exercise has been to determine if
> you are not getting all the packets from the wire, because
> you think you are seeing too many " ? " in your TCP direction
> field.
>  
> When the sensor doesn't see all the packets that it can,
> the most important indicator is a " g " in the flgs field.
> This indicates that there are packet gaps that the flow
> modeler has detected, which are sequence numbers never seen.
> You should be seeing " g "s if random packet loss from
> the wire to argus is occurring.
>  
> If this was/is the case, then changing the sleep timeouts should
> help a great deal in reducing the occurrence of " g "s and the
> mystery of the apparent lack of SYN and SYN_ACKs would be solved.
>  
> If not, but argus is still not reporting all the direction
> that you think it should, then selective loss of the SYN
> and SYN_ACK packets is a possibility.  
>  
> pf_ring would be a most natural place to point the finger, in this case.
>  
> The argus "man" record reports libpcap packet drop stats,
> which count the number of packets that were received and
> ready for processing, but were not read.  You can print that
> number like this:
>  
>    ra -S argus.source -M xml - man
>  
> And you will get something like this:
>  
> <?xml version ="1.0" encoding="UTF-8"?>
> <!--Generated by ra(3.0.7.12) QoSient, LLC-->
> <ArgusDataStream
>   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" 
>   xsi:noNamespaceSchemaLocation = "http://qosient.com/argus/Xml/ArgusRecord.3.0.xsd"
>   BeginDate = "2013-07-15T13:56:43.109557" CurrentDate = "2013-07-22T16:33:37.086812"
>   MajorVersion = "3" MinorVersion = "0" InterfaceType = "DLT_NULL" InterfaceStatus = "Up"
>   ArgusSourceId = "192.168.0.68"  NetAddr = "0.0.0.0"  NetMask = "0.0.0.0">
>  
>  <ArgusManagementRecord  StartTime = "2013-07-22T16:33:36.982927" Duration = "614213.875000" Flags = "         " Proto = "man" PktsRcvd = "0" Records = "0" BytesRcvd = "0" PktsDropped = "0" State = "STA"></ArgusManagementRecord>
>  <ArgusManagementRecord  StartTime = "2013-07-22T16:33:43.194437" Duration = "60.101017" Flags = "         " Proto = "man" PktsRcvd = "52114" Records = "57" BytesRcvd = "47541540" PktsDropped = "0" State = "CON"></ArgusManagementRecord>
>  
> The PktsDropped value is something to look for.
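
One possible way to pull the PktsRcvd and PktsDropped values out of that XML output is sketched below. It assumes each ArgusManagementRecord sits on a single line, as in the sample above, and the function name drop_stats is made up for this example.

```python
import xml.etree.ElementTree as ET


def drop_stats(lines):
    """Yield (start_time, pkts_rcvd, pkts_dropped) for each management record line."""
    for line in lines:
        line = line.strip()
        # Skip the XML header, the ArgusDataStream element and anything else.
        if not line.startswith("<ArgusManagementRecord"):
            continue
        rec = ET.fromstring(line)
        yield (
            rec.get("StartTime"),
            int(rec.get("PktsRcvd")),
            int(rec.get("PktsDropped")),
        )


# One record copied from the sample output above.
sample = [
    ' <ArgusManagementRecord  StartTime = "2013-07-22T16:33:43.194437" '
    'Duration = "60.101017" Flags = "         " Proto = "man" '
    'PktsRcvd = "52114" Records = "57" BytesRcvd = "47541540" '
    'PktsDropped = "0" State = "CON"></ArgusManagementRecord>',
]

for start, rcvd, dropped in drop_stats(sample):
    note = "drops reported" if dropped > 0 else "no drops reported"
    print(start, rcvd, dropped, note)
```

Any record with a non-zero PktsDropped would be the one to look at more closely.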
>  
> If there is still a mystery, flows with the " ? " will exist naturally.
> Flows that are long lived, with idle periods longer than the TCP timeout
> period, will present with the " ? ".  Also when there is asymmetry, such
> as load balancing, you may miss the SYN and SYN_ACK completely.
> You get what you get, in that case.
>  
> We provide some means to control the direction, when it's unknown.
> If you want to propose other client based mechanisms, holler away.
>  
> Carter
>  
>  
> On Jul 21, 2013, at 1:15 AM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> 
> Just an FYI…  Apparently the DNA/libzero drivers from NTOP support pcap_stats().  But I have absolutely no idea how to access those stats…
>  
> From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
> Sent: Saturday, July 20, 2013 4:03 AM
> To: ntop-misc at listgateway.unipi.it
> Subject: Re: [Ntop-misc] [ARGUS] Direction and IP/TCP timeout settings
>  
> Hi Craig
> yes, libpcap over dna cluster queue provides pcap_stats() support.
>  
> Alfredo
>  
> On Jul 18, 2013, at 9:01 PM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> 
> 
> Alfredo,
>  
> I ran both pfcount -i dnacluster:10@28 (the queue argus monitors) and pfcount -i dna0 (when pfdnacluster_master wasn’t running).  Both of them showed a 0.1% packet loss.
>  
> What about this question that Carter had:
>  
> Does the pfdnacluster_master queue provide standard pcap_stats() ?
> We should be able to look at the MARs, which will tell us  how
> many packets the interface dropped.
>  
> I’m not familiar with what pcap_stats() are…
>  
> Thanks.
>  
> Craig
>  
> From: ntop-misc-bounces at listgateway.unipi.it [mailto:ntop-misc-bounces at listgateway.unipi.it] On Behalf Of Alfredo Cardigliano
> Sent: Thursday, July 18, 2013 12:44 AM
> To: ntop-misc at listgateway.unipi.it
> Subject: Re: [Ntop-misc] FW: [ARGUS] Direction and IP/TCP timeout settings
>  
> Hi Craig
> what do you mean by "Pfcount says that the queue that argus is running on is only dropping 0.1% of packets"? You should look at the stats on the queue argus is using.
> Select/poll are not supported by the cluster, as we found that using usleep behaves better than the poll implementation in this case.
>  
> Alfredo
>  
> On Jul 16, 2013, at 1:51 AM, Craig Merchant <cmerchant at responsys.com> wrote:
> 
> 
> 
> 
> I’m trying to troubleshoot some issues with the argus netflow tool running on top of pfdnacluster_master.  Pfcount says that the queue that argus is running  on is only dropping 0.1% of packets, yet argus can’t figure out the direction of about 60% of the flows.  That means for some reason it isn’t seeing the SYN and SYNACK of a lot of flows.
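
A rough back-of-the-envelope check of why the two numbers above look inconsistent. It assumes packet loss is random and independent per packet, which may not hold on a real capture path, and the function name is made up for this sketch.

```python
def miss_probabilities(loss_rate):
    """Return (P(miss SYN), P(miss both SYN and SYN-ACK)) under independent loss."""
    return loss_rate, loss_rate ** 2


loss_rate = 0.001         # the 0.1% loss reported by pfcount
observed_unknown = 0.60   # about 60% of flows with unknown direction

p_syn, p_both = miss_probabilities(loss_rate)
print(f"chance of missing the SYN alone:       {p_syn:.4%}")
print(f"chance of missing SYN and SYN-ACK:     {p_both:.6%}")
print(f"observed unknown-direction share:      {observed_unknown:.0%}")
print(f"observed / expected (SYN alone) ratio: {observed_unknown / p_syn:.0f}x")
```

Under that assumption, uniform random loss at this rate is orders of magnitude too small to explain the unknown directions, which is why selective loss or asymmetry is worth considering.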
>  
> The argus developer had a couple questions about the pfdnacluster_master that I can’t answer…  They are below.
>  
> Thanks.
> 
> Craig
>  
> From: Carter Bullard [mailto:carter at qosient.com] 
> Sent: Monday, July 15, 2013 3:13 PM
> To: Craig Merchant
> Cc: Argus (argus-info at lists.andrew.cmu.edu)
> Subject: Re: [ARGUS] Direction and IP/TCP timeout settings
>  
> Hey Craig,
> If radium doesn't keep up, the argi will drop the connections,
> so unless you see radium losing its connection and 
> then re-establishing, I don't think it's radium.  We can measure
> all of this, so it's not going to be hard to track down, I don't
> think.
>  
> If argus is generating the same number of flows, then it's probably
> seeing the same traffic.  So, it seems that we are not getting all
> the packets, and it doesn't appear to be due to argus running
> out of cycles.  Are we running out of memory? How does vmstat look
> on the machine ??  Not swapping out ?
>  
> To understand this issue, I need to know if the pfdnacluster_master queue
> is a selectable packet source, or not.  We want to use select() to get
> packets, so that we can leverage the select()'s timeout feature to wake
> us up, periodically, so we can do some background maintenance, like queue
> timeouts, etc…
>  
> When we can't select(), we have to poll the interface, and if
> there isn't anything there, we could fall into a nanosleep() call,
> waiting for packets.  That may be a very bad thing, possibly causing us to
> lose packets.
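
The difference between the two approaches described above can be sketched in a few lines. This is not argus code (argus is written in C on top of libpcap); a socketpair stands in for the packet source, and the function names and timing values are assumptions chosen for illustration.

```python
import select
import socket
import time


def run_with_select(sock, timeout, deadline):
    """Wait on sock with select(); a timeout doubles as a maintenance wakeup."""
    packets, wakeups = [], 0
    while time.monotonic() < deadline:
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            packets.append(sock.recv(4096))
        else:
            wakeups += 1  # timeout expired: slot for queue timeouts, etc.
    return packets, wakeups


def run_with_polling(sock, nap, deadline):
    """Poll a non-blocking sock; sleep when empty. Data arriving mid-sleep waits."""
    sock.setblocking(False)
    packets, naps = [], 0
    while time.monotonic() < deadline:
        try:
            packets.append(sock.recv(4096))
        except BlockingIOError:
            naps += 1
            time.sleep(nap)  # nothing is read until the sleep ends
    return packets, naps


reader, writer = socket.socketpair()

writer.send(b"packet-1")
sel_packets, wakeups = run_with_select(
    reader, timeout=0.05, deadline=time.monotonic() + 0.2)

writer.send(b"packet-2")
poll_packets, naps = run_with_polling(
    reader, nap=0.05, deadline=time.monotonic() + 0.2)

print(sel_packets, wakeups)
print(poll_packets, naps)
```

With select(), the process is woken as soon as data is ready or the timeout fires. With the polling loop, anything that arrives during the sleep sits in the buffer until the sleep ends, and if the buffer is small and traffic is heavy that is where loss could happen.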
>  
> Does the pfdnacluster_master queue provide standard pcap_stats() ?
> We should be able to look at the MARs, which will tell us  how
> many packets the interface dropped.
>  
> Not sure that I understand the problem with multiple argus processes?
> You can run 24 copies of argus, and have radium connect to them
> all to recreate the single argus data stream, if that is something
> you would like to do.
>  
> Let's focus on this new interface.  It could be we have to do something
> special to get the best performance out of it.
>  
> Carter
>  
> 
>  
> _______________________________________________
> Ntop-misc mailing list
> Ntop-misc at listgateway.unipi.it
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc
>  