A couple troubleshooting questions...

Carter Bullard carter at qosient.com
Fri Jul 25 17:51:15 EDT 2014


Hey Peter,
Glad to see you're still among the digitally living.  Long time no read !!!  

The link for your note regarding performance etc... on the web site is:

   http://www.qosient.com/argus/sensorPerformance.shtml

Hope all is and will be most excellent !!!

Carter

On Jul 25, 2014, at 5:02 PM, Peter Van Epp <vanepp at sfu.ca> wrote:

> On Thu, Jul 24, 2014 at 09:22:51AM -0700, Craig Merchant wrote:
>> Just got some:
>> 
>> 
>> 
>> <ArgusManagementRecord  StartTime = "1395380563.321"    Flags = "         "     Proto = "man"   PktsRcvd = "0"  Records = "0"   BytesRcvd = "0"         PktsDropped = "0"       State = "STA" SrcUserData = ""></ArgusManagementRecord>
>> 
> <snip>
>> 
>> While these records were being generated, I ran the ra client and grep'd for '*\sg' and I saw a ton of flows with gaps.  So, from what you said earlier in the thread, if the problem is that Argus can't keep up, PktsDropped would be greater than zero.
>> 
>> 
>> 
>> We recently implemented a bunch of Gigamon taps instead of using SPAN ports on the Catalyst 6500s.  So, I'm guessing that is where the problem may lie.  I'll have to talk to our netops team to see what kind of troubleshooting tools are available in the Gigamon devices.  You probably have a lot of experience with them, so any suggestions are appreciated.
>> 
> 
> 	Welcome to the fun game of "packet packet who lost the packet?" :-).
> While I'm rusty (I've been retired for 5+ years now) when I first started
> using argus 20+ years ago I poked at (and verified to my satisfaction) that
> argus could accurately count traffic and some of the methods I used then should
> still apply. For background there used to be some notes on hardware performance
> and the various places packet loss can occur on the argus web site (I no longer
> have the url to hand). That said I'd hope that the Gigamon solution would do 
> better than a span port on a switch as it is designed to capture packets rather
> than doing it as an after thought if there are resources as a span port does. 
> 	While I looked at Gigamon just before I retired, I don't have direct
> experience with them. However I think you should be able to implement what
> Netoptics called regen taps i.e. two (or more) monitor ports outputting the 
> same data with the Gigamon. I had a 4 port optical regen tap (at gig in those
> days) and a gig sniffer that could capture at wire speed (for a small amount
> of time, unfortunately :-)) to test with without impacting production, which
> was nice. I think (your netops folks willing, I was lucky enough to be both
> net engineering and the security guy) that the Gigamon should let you do the
> same thing, i.e. give you a test tap of the same data that argus is seeing on
> another port where you can try and see what is happening. I say try and see
> because your test setup has a similar problem to argus: if the limitation is
> hardware in the interface cards losing packets (as opposed to the packets
> being lost in the main real connection or in the Gigamon, which are also both
> possible), the test system may lose them too. However the test system isn't
> also trying to process the packets as argus is, so if the loss is in fact in
> the argus system it may do better.
> 	The ideal situation is to have a capture device (such as a network 
> sniffer or Endace Ninja) that you are sure can keep up with the wire at least 
> for a while and/or a test generator (I used to use tcpreplay for that; at
> 10 gigs I expect you are in Ninja country) that can generate known traffic in
> a test setup. At 10 gigs all of this is challenging, however (as it was even at
> gig back in the day :-)). One possibly useful thing (again with the
> warning about potential performance issues internally) is the counters and
> statistics from the network switches and Gigamon. It can be very instructive
> (and equally hard to do in a non test environment with known traffic) to 
> compare the packet and byte counters that your core switches, the gigamon and 
> argus report for the same time interval (which is usually the rub, finding a 
> correct time interval). Here longish sample times tend to even out the 
> truncation errors caused by uneven start and end times (i.e. loss of a couple 
> of hundred packets in 10s of thousands is less significant than loss of 100s 
> of packets in a 1000 packet sample). As noted the ideal (but possibly too 
> expensive) way is to have an IP traffic generator that can generate a 
> repeatable, longish typical (perhaps recorded from the link with tcpdump) 
> traffic from the link at wire speed in a test setup. That enables you to 
> identify what is losing traffic: the Gigamon, the network interface cards in
> the argus box at a hardware level, the OS level tap software (pcap, pf-ring
> etc.) or the argus process itself (or all of them, which is unfortunately also
> possible :-)).
> 	Another useful tool can be a tap (hopefully optical to lessen tap loss 
> issues) and a wire speed network switch with rmon or at least port stats and 
> doing nothing else so it has CPU available to capture hopefully accurate counts
> to give you hopefully accurate packet and byte counts at various places in the 
> network. The production path is probably the hardest to arrange but if you can 
> get a tap installed the output of the tap (modulo security, privacy and 
> political issues) is then available for use without being able to impact the 
> production network. If resources are available an Endace Ninja 10 gig capture
> appliance (hundreds K $$$ last I knew :-)) makes an excellent troubleshooting
> tool. It can capture at close to wire speed for a reasonable period of time
> (the one I specced for the local regional network before I retired had 16
> terabytes of disk storage for captures). With such a capture from the 
> production network, it should be possible to replay the traffic in to a test 
> setup and see exactly what is happening as well which is the ideal trouble 
> shooting environment, known repeatable data. 
> 	As you have probably gathered this is also a huge amount of work
> to add to what you already need to do. I was interested in the results and thus
> used to do this on my own time after hours as I usually couldn't justify it 
> as directly work related although I was also fortunate in having bosses that
> saw the value in argus and supported it with both capital for tools and staff
> time. It comes down to how valuable having accurate (and for what value of 
> accurate :-)) data is to your bosses and somewhat if you are interested to 
> figure out what's happening using tools that you couldn't afford on your own. 
> 	Hope some of this helps, and good luck!
> 
> Peter Van Epp
> 
> 
> 
> 
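
Peter's point about comparing counters over longish intervals can be put into
numbers. What follows is only a rough sketch, not anything argus or the Gigamon
provides: the function names and the sample counts are made up for
illustration, and it assumes you have already collected packet totals from each
device by hand for roughly the same time window.

```python
def boundary_error(boundary_packets, total_packets):
    """Fraction of a sample that may be miscounted because the start and
    end times of the devices' counters do not line up exactly."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return boundary_packets / total_packets


def shortfall(counts, reference):
    """Per-device packet shortfall as a fraction of the reference count.

    counts    -- dict of device name -> packets seen in the interval
    reference -- name of the device treated as ground truth (hypothetical,
                 e.g. the core switch port feeding the tap)
    """
    ref = counts[reference]
    if ref <= 0:
        raise ValueError("reference count must be positive")
    return {name: (ref - n) / ref
            for name, n in counts.items() if name != reference}


if __name__ == "__main__":
    # The same 200 packets of edge slop matter far less in a long sample.
    print(boundary_error(200, 50_000))   # long sample
    print(boundary_error(200, 1_000))    # short sample

    # Made-up counts for one interval, purely illustrative.
    sample = {"switch": 1_000_000, "gigamon": 999_000, "argus": 990_000}
    print(shortfall(sample, "switch"))
```

A shortfall that is no larger than the boundary error for the interval tells
you nothing; one that stays well above it as the interval grows points at the
device (or the hop just before it) that is losing packets.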
