FW: Couple things...

Craig Merchant cmerchant at responsys.com
Mon Aug 12 17:20:52 EDT 2013


I can't explain the packet loss in that sample...  I just looked at the last 5 million flows from racluster and searched for flows with g or M in the TCP flags.  We experienced about 1.3% of flows with gaps:


tcp_flags    count    percent
e g          64514    1.290280
e gS            49    0.000980
e gD            38    0.000760
eUg              3    0.000060


But when I search those same 5 million flows and count by direction, there are still around 15% that have directional issues:



     direction      count      percent
1    ->           3140573    62.811460
2    <->           636891    12.737820
3    <?>           611214    12.224280
4    <-            411492     8.229840
5    ?>            159660     3.193200
6    who            23154     0.463080
7    -              14152     0.283040
8    <?              2480     0.049600
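
For reference, the "around 15%" figure can be checked from the counts above. This is a small sketch under two assumptions: that the percentages are relative to the 5 million flows searched, and that a "?" in the direction field marks a flow whose direction argus could not establish.

```python
# Counts copied from the direction table above.
DIRECTION_COUNTS = {
    "->": 3140573,
    "<->": 636891,
    "<?>": 611214,
    "<-": 411492,
    "?>": 159660,
    "who": 23154,
    "-": 14152,
    "<?": 2480,
}

# Assumption: the percent column is relative to the 5 million flows
# searched, which matches it (3140573 / 5000000 = 62.81146%).
TOTAL_FLOWS = 5_000_000

# Assumption: a '?' in the direction marks an undetermined direction.
uncertain = sum(n for d, n in DIRECTION_COUNTS.items() if "?" in d)
uncertain_percent = 100.0 * uncertain / TOTAL_FLOWS

print(f"{uncertain} flows ({uncertain_percent:.2f}%) have an uncertain direction")
# prints: 773354 flows (15.47%) have an uncertain direction
```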


I haven't heard anything back from the folks at ntop about the error that another user was experiencing where the SYN and SYNACK packets were getting dropped by pf_ring.  If the handshake gets dropped before argus sees it, will it detect a gap in the flow or does it just treat it as a flow that has been running longer than its TCP cache time?

Thanks!

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Saturday, August 10, 2013 7:59 AM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Couple things...

Hey Craig,
In the packet file you sent me, there is evidence of sizable packet loss from the wire to argus.  The biggest indicator is the occurrence of the " g " indicator in primitive argus data.  The " g " indicates gaps, which means that packets were missing: not retransmitted, not dropped, just missing.  Seems in your file about 20-30% of the TCP flows had gaps, if memory serves.  That's significant.  You can print the gap value to see how many gaps argus saw.  Should be a byte value.  If you see a bunch of " g " in your flgs field, you've got a packet capture problem.

The packet acquisition system can fail at any point along its path.
It can be pretty complicated, like pf_ring screwing up, egress interface overruns, hardware and interrupt problems, etc....  But you aren't going that fast, so the gigamon should make a huge improvement over the 6500's.  Even 6590's have big port mirroring issues.  You don't notice it until you put a sensitive sensor on the end of the packet path.

Carter

On Aug 10, 2013, at 10:21 AM, Carter Bullard <carter at qosient.com> wrote:
I'm not a fan of VLAN spanning, I like physical interface port mirroring or wire tapping, as that gives you the complete set of data to do all things that you need to do for ops, performance, and security.  The gigamon approach will do much better.  You want to see the encapsulations on the wire, you want to see the vlan tags, you need to see what is hitting that switch/router/host interface if you want to solve problems.

You don't want to dedup flows...That is a pretty big no-no from the argus perspective.  We are all about comprehensive monitoring.  If you toss things out, you're not comprehensive.  The reason you want to see everything, is so you can realize when things break/change, that is the operational side, but more important on the security side, you don't want to give an intruder a space to operate in.  If you throw stuff away, one can hide actions in the discarded space.  Especially non-IP traffic on the wire, if you are not monitoring Layer 2 non-IP flows, you're in big trouble of missing how you're getting trashed !!!

Carter

On Aug 9, 2013, at 11:34 PM, Craig Merchant <cmerchant at responsys.com> wrote:
Hey, Carter...  So, I sent your email to the NetOps team.  Apparently the duplicate packet problem is well known when you use VLANs as the source of traffic for a SPAN port:

http://blogs.cisco.com/security/span-packet-duplication-problem-and-solution/

I searched over the last 30 million of our flow data events (from racluster) looking for TCP flags with "M" in it.  Not a one.  So, I can't explain why you saw that in the tcpdump file I sent you, but it doesn't appear to be a pervasive problem.

We're going to contact our reps at Gigamon and see whether their products' flow dedup features might  help.

If Argus is seeing duplicate flows, shouldn't it see duplicate SYN and SYN/ACKs?  And if so, shouldn't it be able to figure out the direction?  I'm still trying to figure out if the problem is a sampling issue with a really busy SPAN port or if it's something else...

We're going to try dropping half the VLANs to reduce the traffic volume and see if that has any impact.

Thanks.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Thursday, August 08, 2013 11:00 AM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Couple things...

Hey Craig,
I've been looking at one of your pcap files, and you've got a lot of weird
stuff going on in your network.  You, like a few on the list, have an
observation domain that sees many packets twice.  While some say
these are " duplicates ", they are distinct packets on the wire, and seeing
them twice is really an artifact of either how you're spanning the packets,
or how you've set up your network.

Possibly you are spanning multiple interfaces to the same argus, and
the packet traverses both of them ??   Or the packet actually traverses
the same physical link twice, but in different overlays or VPNs ???

Argus can be configured to make these multiple flows distinct, rather than
having the packets aggregated into a single 5-tuple flow record.  You can
do that by adding the mac addresses to the flow key, or the VLAN tags
to the flow key.

I can suggest that you add the LAYER_2 information to argus's flow keys,
to see if you don't get a bit better data.   In your argus.conf file:

   ARGUS_FLOW_KEY="CLASSIC_5_TUPLE+LAYER_2"

You will need to check the output, so that you can see what is going on.
Post processing of these flows, especially aggregation, will need to account
for the ethernet addresses (by adding the smac and dmac to the aggregation
keys), with calls such as:

   racluster -m smac dmac saddr daddr proto sport dport

when you want to do default aggregation.

Here is a sample of one of your pings, between two of your hosts, with the old and new flow key definitions.
I've modified the network and ethernet addresses to protect the innocent.

Standard default 5-tuple flow key

ra -r argus.10*old.out -s stime dur flgs smac dmac proto saddr dir daddr spkts dpkts state - icmp
      StartTime        Dur      Flgs             SrcMac             DstMac  Proto       SrcAddr   Dir     DstAddr  SrcPkts  DstPkts State
20:00:22.187940   0.005851  M         00:30:48:aa:bb:cc  00:1e:f7:xx:yy:zz   icmp   10.30.80.41   <->  10.20.2.26        1        2   ECO


Standard default 5-tuple flow key with layer 2 identifiers added

ra -r argus.10*new.out -s stime dur flgs smac dmac proto saddr dir daddr spkts dpkts state - icmp
      StartTime        Dur      Flgs             SrcMac             DstMac  Proto       SrcAddr   Dir     DstAddr  SrcPkts  DstPkts State
20:00:22.187940   0.005851  e         00:30:48:aa:bb:cc  00:1e:f7:xx:yy:zz   icmp   10.30.80.41   <->  10.20.2.26        1        1   ECO
20:00:22.193790   0.000000  e         00:1e:f7:xx:yy:zz  d4:8c:b5:cc:dd:ee   icmp   10.30.80.41   <-   10.20.2.26        0        1   ECR


As you can see, argus, with just the standard 5-tuple flow key, thinks there are 3 packets in the
ping volley, one ping request and 2 ping replies.  With the LAYER_2 id's added to the flow key,
we see that one of the echo replies was also transmitted to another ethernet address ???
The 'M' in the flgs field of the 5-tuple flow record indicates that there were 'M'ultiple mac addresses
seen for the bi-directional flow.  You don't see that in the new flow key strategy.
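
The effect of the two flow keys can be illustrated with a toy model. This is a sketch only: the three packets are a reading of the ra records above rather than a capture (in particular, the MAC addresses on the first echo reply are assumed to be the reverse of the request), and the key functions are hypothetical stand-ins, not how argus builds its flow keys internally.

```python
from collections import Counter

# Three ICMP packets reconstructed from the ra records above:
# (eth_src, eth_dst, ip_src, ip_dst, proto)
PACKETS = [
    # echo request
    ("00:30:48:aa:bb:cc", "00:1e:f7:xx:yy:zz", "10.30.80.41", "10.20.2.26", "icmp"),
    # echo reply (MAC pair assumed to mirror the request)
    ("00:1e:f7:xx:yy:zz", "00:30:48:aa:bb:cc", "10.20.2.26", "10.30.80.41", "icmp"),
    # the reply seen a second time, sent to a different ethernet address
    ("00:1e:f7:xx:yy:zz", "d4:8c:b5:cc:dd:ee", "10.20.2.26", "10.30.80.41", "icmp"),
]


def ip_key(pkt):
    """Bidirectional key on addresses and protocol (ICMP has no ports)."""
    _, _, ip_src, ip_dst, proto = pkt
    return (frozenset((ip_src, ip_dst)), proto)


def layer2_key(pkt):
    """The same key with the unordered pair of ethernet addresses added."""
    eth_src, eth_dst = pkt[0], pkt[1]
    return (frozenset((eth_src, eth_dst)),) + ip_key(pkt)


flows_ip = Counter(ip_key(p) for p in PACKETS)
flows_l2 = Counter(layer2_key(p) for p in PACKETS)

print(sorted(flows_ip.values()))  # [3]    one flow holding all 3 packets
print(sorted(flows_l2.values()))  # [1, 2] two flows once MACs are in the key
```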

I don't see a trend, but you do have a lot of asymmetry in how the packets are duplicated.
Take a look at this new flow data, print out the smac and dmac, and see if you can figure it out.

Hope all is most excellent,


Carter


On Aug 8, 2013, at 10:23 AM, Carter Bullard <carter at qosient.com> wrote:




Done some testing with your argus.conf file.

Can't find any argus faults using it with your packet files,
on my machines, but there is one issue with the configuration.

Your ARGUS_MONITOR_ID is inappropriate.  You only get 32-bits
for a source id, so the max string you can use is 4 characters
long.  We'll cut it to 4 chars, so I don't think that this
will cause problems, but it is incorrect.
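
The arithmetic behind the 4-character limit, as a rough sketch. It assumes one byte per ASCII character and that the leading characters are the ones kept; the big-endian packing is purely illustrative and says nothing about how argus actually stores the id.

```python
MONITOR_ID = "ids01-dc1"  # the ARGUS_MONITOR_ID value from the argus.conf in this thread

SOURCE_ID_BITS = 32
MAX_CHARS = SOURCE_ID_BITS // 8  # one byte per ASCII character


def truncate_monitor_id(monitor_id, max_chars=MAX_CHARS):
    """Keep the leading characters that fit in the source id (assumed behavior)."""
    return monitor_id[:max_chars]


short_id = truncate_monitor_id(MONITOR_ID)
# Illustrative packing only; the byte order is an assumption.
as_int = int.from_bytes(short_id.encode("ascii"), "big")

print(short_id)                               # ids0
print(as_int.bit_length() <= SOURCE_ID_BITS)  # True
```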

Carter

On Aug 7, 2013, at 1:40 PM, Craig Merchant <cmerchant at responsys.com> wrote:




That worked!  Thanks, David.  Not sure what in my argus.conf could be causing the problem.  Here it is if you're curious:

ARGUS_FLOW_TYPE="Bidirectional"
ARGUS_FLOW_KEY="CLASSIC_5_TUPLE"
ARGUS_DAEMON=no
ARGUS_MONITOR_ID="ids01-dc1"
ARGUS_ACCESS_PORT=561
ARGUS_BIND_IP="10.10.10.10"
ARGUS_INTERFACE=dnacluster:10@28
ARGUS_GO_PROMISCUOUS=no
ARGUS_SET_PID=yes
ARGUS_PID_PATH="/var/run"
ARGUS_FLOW_STATUS_INTERVAL=5
ARGUS_IP_TIMEOUT=900
ARGUS_TCP_TIMEOUT=1800
ARGUS_GENERATE_RESPONSE_TIME_DATA=yes
ARGUS_GENERATE_APPBYTE_METRIC=yes
ARGUS_GENERATE_TCP_PERF_METRIC=yes
ARGUS_GENERATE_BIDIRECTIONAL_TIMESTAMPS=yes
ARGUS_CAPTURE_DATA_LEN=10
ARGUS_SELF_SYNCHRONIZE=yes
ARGUS_KEYSTROKE="yes"

From: David Edelman [mailto:dedelman at iname.com]
Sent: Tuesday, August 06, 2013 8:42 PM
To: Craig Merchant; Carter Bullard
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Couple things...

Craig,

Just in case you are running into something odd in the argus.conf file, I suggest that you add -X as the very first argument to the invocation of argus. I suggest something very simple like:

# /usr/local/bin/argus -X -r somefile.pcap -w /tmp/somefile.argus

If that works (and /tmp is almost always a good place to write the output because it avoids permission problems) then use racount on the /tmp/somefile.argus to make sure that everything is as expected and let us know what happened.

--Dave


From: Craig Merchant <cmerchant at responsys.com>
Date: Tuesday, August 6, 2013 11:28 PM
To: Carter Bullard <carter at qosient.com>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] Couple things...

I don't know what to tell you.  If you want me to run that trace tool and send you the output, let me know where to get it and I'll figure it out.

Did you take a look at the pcap file to see if there were a lot of missing SYN/SYNACK packets?

Thanks.

Craig

From: Carter Bullard [mailto:carter at qosient.com]
Sent: Tuesday, August 06, 2013 10:02 AM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Couple things...

Hey Craig,
I'm not having any problems reading your tcpdump.pcap file
with my version of argus, so I can't reproduce a fault.

% thoth:Data carter$ argus -r tcpdump*pcap -w - | racount
racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
    sum   402665      9999999        5205934        4794065        4795152829         2664296730         2130856099

Is there a specific feature or command line option that generates
your problem?

Carter

On Aug 3, 2013, at 2:23 PM, Carter Bullard <carter at qosient.com> wrote:






OK, with the pcap we'll figure it out.

So the ssh keystroke algorithm is round trip sensitive, and it's tuned for viewing at the enterprise border, but there are a lot of knobs that can be turned.  The real trick is having, again, a packet file of a session so we can see what the algorithm is doing.

Grab a few and we can go over it packet for packet.

Carter

Carter Bullard, QoSient, LLC
150 E. 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax

On Aug 2, 2013, at 3:06 PM, Craig Merchant <cmerchant at responsys.com> wrote:
I don't know what to tell you, Carter.  The version of 3.0.7.4 that I'm running has the same MD5 sum as the latest in qosient.com/dev...

I've uploaded the pcap file I'm trying to convert to your FTP server.

I've attached the debug file, but after further testing I think it's an algorithm configuration issue.  I've tried testing normal and reverse keystroke detection between hosts that were in the same data center and dnstroke and snstroke always show up as "0,0" or ",," (the latter happens more when there are directional issues).  But if I watch a host that I ssh into over the VPN from my home connection, Argus detects keystrokes.

I've tried reading through the academic paper you guys published on the keystroke detection and it's beyond me.  If it works for a slower network connection and not a faster network connection (or maybe I should say lower/higher latency connection), which configuration options should I experiment with to find the right balance?

Thanks.

Craig




From: Carter Bullard [mailto:carter at qosient.com]
Sent: Friday, August 02, 2013 8:37 AM
To: Craig Merchant
Cc: Argus (argus-info at lists.andrew.cmu.edu)
Subject: Re: [ARGUS] Couple things...

Hey Craig,
Was in Calif all last week, and just now catching up.

I really think the argus crashing issue is fixed.  At least
it works with all data that has been uploaded.  But if you have
packet data that is blowing argus up, can you send ???

There is a possibility that you may not have the most recent
version of argus-3.0.7.4.  I sometimes put up new software
without changing the number, like if I make a mistake and
put up the wrong version.  So, there could be a race condition.
Check the md5 or date times, or just grab again, if there is
any doubt.

You have to turn on keystroke detection, so, don't comment out
the ARGUS_KEYSTROKE="yes" line.  The CONF line you can comment
out.

To troubleshoot the keystroke algorithm, with argus running, but
not as a daemon, you can send a USR1 signal to it,

   # kill -USR1 argus.pid

and it will print out stats that include the keystroke algorithm
configuration, if it's turned on. When you send a USR1 signal to
argus, you increment the Debug flag setting for all of argus, and
so you should start getting debug messages, if the debug facility
is compiled in. Send another USR1 and you'll increase the debug
information.  Most of the per packet keystroke debugging is at
debug level 5.

Send a USR2 signal to argus ( # kill -USR2 argus.pid ) to turn
debug reporting off.
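
The same signalling can be scripted. The following is a minimal sketch, not part of argus: the helper names are hypothetical, and the pid file location is left to the caller because the actual file name under ARGUS_PID_PATH is not given in this thread.

```python
import os
import signal


def signal_from_pid_file(pid_path, sig):
    """Read a process id from pid_path and send that process the signal."""
    with open(pid_path) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, sig)
    return pid


def raise_debug_level(pid_path, steps=1):
    """Send USR1 once per step; each one is described above as raising the debug level."""
    for _ in range(steps):
        signal_from_pid_file(pid_path, signal.SIGUSR1)


def stop_debug(pid_path):
    """Send USR2, described above as turning debug reporting off."""
    signal_from_pid_file(pid_path, signal.SIGUSR2)
```

Assuming the debug level starts at zero, raise_debug_level(path, steps=5) would reach the level where most of the per packet keystroke debugging is reported, and stop_debug(path) would turn it off again.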

Carter


On Aug 1, 2013, at 7:02 PM, Craig Merchant <cmerchant at responsys.com> wrote:







Hey, Carter...

I just wanted to check in and see if you need anything else from me on the labeling issue or argus crashing when trying to convert a pcap file.  Let me know...

I'm also having some issues with keystroke detection with the latest release.  The following command used to work in my testing:

/usr/local/bin/ra -S 10.10.10.10:561 -n -u -c "," -s "+0dnstroke,+1snstroke" - host 10.1.1.1 and host 10.1.1.2

I tried both a normal and reverse SSH session between the two hosts and neither one registered keyboard strokes of varying speeds and intensity.

All I've done is commented out the defaults in argus.conf:

ARGUS_KEYSTROKE="yes"
ARGUS_KEYSTROKE_CONF="GPC_MAX=4"

I performed pretty much the same testing a couple months ago and got plenty of flows where keystrokes were detected.  Please let me know what you'd recommend for troubleshooting that.

Thanks.

Craig

<debug.zip>



