rc39 unlikely output for ESP flows

Tue Feb 27 23:42:22 EST 2007

The loss data looks like it isn't getting ntohs()'d properly, so
the little-endian byte order is probably biting us here.
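
To illustrate what a missing byte swap does to a counter, here is a
rough sketch. Which field is affected and how wide it is are assumptions
for the sake of the example, not something confirmed from the argus
source. A 16-bit value written in network order and then read back
without ntohs() on a little-endian host comes out with its bytes
reversed:

```python
import struct

def read_without_swap(value):
    """Pack a 16-bit value in network (big-endian) order, then misread it
    as little-endian, which is the effect of skipping ntohs() on i386."""
    wire = struct.pack("!H", value)
    return struct.unpack("<H", wire)[0]

def read_with_swap(value):
    """Pack and unpack in network order, i.e. the correct ntohs() path."""
    wire = struct.pack("!H", value)
    return struct.unpack("!H", wire)[0]

for true_value in (1, 3, 258):
    print(true_value, read_without_swap(true_value), read_with_swap(true_value))
```

Small true values turn into large ones (1 becomes 256, 3 becomes 768),
which is the kind of inflation that would make a loss column look bogus.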

The INT state will be there whenever there is no return traffic.  I
agree it should probably be a CON, but the dir will still be "->"
because the flow is not always bi-directional.  A few of your ESP
flows are bi-directional; that just means that the SPI was the same
for both directions of the ESP flow.

A '*' in a field means that the number/string is bigger than the
width used to print the value.  For ESP traffic, the dport is
the ESP SPI, which is a 32-bit random number.  The dport
is printed as a decimal number so you'll have to translate to
get the hex value.
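
Using the two SPIs from your mail as a check, here is a small sketch of
the translation. The "first five digits plus '*'" truncation rule is an
assumption inferred from the output you posted, not taken from the ra
source, and format_field and spi_to_hex are made-up helper names:

```python
def format_field(value, width=5):
    """Mimic the observed truncation: keep the first `width` digits and
    append '*' when the decimal string does not fit (assumed behaviour)."""
    text = str(value)
    return text if len(text) <= width else text[:width] + "*"

def spi_to_hex(dport_decimal):
    """Translate the full decimal dport of an ESP flow to 32-bit hex."""
    return "0x%08x" % dport_decimal

for spi in (0x0bfe0e52, 0x17e3e6f7):
    print(spi, format_field(spi), spi_to_hex(spi))
```

This prints 201199186 / 20119* / 0x0bfe0e52 and 400811767 / 40081* /
0x17e3e6f7, so "20119*" and "40081*" are the leading decimal digits of
your two SPIs.  The truncated form cannot be converted back on its own;
you need the full decimal value to recover the hex SPI.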

So argus is a good tool for loss analysis.  Let's fix the values so that
they are useful.  If you can share the packet files, I'll fix the loss bugs.

Carter

Christoph Badura wrote:

>Hey Carter,
>
>last week I was trying to get a grip on ESP flows with rc.39.
>I captured packet traces with tcpdump, ran "argus -r trace.cap -w trace.argus"
>over them and looked at the results with ra() and racluster().
>This was all done on an i386 laptop, i.e. a little-endian machine, should it
>matter.
>
>I got some funny looking output.
>
>typical records from "ra -n -s +sloss +dloss -r trace.argus" are:
>
>   16:50:22.072174       F     esp      217.115.67.22          <->      194.127.190.2.20119*     1758       32       216196        25600   CON   31748770 0
>   16:50:22.053544       F     esp      194.127.190.2           ->      217.115.67.22.40081*     5653        0      4297406            0   INT   12723374 0
>
>No, this wasn't initial, there were thousands of packets before that, no
>IKE SA renegotiations and, of course, the SPI's didn't change.  I think
>those should all be listed as "CON" and "<->".
>
>   16:50:24.295125             udp      217.115.67.22.500      <->      194.127.190.2.500           1        1          110          110   CON          0 0
>   16:50:27.468929       F     esp      217.115.67.22          <->      194.127.190.2.20119*     1223       20       158834        16000   CON   23920001 0
>   16:50:27.601715       F     esp      194.127.190.2           ->      217.115.67.22.40081*     4770        0      3756048            0   INT    5410065 0
>   16:50:29.338213             udp      217.115.67.22.500      <->      194.127.190.2.500           1        1          110          110   CON          0 0
>   16:50:32.469992       F     esp      217.115.67.22          <->      194.127.190.2.20119*     2349       42       288150        33600   CON   50176290 0
>   16:50:32.602202       F     esp      194.127.190.2           ->      217.115.67.22.40081*     8576        0      6765172            0   INT    9145791 0
>   16:50:34.399414             udp      217.115.67.22.500      <->      194.127.190.2.500           1        1          110          110   CON          0 0
>   16:50:37.472861       F     esp      217.115.67.22          <->      194.127.190.2.20119*      513       18        64342        14400   CON   11698440 0
>   16:50:37.603317       F     esp      194.127.190.2           ->      217.115.67.22.40081*     1551        0      1217694            0   INT    2259263 0
>   16:50:39.449373             udp      217.115.67.22.500      <-       194.127.190.2.500           0        1            0          110   RSP          0 0
>
>What is the significance of "20119*" and "40081*"?  I think that was
>discussed on the list a while ago, but I can't find it anymore.  The
>relation to the SPIs (0x17e3e6f7 and 0x0bfe0e52) isn't obvious to me.
>
>Note the completely bogus sloss values.
>
>"racluster -n -s +sloss +dloss -r trace.argus" gives:
>
>   16:49:21.850751       F     esp      217.115.67.22          <->      194.127.190.2.20119*    21668     2836      2912456      2268544   CON  257404441 0
>   16:49:21.853440             esp      194.127.190.2           ->      217.115.67.22.40081*    76255        0     58849978            0   INT   67834978 0
>   16:49:22.858836             udp      217.115.67.22.500      <->      194.127.190.2.500          11       16         1210         1760   CON          0 0
>
>This should be one ESP and one UDP flow, I think.  But the behaviour is
>probably consistent with what can be expected from the output of ra().
>
>Sorry, I haven't had time to check out rc.40 yet.  Hopefully tomorrow
>evening.
>
>There's one question I have, though.  As you can see from the F flag,
>these ESP flows have fragments.  Lots of fragments, actually.  They are
>from 1480 byte TCP segments being fragmented on the tunnel entry point.
>What I am currently interested in is trying to figure out how I can use
>Argus (or a different tool) to monitor the packet loss rate of VPN
>tunnels and TCP connections in general. So, assuming I have identified a
>number of argus records with non-empty loss fields (perhaps by splitting
>them into time buckets with rabins()), is there a way to sort of "drill down"
>and get a view of the flows at the IP level?
>
>The idea behind this is that I think it is easier to get a grip on badly
>performing flows with a tool like argus than with collecting performance
>and packet drop data from really large numbers of routers and switches.
>Many of them will be beyond the local organization's border routers and
>hence not pollable!  And still I'd want to know if my traffic flows going
>off-site are behaving normally or badly because I might have SLAs
>involving them.
>
>I guess your suggestion earlier this year to experiment with monitoring
>the performance of traffic flow for on-line games goes in the same direction.
>
>Maybe this should be moved to a separate thread.
>
>--chris
>