Problem with byte-swapped IP addresses

Wed Mar 10 10:48:56 EST 2010

Hey Martijn,
I agree, I don't think argus is too slow.  I think there are two possibilities:
   1. the snaplen is too short for the encapsulations that argus is running into, and
       we are corrupting the pf_ring by modifying data in the ring buffer.

   2. there is an issue with libpcap/pf_ring?

We are not doing anything special when a pf_ring is being used.  Should we use
pf_ring() calls rather than the "legacy" libpcap calls?

The current argus is libpcap only.  We try to use non-blocking reads, and select(),
if available, to see if there are packets to read from the pcap_fd.  I'm not sure
whether a pf_ring-based libpcap interface is selectable.

When the interface is not selectable, if we go to read some packets and there aren't
any there, we go to sleep for 25 ms.  This is an arbitrary number, and may be a
problem for the pf_ring?  It shouldn't be, but you never know.
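
To make the read strategy concrete, here is a rough sketch of the "select() if
we can, otherwise sleep 25 ms" logic.  It is an illustration in Python, not the
argus source: a pipe stands in for the pcap descriptor, and the `selectable`
flag and `wait_for_packets` name are hypothetical.  In libpcap itself,
pcap_get_selectable_fd() returns -1 when the handle has no descriptor that can
be used with select().

```python
import os
import select
import time

POLL_SLEEP_SECONDS = 0.025  # the arbitrary 25 ms back-off


def wait_for_packets(fd, selectable, timeout=POLL_SLEEP_SECONDS):
    """Return True if fd looks readable, False if we only backed off.

    fd         -- stand-in for the descriptor behind the capture handle
    selectable -- hypothetical flag: whether select() works on this source
    """
    if selectable:
        # Block in select() until data arrives or the timeout expires.
        readable, _, _ = select.select([fd], [], [], timeout)
        return bool(readable)
    # Not selectable: the caller's non-blocking read came back empty,
    # so sleep briefly instead of spinning on the CPU.
    time.sleep(timeout)
    return False


# Demonstration with a pipe standing in for the capture descriptor.
read_end, write_end = os.pipe()
empty_result = wait_for_packets(read_end, selectable=True)
os.write(write_end, b"packet")
ready_result = wait_for_packets(read_end, selectable=True)
fallback_result = wait_for_packets(read_end, selectable=False)
os.close(read_end)
os.close(write_end)
```

In the non-selectable case the reader can be up to 25 ms late noticing new
packets, which is the window I'm wondering about for the pf_ring.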

In the next round, we're going to add the multi-threaded ArgusSource.c to argus,
which means we're going to spawn a thread to read the packets, a thread per
interface, and do a better job with scheduling.  In order to do a good job with this,
I'll be adding DAG specific routines, and I can add pf_ring specific routines as well,
if that would be a good idea.
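
Roughly, the thread-per-interface layout would look like the sketch below.
This is only an illustration of the idea in Python, not the planned
ArgusSource.c code: the function names are hypothetical, and plain lists of
bytes stand in for the per-interface capture sources.

```python
import queue
import threading


def read_interface(name, packets, out_queue):
    """Reader thread body: push (interface, packet) pairs onto a shared queue.

    `packets` is a hypothetical iterable standing in for a capture source.
    """
    for packet in packets:
        out_queue.put((name, packet))
    out_queue.put((name, None))  # sentinel: this interface is done


def capture_all(sources):
    """Start one reader thread per interface and collect what they read."""
    out_queue = queue.Queue()
    threads = [
        threading.Thread(target=read_interface, args=(name, pkts, out_queue))
        for name, pkts in sources.items()
    ]
    for thread in threads:
        thread.start()

    collected = {name: [] for name in sources}
    remaining = len(threads)
    while remaining:
        name, packet = out_queue.get()
        if packet is None:
            remaining -= 1  # one reader has finished
        else:
            collected[name].append(packet)

    for thread in threads:
        thread.join()
    return collected


result = capture_all({"eth0": [b"a", b"b"], "eth1": [b"c"]})
```

Each reader only touches its own source, so a slow interface does not hold up
the others; the shared queue is the single point where packets are handed off.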

One thing that would be useful: if you still have the packet files, what type of packet
was the first one that was lost in the tcpdump capture (packet 1 of the 3846 other packets)?
If it was non-IP or something other than TCP, that may be a clue.

Carter

On Mar 10, 2010, at 8:40 AM, Martijn van Oosterhout wrote:

> On Tue, Mar 9, 2010 at 3:57 AM, Peter Van Epp <vanepp at sfu.ca> wrote:
>>> That's an idea. Unfortunately I don't see a simple way to determine if
>>> argus is dropping many packets or not. We've configured argus to have
>>> a ring-buffer of 16384 packets. You can make it bigger, but if argus
>>> isn't keeping up then it doesn't really matter how big you make it.
>>> argus isn't using 100% CPU, but maybe that's a lie.
>> 
>>        Check syslog for complaints from argus. If it is having trouble keeping
>> up (at least in the output task) it will complain about queue sizes to syslog.
> 
> Never seen anything from argus in syslog. It's not using 100% CPU, so
> I don't think it's argus being slow.
> 
>>        This suggests to me that pf-ring over ran the circular buffer with
>> bad results. It looks to perhaps have gone back to the start (or screwed up
>> its pointers) and lost a buffer full of data, probably because argus wasn't
>> getting it out quick enough. What snap length are you using? Are you collecting
>> user data or just header info? If you are collecting user data backing down
>> to just header info (128 bytes or so) may help by reducing the load.
> 
> The snap length is the argus default, I suppose 78 bytes or so? The
> tcpdump was running with a snaplen of 2000, so it wrote much more data
> and didn't lose anything. (or at least, a lot less)
> 
> About the "54825 bytes missing" message, that's just the difference
> between the length and its byte-swapped value. It's complaining the IP
> header says there was more data than the PCAP header says.
> 
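
The arithmetic behind that number can be checked with a short sketch.
Swapping the two bytes of a 16-bit length changes it by 255 times (low byte
minus high byte), and 54825 = 255 * 215.  The length of 215 bytes used below is
an assumption chosen only because it reproduces the figure; other lengths
whose low byte exceeds the high byte by 215 give the same difference, and the
actual packet length is not known from this thread.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def missing_after_swap(true_length):
    """Bytes that appear missing if a 16-bit length is read byte-swapped."""
    return swap16(true_length) - true_length


# Hypothetical length: 215 (0x00D7) is used for illustration only.
example_length = 215
swapped_length = swap16(example_length)       # 0xD700
missing = missing_after_swap(example_length)
```
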
>>        If the problem is the buffer being overwritten it may be that the
>> kernel is filling the packet with new data (and thus some non swapped and
>> some swapped data) at the same time argus is reading that data. The result
>> will be very confused as the data will be changing even as the argus is
>> writing the buffer out to the file and decoding the headers (which will be
>> changing under it even as it tries to decode them likely).
> 
> Well, the ringbuffer shouldn't be wrapping like that. Also, 3800
> packets dropped is odd, given that the ringbuffer is much larger than
> that. I would have expected 16384 dropped if the ring-buffer looped.
> 
>>     Some more configuration questions: is the argus machine also archiving
>> the data to disk locally or is it writing to a socket (with no or very little
>> other than update local disk traffic)? At high volumes (about 30 to 50 megabits
>> per second on older machines, don't know about current gen) packet loss occurs
>> if you are writing to local disk so the two machine layout works better.
> 
> It's archiving locally, and there's plenty of write traffic locally
> (for the tcpdump storage), around 70-90 MB/s. But we've tuned the
> system for that, so that shouldn't be a problem. And tcpdump is
> running parallel to this and not dropping packets, remember.
> 
> I'll test the version of argus you've provided.
> 
> Thanks again,
> -- 
> Martijn van Oosterhout <kleptog at gmail.com> http://svana.org/kleptog/
> 

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax
