[ARGUS] argus occasionally show unrealistic mount of packet counter

Ming Fu via Argus-info argus-info at lists.andrew.cmu.edu
Wed Nov 8 12:58:31 EST 2023


Hi Carter,

Is there a way I can help diagnose this? I believe it would be difficult to get approval to share a data file.

Regards,
Ming

-----Original Message-----
From: Carter Bullard <carter at qosient.com> 
Sent: Wednesday, November 8, 2023 12:41 PM
To: Ming Fu <Ming.Fu at esentire.com>
Cc: Argus <argus-info at lists.andrew.cmu.edu>
Subject: Re: [ARGUS] argus occasionally show unrealistic mount of packet counter

Hey Ming,
Is it possible to share an argus data file that has the problem?  I can inspect the original flow to see if it’s an alignment problem or a data type issue.
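
One way to look for an alignment or data type pattern without sharing a file is to print the reported counters in hexadecimal. The following is a minimal Python sketch, using three values copied from the ra output in the first message of this thread; the helper name `describe` is hypothetical.

```python
# Counters copied from the ra output quoted in the first message of this thread.
suspect = {
    "spkts (22:07:48 record)": 67108864,
    "dpkts (22:08:18 record)": 288230376218820608,
    "dbytes (22:07:18 record)": 144115188126187552,
}

def describe(value: int) -> str:
    """Render a counter as 64-bit hex plus the positions of its set bits."""
    bits = [i for i in range(64) if (value >> i) & 1]
    return f"0x{value:016x} set bits: {bits}"

for name, value in suspect.items():
    print(f"{name:26s} {describe(value)}")
```

The first two values have only one or two bits set (2^26, and 2^58 + 2^26), which may be more consistent with misplaced bytes than with a genuine count, though that is only a guess from three samples.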

Carter


> On Nov 7, 2023, at 10:20 AM, Ming Fu <Ming.Fu at esentire.com> wrote:
> 
> Hi Carter,
> 
> Let me know if there is anything I can do to help identify the root of the problem.
> 
> Regards,
> Ming
> 
> -----Original Message-----
> From: Carter Bullard <carter at qosient.com> 
> Sent: Monday, November 6, 2023 8:27 AM
> To: Ming Fu <Ming.Fu at esentire.com>
> Cc: Argus <argus-info at lists.andrew.cmu.edu>
> Subject: Re: [ARGUS] argus occasionally show unrealistic mount of packet counter
> 
> Hey Ming,
> Corrupted records can cause lots of problems, and the ra* programs try to correct for a corrupted stream when they detect one.
> So it can be slightly complicated when this type of problem happens.  We need to zero in on whether it's argus or a ra* program that is the issue.
> 
> There are a few places where a record could be corrupted …  Assuming argus is ok, how are you collecting your flow records ?
> Is argus writing to a file, or are you using a tool like radium or rasplit to put the records in a file ???
> 
> Is there only one argus source or multiple sources … Is there only one writer to the file ?
> 
> When you rotate the archive, is that a move, or are you processing the file with a ra* program ???
> And are all the programs argus/radium/ra* the latest versions ???
> 
> We’ll figure it out.
> 
> Carter
> 
> 
>> On Nov 6, 2023, at 8:14 AM, Ming Fu <Ming.Fu at esentire.com> wrote:
>> 
>> Hi Carter,
>> 
>> I noticed one thing that might give us a hint. In a few cases where the packet counter is unrealistically high, one of the byte counters is 0.
>> 
>> I tried lowering the interval to 10 seconds, but the problem persists.
>> 
>> Regards,
>> Ming
>> 
>> -----Original Message-----
>> From: Ming Fu 
>> Sent: Saturday, November 4, 2023 10:06 AM
>> To: Carter Bullard <carter at qosient.com>
>> Cc: Argus <argus-info at lists.andrew.cmu.edu>
>> Subject: RE: [ARGUS] argus occasionally show unrealistic mount of packet counter
>> 
>> Hey Carter,
>> 
>> I extracted the TCP flow for the connection with the high packet counter from the packet capture archive. The packet capture runs on the same host as the argus instances, so they read the same incoming interface. The total packet count of that flow, from a minute or two before the problem period to a minute or two after it, is a very reasonable number. The connection is TCP and has "IP Don't Fragment" set. The traffic looks like a large data transfer; the majority of the payloads are 1368 bytes long, with no fragmentation.
>> 
>> I have argus 3.0.8.1 running on the same host at the same time as argus 3.0.8.3, and the 3.0.8.1 instance didn't show the exceptionally high packet count during that period. This is not to say that argus 3.0.8.1 is free of the problem: we originally noticed the problem on 3.0.8.1 and started to consider an upgrade. With two argus instances looking at the same traffic and only one of them showing the problem, could this suggest a random data alignment problem or a race condition between threads?
>> 	
>> I did suspect that archive rotation might be a factor. In our setup, the archive rotation is a cron job scheduled at every 10th minute of the hour. The problem happened 7 to 8 minutes into the hour, so it is unlikely to be a rotation boundary issue.
>> 
>> Regards,
>> Ming
>> 
>> -----Original Message-----
>> From: Carter Bullard <carter at qosient.com>
>> Sent: Friday, November 3, 2023 11:50 PM
>> To: Ming Fu <Ming.Fu at esentire.com>
>> Cc: Argus <argus-info at lists.andrew.cmu.edu>
>> Subject: Re: [ARGUS] argus occasionally show unrealistic mount of packet counter
>> 
>> Hey Ming,
>> Are you seeing any fragments in your flows ???   This is shown with a ‘F’ or ‘f’ in the status field.
>> How fragments contribute to the packet counts of the parent flow is a bit complicated with some interesting timing issues, so there could be a bug there … (just guessing) … Carter
>> 
>> 
>>> On Nov 3, 2023, at 8:18 PM, Ming Fu <Ming.Fu at esentire.com> wrote:
>>> 
>>> Hi Carter,
>>> 
>>> I tried to recreate the problem in a controlled environment, but I can't make it happen in a test setup. Besides a high-load traffic generator, I tried:
>>> 1. a long-lasting flow to make the sequence number wrap
>>> 2. double and triple replay of the same pcap to create a lot of duplicated packets
>>> 3. randomizing the packet order in the pcap file so there are a lot of out-of-order packets
>>> 4. a combination of 2 and 3
>>> 
>>> However, we do have a few production sites where this situation happens. The frequency varies from 1-2 times a day to a few times a year.
>>> 
>>> I can try a shorter interval, but I would expect the byte count to be affected, not the packet counter. Just my guess.
>>> Does a shorter interval cause the archive to be larger in size?
>>> 
>>> Regards,
>>> Ming
>>> 
>>> 
>>> -----Original Message-----
>>> From: Carter Bullard <carter at qosient.com>
>>> Sent: Friday, November 3, 2023 5:21 PM
>>> To: Argus <argus-info at lists.andrew.cmu.edu>; Ming Fu <Ming.Fu at esentire.com>
>>> Subject: Re: [ARGUS] argus occasionally show unrealistic mount of packet counter
>>> 
>>> Hey Ming,
>>> These look pretty terrible … normally you see bad values for the byte counts as these are derived from the TCP sequence numbers in some situations, but the interesting thing is that in these records (if I’m reading them right) the problem is in the packet counts, which are always observed, rather than derived.
>>> 
>>> The biggest problem is sequence number turnover, and with you generating flow records every 30 seconds, you can experience sequence turnover during your status interval.
>>> I would shorten the flow generation status interval to 5 seconds, and then see whether the problem goes away.  If it does, I would go with 5 seconds … if not, let’s debug further.
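
To put rough numbers on the turnover concern: a 32-bit TCP sequence space wraps after 2^32 bytes. The following is a minimal Python sketch, assuming a single flow sustains the byte rate seen in the 22:15:08 record from the first message, and treating sbytes as an approximation of the payload bytes that advance the sequence number. The function name is hypothetical.

```python
SEQ_SPACE = 2 ** 32  # bytes covered by a 32-bit TCP sequence number

def seconds_to_wrap(byte_count: int, duration_s: float) -> float:
    """Time for the sequence number to wrap at the observed byte rate."""
    return SEQ_SPACE / (byte_count / duration_s)

# Record from the first message: 22:15:08.701928 to 22:15:38.735385,
# sbytes = 3829903796 (assumed here to be close to the payload byte count).
duration = 38.735385 - 8.701928
per_flow = seconds_to_wrap(3829903796, duration)

# Upper bound: the whole ~2 Gbit/s stream carried by one flow (250e6 bytes/s).
full_link = seconds_to_wrap(250_000_000, 1.0)

print(f"per-flow wrap time:  {per_flow:.1f} s")
print(f"full-link wrap time: {full_link:.1f} s")
```

Under these assumptions the wrap time is roughly 34 seconds for that flow and about 17 seconds at the full link rate, so a 30 second status interval spans most of a sequence space or more, while a 5 second interval spans well under a third of one.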
>>> 
>>> Carter
>>> 
>>> 
>>>> On Nov 3, 2023, at 4:00 PM, Ming Fu via Argus-info <argus-info at lists.andrew.cmu.edu> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> We noticed that there are occasional cases where the argus archive shows an unrealistically high packet count.
>>>> 
>>>> Here is an example:
>>>> --------------------------------------------------------
>>>> ra-3.0.8.3 -L -1 -c ' ' -n -s stime,ltime,saddr,daddr,proto,spkts,dpkts,sport,dport,sbytes,dbytes -r /path/to/archive/argus.*| grep tcp | awk '{ print $6+$7, $0 }'| sort -n -r| head
>>>> 14988828386791653376 22:07:48.757785 22:08:18.758381 10.100.250.137 10.63.36.11 tcp 67108864 14988828386724545280 51380 2051 0 288793326608450864
>>>> 9251874556556148736 22:07:18.731168 22:07:48.757773 10.100.250.137 10.63.36.11 tcp 2939797658325221376 6312076898230927432 51380 2051 288230378135882496 144115188126187552
>>>> 8200002023161069568 22:08:18.758494 22:08:48.842602 10.100.250.137 10.63.36.11 tcp 7911771646942248960 288230376218820608 51380 2051 664144640 0
>>>> 2932566 22:15:08.701928 22:15:38.735385 10.100.250.137 10.63.36.11 tcp 2746314 186252 51476 2051 3829903796 26056252
>>>> 2902567 22:25:14.426295 22:25:44.470556 10.100.250.137 10.63.36.11 tcp 2754226 148341 51680 2051 3937714540 18801530
>>>> 2758335 22:21:20.908170 22:21:50.932300 10.100.250.137 10.63.36.11 tcp 2457196 301139 51588 2051 3327733318 45132426
>>>> 2679080 22:06:48.672480 22:07:18.731156 10.100.250.137 10.63.36.11 tcp 2499495 179585 51380 2051 3583055390 21369210
>>>> 2557546 22:06:10.561644 22:06:35.690115 10.100.250.137 10.63.36.11 tcp 2426367 131179 51354 2051 3478126178 15642994
>>>> 2443147 21:18:43.083593 21:19:11.514668 10.100.250.137 10.63.36.11 tcp 2291548 151599 49712 2051 3276577321 26749882
>>>> 2068048 22:11:23.654426 22:11:53.660680 10.100.250.137 10.63.36.11 tcp 1918509 149539 51450 2051 2614658170 24009070
>>>> -------------------------------------------------------
>>>> We ran 3.0.8.1 and 3.0.8.3 in parallel on the same input stream. The problem can happen in both versions. One interesting observation is that the problem does not happen at the same time for the two versions of argus. However, it always happens on high-volume connections. The incoming stream is ~2 Gbit/s, with a fair number of duplicated packets. Argus does skip a small portion of the traffic (2%) when the load is high. The stream is full-size Ethernet traffic, and the average packet size is over 1000 bytes.
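
Until the root cause is found, records like these could be flagged automatically with a plausibility check on bytes per packet. The following is a minimal Python sketch, assuming space-separated ra output in the field order stime,ltime,saddr,daddr,proto,spkts,dpkts,sport,dport,sbytes,dbytes. The thresholds and function names are illustrative assumptions, not argus defaults.

```python
MIN_BYTES_PER_PKT = 40     # assumed floor: bare IPv4 + TCP headers
MAX_BYTES_PER_PKT = 9216   # assumed ceiling: a generous jumbo frame size

def implausible(pkts: int, nbytes: int) -> bool:
    """True when a packet/byte counter pair cannot describe real traffic."""
    if pkts == 0:
        return nbytes != 0
    return not (MIN_BYTES_PER_PKT <= nbytes / pkts <= MAX_BYTES_PER_PKT)

def check_record(line: str) -> bool:
    """Flag one line of ra output in the field order given above."""
    f = line.split()
    spkts, dpkts, sbytes, dbytes = int(f[5]), int(f[6]), int(f[9]), int(f[10])
    return implausible(spkts, sbytes) or implausible(dpkts, dbytes)

# Two records copied from the example output (without the awk sum column).
bad = ("22:07:48.757785 22:08:18.758381 10.100.250.137 10.63.36.11 tcp "
       "67108864 14988828386724545280 51380 2051 0 288793326608450864")
good = ("22:15:08.701928 22:15:38.735385 10.100.250.137 10.63.36.11 tcp "
        "2746314 186252 51476 2051 3829903796 26056252")

print(check_record(bad), check_record(good))
```

The first record is flagged because it reports source packets with zero source bytes, while the second averages about 1395 bytes per source packet and passes.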
>>>> 
>>>> Any suggestion on how to debug further?
>>>> Regards,
>>>> Ming
>>>> 
>>> 
>> 
> 


