[ARGUS] Massively incorrect packet/byte count on one connection

Carter Bullard carter at qosient.com
Tue Jun 22 10:55:29 EDT 2021


Hey Gavin,
The way that it is supposed to work in argus-3.0.8.2+ is that argus uses long longs (64-bit) to track packet and byte counts (3.0.7 used ints), and when it’s time to write the records out on the wire or to disk, it chooses a type that is appropriate for the values in this record (bytes for values < 256, shorts for …) … the argus record structure for the metrics has bits that indicate what size the values are, so that the receiver can blow the values back up to long longs for processing.  Looks like someone isn’t doing the right thing …
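
To make the idea concrete, here is a minimal sketch of size-tagged counter encoding. It is not the actual argus record layout: the size codes, the one-byte tag, the byte order, and the names encode_counter / decode_counter are all assumptions made for illustration.

```python
import struct

# Hypothetical size codes: (code, struct format, width in bytes).
# The real argus metric header bits are not shown in this thread.
WIDTHS = [
    (0, "B", 1),  # values < 2**8
    (1, "H", 2),  # values < 2**16
    (2, "I", 4),  # values < 2**32
    (3, "Q", 8),  # values < 2**64
]


def encode_counter(value):
    """Pack a 64-bit counter into the smallest width that holds it.

    Returns one tag byte (the size code) followed by the big-endian value.
    """
    if value < 0 or value >= 1 << 64:
        raise ValueError("counter must fit in an unsigned 64-bit integer")
    for code, fmt, size in WIDTHS:
        if value < 1 << (8 * size):
            return bytes([code]) + struct.pack(">" + fmt, value)
    raise AssertionError("unreachable: the 8-byte width always fits")


def decode_counter(buf):
    """Read the size code, then widen the value back to a full integer.

    Returns (value, number_of_bytes_consumed).
    """
    _, fmt, size = WIDTHS[buf[0]]
    (value,) = struct.unpack(">" + fmt, buf[1:1 + size])
    return value, 1 + size
```

The point of the sketch is that the writer and the reader have to agree on what the size bits mean; if one side reads a value with a different width than the other side wrote, the result is garbage rather than an error.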

So double checking … are all argus components in the data pipeline the same version?

If none of the argus components modify the record in the pipeline, the record written to disk will be the same bits that argus generated, so we may be able to figure out what is going on ...
If you can share the binary records that show the error, and a little context, I should be able to figure it out …

Carter


> On Jun 22, 2021, at 10:37 AM, Gavin Atkinson <gavin.atkinson at gmail.com> wrote:
> 
> Hi,
> 
> We have argus 3.0.8.2 running on a 20Gbps interface for traffic stats.  Recently it logged this impossible entry:
> 
>          StartTime         LastTime      Flgs  Proto       SrcAddr  Sport  Dir     DstAddr  Dport               TotPkts             TotBytes State
>    05:02:01.159856  05:02:06.160526  * * t       tcp    NN.NN.NN.75.60319   -> NN.NN.NN.162.micro*               471763            960793012   CON
>    05:02:06.160536  05:02:11.159955  * * t       tcp    NN.NN.NN.75.60319   -> NN.NN.NN.162.micro*               426692            980154740   CON
>    05:02:11.165013  05:02:16.164662  * * t       tcp    NN.NN.NN.75.60319   -> NN.NN.NN.162.micro*               247629           3102131695   CON
>    05:02:16.166290  05:02:21.168418  * * t       tcp    NN.NN.NN.75.60319   -> NN.NN.NN.162.micro*  5376178237460198223  4395522059571235538   CON
>    05:02:21.168654  05:02:26.167308  * * t       tcp    NN.NN.NN.75.60319   -> NN.NN.NN.162.micro*                95298           1408335847   CON
> 
> (other log entries are included for context - these will usually be the same order of magnitude of packets etc).  Those numbers are obviously wrong - this interface has never passed that many bytes or packets, and it works out at an average of less than one byte per packet :)
> 
> The strange thing is that this interface is on an optical fibre tap and identical copies of the stream are fed to two copies of Argus 3.0.8.2, running on two physically separate machines (running different OS), and both machines logged exactly the same numbers.  So it's not something that's happened on the machine (hardware failure etc), but rather it appears that some aspect of the traffic has perhaps tickled a bug?
> 
> Argus is run with "-i bond1 -d -P 561 -U15", with rasplit then writing to files split on five minute boundaries.  The only uncommented options in /etc/argus.conf are:
> ARGUS_FLOW_TYPE="Bidirectional"
> ARGUS_FLOW_KEY="CLASSIC_5_TUPLE"
> ARGUS_FLOW_STATUS_INTERVAL=5
> ARGUS_MAR_STATUS_INTERVAL=60
> 
> Has anybody seen similar before?  I'm assuming there isn't enough data in the saved ra files to reconstruct how this could have happened, but can provide cut down copies of them if useful.  FWIW I'm not aware of it ever happening to us before today.
> 
> Thanks,
> 
> Gavin
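
The plausibility argument in the quoted message (less than one byte per packet, and more bytes than the link can carry) can be reproduced with a short check. This is a sketch only: the function name is hypothetical, and treating 20Gbps over a 5 second status interval as the ceiling for a single record is an assumption (a tap that sees both directions of a full-duplex link could carry up to twice that).

```python
LINK_BITS_PER_SEC = 20_000_000_000  # 20Gbps interface, from the report
STATUS_INTERVAL_SEC = 5             # ARGUS_FLOW_STATUS_INTERVAL=5


def is_plausible(pkts, byts,
                 link_bps=LINK_BITS_PER_SEC,
                 seconds=STATUS_INTERVAL_SEC):
    """Return True if a record's counters could fit on the link.

    Two checks: total bytes cannot exceed what the link carries in one
    interval, and every packet must account for at least one byte.
    """
    max_bytes = link_bps * seconds // 8
    return byts <= max_bytes and byts >= pkts


# (TotPkts, TotBytes) pairs copied from the ra output in the report.
records = [
    (471763, 960793012),
    (426692, 980154740),
    (247629, 3102131695),
    (5376178237460198223, 4395522059571235538),
    (95298, 1408335847),
]

suspect = [r for r in records if not is_plausible(*r)]
```

With the values above, only the 05:02:16 record ends up in suspect; it fails both checks.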


