possible radium issue

Carter Bullard carter at qosient.com
Thu Jul 9 10:48:24 EDT 2009


Hey Phillip,
The option is not a problem, so no need to worry about it; I just wanted
to make sure that other readers didn't think they were missing something.

So your radium() is attached to a single argus data source, with
multiple readers of that data?  If this is the case, we can compare the
sequence numbers in the flow records to see if we're missing records.
Take one of the files that seems low, and print the whole thing with ra():
    ra -r bad.file -s +1srcid +2seq

The sequence numbers are those put in by the original data source, so if
radium() is dropping records, or if rasplit() is disconnecting and
reconnecting, you'll be able to see sequence numbers with gaps.
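
If you want to scan for gaps mechanically, something like this should
get you close (a rough sketch; it assumes a single srcid per file, that
seq is the only column printed, and it only looks at numeric lines so
any label line is skipped):

    ra -r bad.file -s seq | awk '/^ *[0-9]/ { if (seen && $1 != prev + 1) print "gap: " prev " -> " $1; prev = $1; seen = 1 }'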

If you are seeing gaps, are there any messages from radium() saying that
it's having to close the connections?  Radium(), like argus(), will kill
the transport connection if too many records get queued up for the
remote rasplit().
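
If radium() is running as a daemon it should be logging through syslog,
so something like this may surface those messages (the log file path
depends on your syslog configuration):

    grep -i radium /var/log/messages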

Carter


On Jul 9, 2009, at 10:17 AM, Phillip Deneault wrote:

> Carter Bullard wrote:
>> No, this doesn't seem right at all.  A couple of suggestions:
>> don't use "-M norep" for this type of aggregation (basically it
>> just throws away the AGR dsr as the records are being written
>> out, and in some apps this is great, but it's not necessary here).
>
> Ok, I'll remove it as this issue presses forward.
>
>> How are your rasplit()s called?  I suspect there may be an issue
>> with that.
>
> /usr/bin/rasplit -d -S <radiumhost> -M time 10m -w /data/argus/slices/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
> /usr/bin/rasplit -d -S <radiumhost> -M time 1h -w /data/argus/hourlies/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
> /usr/bin/rasplit -d -S <radiumhost> -M time 1d -w /data/argus/dailies/\$srcid/argus-%m.%d.%Y.out
>
>
>> In most cases, you don't need the hourly and daily rasplit()
>> processes, because you can generate both of these from your 10 min
>> split files.  It all depends on whether you want the hourly and
>> daily files updated continuously, or if you can get away with
>> updating them, say, every 10 minutes.
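
For what it's worth, regenerating the hourlies from the 10 minute
slices would look something like this (an untested sketch based on the
paths in your config; the \$srcid macro in the -w path should still
fan the output out per source when reading from files):

    rasplit -M time 1h -r /data/argus/slices/*/argus-*.out \
        -w /data/argus/hourlies/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out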
>
> This is true, except I don't trust either of them right now. :-)
> If the issue is some strange blocking process that would require me
> to run only one rasplit, I would do it and hack up everything I'm
> doing on the programming end.  But neither CPU, memory, nor IO
> suggests this to be the case.
>
>> It looks like racluster() is faulting while reading one of the
>> files.  When it does that, the pipe closes down, and your racount()
>> reports just the records it receives.  We just need to find the bad
>> file, and then try to figure out how it got corrupted (at least
>> that is my guess).  What are the totals for each of the individual
>> files in your example(s) without the clustering?
>
> Here's a breakdown using a few methods.  Things mostly make sense,
> except the numbers seem so low compared to the locally recorded
> copies, and the collection of daily files differs from the hourlies.
> The 'Clustering' column is racluster with '-M norep' removed, piped
> to racount; the 'WithoutClustering' column is just a straight
> racount.  I ran the files individually and totaled them, then ran
> them in a single command line, and I also ran the hourly file the
> same way for comparison.
>
> I got no errors when running any of these commands on any of these files.
>
> File                Clustering   WithoutClustering
> 00                      37              38
> 10                      85              85
> 20                     138             148
> 30                      43              44
> 40                     104             110
> 50                     254             265
> --------------------------------------------------
> SumTotal               661             690
>
> Total with all 6
> files in the '-r'      631             690
>
> Hourly file            252             263
>
> Can anyone else replicate this behavior?
>
> Thanks,
> Phil
>
>
>> Carter
>> On Jul 8, 2009, at 4:24 PM, Phillip Deneault wrote:
>>> I'm running the beta.8 code.  I have a single radium instance
>>> collecting data from dozens of locations, and 3 rasplit processes
>>> connecting to that radium process: one for 10 minute slices, one
>>> for hourlies, and one for dailies.
>>>
>>> It *seems* as if the data I'm recording is lower than what I
>>> should have.  I say this because I get drastically different
>>> counts when I check locally recorded data vs. radium recorded data.
>>>
>>> Please yell at me if I am doing this wrong; I performed the
>>> racluster in an attempt to normalize the flow counts a little.
>>>
>>> Locally recorded data tallies like this (logs are rotated daily,
>>> so I picked a convenient hour):
>>>
>>> # racluster -t 14 -M norep -r /var/log/argus/argus.out -w - | racount -r -
>>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>    sum     52385      134978    134813       165      9982211    9970637      11574
>>>
>>> However, when I run a tally on the hourlies and the slices
>>> collected by radium, I get two different flow counts, neither of
>>> which comes anywhere close.
>>>
>>> (SLICES)
>>> # racluster -M norep -r argus-07.08.2009-14.50.00.out \
>>>     argus-07.08.2009-14.40.00.out argus-07.08.2009-14.30.00.out \
>>>     argus-07.08.2009-14.20.00.out argus-07.08.2009-14.10.00.out \
>>>     argus-07.08.2009-14.00.00.out -w - | racount -r -
>>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>    sum       631        1920      1397       523       507980     210286     297694
>>>
>>> (HOURLIES)
>>> # racluster -M norep -r argus-07.08.2009-14.00.00.out -w - | racount -r -
>>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>    sum       252         447       348        99        95012      57022      37990
>>>
>>> Is this a bug, or me doing something wrong?
>>>
>>> Thanks,
>>> Phil
>>>
>

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax


