possible radium issue
Carter Bullard
carter at qosient.com
Thu Jul 9 10:48:24 EDT 2009
Hey Phillip,
The option is not a problem, so don't worry about it; I just wanted to
make sure that other readers didn't think they were missing something.

So your radium() is attached to a single argus data source, with
multiple readers of the data? If that is the case, we can compare the
sequence numbers in the flow records to see if we're missing records.
Take one of the files that seems low, and print the whole thing with ra():
ra -r bad.file -s +1srcid +2seq
The sequence numbers are those put in by the original data source, so
if radium() is dropping records, or if rasplit() is disconnecting and
reconnecting, you'll be able to see gaps in the sequence numbers.
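To spot those gaps mechanically rather than by eye, the two-column
output of the ra() command above can be piped through awk. This is only
a sketch: the sample file below stands in for real
`ra -r bad.file -s +1srcid +2seq` output, and the assumption that it
prints two whitespace-separated columns (srcid, then seq) per line is
mine, not confirmed by the thread.

```shell
# Stand-in for: ra -r bad.file -s +1srcid +2seq
# Assumed layout: two whitespace-separated columns, srcid then seq.
cat <<'EOF' > /tmp/seqs.txt
192.168.0.1 100
192.168.0.1 101
192.168.0.1 105
192.168.0.2 7
192.168.0.2 8
EOF

# Report every place a source's sequence number is not its
# predecessor plus one.
awk '{
  if (($1 in last) && $2 != last[$1] + 1)
    printf "gap for %s: %d -> %d\n", $1, last[$1], $2
  last[$1] = $2
}' /tmp/seqs.txt
# -> gap for 192.168.0.1: 101 -> 105
```

In a real run you would replace the `cat` with the ra() pipeline and
track one counter per srcid, as the awk array does here.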
If you are seeing gaps, are there any messages from radium() that it's
having to close the connections? Radium(), like argus(), will kill the
transport connection if too many records get queued up for the remote
rasplit().
Carter
On Jul 9, 2009, at 10:17 AM, Phillip Deneault wrote:
> Carter Bullard wrote:
>> No, this doesn't seem right at all. A couple of suggestions.
>> Don't use the "-M norep" for this type of aggregation (basically it
>> just throws away the AGR dsr as the records are being written
>> out, and in some apps this is great, but not necessary here).
>
> Ok, I'll remove it as this issue presses forward.
>
>> How are your rasplit()s called, I suspect there may be an issue
>> with that.
>
> /usr/bin/rasplit -d -S <radiumhost> -M time 10m -w /data/argus/slices/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
> /usr/bin/rasplit -d -S <radiumhost> -M time 1h -w /data/argus/hourlies/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
> /usr/bin/rasplit -d -S <radiumhost> -M time 1d -w /data/argus/dailies/\$srcid/argus-%m.%d.%Y.out
>
>
>> In most cases, you don't need the hourly and daily rasplit()
>> processes, because you can generate both of these from your 10
>> minute split files. It all depends on whether you want the hourly
>> and daily files updated continuously, or if you can get away with
>> updating them, say, every 10 minutes.
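As a sketch of that single-collector alternative, the hourly and daily
trees could be rebuilt periodically from the 10-minute slices, since
the ra* clients read files with -r as well as attach to a live source.
The schedule, wildcard paths, and the idea of doing this from cron are
my assumptions mirroring the layout above, not a tested configuration:

```shell
# Hypothetical crontab fragment: derive hourlies and dailies from the
# 10-minute slice files instead of running three live rasplit()s.
# Re-reading every slice each pass is wasteful; a real setup would
# restrict -r to the files from the period that just closed.
5 * * * *   rasplit -M time 1h -r /data/argus/slices/*/argus-*.out \
                -w '/data/argus/hourlies/$srcid/argus-%m.%d.%Y-%H.%M.%S.out'
15 0 * * *  rasplit -M time 1d -r /data/argus/slices/*/argus-*.out \
                -w '/data/argus/dailies/$srcid/argus-%m.%d.%Y.out'
```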
>
> This is true. Except I don't trust either of them right now. :-)
> If the issue is that there is some strange blocking process which
> would require me to run only one rasplit, I would do it and hack up
> everything I'm doing on the programming end. But neither CPU,
> memory, nor IO suggests this to be the case.
>
>> It looks like racluster() is faulting while reading one of the
>> files. When it does that, the pipe closes down, and your racount()
>> reports just the records it receives. We just need to find the bad
>> file, and then try to figure out how it got corrupted (at least
>> that is my guess).
>>
>> What are the totals for each of the individual files in your
>> example(s) without the clustering?
>
> Here's a breakdown using a few methods. Things mostly make sense,
> except that the numbers seem so low compared to the locally recorded
> copies, and the collection of daily files differs from the hourlies.
> The 'Clustering' column is the run with '-M norep' removed, piped to
> racount. The 'WithoutClustering' column is just the straight racount.
> I ran the files individually and totaled them, then ran them in a
> single command line, and I also ran the hourly file the same way for
> comparison.
>
> I got no errors when running any of these commands on any of these
> files.
>
> File          Clustering   WithoutClustering
> 00                    37                  38
> 10                    85                  85
> 20                   138                 148
> 30                    43                  44
> 40                   104                 110
> 50                   254                 265
> -----------------------------------------
> SumTotal             661                 690
>
> Total with
> all 6 files
> in the '-r'          631                 690
>
> Hourly file          252                 263
>
> Can anyone else replicate this behavior?
>
> Thanks,
> Phil
>
>
>> Carter
>> On Jul 8, 2009, at 4:24 PM, Phillip Deneault wrote:
>>> I'm running the beta.8 code. I have a single radium instance
>>> collecting data from dozens of locations and 3 rasplit processes
>>> connecting to that radium process, one for 10 minute slices, 1 for
>>> hourlies, and 1 for dailies.
>>>
>>> It *seems* as if the data I'm recording is lower than what I
>>> should have. I say this because I get drastically different
>>> counts when I check locally recorded data vs. radium recorded data.
>>>
>>> Please yell at me if I'm doing this wrong; I performed the
>>> racluster in an attempt to normalize the flow counts a little.
>>>
>>> Locally recorded data tallies like this (logs are rotated daily,
>>> so I picked a convenient hour):
>>>
>>> # racluster -t 14 -M norep -r /var/log/argus/argus.out -w - | racount -r -
>>> racount   records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>     sum     52385      134978    134813       165      9982211    9970637      11574
>>>
>>> However, when I run a tally on the hourlies and on the slices
>>> collected by radium, I get two different flow counts, neither of
>>> which comes anywhere close.
>>>
>>> (SLICES)
>>> # racluster -M norep -r argus-07.08.2009-14.50.00.out \
>>>       argus-07.08.2009-14.40.00.out argus-07.08.2009-14.30.00.out \
>>>       argus-07.08.2009-14.20.00.out argus-07.08.2009-14.10.00.out \
>>>       argus-07.08.2009-14.00.00.out -w - | racount -r -
>>> racount   records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>     sum       631        1920      1397       523       507980      210286     297694
>>>
>>> (HOURLIES)
>>> # racluster -M norep -r argus-07.08.2009-14.00.00.out -w - | racount -r -
>>> racount   records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>     sum       252         447       348        99        95012       57022      37990
>>>
>>> Is this a bug, or me doing something wrong?
>>>
>>> Thanks,
>>> Phil
>>>
>
Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax