possible radium issue
Phillip Deneault
deneault at WPI.EDU
Thu Jul 9 11:08:50 EDT 2009
Ah! Progress!
Technically, my one radium instance connects to dozens of Argi, but I
split the files back down by $srcid in each directory. When I ran the
command you suggested against a low file, I _should_ be getting only one
srcid, but I do not. In fact, I got 10 different srcids in the same file.
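
A minimal sketch of how the check described here could be scripted: it counts the distinct srcids in a file and looks for gaps in each source's sequence numbers. It assumes the text printed by the ra command quoted below carries srcid and seq as the first two whitespace-separated columns; that layout is an assumption, not confirmed output. The names summarize, find_gaps and seqcheck.py are hypothetical, and sequence-number wraparound is not handled.

```python
import sys
from collections import defaultdict


def summarize(lines):
    """Group sequence numbers by source id.

    Assumption: srcid and seq are the first two whitespace-separated
    columns of each record line. Header or blank lines are skipped.
    """
    seqs = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # not a record line (header, blank, or unexpected layout)
        seqs[fields[0]].append(int(fields[1]))
    return seqs


def find_gaps(seq_numbers):
    """Return (before, after) pairs where the sorted sequence numbers skip."""
    ordered = sorted(set(seq_numbers))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


if __name__ == "__main__":
    by_src = summarize(sys.stdin)
    print(f"{len(by_src)} distinct srcid(s)")
    for srcid, nums in sorted(by_src.items()):
        gaps = find_gaps(nums)
        print(f"{srcid}: {len(nums)} records, {len(gaps)} gap(s)")
        for before, after in gaps:
            print(f"  nothing seen between seq {before} and {after}")
```

Possible usage, with the ra command taken from the quoted message: ra -r bad.file -s +1srcid +2seq | python3 seqcheck.py. More than one srcid in a per-$srcid file would reproduce the symptom above, and a time-sliced file only covers a window of the stream, so only gaps inside that window are meaningful.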
Phil
Carter Bullard wrote:
> Hey Phillip,
> The option is not a problem, so not to worry about it, just wanted to
> make sure that other readers didn't think they were missing something.
>
> So your radium() is attached to a single argus data source? With multiple
> readers of data? If this is the case, we can compare the sequence numbers
> in the flow records to see if we're missing records. Take one of the
> files that seems low, and print the whole thing with ra():
> ra -r bad.file -s +1srcid +2seq
>
> The sequence numbers are those put in by the original data source, and so
> if radium() is dropping records, or if rasplit() is disconnecting and
> reconnecting, you'll be able to see sequence numbers with gaps.
>
> If you are seeing gaps, are there any messages from radium() that it's
> having to close the connections? Radium(), like argus(), will kill the
> transport connection if too many records get queued up for the remote
> rasplit().
>
> Carter
>
>
> On Jul 9, 2009, at 10:17 AM, Phillip Deneault wrote:
>
>> Carter Bullard wrote:
>>> No, this doesn't seem right at all. A couple of suggestions.
>>> Don't use the "-M norep" for this type of aggregation (basically it
>>> just throws away the AGR dsr as the records are being written
>>> out, and in some apps this is great, but not necessary here).
>>
>> Ok, I'll remove it as this issue presses forward.
>>
>>> How are your rasplit()s called, I suspect there may be an issue with
>>> that.
>>
>> /usr/bin/rasplit -d -S <radiumhost> -M time 10m -w /data/argus/slices/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
>> /usr/bin/rasplit -d -S <radiumhost> -M time 1h -w /data/argus/hourlies/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
>> /usr/bin/rasplit -d -S <radiumhost> -M time 1d -w /data/argus/dailies/\$srcid/argus-%m.%d.%Y.out
>>
>>
>>> In most cases, you don't need the hourly and daily rasplit() processes,
>>> because you can generate both of these from your 10 min split files.
>>> All depends on whether you want the hourly and daily files updated
>>> continuously, or if you can get away with updating them, say every
>>> 10 minutes.
>>
>> This is true. Except I don't trust either of them right now. :-) If
>> the issue is that there is some strange blocking process which would
>> require me to run only one rasplit, I would do it and hack up
>> everything I'm doing on the programming end. But neither CPU, memory,
>> nor IO suggests this to be the case.
>>
>>> It looks like racluster() is faulting reading one of the files. When it
>>> does that, the pipe closes down, and your racount() reports just the
>>> records it receives. Just need to find the bad file, and then try to
>>> figure out how it got corrupted (at least that is my guess).
>>> What are the totals for each of the individual files in your example(s)
>>> without the clustering?
>>
>> Here's a breakdown using a few methods. Things mostly make sense
>> except the numbers seem so low by comparison to the locally recorded
>> copies, and the difference from the collection of daily files with the
>> hourlies.
>> The 'Clustering' field is with the '-M norep' off piped to an racount.
>> The WithoutClustering field is just the straight racount. I ran the
>> files individually and totaled them, then in a single command line,
>> and I also ran the hourly file the same way for comparison.
>>
>> I got no errors when running any of these commands on any of these files.
>>
>> File                  Clustering   WithoutClustering
>> 00                            37                  38
>> 10                            85                  85
>> 20                           138                 148
>> 30                            43                  44
>> 40                           104                 110
>> 50                           254                 265
>> ----------------------------------------------------
>> SumTotal                     661                 690
>>
>> Total with
>> all 6 files
>> in the '-r'                  631                 690
>>
>> Hourly file                  252                 263
>>
>> Can anyone else replicate this behavior?
>>
>> Thanks,
>> Phil
>>
>>
>>> Carter
>>> On Jul 8, 2009, at 4:24 PM, Phillip Deneault wrote:
>>>> I'm running the beta.8 code. I have a single radium instance
>>>> collecting data from dozens of locations and 3 rasplit processes
>>>> connecting to that radium process, one for 10 minute slices, 1 for
>>>> hourlies, and 1 for dailies.
>>>>
>>>> It *seems* as if the data I'm recording is lower than what I should
>>>> have. I say this because I get drastically different counts when I
>>>> check locally recorded data vs. radium recorded data.
>>>>
>>>> Please yell at me if I am doing this wrong; I performed the
>>>> racluster in an attempt to normalize the flow counts a little.
>>>>
>>>> Locally recorded data tallies like this (logs are rotated daily, so I
>>>> picked a convenient hour):
>>>>
>>>> # racluster -t 14 -M norep -r /var/log/argus/argus.out -w - | racount -r -
>>>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>>     sum    52385      134978    134813       165      9982211    9970637      11574
>>>>
>>>> However, when I run a tally on the hourlies and the slices collected
>>>> by radium, I get two different flow counts, neither of which comes
>>>> anywhere close.
>>>>
>>>> (SLICES)
>>>> # racluster -M norep -r argus-07.08.2009-14.50.00.out
>>>> argus-07.08.2009-14.40.00.out argus-07.08.2009-14.30.00.out
>>>> argus-07.08.2009-14.20.00.out argus-07.08.2009-14.10.00.out
>>>> argus-07.08.2009-14.00.00.out -w - | racount -r -
>>>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>>     sum      631        1920      1397       523       507980     210286     297694
>>>>
>>>> (HOURLIES)
>>>> # racluster -M norep -r argus-07.08.2009-14.00.00.out -w - | racount -r -
>>>> racount  records  total_pkts  src_pkts  dst_pkts  total_bytes  src_bytes  dst_bytes
>>>>     sum      252         447       348        99        95012      57022      37990
>>>>
>>>> Is this a bug, or me doing something wrong?
>>>>
>>>> Thanks,
>>>> Phil
>>>>
>>
>
> Carter Bullard
> CEO/President
> QoSient, LLC
> 150 E 57th Street Suite 12D
> New York, New York 10022
>
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
>
>
>
--
--------------------------------------------------------------------
WPI Information Technology will never ask for your password and
you should never give it. http://www.wpi.edu/+infosec/phishing.html
--------------------------------------------------------------------
Phil Deneault Network Security Officer
deneault at wpi.edu Information Security
http://www.wpi.edu/~deneault/ Worcester Polytechnic Institute
--------------------------------------------------------------------