possible radium issue

Phillip Deneault deneault at WPI.EDU
Thu Jul 9 11:08:50 EDT 2009


Ah!  Progress!

Technically, my one radium instance connects to dozens of Argi, but I 
split the files back down by $srcid in each directory.  When I ran the 
command you suggested against one of the low files, I _should_ be getting 
only one srcid, but I do not.  In fact, I got 10 different srcids in the 
same file.
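
For anyone wanting to repeat that check across many files, the tally can be
scripted.  The Python below is only a minimal sketch: it assumes the records
were printed one per line with the srcid in the first column (for example
with `ra -r bad.file -s srcid`), and the function name `count_srcids` and
the sample addresses are made up for illustration.

```python
from collections import Counter


def count_srcids(lines):
    """Tally records per source id from one-record-per-line text.

    Assumes the srcid is the first whitespace-separated field.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].lower() == "srcid":
            continue  # skip blank lines and a column header, if present
        counts[fields[0]] += 1
    return counts


# Hypothetical sample, not taken from the real files in this thread.
sample = ["SrcId", "10.0.0.1", "10.0.0.2", "10.0.0.1", ""]
counts = count_srcids(sample)
for srcid, n in sorted(counts.items()):
    print(srcid, n)
```

A file written under a per-$srcid directory would be expected to produce
exactly one key here; more than one key is the symptom described above.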

Phil

Carter Bullard wrote:
> Hey Phillip,
> The option is not a problem, so not to worry about it, just wanted to
> make sure that other readers didn't think they were missing something.
> 
> So your radium() is attached to a single argus data source?  With multiple
> readers of data?  If this is the case, we can compare the sequence numbers
> in the flow records to see if we're missing records.  Take one of the
> files that seems low, and print the whole thing with ra():
>    ra -r bad.file -s +1srcid +2seq
> 
> The sequence numbers are those put in by the original data source, and so
> if radium() is dropping records, or if rasplit() is disconnecting and
> reconnecting, you'll be able to see sequence numbers with gaps.
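
The gap check described above can be sketched in Python.  This is an
illustration under stated assumptions, not part of the argus tools: it
assumes the (srcid, seq) pairs have already been parsed out of the ra()
output in file order, that each source increments its sequence number by
one per record, and that the counter does not wrap within the file.  The
name `find_seq_gaps` and the sample values are hypothetical.

```python
def find_seq_gaps(records):
    """Return (srcid, last_seen, next_seen, missing) for each forward jump.

    `records` is an iterable of (srcid, seq) pairs in file order.
    Sequence numbers are tracked separately for every srcid.
    """
    last = {}
    gaps = []
    for srcid, seq in records:
        prev = last.get(srcid)
        if prev is not None and seq > prev + 1:
            gaps.append((srcid, prev, seq, seq - prev - 1))
        last[srcid] = seq
    return gaps


# Hypothetical sample: source "a" skips from 2 to 5, source "b" has no gap.
sample = [("a", 1), ("a", 2), ("a", 5), ("b", 10), ("b", 11), ("a", 6)]
gaps = find_seq_gaps(sample)
for srcid, prev, nxt, missing in gaps:
    print(f"{srcid}: {missing} record(s) missing between {prev} and {nxt}")
```

Tracking the last value per srcid matters here because a file holding
records from several sources would otherwise show false gaps wherever the
sources interleave.
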
> 
> If you are seeing gaps, are there any messages from radium() that it's
> having to close the connections?  Radium(), like argus(), will kill the
> transport connection if too many records get queued up for the remote
> rasplit().
> 
> Carter
> 
> 
> On Jul 9, 2009, at 10:17 AM, Phillip Deneault wrote:
> 
>> Carter Bullard wrote:
>>> No, this doesn't seem right at all.   A couple of suggestions.
>>> Don't use the "-M norep" for this type of aggregation (basically it
>>> just throws away the AGR dsr as the records are being written
>>> out, and in some apps this is great, but not necessary  here).
>>
>> Ok, I'll remove it as this issue presses forward.
>>
>>> How are your rasplit()s called?  I suspect there may be an issue with
>>> that.
>>
>> /usr/bin/rasplit -d -S <radiumhost> -M time 10m -w /data/argus/slices/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
>> /usr/bin/rasplit -d -S <radiumhost> -M time 1h -w /data/argus/hourlies/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
>> /usr/bin/rasplit -d -S <radiumhost> -M time 1d -w /data/argus/dailies/\$srcid/argus-%m.%d.%Y.out
>>
>>
>>> In most cases, you don't need the hourly and daily rasplit() processes,
>>> because you can generate both of these from your 10 min split files.  It
>>> all depends on whether you want the hourly and daily files updated
>>> continuously, or if you can get away with updating them, say, every 10
>>> minutes.
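
As a rough illustration of deriving hourlies from the 10 minute slices, the
Python sketch below groups slice file names by the hour they fall in.  It
assumes the `argus-%m.%d.%Y-%H.%M.%S.out` naming used in the rasplit()
commands in this thread; the helper name `group_slices_by_hour` is
hypothetical, and actually merging each group (for example by passing the
files to a single `-r`) is left out.

```python
from collections import defaultdict
from datetime import datetime

# File name pattern taken from the rasplit -w arguments in this thread.
SLICE_FMT = "argus-%m.%d.%Y-%H.%M.%S.out"


def group_slices_by_hour(names):
    """Map each hourly file name to the sorted slice names inside that hour."""
    groups = defaultdict(list)
    for name in sorted(names):
        start = datetime.strptime(name, SLICE_FMT)
        hour = start.replace(minute=0, second=0)
        groups[hour.strftime(SLICE_FMT)].append(name)
    return dict(groups)


slices = [
    "argus-07.08.2009-14.50.00.out",
    "argus-07.08.2009-14.40.00.out",
    "argus-07.08.2009-14.00.00.out",
    "argus-07.08.2009-15.00.00.out",
]
groups = group_slices_by_hour(slices)
for hourly, members in sorted(groups.items()):
    print(hourly, "<-", " ".join(members))
```
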
>>
>> This is true.  Except I don't trust either of them right now. :-)  If 
>> the issue is that there is some strange blocking process which would 
>> require me to run only one rasplit, I would do it and hack up 
>> everything I'm doing on the programming end.  But neither CPU, memory, 
>> nor IO suggests this to be the case.
>>
>>> It looks like racluster() is faulting reading one of the files.  When it
>>> does that, the pipe closes down, and your racount() reports just the
>>> records it receives.  Just need to find the bad file, and then try to
>>> figure out how it got corrupted (at least that is my guess).
>>> What are the totals for each of the individual files in your example(s)
>>> without the clustering?
>>
>> Here's a breakdown using a few methods.  Things mostly make sense 
>> except the numbers seem so low by comparison to the locally recorded 
>> copies, and the difference from the collection of daily files with the 
>> hourlies.
>> The 'Clustering' column is racluster with '-M norep' turned off, piped to
>> racount.  The 'WithoutClustering' column is just the straight racount.  I
>> ran the files individually and totaled them, then ran all of them in a
>> single command line, and I also ran the hourly file the same way for
>> comparison.
>>
>> I got no errors when running any of these commands on any of these files.
>>
>> File                   Clustering    WithoutClustering
>> 00                     37            38
>> 10                     85            85
>> 20                     138           148
>> 30                     43            44
>> 40                     104           110
>> 50                     254           265
>> -----------------------------------------------------
>> SumTotal               661           690
>>
>> Total with all 6
>> files in the '-r'      631           690
>>
>> Hourly file            252           263
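
The arithmetic in the table can be checked with a few lines of Python.  The
numbers are copied from the table; the variable names are made up, and the
closing comment about why the combined run is smaller is only one possible
reading, not something established in this thread.

```python
# Per-slice record counts copied from the table (slice minute -> records).
clustered = {"00": 37, "10": 85, "20": 138, "30": 43, "40": 104, "50": 254}
unclustered = {"00": 38, "10": 85, "20": 148, "30": 44, "40": 110, "50": 265}

sum_clustered = sum(clustered.values())      # files clustered one at a time
sum_unclustered = sum(unclustered.values())  # plain racount per file

combined_clustered = 631  # all six files in one racluster -r
hourly_unclustered = 263  # plain racount on the hourly file

# Records that disappear when the six slices are clustered together.
merged_across_files = sum_clustered - combined_clustered

# Records present in the slices but absent from the hourly file.
missing_from_hourly = sum_unclustered - hourly_unclustered

print(sum_clustered, sum_unclustered, merged_across_files, missing_from_hourly)

# Assumption: a drop when clustering across files could simply be flows
# that span slice boundaries being merged; the hourly shortfall cannot be
# explained that way, because no clustering is involved in that count.
```
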
>>
>> Can anyone else replicate this behavior?
>>
>> Thanks,
>> Phil
>>
>>
>>> Carter
>>> On Jul 8, 2009, at 4:24 PM, Phillip Deneault wrote:
>>>> I'm running the beta.8 code.  I have a single radium instance 
>>>> collecting data from dozens of locations and 3 rasplit processes 
>>>> connecting to that radium process, one for 10 minute slices, 1 for 
>>>> hourlies, and 1 for dailies.
>>>>
>>>> It *seems* as if the data I'm recording is lower than what I should 
>>>> have.  I say this because I get drastically different counts when I 
>>>> check locally recorded data vs. radium recorded data.
>>>>
>>>> Please yell at me if I am doing this wrong.  I performed the
>>>> racluster in an attempt to normalize the flow counts a little.
>>>>
>>>> Locally recorded data tallies like this (logs are rotated daily, so I
>>>> picked a convenient hour):
>>>>
>>>> # racluster -t 14 -M norep -r /var/log/argus/argus.out -w - | racount -r -
>>>> racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
>>>>   sum   52385       134978         134813         165            9982211            9970637            11574
>>>>
>>>> However, when I run a tally on the hourlies and the slices collected 
>>>> by radium, I get two different flow counts, neither of which comes 
>>>> anywhere close.
>>>>
>>>> (SLICES)
>>>> # racluster -M norep -r argus-07.08.2009-14.50.00.out \
>>>>     argus-07.08.2009-14.40.00.out argus-07.08.2009-14.30.00.out \
>>>>     argus-07.08.2009-14.20.00.out argus-07.08.2009-14.10.00.out \
>>>>     argus-07.08.2009-14.00.00.out -w - | racount -r -
>>>> racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
>>>>   sum   631         1920           1397           523            507980             210286             297694
>>>>
>>>> (HOURLIES)
>>>> # racluster -M norep -r argus-07.08.2009-14.00.00.out -w - | racount -r -
>>>> racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
>>>>   sum   252         447            348            99             95012              57022              37990
>>>>
>>>> Is this a bug, or me doing something wrong?
>>>>
>>>> Thanks,
>>>> Phil
>>>>
>>
> 
> Carter Bullard
> CEO/President
> QoSient, LLC
> 150 E 57th Street Suite 12D
> New York, New York  10022
> 
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
> 
> 
> 

-- 
--------------------------------------------------------------------
   WPI Information Technology will never ask for your password and
you should never give it.  http://www.wpi.edu/+infosec/phishing.html
--------------------------------------------------------------------
Phil Deneault                               Network Security Officer
deneault at wpi.edu                                Information Security
http://www.wpi.edu/~deneault/        Worcester Polytechnic Institute
--------------------------------------------------------------------


