little endian bug in 2.x record parsing in new clients

Carter Bullard carter at qosient.com
Mon Mar 16 14:31:09 EDT 2009


Jason,
Very confusing, why do you run client software on the bivio?
Cater

On Mar 16, 2009, at 2:02 PM, Jason Carr wrote:

> Yep, sorry I meant -S host:port.
>
> What I'm trying to do is to store the incoming data stream on disk.  
> Currently we have the files up into the directory structure like this:
>
> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>
> And split it up every 5 minutes.  So we get 12 5 minute files in  
> each hour.  This works out well for us since they are much smaller  
> files than running through an hour or days worth of data.
>
> -D 4 output is this on radium:
>
> *a ton of other lines*
> radium[19615.50999a4000000000]: 13:34:16.865875 ArgusWriteOutSocket  
> (0xacd0a0, 0xacec40) 0 records waiting. returning 328
> radium[19615.e096fd4f7e7f0000]: 13:34:16.866297  
> ArgusReadStreamSocket (0x4ff34010) read 400 bytes
> radium[19615.e096fd4f7e7f0000]: 13:34:16.866321  
> ArgusCopyRecordStruct (0x4ff34600) retn 0x4801f310
> radium[19615.e096fd4f7e7f0000]: 13:34:16.866336 RaProcessRecord  
> (0x4ff96010, 0x4ff34600) returning
> radium[19615.50999a4000000000]: 13:34:16.866373  
> ArgusCopyRecordStruct (0x4801f310) retn 0xb06310
> radium[19615.50999a4000000000]: 13:34:16.866392 ArgusGenerateRecord  
> (0xb06310, 0) old len 400 new len 380
> radium[19615.50999a4000000000]: 13:34:16.866411 ArgusWriteOutSocket  
> (0xacd0a0, 0xacec40) 0 records waiting. returning 400
> radium[19615.e096fd4f7e7f0000]: 13:34:16.868431  
> ArgusReadStreamSocket (0x4ff34010) read 100 bytes
>
> I noticed that the file wasn't moved when that happened.  I'm  
> starting to think it has nothing to do with the file moving.
>
> I upgraded the bivio software from "argus-clients-3.0.2" to "argus- 
> clients-3.0.2.beta.2" and it's consistently crashing now.  3.0.2  
> never crashed previously.
>
> *** glibc detected *** radium: double free or corruption (out):  
> 0x3350a008 ***
> ======= Backtrace: =========
> /lib/libc.so.6[0xfd4a964]
> /lib/libc.so.6(__libc_free+0xc8)[0xfd4ded8]
> radium[0x1004d500]
> radium[0x10055ac4]
> radium[0x10062ebc]
> /lib/libpthread.so.0[0xfe55994]
> /lib/libc.so.6(__clone+0x84)[0xfdb7794]
>
> - Jason
>
> On 3/16/09 12:50 PM, Carter Bullard wrote:
>> Hey Jason,
>> Its not "-R host:port" its "-S host:port". I'm sure that was a slip?
>> So things look like they are working for rasplit() then, and with
>> your Bivio box.
>>
>> So I cannot replicate your problem on any machine, and I've
>> tested it on 8 different machines now. I do not recommend
>> radium writing out to a file when there are so many other ways
>> to manage the data streams. Why are you doing this? or rather
>> What are you trying to do?
>>
>> There are a lot of ways to misconfigure a radium. Having it
>> connect to itself is the common bad thing, and we try to catch
>> this, but there are clever ways of getting around that ;o)
>>
>> Usually radium doesn't update a moved file, because there
>> are no new records coming in. It doesn't know to do anything
>> until a data record shows up, so more than likely your server
>> connection is poorly configured.
>>
>> The best way to see what's going on, is to run with something
>> like "-D 4" to see the debug information, to assure that you
>> are attaching properly, that data is coming in, and that it
>> see's that the file is gone.
>>
>> So run with -D 4 and then delete the file, and see what it sez.
>>
>> Carter
>>
>>
>> On Mar 16, 2009, at 11:36 AM, Jason Carr wrote:
>>
>>> Carter,
>>>
>>> For me, rasplit only wrote one file for 5 minutes after giving  
>>> rasplit
>>> the -R x.x.x.x:561 option. It doesn't exit after the 5 minutes
>>> anymore, not sure why though. Is there a way to do that?
>>>
>>> I still have the issue of radium not letting go of the file when  
>>> it's
>>> been moved.
>>>
>>> - Jason
>>>
>>> On 3/12/09 12:51 PM, Carter Bullard wrote:
>>>> Hey Jason,
>>>> Hmmmm, so I can't replicate your problem here on any machine.
>>>> If you can run rasplit() under gdb, (and you can, of course  
>>>> generate
>>>> files with a shorter granularity), maybe you will catch the  
>>>> rasplit()
>>>> exiting. Be sure and ./configure and compile with these tag files
>>>> in the root directory:
>>>>
>>>> % touch .devel .debug
>>>> % ./configure;make clean;make
>>>> % gdb bin/rasplit
>>>>
>>>> For rasplit() do you have write permission to the entire directory
>>>> structure that you are trying to write to?
>>>>
>>>> For rastream(), you do need the $srcid in the path, at least  
>>>> right now
>>>> you do, so you should add it.
>>>>
>>>> rastream ....... -w /data/argus/core/\$srcid/%Y/%m/%d/ 
>>>> argus.........
>>>>
>>>> Carter
>>>>
>>>>
>>>>
>>>> On Mar 12, 2009, at 1:38 AM, Jason Carr wrote:
>>>>
>>>>> I'm running all of this on a x86_64 Linux box.
>>>>>
>>>>> When I run rasplit like this:
>>>>>
>>>>> # rasplit -S argus-source:561 -M time 5m -w
>>>>> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>
>>>>> It only creates one 5 minute file and then exits. Can I make it  
>>>>> just
>>>>> keep listening and keep on writing the 5 minute files?
>>>>>
>>>>> rastream segfaults when I run it:
>>>>>
>>>>> # rastream -S argus-source:561 -M time 5m -w
>>>>> '/data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S' -B 15s -f
>>>>> ~/argus/test.sh
>>>>> rastream[5979]: 01:34:40.205530 output string requires $srcid
>>>>> Segmentation fault
>>>>>
>>>>> We've historically always used radium to write to a file and  
>>>>> move the
>>>>> file every 5 minutes. radium has always just recreates the  
>>>>> output file
>>>>> once it's been moved. It is a little bit of a hack and if
>>>>> rastream/rasplit already does it, that would be awesome. I'll use
>>>>> either at this point :)
>>>>>
>>>>> - Jason
>>>>>
>>>>>
>>>>> Carter Bullard wrote:
>>>>>> Hey Jason,
>>>>>> I'm answering on the mailing list, so we capture the thread.
>>>>>>
>>>>>> Glad to hear that your argus-2.x data on your Bivio box is
>>>>>> working well for you. Sorry about your radium() file behavior.
>>>>>>
>>>>>> I still can't duplicate your error, but I'll try again later  
>>>>>> today.
>>>>>> What kind of machine is your radium() running on?
>>>>>>
>>>>>> To get rasplit to generate the files run:
>>>>>> rasplit -S radium -M time 5m \
>>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>>
>>>>>> you can run it as a daemon with the "-d" option. Some like to run
>>>>>> it from a script that respawns it if it dies, but its been a  
>>>>>> really
>>>>>> stable
>>>>>> program.
>>>>>>
>>>>>> The reason you want to use rasplit(), is that it will actually  
>>>>>> clip
>>>>>> argus records to ensure that all the argus data for a  
>>>>>> particular time
>>>>>> frame is only in the specific timed file. So if a record starts  
>>>>>> at
>>>>>> 12:04:12.345127 and ends at 12:06:12.363131, rasplit will clip
>>>>>> that record into 2 records with timestamps:
>>>>>> 12:04:12.345127 - 12:05:00.000000
>>>>>> 12:05:00.000000 - 12:06:12.363131
>>>>>>
>>>>>> This makes it so you can process, graph the files, and you don't
>>>>>> have to worry about whether you've got all the data or not.
>>>>>>
>>>>>> Argus does a good job keeping its output records in time
>>>>>> order, but if you're processing multiple streams of data, the
>>>>>> time order can get a bit skewed. So that when you pull the file
>>>>>> out from underneath radium() you can still get records that
>>>>>> have data that should be in another file.
>>>>>>
>>>>>> To compress the data, have a cron job compress the files say
>>>>>> an hour later.
>>>>>>
>>>>>> If you want to experiment, use rastream(), which is rasplit()  
>>>>>> with
>>>>>> the addition feature to run a shell script against files after  
>>>>>> they
>>>>>> close.
>>>>>> You need to tell rastream() how long to wait before running the
>>>>>> script,
>>>>>> so you know that all the records for a particular time period are
>>>>>> "in".
>>>>>> If you are collecting netflow records, that time period could  
>>>>>> be hours
>>>>>> and in some cases days (so don't use rastream for netflow data).
>>>>>>
>>>>>> rastream -S radium -M time 5m \
>>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S \
>>>>>> -B 15s -f /usr/local/bin/rastream.sh
>>>>>>
>>>>>> So the "-B 15s" option indicates that rastream() should close and
>>>>>> process its files 15 seconds after a 5 minute boundary is passed.
>>>>>>
>>>>>> The sample rastream.sh in ./support/Config compresses files. You
>>>>>> can do anything you like in there, but do test ;o)
>>>>>>
>>>>>> The shell script will be passed a single parameter, the full  
>>>>>> pathname
>>>>>> of the file that is being closed.
>>>>>>
>>>>>> I have not done extensive testing on this specific version of
>>>>>> rastream(),
>>>>>> but something like it has been running for years in a number of  
>>>>>> sites,
>>>>>> so please give it a try, and if you have any problems, holler!!!
>>>>>>
>>>>>> Carter
>>>>>>
>>>>>>
>>>>>> On Mar 11, 2009, at 12:18 AM, Jason Carr wrote:
>>>>>>
>>>>>>> Pretty simple file:
>>>>>>>
>>>>>>> RADIUM_DAEMON=no
>>>>>>> RADIUM_MONITOR_ID=4
>>>>>>> RADIUM_MAR_STATUS_INTERVAL=60
>>>>>>> RADIUM_ARGUS_SERVER=bivio-01.iso.cmu.local:561
>>>>>>> RADIUM_ACCESS_PORT=561
>>>>>>> RADIUM_BIND_IP=127.0.0.1
>>>>>>> RADIUM_OUTPUT_FILE=/data/argus/var/core.out
>>>>>>>
>>>>>>> radium is ran like this: ./radium
>>>>>>>
>>>>>>> Does rasplit split out files into a directory structure like  
>>>>>>> this:
>>>>>>>
>>>>>>> /data/argus/core/2009/03/03/22/
>>>>>>>
>>>>>>> With 5 minute files that are gzipped inside of that directory  
>>>>>>> for the
>>>>>>> entire hour.
>>>>>>>
>>>>>>> Carter Bullard wrote:
>>>>>>>> Hey Jason,
>>>>>>>> So I can't reproduce your situation on any of my machines here.
>>>>>>>> How are you running radium()? Do you have a /etc/radium.conf  
>>>>>>>> file?
>>>>>>>> If so can you send it?
>>>>>>>>
>>>>>>>> Carter
>>>>>>>>
>>>>>>>> On Mar 10, 2009, at 1:21 PM, Jason Carr wrote:
>>>>>>>>
>>>>>>>>> Good news, it seemed to have fixed my problem, thank you  
>>>>>>>>> very much.
>>>>>>>>>
>>>>>>>>> Bad news, there seems to be a problem with writing to the  
>>>>>>>>> output
>>>>>>>>> file.
>>>>>>>>>
>>>>>>>>> Did the way files are written to change in this version?  
>>>>>>>>> Right now
>>>>>>>>> we're telling radium to write to one file and then every 5  
>>>>>>>>> minutes
>>>>>>>>> move
>>>>>>>>> the file. Previously, radium would recreate the file but now  
>>>>>>>>> it
>>>>>>>>> does
>>>>>>>>> not and lsof showed that the file was still open by radium.
>>>>>>>>>
>>>>>>>>> - Jason
>>>>>>>>>
>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>> There is a good chance that it will. But the Bivio part is
>>>>>>>>>> still an
>>>>>>>>>> unknown, so if it doesn't we'll still have work to do.
>>>>>>>>>> Carter
>>>>>>>>>>
>>>>>>>>>> On Mar 6, 2009, at 1:48 PM, Jason Carr wrote:
>>>>>>>>>>
>>>>>>>>>>> Carter,
>>>>>>>>>>>
>>>>>>>>>>> Do you think that this would be a fix for my problem? I'll  
>>>>>>>>>>> try it
>>>>>>>>>>> out
>>>>>>>>>>> regardless... :)
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Jason
>>>>>>>>>>>
>>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>>> Gentle people,
>>>>>>>>>>>> I have found a bug in the little-endian processing of 2.x
>>>>>>>>>>>> data in
>>>>>>>>>>>> the
>>>>>>>>>>>> argus-clients-3.0.2 code, so I have uploaded a new
>>>>>>>>>>>> argus-clients-3.0.2.beta.2.tar.gz file to the dev  
>>>>>>>>>>>> directory.
>>>>>>>>>>>>
>>>>>>>>>>>> ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.2.beta.2.tar.gz
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have tested reading data from as many argus files as I  
>>>>>>>>>>>> have,
>>>>>>>>>>>> regardless
>>>>>>>>>>>> of version, and it seems to work, ......
>>>>>>>>>>>>
>>>>>>>>>>>> I also fixed problems with racluster() results having short
>>>>>>>>>>>> counts.
>>>>>>>>>>>> I have gone through all the clients and revised their file
>>>>>>>>>>>> closing
>>>>>>>>>>>> strategies,
>>>>>>>>>>>> and so ....
>>>>>>>>>>>>
>>>>>>>>>>>> This does not fix the "-lz" issue for some ./configure
>>>>>>>>>>>> runs.......
>>>>>>>>>>>>
>>>>>>>>>>>> For those having problems, please grab this set of code for
>>>>>>>>>>>> testing.
>>>>>>>>>>>> And don't hesitate to report any problems you find!!!!!
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!!!!!
>>>>>>>>>>>>
>>>>>>>>>>>> Carter
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>
>>
>>
>>
>>
>

Carter Bullard
CEO/President
QoSient, LLC
150 E 57th Street Suite 12D
New York, New York  10022

+1 212 588-9133 Phone
+1 212 588-9134 Fax






More information about the argus mailing list