little endian bug in 2.x record parsing in new clients
Jason Carr
jcarr at andrew.cmu.edu
Mon Mar 16 14:02:12 EDT 2009
Yep, sorry I meant -S host:port.
What I'm trying to do is to store the incoming data stream on disk.
Currently we have the files up into the directory structure like this:
/data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
And split it up every 5 minutes. So we get 12 5 minute files in each
hour. This works out well for us since they are much smaller files than
running through an hour or days worth of data.
-D 4 output is this on radium:
*a ton of other lines*
radium[19615.50999a4000000000]: 13:34:16.865875 ArgusWriteOutSocket
(0xacd0a0, 0xacec40) 0 records waiting. returning 328
radium[19615.e096fd4f7e7f0000]: 13:34:16.866297 ArgusReadStreamSocket
(0x4ff34010) read 400 bytes
radium[19615.e096fd4f7e7f0000]: 13:34:16.866321 ArgusCopyRecordStruct
(0x4ff34600) retn 0x4801f310
radium[19615.e096fd4f7e7f0000]: 13:34:16.866336 RaProcessRecord
(0x4ff96010, 0x4ff34600) returning
radium[19615.50999a4000000000]: 13:34:16.866373 ArgusCopyRecordStruct
(0x4801f310) retn 0xb06310
radium[19615.50999a4000000000]: 13:34:16.866392 ArgusGenerateRecord
(0xb06310, 0) old len 400 new len 380
radium[19615.50999a4000000000]: 13:34:16.866411 ArgusWriteOutSocket
(0xacd0a0, 0xacec40) 0 records waiting. returning 400
radium[19615.e096fd4f7e7f0000]: 13:34:16.868431 ArgusReadStreamSocket
(0x4ff34010) read 100 bytes
I noticed that the file wasn't moved when that happened. I'm starting
to think it has nothing to do with the file moving.
I upgraded the bivio software from "argus-clients-3.0.2" to
"argus-clients-3.0.2.beta.2" and it's consistently crashing now. 3.0.2
never crashed previously.
*** glibc detected *** radium: double free or corruption (out):
0x3350a008 ***
======= Backtrace: =========
/lib/libc.so.6[0xfd4a964]
/lib/libc.so.6(__libc_free+0xc8)[0xfd4ded8]
radium[0x1004d500]
radium[0x10055ac4]
radium[0x10062ebc]
/lib/libpthread.so.0[0xfe55994]
/lib/libc.so.6(__clone+0x84)[0xfdb7794]
- Jason
On 3/16/09 12:50 PM, Carter Bullard wrote:
> Hey Jason,
> Its not "-R host:port" its "-S host:port". I'm sure that was a slip?
> So things look like they are working for rasplit() then, and with
> your Bivio box.
>
> So I cannot replicate your problem on any machine, and I've
> tested it on 8 different machines now. I do not recommend
> radium writing out to a file when there are so many other ways
> to manage the data streams. Why are you doing this? or rather
> What are you trying to do?
>
> There are a lot of ways to misconfigure a radium. Having it
> connect to itself is the common bad thing, and we try to catch
> this, but there are clever ways of getting around that ;o)
>
> Usually radium doesn't update a moved file, because there
> are no new records coming in. It doesn't know to do anything
> until a data record shows up, so more than likely your server
> connection is poorly configured.
>
> The best way to see what's going on, is to run with something
> like "-D 4" to see the debug information, to assure that you
> are attaching properly, that data is coming in, and that it
> see's that the file is gone.
>
> So run with -D 4 and then delete the file, and see what it sez.
>
> Carter
>
>
> On Mar 16, 2009, at 11:36 AM, Jason Carr wrote:
>
>> Carter,
>>
>> For me, rasplit only wrote one file for 5 minutes after giving rasplit
>> the -R x.x.x.x:561 option. It doesn't exit after the 5 minutes
>> anymore, not sure why though. Is there a way to do that?
>>
>> I still have the issue of radium not letting go of the file when it's
>> been moved.
>>
>> - Jason
>>
>> On 3/12/09 12:51 PM, Carter Bullard wrote:
>>> Hey Jason,
>>> Hmmmm, so I can't replicate your problem here on any machine.
>>> If you can run rasplit() under gdb, (and you can, of course generate
>>> files with a shorter granularity), maybe you will catch the rasplit()
>>> exiting. Be sure and ./configure and compile with these tag files
>>> in the root directory:
>>>
>>> % touch .devel .debug
>>> % ./configure;make clean;make
>>> % gdb bin/rasplit
>>>
>>> For rasplit() do you have write permission to the entire directory
>>> structure that you are trying to write to?
>>>
>>> For rastream(), you do need the $srcid in the path, at least right now
>>> you do, so you should add it.
>>>
>>> rastream ....... -w /data/argus/core/\$srcid/%Y/%m/%d/argus.........
>>>
>>> Carter
>>>
>>>
>>>
>>> On Mar 12, 2009, at 1:38 AM, Jason Carr wrote:
>>>
>>>> I'm running all of this on a x86_64 Linux box.
>>>>
>>>> When I run rasplit like this:
>>>>
>>>> # rasplit -S argus-source:561 -M time 5m -w
>>>> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>
>>>> It only creates one 5 minute file and then exits. Can I make it just
>>>> keep listening and keep on writing the 5 minute files?
>>>>
>>>> rastream segfaults when I run it:
>>>>
>>>> # rastream -S argus-source:561 -M time 5m -w
>>>> '/data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S' -B 15s -f
>>>> ~/argus/test.sh
>>>> rastream[5979]: 01:34:40.205530 output string requires $srcid
>>>> Segmentation fault
>>>>
>>>> We've historically always used radium to write to a file and move the
>>>> file every 5 minutes. radium has always just recreates the output file
>>>> once it's been moved. It is a little bit of a hack and if
>>>> rastream/rasplit already does it, that would be awesome. I'll use
>>>> either at this point :)
>>>>
>>>> - Jason
>>>>
>>>>
>>>> Carter Bullard wrote:
>>>>> Hey Jason,
>>>>> I'm answering on the mailing list, so we capture the thread.
>>>>>
>>>>> Glad to hear that your argus-2.x data on your Bivio box is
>>>>> working well for you. Sorry about your radium() file behavior.
>>>>>
>>>>> I still can't duplicate your error, but I'll try again later today.
>>>>> What kind of machine is your radium() running on?
>>>>>
>>>>> To get rasplit to generate the files run:
>>>>> rasplit -S radium -M time 5m \
>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>
>>>>> you can run it as a daemon with the "-d" option. Some like to run
>>>>> it from a script that respawns it if it dies, but its been a really
>>>>> stable
>>>>> program.
>>>>>
>>>>> The reason you want to use rasplit(), is that it will actually clip
>>>>> argus records to ensure that all the argus data for a particular time
>>>>> frame is only in the specific timed file. So if a record starts at
>>>>> 12:04:12.345127 and ends at 12:06:12.363131, rasplit will clip
>>>>> that record into 2 records with timestamps:
>>>>> 12:04:12.345127 - 12:05:00.000000
>>>>> 12:05:00.000000 - 12:06:12.363131
>>>>>
>>>>> This makes it so you can process, graph the files, and you don't
>>>>> have to worry about whether you've got all the data or not.
>>>>>
>>>>> Argus does a good job keeping its output records in time
>>>>> order, but if you're processing multiple streams of data, the
>>>>> time order can get a bit skewed. So that when you pull the file
>>>>> out from underneath radium() you can still get records that
>>>>> have data that should be in another file.
>>>>>
>>>>> To compress the data, have a cron job compress the files say
>>>>> an hour later.
>>>>>
>>>>> If you want to experiment, use rastream(), which is rasplit() with
>>>>> the addition feature to run a shell script against files after they
>>>>> close.
>>>>> You need to tell rastream() how long to wait before running the
>>>>> script,
>>>>> so you know that all the records for a particular time period are
>>>>> "in".
>>>>> If you are collecting netflow records, that time period could be hours
>>>>> and in some cases days (so don't use rastream for netflow data).
>>>>>
>>>>> rastream -S radium -M time 5m \
>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S \
>>>>> -B 15s -f /usr/local/bin/rastream.sh
>>>>>
>>>>> So the "-B 15s" option indicates that rastream() should close and
>>>>> process its files 15 seconds after a 5 minute boundary is passed.
>>>>>
>>>>> The sample rastream.sh in ./support/Config compresses files. You
>>>>> can do anything you like in there, but do test ;o)
>>>>>
>>>>> The shell script will be passed a single parameter, the full pathname
>>>>> of the file that is being closed.
>>>>>
>>>>> I have not done extensive testing on this specific version of
>>>>> rastream(),
>>>>> but something like it has been running for years in a number of sites,
>>>>> so please give it a try, and if you have any problems, holler!!!
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Mar 11, 2009, at 12:18 AM, Jason Carr wrote:
>>>>>
>>>>>> Pretty simple file:
>>>>>>
>>>>>> RADIUM_DAEMON=no
>>>>>> RADIUM_MONITOR_ID=4
>>>>>> RADIUM_MAR_STATUS_INTERVAL=60
>>>>>> RADIUM_ARGUS_SERVER=bivio-01.iso.cmu.local:561
>>>>>> RADIUM_ACCESS_PORT=561
>>>>>> RADIUM_BIND_IP=127.0.0.1
>>>>>> RADIUM_OUTPUT_FILE=/data/argus/var/core.out
>>>>>>
>>>>>> radium is ran like this: ./radium
>>>>>>
>>>>>> Does rasplit split out files into a directory structure like this:
>>>>>>
>>>>>> /data/argus/core/2009/03/03/22/
>>>>>>
>>>>>> With 5 minute files that are gzipped inside of that directory for the
>>>>>> entire hour.
>>>>>>
>>>>>> Carter Bullard wrote:
>>>>>>> Hey Jason,
>>>>>>> So I can't reproduce your situation on any of my machines here.
>>>>>>> How are you running radium()? Do you have a /etc/radium.conf file?
>>>>>>> If so can you send it?
>>>>>>>
>>>>>>> Carter
>>>>>>>
>>>>>>> On Mar 10, 2009, at 1:21 PM, Jason Carr wrote:
>>>>>>>
>>>>>>>> Good news, it seemed to have fixed my problem, thank you very much.
>>>>>>>>
>>>>>>>> Bad news, there seems to be a problem with writing to the output
>>>>>>>> file.
>>>>>>>>
>>>>>>>> Did the way files are written to change in this version? Right now
>>>>>>>> we're telling radium to write to one file and then every 5 minutes
>>>>>>>> move
>>>>>>>> the file. Previously, radium would recreate the file but now it
>>>>>>>> does
>>>>>>>> not and lsof showed that the file was still open by radium.
>>>>>>>>
>>>>>>>> - Jason
>>>>>>>>
>>>>>>>> Carter Bullard wrote:
>>>>>>>>> There is a good chance that it will. But the Bivio part is
>>>>>>>>> still an
>>>>>>>>> unknown, so if it doesn't we'll still have work to do.
>>>>>>>>> Carter
>>>>>>>>>
>>>>>>>>> On Mar 6, 2009, at 1:48 PM, Jason Carr wrote:
>>>>>>>>>
>>>>>>>>>> Carter,
>>>>>>>>>>
>>>>>>>>>> Do you think that this would be a fix for my problem? I'll try it
>>>>>>>>>> out
>>>>>>>>>> regardless... :)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Jason
>>>>>>>>>>
>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>> Gentle people,
>>>>>>>>>>> I have found a bug in the little-endian processing of 2.x
>>>>>>>>>>> data in
>>>>>>>>>>> the
>>>>>>>>>>> argus-clients-3.0.2 code, so I have uploaded a new
>>>>>>>>>>> argus-clients-3.0.2.beta.2.tar.gz file to the dev directory.
>>>>>>>>>>>
>>>>>>>>>>> ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.2.beta.2.tar.gz
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have tested reading data from as many argus files as I have,
>>>>>>>>>>> regardless
>>>>>>>>>>> of version, and it seems to work, ......
>>>>>>>>>>>
>>>>>>>>>>> I also fixed problems with racluster() results having short
>>>>>>>>>>> counts.
>>>>>>>>>>> I have gone through all the clients and revised their file
>>>>>>>>>>> closing
>>>>>>>>>>> strategies,
>>>>>>>>>>> and so ....
>>>>>>>>>>>
>>>>>>>>>>> This does not fix the "-lz" issue for some ./configure
>>>>>>>>>>> runs.......
>>>>>>>>>>>
>>>>>>>>>>> For those having problems, please grab this set of code for
>>>>>>>>>>> testing.
>>>>>>>>>>> And don't hesitate to report any problems you find!!!!!
>>>>>>>>>>>
>>>>>>>>>>> Thanks!!!!!
>>>>>>>>>>>
>>>>>>>>>>> Carter
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>
>
>
>
>
More information about the argus
mailing list