little endian bug in 2.x record parsing in new clients

Jason Carr jcarr at andrew.cmu.edu
Wed Mar 18 12:54:29 EDT 2009


Connecting directly yields this error after a minute or so of running:

radium[29333.e096c4076b7f0000]: 12:45:44.241519 ArgusReadStreamSocket 
(0x7ba4010) read 1312 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.241643 ArgusReadStreamSocket 
(0x7ba4010) read 1448 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.241778 ArgusReadStreamSocket 
(0x7ba4010) read 2088 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.241933 ArgusReadStreamSocket 
(0x7ba4010) read 1448 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.242006 ArgusReadStreamSocket 
(0x7ba4010) read 880 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.242168 ArgusReadStreamSocket 
(0x7ba4010) read 1448 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.242243 ArgusReadStreamSocket 
(0x7ba4010) read 1128 bytes
radium[29333.e096c4076b7f0000]: 12:45:44.242263 ArgusCopyRecordStruct 
(0x7ba4600) retn 0xf8007710
radium[29333.e096c4076b7f0000]: 12:45:44.242277 RaProcessRecord 
(0x7c06010, 0x7ba4600) returning
radium[29333.e096c4076b7f0000]: 12:45:44.242288 ArgusHandleDatum 
(0xadf1b0, 0x7c26a38) received closing Mar
radium[29333.e096c4076b7f0000]: 12:45:44.242300 
ArgusCloseInput(0x7ba4010) closing
radium[29333.e096c4076b7f0000]: 12:45:44.242310 ArgusWriteConnection: 
write(6, 0x49d7a9, 6)
radium[29333.e096c4076b7f0000]: 12:45:44.242330 
ArgusWriteConnection(0x7ba4010, 0x49d7a9, 6) returning 6
radium[29333.5019cf4100000000]: 12:45:44.242368 ArgusOutputProcess() 
received mar 0xf8007710 totals 129200 count 0 remaining 0
radium[29333.e096c4076b7f0000]: 12:45:44.242381 
ArgusCloseInput(0x7ba4010) done
radium[29333.5019cf4100000000]: 12:45:44.242385 ArgusCopyRecordStruct 
(0xf8007710) retn 0xb2aa60
radium[29333.5019cf4100000000]: 12:45:44.242448 ArgusWriteOutSocket 
(0xacd0a0, 0xacec40) 0 records waiting. returning 128
radium[29333.50d9d15700000000]: 12:45:44.932325 
ArgusConnectRemote(0x7ba4010) starting
radium[29333.50d9d15700000000]: 12:45:44.932362 Trying 172.19.41.41 port 
561 Expecting Argus records
radium[29333.50d9d15700000000]: 12:45:44.932659 connected
radium[29333.50d9d15700000000]: 12:45:44.932675 ArgusGetServerSocket 
(0x7ba4010) returning 6
radium[29333.50d9d15700000000]: 12:45:44.944349 ArgusReadConnection() 
read 16 bytes
*** glibc detected *** radium: double free or corruption (out): 
0x0000000000b3e900 ***
======= Backtrace: =========
/lib/libc.so.6[0x7f6b0709508a]
/lib/libc.so.6(cfree+0x8c)[0x7f6b07098c1c]
radium[0x44c8d9]
radium[0x4579c8]
radium[0x453511]
radium[0x4602e0]
/lib/libpthread.so.0[0x7f6b073893f7]
/lib/libc.so.6(clone+0x6d)[0x7f6b070f8b3d]


On 3/18/09 1:50 AM, Jason Carr wrote:
> The Bivio unit has 12 cores; each has, for lack of a better phrase, a
> virtual machine that is connected to the head node via an internal
> network. I just added an iptables rule to forward the connection from
> radium on the x86_64 box to the Bivio argus server.
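A hypothetical sketch of the kind of forwarding rule described above: DNAT connections arriving on the head node's argus port to the Bivio argus server on the internal network. The addresses and interface are made up for illustration; the actual rule used was not posted.

```shell
# Forward inbound TCP port 561 on the head node to a hypothetical
# internal Bivio address (10.0.0.2 is illustrative, not from the post).
iptables -t nat -A PREROUTING -p tcp --dport 561 \
    -j DNAT --to-destination 10.0.0.2:561
# Masquerade so return traffic flows back through the head node.
iptables -t nat -A POSTROUTING -p tcp -d 10.0.0.2 --dport 561 -j MASQUERADE
```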
>
> If that works long term I'll try out the various rasplit and rastream
> stuff and let you know how it turns out.
>
> Thanks again,
>
> - Jason
>
> On 3/16/09 2:31 PM, Carter Bullard wrote:
>> Jason,
>> Very confusing; why do you run client software on the Bivio?
>> Carter
>>
>> On Mar 16, 2009, at 2:02 PM, Jason Carr wrote:
>>
>>> Yep, sorry I meant -S host:port.
>>>
>>> What I'm trying to do is store the incoming data stream on disk.
>>> Currently we have the files split up into a directory structure like this:
>>>
>>> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>
>>> And we split it up every 5 minutes, so we get 12 five-minute files
>>> each hour. This works out well for us since they are much smaller
>>> files than running through an hour's or day's worth of data.
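The %-codes in that -w path are ordinary strftime(3) conversion fields, so on a GNU system date(1) can expand the same pattern to preview which filename a given moment maps to. The epoch timestamp below is just an arbitrary example:

```shell
# Expand the same strftime pattern used in the -w path.
# 1237225500 is an arbitrary example time (2009-03-16 17:45:00 UTC).
date -u -d "@1237225500" +"/data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S"
# -> /data/argus/core/2009/03/16/argus.2009.03.16.17.45.00
```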
>>>
>>> -D 4 output is this on radium:
>>>
>>> *a ton of other lines*
>>> radium[19615.50999a4000000000]: 13:34:16.865875 ArgusWriteOutSocket
>>> (0xacd0a0, 0xacec40) 0 records waiting. returning 328
>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.866297 ArgusReadStreamSocket
>>> (0x4ff34010) read 400 bytes
>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.866321 ArgusCopyRecordStruct
>>> (0x4ff34600) retn 0x4801f310
>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.866336 RaProcessRecord
>>> (0x4ff96010, 0x4ff34600) returning
>>> radium[19615.50999a4000000000]: 13:34:16.866373 ArgusCopyRecordStruct
>>> (0x4801f310) retn 0xb06310
>>> radium[19615.50999a4000000000]: 13:34:16.866392 ArgusGenerateRecord
>>> (0xb06310, 0) old len 400 new len 380
>>> radium[19615.50999a4000000000]: 13:34:16.866411 ArgusWriteOutSocket
>>> (0xacd0a0, 0xacec40) 0 records waiting. returning 400
>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.868431 ArgusReadStreamSocket
>>> (0x4ff34010) read 100 bytes
>>>
>>> I noticed that the file wasn't moved when that happened. I'm starting
>>> to think it has nothing to do with the file moving.
>>>
>>> I upgraded the bivio software from "argus-clients-3.0.2" to
>>> "argus-clients-3.0.2.beta.2" and it's consistently crashing now. 3.0.2
>>> never crashed previously.
>>>
>>> *** glibc detected *** radium: double free or corruption (out):
>>> 0x3350a008 ***
>>> ======= Backtrace: =========
>>> /lib/libc.so.6[0xfd4a964]
>>> /lib/libc.so.6(__libc_free+0xc8)[0xfd4ded8]
>>> radium[0x1004d500]
>>> radium[0x10055ac4]
>>> radium[0x10062ebc]
>>> /lib/libpthread.so.0[0xfe55994]
>>> /lib/libc.so.6(__clone+0x84)[0xfdb7794]
>>>
>>> - Jason
>>>
>>> On 3/16/09 12:50 PM, Carter Bullard wrote:
>>>> Hey Jason,
>>>> It's not "-R host:port", it's "-S host:port". I'm sure that was a slip?
>>>> So things look like they are working for rasplit() then, and with
>>>> your Bivio box.
>>>>
>>>> So I cannot replicate your problem on any machine, and I've
>>>> tested it on 8 different machines now. I do not recommend
>>>> radium writing out to a file when there are so many other ways
>>>> to manage the data streams. Why are you doing this? Or rather,
>>>> what are you trying to do?
>>>>
>>>> There are a lot of ways to misconfigure a radium. Having it
>>>> connect to itself is the common bad thing, and we try to catch
>>>> this, but there are clever ways of getting around that ;o)
>>>>
>>>> Usually radium doesn't update a moved file, because there
>>>> are no new records coming in. It doesn't know to do anything
>>>> until a data record shows up, so more than likely your server
>>>> connection is poorly configured.
>>>>
>>>> The best way to see what's going on is to run with something
>>>> like "-D 4" to see the debug information, to assure that you
>>>> are attaching properly, that data is coming in, and that it
>>>> sees that the file is gone.
>>>>
>>>> So run with -D 4, then delete the file, and see what it says.
>>>>
>>>> Carter
>>>>
>>>>
>>>> On Mar 16, 2009, at 11:36 AM, Jason Carr wrote:
>>>>
>>>>> Carter,
>>>>>
>>>>> For me, rasplit only wrote one file for 5 minutes after giving rasplit
>>>>> the -R x.x.x.x:561 option. It doesn't exit after the 5 minutes
>>>>> anymore, not sure why though. Is there a way to do that?
>>>>>
>>>>> I still have the issue of radium not letting go of the file when it's
>>>>> been moved.
>>>>>
>>>>> - Jason
>>>>>
>>>>> On 3/12/09 12:51 PM, Carter Bullard wrote:
>>>>>> Hey Jason,
>>>>>> Hmmmm, so I can't replicate your problem here on any machine.
>>>>>> If you can run rasplit() under gdb (and you can, of course, generate
>>>>>> files with a shorter granularity), maybe you will catch rasplit()
>>>>>> exiting. Be sure to ./configure and compile with these tag files
>>>>>> in the root directory:
>>>>>>
>>>>>> % touch .devel .debug
>>>>>> % ./configure;make clean;make
>>>>>> % gdb bin/rasplit
>>>>>>
>>>>>> For rasplit() do you have write permission to the entire directory
>>>>>> structure that you are trying to write to?
>>>>>>
>>>>>> For rastream(), you do need the $srcid in the path, at least right
>>>>>> now
>>>>>> you do, so you should add it.
>>>>>>
>>>>>> rastream ....... -w /data/argus/core/\$srcid/%Y/%m/%d/argus.........
>>>>>>
>>>>>> Carter
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mar 12, 2009, at 1:38 AM, Jason Carr wrote:
>>>>>>
>>>>>>> I'm running all of this on a x86_64 Linux box.
>>>>>>>
>>>>>>> When I run rasplit like this:
>>>>>>>
>>>>>>> # rasplit -S argus-source:561 -M time 5m -w
>>>>>>> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>>>
>>>>>>> It only creates one 5 minute file and then exits. Can I make it just
>>>>>>> keep listening and keep on writing the 5 minute files?
>>>>>>>
>>>>>>> rastream segfaults when I run it:
>>>>>>>
>>>>>>> # rastream -S argus-source:561 -M time 5m -w
>>>>>>> '/data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S' -B 15s -f
>>>>>>> ~/argus/test.sh
>>>>>>> rastream[5979]: 01:34:40.205530 output string requires $srcid
>>>>>>> Segmentation fault
>>>>>>>
>>>>>>> We've historically always used radium to write to a file and move
>>>>>>> the file every 5 minutes. radium has always just recreated the
>>>>>>> output file once it's been moved. It is a little bit of a hack, and
>>>>>>> if rastream/rasplit already does this, that would be awesome. I'll
>>>>>>> use either at this point :)
>>>>>>>
>>>>>>> - Jason
>>>>>>>
>>>>>>>
>>>>>>> Carter Bullard wrote:
>>>>>>>> Hey Jason,
>>>>>>>> I'm answering on the mailing list, so we capture the thread.
>>>>>>>>
>>>>>>>> Glad to hear that your argus-2.x data on your Bivio box is
>>>>>>>> working well for you. Sorry about your radium() file behavior.
>>>>>>>>
>>>>>>>> I still can't duplicate your error, but I'll try again later today.
>>>>>>>> What kind of machine is your radium() running on?
>>>>>>>>
>>>>>>>> To get rasplit to generate the files run:
>>>>>>>> rasplit -S radium -M time 5m \
>>>>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>>>>
>>>>>>>> You can run it as a daemon with the "-d" option. Some like to run
>>>>>>>> it from a script that respawns it if it dies, but it's been a really
>>>>>>>> stable program.
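A minimal sketch of such a respawn wrapper. The function name and the MAX_RESTARTS limit are made up for illustration; a real watchdog for rasplit would typically loop forever with a short sleep between restarts.

```shell
#!/bin/sh
# Hypothetical respawn helper: run the given command line and restart
# it if it exits nonzero, up to MAX_RESTARTS attempts (a production
# wrapper around rasplit would usually retry indefinitely).
respawn() {
    max=${MAX_RESTARTS:-3}
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" && return 0                  # command exited cleanly; stop
        n=$((n + 1))
        echo "attempt $n failed; restarting" >&2
    done
    return 1                              # gave up after $max attempts
}

# e.g.: respawn rasplit -S radium -M time 5m -w /data/argus/core/...
```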
>>>>>>>>
>>>>>>>> The reason you want to use rasplit() is that it will actually clip
>>>>>>>> argus records to ensure that all the argus data for a particular
>>>>>>>> time frame is only in the specific timed file. So if a record starts at
>>>>>>>> 12:04:12.345127 and ends at 12:06:12.363131, rasplit will clip
>>>>>>>> that record into 2 records with timestamps:
>>>>>>>> 12:04:12.345127 - 12:05:00.000000
>>>>>>>> 12:05:00.000000 - 12:06:12.363131
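The clipping described above is just boundary arithmetic. Here is a rough illustrative sketch of the idea (not rasplit's actual code), working in epoch seconds with 5-minute (300 s) bins:

```shell
# Split the interval [start, end) at every 300-second boundary and
# print one "start end" pair per resulting piece -- the same effect
# rasplit's clipping has on a record's time range (illustrative only).
clip_record() {
    start=$1; end=$2; bin=300
    t=$start
    while [ "$t" -lt "$end" ]; do
        next=$(( (t / bin + 1) * bin ))   # next 5-minute boundary
        if [ "$next" -gt "$end" ]; then next=$end; fi
        echo "$t $next"
        t=$next
    done
}
```

For example, `clip_record 100 700` prints the pairs `100 300`, `300 600`, and `600 700`, mirroring how the 12:04:12 to 12:06:12 record above becomes two records split at 12:05:00.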
>>>>>>>>
>>>>>>>> This makes it so you can process and graph the files without
>>>>>>>> having to worry about whether you've got all the data or not.
>>>>>>>>
>>>>>>>> Argus does a good job keeping its output records in time
>>>>>>>> order, but if you're processing multiple streams of data, the
>>>>>>>> time order can get a bit skewed. So when you pull the file
>>>>>>>> out from underneath radium(), you can still get records whose
>>>>>>>> data belongs in another file.
>>>>>>>>
>>>>>>>> To compress the data, have a cron job compress the files say
>>>>>>>> an hour later.
>>>>>>>>
>>>>>>>> If you want to experiment, use rastream(), which is rasplit() with
>>>>>>>> the additional feature of running a shell script against files after
>>>>>>>> they close.
>>>>>>>> You need to tell rastream() how long to wait before running the
>>>>>>>> script,
>>>>>>>> so you know that all the records for a particular time period are
>>>>>>>> "in".
>>>>>>>> If you are collecting netflow records, that time period could be
>>>>>>>> hours
>>>>>>>> and in some cases days (so don't use rastream for netflow data).
>>>>>>>>
>>>>>>>> rastream -S radium -M time 5m \
>>>>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S \
>>>>>>>> -B 15s -f /usr/local/bin/rastream.sh
>>>>>>>>
>>>>>>>> So the "-B 15s" option indicates that rastream() should close and
>>>>>>>> process its files 15 seconds after a 5 minute boundary is passed.
>>>>>>>>
>>>>>>>> The sample rastream.sh in ./support/Config compresses files. You
>>>>>>>> can do anything you like in there, but do test ;o)
>>>>>>>>
>>>>>>>> The shell script will be passed a single parameter, the full
>>>>>>>> pathname
>>>>>>>> of the file that is being closed.
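A stripped-down sketch of what such a -f script does, modeled loosely on the sample rastream.sh described above. The function name is made up for illustration; rastream itself simply invokes the script with the closed file's full pathname as its only argument.

```shell
#!/bin/sh
# Hypothetical skeleton of a rastream -f script: rastream hands it one
# argument, the full pathname of the file it just closed; this sketch
# simply gzips that file, like the sample in ./support/Config.
compress_closed_file() {
    file="$1"
    if [ -f "$file" ]; then
        gzip -f "$file"
    fi
}

compress_closed_file "$1"
```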
>>>>>>>>
>>>>>>>> I have not done extensive testing on this specific version of
>>>>>>>> rastream(),
>>>>>>>> but something like it has been running for years in a number of
>>>>>>>> sites,
>>>>>>>> so please give it a try, and if you have any problems, holler!!!
>>>>>>>>
>>>>>>>> Carter
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 11, 2009, at 12:18 AM, Jason Carr wrote:
>>>>>>>>
>>>>>>>>> Pretty simple file:
>>>>>>>>>
>>>>>>>>> RADIUM_DAEMON=no
>>>>>>>>> RADIUM_MONITOR_ID=4
>>>>>>>>> RADIUM_MAR_STATUS_INTERVAL=60
>>>>>>>>> RADIUM_ARGUS_SERVER=bivio-01.iso.cmu.local:561
>>>>>>>>> RADIUM_ACCESS_PORT=561
>>>>>>>>> RADIUM_BIND_IP=127.0.0.1
>>>>>>>>> RADIUM_OUTPUT_FILE=/data/argus/var/core.out
>>>>>>>>>
>>>>>>>>> radium is run like this: ./radium
>>>>>>>>>
>>>>>>>>> Does rasplit split out files into a directory structure like this:
>>>>>>>>>
>>>>>>>>> /data/argus/core/2009/03/03/22/
>>>>>>>>>
>>>>>>>>> with 5-minute gzipped files inside that directory for the
>>>>>>>>> entire hour?
>>>>>>>>>
>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>> Hey Jason,
>>>>>>>>>> So I can't reproduce your situation on any of my machines here.
>>>>>>>>>> How are you running radium()? Do you have a /etc/radium.conf
>>>>>>>>>> file?
>>>>>>>>>> If so can you send it?
>>>>>>>>>>
>>>>>>>>>> Carter
>>>>>>>>>>
>>>>>>>>>> On Mar 10, 2009, at 1:21 PM, Jason Carr wrote:
>>>>>>>>>>
>>>>>>>>>>> Good news, it seemed to have fixed my problem, thank you very
>>>>>>>>>>> much.
>>>>>>>>>>>
>>>>>>>>>>> Bad news, there seems to be a problem with writing to the output
>>>>>>>>>>> file.
>>>>>>>>>>>
>>>>>>>>>>> Did the way files are written change in this version? Right now
>>>>>>>>>>> we're telling radium to write to one file and then every 5
>>>>>>>>>>> minutes move the file. Previously, radium would recreate the
>>>>>>>>>>> file, but now it does not, and lsof showed that the file was
>>>>>>>>>>> still open by radium.
>>>>>>>>>>>
>>>>>>>>>>> - Jason
>>>>>>>>>>>
>>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>>> There is a good chance that it will. But the Bivio part is
>>>>>>>>>>>> still an
>>>>>>>>>>>> unknown, so if it doesn't we'll still have work to do.
>>>>>>>>>>>> Carter
>>>>>>>>>>>>
>>>>>>>>>>>> On Mar 6, 2009, at 1:48 PM, Jason Carr wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Carter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you think that this would be a fix for my problem? I'll
>>>>>>>>>>>>> try it
>>>>>>>>>>>>> out
>>>>>>>>>>>>> regardless... :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jason
>>>>>>>>>>>>>
>>>>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>>>>> Gentle people,
>>>>>>>>>>>>>> I have found a bug in the little-endian processing of 2.x
>>>>>>>>>>>>>> data in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> argus-clients-3.0.2 code, so I have uploaded a new
>>>>>>>>>>>>>> argus-clients-3.0.2.beta.2.tar.gz file to the dev directory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.2.beta.2.tar.gz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have tested reading data from as many argus files as I
>>>>>>>>>>>>>> have,
>>>>>>>>>>>>>> regardless
>>>>>>>>>>>>>> of version, and it seems to work, ......
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also fixed problems with racluster() results having short
>>>>>>>>>>>>>> counts.
>>>>>>>>>>>>>> I have gone through all the clients and revised their file
>>>>>>>>>>>>>> closing
>>>>>>>>>>>>>> strategies,
>>>>>>>>>>>>>> and so ....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This does not fix the "-lz" issue for some ./configure
>>>>>>>>>>>>>> runs.......
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For those having problems, please grab this set of code for
>>>>>>>>>>>>>> testing.
>>>>>>>>>>>>>> And don't hesitate to report any problems you find!!!!!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!!!!!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Carter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> Carter Bullard
>> CEO/President
>> QoSient, LLC
>> 150 E 57th Street Suite 12D
>> New York, New York 10022
>>
>> +1 212 588-9133 Phone
>> +1 212 588-9134 Fax
>>
>>
>>
>>
>