little endian bug in 2.x record parsing in new clients

Jason Carr jcarr at andrew.cmu.edu
Thu Mar 19 16:15:46 EDT 2009


Now that I'm directly connecting to the argus server from the client, 
rasplit from 3.0.2 works just fine.  3.0.2.beta.2 crashes.

Does it take awhile for rasplit to close the file after it's been 
completely written to?  I've had times were ra only will read ~800 flows 
and exit because there was an error:

ra[1576]: 09:29:10.582079 ArgusReadStreamSocket (0xf2b0) record length 
is zero

With the newer clients, we can't use the -s user -d s128:d128 like 
before.  Now I'm using -s ddata:128 -s sdata:128.  This is interesting 
because it looks like it still leaves 128 characters of space for the 
source even if the source didn't need it.  I also noticed that the data 
fields don't have quotes around them any more.  Can these be 
enabled/disabled (maybe I had them enabled on my older box)?

- Jason

On 3/18/09 12:54 PM, Jason Carr wrote:
> Connecting directly yields this error after a minute or so of running:
>
> radium[29333.e096c4076b7f0000]: 12:45:44.241519 ArgusReadStreamSocket
> (0x7ba4010) read 1312 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.241643 ArgusReadStreamSocket
> (0x7ba4010) read 1448 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.241778 ArgusReadStreamSocket
> (0x7ba4010) read 2088 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.241933 ArgusReadStreamSocket
> (0x7ba4010) read 1448 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.242006 ArgusReadStreamSocket
> (0x7ba4010) read 880 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.242168 ArgusReadStreamSocket
> (0x7ba4010) read 1448 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.242243 ArgusReadStreamSocket
> (0x7ba4010) read 1128 bytes
> radium[29333.e096c4076b7f0000]: 12:45:44.242263 ArgusCopyRecordStruct
> (0x7ba4600) retn 0xf8007710
> radium[29333.e096c4076b7f0000]: 12:45:44.242277 RaProcessRecord
> (0x7c06010, 0x7ba4600) returning
> radium[29333.e096c4076b7f0000]: 12:45:44.242288 ArgusHandleDatum
> (0xadf1b0, 0x7c26a38) received closing Mar
> radium[29333.e096c4076b7f0000]: 12:45:44.242300
> ArgusCloseInput(0x7ba4010) closing
> radium[29333.e096c4076b7f0000]: 12:45:44.242310 ArgusWriteConnection:
> write(6, 0x49d7a9, 6)
> radium[29333.e096c4076b7f0000]: 12:45:44.242330
> ArgusWriteConnection(0x7ba4010, 0x49d7a9, 6) returning 6
> radium[29333.5019cf4100000000]: 12:45:44.242368 ArgusOutputProcess()
> received mar 0xf8007710 totals 129200 count 0 remaining 0
> radium[29333.e096c4076b7f0000]: 12:45:44.242381
> ArgusCloseInput(0x7ba4010) done
> radium[29333.5019cf4100000000]: 12:45:44.242385 ArgusCopyRecordStruct
> (0xf8007710) retn 0xb2aa60
> radium[29333.5019cf4100000000]: 12:45:44.242448 ArgusWriteOutSocket
> (0xacd0a0, 0xacec40) 0 records waiting. returning 128
> radium[29333.50d9d15700000000]: 12:45:44.932325
> ArgusConnectRemote(0x7ba4010) starting
> radium[29333.50d9d15700000000]: 12:45:44.932362 Trying 172.19.41.41 port
> 561 Expecting Argus records
> radium[29333.50d9d15700000000]: 12:45:44.932659 connected
> radium[29333.50d9d15700000000]: 12:45:44.932675 ArgusGetServerSocket
> (0x7ba4010) returning 6
> radium[29333.50d9d15700000000]: 12:45:44.944349 ArgusReadConnection()
> read 16 bytes
> *** glibc detected *** radium: double free or corruption (out):
> 0x0000000000b3e900 ***
> ======= Backtrace: =========
> /lib/libc.so.6[0x7f6b0709508a]
> /lib/libc.so.6(cfree+0x8c)[0x7f6b07098c1c]
> radium[0x44c8d9]
> radium[0x4579c8]
> radium[0x453511]
> radium[0x4602e0]
> /lib/libpthread.so.0[0x7f6b073893f7]
> /lib/libc.so.6(clone+0x6d)[0x7f6b070f8b3d]
>
>
> On 3/18/09 1:50 AM, Jason Carr wrote:
>> The Bivio unit has 12 cores, each has a, for a lack of a better phrase,
>> virtual machine, that is connected to the head node via an internal
>> network. I just added an iptables rule to forward the connection from
>> radium on the x86_64 box to the Bivio argus server.
>>
>> If that works long term I'll try out the various rasplit and rastream
>> stuff and let you know how it turns out.
>>
>> Thanks again,
>>
>> - Jason
>>
>> On 3/16/09 2:31 PM, Carter Bullard wrote:
>>> Jason,
>>> Very confusing, why do you run client software on the bivio?
>>> Cater
>>>
>>> On Mar 16, 2009, at 2:02 PM, Jason Carr wrote:
>>>
>>>> Yep, sorry I meant -S host:port.
>>>>
>>>> What I'm trying to do is to store the incoming data stream on disk.
>>>> Currently we have the files up into the directory structure like this:
>>>>
>>>> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>
>>>> And split it up every 5 minutes. So we get 12 5 minute files in each
>>>> hour. This works out well for us since they are much smaller files
>>>> than running through an hour or days worth of data.
>>>>
>>>> -D 4 output is this on radium:
>>>>
>>>> *a ton of other lines*
>>>> radium[19615.50999a4000000000]: 13:34:16.865875 ArgusWriteOutSocket
>>>> (0xacd0a0, 0xacec40) 0 records waiting. returning 328
>>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.866297 ArgusReadStreamSocket
>>>> (0x4ff34010) read 400 bytes
>>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.866321 ArgusCopyRecordStruct
>>>> (0x4ff34600) retn 0x4801f310
>>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.866336 RaProcessRecord
>>>> (0x4ff96010, 0x4ff34600) returning
>>>> radium[19615.50999a4000000000]: 13:34:16.866373 ArgusCopyRecordStruct
>>>> (0x4801f310) retn 0xb06310
>>>> radium[19615.50999a4000000000]: 13:34:16.866392 ArgusGenerateRecord
>>>> (0xb06310, 0) old len 400 new len 380
>>>> radium[19615.50999a4000000000]: 13:34:16.866411 ArgusWriteOutSocket
>>>> (0xacd0a0, 0xacec40) 0 records waiting. returning 400
>>>> radium[19615.e096fd4f7e7f0000]: 13:34:16.868431 ArgusReadStreamSocket
>>>> (0x4ff34010) read 100 bytes
>>>>
>>>> I noticed that the file wasn't moved when that happened. I'm starting
>>>> to think it has nothing to do with the file moving.
>>>>
>>>> I upgraded the bivio software from "argus-clients-3.0.2" to
>>>> "argus-clients-3.0.2.beta.2" and it's consistently crashing now. 3.0.2
>>>> never crashed previously.
>>>>
>>>> *** glibc detected *** radium: double free or corruption (out):
>>>> 0x3350a008 ***
>>>> ======= Backtrace: =========
>>>> /lib/libc.so.6[0xfd4a964]
>>>> /lib/libc.so.6(__libc_free+0xc8)[0xfd4ded8]
>>>> radium[0x1004d500]
>>>> radium[0x10055ac4]
>>>> radium[0x10062ebc]
>>>> /lib/libpthread.so.0[0xfe55994]
>>>> /lib/libc.so.6(__clone+0x84)[0xfdb7794]
>>>>
>>>> - Jason
>>>>
>>>> On 3/16/09 12:50 PM, Carter Bullard wrote:
>>>>> Hey Jason,
>>>>> Its not "-R host:port" its "-S host:port". I'm sure that was a slip?
>>>>> So things look like they are working for rasplit() then, and with
>>>>> your Bivio box.
>>>>>
>>>>> So I cannot replicate your problem on any machine, and I've
>>>>> tested it on 8 different machines now. I do not recommend
>>>>> radium writing out to a file when there are so many other ways
>>>>> to manage the data streams. Why are you doing this? or rather
>>>>> What are you trying to do?
>>>>>
>>>>> There are a lot of ways to misconfigure a radium. Having it
>>>>> connect to itself is the common bad thing, and we try to catch
>>>>> this, but there are clever ways of getting around that ;o)
>>>>>
>>>>> Usually radium doesn't update a moved file, because there
>>>>> are no new records coming in. It doesn't know to do anything
>>>>> until a data record shows up, so more than likely your server
>>>>> connection is poorly configured.
>>>>>
>>>>> The best way to see what's going on, is to run with something
>>>>> like "-D 4" to see the debug information, to assure that you
>>>>> are attaching properly, that data is coming in, and that it
>>>>> see's that the file is gone.
>>>>>
>>>>> So run with -D 4 and then delete the file, and see what it sez.
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>> On Mar 16, 2009, at 11:36 AM, Jason Carr wrote:
>>>>>
>>>>>> Carter,
>>>>>>
>>>>>> For me, rasplit only wrote one file for 5 minutes after giving
>>>>>> rasplit
>>>>>> the -R x.x.x.x:561 option. It doesn't exit after the 5 minutes
>>>>>> anymore, not sure why though. Is there a way to do that?
>>>>>>
>>>>>> I still have the issue of radium not letting go of the file when it's
>>>>>> been moved.
>>>>>>
>>>>>> - Jason
>>>>>>
>>>>>> On 3/12/09 12:51 PM, Carter Bullard wrote:
>>>>>>> Hey Jason,
>>>>>>> Hmmmm, so I can't replicate your problem here on any machine.
>>>>>>> If you can run rasplit() under gdb, (and you can, of course generate
>>>>>>> files with a shorter granularity), maybe you will catch the
>>>>>>> rasplit()
>>>>>>> exiting. Be sure and ./configure and compile with these tag files
>>>>>>> in the root directory:
>>>>>>>
>>>>>>> % touch .devel .debug
>>>>>>> % ./configure;make clean;make
>>>>>>> % gdb bin/rasplit
>>>>>>>
>>>>>>> For rasplit() do you have write permission to the entire directory
>>>>>>> structure that you are trying to write to?
>>>>>>>
>>>>>>> For rastream(), you do need the $srcid in the path, at least right
>>>>>>> now
>>>>>>> you do, so you should add it.
>>>>>>>
>>>>>>> rastream ....... -w /data/argus/core/\$srcid/%Y/%m/%d/argus.........
>>>>>>>
>>>>>>> Carter
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mar 12, 2009, at 1:38 AM, Jason Carr wrote:
>>>>>>>
>>>>>>>> I'm running all of this on a x86_64 Linux box.
>>>>>>>>
>>>>>>>> When I run rasplit like this:
>>>>>>>>
>>>>>>>> # rasplit -S argus-source:561 -M time 5m -w
>>>>>>>> /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>>>>
>>>>>>>> It only creates one 5 minute file and then exits. Can I make it
>>>>>>>> just
>>>>>>>> keep listening and keep on writing the 5 minute files?
>>>>>>>>
>>>>>>>> rastream segfaults when I run it:
>>>>>>>>
>>>>>>>> # rastream -S argus-source:561 -M time 5m -w
>>>>>>>> '/data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S' -B 15s -f
>>>>>>>> ~/argus/test.sh
>>>>>>>> rastream[5979]: 01:34:40.205530 output string requires $srcid
>>>>>>>> Segmentation fault
>>>>>>>>
>>>>>>>> We've historically always used radium to write to a file and move
>>>>>>>> the
>>>>>>>> file every 5 minutes. radium has always just recreates the output
>>>>>>>> file
>>>>>>>> once it's been moved. It is a little bit of a hack and if
>>>>>>>> rastream/rasplit already does it, that would be awesome. I'll use
>>>>>>>> either at this point :)
>>>>>>>>
>>>>>>>> - Jason
>>>>>>>>
>>>>>>>>
>>>>>>>> Carter Bullard wrote:
>>>>>>>>> Hey Jason,
>>>>>>>>> I'm answering on the mailing list, so we capture the thread.
>>>>>>>>>
>>>>>>>>> Glad to hear that your argus-2.x data on your Bivio box is
>>>>>>>>> working well for you. Sorry about your radium() file behavior.
>>>>>>>>>
>>>>>>>>> I still can't duplicate your error, but I'll try again later
>>>>>>>>> today.
>>>>>>>>> What kind of machine is your radium() running on?
>>>>>>>>>
>>>>>>>>> To get rasplit to generate the files run:
>>>>>>>>> rasplit -S radium -M time 5m \
>>>>>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S
>>>>>>>>>
>>>>>>>>> you can run it as a daemon with the "-d" option. Some like to run
>>>>>>>>> it from a script that respawns it if it dies, but its been a
>>>>>>>>> really
>>>>>>>>> stable
>>>>>>>>> program.
>>>>>>>>>
>>>>>>>>> The reason you want to use rasplit(), is that it will actually
>>>>>>>>> clip
>>>>>>>>> argus records to ensure that all the argus data for a particular
>>>>>>>>> time
>>>>>>>>> frame is only in the specific timed file. So if a record starts at
>>>>>>>>> 12:04:12.345127 and ends at 12:06:12.363131, rasplit will clip
>>>>>>>>> that record into 2 records with timestamps:
>>>>>>>>> 12:04:12.345127 - 12:05:00.000000
>>>>>>>>> 12:05:00.000000 - 12:06:12.363131
>>>>>>>>>
>>>>>>>>> This makes it so you can process, graph the files, and you don't
>>>>>>>>> have to worry about whether you've got all the data or not.
>>>>>>>>>
>>>>>>>>> Argus does a good job keeping its output records in time
>>>>>>>>> order, but if you're processing multiple streams of data, the
>>>>>>>>> time order can get a bit skewed. So that when you pull the file
>>>>>>>>> out from underneath radium() you can still get records that
>>>>>>>>> have data that should be in another file.
>>>>>>>>>
>>>>>>>>> To compress the data, have a cron job compress the files say
>>>>>>>>> an hour later.
>>>>>>>>>
>>>>>>>>> If you want to experiment, use rastream(), which is rasplit() with
>>>>>>>>> the addition feature to run a shell script against files after
>>>>>>>>> they
>>>>>>>>> close.
>>>>>>>>> You need to tell rastream() how long to wait before running the
>>>>>>>>> script,
>>>>>>>>> so you know that all the records for a particular time period are
>>>>>>>>> "in".
>>>>>>>>> If you are collecting netflow records, that time period could be
>>>>>>>>> hours
>>>>>>>>> and in some cases days (so don't use rastream for netflow data).
>>>>>>>>>
>>>>>>>>> rastream -S radium -M time 5m \
>>>>>>>>> -w /data/argus/core/%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S \
>>>>>>>>> -B 15s -f /usr/local/bin/rastream.sh
>>>>>>>>>
>>>>>>>>> So the "-B 15s" option indicates that rastream() should close and
>>>>>>>>> process its files 15 seconds after a 5 minute boundary is passed.
>>>>>>>>>
>>>>>>>>> The sample rastream.sh in ./support/Config compresses files. You
>>>>>>>>> can do anything you like in there, but do test ;o)
>>>>>>>>>
>>>>>>>>> The shell script will be passed a single parameter, the full
>>>>>>>>> pathname
>>>>>>>>> of the file that is being closed.
>>>>>>>>>
>>>>>>>>> I have not done extensive testing on this specific version of
>>>>>>>>> rastream(),
>>>>>>>>> but something like it has been running for years in a number of
>>>>>>>>> sites,
>>>>>>>>> so please give it a try, and if you have any problems, holler!!!
>>>>>>>>>
>>>>>>>>> Carter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mar 11, 2009, at 12:18 AM, Jason Carr wrote:
>>>>>>>>>
>>>>>>>>>> Pretty simple file:
>>>>>>>>>>
>>>>>>>>>> RADIUM_DAEMON=no
>>>>>>>>>> RADIUM_MONITOR_ID=4
>>>>>>>>>> RADIUM_MAR_STATUS_INTERVAL=60
>>>>>>>>>> RADIUM_ARGUS_SERVER=bivio-01.iso.cmu.local:561
>>>>>>>>>> RADIUM_ACCESS_PORT=561
>>>>>>>>>> RADIUM_BIND_IP=127.0.0.1
>>>>>>>>>> RADIUM_OUTPUT_FILE=/data/argus/var/core.out
>>>>>>>>>>
>>>>>>>>>> radium is ran like this: ./radium
>>>>>>>>>>
>>>>>>>>>> Does rasplit split out files into a directory structure like
>>>>>>>>>> this:
>>>>>>>>>>
>>>>>>>>>> /data/argus/core/2009/03/03/22/
>>>>>>>>>>
>>>>>>>>>> With 5 minute files that are gzipped inside of that directory
>>>>>>>>>> for the
>>>>>>>>>> entire hour.
>>>>>>>>>>
>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>> Hey Jason,
>>>>>>>>>>> So I can't reproduce your situation on any of my machines here.
>>>>>>>>>>> How are you running radium()? Do you have a /etc/radium.conf
>>>>>>>>>>> file?
>>>>>>>>>>> If so can you send it?
>>>>>>>>>>>
>>>>>>>>>>> Carter
>>>>>>>>>>>
>>>>>>>>>>> On Mar 10, 2009, at 1:21 PM, Jason Carr wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Good news, it seemed to have fixed my problem, thank you very
>>>>>>>>>>>> much.
>>>>>>>>>>>>
>>>>>>>>>>>> Bad news, there seems to be a problem with writing to the
>>>>>>>>>>>> output
>>>>>>>>>>>> file.
>>>>>>>>>>>>
>>>>>>>>>>>> Did the way files are written to change in this version? Right
>>>>>>>>>>>> now
>>>>>>>>>>>> we're telling radium to write to one file and then every 5
>>>>>>>>>>>> minutes
>>>>>>>>>>>> move
>>>>>>>>>>>> the file. Previously, radium would recreate the file but now it
>>>>>>>>>>>> does
>>>>>>>>>>>> not and lsof showed that the file was still open by radium.
>>>>>>>>>>>>
>>>>>>>>>>>> - Jason
>>>>>>>>>>>>
>>>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>>>> There is a good chance that it will. But the Bivio part is
>>>>>>>>>>>>> still an
>>>>>>>>>>>>> unknown, so if it doesn't we'll still have work to do.
>>>>>>>>>>>>> Carter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mar 6, 2009, at 1:48 PM, Jason Carr wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Carter,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you think that this would be a fix for my problem? I'll
>>>>>>>>>>>>>> try it
>>>>>>>>>>>>>> out
>>>>>>>>>>>>>> regardless... :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jason
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Carter Bullard wrote:
>>>>>>>>>>>>>>> Gentle people,
>>>>>>>>>>>>>>> I have found a bug in the little-endian processing of 2.x
>>>>>>>>>>>>>>> data in
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> argus-clients-3.0.2 code, so I have uploaded a new
>>>>>>>>>>>>>>> argus-clients-3.0.2.beta.2.tar.gz file to the dev directory.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.2.beta.2.tar.gz
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have tested reading data from as many argus files as I
>>>>>>>>>>>>>>> have,
>>>>>>>>>>>>>>> regardless
>>>>>>>>>>>>>>> of version, and it seems to work, ......
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also fixed problems with racluster() results having short
>>>>>>>>>>>>>>> counts.
>>>>>>>>>>>>>>> I have gone through all the clients and revised their file
>>>>>>>>>>>>>>> closing
>>>>>>>>>>>>>>> strategies,
>>>>>>>>>>>>>>> and so ....
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This does not fix the "-lz" issue for some ./configure
>>>>>>>>>>>>>>> runs.......
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For those having problems, please grab this set of code for
>>>>>>>>>>>>>>> testing.
>>>>>>>>>>>>>>> And don't hesitate to report any problems you find!!!!!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!!!!!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Carter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> Carter Bullard
>>> CEO/President
>>> QoSient, LLC
>>> 150 E 57th Street Suite 12D
>>> New York, New York 10022
>>>
>>> +1 212 588-9133 Phone
>>> +1 212 588-9134 Fax
>>>
>>>
>>>
>>>
>>
>



More information about the argus mailing list