Argus and rasqlinsert problems

Leif Tishendorf ltishend at gmail.com
Fri Apr 15 19:46:03 EDT 2011


Carter,

An update before the weekend.  I reverted everything to the latest
stable release and put it back in a working local logging
configuration (3 argus on localhost, 1 radium collector, rasplit to
files), and made sure that was all running stably.  I then stopped
rasplit on the argus box and started rasqlinsert on the remote box (no
other changes to the Argus or radium config).  The expected problem
came back: rasqlinsert stops inserting within about a minute and Argus
eventually crashes.
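
To put a number on "about a minute", I could sample the row count of
the target table every few seconds and look for the point where it
stops growing.  A minimal sketch in Python follows.  How the samples
get collected is left out, and the function name, the sample layout
and the 30-second threshold are my own placeholders, not anything that
ships with argus-clients.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample layout: (unix_time_seconds, total_rows_in_table)
Sample = Tuple[float, int]


def first_stall(samples: Sequence[Sample], min_quiet: float = 30.0) -> Optional[float]:
    """Return the time of the last row-count change before the count
    stayed flat for at least `min_quiet` seconds, or None if the count
    never stalls for that long."""
    if not samples:
        return None
    last_change_time, last_count = samples[0]
    for t, count in samples[1:]:
        if count != last_count:
            # Rows are still arriving; remember when we last saw growth.
            last_change_time, last_count = t, count
        elif t - last_change_time >= min_quiet:
            # Flat for long enough: report when the inserts stopped.
            return last_change_time
    return None


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the input.
    demo = [(0, 100), (10, 250), (20, 400), (30, 400),
            (40, 400), (50, 400), (60, 400)]
    print(first_stall(demo))
```

Comparing that timestamp with the argus restart times in syslog should
show whether the stall comes before or after each crash.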

I then changed it to have a radium instance on the remote (DB) box
connect to the radium instance on the Argus box.  That stayed up and
stable, but as soon as I started rasqlinsert against the now-local
radium instance the same problems returned.  I also noticed that this
seems to cascade: rasqlinsert stops inserting, the local radium
instance stops outputting data, the upstream radium instance (on the
Argus box) stops outputting, and eventually the Argus instances crash.

I haven't had a chance to recompile and test with devel and debug
enabled, but I thought I'd send out that bit of info and see if it lit
any light bulbs.  I'll recompile for testing Monday.
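
Until there is a gdb backtrace, the segfault line from my first
message (quoted below) can at least be turned into an offset inside
libc.  The sketch below assumes the usual x86 Linux layout of that
kernel message (fault address, ip, sp, error code, then
object[base+size]); the regular expression and the function name are
mine and only illustrative.

```python
import re

# Assumed layout of the kernel message:
#   segfault at <addr> ip <ip> sp <sp> error <code> in <object>[<base>+<size>]
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<object>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Extract the faulting object and the instruction offset within it."""
    text = " ".join(line.split())  # undo any line wrapping in the log
    m = SEGFAULT_RE.search(text)
    if m is None:
        raise ValueError("not a recognised segfault line")
    offset = int(m["ip"], 16) - int(m["base"], 16)
    return {
        "fault_addr": int(m["addr"], 16),
        "error": m["error"],
        "object": m["object"],
        "offset": offset,
        # Sanity check: the ip should fall inside the mapped object.
        "inside_mapping": 0 <= offset < int(m["size"], 16),
    }


if __name__ == "__main__":
    line = ("kernel: [4374754.132368] argus[28333]: segfault at 188 ip "
            "00007f27b7e61f7c sp 00007f27a63e7828 error 6 in "
            "libc-2.12.1.so[7f27b7ddb000+17a000]")
    info = decode_segfault(line)
    print(info["object"], hex(info["offset"]), hex(info["fault_addr"]))
```

For the line above this gives offset 0x86f7c into libc-2.12.1.so with
a fault address of 0x188.  An address that low usually points at a
write through a NULL or near-NULL pointer, though that is a guess
until gdb shows the actual call stack.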

Thanks,

-Leif

On 04/15/2011 01:07 PM, Carter Bullard wrote:
> Hey Leif,
> Well, that is disappointing.  I would recommend that you shift back,
> to get something stable going,  and I'll work with you to get things so
> you can go the database route.
>
> I am not seeing this type of instability, but that doesn't mean anything.
>
> First things first.  We need to fix the argus seg faulting.  Did this start
> with argus-3.0.4, or with the radium() connection approach?
>
> If you can run gdb(), the best thing would be to run argus under gdb,
> after compiling the symbols in, so it will tell us where it is dying.  In
> the argus root directory:
>
>     % touch .devel .debug
>     % ./configure; make clean; make
>     % sudo gdb ./bin/argus
>     (gdb)
>
> Stop your running argus, and then run the argus under gdb.
> Assuming that your argus was running as a daemon, use the -d switch
> when running argus, so that it won't go into the background while in gdb:
>
>     (gdb) run -d
>
> Hopefully it will cough up blood and tell us where it was.  That should
> help me to fix that.
>
> Rather than have rasqlinsert() connect to a remote radium(), you can
> run radium() on the database system, connecting to the other radium(), and
> have rasqlinsert() attach to a local radium.  That may or may not help,
> but it at least leaves record distribution to radium, and lets the other
> programs have local access to data.
>
> With rasqlinsert(), there are a few possibilities.  When the CPU goes
> down, has rasqlinsert() stopped inserting records into the database?
> It may be having problems receiving records, or it could be having
> problems with mysqld.
>
> Are there any error messages in your mysqld error logs?
>
> Sometimes it's hard to find where the logs are.  I use:
>     lsof -n | fgrep mysql
> to show me where the directory is. You may have to be root to see.
>
> How are you calling rasqlinsert?
>
> If you would like to take this off the email list, feel free to email me
> directly.  Although it is late on Friday, I'll still read some email this
> weekend.
>
> Carter
>
> On Apr 15, 2011, at 2:42 PM, Leif Tishendorf wrote:
>
>> Hey Carter,
>>
>> I've changed how we're logging argus data, from regular files to a
>> MySQL DB.  We used to have 3 Argus instances collected by one Radium
>> instance and then logged to disk by rasplit, and it was all working
>> fine.  Now everything is the same except that we use rasqlinsert
>> instead of rasplit, and instead of logging locally, rasqlinsert runs
>> on another system, connecting to the radium instance over a direct
>> link on a private address.
>>
>> The first issue I noticed was every few minutes the argus instances
>> were dying (not necessarily at the same time) with the following
>> syslog error:
>>
>> kernel: [4374754.132368] argus[28333]: segfault at 188 ip
>> 00007f27b7e61f7c sp 00007f27a63e7828 error 6 in
>> libc-2.12.1.so[7f27b7ddb000+17a000]
>>
>> Then the second issue we're having is rasqlinsert will work fine and
>> then we'll see CPU/RAM usage decline over about 30 seconds until it's
>> eventually no longer inserting new argus records.  We can get it
>> working again (without touching the running rasqlinsert instance),
>> sometimes by restarting radium, sometimes by restarting the argus
>> instances, and sometimes it takes both.  But after a minute or so it
>> all happens again.  The crashes don't coincide with the inserts
>> stopping, although they do sometimes fix it when my monitor scripts
>> restart the argus instances.
>>
>> I'm currently running Argus version 3.0.4 and Argus-clients 3.0.5.5
>>
>> Any ideas on where I should start troubleshooting this?
>>
>> Thanks,
>>
>> -Leif
>>
>

-- 
--Leif


