Argus and rasqlinsert problems

Carter Bullard carter at qosient.com
Sat Apr 16 15:48:36 EDT 2011


Well, we definitely need to fix argus so it doesn't crash.

radium sounds like it's doing the right thing.  The local radium
should stop outputting to rasqlinsert(), since rasqlinsert() is not
reading records.  The upstream radium should not stop, however, and
argus shouldn't see anything wrong, so that is a big concern.

I'll try to replicate this here, to see what could be going on.

Carter


On Apr 15, 2011, at 7:46 PM, Leif Tishendorf wrote:

> Carter,
> 
> An update before the weekend here.  I reverted everything to the latest stable release and put it back in a working local logging configuration (3 argus on localhost, 1 radium collector, rasplit to files), and made sure that was all working and stable.  Then I made one change: I stopped rasplit on the argus box and fired up rasqlinsert on the remote box (no other changes to the Argus config or radium config).  The expected problem returned: rasqlinsert stops inserting within about a minute and Argus eventually crashes.
> 
> I then changed it to have a radium instance on the remote (DB) box connect to the radium instance on the Argus box.  That stayed up and stable, but as soon as I started rasqlinsert again against the now-local instance of radium, the same problems returned.  I also noticed this seems to cause a cascading problem: rasqlinsert stops inserting, the local radium instance stops outputting data, the upstream instance of radium (on the Argus box) stops outputting, and eventually the Argus instances crash.
> 
> I haven't had a chance to recompile and test with devel and debug enabled, but I thought I'd send out that bit of info and see if it lit any light bulbs.  I'll recompile for testing Monday.
> 
> Thanks,
> 
> -Leif
> 
> On 04/15/2011 01:07 PM, Carter Bullard wrote:
>> Hey Leif,
>> Well, that is disappointing.  I would recommend that you shift back,
>> to get something stable going,  and I'll work with you to get things so
>> you can go the database route.
>> 
>> I am not seeing this type of instability, but that doesn't mean anything.
>> 
>> First things first.  We need to fix the argus seg faulting.  Did this start
>> with argus-3.0.4, or with the radium() connection approach?
>> 
>> If you can run gdb(), the best thing would be to run argus under gdb,
>> after compiling the symbols in, so it will tell us where it is dying.  In
>> the argus root directory:
>> 
>>    % touch .devel .debug
>>    % ./configure; make clean; make
>>    % sudo gdb ./bin/argus
>>    (gdb)
>> 
>> Stop your running argus, and then run the argus under gdb.
>> Assuming that your argus was running as a daemon, use the -d switch
>> when running argus, so that it won't go into the background while in gdb:
>> 
>>    (gdb) run -d
>> 
>> Hopefully it will cough up blood and tell us where it was.  That should
>> help me to fix that.
>> 
>> Rather than have rasqlinsert() connect to a remote radium(), you can
>> run radium() on the database system, connecting to the other radium(), and
>> have rasqlinsert() attach to a local radium.  That may or may not help,
>> but it at least leaves record distribution to radium, and lets the other
>> programs have local access to data.
>> 
>> With rasqlinsert(), there are a few possibilities.  When the CPU goes
>> down, has rasqlinsert() stopped inserting records into the database?
>> It may be having problems receiving records, or it could be having
>> problems with mysqld.
>> 
>> Are there any error messages in your mysqld error logs?
>> 
>> Sometimes it's hard to find where the logs are.  I use:
>>    lsof -n | fgrep mysql
>> to show me where the directory is. You may have to be root to see.
>> 
>> How are you calling rasqlinsert?
>> 
>> If you would like to take this off the email list, feel free to email me
>> directly.  Although it is late on Friday, I'll still read some email this
>> weekend.
>> 
>> Carter
>> 
>> On Apr 15, 2011, at 2:42 PM, Leif Tishendorf wrote:
>> 
>>> Hey Carter,
>>> 
>>> I've changed how we're logging argus data from regular files to a MySQL
>>> DB.  We used to have 3 Argus instances collected by one Radium
>>> instance and then logged to disk by rasplit, and it was all working
>>> fine.  Now everything is the same except that instead of rasplit we use
>>> rasqlinsert, and instead of logging locally, rasqlinsert is running on
>>> another system, connecting to the radium instance via a direct link
>>> on a private address.
>>> 
>>> The first issue I noticed was that every few minutes the argus instances
>>> were dying (not necessarily at the same time) with the following
>>> syslog error:
>>> 
>>> kernel: [4374754.132368] argus[28333]: segfault at 188 ip
>>> 00007f27b7e61f7c sp 00007f27a63e7828 error 6 in
>>> libc-2.12.1.so[7f27b7ddb000+17a000]
>>> 
>>> Then the second issue we're having is rasqlinsert will work fine and
>>> then we'll see CPU/RAM usage decline over about 30 seconds until it's
>>> eventually no longer inserting new argus records.  We can get it
>>> working again (without touching the running rasqlinsert instance) by
>>> sometimes restarting radium and sometimes restarting the argus
>>> instances and sometimes it takes both.  But after a minute or so it
>>> all happens again.  The crashes don't coincide with the inserts
>>> stopping, although they do sometimes fix it when my monitor scripts
>>> restart the argus instances.
>>> 
>>> I'm currently running Argus version 3.0.4 and Argus-clients 3.0.5.5
>>> 
>>> Any ideas on where I should start troubleshooting this?
>>> 
>>> Thanks,
>>> 
>>> -Leif
>>> 
>> 
> 
> -- 
> --Leif
> 
