rastream stopped processing

Tue Jul 8 12:31:16 EDT 2014

On Jul 8, 2014, at 11:05 AM, Carter Bullard <carter at qosient.com> wrote:

> Did radium stop collecting or sending ??  We’ve got some
> reports on reliable connection failure, so it may be your
> rastream() disconnected and didn’t reconnect ????

It seems that radium is collecting; at least I can attach to the radium instance and receive 100 records with "ra -r 127.0.0.1 -N 100"

> check out your system log /var/log/messages /var/log/system.log
> to see if radium complained about the client going away, or if
> radium stopped reading.  If radium is still running you can just
> connect to it, to see if it’s transmitting anything.
> 
It looks like the problem must be on the rastream side:

Jul  6 22:59:00 test radium[57599]: 2014-07-06 22:59:00.572718 connect from localhost[127.0.0.1]
Jul  7 08:00:21 test radium[57599]: 2014-07-07 08:00:21.541077 ArgusWriteOutSocket(0x1269d0) client not processing: disconnecting

Likely unrelated, but I’m also seeing many of these messages in the logs:

Jul  2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.358574 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
Jul  2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.390583 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
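
For anyone who wants to check their own logs for the same two messages, here is a minimal Python sketch. It assumes the syslog line layout shown in the excerpts above; the names LINE_RE, classify, and summarize are made up for illustration and are not part of argus or radium.

```python
import re
from collections import Counter

# Assumed layout, taken from the radium lines quoted in this message:
#   "<syslog prefix> radium[PID]: YYYY-MM-DD HH:MM:SS.usec ArgusWriteOutSocket(0xADDR) <message>"
LINE_RE = re.compile(
    r"radium\[(?P<pid>\d+)\]: "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"ArgusWriteOutSocket\((?P<sock>0x[0-9a-fA-F]+)\) "
    r"(?P<msg>.+)$"
)


def classify(msg):
    """Map the message text to a short label (labels are hypothetical)."""
    if msg.startswith("client not processing"):
        return "disconnect"
    if msg.startswith("max queue exceeded"):
        return "queue_full"
    return "other"


def summarize(lines):
    """Return (counts per label, list of (timestamp, pid, socket, label))."""
    counts = Counter()
    events = []
    for line in lines:
        m = LINE_RE.search(line.rstrip("\n"))
        if m is None:
            continue  # not an ArgusWriteOutSocket line, e.g. "connect from ..."
        kind = classify(m.group("msg"))
        counts[kind] += 1
        events.append((m.group("ts"), m.group("pid"), m.group("sock"), kind))
    return counts, events


# The log lines quoted in this message, used as sample input.
SAMPLE = [
    "Jul  6 22:59:00 test radium[57599]: 2014-07-06 22:59:00.572718 connect from localhost[127.0.0.1]",
    "Jul  7 08:00:21 test radium[57599]: 2014-07-07 08:00:21.541077 ArgusWriteOutSocket(0x1269d0) client not processing: disconnecting",
    "Jul  2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.358574 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001",
    "Jul  2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.390583 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001",
]

if __name__ == "__main__":
    counts, events = summarize(SAMPLE)
    for ts, pid, sock, kind in events:
        print(ts, pid, sock, kind)
    print(dict(counts))
```

To run it against a real log, pass an open file handle (for example /var/log/messages) to summarize() in place of SAMPLE.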

> If there is a problem, and you’ve compiled with symbols in (.devel),
> then attach to radium with gdb() and look to see if any of the
> threads have terminated.
> 
> (gdb) attach pid.of.radium
> (gdb) info threads
> (gdb) thread 1
> (gdb) where
> (gdb) thread 2
> (gdb) where
> 
> etc ….. may not be exact syntax, but it’s something like that.
> With all the various end systems using clang and lldb, I’m kind
> of schizophrenic on debugging right now.
> 

Radium output:
(gdb) info threads
  3 Thread 0x7f001f752700 (LWP 57600)  0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f001ed51700 (LWP 57601)  0x0000003b532acced in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f001fc87700 (LWP 57599)  0x0000003b532e15d3 in select () from /lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f001fc87700 (LWP 57599))]#0  0x0000003b532e15d3 in select () from /lib64/libc.so.6
(gdb) where
#0  0x0000003b532e15d3 in select () from /lib64/libc.so.6
#1  0x00000000004669ee in ArgusReadStream (parser=0x7f001fb42010, queue=0x19511f0) at ./argus_client.c:738
#2  0x000000000040746c in main (argc=3, argv=0x7fff4ae0a728) at ./argus_main.c:387
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f001ed51700 (LWP 57601))]#0  0x0000003b532acced in nanosleep () from /lib64/libc.so.6
(gdb) where
#0  0x0000003b532acced in nanosleep () from /lib64/libc.so.6
#1  0x0000003b532acb60 in sleep () from /lib64/libc.so.6
#2  0x0000000000466455 in ArgusConnectRemotes (arg=0x1951190) at ./argus_client.c:579
#3  0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003b532e8b5d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f001f752700 (LWP 57600))]#0  0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) where
#0  0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004ac1be in ArgusOutputProcess (arg=0x1953310) at ./argus_output.c:897
#2  0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003b532e8b5d in clone () from /lib64/libc.so.6

Attaching gdb to the failing rastream process gave odd results:

(gdb) detach
Detaching from program: /usr/local/bin/radium, process 57599
(gdb) attach 57605
Attaching to program: /usr/local/bin/radium, process 57605
Cannot access memory at address 0x706f636373007064
(gdb) where
#0  0x0000003b53a0ef3d in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) info threads
* 1 process 57605  0x0000003b53a0ef3d in ?? ()

What should my next step be? Ensure the reliable connection setting is on? Run rastream under gdb?

Thanks and cheers,

Jesse

> Carter
> 
> 
> On Jul 7, 2014, at 4:28 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
> 
>> Hello,
>> 
>> Over the weekend my rastream process stopped processing records for some reason. The current setup is:
>> 
>> netflow records -> radium -> rastream -M time 5m
>> 
>> I noticed that records were no longer being written to disk. I connected a new ra instance to radium, and had no problems receiving records. When I attached strace to the rastream process, all I could see were these calls:
>> 
>> <snip>
>> nanosleep({0, 50000000}, NULL)          = 0
>> nanosleep({0, 50000000}, NULL)          = 0
>> nanosleep({0, 50000000}, NULL)          = 0 
>> <snip>
>> 
>> Are there any settings I can tweak, or logs I can check, to diagnose or correct the issue? I vaguely recall something about persistent connections, where an attempt would be made to reconnect if the connection was lost, but my gut says that’s not what’s happening here...
>> 
>> Cheers,
>> 
>> Jesse
> 
