rastream stopped processing
Jesse Bowling
jessebowling at gmail.com
Tue Jul 8 12:31:16 EDT 2014
On Jul 8, 2014, at 11:05 AM, Carter Bullard <carter at qosient.com> wrote:
> Did radium stop collecting or sending? We’ve had some
> reports of reliable connection failures, so it may be that your
> rastream() disconnected and didn’t reconnect.
It seems that radium is collecting; at least I can attach to the radium instance and receive 100 records with “ra -r 127.0.0.1 -N 100”.
> Check your system log (/var/log/messages or /var/log/system.log)
> to see whether radium complained about the client going away, or
> whether radium stopped reading. If radium is still running, you can
> just connect to it to see if it’s transmitting anything.
>
It looks like the problem must be on the rastream side:
Jul 6 22:59:00 test radium[57599]: 2014-07-06 22:59:00.572718 connect from localhost[127.0.0.1]
Jul 7 08:00:21 test radium[57599]: 2014-07-07 08:00:21.541077 ArgusWriteOutSocket(0x1269d0) client not processing: disconnecting
Likely unrelated, but I’m also seeing many of these messages in the logs:
Jul 2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.358574 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
Jul 2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.390583 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
> If there is a problem, and you’ve compiled with symbols (.devel),
> then attach to radium with gdb() and check whether any of the
> threads have terminated.
>
> (gdb) attach pid.of.radium
> (gdb) info threads
> (gdb) thread 1
> (gdb) where
> (gdb) thread 2
> (gdb) where
>
> etc. This may not be the exact syntax, but it’s something like that.
> With all the various end systems using clang and lldb, I’m kind
> of schizophrenic on debugging right now.
>
Radium output:
(gdb) info threads
3 Thread 0x7f001f752700 (LWP 57600) 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f001ed51700 (LWP 57601) 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7f001fc87700 (LWP 57599) 0x0000003b532e15d3 in select () from /lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f001fc87700 (LWP 57599))]#0 0x0000003b532e15d3 in select () from /lib64/libc.so.6
(gdb) where
#0 0x0000003b532e15d3 in select () from /lib64/libc.so.6
#1 0x00000000004669ee in ArgusReadStream (parser=0x7f001fb42010, queue=0x19511f0) at ./argus_client.c:738
#2 0x000000000040746c in main (argc=3, argv=0x7fff4ae0a728) at ./argus_main.c:387
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f001ed51700 (LWP 57601))]#0 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
(gdb) where
#0 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
#1 0x0000003b532acb60 in sleep () from /lib64/libc.so.6
#2 0x0000000000466455 in ArgusConnectRemotes (arg=0x1951190) at ./argus_client.c:579
#3 0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003b532e8b5d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f001f752700 (LWP 57600))]#0 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) where
#0 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000004ac1be in ArgusOutputProcess (arg=0x1953310) at ./argus_output.c:897
#2 0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
#3 0x0000003b532e8b5d in clone () from /lib64/libc.so.6
Connecting to the failing rastream process gave odd results:
(gdb) detach
Detaching from program: /usr/local/bin/radium, process 57599
(gdb) attach 57605
Attaching to program: /usr/local/bin/radium, process 57605
Cannot access memory at address 0x706f636373007064
(gdb) where
#0 0x0000003b53a0ef3d in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) info threads
* 1 process 57605 0x0000003b53a0ef3d in ?? ()
What should my next step be? Ensure the reliable connection setting is on? Run rastream under gdb?
Thanks and cheers,
Jesse
> Carter
>
>
> On Jul 7, 2014, at 4:28 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>
>> Hello,
>>
>> Over the weekend my rastream process stopped processing records for some reason. The current setup is:
>>
>> netflow records -> radium -> rastream -M time 5m
>>
>> I noticed that records were no longer being written to disk. I connected a new ra instance to radium and had no problems receiving records. When I attached strace to the rastream process, all I could see were calls like these:
>>
>> <snip>
>> nanosleep({0, 50000000}, NULL) = 0
>> nanosleep({0, 50000000}, NULL) = 0
>> nanosleep({0, 50000000}, NULL) = 0
>> <snip>
>>
>> Are there any settings I can tweak, or logs I can check, to diagnose or correct the issue? I vaguely recall something about persistent connections, where a lost connection would trigger a reconnect attempt, but my gut says that’s not what’s happening here...
>>
>> Cheers,
>>
>> Jesse
>
More information about the argus mailing list