rastream stopped processing

Carter Bullard carter at qosient.com
Tue Jul 8 12:49:50 EDT 2014


Looks like radium is doing the right thing, and that rastream() is falling behind.
So on the machine where rastream() is running, is it possible that the script
it runs is eating all the memory on the machine?  That would cause rastream to
slow down in stream processing, records would back up in radium, and radium
would then shut the connection down.

When radium hangs up, rastream should just reconnect, and I’m looking into why
that may not be happening.  But what is your rastream script doing ???
Not sorting, I hope ???
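
If you want a point of comparison, a lightweight per-file script would look
something like the sketch below (this assumes rastream hands the script the
name of the file it just closed as its first argument; the aggregation and
gzip steps are just placeholders for whatever post-processing you do):

#!/bin/sh
# hypothetical rastream post-processing script: keep the per-file work cheap
# $1 is assumed to be the 5 minute file rastream just finished writing
FILE="$1"
[ -f "$FILE" ] || exit 0
# aggregate in place rather than sorting the whole file in memory
racluster -r "$FILE" -w "$FILE".agg && mv "$FILE".agg "$FILE"
# compress the finished file; anything heavier belongs in a separate batch job
gzip "$FILE"

Anything that sorts the whole file, or that can block on memory or I/O for a
long time, is better run later as a separate batch job against the archive.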

Carter


On Jul 8, 2014, at 12:31 PM, Jesse Bowling <jessebowling at gmail.com> wrote:

> 
> On Jul 8, 2014, at 11:05 AM, Carter Bullard <carter at qosient.com> wrote:
> 
>> Did radium stop collecting or sending ??  We’ve got some
>> reports of reliable connection failure, so it may be that your
>> rastream() disconnected and didn’t reconnect ????
> 
> It seems that radium is collecting; at least, I can attach to the radium instance and receive 100 records with "ra -r 127.0.0.1 -N 100"
> 
>> Check your system log (/var/log/messages or /var/log/system.log)
>> to see if radium complained about the client going away, or if
>> radium stopped reading.  If radium is still running you can just
>> connect to it to see if it’s transmitting anything.
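>> 
>> For example, something along these lines (assuming radium is
>> listening on the default argus port, 561):
>> 
>>    grep radium /var/log/messages
>>    ra -S 127.0.0.1:561 -N 10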
>> 
> It looks like it must be on the rastream side...??:
> 
> Jul  6 22:59:00 test radium[57599]: 2014-07-06 22:59:00.572718 connect from localhost[127.0.0.1]
> Jul  7 08:00:21 test radium[57599]: 2014-07-07 08:00:21.541077 ArgusWriteOutSocket(0x1269d0) client not processing: disconnecting
> 
> Likely unrelated, but I’m also seeing many of these messages in the logs:
> 
> Jul  2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.358574 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
> Jul  2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.390583 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
> 
> 
>> If there is a problem, and you’ve compiled with symbols in (.devel),
>> then attach to radium with gdb() and look to see if any of the
>> threads have terminated.
>> 
>> (gdb) attach pid.of.radium
>> (gdb) info threads
>> (gdb) thread 1
>> (gdb) where
>> (gdb) thread 2
>> (gdb) where
>> 
>> etc. … that may not be the exact syntax, but it’s something like that.
>> With all the various end systems using clang and lldb, I’m kind
>> of schizophrenic on debugging right now.
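>> 
>> For reference, the rough lldb equivalents would be something like:
>> 
>> lldb -p pid.of.radium
>> (lldb) thread list
>> (lldb) thread select 1
>> (lldb) bt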
>> 
> 
> Radium output:
> (gdb) info threads
>  3 Thread 0x7f001f752700 (LWP 57600)  0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>  2 Thread 0x7f001ed51700 (LWP 57601)  0x0000003b532acced in nanosleep () from /lib64/libc.so.6
> * 1 Thread 0x7f001fc87700 (LWP 57599)  0x0000003b532e15d3 in select () from /lib64/libc.so.6
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7f001fc87700 (LWP 57599))]#0 0x0000003b532e15d3 in select () from /lib64/libc.so.6
> (gdb) where
> #0  0x0000003b532e15d3 in select () from /lib64/libc.so.6
> #1  0x00000000004669ee in ArgusReadStream (parser=0x7f001fb42010, queue=0x19511f0) at ./argus_client.c:738
> #2  0x000000000040746c in main (argc=3, argv=0x7fff4ae0a728) at ./argus_main.c:387
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f001ed51700 (LWP 57601))]#0 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
> (gdb) where
> #0  0x0000003b532acced in nanosleep () from /lib64/libc.so.6
> #1  0x0000003b532acb60 in sleep () from /lib64/libc.so.6
> #2  0x0000000000466455 in ArgusConnectRemotes (arg=0x1951190) at ./argus_client.c:579
> #3  0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x0000003b532e8b5d in clone () from /lib64/libc.so.6
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f001f752700 (LWP 57600))]#0 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> (gdb) where
> #0  0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00000000004ac1be in ArgusOutputProcess (arg=0x1953310) at ./argus_output.c:897
> #2  0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
> #3  0x0000003b532e8b5d in clone () from /lib64/libc.so.6
> 
> Connecting to the failing rastream process gave odd results:
> 
> (gdb) detach
> Detaching from program: /usr/local/bin/radium, process 57599
> (gdb) attach 57605
> Attaching to program: /usr/local/bin/radium, process 57605
> Cannot access memory at address 0x706f636373007064
> (gdb) where
> #0  0x0000003b53a0ef3d in ?? ()
> #1  0x0000000000000000 in ?? ()
> (gdb) info threads
> * 1 process 57605  0x0000003b53a0ef3d in ?? ()
> 
> What should my next step be? Ensure the reliable connection setting is on? Run rastream under gdb?
> 
> Thanks and cheers,
> 
> Jesse
> 
>> Carter
>> 
>> 
>> On Jul 7, 2014, at 4:28 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> Over the weekend my rastream process stopped processing records for some reason. The current setup is:
>>> 
>>> netflow records -> radium -> rastream -M time 5m
>>> 
>>> I noticed that records were no longer being written to disk. I connected a new ra instance to radium, and had no problems receiving records. Attaching strace to the rastream process, all I could see were repeated calls:
>>> 
>>> <snip>
>>> nanosleep({0, 50000000}, NULL)          = 0
>>> nanosleep({0, 50000000}, NULL)          = 0
>>> nanosleep({0, 50000000}, NULL)          = 0 
>>> <snip>
>>> 
>>> Are there any settings I can tweak or logs to check to diagnose or correct the issue? I vaguely recall something about persistent connections, where if the connection is lost an attempt would be made to reconnect, but my gut says that’s not what’s happening here...
>>> 
>>> Cheers,
>>> 
>>> Jesse
