rastream stopped processing
Carter Bullard
carter at qosient.com
Tue Jul 8 15:56:22 EDT 2014
Hey Jesse,
The reconnection is now fixed, and will be in the argus-clients-3.0.8.rc.4
tarball I’ll put up tonight, so that shouldn’t be an issue, at least
on paper.
The trick with these events is for your infrastructure to survive and keep
processing. From a systems-design standpoint, there are two ways to go in our
case: we can either fail when we can’t transmit to a client, or we can
drop records (or use a combination of both). All of these options are
currently in the radium output engine and the argus distribution architecture.
When you run into the max queue exceeded message, radium
should be dropping records. Currently we’re designed to drop
128 records at a time to ease the queue pressure and free up
memory, and we shouldn’t disconnect. I’m looking at that now, and
if there are any problems, I’ll have a fix in the new code tonight.
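In rough terms the drop policy looks like this (a minimal sketch only; the
names, types, and constants are illustrative, not the actual radium source):

#include <stdio.h>

#define MAXQUEUE  500000          /* illustrative per-client output queue limit */
#define DROPCHUNK 128             /* records shed per pressure event */

struct outqueue { int count; };   /* stand-in for the per-client output queue */

/* queue one record for a client, shedding load instead of failing */
static void enqueue_record (struct outqueue *q)
{
   if (q->count >= MAXQUEUE) {
      fprintf (stderr, "max queue exceeded %d\n", q->count + 1);
      q->count -= DROPCHUNK;      /* drop the oldest DROPCHUNK records ... */
   }                              /* ... but keep the client connected */
   q->count++;
}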
Sometimes, clients fail and go away. To detect that and respond,
we track a maximum number of socket errors. These occur when
we try to write to the socket but fail because the socket output
queues are full. We count these and disconnect when they hit 500,000,
as well. The count is reset whenever a write does get through, so we have
to see a lot of errors in a row before we disconnect. If the socket is gone, we
disconnect immediately.
So, I’ll make it so that we don’t disconnect the client due to max-queue-exceeded
errors, but we’ll still disconnect if the client is gone (many errors in
a row).
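And the disconnect decision amounts to roughly this (again a sketch with
illustrative names, not the real code):

#define MAXERRORS 500000                 /* illustrative consecutive-error limit */
enum { KEEP = 0, DISCONNECT = 1 };

struct client { long errors; };          /* stand-in for per-client output state */

/* after each write attempt, decide whether to keep the client */
static int check_client (struct client *c, int write_ok, int socket_gone)
{
   if (socket_gone)                      /* peer closed the socket: drop it now */
      return DISCONNECT;
   if (write_ok) {
      c->errors = 0;                     /* any successful write resets the count */
      return KEEP;
   }
   if (++c->errors >= MAXERRORS)         /* only a long run of errors disconnects */
      return DISCONNECT;
   return KEEP;
}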
As for ratop() not handling the :q, the curses thread may have failed (that
is the thread that reads the curses input). If you could attach to it with
gdb(), check the threads, and print the stack (“where”) for the one that is
dead, I’ll fix that post haste!
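Something along these lines should show it (the same drill as the radium
debugging further down; “thread apply all where” dumps every thread’s stack
in one shot):

(gdb) attach pid.of.ratop
(gdb) info threads
(gdb) thread apply all where
(gdb) detach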
Carter
On Jul 8, 2014, at 3:34 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
> Thanks for that Carter.
>
> It looks like with the rmon option ratop can’t keep up as I get:
>
> Jul 8 15:25:43 test radium[40569]: 2014-07-08 15:25:43.153935 connect from localhost[127.0.0.1]
> Jul 8 15:26:48 test radium[40569]: 2014-07-08 15:26:48.708503 ArgusWriteOutSocket(0x136dc0) max queue exceeded 500001
> Jul 8 15:26:48 test radium[40569]: 2014-07-08 15:26:48.708594 ArgusWriteOutSocket(0x136dc0) max queue exceeded 500001
>
> along with the symptom that :q won’t quit ratop...
>
> As far as the rastream issue I originally reported, I’ll test the theory that having the reconnect option turned on helps... However, I’m a bit worried about the implications of a disconnect. Is it fair to assume that a disconnect in my case implies that rastream can’t keep up with the incoming records? If so, is there anything else I can tweak?
>
> I’m concerned about the timing of my first rastream failure (we were being massively scanned at that time, based on activity to our honeypots, which saw a 17-fold increase in attempts) and about later in the year (when our traffic levels tend to increase dramatically compared to the summer)...
>
> Cheers,
>
> Jesse
>
>
> On Jul 8, 2014, at 3:22 PM, Carter Bullard <carter at qosient.com> wrote:
>
>> Hey Jesse,
>> Looks good.
>>
>> ratop() removes idle entries, and the default timeout is 60 seconds.
>>
>> So, that’s 31 source originating AS’s active in the last 60 seconds (assuming defaults).
>> “ProcessQueue” is the total number of caches ratop is tracking; “DisplayQueue”
>> is the number on the screen.
>>
>> So with a -m sas, you are only tracking the source originating AS numbers.
>> If you want to see all AS's, use the “-M rmon” option, so that you track all
>> (source and destination) as a single object.
>>
>> ratop -M rmon -m sas
>>
>> and you will see all the AS’s as singular objects. The only thing to print
>> in that case is:
>>
>> -s stime flgs sas spkts:16 dpkts:16 sbytes:20 dbytes:20 pcr
>>
>>
>> If you want to track the AS matrix, that is, unique sAS and dAS pairs, DON’T use
>> the -M rmon option; just add das to the aggregation mask:
>>
>> ratop -m sas das
>>
>> Now you will see the AS matrix.
>>
>> Check your timeouts using the “:T” value, and change that to, say, 3600.
>>
>> Carter
>>
>>
>>
>> On Jul 8, 2014, at 3:13 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>
>>>
>>> On Jul 8, 2014, at 2:50 PM, Carter Bullard <carter at qosient.com> wrote:
>>>
>>>> Hey Jesse,
>>>> Well, 17K records per second is also a bit snappy. At that rate,
>>>> your queues can back up in possibly 10-20 seconds, which isn’t
>>>> that long. Looks like you hit it in about 16 seconds.
>>>>
>>>> Because the rate is a bit snappy, your problems with rastream()
>>>> may be related to the duration of the run of the script.
>>>> Your rastream() doesn’t block to run its script, but there
>>>> may be some weird locking such that it doesn’t process its queue
>>>> while the script is running. This isn’t right, so I’ll look into
>>>> that.
>>>>
>>>> So, ratop was bumped off because it was slow, just like your
>>>> rastream was bumped off, and it’s now just sitting there, wondering
>>>> what to do with no input. Does it show any records on the screen ???
>>>>
>>>> If you don’t care about sorting, which is what is killing ratop(),
>>>> you can set
>>>>
>>>> You should be able to navigate
>>>> and see the 100Ks of records in its cache ???
>>>>
>>>> If you want to run ratop, definitely change the UPDATE variable in
>>>> its rarc file, and try running with a “-m” option to reduce the number
>>>> of objects that it’s tracking, like:
>>>>
>>>> -m matrix/16
>>>> -m smac dmac
>>>> -m mac addr (this will kick in the -M rmon option, so print out the smac saddr fields)
>>>>
>>>> or your -m sas value if you’ve labeled them.
>>>>
>>>
>>> OK, setting -m sas lets ratop avoid backlogging... and now I know the actual RPS is around 30K-35K/sec. :)
>>>
>>> Nothing crashes, but I’m not convinced I’m seeing everything either...Does this look correct?
>>>
>>> ProcessQueue 31 DisplayQueue 31 TotalRecords 15079221 Secs 34473.2227 rps Status Active
>>>
>>> The ProcessQueue and DisplayQueue totals don’t really vary (although sometimes it drops to 30)...Is that letting me know what is drawn on the screen (as opposed to a backlog, which is what I assumed at first)?
>>>
>>> I also see what seem like data oddities:
>>>
>>> StartTime Flgs Proto SrcAddr Sport sCo sAS Dir DstAddr Dport dCo dAS SrcPkts DstPkts SrcBytes DstBytes
>>> 2014-07-08 14:52:21.698000 N ip 0.0.0.0 US <snip> <-> 0.0.0.0 20940 2695281* 1060512* 156417443277 88651276430
>>> 2014-07-08 14:52:22.293000 N ip 0.0.0.0 <-> 0.0.0.0 51007216 228 32196610358 19574
>>>
>>> Specifically, I doubt that the top entry’s traffic is all destined for AS 20940; on a second run all traffic is headed to AS 1668, on a third to AS 33739... Should I assume that das doesn’t make sense here? Also, I’m collecting Netflow records, so I would expect the dst counters (bytes, pkts, etc.) to all be zero, but for some of the top ASNs, they’re not.
>>>
>>> As a final touch, when I hit :q to quit, on two of three runs I got (it appears to be related to how long I let ratop run):
>>>
>>> Segmentation fault (core dumped)
>>>
>>> Thanks for all the help and sorry for the likely silly questions; I’ve just not used ratop much...
>>>
>>> Cheers,
>>>
>>> Jesse
>>>
>>>> Carter
>>>>
>>>>
>>>> On Jul 8, 2014, at 2:24 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>
>>>>>
>>>>> On Jul 8, 2014, at 2:07 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>
>>>>>> Well, 7K records per second is a bit snappy.
>>>>>> Are you using the latest ratop(), as earlier versions have real performance problems ??
>>>>>>
>>>>> $ ratop --version
>>>>> RaCurses Version 3.0.7.36
>>>>>
>>>>> Updated to 3.0.8.rc.1 and now...
>>>>>
>>>>> Actually, looking at it now I’m seeing a starting value of around 17K/sec and similar behavior, where the RPS falls off constantly and the queues build to around 100K each... Then the queues start falling and, although the status remains “Active”, records are no longer displayed, and after a :q command ratop doesn’t actually quit:
>>>>>
>>>>> Jul 8 14:19:10 test radium[40569]: 2014-07-08 14:19:10.453649 connect from localhost[127.0.0.1]
>>>>> Jul 8 14:19:26 test radium[40569]: 2014-07-08 14:19:26.774006 ArgusWriteOutSocket(0x136ce0) max queue exceeded 500001
>>>>> Jul 8 14:19:26 test radium[40569]: 2014-07-08 14:19:26.774141 ArgusWriteOutSocket(0x136ce0) max queue exceeded 500001
>>>>>
>>>>> (gdb) where
>>>>> #0 0x0000003b53a0822d in pthread_join () from /lib64/libpthread.so.0
>>>>> #1 0x000000000040519d in main (argc=5, argv=0x7fffa260f248) at ./racurses.c:85
>>>>> (gdb) info threads
>>>>> 2 Thread 0x7fe176f90700 (LWP 40799) 0x0000003b53a0ef3d in nanosleep () from /lib64/libpthread.so.0
>>>>> * 1 Thread 0x7fe178d5c700 (LWP 40798) 0x0000003b53a0822d in pthread_join () from /lib64/libpthread.so.0
>>>>> (gdb) thread 2
>>>>> [Switching to thread 2 (Thread 0x7fe176f90700 (LWP 40799))]#0 0x0000003b53a0ef3d in nanosleep () from /lib64/libpthread.so.0
>>>>> (gdb) where
>>>>> #0 0x0000003b53a0ef3d in nanosleep () from /lib64/libpthread.so.0
>>>>> #1 0x00000000004055c2 in ArgusCursesProcess (arg=0x0) at ./racurses.c:234
>>>>> #2 0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
>>>>> #3 0x0000003b532e8b5d in clone () from /lib64/libc.so.6
>>>>> (gdb)
>>>>>
>>>>>
>>>>>
>>>>>> And by default in ratop, your 100K records are being sorted 10x per second, so
>>>>>> you will have some performance issues. 500K records in the queue at 7K records
>>>>>> a second is only about 70 seconds of build-up. Still, rastream() should keep up.
>>>>>>
>>>>>> To run ratop() at these rates, you will want to modify the RA_UPDATE_INTERVAL in
>>>>>> your .rarc file so that it doesn’t update the screen, which means sorting the
>>>>>> entire queue, every tenth of a second. Or use the “:u” option to modify it
>>>>>> on the fly once you start ratop(). Every 2 seconds works pretty well.
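>>>>>>
>>>>>> Something like this in the .rarc should do it (the value is in seconds; I’m
>>>>>> sketching the form here, so check the comments in your .rarc for the exact syntax):
>>>>>>
>>>>>> RA_UPDATE_INTERVAL=2.0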
>>>>>>
>>>>>> Carter
>>>>>>
>>>>>> On Jul 8, 2014, at 1:59 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Jul 8, 2014, at 1:12 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>
>>>>>>>> That type of processing shouldn’t cause any problems, but
>>>>>>>> you can be doing that level of processing in real time with
>>>>>>>> rasqlinsert(), or doing your rabins() as another task reading
>>>>>>>> from radium. You can also be running a local radium
>>>>>>>> that collects from the remote radium, so that many local
>>>>>>>> clients can feed from the local radium but you’ve only written
>>>>>>>> the data once onto the wire. That may be the best way to
>>>>>>>> go.
>>>>>>>>
>>>>>>>> Hmmmm, so is rastream() busy ?? Lots of records per second ???
>>>>>>>> ratop, connected to the radium, will print the average rps at
>>>>>>>> the bottom of the screen.
>>>>>>>>
>>>>>>>
>>>>>>> rastream doesn’t appear to be that busy, CPU-wise, at the moment; however, ratop definitely seems to have issues keeping up...
>>>>>>>
>>>>>>> Immediately upon connecting (ratop -S 127.0.0.1) the rps is about 7K, and the ProcessQueue and DisplayQueue rapidly increase until they hit about 100K (as the rps drops during that time)... Once the two queue values drop to 0, the status changes to idle and ratop won’t respond to commands (:q, Ctrl-C, Ctrl-Z)... Additionally I get the queue exceeded messages in /var/log/messages:
>>>>>>>
>>>>>>> Jul 8 13:51:31 test radium[9015]: 2014-07-08 13:51:31.427787 connect from localhost[127.0.0.1]
>>>>>>> Jul 8 13:51:50 test radium[9015]: 2014-07-08 13:51:50.591503 ArgusWriteOutSocket(0x78136ce0) max queue exceeded 500001
>>>>>>> Jul 8 13:51:50 test radium[9015]: 2014-07-08 13:51:50.591591 ArgusWriteOutSocket(0x78136ce0) max queue exceeded 500001
>>>>>>>
>>>>>>> (no similar messages for rastream since last restart)
>>>>>>>
>>>>>>> Any other steps to look at here?
>>>>>>>
>>>>>>> In the meantime I’ll look at using a second radium collecting from the first to feed clients...
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Jesse
>>>>>>>
>>>>>>>> Carter
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 8, 2014, at 1:08 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 8, 2014, at 12:49 PM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like radium is doing the right thing, and that rastream() is falling behind.
>>>>>>>>>> So on the machine where rastream() is running, is it possible that the script
>>>>>>>>>> it runs eats all the memory on the machine? That would cause rastream to
>>>>>>>>>> slow down in stream processing, backing up to radium, where radium shuts it down.
>>>>>>>>>>
>>>>>>>>>> When radium hangs up, rastream should just reconnect, and I’m looking into why
>>>>>>>>>> that may not be happening, but what is your rastream script doing ???
>>>>>>>>>> Not sorting I hope ???
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, the ra.conf I was using did not have the reliable connect option on, but it’s now set thusly:
>>>>>>>>>
>>>>>>>>> RA_RELIABLE_CONNECT=yes
>>>>>>>>>
>>>>>>>>> so that might at least help that...The post-process script looks like this:
>>>>>>>>>
>>>>>>>>> #!/bin/bash
>>>>>>>>>
>>>>>>>>> #
>>>>>>>>> # Argus Client Software. Tools to read, analyze and manage Argus data.
>>>>>>>>> # Copyright (C) 2000-2014 QoSient, LLC.
>>>>>>>>> # All Rights Reserved
>>>>>>>>> #
>>>>>>>>> # Script called by rastream, to process files.
>>>>>>>>> #
>>>>>>>>> # Since this is being called from rastream(), it will have only a single
>>>>>>>>> # parameter, filename,
>>>>>>>>> #
>>>>>>>>> # Carter Bullard <carter at qosient.com>
>>>>>>>>> #
>>>>>>>>>
>>>>>>>>> PATH="/usr/local/bin:$PATH"; export PATH
>>>>>>>>> package="argus-clients"
>>>>>>>>> version="3.0.8rc3"
>>>>>>>>>
>>>>>>>>> OPTIONS="$*"
>>>>>>>>> FILE=
>>>>>>>>> while test $# != 0
>>>>>>>>> do
>>>>>>>>>     case "$1" in
>>>>>>>>>         -r) shift; FILE="$1"; break;;
>>>>>>>>>     esac
>>>>>>>>>     shift
>>>>>>>>> done
>>>>>>>>>
>>>>>>>>> RANGE=`/usr/local/bin/ratimerange -p 0 -r ${FILE} | awk -F ' - ' '{print $2}'`
>>>>>>>>> ERANGE=`date +%s -d "${RANGE}"`
>>>>>>>>> NOW=`date +%s`
>>>>>>>>> if [[ "${ERANGE}" -gt "${NOW}" ]]    # numeric compare; ">" would compare strings
>>>>>>>>> then
>>>>>>>>>     echo "${RANGE}" | /bin/mailx -s "ratimerange reporting bad dates" me@${DAYJOB}
>>>>>>>>> fi
>>>>>>>>>
>>>>>>>>> ASN_DIR=/asn/`date +%Y/%m/%d`
>>>>>>>>> mkdir -p ${ASN_DIR}
>>>>>>>>> ASN_FILE=`basename ${FILE}`
>>>>>>>>> /usr/local/bin/racluster -m sas -r ${FILE} -w ${ASN_DIR}/${ASN_FILE}
>>>>>>>>>
>>>>>>>>> As an aside, would it be a better strategy to attach a rabins process to radium, aggregating on sas and writing out a file on some interval (say one minute)?
>>>>>>>>>
>>>>>>>>> Running that racluster manually shows it only using a few megs of memory, and for the most part this machine appears to have lots of free memory:
>>>>>>>>>
>>>>>>>>> 54.51user 0.51system 0:55.42elapsed 99%CPU (0avgtext+0avgdata 76848maxresident)k
>>>>>>>>> 0inputs+0outputs (0major+5426minor)pagefaults 0swaps
>>>>>>>>>
>>>>>>>>> Here’s a snapshot of the box as it is currently (collecting netflow and not running the post-process script):
>>>>>>>>>
>>>>>>>>> $ free -m
>>>>>>>>> total used free shared buffers cached
>>>>>>>>> Mem: 7864 7746 118 0 48 7163
>>>>>>>>> -/+ buffers/cache: 534 7329
>>>>>>>>> Swap: 9919 63 9856
>>>>>>>>>
>>>>>>>>> So almost all of its 8 GB is free (aside from file caching, of course).
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Jesse
>>>>>>>>>
>>>>>>>>>> Carter
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jul 8, 2014, at 12:31 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jul 8, 2014, at 11:05 AM, Carter Bullard <carter at qosient.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Did radium stop collecting or sending ?? We’ve got some
>>>>>>>>>>>> reports of reliable-connection failure, so it may be that your
>>>>>>>>>>>> rastream() disconnected and didn’t reconnect ????
>>>>>>>>>>>
>>>>>>>>>>> It seems that radium is collecting; at least I can attach to the radium instance and receive 100 records with “ra -r 127.0.0.1 -N 100"
>>>>>>>>>>>
>>>>>>>>>>>> Check your system log (/var/log/messages or /var/log/system.log)
>>>>>>>>>>>> to see if radium complained about the client going away, or if
>>>>>>>>>>>> radium stopped reading. If radium is still running you can just
>>>>>>>>>>>> connect to it, to see if it’s transmitting anything.
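>>>>>>>>>>>>
>>>>>>>>>>>> For example (assuming radium is listening on its default local port):
>>>>>>>>>>>>
>>>>>>>>>>>> ra -S 127.0.0.1 -N 10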
>>>>>>>>>>>>
>>>>>>>>>>> It looks like it must be on the rastream side...??:
>>>>>>>>>>>
>>>>>>>>>>> Jul 6 22:59:00 test radium[57599]: 2014-07-06 22:59:00.572718 connect from localhost[127.0.0.1]
>>>>>>>>>>> Jul 7 08:00:21 test radium[57599]: 2014-07-07 08:00:21.541077 ArgusWriteOutSocket(0x1269d0) client not processing: disconnecting
>>>>>>>>>>>
>>>>>>>>>>> Likely unrelated, but I’m also seeing many of these messages in the logs:
>>>>>>>>>>>
>>>>>>>>>>> Jul 2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.358574 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
>>>>>>>>>>> Jul 2 16:31:26 test radium[47571]: 2014-07-02 16:31:26.390583 ArgusWriteOutSocket(0x181269d0) max queue exceeded 500001
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> If there is a problem, and you’ve compiled with symbols in (.devel),
>>>>>>>>>>>> then attach to radium with gdb() and look to see if any of the
>>>>>>>>>>>> threads have terminated.
>>>>>>>>>>>>
>>>>>>>>>>>> (gdb) attach pid.of.radium
>>>>>>>>>>>> (gdb) info threads
>>>>>>>>>>>> (gdb) thread 1
>>>>>>>>>>>> (gdb) where
>>>>>>>>>>>> (gdb) thread 2
>>>>>>>>>>>> (gdb) where
>>>>>>>>>>>>
>>>>>>>>>>>> etc ….. may not be the exact syntax, but it’s something like that.
>>>>>>>>>>>> With all the various end systems using clang and lldb, I’m kind
>>>>>>>>>>>> of schizophrenic on debugging right now.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Radium output:
>>>>>>>>>>> (gdb) info threads
>>>>>>>>>>> 3 Thread 0x7f001f752700 (LWP 57600) 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>>>>>>>>>> 2 Thread 0x7f001ed51700 (LWP 57601) 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
>>>>>>>>>>> * 1 Thread 0x7f001fc87700 (LWP 57599) 0x0000003b532e15d3 in select () from /lib64/libc.so.6
>>>>>>>>>>> (gdb) thread 1
>>>>>>>>>>> [Switching to thread 1 (Thread 0x7f001fc87700 (LWP 57599))]#0 0x0000003b532e15d3 in select () from /lib64/libc.so.6
>>>>>>>>>>> (gdb) where
>>>>>>>>>>> #0 0x0000003b532e15d3 in select () from /lib64/libc.so.6
>>>>>>>>>>> #1 0x00000000004669ee in ArgusReadStream (parser=0x7f001fb42010, queue=0x19511f0) at ./argus_client.c:738
>>>>>>>>>>> #2 0x000000000040746c in main (argc=3, argv=0x7fff4ae0a728) at ./argus_main.c:387
>>>>>>>>>>> (gdb) thread 2
>>>>>>>>>>> [Switching to thread 2 (Thread 0x7f001ed51700 (LWP 57601))]#0 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
>>>>>>>>>>> (gdb) where
>>>>>>>>>>> #0 0x0000003b532acced in nanosleep () from /lib64/libc.so.6
>>>>>>>>>>> #1 0x0000003b532acb60 in sleep () from /lib64/libc.so.6
>>>>>>>>>>> #2 0x0000000000466455 in ArgusConnectRemotes (arg=0x1951190) at ./argus_client.c:579
>>>>>>>>>>> #3 0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>>> #4 0x0000003b532e8b5d in clone () from /lib64/libc.so.6
>>>>>>>>>>> (gdb) thread 3
>>>>>>>>>>> [Switching to thread 3 (Thread 0x7f001f752700 (LWP 57600))]#0 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>>>>>>>>>> (gdb) where
>>>>>>>>>>> #0 0x0000003b53a0b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>>>>>>>>>>> #1 0x00000000004ac1be in ArgusOutputProcess (arg=0x1953310) at ./argus_output.c:897
>>>>>>>>>>> #2 0x0000003b53a079d1 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>>> #3 0x0000003b532e8b5d in clone () from /lib64/libc.so.6
>>>>>>>>>>>
>>>>>>>>>>> Connecting to the failing rastream process gave odd results:
>>>>>>>>>>>
>>>>>>>>>>> (gdb) detach
>>>>>>>>>>> Detaching from program: /usr/local/bin/radium, process 57599
>>>>>>>>>>> (gdb) attach 57605
>>>>>>>>>>> Attaching to program: /usr/local/bin/radium, process 57605
>>>>>>>>>>> Cannot access memory at address 0x706f636373007064
>>>>>>>>>>> (gdb) where
>>>>>>>>>>> #0 0x0000003b53a0ef3d in ?? ()
>>>>>>>>>>> #1 0x0000000000000000 in ?? ()
>>>>>>>>>>> (gdb) info threads
>>>>>>>>>>> * 1 process 57605 0x0000003b53a0ef3d in ?? ()
>>>>>>>>>>>
>>>>>>>>>>> What should my next step be? Ensure the reliable connection setting is on? Run rastream under gdb?
>>>>>>>>>>>
>>>>>>>>>>> Thanks and cheers,
>>>>>>>>>>>
>>>>>>>>>>> Jesse
>>>>>>>>>>>
>>>>>>>>>>>> Carter
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 7, 2014, at 4:28 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Over the weekend my rastream process stopped processing records for some reason. The current setup is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> netflow records -> radium -> rastream -M time 5m
>>>>>>>>>>>>>
>>>>>>>>>>>>> I noticed that records were no longer being written to disk. I connected a new ra instance to radium and had no problems receiving records. Attaching strace to the rastream process, all I could see were calls like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> <snip>
>>>>>>>>>>>>> nanosleep({0, 50000000}, NULL) = 0
>>>>>>>>>>>>> nanosleep({0, 50000000}, NULL) = 0
>>>>>>>>>>>>> nanosleep({0, 50000000}, NULL) = 0
>>>>>>>>>>>>> <snip>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are there any settings I can tweak, or logs to check, to find or correct the issue? I vaguely recall something about persistent connections where, if the connection is lost, an attempt is made to reconnect, but my gut says that’s not what’s happening here...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jesse
>>
>