rastream exits after multi day run
Jesse Bowling
jessebowling at gmail.com
Thu Oct 30 17:14:56 EDT 2014
So perhaps rastream is the symptom and not the cause. I checked in on the restarted process just now and noticed that “process_argus_file.bash” was still working on a file from 10 am this morning. Specifically, it seemed stuck on the racluster command. I attached with strace initially and then with gdb, and it seems to be stuck in this loop:
5608 unsigned short *ptr1 = (unsigned short *) hstruct->buf;
(gdb)
5609 unsigned short *ptr2 = (unsigned short *) target->hstruct.buf;
(gdb)
5611 if (ptr1 && ptr2) {
(gdb)
5612 for (i = 0, len = hstruct->len/sizeof(unsigned short); i < len; i++)
(gdb)
5613 if (*ptr1++ != *ptr2++)
(gdb)
5612 for (i = 0, len = hstruct->len/sizeof(unsigned short); i < len; i++)
(gdb)
5613 if (*ptr1++ != *ptr2++)
(gdb)
5612 for (i = 0, len = hstruct->len/sizeof(unsigned short); i < len; i++)
(gdb)
5613 if (*ptr1++ != *ptr2++)
(gdb)
5614 break;
(gdb)
5615 if (i == len) {
(gdb)
5611 if (ptr1 && ptr2) {
(gdb)
5626 target = target->nxt;
(gdb)
5627 } while (target != head);
(gdb) where
#0 0x0000000000475c21 in ArgusFindRecord (htable=0x1b74470, hstruct=0x1b75078) at ./argus_client.c:5609
#1 0x0000000000405c0d in RaProcessThisRecord (parser=0x7fac4def9010, argus=0x7fac4de07630) at ./racluster.c:720
#2 0x000000000040591d in RaProcessRecord (parser=0x7fac4def9010, ns=0x7fac4de07630) at ./racluster.c:667
#3 0x0000000000416688 in RaScheduleRecord (parser=0x7fac4def9010, ns=0x7fac4de07630) at ./argus_util.c:2664
#4 0x0000000000416ca5 in ArgusHandleRecord (parser=0x7fac4def9010, input=0x7fac4de07010, ptr=0x7fac4d9fa7c4, filter=0x7fac4e01a800) at ./argus_util.c:2776
#5 0x0000000000466d2f in ArgusReadStreamSocket (parser=0x7fac4def9010, input=0x7fac4de07010) at ./argus_client.c:440
#6 0x0000000000466f3d in ArgusReadFileStream (parser=0x7fac4def9010, input=0x7fac4de07010) at ./argus_client.c:496
#7 0x0000000000407841 in main (argc=5, argv=0x7fffc8ebbd68) at ./argus_main.c:253
Meanwhile, the scratch space for the original files fills up and rastream has nowhere to write, which I believe is why rastream eventually fails. Thoughts? Here’s where the rastream process is stuck:
(gdb) where
#0 0x0000003b436db4ad in write () from /lib64/libc.so.6
#1 0x0000003b43671ae3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x0000003b436719aa in _IO_new_file_xsputn () from /lib64/libc.so.6
#3 0x0000003b43649691 in buffered_vfprintf () from /lib64/libc.so.6
#4 0x0000003b4364422e in vfprintf () from /lib64/libc.so.6
#5 0x0000003b4364f0f8 in fprintf () from /lib64/libc.so.6
#6 0x000000000044fcf4 in ArgusDebug (d=1, fmt=0x4ac7a0 "ArgusProcessScripts(): waitpid(%d) returned for %d") at ./argus_util.c:20325
#7 0x00000000004051f2 in ArgusProcessScripts () at ./rastream.c:546
#8 0x00000000004053d0 in RaParseComplete (sig=3) at ./rastream.c:601
#9 0x000000000045f736 in ArgusWriteNewLogfile (parser=0x7fc202419010, input=0x7fc202098010, wfile=0x1da9200, argus=0x7fffab5bcb50) at ./argus_util.c:25872
#10 0x0000000000407535 in RaSendArgusRecord (argus=0x1da8fc0) at ./rastream.c:1345
#11 0x00000000004064fa in RaProcessRecord (parser=0x7fc202419010, ns=0x7fc202098630) at ./rastream.c:963
#12 0x0000000000422b48 in RaScheduleRecord (parser=0x7fc202419010, ns=0x7fc202098630) at ./argus_util.c:2664
#13 0x0000000000423165 in ArgusHandleRecord (parser=0x7fc202419010, input=0x7fc202098010, ptr=0x7fc1f12959e0, filter=0x7fc20253a800) at ./argus_util.c:2776
#14 0x000000000046849f in ArgusReadStreamSocket (parser=0x7fc202419010, input=0x7fc202098010) at ./argus_client.c:440
#15 0x0000000000469024 in ArgusReadStream (parser=0x7fc202419010, queue=0x1d841f0) at ./argus_client.c:755
#16 0x00000000004095e4 in main (argc=16, argv=0x7fffab60d1b8) at ./argus_main.c:387
(gdb) n
Single stepping until exit from function write,
which has no line number information.
^C
Program received signal SIGINT, Interrupt.
0x0000003b436db4ad in write () from /lib64/libc.so.6
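One way to keep a hung racluster from letting the scratch space fill would be to put a time limit on it inside the script and to check free space before each run. This is only a minimal sketch, assuming GNU coreutils timeout is available; LIMIT_SECS, MIN_FREE_KB, free_kb, scratch_low and run_limited are names invented here for illustration, not Argus options, and the default values are guesses:

```shell
#!/bin/bash
# Sketch only: these limits are assumptions, not Argus settings.
LIMIT_SECS=${LIMIT_SECS:-240}        # below the 5m rotation interval used with -M time
MIN_FREE_KB=${MIN_FREE_KB:-1048576}  # 1 GB, an arbitrary threshold

# Print the available space, in KB, on the filesystem holding path $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed (exit 0) when the filesystem holding $1 has less than MIN_FREE_KB free.
scratch_low() {
    [ "$(free_kb "$1")" -lt "$MIN_FREE_KB" ]
}

# Run a command, terminating it if it runs longer than LIMIT_SECS.
# timeout(1) exits with status 124 when the limit is hit.
run_limited() {
    timeout "$LIMIT_SECS" "$@"
}
```

In process_argus_file.bash the first racluster call could then be written as run_limited /usr/local/bin/racluster -m sas -r "$FILE" -w "$ASN_DIR/$ASN_FILE", with an exit status of 124 meaning the time limit was reached, and scratch_low /srv/scratch could be used to log a warning before the disk actually fills.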
I also noticed while watching the -D3 log that I have a LOT of this type of message:
rastream[33066.00e75502c27f0000]: 2014-10-30 10:51:34.829949 RaSendArgusRecord () rejecting late record secs 1414680600 done file secs
Is that normal? It seems unrelated, but potentially problematic.
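To get a sense of how often these messages occur, something like the following could count them per minute. It is a rough sketch: count_late_records is a made-up helper name, and it assumes the date and time are the second and third whitespace-separated fields, as in the sample line above.

```shell
#!/bin/bash
# Count "rejecting late record" messages per minute.
# Reads rastream debug output on stdin; assumes field 2 is the date
# and field 3 is the time (HH:MM:SS.usec), as in the sample message.
count_late_records() {
    grep 'rejecting late record' |
        awk '{ print $2, substr($3, 1, 5) }' |   # keep date and HH:MM only
        sort | uniq -c
}
```

It would be fed a saved copy of the -D3 output, for example count_late_records < rastream-debug.log, where the file name is hypothetical.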
Cheers,
Jesse
On Oct 30, 2014, at 11:18 AM, Carter Bullard <carter at qosient.com> wrote:
> Hey Jesse,
> Looks like rastream() isn’t keeping up and radium.1 is dropping the connection.
> Your script is a bit heavy, but that runs in a separate process, and shouldn’t
> affect rastream, at least that is the design.
>
> In gdb, you can’t run with the daemon flag, so take the ‘-d’ out. You say it crashes?
> If so, can you type ‘where’ so we can see where the fault occurs?
>
> Run rastream with ‘-D3’. That will generate messages about the script and any time shift issues. ‘-D4’ will add the socket read messages, which may be of interest, but it may
> generate quite a lot of messages. Try -D3 to see if we see anything interesting.
>
>
> Carter
>
>
>> On Oct 29, 2014, at 8:09 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>
>> Hi,
>>
>> I’ve had an issue using rastream recently. What I’m seeing is that rastream will run successfully for a while (usually around 2 days or less) and then will simply exit. The only log messages I get are:
>>
>> Oct 15 13:06:24 netf-test radium[33140]: 2014-10-15 13:06:24.908361 started
>> Oct 15 13:06:29 netf-test radium[33140]: 2014-10-15 13:06:29.599856 connect from localhost[127.0.0.1]
>> Oct 16 10:52:59 netf-test radium[33140]: 2014-10-16 10:52:59.537044 ArgusWriteOutSocket(0xb81269d0) client not processing: disconnecting
>>
>> I tried running it under gdb, but it quickly finishes/crashes. I can run it with -D, but I’m unsure what debug level might catch what’s happening. Any recommendations?
>>
>> Any ideas of why this might be exiting? Is the bash script too heavy? Would it help to call a wrapper script that simply backgrounds the actual script I want to run?
>>
>> Config information below.
>>
>> Cheers,
>>
>> Jesse
>>
>> $ egrep -v '^#|^[ ]*$' /etc/radium.conf
>> RADIUM_DAEMON="yes"
>> RADIUM_MONITOR_ID=10.138.3.168
>> RADIUM_MAR_STATUS_INTERVAL=60
>> RADIUM_ARGUS_SERVER=cisco://10.138.3.168:9995
>> RADIUM_ACCESS_PORT=561
>> RADIUM_BIND_IP=127.0.0.1
>> RADIUM_CLASSIFIER_FILE=/etc/ralabel.conf
>>
>> $ egrep -v '^#|^[ ]*$' /etc/ra.conf
>> RA_SET_PID="no"
>> RA_PID_PATH="/var/run"
>> RA_RUN_TIME=0
>> RA_GENERATE_BIN_MAR_RECORDS=yes
>> RA_PRINT_LABELS=0
>> RA_FIELD_SPECIFIER="stime:25 flgs proto saddr sport sco sas dir daddr dport dco das spkts dpkts sbytes dbytes"
>> RA_FIELD_DELIMITER=''
>> RA_SEPARATE_ADDR_FROM_PORT_WITH_PERIOD="no"
>> RA_PRINT_NAMES='none'
>> RA_PRINT_RESPONSE_DATA=no
>> RA_PRINT_UNIX_TIME=no
>> RA_TIME_FORMAT="%F %T.%f"
>> RA_USEC_PRECISION=6
>> RA_USERDATA_ENCODE=Ascii
>> RA_SORT_ALGORITHMS="dpkts spkts saddr dport daddr sport"
>> RA_UPDATE_INTERVAL=2
>> RA_DELEGATED_IP="/usr/local/argus/delegated-ipv4-latest"
>> RA_RELIABLE_CONNECT=yes
>> RA_COLOR_SUPPORT="yes"
>> RA_COLOR_CONFIG=/etc/racolor.conf
>> RA_LOCAL=/usr/local/argus/local.addrs
>>
>> Starting like this:
>>
>> /usr/local/sbin/radium -f /etc/radium.conf
>> /usr/local/bin/rastream -F /etc/ra.conf -M time 5m -B 10s -S 127.0.0.1 -w /srv/scratch/%Y-%m-%d_%H-%M-%S.argus -f /argus/process_argus_file.bash -d
>>
>> Contents of process_argus_file.bash:
>>
>> #!/bin/bash
>>
>> #
>> # Argus Client Software. Tools to read, analyze and manage Argus data.
>> # Copyright (C) 2000-2014 QoSient, LLC.
>> # All Rights Reserved
>> #
>> # Script called by rastream, to process files.
>> #
>> # Since this is being called from rastream(), it will have only a single
>> # parameter, filename,
>> #
>> # Carter Bullard <carter at qosient.com>
>> #
>>
>> PATH="/usr/local/bin:$PATH"; export PATH
>> package="argus-clients"
>> version="3.0.8rc3"
>>
>> OPTIONS="$*"
>> FILE=
>> while test $# != 0
>> do
>>     case "$1" in
>>     -r) shift; FILE="$1"; break;;
>>     esac
>>     shift
>> done
>>
>> # Cluster ASN level data
>> ASN_DIR=/net/nfdata/asn/`date +%Y/%m/%d`
>> mkdir -p ${ASN_DIR}
>> ASN_FILE=`basename ${FILE}`
>> /usr/local/bin/racluster -m sas -r ${FILE} -w ${ASN_DIR}/${ASN_FILE}
>> chgrp -R oititso ${ASN_DIR}
>> chmod -R g+r ${ASN_DIR}
>>
>> BASE=`basename ${FILE}`
>> DIR=/net/nfdata/archive/`echo ${BASE}|sed -e 's/_.*//'|awk -F '-' '{print $1"/"$2"/"$3}'`
>> mkdir -p $DIR
>> /usr/local/bin/racluster -r ${FILE} -w ${DIR}/${BASE} && rm ${FILE}
>> chgrp -R oititso ${DIR}
>> find /net/nfdata/archive/ -type f -exec chmod -R 0640 {} \;
>> find /net/nfdata/archive/ -type d -exec chmod -R 0750 {} \;
>>
>>
>