radium fails (rc.50 from 2007-08-31)

Peter Van Epp vanepp at sfu.ca
Mon Sep 3 16:58:38 EDT 2007


	Looks like there are issues with .threads (and maybe -J). The argus
I started on our Internet link has frozen and doesn't seem to be responding
to a HUP.

root      5193  5.0 36.2 1468932 1426644 ?     SLl  10:04  10:36 argus -J -P 560 -i eth0 -i eth1 -U 512 -m -F /scratch/argus.conf
root      5536  0.0  0.0   3132   832 pts/1    S+   13:36   0:00 grep argus
hcids:/scratch # gdb64 argus 5193
GNU gdb 6.5
(gdb) where
#0  0x00000400002fa904 in ___newselect_nocancel ()
   from /lib64/power5+/libc.so.6
#1  0x0000000010019a3c in ArgusGetPackets (src=0x102006c0)
    at ArgusSource.c:1648
#2  0x0000000010006308 in main (argc=13, argv=0xfffffb24188) at argus.c:545

	There are no complaints in /var/log/messages:

Sep  3 10:04:44 hcids kernel: RING: succesfully allocated 0 KB [tot_mem=12664896][order=12]
Sep  3 10:04:44 hcids kernel: RING: allocated 10851 slots [slot_len=1546][tot_mem=16777216]
Sep  3 10:39:04 hcids syslog-ng[3228]: STATS: dropped 0
Sep  3 11:39:04 hcids syslog-ng[3228]: STATS: dropped 0
Sep  3 12:39:04 hcids syslog-ng[3228]: STATS: dropped 0
Sep  3 13:33:33 hcids sshd[5485]: Accepted keyboard-interactive/pam for vanepp from 142.58.1.234 port 51959 ssh2
Sep  3 13:35:36 hcids sshd[5507]: Accepted keyboard-interactive/pam for vanepp from 142.58.1.234 port 51961 ssh2
Sep  3 13:35:41 hcids su: (to root) vanepp on /dev/pts/1
Sep  3 13:39:05 hcids syslog-ng[3228]: STATS: dropped 0

	The HUP seems to have partly worked, but the task is still hung even
though it should be exiting, which likely indicates a deadlock somewhere:

hcids:/scratch # tail debug.log
  ArgusWarning: argus[5193.0000040000026f50]: 03 Sep 07 10:04:44.072427 started
  ArgusWarning: argus[5193.0000040000026f50]: 03 Sep 07 10:04:44.072649 ArgusGetInterfaceStatus: interface eth1 is up
  ArgusWarning: argus[5193.0000040000026f50]: 03 Sep 07 10:04:44.072695 ArgusGetInterfaceStatus: interface eth0 is up
     ArgusInfo: argus[5193.0000040002b98230]: 03 Sep 07 10:05:02.904007 connect from test4.ucs.sfu.ca
argus: Time 12852.714123 Flows 5068420   Closed 5057624   Sends 1077907   BSends 306      Updates 109096783 Cache 104026933
eth1
    Total Pkts 62852564  Rate 4890.217226
eth0
    Total Pkts 46242422  Rate 3597.872135

	gdb appears to confirm this; the main thread is blocked in pthread_join:


warning: Breakpoint address adjusted from 0x40000035668 to 0x40000016078.
0x000004000010e3d4 in .pthread_join () from /lib64/power5+/libpthread.so.0
(gdb) where
#0  0x000004000010e3d4 in .pthread_join () from /lib64/power5+/libpthread.so.0
#1  0x000000001002294c in ArgusCloseOutput (output=0x10251480)
    at ArgusOutput.c:261
#2  0x00000000100067f8 in ArgusComplete () at argus.c:620
#3  0x00000000100069d0 in ArgusShutDown (sig=0) at argus.c:672
#4  0x000000001000b5a4 in ArgusProcessPacket (model=0x10180010,
    p=0x10251c10 "", length=128, tvp=0xfffffb23778, type=0)
    at ArgusModeler.c:1086
#5  0x00000000100169d8 in ArgusEtherPacket (user=0x102006c0 "",
    h=0xfffffb23778, p=0x10251c10 "") at ArgusSource.c:630
#6  0x000004000007c5f0 in .pcap_read_linux () from /usr/local/lib/libpcap.so.0
#7  0x000004000007cad8 in .pcap_dispatch () from /usr/local/lib/libpcap.so.0
#8  0x0000000010019c18 in ArgusGetPackets (src=0x102006c0)
    at ArgusSource.c:1661
#9  0x0000000010006308 in main (argc=13, argv=0xfffffb24188) at argus.c:545

	We seem to have a thread that isn't terminating (and that may also be
incorrectly holding some of the memory argus is using). That would be
consistent with argus no longer sending output to the ra listener.
	I'll see what I can do about figuring out which thread has died by
starting again with -D2. I'll probably also start without -J in case that's
where the problem lies.
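
	For illustration only, here is a minimal sketch of the failure mode the
backtrace suggests: a shutdown path that calls pthread_join() on a worker that
never notices it has been asked to stop. It is not taken from the argus
source; struct worker, worker_main, stop_worker and shutdown_demo are
hypothetical names, and the 200 ms timeout is an arbitrary assumption. The
sketch bounds the wait by polling a "done" flag, so a stuck worker is reported
instead of hanging the caller.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical worker state; none of these names come from the argus source. */
struct worker {
    pthread_t  tid;
    atomic_int stop;        /* set by the main thread to request exit */
    atomic_int done;        /* set by the worker just before it returns */
    int        cooperative; /* 0 simulates a thread that never sees stop */
};

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    while (!(w->cooperative && atomic_load(&w->stop)))
        sleep_ms(5);              /* stand-in for real work */
    atomic_store(&w->done, 1);
    return NULL;
}

/* Ask the worker to stop and wait at most timeout_ms for it.
 * Returns 0 after a successful join, ETIMEDOUT if the worker is stuck. */
static int stop_worker(struct worker *w, long timeout_ms)
{
    atomic_store(&w->stop, 1);
    for (long waited = 0; !atomic_load(&w->done); waited += 5) {
        if (waited >= timeout_ms)
            return ETIMEDOUT;     /* a plain pthread_join() would hang here */
        sleep_ms(5);
    }
    return pthread_join(w->tid, NULL);
}

/* Start one worker, then try to shut it down within 200 ms. */
static int shutdown_demo(int cooperative)
{
    struct worker *w = calloc(1, sizeof *w);
    assert(w != NULL);
    w->cooperative = cooperative;
    int rc = pthread_create(&w->tid, NULL, worker_main, w);
    assert(rc == 0);
    rc = stop_worker(w, 200);
    if (rc == 0)
        free(w);                  /* safe: the thread has been joined */
    else
        pthread_detach(w->tid);   /* leave the stuck thread (and w) alone */
    return rc;
}
```

	Calling shutdown_demo(1) should join cleanly, while shutdown_demo(0)
should report ETIMEDOUT where an unconditional pthread_join() would block
the way frame #0 does above. On the live process, gdb's "info threads" and
"thread apply all where" print a backtrace for every thread, which would show
where the unjoined thread is actually waiting.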

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada



