Strace output
CS Lee
geek00l at gmail.com
Fri Jun 8 11:47:00 EDT 2012
hi Carter,
I used to run argus on the Bivio and radium on another Linux box, connected
via a direct 10G link. Now I run everything on the Bivio box. To run argus
in the foreground and watch it, I have to force it onto one CPU. I start
argus and radium, and nothing much happens; however, when I use ra to
connect to radium, after a while here's what I get -
argus -
argus[1708.48c93490]: 08 Jun 12 22:05:38.142199 ArgusWriteSocket: write (4,
0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142226 ArgusWriteSocket: write (4,
0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142251 ArgusWriteSocket: write (4,
0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142277 ArgusWriteSocket: write (4,
0x693015e0, 32, ...) -1
Killed
radium -
radium[1756]: 22:03:15.953146 connect from localhost
radium[1756]: 22:03:55.399637 ArgusWriteOutSocket(0x49b5a4e8) client not
processing: disconnecting
radium[1756]: 22:05:47.968393 connect to 10.0.0.1:561 failed 'Connection
refused'
ra just quit
By the way, argus is now handling less than 1G of traffic. I used to run
argus on gigabit networks and never saw this issue; then again, the Bivio
is new to me, as I have never used one before.
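(The repeated "ArgusWriteSocket: write (4, ...) -1" lines above are what a
non-blocking socket write looks like when the peer stops draining data. A
minimal Python sketch of that failure mode -- illustrative only, not argus
source; the function name is made up:)

```python
import errno
import socket

def fill_until_backpressure(chunk=4096):
    """Write into a socket whose peer never reads, until the kernel
    socket buffer fills and the non-blocking write fails with
    EAGAIN/EWOULDBLOCK -- the condition a producer like argus would
    log as write(...) returning -1."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    sent = 0
    try:
        while True:
            # The reader side never drains, so this eventually fails.
            sent += writer.send(b"\x00" * chunk)
    except BlockingIOError as exc:
        # Back pressure: the kernel buffer is full.
        assert exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
        return sent
    finally:
        writer.close()
        reader.close()
```

(A producer that keeps hitting this has two choices: queue the data itself,
or drop the slow reader -- which is what the radium log below shows.)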
On Fri, Jun 8, 2012 at 10:44 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey CS Lee,
> OK, so two things. First, there does seem to be a bug in how argus tries
> to gracefully recover from this type of problem; I am working on that now.
> Second, we need to get things such that the argus data flow is stable, then
> add components to see what is causing the problem. We'd also like
> to insulate argus from all this, so that it doesn't die.
>
> What seems to be the problem is that your clients are connecting, but not
> reading flow data fast enough (my interpretation of the write failure
> messages, and possibly the "client not processing" messages). Argus is
> designed to tolerate a large number of write errors related to client
> queuing and flow control, but the real bug is that argus is not dealing
> with slow clients very well: it leaves data in queues, doesn't clear
> status quickly enough, and then gives up without terminating properly.
>
> As a workaround, we need to fix the first link in your data chain,
> argus -> radium, so that the channel never back-pressures argus.
>
> Does the argus radium connection work without any ra* clients attached?
>
> Where does your radium run? On the Bivio or another machine?
>
> If radium is not running on the Bivio, I would recommend that we do that,
> so that radium is managing the interface that remote clients interact
> with, and argus only sees a single consistent connection from a single
> radium.
>
> But I would also recommend that you run a radium on the remote machine,
> so that the data chain is [ argus -> radium ] -> [ radium -> ra* ].
>
> Let's get the data flow going reliably, without ra* clients, and then see
> what is going on when one attaches.
>
> Carter
>
>
> On Jun 8, 2012, at 7:07 AM, CS Lee wrote:
>
> hi Carter,
>
> I'm not sure how useful this is, but here's the output from strace -
>
> strace -c /usr/local/sbin/argus -i s0.e0
> argus[28208]: 08 Jun 12 17:12:50.271411 started
> argus[28208]: 08 Jun 12 17:12:50.292235 ArgusGetInterfaceStatus: interface
> s0.e0 is up
> argus[28208]: 08 Jun 12 17:14:18.699681 connect from 10.0.0.3
>
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 99.68 41.720000 164252 254 126 futex
> 0.17 0.072972 973 75 mmap
> 0.12 0.050000 50000 1 1 restart_syscall
> 0.02 0.009062 432 21 munmap
> 0.00 0.000884 34 26 5 setsockopt
> 0.00 0.000144 3 46 10 open
> 0.00 0.000000 0 112 read
> 0.00 0.000000 0 1 write
> 0.00 0.000000 0 62 close
> 0.00 0.000000 0 1 waitpid
> 0.00 0.000000 0 1 execve
> 0.00 0.000000 0 4 time
> 0.00 0.000000 0 1 setuid
> 0.00 0.000000 0 2 getuid
> 0.00 0.000000 0 1 1 access
> 0.00 0.000000 0 5 brk
> 0.00 0.000000 0 1 getgid
> 0.00 0.000000 0 56 1 ioctl
> 0.00 0.000000 0 3 clone
> 0.00 0.000000 0 28 mprotect
> 0.00 0.000000 0 3 _llseek
> 0.00 0.000000 0 1 select
> 0.00 0.000000 0 1 writev
> 0.00 0.000000 0 2 sched_get_priority_max
> 0.00 0.000000 0 2 sched_get_priority_min
> 0.00 0.000000 0 8 rt_sigaction
> 0.00 0.000000 0 2 rt_sigprocmask
> 0.00 0.000000 0 1 getrlimit
> 0.00 0.000000 0 5 mmap2
> 0.00 0.000000 0 1 stat64
> 0.00 0.000000 0 30 fstat64
> 0.00 0.000000 0 2 getdents64
> 0.00 0.000000 0 5 fcntl64
> 0.00 0.000000 0 1 set_tid_address
> 0.00 0.000000 0 126 clock_gettime
> 0.00 0.000000 0 1 tgkill
> 0.00 0.000000 0 1 get_robust_list
> 0.00 0.000000 0 1 SYS_317
> 0.00 0.000000 0 27 socket
> 0.00 0.000000 0 8 bind
> 0.00 0.000000 0 7 3 connect
> 0.00 0.000000 0 1 listen
> 0.00 0.000000 0 5 getsockname
> 0.00 0.000000 0 4 sendto
> 0.00 0.000000 0 9 getsockopt
> 0.00 0.000000 0 11 recvmsg
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 41.853062 966 147 total
>
> Hopefully this strace is helpful.
>
> <bivio-argus-strace.log>
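(Carter's workaround above -- keep the argus -> radium link from ever
back-pressuring argus -- pairs with what radium's log already shows it doing
to slow ra* clients: "client not processing: disconnecting". A minimal
Python sketch of that policy; the class and names are hypothetical, not
radium source: bound each client's queue and disconnect the client rather
than block the producer.)

```python
from collections import deque

class ClientQueue:
    """Per-client output queue with a hard bound. When a client stalls
    and its backlog hits the limit, disconnect it instead of letting
    the backlog back-pressure the flow-record producer."""

    def __init__(self, max_records=128):
        self.queue = deque()
        self.max_records = max_records
        self.connected = True

    def enqueue(self, record):
        """Queue a record for this client; returns False once the
        client has been disconnected for not processing."""
        if not self.connected:
            return False
        if len(self.queue) >= self.max_records:
            # Equivalent of "client not processing: disconnecting".
            self.connected = False
            self.queue.clear()
            return False
        self.queue.append(record)
        return True
```

(The design point: the producer's enqueue call always returns immediately,
so one slow consumer can never stall the whole chain.)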
--
Best Regards,
CS Lee<geek00L[at]gmail.com>
http://geek00l.blogspot.com
http://defcraft.net