Strace output

CS Lee geek00l at gmail.com
Fri Jun 8 11:47:00 EDT 2012


hi Carter,

Until now I ran argus on the Bivio and radium on another Linux box, with
the two connected via a direct 10G link.

Now I run everything on the Bivio box. To run argus in the foreground and
check it, I need to force it to run on 1 CPU. I start argus and radium, and
nothing much happens and both stay up. However, when I use ra to connect to
radium, after a while here's what I get -

argus -
argus[1708.48c93490]: 08 Jun 12 22:05:38.142199 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142226 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142251 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
argus[1708.48c93490]: 08 Jun 12 22:05:38.142277 ArgusWriteSocket: write (4, 0x693015e0, 32, ...) -1
Killed

radium -
radium[1756]: 22:03:15.953146 connect from localhost
radium[1756]: 22:03:55.399637 ArgusWriteOutSocket(0x49b5a4e8) client not processing: disconnecting
radium[1756]: 22:05:47.968393 connect to 10.0.0.1:561 failed 'Connection refused'

ra just quits.

By the way, argus is now seeing less than 1G of traffic. I used to run
argus on a gigabit network and never saw such an issue. Then again, the
Bivio is new to me, as I have never used one before.
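
To illustrate the kind of failure Carter describes below (a client that
connects but does not read fast enough), here is a minimal Python sketch.
It only assumes that the -1 returns above come from a full socket buffer,
which I have not confirmed. The function name fill_until_blocked and the
socketpair setup are hypothetical and are not argus code.

```python
import socket


def fill_until_blocked(record_size=32, max_writes=10_000_000):
    """Write fixed-size records to a peer that never reads.

    Returns (accepted, blocked): how many sends the kernel accepted, and
    whether a send was finally refused because the buffer was full.
    """
    # Hypothetical stand-in for the argus -> client connection.
    writer, reader = socket.socketpair()
    writer.setblocking(False)  # a full buffer raises instead of stalling
    record = b"\x00" * record_size  # 32 bytes, as in the write() calls above
    accepted = 0
    try:
        for _ in range(max_writes):
            try:
                writer.send(record)
            except BlockingIOError:
                # The reader never drained the socket, so the write fails.
                return accepted, True
            accepted += 1
        return accepted, False
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    accepted, blocked = fill_until_blocked()
    print(f"records accepted before the write failed: {accepted}")
    print(f"write refused because the peer was not reading: {blocked}")
```

The writer has to decide what to do once writes start failing: queue the
data, retry, or drop the client, which seems to be what radium's "client
not processing: disconnecting" message reports.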



On Fri, Jun 8, 2012 at 10:44 PM, Carter Bullard <carter at qosient.com> wrote:

> Hey CS Lee,
> OK, so two things, first there does seem to be a bug in how argus tries
> to gracefully recover from this type of problem.  I am working on that now.
> Second, we need to get things such that the argus data flow is stable, then
> add components to see what is causing the problem.  Also, we'd like
> to insulate argus from all this, so that it doesn't die.
>
> What seems to be the problem is your clients are connecting, but not
> reading flow data fast enough (my interpretation of the write failure
> messages, and possibly the "client not ready" messages).  Argus is
> designed to allow for a large number of write errors that are related to
> client queuing and flow control, but the real bug is that argus is not
> dealing with slow clients very well, leaving data in queues, not clearing
> status quickly enough, and then giving up, but not terminating properly.
>
> As a workaround to this problem, we need to get the first link in your
> data chain, argus -> radium, so that the channel never back pressures
> argus.
>
> Does the argus radium connection work without any ra* clients attached?
>
> Where does your radium run?  On the Bivio or another machine ?
>
> If radium is not running on Bivio, I would recommend that we do that, so
> that radium is managing the interface that remote clients interact with,
> and argus only sees a single consistent connection from a single radium.
>
> But I will also recommend that you run a radium on the remote machine,
> so that the data chain is [ argus -> radium ] -> [ radium->ra*].
>
> Let's get the data flow going reliably, without ra* clients, and then see
> what is going on when it attaches.
>
> Carter
>
>
> On Jun 8, 2012, at 7:07 AM, CS Lee wrote:
>
> hi Carter,
>
> I'm not sure if this is useful to help, here's the output from strace -
>
> strace -c /usr/local/sbin/argus -i s0.e0
> argus[28208]: 08 Jun 12 17:12:50.271411 started
> argus[28208]: 08 Jun 12 17:12:50.292235 ArgusGetInterfaceStatus: interface s0.e0 is up
> argus[28208]: 08 Jun 12 17:14:18.699681 connect from 10.0.0.3
>
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  99.68   41.720000      164252       254       126 futex
>   0.17    0.072972         973        75           mmap
>   0.12    0.050000       50000         1         1 restart_syscall
>   0.02    0.009062         432        21           munmap
>   0.00    0.000884          34        26         5 setsockopt
>   0.00    0.000144           3        46        10 open
>   0.00    0.000000           0       112           read
>   0.00    0.000000           0         1           write
>   0.00    0.000000           0        62           close
>   0.00    0.000000           0         1           waitpid
>   0.00    0.000000           0         1           execve
>   0.00    0.000000           0         4           time
>   0.00    0.000000           0         1           setuid
>   0.00    0.000000           0         2           getuid
>   0.00    0.000000           0         1         1 access
>   0.00    0.000000           0         5           brk
>   0.00    0.000000           0         1           getgid
>   0.00    0.000000           0        56         1 ioctl
>   0.00    0.000000           0         3           clone
>   0.00    0.000000           0        28           mprotect
>   0.00    0.000000           0         3           _llseek
>   0.00    0.000000           0         1           select
>   0.00    0.000000           0         1           writev
>   0.00    0.000000           0         2           sched_get_priority_max
>   0.00    0.000000           0         2           sched_get_priority_min
>   0.00    0.000000           0         8           rt_sigaction
>   0.00    0.000000           0         2           rt_sigprocmask
>   0.00    0.000000           0         1           getrlimit
>   0.00    0.000000           0         5           mmap2
>   0.00    0.000000           0         1           stat64
>   0.00    0.000000           0        30           fstat64
>   0.00    0.000000           0         2           getdents64
>   0.00    0.000000           0         5           fcntl64
>   0.00    0.000000           0         1           set_tid_address
>   0.00    0.000000           0       126           clock_gettime
>   0.00    0.000000           0         1           tgkill
>   0.00    0.000000           0         1           get_robust_list
>   0.00    0.000000           0         1           SYS_317
>   0.00    0.000000           0        27           socket
>   0.00    0.000000           0         8           bind
>   0.00    0.000000           0         7         3 connect
>   0.00    0.000000           0         1           listen
>   0.00    0.000000           0         5           getsockname
>   0.00    0.000000           0         4           sendto
>   0.00    0.000000           0         9           getsockopt
>   0.00    0.000000           0        11           recvmsg
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   41.853062                   966       147 total
>
> Hopefully this strace is helpful.
>
> --
> Best Regards,
>
> CS Lee<geek00L[at]gmail.com>
>
> http://geek00l.blogspot.com
> http://defcraft.net
>  <bivio-argus-strace.log>
>
>
>


-- 
Best Regards,

CS Lee<geek00L[at]gmail.com>

http://geek00l.blogspot.com
http://defcraft.net