Argus server exits with "maximum errors exceeded 200000"
Carter Bullard
carter at qosient.com
Tue Nov 24 11:55:49 EST 2009
Hey Guy,
Sorry for the delayed response.
DAG cards don't really support a socket interface, so the setArgusInterfaceStatus()
is there to fool code farther down the line into thinking that the DAG card is up.
This may chew up cycles, but it should not be a source of errors.
If we get back to your original thread, you have a producer/consumer problem, in
that argus() is generating more data than your clients can consume. The "max
queue exceeded" messages are the clue. By adding more depth to those
queues, you're just delaying the inevitable. We need to solve this problem, then
turn your queue lengths back down, and all should be well.
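To put rough numbers on "delaying the inevitable", here is a minimal sketch in C. The function name and the rates in the note below are made up for illustration; they are not measurements from your setup and not anything in the argus source. It only shows that a deeper queue changes when the overflow happens, not whether it happens.

```c
#include <assert.h>

/*
 * Seconds until a queue holding `depth` records overflows when records
 * arrive at `produce_rate` and are drained at `consume_rate` (both in
 * records per second). Returns -1.0 when the consumer keeps up, in which
 * case the queue never overflows.
 *
 * Hypothetical helper for illustration; not part of argus.
 */
double seconds_until_overflow(double depth, double produce_rate,
                              double consume_rate)
{
    double surplus = produce_rate - consume_rate;

    assert(depth >= 0.0);

    if (surplus <= 0.0)
        return -1.0;            /* consumer is keeping up */

    return depth / surplus;     /* backlog grows by `surplus` each second */
}
```

For example, with an assumed 5010 records/sec produced and 5000 records/sec consumed, a 100K queue lasts 10000 seconds (under three hours) and a 1M queue lasts 100000 seconds (a bit over a day). Either way it overflows.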
There are many issues that can cause a client to not perform well. If it's on the
same box as argus(), it could be busy writing to disk, or argus() could be so busy
that the client never gets a time slice and just can't consume the load.
If the client is on another box, the problem could be packet loss between argus()
and the client program. Because argus uses TCP, it must retransmit data that is
dropped. Loss can occur due to bad cables, out of scope network cards, and
limited available network capacity. Argus() is a great tool to use to monitor
its own transport connections. Have you looked at the argus data for the
argus data transport TCP connection to see if it's losing packets?
Also, if the client is on another box, the problem could be flow control. TCP allows
the client to "shut the transmitter up", so to speak. This can happen if the disks
that it's writing to (or the screen) slow it down such that it can't read the socket.
Argus is also the tool of choice here. Look at the argus records for the transport
stream, looking for "S" or "D" indicators in the flags field. These indicate source
or destination flow control (you should see 'S's). If this is the case, you need to
beef up your consumer, or use filters.
When a reader can't keep up, argus has only one recourse: to close the output
connection and keep going. So your argus would do well; it's just that the clients
would attach and leave, and then attach and leave again. But by increasing
the queue length to 1M, you created a situation where argus can encounter
a critical error, like running out of memory, and then it thinks it has to terminate.
Let's lower the queue length back to 200K, and then try to figure out why your
clients are not consuming fast enough.
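The "close the connection and keep going" behavior can be sketched roughly like this. It is a simplified illustration only: the struct and function names are hypothetical and this is not the actual ArgusWriteOutSocket() code. The point is that a bounded queue caps the memory a slow client can tie up, so the producer survives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-client output queue; not argus's real data structure. */
struct client_queue {
    size_t count;      /* records currently waiting for this client */
    size_t max_count;  /* queue limit, e.g. 200000 */
    int    connected;  /* 1 while the client is attached, 0 once dropped */
};

/*
 * Try to queue one record for the client. Returns 1 if the record was
 * queued, 0 if the client is gone. When the queue is already full, the
 * client is dropped and its backlog discarded, so memory use stays
 * bounded and the producer keeps running.
 */
int queue_record(struct client_queue *q)
{
    assert(q != NULL);

    if (!q->connected)
        return 0;

    if (q->count >= q->max_count) {
        q->connected = 0;   /* close the output connection */
        q->count = 0;       /* release the queued records */
        return 0;
    }

    q->count++;
    return 1;
}

/* The client read `n` records off its queue. */
void consume_records(struct client_queue *q, size_t n)
{
    assert(q != NULL);
    q->count = (n > q->count) ? 0 : q->count - n;
}
```

With a small limit the slow client gets dropped early and can simply reattach. With a very large limit the backlog has to sit in memory first, which is where the trouble starts.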
Carter
On Nov 19, 2009, at 2:49 PM, Guy Dickinson wrote:
> Greetings, Argus Developers and Subscribers:
>
> For some time, I have been attempting to troubleshoot an argus server
> instance sitting atop a ~1Gbps link which has presented some stability
> issues. To date, I have had two issues, one which I think I have solved,
> and one which remains open.
>
> The first has been described before in a handful of mailing list
> postings, not dissimilar to this one:
>
> http://thread.gmane.org/gmane.network.argus/5010/focus=5011
>
> The argus server would run fine, but after a few hours of connection
> from a ra client, it would disconnect without warning with the
> "ArgusWriteOutSocket [...] max queue exceeded 100001" error. I was able
> to suppress this error by changing the size of ArgusMaxListLength in
> ArgusUtil.c:
>
> int ArgusMaxListLength = 1000000;
>
> Now, however, I am beginning to see a different problem with the argus
> server. After a day or so of a connected ra client, the argus server
> exits with the debug message
>
> argus[7386]: 19 Nov 09 14:19:28.712777 ArgusWriteOutSocket(0xad21b008)
> maximum errors exceeded 200000
>
> Could someone shed some light on these errors and what may be causing
> them? While running the server with debug set to 1, I see these messages
> a few times an hour:
>
> argus[7386]: 19 Nov 09 11:48:12.456533 ArgusNewFlow() flow key is not
> correct len equals zero
>
>
> Client and Server Version: 3.0.2
> Network Capture Hardware: Endace DAG 4.5G2
> Client and Server OS: RHEL5.4
> Capture Bandwidth: 700Mbit/sec - 1Gbps
>
> Both the argus server and ra client are running on some fairly serious
> hardware. The former is running on an Endace NinjaBox and the latter on
> an 8-core box with an awful lot of memory.
>
> Any help would be greatly appreciated.
>
> Regards,
> Guy Dickinson
>
> --
> ------------------
> Guy Dickinson, Network Security Analyst
> NYU ITS Technology Security Services
> guy.dickinson at nyu.edu
> (212) 998-3052
>