What to do in this situation

Peter Van Epp vanepp at sfu.ca
Thu Mar 15 10:57:16 EST 2001


	I like Brent Chapman's cut on this class of attack (which, while made in
relation to syslog, applies here too): you are past the point where automation
can deal with the problem, so call a human to decide what to do. As part of that
you need to build in the human response time (since if it's off hours the
human may need to be called in). In the syslog case that was the topic of this
discussion, that means a script that detects an abnormal increase in the
volume of syslog traffic well before the log volume fills, and initiates a call
to a human to deal with the attack/error (since it may be either) before
the volume fills and data is lost.
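
	A minimal sketch of that kind of check, in Python, might look like the
following. Everything named here (Sample, should_page_human, the baseline
rate, the two tuning factors) is an assumption made for illustration and is
not part of syslog or argus. The idea is just to extrapolate how long the
log volume has left and to page while that is still comfortably longer than
the time a human needs to respond.

```python
import shutil
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    t: float    # time of the sample, in seconds
    used: int   # bytes in use on the log volume at that time


def take_sample(path: str) -> Sample:
    """Read current usage of the filesystem holding `path`."""
    return Sample(time.time(), shutil.disk_usage(path).used)


def fill_rate(prev: Sample, cur: Sample) -> float:
    """Growth in bytes per second between two samples (0.0 if not growing)."""
    dt = cur.t - prev.t
    if dt <= 0:
        return 0.0
    return max(cur.used - prev.used, 0) / dt


def seconds_until_full(prev: Sample, cur: Sample, capacity: int) -> Optional[float]:
    """Linear estimate of time left; None if the volume is not growing."""
    rate = fill_rate(prev, cur)
    if rate == 0:
        return None
    return max(capacity - cur.used, 0) / rate


def should_page_human(prev: Sample, cur: Sample, capacity: int,
                      response_time_s: float, baseline_rate: float,
                      abnormal_factor: float = 3.0,
                      safety_margin: float = 2.0) -> bool:
    """Page if growth is abnormal, or if the volume would fill before a
    human (allowing a safety margin on their response time) could act."""
    rate = fill_rate(prev, cur)
    eta = seconds_until_full(prev, cur, capacity)
    abnormal = rate > abnormal_factor * baseline_rate
    too_close = eta is not None and eta < safety_margin * response_time_s
    return abnormal or too_close
```

	In use, a cron job would call take_sample() on the log directory every
minute or so, keep the previous sample around, and kick off the pager when
should_page_human() returns True. The off-hours case is handled by making
response_time_s larger at night.
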
	In the argus case I'd suggest the appropriate action is the same (but
meeting the response time may be a little more difficult). At the least argus
needs to note that you have run out of resources and need a bigger machine (or
are under attack and need to do something else). I'd be in favor of a shutdown
(and touching a predetermined file name to indicate a crash, so a cron job
can check for the file and restart argus, or raise an alarm, or both). The
paranoid among us would probably like a second file that gets touched when
the limit is getting near (or some similar alert mechanism anyway) but before
argus has to discard data. That could and should start a tcpdump task capturing
the entire contents of the link for some period of time (which preserves the
input data while the human is summoned to deal with the situation).
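
	The cron side of that could be sketched roughly as below. The two flag
file names and the action names are made up for illustration; argus does not
define them. The only real command line shown is tcpdump's, using its
standard -i (interface), -s 0 (capture whole packets) and -w (write raw
packets to a file) options. How argus gets restarted and how the alarm is
raised are left to the caller.

```python
from pathlib import Path

# Hypothetical flag file names; argus itself does not define these.
CRASH_FLAG = "argus.crashed"        # touched when argus shuts itself down
NEAR_LIMIT_FLAG = "argus.nearlimit"  # touched when the queue nears its limit


def pending_actions(flag_dir):
    """Return the actions a cron job should take, based on which flags exist."""
    d = Path(flag_dir)
    actions = []
    if (d / NEAR_LIMIT_FLAG).exists():
        # Preserve the raw input while a human is being called in.
        actions.append("start_tcpdump")
    if (d / CRASH_FLAG).exists():
        actions += ["raise_alarm", "restart_argus"]
    return actions


def tcpdump_command(interface, out_file):
    """Build a tcpdump invocation that saves full packets to out_file."""
    return ["tcpdump", "-i", interface, "-s", "0", "-w", out_file]


def clear_flag(flag_dir, name):
    """Remove a flag once it has been handled; a missing flag is not an error."""
    try:
        (Path(flag_dir) / name).unlink()
    except FileNotFoundError:
        pass
```

	The cron job would call pending_actions() every minute, run the result
of tcpdump_command() through something like subprocess.Popen when asked to,
and clear each flag once it has acted on it so the same event isn't handled
twice.
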
	In the attack case, the attacker has already lost, because he has been
detected and can't be sure that tcpdump isn't capturing everything on the link
anyway, even if argus has been defeated, because argus failed gracefully. The
typical IDS problem is that a fragmentation attack (for instance) will slide
a signature undetected past the IDS. In this case that isn't what's happening;
rather, a detectable (and dealable with) DoS is being done against argus.
It isn't undetectable and thus isn't serious (if the appropriate backup is
in place). That's one of argus's many advantages: it isn't making decisions
based on the data (and thus isn't subject to being fooled into ignoring
something it shouldn't). The only attack scenario (other than bugs of course)
is to attempt a DoS against the machine, and that can and should be detected
and dealt with. Of course the tcpdump machine needs to be big enough to deal
with the full capacity of the link for some (possibly fairly long) period of
time, but that's just a matter of money and/or how important detecting the
attack is (which comes back to money :-)).

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada



> 
> Gentle people,
>    The error that has slowed us down has a component 
> that needs to be engineered for, so I'd like your
> opinion.
> 
>    A part of the problem is that on machines that are
> exhibiting the problem, flow records aren't being
> processed fast enough, and the output queue to the process
> that is writing records to the disk is getting too big.
> At some point, Argus has to make the decision that the
> load is too much for the file writing process, and gracefully
> shut it down.  Currently the watermark is 8096 records in the
> outbound queue.
> 
>    The problem situation occurs because of how we deal with
> this condition, which has been to just shut down the queue.
> This causes the disk process to finish, but that's it.  Because
> it no longer has an input queue, it just won't get any more
> records.  Thus the "argus just stops writing output records,
> but everything else seems OK".
> 
>    Why we're getting to this point, I'm not quite sure, but one
> of the machines that is having difficulty is probably either
> underpowered or has terrible scheduling, as it's reporting
> that it's dropping ~5% of the packets on the floor.
> 
>    In the fix, I've added a kill of the process and a wait,
> to deal with the resulting zombie, and then we continue on,
> writing some strong language to syslog().
> 
>    Now the question.  Should argus() exit if the problem
> output process is the one writing to the file?  If the output
> process is a network based remote access, I'd say keep going,
> but if it's the file, we should exit so that we can start again
> as soon as possible.  Does this sound reasonable?
> 
> Carter
> 
> 
> Carter Bullard
> QoSient, LLC
> 300 E. 56th Street, Suite 18K
> New York, New York  10022
> 
> carter at qosient.com
> Phone +1 212 588-9133
> Fax   +1 212 588-9134
> http://qosient.com
> 
> 
> 


