What to do in this situation

Thu Mar 15 12:16:52 EST 2001

I like this idea as well.  What I am working on partly has to do with DoS
attacks, and though I know there will be times when a machine/argus just can't
keep up with the traffic levels, it should try very hard to warn a real
person about the problem.  So, I like the graduated warning when argus is
nearing its limits.  I also like the idea of a 'last ditch effort', where
argus spawns off a tcpdump to capture as much as it can while the attack
continues.  Argus should also give lots of log messages to say what happened.
Another good idea might be for argus to reinstate itself after the levels
have dropped off.

i.e.: argus churns away... starts seeing an attack, and the load builds.  It
starts warning about resource starvation, eventually decides that it cannot
handle the load, stops the collector and logs to the error log, spawns off
tcpdump -> file, then checks every so many seconds to see if the levels have
subsided.  If so, it reinstates collection and sends a message to the logs
that it has restarted collecting.
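
A rough sketch of that sequence as a small state machine, in Python rather
than anything from the argus source. The three thresholds, the state names
and the action strings are all made up for illustration; the resume mark is
deliberately lower than the warning mark so that a load hovering near a
threshold doesn't flip the collector on and off.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    WARNING = "warning"      # load is building; log graduated warnings
    FALLBACK = "fallback"    # collector stopped; raw capture is running


# Hypothetical thresholds, as a fraction of what the collector can handle.
WARN_AT = 0.50
GIVE_UP_AT = 0.95
RESUME_BELOW = 0.25


def next_state(state, load):
    """Return (new_state, actions) for one periodic load check."""
    if state is State.FALLBACK:
        # Stay in fallback until the load is well below the warning mark.
        if load < RESUME_BELOW:
            return State.NORMAL, ["stop raw capture",
                                  "restart collector",
                                  "log: collection restarted"]
        return State.FALLBACK, []
    if load >= GIVE_UP_AT:
        return State.FALLBACK, ["stop collector",
                                "log: cannot keep up",
                                "start raw capture"]
    if load >= WARN_AT:
        return State.WARNING, ["log: resource warning at %d%%" % round(load * 100)]
    return State.NORMAL, []
```

The caller would run next_state() every so many seconds with the current
load figure and carry out whatever actions come back.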

Chris

>===== Original Message From <carter at qosient.com> =====
>Hey Peter,
>   Thanks!!  So currently I've got LOG_WARNINGs
>coming out when we drop records, and we'll drop 128
>records each time.  But you are also suggesting a
>pre-warning that queues are getting big.
>
>How often do you want to get these messages?
>Every 30 seconds when we are above the mark?
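
One way the "every 30 seconds while above the mark" behaviour could look,
sketched in Python. This is not argus code: the class name, the high_mark
parameter and the message text are invented, and the clock is passed in only
so the logic can be exercised without waiting.

```python
import time


class RateLimitedWarning:
    """Emit a warning at most once per `interval` seconds while a condition holds."""

    def __init__(self, interval=30.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_sent = None

    def check(self, queue_len, high_mark):
        """Return a message if one is due, else None."""
        if queue_len < high_mark:
            # Condition cleared, so the next excursion warns immediately.
            self.last_sent = None
            return None
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return None
        self.last_sent = now
        return "output queue at %d records (mark %d)" % (queue_len, high_mark)
```

The first time the queue crosses the mark a message comes out straight away;
after that there is one message per interval until the queue drops back.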
>
>Carter
>
>Carter Bullard
>QoSient, LLC
>300 E. 56th Street, Suite 18K
>New York, New York  10022
>
>carter at qosient.com
>Phone +1 212 588-9133
>Fax   +1 212 588-9134
>http://qosient.com
>
>> -----Original Message-----
>> From: owner-argus-info at lists.andrew.cmu.edu
>> [mailto:owner-argus-info at lists.andrew.cmu.edu]On Behalf Of
>> Peter Van Epp
>> Sent: Thursday, March 15, 2001 10:57 AM
>> To: argus
>> Subject: Re: What to do in this situation
>>
>>
>> 	I like Brent Chapman's cut on this class of attack (which, while in
>> relation to syslog, applies here too): you are past the point where
>> automation can deal with the problem, so call a human to decide what to
>> do. As part of that you need to build in the human response time (since
>> if it's off hours the human may need to be called in). In the syslog case
>> that was the topic of this discussion, that means a script that detects
>> an abnormal increase in the volume of syslog traffic well before the
>> volume fills and initiates a call to a human to deal with the attack/error
>> (since it may be either) before syslog fills and data is lost.
>> 	In the argus case I'd suggest the appropriate action is the same (but
>> response time may be a little more difficult). At the least argus needs
>> to note that you have run out of resources and need a bigger machine (or
>> are under attack and need to do something else). I'd be in favor of a
>> shutdown (and touching a predetermined file name to indicate a crash so a
>> cron job can check for the file and restart argus, or raise an alarm, or
>> both). The paranoid among us would probably like a second file that gets
>> touched when the limit is getting near (or some similar alert mechanism
>> anyway) but before argus has to discard data. That could and should start
>> a tcpdump task capturing the entire contents of the link for some period
>> of time (which preserves the input data while the human is summoned to
>> deal with the situation).
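
A sketch of the two-file scheme in Python, for something a cron job might
run. The file names argus.nearlimit and argus.crashed and the action strings
are hypothetical; nothing here is an existing argus feature.

```python
import os


def touch(path):
    """Create the sentinel file, or update its timestamp if it already exists."""
    with open(path, "a"):
        os.utime(path, None)


def check_sentinels(directory,
                    near_limit="argus.nearlimit",   # hypothetical file name
                    crashed="argus.crashed"):       # hypothetical file name
    """Return the actions a periodic check should take for this directory."""
    actions = []
    if os.path.exists(os.path.join(directory, near_limit)):
        # Limit is near but no data has been discarded yet.
        actions.append("start full packet capture")
        actions.append("call a human")
    if os.path.exists(os.path.join(directory, crashed)):
        # The collector shut itself down.
        actions.append("restart argus")
        actions.append("raise alarm")
    return actions
```

The collector side would call touch() on the near-limit file when the queue
gets close to the mark and on the crash file just before shutting down; the
periodic job would remove each file once it has acted on it.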
>> 	In the attack case, the attacker has already lost because he has been
>> detected and can't be sure that tcpdump isn't capturing everything on the
>> link anyway, even if argus has been defeated, because argus failed
>> gracefully. The typical IDS problem is that a fragmentation attack (for
>> instance) will slide a signature undetectably by the IDS. In this case
>> that isn't what's happening; rather, a detectable (and dealable with) DOS
>> is being done against argus. It isn't undetectable and thus isn't serious
>> (if the appropriate backup is in place). That's one of argus's many
>> advantages: it isn't making decisions based on the data (and thus subject
>> to being fooled into ignoring something it shouldn't). The only attack
>> scenario (other than bugs of course) is to attempt a DOS against the
>> machine, and that can and should be detected and dealt with. Of course
>> the tcpdump machine needs to be big enough to deal with the full capacity
>> of the link for some (possibly fairly long) period of time, but that's
>> just a matter of money and/or how important detecting the attack is
>> (which comes back to money :-)).
>>
>> Peter Van Epp / Operations and Technical Support
>> Simon Fraser University, Burnaby, B.C. Canada
>>
>>
>>
>> >
>> > Gentle people,
>> >    The error that has slowed us down has a component
>> > that needs to be engineered for, so I'd like your
>> > opinion.
>> >
>> >    A part of the problem is that on machines that are
>> > exhibiting the problem, flow records aren't being
>> > processed fast enough, and the output queue to the process
>> > that is writing records to the disk is getting too big.
>> > At some point, Argus has to make the decision that the
>> > load is too much for the file writing process, and gracefully
>> > shut it down.  Currently the watermark is 8096 records in the
>> > outbound queue.
>> >
>> >    The problem situation occurs because of how we deal with
>> > this condition, which has been to just shut down the queue.
>> > This causes the disk process to finish, but that's it.  Because
>> > it no longer has an input queue, it just won't get any more
>> > records.  Thus the "argus just stops writing output records,
>> > but everything else seems OK".
>> >
>> >    Why we're getting to this point, I'm not quite sure, but one
>> > of the machines having difficulty is probably either
>> > underpowered or its scheduling is terrible, as it's reporting
>> > that it's dropping ~5% of the packets on the floor.
>> >
>> >    In the fix, I've added a kill of the process and a wait
>> > to deal with the resulting zombie; argus then continues on,
>> > writing some strong language to syslog().
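
The kill-then-wait pattern, sketched in Python with the standard subprocess
module. This is not the actual argus fix: stop_writer and the grace period
are invented for illustration. The wait is what reaps the child, so it does
not linger as a zombie.

```python
import subprocess
import sys


def stop_writer(proc, grace=5.0):
    """Terminate a child process and reap it so no zombie is left behind."""
    proc.terminate()                      # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)   # collects the exit status
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for the output process: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("writer exited with status", stop_writer(child))
```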
>> >
>> >    Now the question.  Should argus() exit if the problem
>> > output process is the one writing to the file?  If the output
>> > process is a network-based remote access, I'd say keep going,
>> > but if it's the file, we should exit so that we can start again
>> > as soon as possible.  Does this sound reasonable?
>> >
>> > Carter
>> >
>> >
>> > Carter Bullard
>> > QoSient, LLC
>> > 300 E. 56th Street, Suite 18K
>> > New York, New York  10022
>> >
>> > carter at qosient.com
>> > Phone +1 212 588-9133
>> > Fax   +1 212 588-9134
>> > http://qosient.com
>> >
>>
>>

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

Chris Newton, Systems Analyst
Computing Services, University of New Brunswick
newton at unb.ca 506-447-3212(voice) 506-453-3590(fax)