ra looping problem still in Beta 8 on FreeBSD

Sun Mar 4 22:55:45 EST 2001

On Sat, 3 Mar 2001 19:30:43 -0500 Carter Bullard <carter at qosient.com> 
wrote:

> Hey Russell,
>    Hmmm, EAGAIN on a read() should mean that O_NONBLOCK
> is set and there was no data to read.  Now we shouldn't
> have gotten here, because we aren't using non blocking
> IO.  Also the select() should not have indicated that there
> was anything there to read, when there wasn't.  So I'm
> thinking that there must be a really weird problem.

I.e. the real problem isn't in ra, which is what I thought.
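
One way to confirm the premise that non-blocking IO is not in use
would be to ask the kernel for the descriptor's flags.  This is only a
sketch: is_nonblocking() is a hypothetical helper, not something from
the ra source, and which descriptor to pass in is an assumption.

```c
#include <fcntl.h>

/* Hypothetical helper, not part of ra.
 * Returns 1 if O_NONBLOCK is set on fd, 0 if it is clear,
 * and -1 if fcntl() itself fails (e.g. fd is not open). */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

If this reported 1 for the descriptor being read, the EAGAIN would be
expected behaviour rather than a mystery.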

> 
>    I would suspect that we should be able to exit if
> we get an EAGAIN, as it's just not supposed to happen.
> I'll have to test this.
> 
>    Is your ra() the process using up a lot of memory?
> If so we definitely need to fix that.

Hmmm... top now shows plenty of free memory??

last pid: 48573;  load averages:  1.48,  1.81,  1.57                                                            up 22+21:26:44  10:13:04
32 processes:  2 running, 26 sleeping, 4 zombie
CPU states: 50.4% user,  0.0% nice, 48.0% system,  1.6% interrupt,  0.0% idle
Mem: 66M Active, 8636K Inact, 12M Wired, 2028K Cache, 21M Buf, 26M Free
Swap: 244M Total, 1640K Used, 243M Free

  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
47987 argus     63   0  2212K  1480K RUN    505:40 78.81% 78.81% ra

Ahh... Memory runs short every hour when I use ragator and gzip to
compact the log files.  I'll try starting the slowscan job after
they finish; it does not take an hour to run.

Hmmm... the ra process needs kill -9 to stop it.

Here is a chunk of ps output.  ra itself isn't using that much memory;
it is the perl script that is hogging it.

  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
 1001 47954 47950   0  10  0   616  224 wait   Is    ??    0:00.01 /bin/sh -c cd sw;scan_watch -q  -S -s history -l history  -d 2
 1001 47958 47954 137  -6  0 26700 25960 piperd I     ??    1:32.79 /usr/bin/perl -w /home/argus/bin/scan_watch -q -S -s history -l histo
 1001 47986 47958 235  10  0   620  228 wait   I     ??    0:00.00 sh -c /home/argus/bin/ra  -F /home/argus/lib/ra.conf -I -AZs -r /home
 1001 47987 47986 259  60  0  2212 1480 -      R     ??  500:00.99 /home/argus/bin/ra -F /home/argus/lib/ra.conf -I -AZs -r /home/argus/

I am pretty sure that I can work around this by splitting the job in
two, one for tcp and one for udp, thus cutting the memory needed by
the script.

If we should never get EAGAIN with read returning 0 then I suggest
that ra should simply exit with an error message.  That would stop
the looping and alert the user that something isn't quite as it should be.

I have not yet worked out where it starts looping, i.e. whether it is
at the end of file or not.

I guess this is a FreeBSD specific problem.

Russell Fulton, Computer and Network Security Officer
The University of Auckland,  New Zealand