ra oddities

Thu Mar 6 19:41:28 EST 2003

Hey Steve,
   We don't have any corrupt-file recovery logic in the ra*
programs right now, although it should be straightforward,
as there are plenty of "landmarks" in an argus record to
look for.  Just a few suggestions:

   Argus data files have a 128-byte header at the beginning
of the file that must be there for the programs to recognize
the file and decode it.  Be sure to "prime" your new argus
data file with the 128-byte management record that is at
the beginning of the existing corrupt file.
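A minimal C sketch of the priming step, assuming the corrupt file is opened at its start and that plain stdio is used.  The helper name copy_argus_header() is hypothetical and not part of the ra* sources:

```c
#include <stdio.h>

/* Size of the management record at the start of every argus data
 * file, per the description above. */
#define ARGUS_FILE_HEADER_LEN 128

/* Hypothetical helper: copy the leading 128-byte management record
 * from the (corrupt) input file to a new output file so the ra*
 * programs will recognize the new file.  Assumes 'in' is positioned
 * at the start of the original file.  Returns 0 on success, -1 if the
 * input is too short or the write fails. */
int copy_argus_header(FILE *in, FILE *out)
{
    unsigned char hdr[ARGUS_FILE_HEADER_LEN];

    if (fread(hdr, 1, sizeof(hdr), in) != sizeof(hdr))
        return -1;              /* file too short to hold the header */
    if (fwrite(hdr, 1, sizeof(hdr), out) != sizeof(hdr))
        return -1;
    return 0;
}
```

After the header is copied, the records recovered from the resync point onward can be appended to the same output file.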

   All argus records begin with a rather conspicuous 16-byte
header, so they are not too hard to find.

   struct ArgusRecordHeader { 
      unsigned char type, cause;
      unsigned short length;
      unsigned int status; 
      unsigned int argusid; 
      unsigned int seqNumber;
   };

   When decoding argus headers, you have to remember one thing:
all values in an argus record are in network byte order.  So
you have to convert 16- and 32-bit values if you're on a
little-endian machine like an Intel box, using the macros
ntohs() and ntohl().  Once you do this conversion, the rest
is pretty straightforward.
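A minimal sketch of decoding the 16-byte header from raw file bytes with ntohs()/ntohl().  It assumes a POSIX system (<arpa/inet.h>) and uses fixed-width types in a hypothetical host-order struct, ArgusHeaderHost, on the assumption that unsigned short/int are 16/32 bits as on common platforms.  memcpy() is used so that misaligned offsets in a corrupt file are safe:

```c
#include <arpa/inet.h>   /* ntohs(), ntohl() */
#include <stdint.h>
#include <string.h>

/* Hypothetical host-order copy of struct ArgusRecordHeader. */
struct ArgusHeaderHost {
    uint8_t  type, cause;
    uint16_t length;
    uint32_t status, argusid, seqNumber;
};

/* Decode a header from raw bytes 'p' (at least 16 bytes available),
 * converting the multi-byte fields from network to host order. */
void argus_decode_header(const unsigned char *p, struct ArgusHeaderHost *h)
{
    uint16_t s;
    uint32_t l;

    h->type  = p[0];
    h->cause = p[1];
    memcpy(&s, p + 2, 2);  h->length    = ntohs(s);
    memcpy(&l, p + 4, 4);  h->status    = ntohl(l);
    memcpy(&l, p + 8, 4);  h->argusid   = ntohl(l);
    memcpy(&l, p + 12, 4); h->seqNumber = ntohl(l);
}
```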

   The value of the first byte (type) should be either 0x80
or 0x01: 0x80 is a management record and 0x01 is a data
record.  The cause byte can be rather complicated, so don't
worry about it.  The length field is the length of the
record, including the 16-byte header, and should generally
be less than 512 bytes.  Remember it's in network order, so
you may have to convert it.  So the basic method could be as
simple as finding a 0x01 or a 0x80, and then, with ptr
pointing at that byte, checking whether ptr[length] is also
a 0x01 or 0x80.  Once that happens you're probably aligned
again.  The other fields can help to reinforce the alignment.
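The basic resync method can be sketched in C as follows, assuming the data is already in a memory buffer.  The function name argus_resync() and the 512-byte cap are assumptions (the cap follows the "generally less than 512 bytes" rule of thumb, so an unusually long record is skipped and the scan settles on a later one).  The length is read byte-by-byte in big-endian order, which is equivalent to ntohs() on the raw field:

```c
#include <stddef.h>

#define ARGUS_HDR_LEN      16
#define ARGUS_MAX_REC_LEN  512   /* assumed heuristic cap, per the
                                    "generally < 512 bytes" rule above */

static int argus_type_ok(unsigned char t)
{
    return t == 0x01 || t == 0x80;   /* data or management record */
}

/* Read the network-order (big-endian) length field at p[2..3]. */
static size_t argus_len(const unsigned char *p)
{
    return ((size_t)p[2] << 8) | p[3];
}

/* Scan buf[start..n) for the first offset where a plausible header is
 * followed by another plausible type byte at buf[off + length].
 * Returns that offset, or (size_t)-1 if none is found. */
size_t argus_resync(const unsigned char *buf, size_t n, size_t start)
{
    for (size_t off = start; off + ARGUS_HDR_LEN <= n; off++) {
        if (!argus_type_ok(buf[off]))
            continue;
        size_t len = argus_len(buf + off);
        if (len < ARGUS_HDR_LEN || len > ARGUS_MAX_REC_LEN)
            continue;
        if (off + len < n && argus_type_ok(buf[off + len]))
            return off;   /* probably aligned again */
    }
    return (size_t)-1;
}
```

A single match can still be a coincidence in the data, which is why the other header fields are worth checking before trusting the offset.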

   The status field is complicated and so not very useful,
but the argusid field will be the same for all records
(except the very first record).  You don't have to convert
it to see that it is the same from record to record.  The
seqNumber should increase monotonically with each data
record, but will be zero for management records.  If you
want to check that it is incrementing by one, you do need to
convert this field.
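The cross-check on the other fields might look like this sketch, which compares two consecutive candidate headers.  The name argus_headers_agree() and the 'strict' flag are hypothetical.  argusid is compared as raw bytes (no conversion needed), while seqNumber is decoded from network order and only checked when both records are data records, since management records carry zero:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit network-order (big-endian) field at p. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if headers a and b (16 bytes each, b following a) look like
 * neighbouring argus records, else 0.
 * - argusid (bytes 8..11) must match, compared as raw bytes.
 * - if both are data records (type 0x01), seqNumber (bytes 12..15)
 *   must increase; with 'strict' set it must increase by exactly one. */
int argus_headers_agree(const unsigned char *a, const unsigned char *b,
                        int strict)
{
    if (memcmp(a + 8, b + 8, 4) != 0)
        return 0;
    if (a[0] == 0x01 && b[0] == 0x01) {
        uint32_t sa = be32(a + 12), sb = be32(b + 12);
        if (strict ? sb != sa + 1 : sb <= sa)
            return 0;
    }
    return 1;
}
```

Since the very first record in a file has a different argusid, this check is meant for records found after the resync point, not for the file's leading management record.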

   This should help you find the header boundaries.  Once
you get two or more records with similar headers, you should
be cool. 

   I would be very interested to understand the problem where
2.0.6 clients read fewer records than the 2.0.5 clients, as
that is not supposed to happen.  Is there any chance that you
could share the corrupt file?

Carter

> -----Original Message-----
> From: owner-argus-info at lists.andrew.cmu.edu 
> [mailto:owner-argus-info at lists.andrew.cmu.edu] On Behalf Of 
> Steve McInerney
> Sent: Thursday, March 06, 2003 6:56 PM
> To: argus-info at lists.andrew.cmu.edu
> Subject: ra oddities
> 
> 
> Hi,
> 
> got a bit of an odd one here:
> 
> We have an argus output file that has gotten corrupted. I believe it was
> due to a powerfail, and then compounded by a lack of argus outfile
> rotation on restart. Ouch.
> 
> Consequently the 2.0.5 version of ra can read thru to the point of the
> failure, but no further - which is a bummer as that's only about 400Mb
> ish thru a 680Mb ish uncompressed file...
> 
> 
> Being optimistic, I thought I'd give the ra version from 
> argus-clients-2.0.6.beta.38 a whirl. Curiously it doesn't even get as 
> far as the 2.0.5 ra. Like about 9 days earlier. Which is, needless to 
> say but will anyway, odd. There are no system reasons that I'm aware of
> that would have caused anything evil at that point in time.
> 
> 
> FWIW, the original file is a gz. If I run the 2.0.5 ra against the
> compressed file it segfaults at the point of the corruption. If I
> manually uncompress or feed via zcat to stdin, no segfault, no warning -
> just not enough records....
> The beta 38 ra doesn't give any messages as to why it's dropped earlier
> unless, same as 2.0.5, it's also dealing directly with the compressed
> file. Then you get: "ArgusWarning: ra[3002]: ArgusReadSocketStream:
> malformed argus record len 0"
> 
> 
> Any thoughts/suggestions as to what I could do/try to get access to the
> rest of the file? My current thinking is along the lines of "binary cut
> the early OK part and corruption bits from the start of the uncompressed
> file; ra the post-corruption part as a separate file". I have no idea
> how easy or difficult this will/would be. :-)
> 
> 
> Thanks
> 
> 
> - Steve
> 