reading argus files using non-sequential access

Carter Bullard carter at qosient.com
Fri Feb 2 13:01:17 EST 2007


Hey Philipp,
I have added the field "offset" to the ascii output system to print
the byte offset of each record.  I think that is a little more
elegant, and it supports features like using filters to pick specific
records of interest, such as the "man"agement records, or records
that pertain to a specific event.  That will be in the next rc set
and in the final version.

Your perl script could initially index the file by running something
like

        'ra -r file -us stime offset'

and that would spit out the data needed to build a time index on the
fly.  Or you could do record ranges like "0-100, 101-200" just as
easily.  Hold those indexes in an array, and go for it that way.
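That index-then-seek idea could be sketched like this (a Python sketch,
just for illustration; it assumes the index lines come out as plain
"stime offset" pairs, one per record, which is an assumption about the
eventual output format, and the sample times/offsets are made up):

```python
import bisect

def build_index(lines):
    """Parse "stime offset" lines (assumed format) into two
    parallel, time-sorted lists."""
    times, offsets = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue                    # skip headers or blank lines
        times.append(float(parts[0]))
        offsets.append(int(parts[1]))
    return times, offsets

def offset_range(times, offsets, t0, t1, file_size):
    """Byte range covering the records whose stime falls in [t0, t1)."""
    i = bisect.bisect_left(times, t0)
    j = bisect.bisect_left(times, t1)
    ostart = offsets[i] if i < len(offsets) else file_size
    oend = offsets[j] if j < len(offsets) else file_size
    return ostart, oend

# Hypothetical index of four 128-byte records:
times, offsets = build_index(["100.0 0", "101.5 128",
                              "103.0 256", "104.2 384"])
print(offset_range(times, offsets, 101.0, 104.0, 512))  # (128, 384)
```

The resulting pair could then be handed to the offset-read syntax
described later in this thread, e.g. "-r file::128:384".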

What do you think?

Carter



On Feb 2, 2007, at 5:49 AM, Philipp E. Letschert wrote:

> I've played around with generating an offset table from a given file.
> I think this should be a functionality somewhere in the argus-clients,
> probably in racount, to generate a list of offsets and export it as a
> string for use in external applications.
>
> A cool feature for ra would be head and tail and ranges by index.
>
>
> # perl snippet to generate an offset table
> # just for fun, since I guess this needs five-fold the time of
> # an implementation in C. Perl sucks, no future for ArgusEye?
>
> use FileHandle;
> use Fcntl qw(SEEK_SET);
> use vars qw(@offsets);
>
> my $argus_file = $ARGV[0] or die("no file");
>
> sub gen_offsets {
>     my ($bytes, $type, $cause, $len);
>     my $offset = 0;
>
>     open(my $fh, '<', $argus_file)
>         or die("cannot open $argus_file: $!");
>     binmode($fh);
>     @offsets = ();
>
>     # each record starts with a 4-byte header: type, cause, and a
>     # 16-bit length counted in 4-byte words
>     while ($fh->read($bytes, 4) == 4) {
>         ($type, $cause, $len) = unpack("H2 H2 H4", $bytes);
>         last if hex($len) == 0;       # avoid looping on a bad record
>         push(@offsets, $offset);      # start of the current record
>         $offset += hex($len) * 4;     # start of the next record
>         $fh->seek($offset, SEEK_SET) or last;
>     }
>     close($fh);
> }
>
> &gen_offsets;
> print "offsets for " . scalar(@offsets) . " records generated.\n";
>
>
>
> On Thu, Feb 01, 2007 at 03:41:31PM +0100, Philipp Letschert wrote:
>> Hey, this is great!
>>
>> This feature would make it possible to handle larger files in ArgusEye.
>>
>> At the moment it reads all available fields of every transaction into
>> memory and builds a complete view.  This is stupid, time-consuming,
>> eats up all the memory, and allows only a limited file size.  To
>> improve that I could build a list of offsets and do the partial
>> reading with ra only for the rows that actually fit into the view, as
>> you suggested some months ago.  That seems doable to me, but it would
>> only make scrolling of the rows possible, and inspecting a large file
>> by scrolling through millions of rows doesn't seem very inspiring...
>>
>> So how to use that for sorting and filtering?  For the display filter
>> I can imagine just applying a ra filter expression.  That would be a
>> good solution anyway, because my current attempt to do filtering with
>> acceptable performance in Perl is anything but successful.  As the
>> filtering is done while scrolling the view, there would be no
>> information available on how many transactions are affected by the
>> filter, but that's acceptable.
>>
>> And sorting?  I like sorting of transactions in the view, as it is
>> helpful for finding patterns, and it should be possible for a
>> filtered display as well, but with partial reading there is no
>> information on a transaction's position in a sorted context.  I can
>> imagine reading the sort keys from the file when the list of offsets
>> is generated, or using rasort and generating a new list of offsets,
>> but both seem very time-consuming to me...
>>
>> Thanks for revealing that feature, this will help make a better GUI!
>>
>> - but wait, didn't I promise to help with the documentation?  Shame
>> on me, that should really be my next task...
>>
>>
>> Philipp
>>
>>
>> On Wed, Jan 31, 2007 at 09:36:05PM -0500, Carter Bullard wrote:
>>> Gentle people,
>>> All ra* programs have the ability to read argus files using starting
>>> and ending byte offsets.  If you have a list of offsets, this type
>>> of feature can make processing large argus files very fast and
>>> efficient.
>>>
>>> The syntax for reading files using offsets has been/is/will be/could
>>> be:
>>>   "-r file::ostart:oend"
>>>
>>> (or at least that is how I've implemented it in the past)
>>> where ostart is the starting offset, and oend is the ending offset.
>>>
>>> This is not a useful feature if you don't know where the record
>>> boundaries are in the file, so I haven't 'exposed' this feature yet,
>>> but I think it is something that we can begin to work with, or at
>>> least talk about how we could use it.
>>>
>>> Is anyone interested in this type of feature who would like to talk
>>> about how we could use it?
>>>
>>> Carter
>>>
>




