reading argus files using non-sequential access
Carter Bullard
carter at qosient.com
Fri Feb 2 12:14:44 EST 2007
Hey Yotam,
We have that type of support, using the "-N " option, although we
don't support
the "every other x record" option. Again in this case, you need to
run through
the entire file, counting records, to find out who is in the range,
quitting when
you get to the end of the range, so you are still potentially running
through
the entire file.
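In rough Python terms, that counting scan looks like the sketch
below. The record framing here (a 4-byte header whose last two
bytes carry a big-endian record length) is an illustrative
assumption, not the actual argus on-disk format:

    import struct

    HDR_LEN = 4  # assumed header size; not the real argus layout

    def read_record_range(path, start, end):
        """Yield records numbered [start, end), counting from the top."""
        with open(path, 'rb') as f:
            n = 0
            while n < end:
                hdr = f.read(HDR_LEN)
                if len(hdr) < HDR_LEN:
                    break                    # ran off the end of the file
                # assume bytes 2-3 of the header hold the total record
                # length in bytes, big-endian (illustrative only)
                (rlen,) = struct.unpack('>H', hdr[2:4])
                body = f.read(rlen - HDR_LEN)
                if n >= start:
                    yield hdr + body         # record falls inside the range
                n += 1
        # the loop quits early at 'end', but every record before
        # 'start' still had to be read and counted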
If you happen to have preprocessed the file, you could have the
byte offsets for, say, every 1000th record, or something like it,
and then you could use the feature I'm describing to implement
yours: skipping close to the record of interest and starting
processing at that point.
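Under the same assumed framing as the sketch above, that
preprocessing pass might look like this: checkpoint every 1000th
record's byte offset, then seek to the nearest checkpoint and scan
forward:

    import struct

    HDR_LEN = 4     # same assumed framing as the sketch above
    STRIDE = 1000   # checkpoint every 1000th record

    def build_sparse_index(path):
        """One pass over the file: record number -> byte offset."""
        index = {}
        with open(path, 'rb') as f:
            n, off = 0, 0
            while True:
                hdr = f.read(HDR_LEN)
                if len(hdr) < HDR_LEN:
                    break
                (rlen,) = struct.unpack('>H', hdr[2:4])
                if n % STRIDE == 0:
                    index[n] = off       # remember where this record starts
                off += rlen
                f.seek(off)              # hop over the record body
                n += 1
        return index

    def seek_near(f, index, target):
        """Position f at the checkpoint at or before record 'target';
        the caller scans forward from the returned record number."""
        base = (target // STRIDE) * STRIDE
        f.seek(index[base])
        return base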
All of this really leads up to the discussion, "what indexing tools
do we want to build?", for processing large files. I have a time
indexer that I can share, and a general strategy for indexing any
kind of object. This is of course needed if you're working with
billions of records and you don't want to stick them in a database
(so you have to do some of this yourself).
My indexers put the data into a MySQL database, and we have ra*
clients that know how to use those indexes to fetch data. That
means I do have MySQL-based database tools for gargoyle data, in
both C and Perl. I was planning on transitioning some of them to
argus after we release argus-3.0. These will allow you to insert
any type of argus data, with any field as a primary key and any
other field as an attribute. I have found this very useful for
managing database tables of heavily aggregated data, like address
inventories.
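As a taste of the idea, a time index can be as simple as the sketch
below. The table name and columns are assumptions for a
self-contained example (and sqlite3 stands in for MySQL here); the
real schema may differ:

    import sqlite3

    # Hypothetical schema: one row per record, keyed on start time.
    db = sqlite3.connect('argus_index.db')
    db.execute("""CREATE TABLE IF NOT EXISTS rec_index (
                      stime  REAL,     -- record start time (epoch seconds)
                      offset INTEGER,  -- byte offset into the argus file
                      length INTEGER   -- record length in bytes
                  )""")
    db.execute("CREATE INDEX IF NOT EXISTS idx_stime ON rec_index(stime)")
    db.commit()

    def add_record(stime, offset, length):
        """Called once per record during the indexing pass."""
        db.execute("INSERT INTO rec_index VALUES (?, ?, ?)",
                   (stime, offset, length))

    def fetch_offsets(t0, t1):
        """Byte ranges of all records starting within [t0, t1];
        these feed the offset-based reads discussed below."""
        cur = db.execute("SELECT offset, length FROM rec_index "
                         "WHERE stime BETWEEN ? AND ? ORDER BY offset",
                         (t0, t1))
        return cur.fetchall()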
OK, so if this is interesting, please send comments/etc... so we can get
the ball rolling after the argus-3.0 release.
Carter
On Feb 2, 2007, at 3:19 AM, Yotam Rubin wrote:
> (Haven't used ra/argus in a while now, but I'm still on this list)
> Hi Carter,
>
> Your question is a general one. You wish to supply random access
> to ra file records. The syntax should not be byte-oriented, but
> rather record-oriented. Consider the following syntax (borrowing
> from Python's slicing syntax; a parsing sketch follows the list):
> ra file[3:10] - Process all the records in the range 3 to 10
> ra file[5:] - Process all the records from the fifth record to the end of the file
> ra file[:50] - Process all the records from the beginning of the file to the 50th one
> ra file[::5] - Process every fifth record in the file
> ra file[1:100:3] - Process every third record between the first and 100th records
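> As a sketch, that notation could be parsed along these lines
> (parse_slice is a hypothetical helper; nothing like it exists in
> ra today):
>
>     import re
>
>     def parse_slice(arg):
>         """Split 'file[start:stop:step]' into (path, start, stop, step)."""
>         m = re.match(r'^(.*)\[([^\[\]]*)\]$', arg)
>         if not m:
>             return arg, 0, None, 1       # plain filename, no slice
>         path, spec = m.group(1), m.group(2)
>         parts = (spec.split(':') + ['', ''])[:3]
>         start = int(parts[0]) if parts[0] else 0
>         stop = int(parts[1]) if parts[1] else None
>         step = int(parts[2]) if parts[2] else 1
>         return path, start, stop, step
>
>     # parse_slice('file[1:100:3]') -> ('file', 1, 100, 3)
>     # a reader then emits record n when start <= n < stop
>     # (stop of None meaning end of file) and (n - start) % step == 0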
>
> I'm not sure whether this is useful or not, but I'm not really an
> argus user these days. Maybe it can come in handy if one wants to
> 'sample' a ra file to gain quick insight into the general flows it
> contains.
>
> (My main thought is that most ra processing utilities should be
> written in some high-level language, and not in C; that would
> probably benefit your users greatly.)
>
> On 2/1/07, Carter Bullard <carter at qosient.com> wrote:
> Gentle people,
> All ra* programs have the ability to read argus files using starting
> and ending byte offsets. If you have a list of offsets, this type
> of feature can make processing large argus files very fast/efficient.
>
> The syntax for reading files using offsets has been/is/will be/could be:
> "-r file::ostart:oend"
>
> (or at least that is how I've implemented it in the past)
> where ostart is the starting offset, and oend is the ending offset.
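> In rough Python terms, that offset-bounded read amounts to the
> following (the parsing of the '::' form is a sketch of the syntax
> above, not the shipped ra code):
>
>     def read_by_offsets(arg):
>         """Parse 'file::ostart:oend' and return the bytes in that
>         window; only meaningful if both offsets land on record
>         boundaries."""
>         path, _, span = arg.partition('::')
>         s, _, e = span.partition(':')
>         ostart = int(s) if s else 0
>         oend = int(e) if e else None
>         with open(path, 'rb') as f:
>             f.seek(ostart)                # jump straight to ostart
>             if oend is None:
>                 return f.read()           # no end bound: read to EOF
>             return f.read(oend - ostart)  # stop at the end offset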
>
> This is not a useful feature if you don't know where the record
> boundaries are in the file, so I haven't 'exposed' this feature
> yet, but I think that it is something that we can begin to work
> with, or at least talk about how we could use it.
>
> Is anyone interested in this type of feature and would like to
> talk about how we could use it?
>
> Carter
>
>