archive management tools

Mon Feb 15 18:05:39 EST 2010

Gentle people,
Item #2 on the current list of efforts is to improve argus archive management and performance.

There are two basic Argus Archives that we now officially support.
  1) A native file system archive, usually based on $srcid/year/month/day/5minute.files
  2) A MySQL table archive, where there are daily tables that hold the primitive argus records.
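
As a rough illustration of the native file system layout, here is a minimal Python sketch that maps a record timestamp to the five-minute file that would hold it. Only the $srcid/year/month/day directory structure comes from the description above; the `archive_path` helper, the use of UTC, and the `argus.%Y.%m.%d.%H.%M.%S` file name pattern are assumptions made for the example.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # five-minute files, as in the layout described above


def archive_path(srcid: str, epoch: int) -> str:
    """Return the relative path of the five-minute file covering `epoch`.

    The file name pattern is hypothetical; only the srcid/year/month/day
    directory structure is taken from the description.
    """
    start = epoch - (epoch % BUCKET_SECONDS)  # floor to the start of the bucket
    t = datetime.fromtimestamp(start, tz=timezone.utc)  # UTC is an assumption
    return f"{srcid}/" + t.strftime("%Y/%m/%d/argus.%Y.%m.%d.%H.%M.%S")
```

Flooring the timestamp to a multiple of 300 seconds means every record in the same five-minute window resolves to the same file.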

There are advantages to both, and I suspect that the best argus archive will be one
that uses them both.

We need tools to manage the archive, which to me means archive establishment, maintenance,
and retention policy enforcement.  These should cover data insertion, data indexing,
partitioning, distribution and even file compression, and then data rollout/removal.

In argus-3.0 we introduced two programs that deal with archive generation.
   1) rastream()/rasplit()
   2) rasqlinsert()

I'll be documenting those tools this week, on the web site under "Using Argus"/"Archives".

The next step will be to start discussing performance tools around the archive.
I am interested in massively distributed (federated) archives using modern techniques
to provide very fast query resolution, but it will take some time to get to that stage.
Tools like Sector/Sphere, Hadoop, Big Table, and MapReduce are all fair game, so
if someone has an interest in discussing these techniques, let's start the discussion.

But we're a few steps away from getting this technology around an Argus Archive.
So other tools are still needed.  Here is my first tool; I invite anyone to present their
tools, strategies or tool requests!

The first tool I have to discuss is a search tool for the native file system archive.
It's called rasqltimeindex(), and it creates a time index of every file that is in the argus
archive.   Basically, you know what time period you're interested in, and this strategy
is designed to provide a ra* tool that can give you that data in < 1s.

rasqltimeindex() indexes each file, so when rastream() closes a file, its script will run
this program.  It will create and maintain an ArgusArchive database with a "FileName"
table that holds a unique "fileid" and a "path" for every file, and a "Seconds" table that
holds a fileid and the byte offsets in the file that cover each second.
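
To make the two tables concrete, here is a minimal Python sketch of such an index. It is not the rasqltimeindex() implementation: it uses sqlite3 in place of MySQL so it runs on its own, the column names `second`, `ostart` and `ostop` are assumptions (only "FileName", "Seconds", "fileid" and "path" come from the description), and it takes already-parsed (second, offset, length) tuples rather than reading argus records.

```python
import sqlite3


def create_index_tables(db):
    """Create the two index tables; column names beyond fileid/path are assumed."""
    db.executescript("""
        CREATE TABLE IF NOT EXISTS FileName (
            fileid INTEGER PRIMARY KEY,
            path   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS Seconds (
            second INTEGER NOT NULL,   -- epoch second covered
            fileid INTEGER NOT NULL REFERENCES FileName(fileid),
            ostart INTEGER NOT NULL,   -- first byte holding data for this second
            ostop  INTEGER NOT NULL,   -- byte just past the last such record
            PRIMARY KEY (second, fileid)
        );
    """)


def index_file(db, path, records):
    """Index one closed file.

    `records` is an iterable of (epoch_second, byte_offset, byte_length)
    tuples, a hypothetical stand-in for parsing the argus file itself.
    """
    spans = {}
    for second, offset, length in records:
        lo, hi = spans.get(second, (offset, offset + length))
        spans[second] = (min(lo, offset), max(hi, offset + length))
    with db:  # one transaction per file
        cur = db.execute("INSERT INTO FileName (path) VALUES (?)", (path,))
        fileid = cur.lastrowid
        db.executemany(
            "INSERT INTO Seconds VALUES (?, ?, ?, ?)",
            [(s, fileid, lo, hi) for s, (lo, hi) in spans.items()],
        )
    return fileid
```

Keeping one row per (second, file) pair keeps the index small relative to the data, at the cost of one-second resolution.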

This type of index allows a program like rasql() to find records, regardless of archive
mapping, based on time, with a second as the resolution.  Because all queries into the
ArgusArchive are scoped in time, being able to get the files and the byte offsets
to search will give us a good speedup.
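
A time-scoped lookup against such an index could look like the Python sketch below. It is an illustration only, not how rasql() is implemented: sqlite3 stands in for MySQL, and the `second`, `ostart` and `ostop` column names and the `find_spans` helper are assumptions.

```python
import sqlite3

# Assumed schema: the table names and fileid/path come from the description;
# the second/ostart/ostop column names are hypothetical.
SCHEMA = """
CREATE TABLE FileName (fileid INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE Seconds  (second INTEGER, fileid INTEGER, ostart INTEGER, ostop INTEGER);
CREATE INDEX SecondsByTime ON Seconds (second);
"""


def find_spans(db, start, stop):
    """Return (path, first_offset, end_offset) for each file that has
    records in the inclusive time range [start, stop], oldest file first."""
    return db.execute(
        """
        SELECT f.path, MIN(s.ostart), MAX(s.ostop)
        FROM Seconds AS s
        JOIN FileName AS f ON f.fileid = s.fileid
        WHERE s.second BETWEEN ? AND ?
        GROUP BY f.fileid, f.path
        ORDER BY MIN(s.second)
        """,
        (start, stop),
    ).fetchall()
```

A reader can then seek to the first offset in each returned file and stop at the end offset, instead of scanning whole files.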

This kind of strategy means we have to remove the indexes when we roll out a
file, so we'll need a program that can remove a file and all its indexes.  We'll
call that "rasqlfiledelete()".
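
The removal step might be sketched in Python as follows. This is a guess at the shape of such a tool, not its design: sqlite3 stands in for MySQL, and the `delete_file_and_index` helper and the `second`, `ostart` and `ostop` column names are assumptions.

```python
import os
import sqlite3

# Assumed schema: the table names and fileid/path come from the description;
# the second/ostart/ostop column names are hypothetical.
SCHEMA = """
CREATE TABLE FileName (fileid INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE Seconds  (second INTEGER, fileid INTEGER, ostart INTEGER, ostop INTEGER);
"""


def delete_file_and_index(db, path):
    """Remove the index rows for `path`, then the file itself.

    Returns False if the path is not in the index, True otherwise.
    """
    with db:  # both deletes commit together or not at all
        row = db.execute(
            "SELECT fileid FROM FileName WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            return False
        db.execute("DELETE FROM Seconds WHERE fileid = ?", row)
        db.execute("DELETE FROM FileName WHERE fileid = ?", row)
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # file already rolled out; the index is now consistent
    return True
```

Deleting the index rows before the file means a concurrent query never gets pointed at a file that is no longer there.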

OK, this is a first step, and I understand that it might not make anyone happy (other than
me), so if anyone has any suggestions, recommendations, wishlist tools, etc... please
don't hesitate to send email to the list, or directly to me.

Hope all is most excellent, and expect something on this to come down the pipe this week.

Carter
