argus label metadata syntax and processing
Carter Bullard
carter at qosient.com
Thu Nov 15 16:04:40 EST 2012
Hey Jesse,
Sure. Your file format is correct, but it's the structure of the label that you need to consider.
This email will describe the current implementation. All of this can be changed as we
realize that we want more support.
I am trying to use metadata standards for these labels. And the syntax for metadata
standards is pretty free form. So we have that philosophy, but the implementation is
new, so I expect to make changes to support what we need to support.
Right now:
Labels can come in many flavors, and a single flow can be labeled many times.
A label is a colon separated set of free formatted strings. When a new labeler
adds a new label to a flow that has an existing label, it tacks on a ':' and then appends
its label, at least in theory. Because labels can't be infinite length, we try to be
smart as to how multiple labelers can use the same label, and get decent results.
You must consider that the tools will process your labels, remove
duplicate entries, sort them and aggregate them. The key field separators are
":", "," and ";", so be sure to use these characters carefully
in your labels. When they are inside double quotes they aren't considered to be
field separators.
A label is a colon separated set of strings. This is a valid label file entry.
filter="host x.y.z.w" label="red:blue:green"
This is a label that will be added to flows that involve host x.y.z.w, and
red, blue and green are separate and distinct labels. If another process
labeled the same flow with this label,
filter="net x.y.z.0/24" label="green:yellow"
Then, because we're smart about labels, or at least try to be, the resulting
label should be:
"red:blue:green:yellow"
as the label processors would see that green is already there.
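The merge described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the argus C code: the function name merge_labels is made up for this example, and it uses a plain split on ':' that ignores the double-quote rule mentioned earlier.

```python
def merge_labels(existing: str, new: str) -> str:
    """Append the colon separated entries of `new` to `existing`,
    skipping any entry that is already present."""
    merged = [entry for entry in existing.split(":") if entry]
    seen = set(merged)
    for entry in new.split(":"):
        if entry and entry not in seen:
            merged.append(entry)
            seen.add(entry)
    return ":".join(merged)


print(merge_labels("red:blue:green", "green:yellow"))
# prints: red:blue:green:yellow
```

The sketch keeps entries in the order they were added, which matches the example result shown here; any sorting the real tools apply is not modeled.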
Colon separated labels.
The label strings can have any format, but the tools are sensitive to this format:
value=attribute,attribute,attribute
Now the "value=" is optional, which is why it's written in brackets:
[value=]attribute
There can be one or more attributes. Additional attributes are separated by
commas. So in the format syntax, you put the commas and the additional
attributes in brackets as well:
[value=]attribute[,attribute[,attribute]]
When you use geolocation labeling, you'll see that the resulting labels look like:
scity=-38.000000,-97.0000000
When you consume this label, the consumer has to know that it is a city label
with lat, lon values. But intermediate flow processors that want to add more
labels don't need to know the format. As long as they aren't adding to the
"scity" label, there is no problem. The new label will just be appended to the
existing label. This is a valid label for me:
pid=114:usr=root:app=WebServer:dcity=-42.356172,-93.256748:snort="[**] [116:56:1] (snort_decoder): T/TCP Detected [**]"
When an argus labeler sees a value=attribute label, it will try to be smart
about it. It will parse the label into its values, and then for each value it will
parse out the attributes. This is all important for aggregation.
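To make the two-level parse concrete, here is a rough Python sketch of splitting a label into values and then attributes while leaving quoted text alone. It is an assumption about the behavior described in this email, not a port of the argus parser; split_outside_quotes and parse_label are names invented for the example.

```python
from typing import List, Optional, Tuple


def split_outside_quotes(text: str, sep: str) -> List[str]:
    """Split `text` on `sep`, ignoring separators inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_label(label: str) -> List[Tuple[Optional[str], List[str]]]:
    """Return (value name or None, [attributes]) for each label entry."""
    parsed = []
    for entry in split_outside_quotes(label, ":"):
        name: Optional[str] = None
        body = entry
        # Treat the text before the first '=' as the value name,
        # unless the entry itself starts with a quote.
        if "=" in entry and not entry.startswith('"'):
            name, body = entry.split("=", 1)
        parsed.append((name, split_outside_quotes(body, ",")))
    return parsed


label = ('pid=114:usr=root:app=WebServer:dcity=-42.356172,-93.256748:'
         'snort="[**] [116:56:1] (snort_decoder): T/TCP Detected [**]"')
for name, attributes in parse_label(label):
    print(name, attributes)
```

With the example label above, the colons inside the quoted snort text stay part of a single attribute, and dcity yields two attributes.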
So you can have labels like
filter="host x.y.z.w" label="addr=red,blue,green"
You could also add more detail:
filter="src host x.y.z.w" label="saddr=red,blue,green"
As an example, if our flow was labeled as "saddr=red,blue,green", and
we labeled it again with this label:
filter="net x.y.z.0/24" label="green:yellow"
Then the resulting label would be:
"saddr=red,blue,green:green:yellow"
Here green appears twice: once as an attribute of 'saddr', and once as an
attribute for the entire flow, so there is no collision.
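One way to picture why there is no collision is to treat each attribute as scoped by its value name. The Python sketch below is illustrative only: attribute_scopes is a hypothetical helper, it uses None for attributes that belong to the whole flow, and it does a simple split that ignores quoting.

```python
def attribute_scopes(label: str) -> set:
    """Return (value name, attribute) pairs; bare entries get the name None."""
    scopes = set()
    for entry in label.split(":"):
        name, sep, body = entry.partition("=")
        if not sep:
            # No '=' in the entry, so it is a flow-wide attribute.
            name, body = None, entry
        for attribute in body.split(","):
            scopes.add((name, attribute))
    return scopes


scopes = attribute_scopes("saddr=red,blue,green:green:yellow")
print(("saddr", "green") in scopes)  # True
print((None, "green") in scopes)     # True
```

The two greens end up as different pairs, so a duplicate check keyed on the pair would keep both.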
So currently we only support 4 comma separated attributes for a label object.
So your flow_label file, where you have dozens of comma separated values,
isn't going to work. But I can modify the code to enable that.
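Until that changes, a quick pre-check of a label against the 4 attribute limit could look like the Python sketch below. The limit comes from this email and may change; entries_over_limit and the example hostnames are made up for illustration, and the check uses a simple split that ignores quoting.

```python
MAX_ATTRIBUTES = 4  # limit quoted in this email; described as changeable


def entries_over_limit(label: str, limit: int = MAX_ATTRIBUTES) -> list:
    """Return the colon separated entries with more than `limit` attributes."""
    over = []
    for entry in label.split(":"):
        body = entry.split("=", 1)[-1]  # drop an optional "value=" prefix
        if len(body.split(",")) > limit:
            over.append(entry)
    return over


print(entries_over_limit("a.example,b.example,c.example,d.example,e.example"))
```

An empty list means every entry fits within the limit; anything returned is a candidate for splitting up or replacing with a shorter label.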
The " ; " isn't parsed right now, but I have it reserved as we build on label use.
Whew, that's a bunch for now. Consume that, and then let's keep the conversation
going. This is important stuff, and so I want it to be useful !!!!!!
Carter
On Nov 15, 2012, at 3:07 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
> I think I need some assistance with the format of the flow_file format (i.e., label.strategy.specification.file)....Based on items in NSMWiki, I thought the format was:
>
> filter="host 1.1.1.1" label="Some label"
>
> I'm not sure how to read the syntax you mentioned in the previous email...Can you dumb it down for me please, and provide an example like the above? :)
>
> Cheers,
>
> Jesse
>
>
> On Wed, Nov 14, 2012 at 9:55 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey Jesse,
> The maximum label length is currently MAXSTRLEN, which on most machines is 1024 bytes.
> This is trivial to extend; MAXSTRLEN is used for convenience.
>
> You may be running into a limit on the number of label value attributes.
> We support 256 colon separated label values, but for aggregation purposes,
> we limit use to 4 attributes per value. Attributes are the comma separated
> fields in a given label value. This is the syntax today:
>
> label[:label...]
> label :: [object=]word[,word][;[object=]word[,word]]
>
> These numbers are arbitrary, and I can change them anytime.
> So once I see what you're up to, then I can change the code accordingly.
>
> Your strategy is an interesting one, but you create the issue where you get very
> large labels that may be repeated in the resulting data. Efficiency is low.
> While not appropriate for all cases, you could use an index into a table
> that has the big strings, and label the flows with that index. Or put your table
> in a database, and we can do lookups, or… whatever.
>
> Carter
>
>
> On Nov 14, 2012, at 8:41 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>
>> I suspect it's array space...My flow_label file contains entries of the form: filter="host X.X.X.X" label="dns.name.for.host,otherdns.name.for.host..."
>>
>> Some of those entries are rather large (for instance, with some google addresses)...I'll send the contents in another email...
>>
>> Is there a known limit to the length of a label? I can certainly add some checks to ensure my labels come in under it...
>>
>> Cheers,
>>
>> Jesse
>>
>>
>>
>> On Wed, Nov 14, 2012 at 4:27 PM, Carter Bullard <carter at qosient.com> wrote:
>> Hey Jesse,
>> So what is in that flow_label file ?
>> So we're blowing up reading the file. I suspect that we're either running out of
>> array space somewhere, or there is a syntax error that we're not handling well.
>>
>> Can you share your flow_label file ?
>>
>> Carter
>>
>> On Nov 14, 2012, at 1:46 PM, Jesse Bowling <jessebowling at gmail.com> wrote:
>>
>>> Seems I can't learn a new use for argus without finding a way to break it... :)
>>>
>>> Another segfault, this time in ralabel.conf. Please let me know if I can do anything to help debug this...
>>>
>>> $ egrep -v '^#' ralabel.conf
>>>
>>> RALABEL_ARIN_COUNTRY_CODES=yes
>>> RA_DELEGATED_IP="/usr/local/argus/delegated-ipv4-latest"
>>> RALABEL_ARGUS_FLOW=yes
>>> RALABEL_ARGUS_FLOW_FILE="./flow_label"
>>>
>>> $ wc -l flow_label
>>> 2066 flow_label
>>>
>>> $ gdb ralabel
>>> GNU gdb (GDB) 7.1-ubuntu
>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from /usr/local/bin/ralabel...done.
>>> (gdb) run -r host.argus -f ./ralabel.conf -s "+sco +dco +label:40"
>>> Starting program: /usr/local/bin/ralabel -r host.argus -f ./ralabel.conf -s "+sco +dco +label:40"
>>> [Thread debugging using libthread_db enabled]
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x00007ffff742833b in memcpy () from /lib/libc.so.6
>>> (gdb) up
>>> #1 0x00007ffff740a96d in _IO_getline_info () from /lib/libc.so.6
>>> (gdb) up
>>> #2 0x00007ffff7409879 in fgets () from /lib/libc.so.6
>>> (gdb) up
>>> #3 0x00000000004981da in RaReadFlowLabels (parser=0x7ffff7e9f010, labeler=0xb3c1f0,
>>> file=0x7fffffffcb68 "38.g.akamai.net,a1840.g.akamai.net,a1846.g.akamai.net,a1854.g.akamai.net,a1878.g.akamai.net,a190.g.akamai.net,a1932.g.akamai.net.0.1.cn.akamaitech.net,a1932.g.akamai.net,a1950.g.a
>>> at ./argus_label.c:716
>>> 716 ./argus_label.c: No such file or directory.
>>> in ./argus_label.c
>>> (gdb) where
>>> #0 0x00007ffff742833b in memcpy () from /lib/libc.so.6
>>> #1 0x00007ffff740a96d in _IO_getline_info () from /lib/libc.so.6
>>> #2 0x00007ffff7409879 in fgets () from /lib/libc.so.6
>>> #3 0x00000000004981da in RaReadFlowLabels (parser=0x7ffff7e9f010, labeler=0xb3c1f0,
>>> file=0x7fffffffcb68 "38.g.akamai.net,a1840.g.akamai.net,a1846.g.akamai.net,a1854.g.akamai.net,a1878.g.akamai.net,a190.g.akamai.net,a1932.g.akamai.net.0.1.cn.akamaitech.net,a1932.g.akamai.net,a1950.g.a
>>> at ./argus_label.c:716
>>> #4 0x2e32393731612c74 in ?? ()
>>> #5 0x69616d616b612e67 in ?? ()
>>> #6 0x3731612c74656e2e in ?? ()
>>> <snip>
>>> #1192 0x6f7a616d612c7465 in ?? ()
>>> Cannot access memory at address 0x7ffffffff000
>>>
>>> (gdb) backtrace full
>>> #0 0x00007ffff742833b in memcpy () from /lib/libc.so.6
>>> No symbol table info available.
>>> #1 0x00007ffff740a96d in _IO_getline_info () from /lib/libc.so.6
>>> No symbol table info available.
>>> #2 0x00007ffff7409879 in fgets () from /lib/libc.so.6
>>> No symbol table info available.
>>> #3 0x00000000004981da in RaReadFlowLabels (parser=0x7ffff7e9f010, labeler=0xb3c1f0,
>>> file=0x7fffffffcb68 "38.g.akamai.net,a1840.g.akamai.net,a1846.g.akamai.net,a1854.g.akamai.net,a1878.g.akamai.net,a190.g.akamai.net,a1932.g.akamai.net.0.1.cn.akamaitech.net,a1932.g.akamai.net,a1950.g.a
>>> at ./argus_label.c:716
>>> strbuf = "filter=\"host 184.72.235.54\000 label=\"cookiemonster-production-1222235838.us-east-1.elb.amazonaws.com\000\nfilter=\"host 23.10.192.103\000 label=\"e5529.g.akamaiedge.net.0.1.cn.akamaie
>>> str = 0x7fffffffeb83 "filter=\"host 208.111.160.6\" label=\"aarp.vo.llnwd.net,abcentmktg.vo.llnwd.net,adkeeper.vo.llnwd.net,admeta.vo.llnwd.net,adperk.vo.llnwd.net,advantech.vo.llnwd.net,aglaiasof
>>> ptr = 0x7fffffffeb83 "filter=\"host 208.111.160.6\" label=\"aarp.vo.llnwd.net,abcentmktg.vo.llnwd.net,adkeeper.vo.llnwd.net,admeta.vo.llnwd.net,adperk.vo.llnwd.net,advantech.vo.llnwd.net,aglaiasof
>>> end = 0x7fffffffeb82 "\nfilter=\"host 208.111.160.6\" label=\"aarp.vo.llnwd.net,abcentmktg.vo.llnwd.net,adkeeper.vo.llnwd.net,admeta.vo.llnwd.net,adperk.vo.llnwd.net,advantech.vo.llnwd.net,aglaias
>>> .
>>> value = 0x3f07630 "a1116.x.akamai.net,a112.w23.akamai.net,a1223.cp.akamai.net.0.1.cn.akamaitech.net,a1223.cp.akamai.net,a1248.g.akamai.net,a1249.g.akamai.net,a1294.w20.akamai.net,a1362.w3.akamai.n
>>> filter = 0x3f00a10 "host 23.66.231.57"
>>> label = 0x3f07630 "a1116.x.akamai.net,a112.w23.akamai.net,a1223.cp.akamai.net.0.1.cn.akamaitech.net,a1223.cp.akamai.net,a1248.g.akamai.net,a1249.g.akamai.net,a1294.w20.akamai.net,a1362.w3.akamai.n
>>> retn = 0
>>> linenum = 11
>>> fd = 0xb3ca30
>>> #4 0x2e32393731612c74 in ?? ()
>>> No symbol table info available.
>>> #5 0x69616d616b612e67 in ?? ()
>>> No symbol table info available.
>>> <snip>
>>> #1192 0x6f7a616d612c7465 in ?? ()
>>> No symbol table info available.
>>> Cannot access memory at address 0x7ffffffff000
>>>
>>>
>>>
>>> --
>>> Jesse Bowling
>>>
>>>
>>
>>
>>
>>
>> --
>> Jesse Bowling
>>
>>
>
>
>
>
> --
> Jesse Bowling
>
>