Feature request: grep hex strings with -e

Carter Bullard carter at qosient.com
Thu Oct 11 12:14:40 EDT 2012


Hey Markku,

So I'm quite puzzled by the complete lack of dialog around regex and binary
data.  It does appear that REG_ENHANCED is just an Apple thing; I had
assumed that it was a BSD thing.

I found this link to a comparison of regular expression engines on Wikipedia.

   http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

And they don't talk about support for, or conformity to, \xNN notation for regex
matching.  Because of all the Unicode work around regex, you would think that \xNN
support would be a given, but I don't see any dialog anywhere that talks about it in
any detail.  I do see a lot of "broken" dialog though when I search for regex \xNN.

It's clear to me that binary buffer pattern matching wasn't what IEEE STD 1003.1
was all about.  Now PCRE is not IEEE STD 1003.1, so it's not really a generic
solution either.
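
One workaround that stays within plain regcomp() would be to expand the \xNN
escapes ourselves before compiling.  Here is a minimal sketch of that idea, not
what argus_grep.c does today.  The names expand_hex_escapes() and hex_match() are
made up for illustration.  It assumes the default "C" locale, ignores bracket
expressions, and cannot handle \x00, since regcomp() and regexec() both take
NUL-terminated strings.

```c
#include <ctype.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of argus): rewrite each \xNN escape in
 * `pat` as the literal byte it names, so a plain POSIX regcomp() can
 * match it.  Bytes that are ERE metacharacters get a backslash so they
 * stay literal.  Returns a malloc()ed string, or NULL on error. */
static char *expand_hex_escapes(const char *pat)
{
    size_t len = strlen(pat), i = 0, o = 0;
    char *out = malloc(len + 1);    /* output is never longer than input */

    if (out == NULL)
        return NULL;
    while (i < len) {
        if (pat[i] == '\\' && pat[i + 1] == 'x' &&
            isxdigit((unsigned char)pat[i + 2]) &&
            isxdigit((unsigned char)pat[i + 3])) {
            char hex[3] = { pat[i + 2], pat[i + 3], '\0' };
            unsigned char byte = (unsigned char)strtol(hex, NULL, 16);

            if (byte == 0) {        /* a NUL cannot live in a C string */
                free(out);
                return NULL;
            }
            if (strchr(".[]()*+?{}|^$\\", byte) != NULL)
                out[o++] = '\\';    /* keep the byte literal */
            out[o++] = (char)byte;
            i += 4;
        } else if (pat[i] == '\\' && pat[i + 1] != '\0') {
            out[o++] = pat[i++];    /* copy any other escape untouched */
            out[o++] = pat[i++];
        } else {
            out[o++] = pat[i++];
        }
    }
    out[o] = '\0';
    return out;
}

/* Hypothetical wrapper: 1 on match, 0 on no match, -1 on a bad pattern. */
static int hex_match(const char *pat, const char *subject)
{
    regex_t re;
    int rc;
    char *expanded = expand_hex_escapes(pat);

    if (expanded == NULL)
        return -1;
    rc = regcomp(&re, expanded, REG_EXTENDED | REG_NOSUB);
    free(expanded);
    if (rc != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

This only helps for user data that contains no NUL bytes, so it is not a real
answer for arbitrary binary buffers.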

The best I can do is to offer a ./configure option to look for PCRE and to use that
library instead, when it exists.  That is a bit of work, so if this is very important,
say if others would like this support as well, then I can schedule PCRE support
into argus-clients.  But I'll need some "show of hands" type of support to
put it officially on the list.
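
To give a feel for what that would involve, here is a rough sketch of the
compile-time switch.  Nothing here is in the tree.  HAVE_PCRE is a hypothetical
macro that a ./configure check would define, buffer_matches() is a made-up
wrapper name, and the PCRE branch assumes the classic libpcre API (pcre.h,
pcre_compile, pcre_exec).

```c
#include <stddef.h>

#ifdef HAVE_PCRE        /* hypothetical: set by ./configure when libpcre is found */
#include <pcre.h>
#else
#include <regex.h>
#endif

/* Hypothetical wrapper, not the argus-clients API.
 * Returns 1 on match, 0 on no match, -1 on a pattern or matching error. */
static int buffer_matches(const char *pattern, const char *buf, size_t len)
{
#ifdef HAVE_PCRE
    const char *err;
    int erroff, rc;
    pcre *re = pcre_compile(pattern, 0, &err, &erroff, NULL);

    if (re == NULL)
        return -1;
    /* PCRE takes an explicit length, so buf may contain NUL bytes.
     * With no output vector, a successful match returns 0. */
    rc = pcre_exec(re, NULL, buf, (int)len, 0, 0, NULL, 0);
    pcre_free(re);
    if (rc == PCRE_ERROR_NOMATCH)
        return 0;
    return rc >= 0 ? 1 : -1;
#else
    regex_t re;
    int rc, options = REG_EXTENDED | REG_NOSUB;

#if defined(REG_ENHANCED)
    options |= REG_ENHANCED;    /* only where the platform defines it */
#endif
    (void)len;                  /* regexec() stops at the first NUL byte */
    if (regcomp(&re, pattern, options) != 0)
        return -1;
    rc = regexec(&re, buf, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
#endif
}
```

The fallback branch keeps the current behavior, so systems without libpcre
would build as they do now and simply not get \xNN support.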

Carter 


On Oct 10, 2012, at 11:04 AM, Carter Bullard <carter at qosient.com> wrote:

> Hmmm, I'll upload a new 3.0.7.3 without.  Well, it will take a little while to figure
> out how to provide you with hex-based binary regular expression matching.
> Especially if it's a regular expression library specific issue.  
> 
> I'll put in compile time conditionals for when REG_ENHANCED is defined.
> That may be the best I can do.
> 
> Here is the patch to fix the argus_grep compile problem.
> 
> ==== //depot/argus/clients/common/argus_grep.c#11 - /Volumes/Users/carter/argus/clients/common/argus_grep.c ====
> 57c57
> <          int options = REG_EXTENDED | REG_ENHANCED | REG_NOSUB;
> ---
>>         int options = REG_EXTENDED | REG_NOSUB;
> 59a60,62
>> #if defined(REG_ENHANCED)
>>         options |= REG_ENHANCED;
>> #endif
> 
> Carter
> 
> On Oct 10, 2012, at 9:42 AM, Markku Parviainen <maketsi at gmail.com> wrote:
> 
>> Hi,
>> 
>> I have been busy a couple of days and wasn't able to test your patch until now.
>> 
>> Thank you for the quick response on changes. Unfortunately the constant
>> REG_ENHANCED is not defined in the CentOS version of /usr/include/regex.h,
>> and thus your newer client packages fail to compile. The actual error
>> is also easily missed in the build flood, producing an installation that
>> contains just a couple of client binaries (5 to be exact). Logs below.
>> 
>> I browsed around a bit in regex.h, but couldn't yet find an option
>> that would enable \xNN notation. There are a lot of options there, but
>> at first glance, none of them seemed to have anything to do with
>> that. I'm not entirely sure if it's still possible with that library
>> though.
>> 
>> I downloaded the source for my local gnu grep, which supports \xNN
>> notation via 'perl' grepping (-P option). It seems that it is using
>> libpcre library for matching (pcre_compile). Dunno if that's helping
>> you.
>> 
>> 
>> gcc -O3 -I. -I../include   -DHAVE_CONFIG_H -DARGUS_SYSLOG -c ./argus_grep.c
>> ./argus_grep.c: In function 'ArgusInitializeGrep':
>> ./argus_grep.c:57: error: 'REG_ENHANCED' undeclared (first use in this function)
>> ./argus_grep.c:57: error: (Each undeclared identifier is reported only once
>> ./argus_grep.c:57: error: for each function it appears in.)
>> make[1]: *** [argus_grep.o] Error 1
>> make[1]: Leaving directory `--/argus-clients-3.0.7.3/common'
>> ....
>> make[1]: Entering directory `--/argus-clients-3.0.7.3/clients'
>> gcc -O3 -I. -I../include -I../common  -DHAVE_CONFIG_H -c ./ra.c
>> make[1]: *** No rule to make target `../lib/argus_client.a', needed by
>> `../bin/ra'.  Stop.
>> make[1]: Leaving directory `--/argus-clients-3.0.7.3/clients'
>> making in ./examples
>> ...
>> 
>> # fgrep -r -i REG_ENHANCED /usr/include/
>> #
>> 
>> Package version used is argus-clients-3.0.7.3.
>> 
>> # gcc -v
>> Using built-in specs.
>> Target: x86_64-redhat-linux
>> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
>> --infodir=/usr/share/info --enable-shared --enable-threads=posix
>> --enable-checking=release --with-system-zlib --enable-__cxa_atexit
>> --disable-libunwind-exceptions --enable-libgcj-multifile
>> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
>> --enable-java-awt=gtk --disable-dssi --disable-plugin
>> --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
>> --with-cpu=generic --host=x86_64-redhat-linux
>> Thread model: posix
>> gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)
>> 
>> # uname -a
>> Linux srv 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012
>> x86_64 x86_64 x86_64 GNU/Linux
>> 
>> # cat /etc/redhat-release
>> CentOS release 5.7 (Final)
>> 
>> Btw, I'll take the opportunity here to bring this compile warning up too.
>> Probably nothing; I didn't evaluate it:
>> 
>> ./rarpwatch.c: In function 'RaProcessRecord':
>> ./rarpwatch.c:451: warning: passing argument 1 of
>> 'RaValidateArpFlowRecord' makes integer from pointer without a cast
>> ./rarpwatch.c:451: warning: passing argument 2 of
>> 'RaValidateArpFlowRecord' makes integer from pointer without a cast
>> 
>> 
>> 2012/10/5 Carter Bullard <carter at qosient.com>:
>>> Hey Markku,
>>> OK, try this patch on your clients code, to see if it works.  Not sure what version
>>> you're running, so I'm going to assume it's argus-clients-3.0.7.1.  The patch's
>>> line numbers may not be exact, so change the lines that look similar to
>>> the new line, and all should be fine.
>>> 
>>> thoth:common carter$ p4 diff ...
>>> ==== //depot/argus/clients/common/argus_grep.c#11 - /Volumes/Users/carter/argus/clients/common/argus_grep.c ====
>>> 57c57
>>> <          int options = REG_EXTENDED | REG_NOSUB;;
>>> ---
>>>>        int options = REG_EXTENDED | REG_ENHANCED | REG_NOSUB;
>>> 
>>> 
>>> On my Mac OS X, I needed to add this option to the regular expression compile
>>> code to get this to work as you suggested.  Not sure if that is a Mac thing or not.
>>> Here is what I get now:
>>> 
>>> thoth:common carter$ ../bin/ra -r /tmp/*09.35* -e s:\\x41\\x70 -s +suser:32
>>>                StartTime        Dur      Flgs  Proto            SrcAddr  Sport   Dir            DstAddr  Dport  SrcPkts  DstPkts     SrcBytes     DstBytes State                 srcUdata
>>> 2012/10/05.09:36:11.842712   0.099477  e           udp       192.168.0.33.mdns      ->        224.0.0.251.mdns          2        0          357            0   INT s[32]=.............Apt._tivo-videostre
>>> 2012/10/05.09:37:50.429561   0.095191  e           udp       192.168.0.33.mdns      ->        224.0.0.251.mdns          2        0          357            0   INT s[32]=.............Apt._tivo-videostre
>>> 2012/10/05.09:39:29.012663   0.101437  e           udp       192.168.0.33.mdns      ->        224.0.0.251.mdns          2        0          357            0   INT s[32]=.............Apt._tivo-videostre
>>> 
>>> Holler if that does it for you.
>>> 
>>> Carter
>>> 
>>> 
>>> On Oct 5, 2012, at 8:55 AM, Carter Bullard <carter at qosient.com> wrote:
>>> 
>>>> Hey Markku,
>>>> I'll look into it today.  Need to run under gdb to see what actually makes it to regcomp().  It may be as dumb as double quotes vs single quotes, but I'll check it out before lunch.
>>>> 
>>>> Carter
>>>> 
>>>> On Oct 5, 2012, at 2:04 AM, Markku Parviainen <maketsi at gmail.com> wrote:
>>>> 
>>>>> 2012/10/4 Carter Bullard <carter at qosient.com>:
>>>>>> ra* clients use the available regular expression library, and should support hexadecimal codes for matching now.
>>>>>> So, there is nothing keeping ra* from doing hexadecimal code matching.  Because you have to use '\xNN' to specify the
>>>>>> codes, when you provide it on the command line, you may need to escape the '\' to get it past the shell.
>>>>> 
>>>>> The param was already quoted so that the shell (bash) would not interfere.
>>>>> Anyway, for some reason it just doesn't work. I attached a sample (240
>>>>> bytes) for your analysis.
>>>>> 
>>>>> # ra -r regex-anon.ra -M printer=encode32 -s suser:32
>>>>>                             srcUdata
>>>>> s[32]=333712F228948DABC9C0D199D1C3B00F
>>>>> 
>>>>> # ra -r regex-anon.ra -e '\x33'
>>>>> # ra -r regex-anon.ra -e '\\x33'
>>>>> # ra -r regex-anon.ra -e '33'
>>>>> 
>>>>> None of them produce anything (whereas only the first one should). Ideas?
>>>>> 
>>>>> I tried enabling debug output, but even -D10 does not produce any
>>>>> lines about regex behaviour.
>>>>> The system is CentOS 5.7 64bit, gcc v4.1.2, ra v3.0.7.1.
>>>>> 
>>>>> 
>>>>> Btw. To confirm what the shell is delivering to the prog when \x is
>>>>> single quoted:
>>>>> 
>>>>> # echo '\x33'
>>>>> \x33
>>>>> # echo \x33
>>>>> x33
>>>>> # perl -e 'print join(", ", @ARGV) ."\n"' -- -e '\x33'
>>>>> -e, \x33
>>>>> # echo '\\x33'
>>>>> \\x33
>>>>> <regex-anon.ra>
>>>> 
>>> 
>> 
> 
