Anonymization of argus flow data

Carter Bullard carter at qosient.com
Tue Sep 3 13:36:30 EDT 2013


Hey Kaustubh,
If it's still writing records to the output file, it's not in an infinite loop,
although I'm sure that it feels like one.  So, no need to print debug msgs
or run under gdb().

Hmmmmmm, you must have a very large number of IP addresses.  racount() isn't doing
anything exotic with the "-M addr" mode.  It's hashing and storing each unique
IP address, so that we can report on how many and what types.
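
As a rough illustration of why memory use tracks the number of distinct
addresses, here is a minimal Python sketch of the hash-and-store idea. It is
not racount()'s actual code: the function name and the (src, dst) input format
are assumptions made up for the example.

```python
from collections import Counter
import ipaddress


def count_unique_addrs(flows):
    """Tally every distinct address in an iterable of (src, dst) pairs.

    Hypothetical helper for illustration only; not part of argus.
    """
    seen = Counter()
    for src, dst in flows:
        # One table entry per distinct address, so memory grows with the
        # number of unique addresses rather than the number of records.
        seen[ipaddress.ip_address(src)] += 1
        seen[ipaddress.ip_address(dst)] += 1
    # Report how many addresses there are, and of what type.
    by_version = Counter(f"IPv{addr.version}" for addr in seen)
    return len(seen), dict(by_version)


flows = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "192.0.2.7"),
    ("2001:db8::1", "10.0.0.2"),
]
total, kinds = count_unique_addrs(flows)
print(total, kinds)
```

With hundreds of millions of distinct addresses, a table like this can outgrow
physical memory even though each individual entry is small.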

My guess is that you must be short on physical memory, and the programs are swapping,
which means that everything on this machine will be going very slowly.
Run " top " to see if one of our programs is eating all the memory, or
use vmstat() or vm_stat() or whatever to see if there is any paging.
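
If you want to script the paging check, here is a minimal sketch that assumes
a Linux machine, where /proc/vmstat exposes cumulative "pswpin" and "pswpout"
page counters. The helper names are hypothetical, and the two samples are
made-up text standing in for two reads of that file taken a few seconds apart.

```python
def swap_counters(vmstat_text):
    """Return (pages swapped in, pages swapped out) from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each line is expected to look like "name value"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def is_swapping(before, after):
    """True if either cumulative swap counter grew between the two samples."""
    return after[0] > before[0] or after[1] > before[1]


# Made-up samples; on a real Linux box, read /proc/vmstat twice instead.
sample_before = "nr_free_pages 1200\npswpin 100\npswpout 250\n"
sample_after = "nr_free_pages 900\npswpin 100\npswpout 900\n"

before = swap_counters(sample_before)
after = swap_counters(sample_after)
print(before, after, is_swapping(before, after))
```

Counters that keep climbing while the ra* program runs point at swapping
rather than at a problem in the program itself.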

If it is a memory problem, then you will need to subdivide the data based
on size, not on time, using rasplit().  And yes, it's easy to merge split files
back into a single file.

UNFORTUNATELY, because the scope of anonymization is the file, anonymizing a
single big file of records will generate different results compared to
anonymizing a set of split files created from the big file.  Address A will be
anonymized potentially to a different address in each file.
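
To make the file-scope issue concrete, here is a toy Python sketch. It is a
simplified model of sequential substitution, not ranonymize()'s actual
algorithm; the class name, the 10.0.0.0 base and the sample addresses are all
assumptions for illustration.

```python
import ipaddress


class SequentialAnonymizer:
    """Toy mapper: the Nth new address seen gets the Nth substitute address."""

    def __init__(self, base="10.0.0.0"):
        self._base = int(ipaddress.IPv4Address(base))
        self._table = {}  # real address -> substitute, scoped to this instance

    def anonymize(self, addr):
        if addr not in self._table:
            offset = len(self._table) + 1
            self._table[addr] = str(ipaddress.IPv4Address(self._base + offset))
        return self._table[addr]


big_file = ["192.0.2.10", "198.51.100.5", "192.0.2.10",
            "203.0.113.9", "198.51.100.5"]

# One pass over the whole file: a single table, so one mapping per address.
whole = SequentialAnonymizer()
whole_map = {a: whole.anonymize(a) for a in big_file}

# Split the file in two and start each part with a fresh, empty table.
split_maps = []
for part in (big_file[:2], big_file[2:]):
    anon = SequentialAnonymizer()
    split_maps.append({a: anon.anonymize(a) for a in part})

print(whole_map["198.51.100.5"])      # mapping from the single big file
print(split_maps[0]["198.51.100.5"])  # mapping in the first split file
print(split_maps[1]["198.51.100.5"])  # mapping in the second split file
```

In this toy, 198.51.100.5 comes out as 10.0.0.2 in the first part and 10.0.0.3
in the second, because each part assigns substitutes in its own order of first
appearance. Reusing one table across all the parts would keep the mapping
consistent.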

The configuration provides the means to get consistent results between files,
but it's a bit of work to do so.

Do you think you're running out of memory?

Carter



On Sep 3, 2013, at 1:11 PM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com> wrote:

> On Tue, Sep 3, 2013 at 8:49 AM, Carter Bullard <carter at qosient.com> wrote:
>> Hmmm, if racount() takes 18min, I would think ranonymize() should take about 20min
>> to complete.   You can run " racount -M addr " to get racount() to printout address
>> information, like how many addresses are in the file.
>> 
> 
> Carter, I ran racount with -M addr, but the process hasn't finished
> yet (it's been running for about 90 min now). I'll let it run for a
> while longer and keep you updated.
> 
>> ranonymize() works on a single argus record at a time, reading a single record,
>> anonymizing all the various data elements, and then writing the anonymized
>> record out to the output file.  If ranonymize() hasn't written out a record recently,
>> then it's possible that it's in an infinite loop, especially if it's running at 100%, and
>> it's been running for a month, and it seems to have stopped writing into the file.
>> What was the last " modified " time on your output file ???
>> 
> 
> It hasn't stopped writing to file .. the last modified time is right
> now, since the process is still running.
> 
> 
>> If you've compiled debug support into your ra* programs, you can send a USR1
>> signal to the running ranonymize() and it will start writing debug information out
>> to stderr().  Send a USR2 to turn debug output off.  Assuming that ranonymize()'s
>> process id is 35122, you can do this:
>> 
>>   % kill -USR1 35122
>>   % kill -USR2 35122
>> 
>> If you've compiled development support into your programs, you can attach
>> to ranonymize() using gdb(), and then step through the program to see where
>> it is.
>> 
> 
> I haven't compiled my ra* programs with debug or development support.
> If you can tell me what I need to change in the Makefiles, I can do so
> and run ranonymize with gdb and see what's happening.
> 
> Kaustubh
> 
>>   % gdb ranonymize 35122
>> 
>> This will attach to the program, and stop the active process.  If this all seems
>> unfamiliar, send more email, and I'll walk you through one of these strategies.
>> 
>> Carter
>> 
>> 
>> On Sep 3, 2013, at 9:56 AM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com> wrote:
>> 
>>> On Tue, Sep 3, 2013 at 7:19 AM, Kaustubh Gadkari
>>> <kaustubh.gadkari at gmail.com> wrote:
>>>> On Tue, Sep 3, 2013 at 6:00 AM, Carter Bullard <carter at qosient.com> wrote:
>>>>> Hmmmm,
>>>>> There shouldn't be any performance issues with anonymizing a file, if you're
>>>>> just anonymizing the IP addresses.  How many addresses are in the file?
>>>>> What does your ranonymize.conf file look like?   How much memory is it using?
>>>>> 
>>>> 
>>>> I am not quite sure how many IP addresses there are in the file. My
>>>> ranonymize.conf looks like this:
>>>> 
>>>> RANON_PRESERVE_ETHERNET_VENDOR=yes
>>>> RANON_PRESERVE_BROADCAST_ADDRESS=yes
>>>> RANON_NET_ANONYMIZATION=sequential
>>>> RANON_HOST_ANONYMIZATION=sequential
>>>> RANON_PRESERVE_NET_ADDRESS_HIERARCHY=class
>>>> 
>>>> I took a look at how much memory ranonymize is using .. the usage is
>>>> about 42% on a machine with 32GB RAM.
>>>> 
>>>>> ranonymize() can be a little complex O(nLogN + C), but it should be
>>>>> in the same time frame as racount().  How long does it take for racount()
>>>>> to read the file?
>>>>> 
>>>> 
>>>> I am running racount right now .. I will post results once it finishes.
>>> 
>>> racount takes about 18min to run on the file:
>>> 
>>> real    17m58.528s
>>> user    17m12.413s
>>> sys     2m0.332s
>>> 
>>> Kaustubh
>>> 
>>>>> Just a rule of thumb. If a ra* program doesn't complete in a few minutes, you
>>>>> should stop it and try to figure out if there is a memory problem or not.
>>>>> 
>>>> 
>>>> Thanks, I'll keep this in mind :)
>>>> 
>>>> Thanks,
>>>> Kaustubh
>>>> 
>>>>> Carter
>>>>> 
>>>>> On Sep 2, 2013, at 2:20 PM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have a set of argus flow data captured at our data capture vantage point,
>>>>> and I want to anonymize the IP addresses (both source and destination) fully,
>>>>> i.e. I want to replace both addresses using a prefix-preserving
>>>>> technique. I have tried using ranonymize, but it is taking an extremely long
>>>>> time to anonymize the file (I started the process a couple of months ago, on
>>>>> a ~125GB file, and the output file size today is only ~30GB).
>>>>> 
>>>>> Can anyone suggest the right way to go about anonymizing the data set I
>>>>> have? Is ranonymize the right tool for the job?
>>>>> 
>>>>> Thanks,
>>>>> Kaustubh
>>>>> 
>>>>> --
>>>>> Kaustubh Gadkari
