Anonymization of argus flow data

Carter Bullard carter at qosient.com
Tue Sep 3 10:49:57 EDT 2013


Hmmm, if racount() takes 18min, I would think ranonymize() should take about 20min
to complete.   You can run " racount -M addr " to get racount() to print out address
information, such as how many addresses are in the file.

ranonymize() works on a single argus record at a time, reading a single record,
anonymizing all the various data elements, and then writing the anonymized
record out to the output file.  If ranonymize() hasn't written out a record recently,
then it's possible that it's in an infinite loop, especially if it's running at 100% CPU,
it's been running for a month, and it seems to have stopped writing into the file.
What was the last "modified" time on your output file?
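
One way to check the modified time from the shell is sketched below.  The helper
name seconds_since_modified and the file name anon.out are made up for
illustration, and the sketch assumes that either GNU or BSD stat is available.

```shell
# Print how many seconds ago a file was last modified.
# seconds_since_modified is a hypothetical helper, not part of argus.
seconds_since_modified() {
    file=$1
    # GNU stat uses -c %Y; BSD/macOS stat uses -f %m.
    mtime=$(stat -c %Y "$file" 2>/dev/null) || mtime=$(stat -f %m "$file") || return 1
    now=$(date +%s)
    echo $((now - mtime))
}

# Example: anon.out stands in for the ranonymize output file.
# seconds_since_modified anon.out
```

If the number keeps growing between runs while the process sits at 100% CPU,
that would suggest ranonymize() is no longer writing records.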

If you've compiled debug support into your ra* programs, you can send a USR1
signal to the running ranonymize() and it will start writing debug information out
to stderr.  Send a USR2 to turn debug output off.  Assuming that ranonymize()'s
process id is 35122, you can do this:

   % kill -USR1 35122
   % kill -USR2 35122
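
If you don't know the process id, something like the following could look it up
by name first.  This is only a sketch: signal_by_name is a made-up helper, and it
assumes pgrep is installed on your system.

```shell
# Send a signal to every process whose name matches exactly.
# signal_by_name is a hypothetical helper, not part of argus.
signal_by_name() {
    sig=$1
    name=$2
    # pgrep -x matches the process name exactly and prints one pid per line.
    pids=$(pgrep -x "$name") || { echo "no $name process found" >&2; return 1; }
    for pid in $pids; do
        kill -"$sig" "$pid"
    done
}

# signal_by_name USR1 ranonymize   # debug output on
# signal_by_name USR2 ranonymize   # debug output off
```

Keep in mind that if debug support was not compiled in, the program may not
handle these signals at all, and the default action for an unhandled USR1 or
USR2 is to terminate the process.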

If you've compiled development support into your programs, you can attach
to ranonymize() using gdb(), and then step through the program to see where
it is.

   % gdb ranonymize 35122

This will attach to the program, and stop the active process.  If this all seems
unfamiliar, send more email, and I'll walk you through one of these strategies.
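
If stepping through interactively is more than you need, a non-interactive
stack trace may be enough to see where the program is.  The sketch below assumes
gdb is installed and that you have permission to attach to the process;
stack_snapshot is a made-up helper name.

```shell
# Print one stack trace of a running process, then detach again.
# stack_snapshot is a hypothetical helper, not part of argus.
stack_snapshot() {
    pid=$1
    # -p attaches to the pid, -ex "bt" prints a backtrace,
    # and -batch makes gdb exit (and detach) once the command has run.
    gdb -p "$pid" -batch -ex "bt"
}

# stack_snapshot 35122
```

Taking a few snapshots some seconds apart and comparing them can hint at
whether the process is stuck in the same routine each time.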

Carter 


On Sep 3, 2013, at 9:56 AM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com> wrote:

> On Tue, Sep 3, 2013 at 7:19 AM, Kaustubh Gadkari
> <kaustubh.gadkari at gmail.com> wrote:
>> On Tue, Sep 3, 2013 at 6:00 AM, Carter Bullard <carter at qosient.com> wrote:
>>> Hmmmm,
>>> There shouldn't be any performance issues with anonymizing a file, if you're
>>> just anonymizing the IP addresses.  How many addresses are in the file?
>>> What does your ranonymize.conf file look like?   How much memory is it using?
>>> 
>> 
>> I am not quite sure how many IP addresses there are in the file. My
>> ranonymize.conf looks like this:
>> 
>> RANON_PRESERVE_ETHERNET_VENDOR=yes
>> RANON_PRESERVE_BROADCAST_ADDRESS=yes
>> RANON_NET_ANONYMIZATION=sequential
>> RANON_HOST_ANONYMIZATION=sequential
>> RANON_PRESERVE_NET_ADDRESS_HIERARCHY=class
>> 
>> I took a look at how much memory ranonymize is using .. the usage is
>> about 42% on a machine with 32GB RAM.
>> 
>>> ranonymize() can be a little complex O(nLogN + C), but it should be
>>> in the same time frame as racount().  How long does it take for racount()
>>> to read the file?
>>> 
>> 
>> I am running racount right now .. I will post results once it finishes.
> 
> racount takes about 18min to run on the file:
> 
> real    17m58.528s
> user    17m12.413s
> sys     2m0.332s
> 
> Kaustubh
> 
>>> Just a rule of thumb. If a ra* program doesn't complete in a few minutes, you
>>> should stop it and try to figure out if there is a memory problem or not.
>>> 
>> 
>> Thanks, I'll keep this in mind :)
>> 
>> Thanks,
>> Kaustubh
>> 
>>> Carter
>>> 
>>> On Sep 2, 2013, at 2:20 PM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com>
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> I have a set of argus flow data captured at our data capture vantage point,
>>> and I want to anonymize the IP addresses (both source and destination) fully
>>> i.e. I want to replace both the addresses, using a prefix preserving
>>> technique. I have tried using ranonymize, but it is taking an extremely long
>>> time to anonymize the file (I started the process a couple of months ago, on
>>> a ~125GB file, and the output file size today is only ~30GB).
>>> 
>>> Can anyone suggest the right way to go about anonymizing the data set I
>>> have? Is ranonymize the right tool for the job?
>>> 
>>> Thanks,
>>> Kaustubh
>>> 
>>> --
>>> Kaustubh Gadkari
