Anonymization of argus flow data
Kaustubh Gadkari
kaustubh.gadkari at gmail.com
Tue Sep 3 13:11:15 EDT 2013
On Tue, Sep 3, 2013 at 8:49 AM, Carter Bullard <carter at qosient.com> wrote:
> Hmmm, if racount() takes 18min, I would think ranonymize() should take about 20min
> to complete. You can run "racount -M addr" to get racount() to print out address
> information, such as how many addresses are in the file.
>
Carter, I ran racount with -M addr, but the process hasn't finished
yet (it's been running for about 90 min now). I'll let it run for a
while longer and keep you updated.
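
In the meantime, a rough way to get a distinct-address count outside of
racount is to export the two address columns as text and count them in a
small script. The sketch below is only an illustration: it assumes
whitespace-separated "source destination" pairs, one flow per line (for
example from ra's -s field selection), and the function name
count_distinct_addresses is made up for this example.

```python
from typing import Iterable


def count_distinct_addresses(lines: Iterable[str]) -> int:
    """Count distinct addresses in lines of 'saddr daddr' text.

    Assumed input format (not produced by this script): one flow per
    line, source and destination address separated by whitespace.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seen.update(fields[:2])  # source and destination go into one set
    return len(seen)
```

Passing sys.stdin as the argument lets it read from a pipe. Any column
header line in the export would be counted as two extra addresses, so it
should be stripped first. The set holds every distinct address in memory,
so this only gives a feel for the number, not a fast path.
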
> ranonymize() works on a single argus record at a time, reading a single record,
> anonymizing all the various data elements, and then writing the anonymized
> record out to the output file. If ranonymize() hasn't written out a record recently,
> then it's possible that it's in an infinite loop, especially if it's running at 100% CPU, and
> it's been running for a month, and it seems to have stopped writing into the file.
> What was the last "modified" time on your output file?
>
It hasn't stopped writing to the file; the last modified time is right
now, since the process is still running.
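
To tell "slow" from "stuck", the size of the output file can be sampled
twice and compared, which says a bit more than the modification time
alone. A minimal sketch; the function names are placeholders for this
example and have nothing to do with argus itself.

```python
import os
import time
from typing import Tuple


def sample_size(path: str) -> Tuple[float, int]:
    """Return (current time in seconds, file size in bytes) for path."""
    return time.time(), os.stat(path).st_size


def growth_rate(first: Tuple[float, int], second: Tuple[float, int]) -> float:
    """Bytes per second written between two samples (0.0 if no time passed)."""
    elapsed = second[0] - first[0]
    if elapsed <= 0:
        return 0.0
    return (second[1] - first[1]) / elapsed
```

Taking one sample, waiting a minute or so, taking another, and passing
both to growth_rate gives a bytes-per-second figure. Output may be
buffered, so a short interval can show zero even when the program is
making progress; a rate of zero over several minutes would point more
towards the stuck case described above.
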
> If you've compiled debug support into your ra* programs, you can send a USR1
> signal to the running ranonymize() and it will start writing debug information out
> to stderr. Send a USR2 to turn debug output off. Assuming that ranonymize()'s
> process id is 35122, you can do this:
>
> % kill -USR1 35122
> % kill -USR2 35122
>
> If you've compiled development support into your programs, you can attach
> to ranonymize() using gdb(), and then step through the program to see where
> it is.
>
I haven't compiled my ra* programs with debug or development support.
If you can tell me what I need to change in the Makefiles, I can do so
and run ranonymize with gdb and see what's happening.
Kaustubh
> % gdb ranonymize 35122
>
> This will attach to the program, and stop the active process. If this all seems
> unfamiliar, send more email, and I'll walk you through one of these strategies.
>
> Carter
>
>
> On Sep 3, 2013, at 9:56 AM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com> wrote:
>
>> On Tue, Sep 3, 2013 at 7:19 AM, Kaustubh Gadkari
>> <kaustubh.gadkari at gmail.com> wrote:
>>> On Tue, Sep 3, 2013 at 6:00 AM, Carter Bullard <carter at qosient.com> wrote:
>>>> Hmmmm,
>>>> There shouldn't be any performance issues with anonymizing a file, if you're
>>>> just anonymizing the IP addresses. How many addresses are in the file?
>>>> What does your ranonymize.conf file look like? How much memory is it
>>>> using?
>>>>
>>>
>>> I am not quite sure how many IP addresses there are in the file. My
>>> ranonymize.conf looks like this:
>>>
>>> RANON_PRESERVE_ETHERNET_VENDOR=yes
>>> RANON_PRESERVE_BROADCAST_ADDRESS=yes
>>> RANON_NET_ANONYMIZATION=sequential
>>> RANON_HOST_ANONYMIZATION=sequential
>>> RANON_PRESERVE_NET_ADDRESS_HIERARCHY=class
>>>
>>> I took a look at how much memory ranonymize is using: the usage is
>>> about 42% on a machine with 32GB RAM.
>>>
>>>> ranonymize() can be a little complex, O(n log n + C), but it should be
>>>> in the same time frame as racount(). How long does it take for racount()
>>>> to read the file?
>>>>
>>>
>>> I am running racount right now; I will post results once it finishes.
>>
>> racount takes about 18min to run on the file:
>>
>> real 17m58.528s
>> user 17m12.413s
>> sys 2m0.332s
>>
>> Kaustubh
>>
>>>> Just a rule of thumb. If a ra* program doesn't complete in a few minutes,
>>>> you should stop it and try to figure out if there is a memory problem or not.
>>>>
>>>
>>> Thanks, I'll keep this in mind :)
>>>
>>> Thanks,
>>> Kaustubh
>>>
>>>> Carter
>>>>
>>>> On Sep 2, 2013, at 2:20 PM, Kaustubh Gadkari <kaustubh.gadkari at gmail.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a set of argus flow data captured at our data capture vantage point,
>>>> and I want to anonymize the IP addresses (both source and destination) fully,
>>>> i.e. I want to replace both addresses, using a prefix-preserving
>>>> technique. I have tried using ranonymize, but it is taking an extremely long
>>>> time to anonymize the file (I started the process a couple of months ago, on
>>>> a ~125GB file, and the output file size today is only ~30GB).
>>>>
>>>> Can anyone suggest the right way to go about anonymizing the data set I
>>>> have? Is ranonymize the right tool for the job?
>>>>
>>>> Thanks,
>>>> Kaustubh
>>>>
>>>> --
>>>> Kaustubh Gadkari
--
Kaustubh Gadkari