ranonymize too slow?

Christos Papadopoulos christos at cs.colostate.edu
Tue Dec 2 09:14:35 EST 2014


Hi Jesper,

Thanks for your suggestion.

We thought of that, but it is important to be able to track IP addresses 
along the whole span of the file. The file contains 24 hours' worth of data 
to begin with.

Ideally we would like to anonymize a month's worth of data with the same 
key so we get a consistent mapping, but we'll cross that bridge when we 
get to it.

Christos.
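
[A note on the consistent-mapping requirement above: one way to get the
same pseudonym for an address across separately processed chunks is a
keyed, stateless mapping. The sketch below is only an illustration in
Python and is NOT how ranonymize works; the function name
anonymize_ipv4 and the sample key are hypothetical. It assumes that
prefix preservation is not needed and that occasional collisions
(expected when tens of millions of addresses are squeezed into 32 bits)
are acceptable.]

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonymous IPv4 address.

    Hypothetical helper: HMAC-SHA256 of the packed address, truncated
    to 4 bytes. The mapping depends only on (key, addr), so no lookup
    table has to be carried from one file to the next.
    """
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Truncation to 32 bits means distinct inputs can collide.
    return str(ipaddress.IPv4Address(digest[:4]))


if __name__ == "__main__":
    key = b"example-key-kept-secret"  # assumption: one key shared by all runs
    hour_1 = ["10.0.0.1", "192.0.2.7"]
    hour_2 = ["192.0.2.7", "198.51.100.3"]
    # Each chunk is processed independently, yet 192.0.2.7 gets the
    # same pseudonym in both because the key is the same.
    for chunk in (hour_1, hour_2):
        print([anonymize_ipv4(a, key) for a in chunk])
```

[With a mapping of this kind, splitting the input into hourly files and
anonymizing a month of data with one key would both give consistent
results, at the cost of losing any subnet structure in the output.]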

On 12/02/2014 02:52 AM, Jesper Skou Jensen wrote:
> Is it important to be able to track the same IP across the entire
> log-span? If not I would suggest that you split the input file into
> smaller bites and then ranonymize those bites one at a time.
>
> Depending on how many hours/days the log spans, maybe something like
> splitting it into 1 hour or 1 day logfiles would make it easier for
> ranonymize to handle it?
>
> eg.
> rasplit -M time 1h -r input.ra -w output.ra
>
>
> Regards
> Jesper
>
> On 02-12-2014 08:40, Carter Bullard wrote:
>> Hey Christos,
>> Did you specify a ranonymize.conf file, or are you using all defaults ?
>> You may want to allocate addresses using a different strategy.  Using
>> the default algorithm, the allocation of 55M addresses will take some
>> time, did you get any output at all  ???
>>
>> Carter
>>
>>
>>
>>> On Dec 2, 2014, at 2:38 AM, Christos Papadopoulos
>>> <christos at cs.colostate.edu> wrote:
>>>
>>> Hi Carter,
>>>
>>> We are using the latest version of the client tools.
>>>
>>> After letting it run for 4.5 hours I had to kill it. There are just
>>> under a billion records in the file. When I killed it, this is what I
>>> got. I have no idea how much longer it would run.
>>>
>>> Address Summary
>>>   IPv4 Unicast              src 11411339    dst 43953546
>>>   IPv4 Unicast Private      src 85          dst 353
>>>   IPv4 Unicast Reserved     src 12654028    dst 51692353
>>>   IPv4 Multicast Local      src 0           dst 2
>>>
>>> Christos.
>>>
>>>> On 12/01/2014 11:49 AM, Carter Bullard wrote:
>>>> Hey Christos,
>>>> The primary demand in IP address anonymization is the number of IP
>>>> addresses that need to be anonymized.   So how many addresses are in
>>>> the file ??
>>>>
>>>>    racount -M addr -r big.file
>>>>
>>>> What version of clients are you using ??
>>>> Carter
>>>>
>>>>> On Dec 1, 2014, at 1:14 AM, Christos Papadopoulos
>>>>> <christos at cs.colostate.edu> wrote:
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> I am trying to use ranonymize for some large argus files. This is
>>>>> useful for us because we want to share some argus data with fellow
>>>>> researchers, but anonymize them to protect the innocent.
>>>>>
>>>>> The file I am trying to anonymize is large, about 18GB compressed.
>>>>> As you can imagine, there are millions of flows in there.
>>>>>
>>>>> I only want IP address anonymization, so I turned everything else
>>>>> off in the ranonymize.conf file.
>>>>>
>>>>> Well, ranonymize has been running for almost 3 hours with about
>>>>> 1/20th of the file done. It is using 100% of a CPU, but only 4% of
>>>>> memory in a 32GB machine. Clearly it's not a memory or swap issue.
>>>>>
>>>>> I can't figure out why it's taking so long. I thought it would be
>>>>> almost as fast as reading and writing the file plus some time to
>>>>> compress/decompress and some time for checking the hash for the
>>>>> anonymized addresses.
>>>>>
>>>>> Any idea what's pounding the CPU and slowing it down? I can
>>>>> investigate further by profiling the code, but thought I'd throw the
>>>>> question out there first in case someone else has done it.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Christos.
>>>



