ranonymize too slow?
Carter Bullard
carter at qosient.com
Fri Dec 5 06:27:26 EST 2014
Hey Christos,
If you don’t mind trying a bit of an experiment, could you modify ranonymize.c
in the ./clients directory, changing the value of RaHashSize to something really
big, and rebuild? That may change the run time noticeably.
Try this type of change:
osiris:clients carter$ diff ranonymize.c ranonymize.c.new
50c50
< unsigned int RaHashSize = 1024;
---
> unsigned int RaHashSize = 0x20000;
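
A rough sketch of why a bigger table might help. It assumes, without confirmation from the ranonymize source, that colliding addresses end up in a per-bucket chain that has to be walked, and that addresses spread evenly over the buckets. The helper names are hypothetical and not part of the argus clients. The address count of 55,364,885 is the sum of the IPv4 unicast src and dst counts from the address summary quoted further down.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the argus clients): average number of
 * entries per bucket when `entries` keys are spread evenly over `buckets`
 * hash buckets. */
double avg_chain_length(unsigned long entries, unsigned int buckets)
{
    assert(buckets > 0);               /* a table needs at least one bucket */
    return (double)entries / (double)buckets;
}

/* Print the estimate for one candidate RaHashSize value. */
void report_chain_length(unsigned long entries, unsigned int buckets)
{
    printf("RaHashSize=%u: about %.0f entries per bucket\n",
           buckets, avg_chain_length(entries, buckets));
}
```

Calling report_chain_length(55364885UL, 1024) and report_chain_length(55364885UL, 0x20000) from a main() compares the two sizes. Under these assumptions the average drops from tens of thousands of entries per bucket to a few hundred, so each lookup would have far less to walk.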
> On Dec 5, 2014, at 12:14 PM, Carter Bullard <carter at qosient.com> wrote:
>
> Hey Christos,
> We could be a bit more scientific about this. How much of the file was completed after 3 hours ?
> Did you try cidr/8 and cidr/16 ?? What kind of machine is this running on ???
>
> Carter
>
>> On Dec 5, 2014, at 7:53 AM, Christos Papadopoulos <christos at cs.colostate.edu> wrote:
>>
>> On 12/04/2014 02:54 AM, Carter Bullard wrote:
>>> Hey Christos,
>>> With CIDR/24 address hierarchy preservation, it may be thrashing trying to find an appropriate
>>> CIDR/24 prefix that hasn’t been allocated, when it needs a new one. I suspect that your 55M
>>> addresses are really 55M CIDR/24’s. You may get some real speed up if you go to CIDR/16,
>>> or CIDR/8. If you could try that, just as an experiment, and see if the output is a bit quicker,
>>> I think I can make some changes to improve the allocation.
>>
>> I tried it by changing the config file to CIDR/8. I don't think it made much of a difference. I let the process run for over 3 hours before I had to kill it again. At that point I saw similar progress as before.
>>
>> Sorry!
>>
>> Christos.
>>
>>>
>>> I suspect that you get decent output at first and then it slows down to a crawl, as it's busy
>>> trying to find an address slot that is appropriate for the next CIDR/24. It's a hash collision
>>> and then a search for an open slot, which may not be optimal. It should be easy to thread
>>> out to another processor.
>>>
>>> Carter
>>>
>>>> On Dec 2, 2014, at 2:59 PM, Christos Papadopoulos <christos at cs.colostate.edu> wrote:
>>>>
>>>> On 12/02/2014 12:40 AM, Carter Bullard wrote:
>>>>> Hey Christos,
>>>>> Did you specify a ranonymize.conf file, or are you using all defaults ?
>>>>
>>>> I customized the ranonymize.conf file to anonymize IP addresses only. See below.
>>>>
>>>>> You may want to allocate addresses using a different strategy. Using the default algorithm, the allocation of 55M addresses will take some time. Did you get any output at all ???
>>>>
>>>> I need to use prefix-preserving anonymization, similar to cryptopan. Which algorithm would you suggest?
>>>>
>>>> I do see the output file growing. It just takes a really long time, to the point where it is unusable for our case.
>>>>
>>>> Here are the settings I used. Please let me know if I should change anything. I only need IP addresses anonymized.
>>>>
>>>> RANON_SEED=29384938
>>>> RANON_TRANSREFNUM_OFFSET=no
>>>> RANON_SEQNUM_OFFSET=no
>>>> RANON_TIME_SEC_OFFSET=no
>>>> RANON_TIME_USEC_OFFSET=no
>>>> RANON_ETHERNET_ANONYMIZATION=no
>>>> RANON_PRESERVE_ETHERNET_VENDOR=yes
>>>> RANON_PRESERVE_ETHERNET_BROADCAST=yes
>>>> RANON_PRESERVE_ETHERNET_MULTICAST=yes
>>>>
>>>> RANON_NET_ANONYMIZATION=sequential
>>>> RANON_HOST_ANONYMIZATION=sequential
>>>> RANON_AS_ANONYMIZATION=sequential
>>>> RANON_NETWORK_ADDRESS_LENGTH=24
>>>>
>>>> RANON_PRESERVE_NET_ADDRESS_HIERARCHY=cidr/24
>>>> RANON_PRESERVE_BROADCAST_ADDRESS=yes
>>>> RANON_PRESERVE_MULTICAST_ADDRESS=yes
>>>> RANON_PRESERVE_IP_ID=none
>>>> RANON_PRESERVE_ICMPMAPPED_TTL=yes
>>>> RANON_PRESERVE_IP_TTL=none
>>>> RANON_PRESERVE_IP_TOS=none
>>>> RANON_PRESERVE_WELLKNOWN_PORT_NUMS=yes
>>>> RANON_PRESERVE_REGISTERED_PORT_NUMS=yes
>>>> RANON_PRESERVE_PRIVATE_PORT_NUMS=yes
>>>> RANON_PORT_METHOD=no
>>>>
>>>> Christos.
>>>>
>>>>>
>>>>> Carter
>>>>>
>>>>>
>>>>>
>>>>>> On Dec 2, 2014, at 2:38 AM, Christos Papadopoulos <christos at cs.colostate.edu> wrote:
>>>>>>
>>>>>> Hi Carter,
>>>>>>
>>>>>> We are using the latest version of the client tools.
>>>>>>
>>>>>> After letting it run for 4.5 hours I had to kill it. There are just under a billion records in the file. When I killed it, this is what I got. I have no idea how much longer it would run.
>>>>>>
>>>>>> Address Summary
>>>>>> IPv4 Unicast src 11411339 dst 43953546
>>>>>> IPv4 Unicast Private src 85 dst 353
>>>>>> IPv4 Unicast Reserved src 12654028 dst 51692353
>>>>>> IPv4 Multicast Local src 0 dst 2
>>>>>>
>>>>>> Christos.
>>>>>>
>>>>>>> On 12/01/2014 11:49 AM, Carter Bullard wrote:
>>>>>>> Hey Christos,
>>>>>>> The primary demand in IP address anonymization is the number of IP addresses that need to be anonymized. So how many addresses are in the file ??
>>>>>>>
>>>>>>> racount -M addr -r big.file
>>>>>>>
>>>>>>> What version of clients are you using ??
>>>>>>> Carter
>>>>>>>
>>>>>>>> On Dec 1, 2014, at 1:14 AM, Christos Papadopoulos <christos at cs.colostate.edu> wrote:
>>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> I am trying to use ranonymize for some large argus files. This is useful for us because we want to share some argus data with fellow researchers, but anonymize them to protect the innocent.
>>>>>>>>
>>>>>>>> The file I am trying to anonymize is large, about 18GB compressed. As you can imagine, there are millions of flows in there.
>>>>>>>>
>>>>>>>> I only want IP address anonymization, so I turned everything else off in the ranonymize.conf file.
>>>>>>>>
>>>>>>>> Well, ranonymize has been running for almost 3 hours with about 1/20th of the file done. It is using 100% of a CPU, but only 4% of memory in a 32GB machine. Clearly it's not a memory or swap issue.
>>>>>>>>
>>>>>>>> I can't figure out why it's taking so long. I thought it would be almost as fast as reading and writing the file plus some time to compress/decompress and some time for checking the hash for the anonymized addresses.
>>>>>>>>
>>>>>>>> Any idea what's pounding the CPU and slowing it down? I can investigate further by profiling the code, but thought I'd throw the question out there first in case someone else has done it.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Christos.