ranonymize too slow?
Kaustubh Gadkari
kaustubh at cs.colostate.edu
Sat Dec 6 01:16:13 EST 2014
Hi,
I had kicked off a run of ranonymize with the new hash size. Good news: the code doesn't segfault. Bad news: ranonymize quits with the following error after about 12 minutes.
RaMapNewNetwork: no addresses
RANON_PRESERVE_NET_ADDRESS_HIERARCHY is set to cidr/8 in the config file.
Kaustubh
> On Dec 5, 2014, at 4:38 PM, Christos Papadopoulos <christos at CS.ColoState.EDU> wrote:
>
> Thanks Kaustubh!
>
> Carter, Kaustubh is one of my graduate students and he took it upon himself to look into the problem.
>
> He will do another anonymization run and report back to us with timing results.
>
> This is good progress, thanks all!
>
> Christos.
>
> On 12/05/2014 04:06 PM, Kaustubh Gadkari wrote:
>>
>>
>> On Fri, Dec 5, 2014 at 1:05 PM, Christos Papadopoulos
>> <christos at cs.colostate.edu <mailto:christos at cs.colostate.edu>> wrote:
>>
>> On 12/05/2014 12:41 PM, Carter Bullard wrote:
>>
>> That is about 2500 records per sec. We should be able to do
>> 10-50x that. I have gotten up to 1M rps, but not with open
>> source argus.
>> The hash size change should make a huge difference!!
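For scale, here is a rough worked estimate. It assumes the file holds about 10^9 records (the "nearly 1B" reported later in the thread) and a constant rate throughout. The run time is t = N / r:

$$
\begin{aligned}
r = 2.5\times10^{3}\ \text{rec/s} &\;\Rightarrow\; t = 4\times10^{5}\ \text{s} \approx 4.6\ \text{days} \\
r = 2.5\times10^{4}\ \text{rec/s}\ (10\times) &\;\Rightarrow\; t = 4\times10^{4}\ \text{s} \approx 11\ \text{h} \\
r = 1.25\times10^{5}\ \text{rec/s}\ (50\times) &\;\Rightarrow\; t = 8\times10^{3}\ \text{s} \approx 2.2\ \text{h} \\
r = 10^{6}\ \text{rec/s} &\;\Rightarrow\; t = 10^{3}\ \text{s} \approx 17\ \text{min}
\end{aligned}
$$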
>>
>>
>> With the new hash, ranonymize produces 37,615 records in just over a
>> second and then promptly crashes with both cidr/8 and cidr/16.
>>
>>
>> I think I fixed the segfault issue. The patch is simple:
>>
>> kaustubh at proton:~/argus-clients-3.0.8/clients$ diff ranonymize.c ranonymize.c.new
>> 1454c1454
>> < int RaMapHash = 0;
>> ---
>> > unsigned int RaMapHash = 0;
>>
>> Kaustubh
>>
>>
>> If you have some quick suggestions I can try them, else it will take
>> some time to dig deeper.
>>
>> Christos.
>>
>>
>>
>> Carter
>>
>> On Dec 5, 2014, at 2:54 PM, Christos Papadopoulos
>> <christos at cs.colostate.edu
>> <mailto:christos at cs.colostate.edu>> wrote:
>>
>> Hi Carter,
>>
>> You are right, my apologies.
>>
>> With cidr/8 after three hours it anonymized about 8.8M
>> records out of the nearly 1B records in the file. I counted
>> this by running wc on the output file, which is a text file.
>>
>> The machine is a Dell Poweredge 2950, 3GHz Xeon with 8
>> cores, 32GB of RAM and about 30TB of directly attached
>> storage, running 64bit CentOS 6.6.
>>
>> I will try running it with cidr/16 and also with the change
>> in the hash function you suggested in your other message.
>>
>> Thanks for your help!
>>
>> Christos.
>>
>> On 12/05/2014 04:14 AM, Carter Bullard wrote:
>> Hey Christos,
>> We could be a bit more scientific about this. How much
>> of the file was completed after 3 hours ?
>> Did you try cidr/8 and cidr/16 ?? What kind of machine
>> is this running on ???
>>
>> Carter
>>
>> On Dec 5, 2014, at 7:53 AM, Christos Papadopoulos
>> <christos at cs.colostate.edu
>> <mailto:christos at cs.colostate.edu>> wrote:
>>
>> On 12/04/2014 02:54 AM, Carter Bullard wrote:
>>
>> Hey Christos,
>> With CIDR/24 address hierarchy preservation, it
>> may be thrashing trying to find an appropriate
>> CIDR/24 prefix that hasn’t been allocated, when
>> it needs a new one. I suspect that your 55M
>> addresses are really 55M CIDR/24’s. You may get
>> some real speed up if you go to CIDR/16,
>> or CIDR/8. If you could try that, just as an
>> experiment, and see if the output is a bit quicker,
>> I think I can make some changes to improve the
>> allocation.
>>
>>
>> I tried it by changing the config file to CIDR/8. I
>> don't think it made much of a difference. I let the
>> process run for over 3 hours before I had to kill it
>> again. At that point I saw similar progress as before.
>>
>> Sorry!
>>
>> Christos.
>>
>>
>> I suspect that you get decent output at first
>> and then it slows down to a crawl, as it's busy
>> trying to find an address slot that is
>> appropriate for the next CIDR/24. It's a hash
>> collision and then a search for an open slot,
>> which may not be optimal. It should be easy to
>> thread that out to another processor.
>>
>> Carter
>>
>> On Dec 2, 2014, at 2:59 PM, Christos
>> Papadopoulos <christos at cs.colostate.edu
>> <mailto:christos at cs.colostate.edu>> wrote:
>>
>> On 12/02/2014 12:40 AM, Carter Bullard wrote:
>>
>> Hey Christos,
>> Did you specify a ranonymize.conf file,
>> or are you using all defaults ?
>>
>>
>> I customized the ranonymize.conf file to
>> anonymize IP addresses only. See below.
>>
>> You may want to allocate addresses using
>> a different strategy. Using the default
>> algorithm, the allocation of 55M
>> addresses will take some time. Did you
>> get any output at all ???
>>
>>
>> I need to use prefix-preserving
>> anonymization, similar to Crypto-PAn. Which
>> algorithm would you suggest?
>>
>> I do see the output file growing. It just
>> takes a really long time, to the point where
>> it is unusable for our case.
>>
>> Here are the settings I used. Please let me
>> know if I should change anything. I only
>> need IP addresses anonymized:
>>
>> RANON_SEED=29384938
>> RANON_TRANSREFNUM_OFFSET=no
>> RANON_SEQNUM_OFFSET=no
>> RANON_TIME_SEC_OFFSET=no
>> RANON_TIME_USEC_OFFSET=no
>> RANON_ETHERNET_ANONYMIZATION=no
>> RANON_PRESERVE_ETHERNET_VENDOR=yes
>> RANON_PRESERVE_ETHERNET_BROADCAST=yes
>> RANON_PRESERVE_ETHERNET_MULTICAST=yes
>>
>> RANON_NET_ANONYMIZATION=sequential
>> RANON_HOST_ANONYMIZATION=sequential
>> RANON_AS_ANONYMIZATION=sequential
>> RANON_NETWORK_ADDRESS_LENGTH=24
>>
>> RANON_PRESERVE_NET_ADDRESS_HIERARCHY=cidr/24
>> RANON_PRESERVE_BROADCAST_ADDRESS=yes
>> RANON_PRESERVE_MULTICAST_ADDRESS=yes
>> RANON_PRESERVE_IP_ID=none
>> RANON_PRESERVE_ICMPMAPPED_TTL=yes
>> RANON_PRESERVE_IP_TTL=none
>> RANON_PRESERVE_IP_TOS=none
>> RANON_PRESERVE_WELLKNOWN_PORT_NUMS=yes
>> RANON_PRESERVE_REGISTERED_PORT_NUMS=yes
>> RANON_PRESERVE_PRIVATE_PORT_NUMS=yes
>> RANON_PORT_METHOD=no
>>
>> Christos.
>>
>>
>> Carter
>>
>>
>>
>> On Dec 2, 2014, at 2:38 AM, Christos
>> Papadopoulos
>> <christos at cs.colostate.edu
>> <mailto:christos at cs.colostate.edu>>
>> wrote:
>>
>> Hi Carter,
>>
>> We are using the latest version of
>> the client tools.
>>
>> After letting it run for 4.5 hours I
>> had to kill it. There are just under
>> a billion records in the file. When
>> I killed it, this is what I got. I
>> have no idea how much longer it
>> would run.
>>
>> Address Summary
>>   IPv4 Unicast             src 11411339  dst 43953546
>>   IPv4 Unicast Private     src 85        dst 353
>>   IPv4 Unicast Reserved    src 12654028  dst 51692353
>>   IPv4 Multicast Local     src 0         dst 2
>>
>> Christos.
>>
>> On 12/01/2014 11:49 AM, Carter
>> Bullard wrote:
>> Hey Christos,
>> The primary demand in IP address
>> anonymization is the number of
>> IP addresses that need to be
>> anonymized. So how many
>> addresses are in the file ??
>>
>> racount -M addr -r big.file
>>
>> What version of clients are you
>> using ??
>> Carter
>>
>> On Dec 1, 2014, at 1:14 AM,
>> Christos Papadopoulos
>> <christos at cs.colostate.edu
>> <mailto:christos at cs.colostate.edu>>
>> wrote:
>>
>> Hi folks,
>>
>> I am trying to use
>> ranonymize for some large
>> argus files. This is useful
>> for us because we want to
>> share some argus data with
>> fellow researchers, but
>> anonymize them to protect
>> the innocent.
>>
>> The file I am trying to
>> anonymize is large, about
>> 18GB compressed. As you can
>> imagine, there are millions
>> of flows in there.
>>
>> I only want IP address
>> anonymization, so I turned
>> everything else off in the
>> ranonymize.conf file.
>>
>> Well, ranonymize has been
>> running for almost 3 hours
>> with about 1/20th of the
>> file done. It is using 100%
>> of a CPU, but only 4% of
>> memory in a 32GB machine.
>> Clearly it's not a memory or
>> swap issue.
>>
>> I can't figure out why it's
>> taking so long. I thought it
>> would be almost as fast as
>> reading and writing the file
>> plus some time to
>> compress/decompress and some
>> time for checking the hash
>> for the anonymized addresses.
>>
>> Any idea what's pounding the
>> CPU and slowing it down? I
>> can investigate further by
>> profiling the code, but
>> thought I'd throw the question
>> out there first in case
>> someone else has done it.
>>
>> Thanks!
>>
>> Christos.
>>
>> --
>> Kaustubh Gadkari
More information about the argus mailing list