Ranonymize a subset of IPs

John Gerth gerth at graphics.stanford.edu
Fri Jul 15 17:53:37 EDT 2011


I don't have a lot to add to this discussion other than a couple of pointers to
some papers on related subjects (attached) that I've found interesting. The first
one, by Pang, considers a number of these problems with respect to packet header traces.
The second paper, by Xu, describes a prefix-preserving anonymization scheme with the twist
that it is driven by cryptography.  The folks in the first paper use it for external IPs,
but argue for different rules for internal IPs.
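
To make "prefix-preserving" concrete, here is a minimal sketch in Python. It is
not the construction from Xu's paper: it swaps in HMAC-SHA256 from the standard
library as the keyed function, and the function names and the key are made up
for illustration. Each output bit is the input bit XORed with a keyed bit that
depends only on the bits before it, so two addresses that share a k-bit prefix
still share exactly a k-bit prefix after mapping.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Map an IPv4 address to another one, preserving shared prefix lengths."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The first i bits of the original address (0 when i == 0).
        prefix = value >> (32 - i)
        # Include the prefix length so prefixes of different lengths never collide.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        # One keyed pseudo-random bit derived from the preceding bits only.
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

The mapping is deterministic for a given key, so the same address always
translates the same way across files as long as the key is kept.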

I would also like to say that I've often had the same desire as Huy, that is, to be
able to share data with only the internal IPs mangled, because the main anxiety is not
that some malefactor is trying to break our scheme, but that if someone publishes
research based on that data, it might embarrass one of our users - even
if they are the only one who notices it.  With that as a goal, I believe one only needs
to mangle the IP field and perhaps shift the timestamps (although for that one should
mostly be shifting the date, so as not to disturb the diurnal patterns in the traffic).
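
As a small illustration of the date shift, the sketch below moves a timestamp by
a whole number of days so the time of day is untouched. The function name and
the offset of 42 days are arbitrary choices for the example, and the timestamps
are assumed to be in UTC so that daylight-saving changes don't come into it.

```python
from datetime import datetime, timedelta, timezone


def shift_date(ts: datetime, days: int) -> datetime:
    """Move a timestamp by a whole number of days, keeping its time of day."""
    return ts + timedelta(days=days)


original = datetime(2011, 7, 15, 17, 53, 37, tzinfo=timezone.utc)
shifted = shift_date(original, -42)
```

Choosing a multiple of 7 for the offset would also keep the day of the week, if
weekly patterns matter for the analysis.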

And we have to remain humble here. If the data is coming from one enterprise, unless
you out-and-out randomize all IPs, it's going to be obvious which ones are internal and
which are external.
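
A rough illustration of why: in a single-enterprise trace nearly every flow has
one endpoint inside the enterprise, so the prefix that shows up in almost every
record gives itself away, even after a prefix-preserving mapping. The sketch
below uses a hypothetical helper name and assumes, purely for the example, that
the internal hosts share a /16.

```python
import ipaddress
from collections import Counter


def likely_internal_prefix(flows, prefix_len=16):
    """Return the prefix touching the most flows and the fraction it touches."""
    counts = Counter()
    for src, dst in flows:
        # Count each prefix at most once per flow, even if both ends share it.
        nets = {
            ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
            for addr in (src, dst)
        }
        counts.update(nets)
    prefix, hits = counts.most_common(1)[0]
    return prefix, hits / len(flows)
```

A prefix that covers an endpoint of close to 100% of the flows is almost
certainly the enterprise's own address space.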

/John

On 7/15/2011 12:15 PM, Carter Bullard wrote:
> Hey Huy,
> Need to keep this thread on the mailing list.
> 
> If you could take a look at the current configuration of ranonymize(), it lists all the features that we
> currently support.   The issue is that we don't currently have the concept that half of a flow record's
> information has been modified and the other half has not been changed.  This type of incomplete
> anonymization makes the classic case of reverse engineering the anonymization maps pretty trivial.
> 
> So you anonymize your addresses, keeping the external addresses intact, for example, and
> release the data.  Flow monitors in the external address's network will have their own collected flow
> information, which can be easily correlated with your data (byte count, packet count, their IP address),
> and now they know the translation of at least one of your addresses.
> 
> But I'm not as worried about this as I am interested in the logic and how to approach it.
> It is possible that we can translate a subset of the IP addresses, even providing hierarchy
> preservation, etc., without any coupling to Layer 2 anonymization, port anonymization, etc.
> 
> Carter
> 
> 
> On Jul 15, 2011, at 2:32 PM, Huy N. Hang wrote:
> 
>> Hi Carter!
>>
>> When I made the request, our group at the University of California at Riverside was working on a traffic collection project where we do just that. We would, however, like to release the traffic collection to the public for research purposes. We don't like the idea of anonymizing everything, so we would like to do so only for the information of the hosts we are collecting from, to protect their identities. That is why we only wish to anonymize a range of IPs (one that encompasses our school's prefix) and leave everything else intact. I want to do this because I'd like to see where a person's traffic is going, but I don't want to know who that person is (even the payload has been removed).
>>
>> To answer your question then, we'd be very happy if the new feature could let us explicitly choose which attributes of the hosts we would like to anonymize (taking in a configuration file to do this would be awesome) and leave the other ones untouched. This would give us enough freedom to release the most information without compromising our monitored hosts' privacy.
>>
>> And yes, if we want to preserve a range of IPs, we would like to preserve everything else about them as well.
>>
>> Have I answered your questions?
>>
>> Please tell me if you need me to clarify :)
>>
>> Thanks!
>>
>>
>> On 07/15/2011 10:57 AM, Carter Bullard wrote:
>>> Hey Huy,
>>> I'm starting to consider the implementation of this feature, and it's a little complicated, so I
>>> need to talk about it a bit.
>>>
>>> You have asked that we consider partial stream anonymization, where we would anonymize
>>> some IP addresses and not others.  There are a few "gotchas" to be considered here; the
>>> worst one is where a flow needs one of its IP addresses to be anonymized, but the other
>>> IP is not going to be anonymized.  This is a bit of a brain teaser, but I do like the idea.
>>>
>>> There are a lot of fields to consider when you anonymize a record: time; all network identifiers,
>>> such as ethernet addresses, fragmentation identifiers, TTLs, and DSByte encodings; and transport
>>> identifiers, like port numbers, sequence numbers, etc.  Many of these attributes are attributes
>>> of the host.  So if I preserve a particular IP address, should I preserve all the other host attributes
>>> that apply to that IP address?  When you think about this, it gets interesting, so what were you
>>> really wanting in your request?
>>>
>>> Should I assume that the decision to anonymize anything in a flow record is based on
>>> whether I anonymize one of the IP addresses in the flow?
>>>
>>> Should I anonymize time regardless of whether the IP address is going to be anonymized or not?
>>>
>>> What do you think?
>>>
>>> Carter
>>>
>>> On Jul 3, 2011, at 12:33 PM, Huy N. Hang wrote:
>>>
>>>> Hey Carter,
>>>>
>>>> That would be awesome! :D
>>>>
>>>>> Hey Huy,
>>>>> I'll have to add two directives, I think, to make this convenient:  1) a
>>>>> RANON_PRESERVE_ADDRESS_RANGE directive and 2) a
>>>>> RANON_SPECIFY_ADDRESS_RANGE to override that address range.  This would
>>>>> allow you to anonymize select IP address
>>>>> ranges. That may take a bit of time, but I'll check it out this week.
>>>>>
>>>>> Carter
>>>>>
>>>>> On Jul 2, 2011, at 8:40 PM, Huy N. Hang wrote:
>>>>>
>>>>>> Hi Carter and other gentlefolks,
>>>>>>
>>>>>> I've been tinkering with Ranonymize to explore its options. I've been
>>>>>> getting it to work on most of what I want, so I'm glad, but I have a
>>>>>> quick question:
>>>>>>
>>>>>> Can I force ranonymize to anonymize only a subset of IPs? Namely, can I
>>>>>> provide a list of IPs that I wish to anonymize and leave all other IPs
>>>>>> intact?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> ==================================================
>>>> I swear to all that is holy that one day,
>>>> I shall use Elvish and/or Klingon alphabets
>>>> to name the variables in my research papers!
>>>> Revenge can never be more elegant or sweet!
>>>> ==================================================
>>>> Huy N. Hang, Ph.D. student,
>>>> Department of Computer Science and Engineering.
>>>> U.C. Riverside
>>>> ==================================================
>>>>
>>>>
>>
>>
> 


-- 
John Gerth      gerth at graphics.stanford.edu  Gates 378   (650) 725-3273  fax 723-0033
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PangPaxson06_DevilAnon.pdf
Type: application/pdf
Size: 119057 bytes
Desc: not available
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20110715/a3a0d828/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cryptopan_xu.pdf
Type: application/pdf
Size: 330576 bytes
Desc: not available
URL: <https://pairlist1.pair.net/pipermail/argus/attachments/20110715/a3a0d828/attachment-0001.pdf>

