new ranonymize() tool

Mon Oct 14 15:04:21 EDT 2002

	I expect this will be fine. The case I posted was us trying to see if
we could publish traces from our network while preserving all the timing
relationships to allow traffic research (much like the traces from CAIDA).
Unfortunately, in that case we don't want to change the timing relationships
in the traffic, but given the constraints I don't think that is possible, for
the reasons I posted. For things that don't care so much about the overall
relationship of traffic (i.e. attack signatures that aren't timing sensitive,
or more correctly timing-change sensitive), your anonymizer looks to do the
job just fine and is a valuable addition.

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada

> 
> Hey Peter,
> I should describe how ranonymize() anonymizes IPv4 addresses
> so we can see what kind of problems might exist.  ranonymize()
> provides several methods for address anonymization; I'll describe
> the default one to start out.
> 
> IPv4 addresses are anonymized using a non-cryptographic, Class-based,
> 24-bit prefix-preserving sequential allocation strategy, which
> is not distributable.  So what does this mean?  All IPv4
> addresses are treated as having a 24-bit netmask.  Each unique
> 24-bit network address is translated to a reserved 24-bit network
> address from the same Class, sequentially on a first-come basis.  So a
> Class A address is assigned a reserved Class A network part, and a
> Class B address is assigned a reserved Class B network part.  These
> addresses are allocated sequentially, so that the first Class A address
> encountered in a stream will get 1.0.1, the second will get 1.0.2.
> Class B's start with 100.0.1, and Class C's start with 197.0.1.
> Multicast addresses start with 224.0.1.  You can specify exceptions
> and specific net or complete address translations, so there is some
> flexibility.
> 
> The 8-bit host part is allocated sequentially, starting with 1.
> I've included an example below.
> 
> Once an address has been allocated, any occurrence of that address
> in any part of an argus record in the stream is translated to the
> new anonymized address, using a hashed lookup strategy.
> 
> This approach provides a Class preserving, 24-bit prefix preserving
> anonymizing strategy that is pseudo-random, but not distributable.
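
The default strategy described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not ranonymize()'s actual
code: the class name SequentialAnonymizer, the helper addr_class(), and the
way the counter is spread over the second and third octets are all
hypothetical. Only the starting values (1.0.1, 100.0.1, 197.0.1, 224.0.1) and
the "hosts start at 1" rule come from the description.

```python
import ipaddress

# First octet of the reserved range per class, from the description above.
CLASS_BASE = {"A": 1, "B": 100, "C": 197, "D": 224}


def addr_class(first_octet):
    """Classful category of an address (hypothetical helper)."""
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "D"  # assumption: everything from 224 up is treated as multicast


class SequentialAnonymizer:
    """First-come, sequential, /24-prefix-preserving translation table."""

    def __init__(self):
        self.net_map = {}    # real /24 (3 octets) -> anonymized /24 (3 octets)
        self.host_map = {}   # real address (4 octets) -> anonymized host octet
        self.net_count = {cls: 0 for cls in CLASS_BASE}
        self.host_count = {}  # real /24 -> number of hosts allocated so far

    def anonymize(self, addr):
        octets = tuple(ipaddress.IPv4Address(addr).packed)
        net = octets[:3]
        if net not in self.net_map:
            cls = addr_class(octets[0])
            self.net_count[cls] += 1
            n = self.net_count[cls]
            if n > 0xFFFF:
                raise OverflowError("reserved range exhausted in this sketch")
            # Assumption: the counter fills the second and third octets.
            self.net_map[net] = (CLASS_BASE[cls], n >> 8, n & 0xFF)
            self.host_count[net] = 0
        if octets not in self.host_map:
            self.host_count[net] += 1
            # Hosts are numbered 1, 2, ... and the 256th wraps to 0, so the
            # 256 possible hosts of a /24 still map one-to-one.
            self.host_map[octets] = self.host_count[net] % 256
        anon = self.net_map[net] + (self.host_map[octets],)
        return ".".join(str(o) for o in anon)
```

The dictionaries play the role of the hashed lookup: once an address has an
entry, every later occurrence gets the same translation. The result depends
entirely on arrival order, which is why two independent instances seeing
different streams would not agree on a mapping.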
> 
> Since addresses would arrive in an argus() stream somewhat randomly,
> you get a pseudo-random assignment.  This helps to assure that
> two independent anonymizers using the same algorithms, seeds and
> everything, anonymizing argus data streams from differing parts of
> the network, will not anonymize transactions to the same anonymized
> addresses.  However, for research purposes, this may not be what
> we're looking for, and a keyed version of this should allow us to
> provide distributable anonymization.
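
The message does not say how a keyed version would work, so the following is
only one possible sketch, not Carter's design. It assumes a shared secret key
and uses a small Feistel network with HMAC-SHA256 as the round function to
build a keyed permutation of the 24-bit network part, plus a per-network
permutation of the host octet. The names feistel() and keyed_anonymize() are
hypothetical. Unlike the default method, this sketch does not preserve the
address Class, and it has not been analysed for cryptographic strength.

```python
import hashlib
import hmac
import ipaddress


def feistel(key, value, half_bits, rounds=4):
    """Keyed permutation of a (2 * half_bits)-bit integer (balanced Feistel)."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for rnd in range(rounds):
        msg = bytes([rnd]) + right.to_bytes(2, "big")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        f = int.from_bytes(digest[:2], "big") & mask
        left, right = right, left ^ f
    return (left << half_bits) | right


def keyed_anonymize(key, addr):
    """Map an IPv4 address so that addresses sharing a /24 still share one."""
    value = int(ipaddress.IPv4Address(addr))
    prefix, host = value >> 8, value & 0xFF
    new_prefix = feistel(key, prefix, 12)        # permute the 24-bit network
    host_key = key + prefix.to_bytes(3, "big")   # separate permutation per /24
    new_host = feistel(host_key, host, 4)        # permute the 8-bit host part
    return str(ipaddress.IPv4Address((new_prefix << 8) | new_host))
```

Because the mapping depends only on the key and the address, two anonymizers
holding the same key translate the same address identically regardless of the
order in which they see traffic, and no translation table has to be kept.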
> 
> Simple method, pretty fast.  Uses memory, so a persistent anonymizer
> will grow as its translation table grows.  So what do you think?
> If I gave you the traces below, am I in trouble?  (Byte and packet
> counts are the only metrics not anonymized, and differential stats,
> like transaction duration, are also preserved, so there are opportunities
> for comparison, but if the bad guys are not on the same network, it's
> going to be a challenge to find common transactions, and if they
> break one 24-bit network assignment, they don't get any others.)
> 
> Carter
> 
> 
> [qosient@isis tmp]$ ra !*
>  ra -nr argus.out -p3 -s startime proto saddr sport dir daddr dport status
> 
>          StartTime      Type     SrcAddr      Sport Dir     DstAddr      Dport State
> 2002/10/08.15:59:53.759  tcp   192.168.0.161.1661    ->      66.12.27.73.5190   CON
> 2002/10/08.15:59:54.748  tcp    192.168.0.64.3997    ->   215.92.197.167.110    FIN
> 2002/10/08.15:59:54.760  tcp    192.168.0.64.3999    ->    216.46.170.10.110    FIN
> 2002/10/08.16:00:01.124  tcp    192.168.0.64.4002    ->   236.92.197.167.110    FIN
> 2002/10/08.16:00:35.634  udp    192.168.0.16.1102   <->     149.192.0.38.53     CON
> 2002/10/08.16:00:35.657  tcp   192.168.0.161.1835    ->    66.94.185.200.80     CON
> 2002/10/08.16:00:41.109  tcp   192.168.0.161.1835    ->    66.94.185.200.80     RST
> 2002/10/08.16:00:48.652  tcp   192.168.0.161.1656    ->    62.124.26.194.5190   CON
> 2002/10/08.16:00:51.766  tcp   192.168.0.161.1661    ->      61.12.27.73.5190   CON
> 
> [qosient@isis tmp]$ ranonymize !*
> ranonymize -nr argus.out -p3 -s startime proto saddr sport dir daddr dport status
> 
>          StartTime      Type     SrcAddr      Sport Dir     DstAddr      Dport State
> 1996/05/05.00:41:05.297  tcp       197.0.1.3.13461   ->         1.0.2.1.16990  CON
> 1996/05/05.00:41:06.286  tcp       197.0.1.4.15797   ->       197.0.2.1.110    FIN
> 1996/05/05.00:41:06.297  tcp       197.0.1.4.15799   ->       197.0.3.1.110    FIN
> 1996/05/05.00:41:12.661  tcp       197.0.1.4.15802   ->       197.0.2.1.110    FIN
> 1996/05/05.00:41:47.172  udp       197.0.1.5.12902  <->       100.0.1.1.53     CON
> 1996/05/05.00:41:47.194  tcp       197.0.1.3.13635   ->         1.0.3.1.80     CON
> 1996/05/05.00:41:52.646  tcp       197.0.1.3.13635   ->         1.0.3.1.80     RST
> 1996/05/05.00:42:00.189  tcp       197.0.1.3.13456   ->         1.0.4.1.16990  CON
> 1996/05/05.00:42:03.304  tcp       197.0.1.3.13461   ->         1.0.2.1.16990  CON
> 
> 
> 
> -----Original Message-----
> From: owner-argus-info at lists.andrew.cmu.edu
> [mailto:owner-argus-info at lists.andrew.cmu.edu] On Behalf Of Peter Van
> Epp
> Sent: Thursday, October 10, 2002 4:16 PM
> To: argus
> Subject: Re: new ranonymize() tool
> 
> 
> 	Without (yet) having looked at Carter's new tool, here are some thoughts
> on this subject from a discussion some months ago about putting Argus up
> locally and being able to release the traffic traces for network researchers.
> Note that in this case we want to keep at least destination port numbers, to
> allow researchers to determine what kind of traffic it was, and keep the time
> synchronization (possibly offset by a constant amount to obscure it slightly).
> A later look over the CAIDA web site indicates they don't have a solution
> either; the anonymizer they use is fairly simple and doesn't appear to address
> the issues raised below.
> 
> 	A fly in the anonymous ointment. Unfortunately, I thought about the
> issue of anonymizing trace data on the way back to the hill. It is essentially
> cryptography (we want to encrypt the data but not decrypt it), which is
> unfortunately trivially subject to a chosen plaintext attack which will defeat
> the encryption (and thus the anonymity).
> 	If we postulate the following users: I (innocent victim) and A (scumbag
> attacker), and sites AS (attacker's site), IS (innocent victim's site), P1
> (porno site 1) and P2 (porno site 2), then look at the possibilities in
> anonymized trace data, we find a problem. Assume we have anonymized both IP
> addresses by random translation and shifted time by a fixed amount to try to
> defeat traffic pattern analysis, as we discussed this morning. Unfortunately,
> since we are on a public network, if we assume the attacker can identify the
> victim and determine the IP address the victim is using, then our entire
> scheme can be defeated as follows:
> 
> A pings (logging the current time on machine AS) the victim's machine IS,
> P1, and P2. He may need to ping in an unusual pattern to make the pattern
> stand out in the anonymized logfile. Now the attacker obtains the anonymized
> trace file for the time period described above. By sorting all the data by
> source and dest IP address he can pick out the ping pattern that he initiated
> above. He knows his IP address (and now what his IP address has translated
> into in the anonymous trace; no net gain here). Unfortunately, by the first
> ping made by his machine (whose anonymous ID he now knows) he has identified
> the anonymized IP address of the victim's machine IS. The next 2 pings give
> him the anonymized IP addresses of porn sites P1 and P2. Now a search of the
> trace file for connections from anonymized IS to anonymized P1 and P2 will
> tell the attacker if the victim IP address has accessed the porn sites, which
> is what we are trying to prevent. On the way by (given the time stamps in our
> trace file and the real time from his local log) he has also extracted the
> fixed time offset we used and can trivially convert the trace file back to
> real time. I'm not sure that's deadly, but it does make the time shift idea
> not really useful for defeating traffic analysis attacks.
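
The attack described above can be illustrated with a small sketch. It assumes
the attacker has a log of (real timestamp, real destination) pairs for the
probes he sent, and that the published trace is available as
(timestamp, anonymized source, anonymized destination) tuples. The function
names, the tuple layout and the matching tolerance are all hypothetical, and
a real trace would need more careful matching than this.

```python
from collections import defaultdict


def find_probe_source(records, probe_gaps, tol=0.01):
    """Find the anonymized source whose consecutive records are spaced
    like the attacker's probes. Returns (source, matching records)."""
    by_src = defaultdict(list)
    for ts, src, dst in sorted(records):
        by_src[src].append((ts, dst))
    n = len(probe_gaps) + 1
    for src, flows in by_src.items():
        for i in range(len(flows) - n + 1):
            window = flows[i:i + n]
            gaps = [b[0] - a[0] for a, b in zip(window, window[1:])]
            if all(abs(g - p) <= tol for g, p in zip(gaps, probe_gaps)):
                return src, window
    return None


def recover(records, probe_log, tol=0.01):
    """Recover the attacker's anonymized address, the fixed time offset,
    and the real -> anonymized mapping for every probed destination."""
    gaps = [b[0] - a[0] for a, b in zip(probe_log, probe_log[1:])]
    hit = find_probe_source(records, gaps, tol)
    if hit is None:
        return None
    anon_src, window = hit
    offset = window[0][0] - probe_log[0][0]
    mapping = {real: anon for (_, real), (_, anon) in zip(probe_log, window)}
    return anon_src, offset, mapping
```

With the mapping in hand, checking whether the victim visited P1 or P2 is a
simple filter over the trace for records from mapping["IS"] to mapping["P1"]
or mapping["P2"]. Note that the spacing between probes survives any constant
time shift, which is why the shift gives no protection here.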
> 	This may make an interesting problem for a grad student interested in
> crypto since there may be a solution (although I have a sneaking suspicion
> that, because of the uncontrolled nature of the public net, there isn't ...).
> We should also ask the CAIDA folks how they deal with this problem in their
> traces (or if indeed they have thought of this issue, although I hope they
> have). We do need to make the risk clear to the bosses that have to approve
> this being done. I'm pretty sure Worth was assuming that I meant that the data
> would be anonymous (which I just demonstrated it isn't) when he said he
> thought he could get permission to release our traces. In the end all it may
> mean is that we have to restrict distribution of trace files more than we
> would like (i.e. researchers in I2 and elsewhere may not be deemed safe
> enough ...).
> 	Happy paranoia day :-)
> 
> Peter Van Epp / Operations and Technical Support 
> Simon Fraser University, Burnaby, B.C. Canada
> 
> 
>