new ranonymize() tool
Carter Bullard
carter at qosient.com
Thu Oct 10 20:32:02 EDT 2002
Hey Peter,
It is unclear that anonymized data must be a one-way
function in all conditions in order to be useful. If,
for example, all traffic in the anonymized set is local
traffic, how can there be a threat to the anonymization
strategy when giving data to someone outside the local net?
I think there are a number of constraints that can be
placed on anonymized data that can make it secure. Even
cryptography works only in the presence of constraints,
such as keeping the private key private.
Carter
Carter Bullard
QoSient, LLC
300 E. 56th Street, Suite 18K
New York, New York 10022
carter at qosient.com
Phone +1 212 588-9133
Fax +1 212 588-9134
http://qosient.com
-----Original Message-----
From: owner-argus-info at lists.andrew.cmu.edu
[mailto:owner-argus-info at lists.andrew.cmu.edu] On Behalf Of Peter Van Epp
Sent: Thursday, October 10, 2002 4:16 PM
To: argus
Subject: Re: new ranonymize() tool
Without (yet) having looked at Carter's new tool, here are some thoughts
on this subject from a discussion some months ago about putting Argus up
locally and being able to release the traffic traces to network
researchers. Note that in this case we want to keep at least the
destination port numbers, so researchers can determine what kind of
traffic it was, and keep the time synchronization (possibly offset by a
constant amount to obscure it slightly). A later look over the CAIDA web
site indicates they don't have a solution either: the anonymizer they
use is fairly simple and doesn't appear to address the issues raised
below.
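To make the scheme under discussion concrete, here is a minimal Python
sketch of it: addresses are replaced through a random translation table,
destination ports are kept, and every timestamp is shifted by one
constant offset. The Flow record, its field names, and the Anonymizer
class are hypothetical illustrations; this is not how ranonymize() or
the CAIDA tool is implemented.

```python
import random
from typing import NamedTuple


class Flow(NamedTuple):
    """Simplified flow record; the field names are illustrative only."""
    start: float   # start time, seconds since the epoch
    src: str       # source IP address
    dst: str       # destination IP address
    dport: int     # destination port (kept so traffic type stays visible)


class Anonymizer:
    """Random address translation plus a constant time shift (a sketch)."""

    def __init__(self, time_offset, seed=None):
        self._rng = random.Random(seed)
        self._offset = time_offset
        self._table = {}    # real address -> pseudonym
        self._used = set()  # pseudonyms already handed out

    def _pseudonym(self, addr):
        # The same real address always maps to the same pseudonym, so
        # flows between the same hosts remain linkable in the output.
        if addr not in self._table:
            while True:
                octets = tuple(self._rng.randrange(256) for _ in range(3))
                fake = "10.%d.%d.%d" % octets
                if fake not in self._used:
                    break
            self._used.add(fake)
            self._table[addr] = fake
        return self._table[addr]

    def anonymize(self, flow):
        return Flow(flow.start + self._offset,
                    self._pseudonym(flow.src),
                    self._pseudonym(flow.dst),
                    flow.dport)


anon = Anonymizer(time_offset=3600.0, seed=1)
a = anon.anonymize(Flow(1000.0, "192.0.2.1", "198.51.100.7", 80))
b = anon.anonymize(Flow(1005.0, "192.0.2.1", "203.0.113.9", 443))
```

Because the translation is consistent and the shift is constant, the
relative timing and the who-talks-to-whom structure of the trace survive
anonymization. That is what keeps the data useful to researchers.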
A fly in the anonymous ointment. Unfortunately I thought about the
issue of anonymizing trace data on the way back to the hill. It is
essentially cryptography (we want to encrypt the data but not decrypt
it), which is unfortunately trivially subject to a chosen plaintext
attack that will defeat the encryption (and thus the anonymity).
If we postulate the following users, I (innocent victim) and A
(scumbag attacker), and the sites AS (attacker's site), IS (innocent
victim's site), P1 (porn site 1) and P2 (porn site 2), and then look at
the possibilities in anonymized trace data, we find a problem. Assume we
have anonymized both IP addresses by random translation and shifted time
by a fixed amount to try to defeat traffic pattern analysis, as we
discussed this morning. Unfortunately, since we are on a public network,
if we assume the attacker can identify the victim and determine the IP
address the victim is using, then our entire scheme can be defeated as
follows:
A pings (logging the current time on machine AS) the victim's machine
IS, P1, and P2. He may need to ping in an unusual pattern to make the
pattern stand out in the anonymized logfile. Now the attacker obtains
the anonymized trace file for the time period described above. By
sorting all the data by source and destination IP address he can pick
out the ping pattern that he initiated above. He knows his IP address
(and now what his IP address has translated into in the anonymous
trace; no net gain here). Unfortunately, by the first ping made by his
machine (whose anonymous ID he now knows) he has identified the
anonymized IP address of the victim's machine IS. The next two pings
give him the anonymized IP addresses of porn sites P1 and P2. Now a
search of the trace file for connections from anonymized IS to
anonymized P1 and P2 will tell the attacker whether the victim's IP
address has accessed the porn sites, which is what we are trying to
prevent. On the way by (given the time stamps in our trace file and the
real time from his local log) he has also extracted the fixed time
offset we used and can trivially convert the trace file back to real
time.
I'm not sure that's deadly, but it does make the time shift idea not
really useful for defeating traffic analysis attacks.
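The attack described above can be sketched in a few lines of Python.
This is a toy simulation under stated assumptions, not a tool: the
trace is a list of (time, source, destination) tuples with integer
timestamps, the addresses are documentation addresses standing in for
AS, IS, P1 and P2, and the anonymize() and recover() helpers are
hypothetical names. The attacker matches the gaps between his own
probes (7 and 13 seconds here) against the gaps between consecutive
packets from each anonymized source.

```python
import random


def anonymize(trace, offset, seed=None):
    """Randomly translate addresses and shift every timestamp by offset."""
    rng = random.Random(seed)
    addrs = sorted({a for _, s, d in trace for a in (s, d)})
    labels = rng.sample(range(10**6), len(addrs))
    table = {a: "host-%06d" % n for a, n in zip(addrs, labels)}
    return [(t + offset, table[s], table[d]) for t, s, d in trace]


def recover(anon_trace, probes):
    """Find the attacker's probe pattern in the anonymized trace.

    probes is the attacker's own log: (local_time, real_destination)
    pairs in the order sent. Returns (anonymized attacker address,
    {real destination: anonymized destination}, time offset), or None.
    """
    gaps = [b[0] - a[0] for a, b in zip(probes, probes[1:])]
    by_src = {}
    for t, s, d in sorted(anon_trace):
        by_src.setdefault(s, []).append((t, d))
    for src, pkts in by_src.items():
        for i in range(len(pkts) - len(probes) + 1):
            window = pkts[i:i + len(probes)]
            wgaps = [b[0] - a[0] for a, b in zip(window, window[1:])]
            if wgaps == gaps:
                mapping = {real: anon
                           for (_, real), (_, anon) in zip(probes, window)}
                return src, mapping, window[0][0] - probes[0][0]
    return None


AS, IS = "192.0.2.10", "198.51.100.20"
P1, P2 = "203.0.113.1", "203.0.113.2"

trace = [
    (100, IS, P1),               # victim's real traffic
    (160, "198.51.100.99", P2),  # unrelated host
    (200, AS, IS),               # attacker's probes, gaps of 7 and 13 s
    (207, AS, P1),
    (220, AS, P2),
    (300, IS, P2),               # more victim traffic
]
probes = [(200, IS), (207, P1), (220, P2)]  # attacker's local log

released = anonymize(trace, offset=86400, seed=7)
attacker_anon, mapping, offset = recover(released, probes)

# Everything the victim's anonymized address connected to:
visited = {d for _, s, d in released if s == mapping[IS]}
```

In this toy trace, visited ends up containing the anonymized addresses
of both P1 and P2, and offset equals the fixed shift that was applied.
A real trace would need a more distinctive probe pattern and some
tolerance for timing jitter, but the principle is the same.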
This may make an interesting problem for a grad student interested in
crypto, since there may be a solution (although I have a sneaking
suspicion that, because of the uncontrolled nature of the public net,
there isn't ...). We should also ask the CAIDA folks how they deal with
this problem in their traces (or if indeed they have thought of this
issue, although I hope they have). We do need to make the risk clear to
the bosses who have to approve this being done. I'm pretty sure Worth
was assuming that I meant that the data would be anonymous (which I
just demonstrated it isn't) when he said he thought he could get
permission to release our traces. In the end all it may mean is that we
have to restrict distribution of trace files more than we would like
(i.e. researchers in I2 and elsewhere may not be deemed safe
enough ...).
Happy paranoia day :-)
Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada