A user's anonymity is maintained in Tor as long as no single entity can link the user to their destination. If an attacker controls both the entry and the exit of the user's circuit, that anonymity can be compromised, because the attacker can perform traffic or timing analysis to link the user's traffic to the destination. For hidden services, this implies that the attacker needs to control the two entry guards used for the communication between the client and the hidden service. This significantly limits the attacker, as the probability that both the client and the hidden service select a malicious entry guard is much lower than the probability that only one of them makes a bad choice.
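To make the last point concrete, here is a minimal Python sketch of the arithmetic. It assumes a deliberately simplified model that is not taken from the text: the attacker controls a fraction f of the total guard selection weight, and the client and the hidden service pick their guards independently. The function names and the 5% figure are hypothetical, chosen only for illustration.

```python
def p_one_side_bad(f: float) -> float:
    """Chance that a single party (client or hidden service) picks a malicious guard.

    Assumption: f is the fraction of guard selection weight the attacker controls.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return f


def p_both_sides_bad(f: float) -> float:
    """Chance that BOTH parties pick a malicious guard.

    Assumption: the two choices are independent, so the probabilities multiply.
    """
    return p_one_side_bad(f) * p_one_side_bad(f)


if __name__ == "__main__":
    f = 0.05  # hypothetical: attacker controls 5% of guard selection weight
    print(f"one side compromised:   {p_one_side_bad(f):.4f}")
    print(f"both sides compromised: {p_both_sides_bad(f):.4f}")
```

Under these assumptions the two-sided probability is f squared, so for any f below 1 it is smaller than the one-sided probability by a factor of f. That gap is why an attack that needs only one vantage point is so much more practical.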
Our goal is to show that it is possible for a local passive adversary to deanonymize users with hidden service activities without the need to perform end-to-end traffic analysis. We assume that the attacker is able to monitor the traffic between the user and the Tor network. The attacker’s goal is to identify that a user is either operating or connected to a hidden service. In addition, the attacker then aims to identify the hidden service associated with the user.
In order for our attack to work effectively, the attacker needs to be able to extract circuit-level details such as the lifetime, number of incoming and outgoing cells, sequences of packets, and timing information. We discuss the conditions under which our assumptions are true for the case of a network admin/ISP and an entry guard.
Network administrator or ISP: A network administrator (or ISP) may be interested in finding out who is accessing a specific hidden service, or whether a hidden service is being run from the network. Under some conditions, such an attacker can extract circuit-level knowledge from the TCP traces by monitoring all the TCP connections between the user and their entry guards. For example, if only a single active circuit is used in every TCP connection to the guards, the TCP segments can be easily mapped to the corresponding Tor cells. While it is hard to estimate how often this condition happens in the live network, as users have different usage models, we argue that the probability of observing this condition increases over time.
Malicious entry guard: Entry guard status is bestowed upon relays in the Tor network that offer plenty of bandwidth and demonstrate reliable uptime for a few days or weeks. To become one, an attacker only needs to join the network as a relay, keep their head down and wait. The attacker can then focus their efforts to deanonymise users and hidden services on a much smaller amount of traffic. The next step is to observe the traffic and identify what’s going on inside it, something the researchers achieved with a technique called website fingerprinting. Because each web page is different, the network traffic it generates as it’s downloaded is different too. Even if you can’t see the content inside the traffic, you can identify the page from the way it passes through the network, provided you’ve seen it before. Controlling entry guards allows the adversary to perform the attack more realistically and effectively. Entry guards are in a perfect position to perform our traffic analysis attacks since they have full visibility into Tor circuits. In today’s Tor network, each OP (onion proxy, the Tor client software) chooses 3 entry guards and uses them for 45 days on average, after which it switches to other guards. For circuit establishment, those entry guards are chosen with equal probability. Every entry guard thus relays on average 33.3% of a user’s traffic, or 50% of a user’s traffic if one entry guard is down. Note that Tor is currently considering using a single fast entry guard for each user. This would give the attacker even better circuit visibility, which would make our attack more effective. This adversary is shown in the figure below:
The Tor project has responded to the coverage generated by the research with an article of its own, written by Roger Dingledine, Tor’s project leader and one of the project’s original developers. Fingerprinting home pages is all well and good, he suggests, but hidden services aren’t just home pages:
…is their website fingerprinting classifier actually accurate in practice? They consider a world of 1000 front pages, but ahmia.fi and other onion-space crawlers have found millions of pages by looking beyond front pages. Their 2.9% false positive rate becomes enormous in the face of this many pages – and the result is that the vast majority of the classification guesses will be mistakes.
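The base-rate argument in that quote can be checked with a short calculation. The sketch below is only illustrative. The 2.9% false positive rate and the 1000 monitored pages come from the quote. The 88% true positive rate, the total of one million pages, and the assumption that every page is visited equally often are assumptions made here, and the function name is hypothetical.

```python
def classifier_precision(tpr: float, fpr: float, monitored: int, unmonitored: int) -> float:
    """Fraction of 'this is a monitored page' guesses that are actually correct.

    Assumption: each page, monitored or not, is visited equally often,
    so page counts stand in for visit counts.
    """
    true_positives = tpr * monitored        # monitored pages correctly flagged
    false_positives = fpr * unmonitored     # unmonitored pages wrongly flagged
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    monitored = 1_000                  # front pages in the classifier's world (from the quote)
    total_pages = 1_000_000            # assumed stand-in for "millions of pages"
    unmonitored = total_pages - monitored
    precision = classifier_precision(
        tpr=0.88,                      # assumed true positive rate, illustrative only
        fpr=0.029,                     # false positive rate cited in the quote
        monitored=monitored,
        unmonitored=unmonitored,
    )
    print(f"share of guesses that are correct: {precision:.1%}")
```

With these inputs only about 3% of the classifier's positive guesses are correct, which matches Dingledine's point that the vast majority of guesses would be mistakes. The exact figure depends heavily on the assumed page count and visit distribution.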