Researchers Demonstrate New Fingerprinting Attack on Tor Encrypted Traffic
A new analysis of website fingerprinting (WF) attacks against the Tor Browser shows that an adversary can infer which website a victim visits, but only in scenarios where the threat actor is interested in a specific subset of the websites visited by users.
“While attacks can exceed 95% accuracy when monitoring a small set of five popular websites, indiscriminate (non-targeted) attacks against sets of 25 and 100 websites fail to exceed an accuracy of 80% and 60%, respectively,” researchers Giovanni Cherubin, Rob Jansen, and Carmela Troncoso said in a newly published paper.
The Tor Browser offers “un-linkable communication” to its users by routing internet traffic through an overlay network of more than six thousand relays, with the goal of hiding a user’s location and usage from third parties conducting network surveillance or traffic analysis. It achieves this by building a circuit that passes through an entry, a middle, and an exit relay before forwarding the requests to the destination IP addresses.
On top of that, the requests are encrypted once for each relay to further hinder analysis and avoid information leakage. Tor clients are not anonymous to their entry relays, but because the traffic is encrypted and the requests pass through multiple hops, the entry relays cannot identify the clients’ destinations. For the same reason, the exit relays cannot identify the client.
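The layering described above can be illustrated with a small Python sketch. It is a toy model only: the hash-based XOR keystream stands in for real ciphers and is not secure, and the key values, the request bytes, and the names `wrap` and `relay_path` are hypothetical and chosen for illustration. It is not Tor's actual protocol or code.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one encryption layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(request: bytes, hop_keys: list[bytes]) -> bytes:
    """Client side: apply one layer per relay, exit layer first, entry layer last."""
    cell = request
    for key in reversed(hop_keys):
        cell = xor_layer(key, cell)
    return cell

def relay_path(cell: bytes, hop_keys: list[bytes]) -> list[bytes]:
    """Each relay strips its own layer; return what each relay is left holding."""
    views = []
    for key in hop_keys:
        cell = xor_layer(key, cell)
        views.append(cell)
    return views

# Hypothetical per-hop keys, in circuit order: entry, middle, exit.
hop_keys = [b"entry-key", b"middle-key", b"exit-key"]
request = b"GET / HTTP/1.1"

views = relay_path(wrap(request, hop_keys), hop_keys)
for name, view in zip(["entry", "middle", "exit"], views):
    print(name, view == request)
```

In this sketch only the exit relay ends up holding the readable request, while the entry relay, which knows who the client is, sees only data that is still encrypted. The length of the data is unchanged by each layer, which hints at why traffic patterns can still leak information.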
Website fingerprinting attacks on Tor aim to break these anonymity protections by enabling an adversary who observes the encrypted traffic patterns between a victim and the Tor network to predict the website the victim visits. The threat model devised by the academics presupposes an attacker running an exit node, so as to capture the diversity of traffic generated by real users. The attacker uses that node as a source of Tor traffic traces and trains a machine-learning-based classification model on the gathered information to infer users’ website visits.
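To give a rough sense of how a classifier can work on encrypted traffic, here is a minimal Python sketch. It is not the researchers' model: it swaps in a simple nearest-centroid classifier, represents a trace as a list of cell directions (+1 outgoing, -1 incoming), and uses synthetic traces and site labels invented purely for illustration.

```python
from math import dist

def features(trace: list[int]) -> tuple[int, int, int]:
    """Summarize a trace as (total cells, outgoing cells, incoming cells)."""
    outgoing = sum(1 for d in trace if d > 0)
    incoming = sum(1 for d in trace if d < 0)
    return (len(trace), outgoing, incoming)

def train(labelled: list[tuple[str, list[int]]]) -> dict[str, tuple[float, ...]]:
    """Compute the average feature vector (centroid) for each site label."""
    grouped: dict[str, list[tuple[int, int, int]]] = {}
    for site, trace in labelled:
        grouped.setdefault(site, []).append(features(trace))
    return {
        site: tuple(sum(col) / len(vectors) for col in zip(*vectors))
        for site, vectors in grouped.items()
    }

def predict(centroids: dict[str, tuple[float, ...]], trace: list[int]) -> str:
    """Return the site whose centroid is closest to the trace's features."""
    f = features(trace)
    return min(centroids, key=lambda site: dist(centroids[site], f))

# Synthetic training traces (made up): site-a is download-heavy, site-b is chattier.
training = [
    ("site-a", [+1] * 10 + [-1] * 100),
    ("site-a", [+1] * 12 + [-1] * 110),
    ("site-b", [+1] * 30 + [-1] * 40),
    ("site-b", [+1] * 28 + [-1] * 44),
]
centroids = train(training)

unknown = [+1] * 11 + [-1] * 105
print(predict(centroids, unknown))  # prints "site-a"
```

The sketch never looks at payload contents, only at how many cells flow in each direction. With just two synthetic sites the task is trivial, which is loosely in line with the reported finding that accuracy drops as the set of monitored websites grows.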