The discussion of false positives is key to this new paper too: Sambuddho's paper mentions a false positive rate of 6%. That sounds like it means if you see a traffic flow at one side of the Tor network, and you have a set of 100000 flows on the other side and you're trying to find the match, then 6000 of those flows will look like a match. It's easy to see how at scale, this "base rate fallacy" problem could make the attack effectively useless.
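To make that arithmetic concrete, here is a minimal sketch in Python. It uses my reading of "false positive rate" from above, which may not match the paper's definition. The function names and the `true_positive_rate` parameter are hypothetical and introduced only for illustration. The default of 1.0 is a best-case assumption for the attacker, not a number from the paper.

```python
def expected_false_matches(num_flows: int, false_positive_rate: float) -> float:
    """Rough count of flows that wrongly look like a match.

    Treats every flow on the far side as a candidate, as in the
    back-of-the-envelope estimate in the text.
    """
    return num_flows * false_positive_rate


def chance_flagged_flow_is_real(
    num_flows: int,
    false_positive_rate: float,
    true_positive_rate: float = 1.0,  # assumption: attacker never misses the real flow
) -> float:
    """Probability that a flow flagged as a match is the real one.

    Assumes exactly one of the num_flows candidates is the true match.
    """
    expected_true_hits = true_positive_rate
    expected_false_hits = (num_flows - 1) * false_positive_rate
    return expected_true_hits / (expected_true_hits + expected_false_hits)


if __name__ == "__main__":
    flows = 100_000
    fpr = 0.06
    print("expected false matches:", expected_false_matches(flows, fpr))
    print("chance a flagged flow is real:", chance_flagged_flow_is_real(flows, fpr))
```

Even with a perfect true positive rate, the one real match is buried among roughly 6000 false ones, so any single flagged flow has well under a 0.1% chance of being the right one under these assumptions.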
And that high false positive rate is not at all surprising, since he is trying to capture only a summary of the flows at each side and then do the correlation using only those summaries. It would be neat (in a theoretical sense) to learn that it works, but it seems to me that there's a lot of work left here in showing that it would work in practice. It also seems likely that his definition of false positive rate and my use of it above don't line up completely: it would be great if somebody here could work on reconciling them.
For a possibly related case where a series of academic research papers misunderstood the base rate fallacy and came to bad conclusions, see Mike's critique of website fingerprinting attacks plus the follow-up paper from CCS this year confirming that he's right.