When visiting links, trackers from one page may incorrectly be attributed to the link target #333

Open
philipp-classen opened this issue Sep 26, 2023 · 8 comments

@philipp-classen
Member

This has already been reported on the Ghostery Extension (ghostery/ghostery-extension#1241), but it is worth tracking here as well. When following links, trackers present on the source website may be attributed to the target website.

A good example is the Ghostery Search, but it affects other sites as well:
https://whotracks.me/websites/ghosterysearch.com.html

Screenshot taken on 2023-09-26:
ghostery-search

It lists various Google services (including Google Analytics), even though they were never used on the site. The numbers are relatively low (<5%), especially compared to resources that are actually present on the site (e.g. jsDelivr with 95.6%). Still, it breaks some metrics (e.g. the number of trackers present).

Since the bug is on the client side, it is not something that we can expect to fix here (or in the processing pipeline). However, once it is fixed, ghosterysearch.com is a useful test page to verify that the false positives go down.

@philipp-classen
Member Author

Also, stats like "Google trackers are present on 75% of the web traffic" (Oct 2023) are most likely inflated.

@philipp-classen
Member Author

Looks like the stats are affected only on Firefox, and the problem is caused by the webNavigation.onBeforeNavigate listener firing twice (see https://bugzilla.mozilla.org/show_bug.cgi?id=1732564). Although it is an old bug, it used to happen only in rare edge cases; recent architectural changes in Firefox have made it more likely to be hit.

The proper solution is to fix it in the browser, but until then we can apply workarounds that filter out duplicated events (see whotracksme/webextension-packages#58).
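
To illustrate what filtering out duplicated events could look like, here is a minimal sketch. It is not the code from whotracksme/webextension-packages#58; the `NavigationEvent` shape, the `createDuplicateFilter` helper, and the one-second window are assumptions made for this example.

```typescript
// Hypothetical event shape: only the fields used by this sketch are modeled.
interface NavigationEvent {
  tabId: number;
  frameId: number;
  url: string;
  timeStamp: number; // milliseconds
}

// Returns a predicate that reports whether an event repeats the previous
// event seen for the same tab and frame (same URL, within `windowMs`).
// The one-second default is an arbitrary assumption, not a measured value.
function createDuplicateFilter(windowMs = 1000) {
  const lastSeen = new Map<string, NavigationEvent>();

  return function isDuplicate(event: NavigationEvent): boolean {
    const key = `${event.tabId}:${event.frameId}`;
    const previous = lastSeen.get(key);
    lastSeen.set(key, event); // always remember the most recent event
    return (
      previous !== undefined &&
      previous.url === event.url &&
      event.timeStamp - previous.timeStamp <= windowMs
    );
  };
}
```

A listener could call `isDuplicate(details)` first and return early when it yields `true`. Note that a deliberate reload of the same URL within the window would also be dropped, so the window size is a trade-off.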

@ghostwords

Looks like the stats are affected only on Firefox

Are you sure?

  1. Install Ghostery into a fresh Chrome profile
  2. Do the opt in/enable thing
  3. Visit google.com
  4. Visit example.com
  5. See some Google tracker reported on example.com. This doesn't always happen, but if you try a few times, it eventually will.

@philipp-classen
Member Author

@ghostwords Thanks, we have to check that. Could be that there are more paths affected.

@philipp-classen
Member Author

@chrmod I remember you found a problem also with the adblocker, which could explain the behavior on Chrome. Very likely, we have at least one bug left; though it is not clear yet if it also affects the WhoTracks.me collection. (WhoTracks.me is built only on anti-tracking messages, so the question is if the remaining problems are isolated to the UI, or if they also affect anti-tracking messages.)

@philipp-classen
Member Author

philipp-classen commented Jun 17, 2024

Still applies: https://www.ghostery.com/whotracksme/websites/ghosterysearch.com

I'm not sure if it is a reasonable expectation to end up with zero noise. Maybe we should implement thresholds and not count trackers that do not exceed them.
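
A minimal sketch of the threshold idea, assuming per-site tracker statistics are available as a list of reach values. The `TrackerStat` type, the `siteReach` field, and `applyReachThreshold` are hypothetical names for this example and do not describe the actual WhoTracks.me pipeline.

```typescript
// Hypothetical record: share of a site's page loads on which a tracker was seen.
interface TrackerStat {
  tracker: string;
  siteReach: number; // fraction between 0 and 1
}

// Keeps only trackers whose reach strictly exceeds the threshold.
function applyReachThreshold(
  stats: TrackerStat[],
  minReach: number,
): TrackerStat[] {
  return stats.filter((stat) => stat.siteReach > minReach);
}
```

With a threshold of 0.05, entries like the Google services mentioned above (below 5%) would be dropped, while a resource at 95.6% would stay. The downside is that trackers that are genuinely present but rare would be hidden as well.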

We recently changed the processing to consider only data from the latest Ghostery 8 clients and from Ghostery 10. The effects will only become visible in about two months, though (with the July release at the beginning of August).

@philipp-classen
Member Author

philipp-classen commented Sep 11, 2024

We could narrow it down to at least two classes of misattributions:

  1. When starting a navigation through the URL bar, requests from the currently open tab can leak
  2. When doing back-and-forth navigations, requests can leak

We expect 1) to be more common. It could also explain why search engines seem to be especially affected by the problem.

A good way to reproduce is to open a news site, and then use the URL bar to navigate to ghosterysearch.com (either by typing or pasting a link). In ghostery.WTM.webRequestReporter.webRequestPipeline.pageStore.staged, requests to Google domains ended up being associated with the URL https://ghosterysearch.com:
page
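
The first class of misattribution can be illustrated with a toy model. This is not Ghostery's `pageStore`: `ToyPageStore`, its methods, and the `documentId` field are hypothetical and only show how attributing requests by tab ID alone lets a late request from the previous page land on the newly staged one.

```typescript
// Toy model only: all names below are hypothetical.
interface Page {
  url: string;
  documentId: string; // assumed identifier of the document that owns the page
  requests: string[];
}

class ToyPageStore {
  private pages = new Map<number, Page>(); // tabId -> currently staged page

  // Called when a navigation starts: the new page replaces the old one at once.
  stage(tabId: number, url: string, documentId: string): void {
    this.pages.set(tabId, { url, documentId, requests: [] });
  }

  // Attributes a request by tab ID alone. A request that the previous
  // document is still making ends up on the newly staged page.
  recordByTab(tabId: number, requestUrl: string): void {
    this.pages.get(tabId)?.requests.push(requestUrl);
  }

  // Stricter variant: the request must also belong to the staged document.
  // Returns false when the request is rejected.
  recordByDocument(tabId: number, documentId: string, requestUrl: string): boolean {
    const page = this.pages.get(tabId);
    if (page === undefined || page.documentId !== documentId) {
      return false;
    }
    page.requests.push(requestUrl);
    return true;
  }

  get(tabId: number): Page | undefined {
    return this.pages.get(tabId);
  }
}
```

In this model, staging `https://ghosterysearch.com` while the news site in the same tab is still sending requests reproduces the leak with `recordByTab`, whereas `recordByDocument` rejects those requests. Whether a reliable document identifier is available for every request in all supported browsers is an open assumption here.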

@philipp-classen
Member Author

One way to reliably reproduce it with Ghostery 10 in Chrome is as follows:

  • Enable prefetching
  • Go to tiktok.com and let it play a video (the browser will now constantly make requests to TikTok domains)
  • In the URL bar, type ghosterysearch.com and confirm

The result is that some requests will be associated with ghosterysearch.com. In the examples that I have seen, they were all aborted or cached requests:
error
