get headers from a user-agent #286

Open
Kikobeats opened this issue Apr 26, 2024 · 3 comments
Labels: t-tooling (Issues with this label are in the ownership of the tooling team.)

@Kikobeats

Kikobeats commented Apr 26, 2024

Hello,

I love the library, I have been playing with it. It's very complete with lots of data 👏.

I was wondering if it would be possible to get headers from an input user agent instead of relying on a browserslist query.

So this is supported today:

const { HeaderGenerator, PRESETS } = require('header-generator');
const headerGenerator = new HeaderGenerator(PRESETS.MODERN_WINDOWS_CHROME);
console.log(headerGenerator.getHeaders())

and that is what I'm suggesting:

const { HeaderGenerator } = require('header-generator');

const userAgentString = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15';
const headerGenerator = HeaderGenerator.fromUserAgent(userAgentString);

console.log(headerGenerator.getHeaders())

This would be extremely helpful for more granular control when debugging which cases can be detected and which cannot.

@B4nan added the t-tooling label on Apr 29, 2024
@Kikobeats
Author

updated with an example!

@barjin
Collaborator

barjin commented May 2, 2024

Hello @Kikobeats - and thank you for your interest in this project!

All our generated data is based on data collected from real web traffic. Without going into too much detail, we have a (constantly updated) dataset of user fingerprints. These contain the user-agent string as well as more intricate details (screen resolution, total amount of memory installed in the system, etc.).

During the training phase, we take all these attributes and train a Bayesian network on them. Every possible value of any attribute is then expressed as a conditional probability given its "parent" attributes.

Now, this is where the user-agent comes into play. In our Bayesian network, all the fingerprint fields are based on the user-agent field. For example, let's say our training dataset had 5 records in total, 2 with user-agent: 'desktop' and 3 with user-agent: 'mobile'. The other fields are based on those - e.g. for screenResolution, the probability distribution of screen sizes will be skewed towards smaller screens with user-agent: 'mobile'. Every fingerprint combination with non-zero conditional probability must have existed in the training data - this way, we ensure we're generating convincing fingerprints all the time.
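
To make the five-record example concrete, here is a minimal sketch of how such a conditional probability table could be derived by counting. It is only an illustration: the records, the resolution values, and the `buildCpt` helper are made up for this example and are not the library's actual training code.

const records = [
  // Toy training set mirroring the example: 2 desktop records, 3 mobile records.
  // The screen resolutions are invented for illustration.
  { userAgent: 'desktop', screenResolution: '1920x1080' },
  { userAgent: 'desktop', screenResolution: '2560x1440' },
  { userAgent: 'mobile', screenResolution: '390x844' },
  { userAgent: 'mobile', screenResolution: '390x844' },
  { userAgent: 'mobile', screenResolution: '412x915' },
];

// Hypothetical helper: estimate P(child | parent) by counting
// co-occurrences and normalising per parent value.
function buildCpt(rows, parent, child) {
  const counts = {};
  for (const row of rows) {
    const p = row[parent];
    const c = row[child];
    counts[p] = counts[p] || {};
    counts[p][c] = (counts[p][c] || 0) + 1;
  }
  const cpt = {};
  for (const [p, childCounts] of Object.entries(counts)) {
    const total = Object.values(childCounts).reduce((a, b) => a + b, 0);
    cpt[p] = {};
    for (const [c, n] of Object.entries(childCounts)) {
      cpt[p][c] = n / total;
    }
  }
  return cpt;
}

const cpt = buildCpt(records, 'userAgent', 'screenResolution');
console.log(cpt);

In this toy table, a desktop-sized resolution never appears under 'mobile', so that combination has zero probability and can never be generated.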

Because of this, the user-agent strings need to be sampled from our collection of known user-agents. If you were to submit your own free-form user-agent string, it might not be in the conditional probability tables for the other fingerprint fields and the header-generator would not be able to generate the fingerprint.
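
As a rough sketch of that failure mode (again with a made-up table and a hypothetical `sampleResolution` helper, not the header-generator internals), sampling a child field only works when the user-agent is already a key in the table:

// Hypothetical table: P(screenResolution | userAgent), values invented.
const resolutionGivenUa = {
  desktop: { '1920x1080': 0.5, '2560x1440': 0.5 },
  mobile: { '390x844': 2 / 3, '412x915': 1 / 3 },
};

// Draw a screen resolution conditioned on the user-agent.
// `random` is injectable so the behaviour can be checked deterministically.
function sampleResolution(userAgent, random = Math.random) {
  if (!Object.prototype.hasOwnProperty.call(resolutionGivenUa, userAgent)) {
    // An unseen user-agent has no row in the table, so there is nothing to sample from.
    throw new Error(`No conditional probabilities for user-agent: ${userAgent}`);
  }
  const distribution = resolutionGivenUa[userAgent];
  let threshold = random();
  let last;
  for (const [value, probability] of Object.entries(distribution)) {
    last = value;
    threshold -= probability;
    if (threshold < 0) return value;
  }
  return last; // guards against floating-point rounding at the upper edge
}

console.log(sampleResolution('mobile'));
try {
  sampleResolution('Mozilla/5.0 (my custom user agent)');
} catch (error) {
  console.log(error.message);
}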

Unfortunately, this makes this feature a wontfix for me... But we're still curious! Is there a use case you have for this? We'd love to hear it! Hopefully, we'll be able to find another way around the problem you're trying to solve.

Cheers!

@Kikobeats
Author

No worries and thanks for the explanation, it's really helpful to understand how the library works.

I asked because I already have a collection of the most-used user agents that is updated periodically:
https://github.com/microlinkhq/top-user-agents/blob/master/src/mobile.json

This data is collected from more than 100M requests performed every month, so the sample is large enough.

In order to simulate real traffic, I want to generate realistic headers based on the user agent as input. I already did some tuning with https-tls for the TLS fingerprint, but I thought that maybe I could use fingerprint-suite to get realistic browser headers (sec-*, etc).

I noticed that the library outputs the headers at the end of the process, which is exactly what I need, so I played with the code a bit to see whether I could get similar headers as output while using a user agent as input.

I still think it's possible if I found a way to turn the user agent into a unique browserslist match, or any other way to connect it before going to the Bayesian network 😆 but I totally understand it's not the point of the project.
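
A rough sketch of what that mapping might look like: parse the user agent into a coarse browser / OS / device description that could then be used to constrain generation. The `userAgentToBrowserSpec` helper is hypothetical, the regexes only cover a few common browsers, and the returned shape (`browsers`, `operatingSystems`, `devices`) is an assumption modelled on the header-generator options, so it should be checked against the library docs before relying on it.

// Hypothetical helper, not part of header-generator.
function userAgentToBrowserSpec(userAgent) {
  // Order matters: Edge UAs also contain "Chrome", and Chrome UAs also contain "Safari".
  const browserRules = [
    { name: 'edge', pattern: /Edg\/(\d+)/ },
    { name: 'firefox', pattern: /Firefox\/(\d+)/ },
    { name: 'chrome', pattern: /Chrome\/(\d+)/ },
    { name: 'safari', pattern: /Version\/(\d+).*Safari\// },
  ];
  // Order matters here too: Android UAs contain "Linux", iOS UAs contain "like Mac OS X".
  const osRules = [
    { name: 'android', pattern: /Android/ },
    { name: 'ios', pattern: /iPhone|iPad/ },
    { name: 'windows', pattern: /Windows/ },
    { name: 'macos', pattern: /Mac OS X/ },
    { name: 'linux', pattern: /Linux/ },
  ];

  let browser = null;
  for (const rule of browserRules) {
    const match = rule.pattern.exec(userAgent);
    if (match) {
      const major = Number(match[1]);
      // Pin min and max to the same major version to narrow the match.
      browser = { name: rule.name, minVersion: major, maxVersion: major };
      break;
    }
  }
  if (!browser) return null; // unknown browser: nothing to constrain on

  const os = osRules.find((rule) => rule.pattern.test(userAgent));
  const device = /Mobile|Android|iPhone/.test(userAgent) ? 'mobile' : 'desktop';

  return {
    browsers: [browser],
    operatingSystems: os ? [os.name] : [],
    devices: [device],
  };
}

const spec = userAgentToBrowserSpec(
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15'
);
console.log(JSON.stringify(spec));

Even if this works, the generated user-agent would only share the browser family, major version, OS and device with the input string. It would still be one sampled from the known dataset, not the exact string passed in.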
