Add TF-IDF normalization #870

Merged
merged 16 commits into from
Sep 26, 2024

Collaborator

@VladimirShitov VladimirShitov commented Aug 28, 2024

Changelog

Add TF-IDF normalization for ATAC data.

Issue ticket number and link

Contributes to #398

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

Member

@DriesSchaumont DriesSchaumont left a comment

Small things left, should be good afterwards 👍

src/transform/tfidf/config.vsh.yaml
- name: "--scale_factor"
  type: integer
  description: Scale factor to multiply the TF-IDF matrix by.
  default: 10000
Member

Is there a min here?

Collaborator Author

Well, values below 0 make little sense. Anything above that can theoretically be used.

Collaborator Author

I set it to 1. Anything less than that is unlikely to be used.
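
For readers following the thread, here is a minimal sketch of where a scale factor typically enters a TF-IDF normalization of a cells-by-peaks count matrix, and why a lower bound of 1 is a reasonable guard. It is not the code in `script.py`: the function name `tfidf_normalize`, the plain list-of-lists input, and the exact variant (term frequency times inverse document frequency times the scale factor, then `log1p`) are assumptions made for illustration.

```python
import math


def tfidf_normalize(counts, scale_factor=10000):
    """Hypothetical helper: TF-IDF normalize a cells x peaks count matrix.

    `counts` is a list of rows (one row per cell). This is an illustrative
    variant, not the component's actual implementation.
    """
    # Mirror the minimum discussed in the review: values below 1 are rejected.
    if scale_factor < 1:
        raise ValueError("scale_factor must be at least 1")

    n_cells = len(counts)
    n_peaks = len(counts[0]) if counts else 0

    # Total counts per cell (row) and per peak (column).
    cell_totals = [sum(row) for row in counts]
    peak_totals = [sum(row[j] for row in counts) for j in range(n_peaks)]

    result = []
    for row, cell_total in zip(counts, cell_totals):
        out_row = []
        for value, peak_total in zip(row, peak_totals):
            if cell_total == 0 or peak_total == 0:
                # Empty cells or peaks carry no signal; keep them at zero.
                out_row.append(0.0)
                continue
            tf = value / cell_total       # term frequency within the cell
            idf = n_cells / peak_total    # inverse document frequency of the peak
            out_row.append(math.log1p(tf * idf * scale_factor))
        result.append(out_row)
    return result


normalized = tfidf_normalize([[1, 1], [0, 2]], scale_factor=10000)
```

In this variant the scale factor only rescales values before the log transform, so zero entries stay zero, while a negative factor could push `tf * idf * scale_factor` below -1 and make `log1p` undefined.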

Member

@DriesSchaumont DriesSchaumont left a comment

LGTM! Thanks @VladimirShitov 👍

@DriesSchaumont DriesSchaumont merged commit 55fa49b into main Sep 26, 2024
5 checks passed
@DriesSchaumont DriesSchaumont deleted the feature/tf-idf-normalization branch September 26, 2024 07:01
dorien-er pushed a commit that referenced this pull request Nov 18, 2024
dorien-er pushed a commit that referenced this pull request Nov 18, 2024
2 participants