Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bow histogram computation #26

Merged
merged 2 commits into from
Nov 2, 2023
Merged

Add Bow histogram computation #26

merged 2 commits into from
Nov 2, 2023

Conversation

ovysotska
Copy link
Owner

This PR adds the Python code to:

  • train the bag of words vocabulary
  • extract image histograms based on pre-trained bag of words vocabulary
  • some tests 😄

In the testing mode, the code takes a folder with images and returns the np.array of Bow features stored in the specified file.

The computed features can be used in the current codebase further, by first turning them into proto files, and then the full pipeline for image_sequence_localizer can be used as usual.

… can train a vocabulary of words. Based on SIFT descriptors
@ovysotska
Copy link
Owner Author

Hey @niosus would you have time to review this one?

Copy link

@niosus niosus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A solid PR, @ovysotska! I left a couple of nitpicks here and there, but feel free to ignore them. I think the code is easy to read and the intent is clear. Added plus for having a couple of tests.

src/python/bow/bow.py Show resolved Hide resolved
return trainImageFiles


def rescaleImageIfNeeded(image):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: very minor, but you seem to have docs for the other functions and not for these. Maybe add docstring to these too?



def computeIDF(descriptorsPerImage, clusters):
"""Compute inverse document frequence (IDF). Here means in how many images does the word occur.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here means in how many images does the word occur.

nit: This reads a bit strange. What did you mean to say here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the IDF (which is inverse document frequency) , here stands for how many images (not documents) does the word occur in.

Comment on lines 89 to 92
descriptors = [
descriptor for descriptors in descriptorsPerImage for descriptor in descriptors
]
descriptors = np.array(descriptors)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know how to improve this, but the naming seems to be overloaded here. The word descriptors is very overused here. Maybe use more descriptive names?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the best. I also don't know now what that suppose to be :)

idfs = computeIDF(descriptorsPerImage, words)

plt.bar(range(0, len(idfs)), idfs)
plt.savefig("idf_" + str(kDefaultClusterSize) + ".png")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do you want to always save these images? Or should it be hidden behind some flag?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neh, not really important. Will just remove it and reduce the dependencies list.



def getVocabulary(imageTrainFolder, vocabularyFile):
if vocabularyFile is not None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can simplify this to:

if vocabularyFile and vocabularyFile.exists():

@ovysotska ovysotska merged commit f7e1f5e into main Nov 2, 2023
2 checks passed
@ovysotska ovysotska deleted the add_sift_bow_support branch November 2, 2023 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants