
Added all optional arguments and extraction of loss function values #8

Open
rbbby wants to merge 1 commit into master

Conversation

@rbbby commented Dec 23, 2021

I have updated the train_topic_model function to accept any argument that Mallet accepts, passed as kwargs. The only difference is that '-' in the argument names is replaced with '_', e.g. num-iterations --> num_iterations. Numeric values can be passed either as numbers or as strings.
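For illustration, a minimal sketch of that renaming convention (the helper name here is hypothetical, not taken from the PR's code):

```python
# Sketch of the kwargs-to-flags mapping described above; the helper name is
# an assumption, not the PR's actual code.
def kwargs_to_mallet_flags(**kwargs):
    flags = []
    for key, value in kwargs.items():
        flags += ['--' + key.replace('_', '-'), str(value)]
    return flags

# kwargs_to_mallet_flags(num_iterations=2000, optimize_interval=10)
# -> ['--num-iterations', '2000', '--optimize-interval', '10']
```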

It is backwards compatible and keeps all of the mandatory arguments. The only thing removed is the default value of --optimize-interval 10 from within the function. Instead, Mallet's default value of 0 is used, and it can be set manually by adding optimize_interval=10 as an argument. This was done to allow users to specify hyperparameter values themselves when optimization is not wanted (for example, by setting alpha=0.05).

The functionality to return loss function values gathered during training has also been added. If logperplexity=True, the loss values are scraped from the output and returned as a list (they are still printed as usual). This option defaults to False.
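As a rough sketch of that scraping step (the progress-line format and the regex are assumptions here, not the PR's exact code):

```python
import re

# Mallet's training output prints lines like "<50> LL/token: -8.63421";
# the exact format and this pattern are assumptions, not the PR's code.
LL_PATTERN = re.compile(r'LL/token:\s*(-?\d+(?:\.\d+)?)')

def extract_log_likelihoods(output_lines):
    values = []
    for line in output_lines:
        match = LL_PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
    return values
```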

The subprocess module is used to get the loss values. I saw #2 and had the same issue on Mac but managed to resolve it (no delay in printing the output). I have not tested it on Windows, however (though I think it should work?). If it does not work, an option would be to use os by default unless logperplexity=True.
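For reference, a minimal sketch of the subprocess approach with line-by-line echoing (assumed command handling, not the PR's exact code):

```python
import subprocess

# Minimal sketch: run Mallet, echo its progress as it arrives (no buffering
# delay), and keep the lines so loss values can be scraped afterwards.
# stderr is merged into stdout so progress lines are captured either way.
def run_and_capture(command):
    captured = []
    with subprocess.Popen(command,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          text=True,
                          bufsize=1) as process:
        for line in process.stdout:
            print(line, end='')   # print immediately, line by line
            captured.append(line)
    return captured
```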

@ninpnin commented Dec 23, 2021

I tested this version on my Ubuntu machine and it worked fine.

@maria-antoniak (Owner)

Hello, thank you so much for adding to this project! 🙏

I think some of these changes are very useful but others might take some more thought/testing.

  • Adding the diagnostics file - awesome!
  • Adding optional arguments - awesome!
  • Switching to subprocess - unless we can test on Windows, I'd prefer not to switch. This is meant to be an accessible package (e.g., for teaching), and it needs to run smoothly on Windows. I don't currently have time or access to test on Windows; maybe someone else could do this?
  • Returning log likelihood - maybe. I'd prefer to send this to an output file rather than giving it priority by returning directly from the function. It doesn't have great correlation with human judgments, and sometimes people over-rely on this metric.
  • Removing the optimization default - needs changes. Again, the goal is accessibility, and I want people to have the best chance of getting good topic output without having to be an expert. Turning on hyperparameter optimization by default supports those goals. I think it would make more sense to add this as an optional argument to the function with the default still set to 10.
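A minimal sketch of that compromise, using hypothetical names rather than the project's real signature:

```python
# Hypothetical names, not the project's actual API: keep hyperparameter
# optimization on by default, but let callers tune or disable it themselves.
def build_mallet_flags(num_topics, optimize_interval=10, **mallet_kwargs):
    options = {'num_topics': num_topics,
               'optimize_interval': optimize_interval,
               **mallet_kwargs}
    flags = []
    for key, value in options.items():
        flags += ['--' + key.replace('_', '-'), str(value)]
    return flags

# build_mallet_flags(20)                                  # keeps '--optimize-interval 10'
# build_mallet_flags(20, optimize_interval=0, alpha=0.05) # optimization off, alpha fixed
```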
