
Added all optional arguments and extraction of loss function values #8

Open
rbbby wants to merge 1 commit into master

Conversation

@rbbby commented Dec 23, 2021

I have updated the train_topic_model function to accept any argument that Mallet accepts, passed as kwargs. The only difference is that '-' in the argument names is replaced with '_', e.g. num-iterations --> num_iterations. Numeric values can be passed either as numbers or as strings.
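For illustration, a minimal sketch of that renaming convention (the helper name here is hypothetical, not taken from the PR's code):

```python
# Sketch of the kwargs-to-flags mapping described above; the helper name is
# an assumption, not the PR's actual code.
def kwargs_to_mallet_flags(**kwargs):
    flags = []
    for key, value in kwargs.items():
        flags += ['--' + key.replace('_', '-'), str(value)]
    return flags

# kwargs_to_mallet_flags(num_iterations=2000, optimize_interval=10)
# -> ['--num-iterations', '2000', '--optimize-interval', '10']
```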

It is backwards compatible and keeps all of the mandatory arguments. The only thing removed is the default value of --optimize-interval 10 from within the function. Instead, Mallet's default value of 0 is used, and it can be set manually by adding optimize_interval=10 as an argument. This was done to allow users to specify hyperparameter values themselves when optimization is not wanted (for example, by setting alpha=0.05).

The functionality to return loss function values gathered during training has also been added. If logperplexity=True, the loss values are scraped from the output and returned as a list (they are still printed as usual). This option defaults to False.
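As a rough sketch of that scraping step (the progress-line format and the regex are assumptions here, not the PR's exact code):

```python
import re

# Mallet's training output prints lines like "<50> LL/token: -8.63421";
# the exact format and this pattern are assumptions, not the PR's code.
LL_PATTERN = re.compile(r'LL/token:\s*(-?\d+(?:\.\d+)?)')

def extract_log_likelihoods(output_lines):
    values = []
    for line in output_lines:
        match = LL_PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
    return values
```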

The subprocess module is used to get the loss values. I saw #2 and had the same issue on Mac but managed to resolve it (no delay in printing the output). I have not tested it on Windows, however (though I think it should work?). If it does not work, an option would be to use os by default unless logperplexity=True.
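For reference, a minimal sketch of the subprocess approach with line-by-line echoing (assumed command handling, not the PR's exact code):

```python
import subprocess

# Minimal sketch: run Mallet, echo its progress as it arrives (no buffering
# delay), and keep the lines so loss values can be scraped afterwards.
# stderr is merged into stdout so progress lines are captured either way.
def run_and_capture(command):
    captured = []
    with subprocess.Popen(command,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          text=True,
                          bufsize=1) as process:
        for line in process.stdout:
            print(line, end='')   # print immediately, line by line
            captured.append(line)
    return captured
```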

@ninpnin commented Dec 23, 2021

I tested this version on my Ubuntu machine and it worked fine.

@maria-antoniak (Owner)

Hello, thank you so much for adding to this project! 🙏

I think some of these changes are very useful but others might take some more thought/testing.

  • Adding the diagnostics file - awesome!
  • Adding optional arguments - awesome!
  • Switching to subprocess - unless we can test on Windows, I'd prefer not to switch. This is meant to be an accessible package (e.g., for teaching), and it needs to run smoothly on Windows. I don't currently have time or access to test on Windows; maybe someone else could do this?
  • Returning log likelihood - maybe. I'd prefer to send this to an output file rather than giving it priority by returning directly from the function. It doesn't have great correlation with human judgments, and sometimes people over-rely on this metric.
  • Removing the optimization default - needs changes. Again, the goal is accessibility, and I want people to have the best chance of getting good topic output without having to be an expert. Turning on hyperparameter optimization by default supports those goals. I think it would make more sense to add this as an optional argument to the function with the default still set to 10.
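A minimal sketch of that compromise, using hypothetical names rather than the project's real signature:

```python
# Hypothetical names, not the project's actual API: keep hyperparameter
# optimization on by default, but let callers tune or disable it themselves.
def build_mallet_flags(num_topics, optimize_interval=10, **mallet_kwargs):
    options = {'num_topics': num_topics,
               'optimize_interval': optimize_interval,
               **mallet_kwargs}
    flags = []
    for key, value in options.items():
        flags += ['--' + key.replace('_', '-'), str(value)]
    return flags

# build_mallet_flags(20)                                  # keeps '--optimize-interval 10'
# build_mallet_flags(20, optimize_interval=0, alpha=0.05) # optimization off, alpha fixed
```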
