Development: A lot of refactoring and documentation. #2

Open: wants to merge 128 commits into base: master

diegovalenzuelaiturra (Member) commented Aug 28, 2019

  • When trying to use multiple workers for data loading, I ran into these two issues:
  1. Concurrent HDF5 reads aren't safe.

  2. h5py objects cannot be pickled.

We should aim to fix this behaviour so that the DataLoaders can use multiple workers. For now, I've fixed num_workers = 1 for the DataLoaders.
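
One commonly used workaround for both issues is to open the HDF5 file lazily, so that every worker process ends up with its own read-only handle and no open h5py object ever has to be pickled. The sketch below illustrates the idea only: the class name `LazyH5Dataset`, the `"data"` key and the file layout are assumptions, not code from this PR, and the real dataset would presumably subclass `torch.utils.data.Dataset`.

```python
import h5py
import numpy as np


class LazyH5Dataset:
    """Map-style dataset that opens its HDF5 file on first access.

    Hypothetical sketch: the class name and the "data" key are assumptions.
    """

    def __init__(self, path: str, key: str = "data") -> None:
        self.path = path
        self.key = key
        self._file = None  # no open handle yet, so the object stays picklable
        # Read the length once with a short-lived handle.
        with h5py.File(path, "r") as handle:
            self._length = handle[key].shape[0]

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> np.ndarray:
        if self._file is None:
            # Opened lazily, so each worker process creates its own handle.
            self._file = h5py.File(self.path, "r")
        return self._file[self.key][index]

    def __getstate__(self) -> dict:
        # Drop any open handle before the dataset is pickled for a worker.
        state = self.__dict__.copy()
        state["_file"] = None
        return state
```

With a dataset shaped like this, raising `num_workers` above 1 should be possible in principle, but it would need to be checked against the actual HDF5 build and file access pattern used here.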

  • Added a config.json file that contains most of the configuration parameters for running train.py.

  • Now using BCEWithLogitsLoss instead of MSE loss.

  • Added a HilbertMapper class.

  • Added train_step and validation_step methods to the Autoencoder module; both are now used in the training script.

  • Added process_batch method to utils module.

  • Added encode and decode methods to general Autoencoder class.

  • Added set_configs method to utils, to configure device and seeds, etc.

  • Added save_checkpoint, load_from_checkpoint and create_folders methods.

  • Added Tensorboard writer, currently logging model parameters and training and validation losses.

  • Added Upsampling module in Encoder Input and Downsampling module in Decoder Output.

  • Added Reshape module to be used in conjunction with Sequential module.

  • Added Meter and Accumulator classes.

  • Added a get_train_dev_sets method to split the dataset.

  • Added some experimental code for NVIDIA apex mixed-precision training, for use on GPUs that support this feature. The apex library must be installed beforehand; more information about it can be found here: https://github.com/NVIDIA/apex

  • To create the dataset file, now run the export_dataset script. The build_dataloader_from_disk method now returns train and dev dataloaders.

  • Updated documentation.

  • Included type annotations.

  • Renamed some methods and arguments.

  • Deleted useless comments.

  • Some references are included in the Readme file.

  • Currently using a tqdm progress bar.

  • Added some methods to be implemented in the future.

  • Updated LICENSE.
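
To illustrate the Reshape module mentioned in the list above, here is a minimal sketch of how such a module can sit inside a Sequential container. The constructor signature, the layer sizes and the name `decoder_head` are assumptions made for illustration; the module in this PR may differ.

```python
from typing import Tuple

import torch
from torch import nn


class Reshape(nn.Module):
    """Reshape each sample to a fixed shape (hypothetical sketch)."""

    def __init__(self, *shape: int) -> None:
        super().__init__()
        self.shape: Tuple[int, ...] = shape

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # Keep the batch dimension and reshape everything after it.
        return inputs.reshape(inputs.size(0), *self.shape)


# Illustrative sizes only: 16 features become a 1 x 4 x 4 feature map.
decoder_head = nn.Sequential(
    nn.Linear(8, 16),
    Reshape(1, 4, 4),
)
output = decoder_head(torch.zeros(2, 8))
```

Wrapping the reshape in a module keeps the whole pipeline expressible as a single Sequential, without a custom forward method just for the shape change.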

Python Version:

  • I'm using Python 3.7

Environment:

  • I'm using a conda environment.

I'm using some linters:

  • pydocstyle for documentation
  • mypy for type annotations
  • pylint for Python code

I'm also using a formatter:

  • yapf

…methods

process_file method now receives a callable mapper
updated documentation and type annotations, deleted useless comments, renamed some arguments
updated documentation and type annotations, deleted useless comments, renamed some arguments
added **kwargs, changed Optional to Union for type hinting
…alidation error, and log model weights to tensorboard
If we want to use another loss, we should consider adding a sigmoid layer (or something similar) to the decoder output
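
The reason behind the note above is that BCEWithLogitsLoss applies the sigmoid internally, so the decoder can emit raw logits. Any loss that expects values in [0, 1] needs the sigmoid applied explicitly. A small sketch with made-up tensor shapes (not the project's actual batch shapes):

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 16)   # raw decoder output, no sigmoid applied
targets = torch.rand(4, 16)   # reconstruction targets in [0, 1]

# BCEWithLogitsLoss works directly on the logits.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss gives the same value once the sigmoid is applied explicitly.
probabilities = torch.sigmoid(logits)
loss_from_probs = nn.BCELoss()(probabilities, targets)

# Other losses, such as MSE, would also compare against the sigmoid output.
mse_loss = nn.MSELoss()(probabilities, targets)
```

Both binary cross-entropy values should agree up to floating-point error; the logits variant is generally preferred because it is more numerically stable.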
diegovalenzuelaiturra changed the title from "Development:" to "Development: A lot of refactoring and documentation." on Sep 25, 2019
diegovalenzuelaiturra self-assigned this on Dec 2, 2019
1 participant