Development: A lot of refactoring and documentation. #2

diegovalenzuelaiturra · 2019-08-28T16:32:16Z

When trying to use multiple threads for data loading, I ran into these two issues:

HDF5 concurrent reads aren't safe.
h5py objects cannot be pickled.

We should aim to fix this behaviour so that the Dataloaders can use multiple threads. For now, I've fixed num_workers = 1 for the Dataloaders.
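One common way around both issues is to keep only the file path on the dataset object and open the file lazily, once per worker, dropping the handle whenever the object is pickled. The sketch below illustrates that pattern with the standard library only: the built-in `open` stands in for `h5py.File`, and the `LazyFileDataset` class and `roundtrip_demo` helper are hypothetical names, not code from this PR.

```python
import os
import pickle
import tempfile


class LazyFileDataset:
    """Stores only the file path and opens the file on first access."""

    def __init__(self, path, record_size):
        self.path = path
        self.record_size = record_size
        self._handle = None  # opened lazily, once per process

    def _file(self):
        # Stand-in for something like h5py.File(self.path, "r") (assumption).
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __len__(self):
        return os.path.getsize(self.path) // self.record_size

    def __getitem__(self, index):
        f = self._file()
        f.seek(index * self.record_size)
        return f.read(self.record_size)

    def __getstate__(self):
        # Drop the open handle so the object can be pickled and sent to a worker.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def roundtrip_demo():
    """Pickle a dataset that already holds an open handle, then read from the copy."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"aaaabbbbcccc")
        original = LazyFileDataset(path, record_size=4)
        first = original[0]  # opens the handle in this process
        clone = pickle.loads(pickle.dumps(original))
        handle_was_dropped = clone._handle is None
        second = clone[1]  # the copy reopens the file on demand
        original.close()
        clone.close()
        return first, second, handle_was_dropped, len(original)
    finally:
        os.remove(path)
```

With this layout each worker ends up with its own handle, so no handle is shared between workers. Whether that is enough for safe concurrent HDF5 reads in this project would still need to be checked.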

Added config.json file that contains most of the configuration parameters for running train.py
Now using BCEWithLogitsLoss instead of MSE loss.
Added a HilbertMapper class.
Added train_step and validation_step methods to Autoencoder module, both are now used in the training script.
Added process_batch method to utils module.
Added encode and decode methods to general Autoencoder class.
Added set_configs method to utils, to configure device and seeds, etc.
Added save_checkpoint, load_from_checkpoint and create_folders methods.
Added Tensorboard writer, currently logging model parameters and training and validation losses.
Added Upsampling module in Encoder Input and Downsampling module in Decoder Output.
Added Reshape module to be used in conjunction with Sequential module.
Added Meter and Accumulator classes.
Added a get_train_dev_sets method to split the dataset.
Added some code to experiment with NVIDIA apex mixed-precision training, for GPUs that support this feature. The apex library must be installed beforehand; more information about it can be found here: https://github.com/NVIDIA/apex
To create the dataset file, now run the export_dataset script. The build_dataloader_from_disk method now returns train and dev dataloaders.
Updated documentation.
Included type annotations.
Renamed some methods and arguments.
Deleted useless comments.
Some references are included in the Readme file.
Currently using tqdm progress bar.
Added placeholders for some methods to be implemented in the future.
Updated LICENSE.
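For readers unfamiliar with the idea behind the HilbertMapper entry above, here is a minimal sketch of how a 1D sequence can be laid out on a 2D grid along a Hilbert curve. It uses the standard iterative index-to-coordinate conversion. The function names `hilbert_index_to_xy` and `sequence_to_grid` are hypothetical, and the PR's own `hilbert_curve` and `sequence2hilbert` methods may work differently.

```python
def hilbert_index_to_xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_grid(tokens, order, pad=0):
    """Place tokens on a square grid in Hilbert-curve order, padding unused cells."""
    n = 1 << order
    if len(tokens) > n * n:
        raise ValueError("sequence does not fit on a grid of this order")
    grid = [[pad] * n for _ in range(n)]
    for d, token in enumerate(tokens):
        x, y = hilbert_index_to_xy(order, d)
        grid[y][x] = token
    return grid
```

The useful property here is locality: consecutive sequence positions always land in neighbouring grid cells, which is presumably why a convolutional autoencoder is applied to the mapped input.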

Python Version:

I'm using Python 3.7

Environment:

I'm using a conda environment.

I'm using some linters:

pydocstyle for Documentation.
mypy for Type Annotations
pylint for Python Code

And I'm also using a formatter:

yapf formatter


diegovalenzuelaiturra added 30 commits August 25, 2019 23:58

Added install_requires and tests_require

4b59775

Adding some comments and renaming some parameters

dc4d467

added some tests for utils

f01ab8d

fix torch version in requirements

f98af20

updated dataloaders, Added HilbertDataLoader and build_dataloader_from_disk methods

6cf522c

Updated LanguageModelDataset, Added HilbertDataset, and process_file methods

93a9b5d

process_file method now receives a callable mapper

updated hints and documentation

937f7aa

Added HilbertMapper Class to be used as a callable in conjunction with process_file method in datasets

8461865

moved a TODO message

7d257b0

Added horoscopo.py file

dc0beea

delete useless comment

932a629

minor change

1c45396

Just doing some refactor

746ac3a

minor change

fab091a

updated documentation and type annotations, deleted useless comments, renamed some arguments

652e294

Added some TODOs to make it more general

b884709

updated documentation and type annotations, deleted useless comments, renamed some arguments

Added pad_collate as static_method of the PadCollate Class

cb1be74

updated documentation and type annotations, deleted useless comments, renamed some arguments

Added hilbert_curve and sequence2hilbert as static_methods of the HilbertMapper Class

9e9673d

Updated tests in relation to the changes that have been introduced.

2e70537

Added *.h5 to gitignore

0f1b366

Updated main and fixed HilbertMapper bug

66c1197

added **kwargs, changed Optional to Union for type hinting

Minor updates to main file.

4593923

Updated hints annotations

f8d8a04

Minor documentation changes

c0f00e0

Moved HilbertMapper Class to a different file

1e251c8

Fixed import issue, and renaming minibatch to batch

bcf95c0

moved create_folders and get_args method to utils

206f28f

autoencoder now returns reconstructed input and latent representation

1c1f677

Deleted DataLoader.py

e56c32f

Added train_step and validation_step methods

e43701d

diegovalenzuelaiturra and others added 23 commits September 22, 2019 18:06

experimenting with apex

e78c643

fix condition to raise error

12a4ec9

added some comments for clip gradients when using apex

b667f13

replaced a try_catch with an if statement

0316f13

updated with changes in build_dataloader_from_disk

eee0799

edited config values and added some comments in utils

435023d

fixed typo

9f07f5b

Created in Colaboratory

6aa0c7a

Created in Colaboratory

89f28c6

added a Reshape module, to use with Sequential. Added a vanilla conv autoencoder

afb61f8

added params to vanilla autoencoder

f414a87

Upgraded with vanilla autoencoder

33c6806

save_checkpoint now creates the output folder if it doesn't exist

986d5a1

Upgraded with vanilla autoencoder. Now save best model according to validation error, and log model weights to tensorboard

05e6e25

Merge branch 'development' of https://github.com/nlpchile/Hilbert-AE into development

9de5e70

Added a TODO

4eab447

added batchnorm to simple_autoencoder

32d619b

added a --notebook flag

4f8b67f

returned to the convolutional autoencoder we were using before

cb6ebd9

added unit arg to tqdm

2748b79

"reduction" seems to be a common argument for different losses, and the other arguments weren't used

4a7047b

returned to autoencoder instead of simple_autoencoder

4851812

Changed MSE loss with BCEWithLogitsLoss

a9bc359

In case we want to use other loss, we should consider adding a sigmoid layer to the decoder output (or something alike)
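The reason behind this note is that BCEWithLogitsLoss applies the sigmoid internally, so the decoder can output raw logits; a loss that expects probabilities would need the sigmoid added explicitly. The pure-Python sketch below shows the numerically stable formulation for a single element and checks it against the explicit sigmoid plus binary cross-entropy. The helper names are hypothetical and are not part of the PR.

```python
import math


def bce_with_logits(logit, target):
    """Stable binary cross-entropy on a raw logit: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sigmoid(x):
    """Plain logistic function; overflows for large negative x."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_on_probability(prob, target):
    """Binary cross-entropy on a probability; undefined when prob is exactly 0 or 1."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def losses_agree(logit, target, tol=1e-9):
    """Compare the fused form with sigmoid followed by plain BCE for moderate logits."""
    fused = bce_with_logits(logit, target)
    explicit = bce_on_probability(sigmoid(logit), target)
    return abs(fused - explicit) < tol
```

For moderate logits the two forms agree, but the fused form stays finite for extreme logits where the explicit sigmoid saturates.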

diegovalenzuelaiturra changed the title ~~Development:~~ Development: A lot of refactoring and documentation. Sep 25, 2019

diegovalenzuelaiturra requested a review from ribanez September 25, 2019 02:50

diegovalenzuelaiturra self-assigned this Dec 2, 2019

added test

aa78c92

diegovalenzuelaiturra added 2 commits March 21, 2020 17:12

added tqdm to process_dataset method

7eba2c2

updated requirements

e46b0f1
