These are the main requirements for running the SILNLP code on a local machine. Because SILNLP depends on many Python packages with complex versioning requirements, we use a Python package called Poetry to manage all of them. Here is a rough hierarchy of SILNLP's major dependencies.
Requirement | Reason |
---|---|
Git | To get the repo from GitHub |
Python | To run the SILNLP code |
Poetry | To manage all the Python packages and versions |
NVIDIA GPU | Required to run on a local machine |
NVIDIA drivers | Required for the GPU |
CUDA Toolkit | Required for machine learning with the GPU |
Environment variables | To tell SILNLP where to find the data, etc. |
The SILNLP code can be run on either Windows or Linux operating systems. If using an Ubuntu distribution, the only compatible version is 20.04.
Download and install the following before creating any projects or starting any code, preferably in this order to avoid most warnings:
- If using a local GPU: NVIDIA driver
  - On Ubuntu, the driver can alternatively be installed through the GUI by opening Software & Updates, navigating to Additional Drivers in the top menu, and selecting the newest NVIDIA driver with the labels proprietary and tested.
  - After installing the driver, reboot your system.
- Python 3.8 (latest patch release, e.g. 3.8.19)
  - Alternatively, install Python using miniconda if you plan to use more than one version of Python. If following this method, activate your conda environment before installing Poetry.
- Poetry
  - Note that whether the command should call python or python3 depends on which is required on your machine.
  - It may (or may not) be possible to run the curl command within a VS Code terminal. If that causes permission errors, close VS Code and try it in an elevated CMD prompt.
  - Windows: At an administrator CMD prompt or a terminal within VS Code, run:
    curl -sSL https://install.python-poetry.org | python - --version 1.7.1
    In PowerShell, run:
    (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - --version 1.7.1
  - Linux / macOS: In a terminal, run:
    curl -sSL https://install.python-poetry.org | python3 - --version 1.7.1
    Add the following line to your .bashrc file (Linux) or .profile file (macOS) in your home directory:
    export PATH="$HOME/.local/bin:$PATH"
- C++ Redistributable
  - Note: this may already be installed. If it is not installed, you may get cryptic errors such as "System.DllNotFoundException: Unable to load DLL 'thot' or one of its dependencies".
  - Windows: Download from https://support.microsoft.com/en-us/topic/the-latest-supported-visual-c-downloads-2647da03-1eea-4433-9aff-95f26a218cc0 and install.
  - Linux: Instead of installing the redistributable, run the following commands:
    sudo apt-get update
    sudo apt-get install build-essential gdb
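Once the items above are installed, a quick check can confirm that they are visible from your shell. The script below is a minimal sketch and not part of SILNLP: the function name check_prerequisites is hypothetical, and it only looks for the executables on PATH and compares the interpreter version against 3.8. It does not verify that the driver or CUDA actually work.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info):
    """Return a dict mapping each prerequisite to True (found) or False.

    Hypothetical helper, not part of SILNLP. It only checks that the
    executables are on PATH; it does not run them.
    """
    return {
        # The guide above calls for Python 3.8.x.
        "python 3.8": tuple(version_info[:2]) == (3, 8),
        "git": shutil.which("git") is not None,
        "poetry": shutil.which("poetry") is not None,
        # Only relevant when using a local GPU; nvidia-smi ships with the driver.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If poetry is reported as missing right after installation, the most likely cause is that the PATH line has not been picked up yet; reopening the terminal usually resolves it.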
- Install Visual Studio Code
- Install Python extension for VS Code
- Open the silnlp folder in VS Code
- In a CMD window, type poetry install to create the virtual environment for silnlp.
  - If using conda, activate your conda environment before running poetry install. Poetry will then install all the dependencies into the conda environment.
- Choose the newly created virtual environment as the "Python Interpreter" in the command palette (Ctrl+Shift+P)
  - If using conda, choose the conda environment as the interpreter
- Open the command palette and select "Preferences: Open User Settings (JSON)". In the settings.json file, add the following options:
  "python.formatting.provider": "black",
  "python.linting.pylintEnabled": true,
  "editor.formatOnSave": true,
See S3 bucket setup.
See ClearML setup.
- The home directory ($HOME) on Windows is usually C:\Users\<Username>; on Linux it is /home/<username>; and on macOS it is /Users/<username>. In your home directory, create the following directories.
- Create the directory "$HOME/.cache/silnlp"
- Create the directory "$HOME/.cache/silnlp/experiments" and set the environment variable SIL_NLP_CACHE_EXPERIMENT_DIR to that path.
- Create the directory "$HOME/.cache/silnlp/projects" and set the environment variable SIL_NLP_CACHE_PROJECT_DIR to that path.
- Set the following environment variables with your respective credentials: CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY.
- Set SIL_NLP_DATA_PATH to "/silnlp" and CLEARML_API_HOST to "https://api.sil.hosted.allegro.ai".
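The directory steps above can be sketched in a few lines of Python, which works the same way on Windows, Linux, and macOS. This is an illustration under stated assumptions, not a script shipped with SILNLP: the function name create_cache_dirs and its home parameter are hypothetical. It creates the directories and returns the values to use for the two cache variables, but it does not set the environment variables for you.

```python
from pathlib import Path


def create_cache_dirs(home=None):
    """Create the SILNLP cache directories and return the matching variable values.

    Hypothetical helper, not part of SILNLP. `home` defaults to the
    current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    cache_root = home / ".cache" / "silnlp"
    dirs = {
        "SIL_NLP_CACHE_EXPERIMENT_DIR": cache_root / "experiments",
        "SIL_NLP_CACHE_PROJECT_DIR": cache_root / "projects",
    }
    for path in dirs.values():
        # parents=True also creates .cache/silnlp; exist_ok makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return {name: str(path) for name, path in dirs.items()}
```

Calling create_cache_dirs() with no argument uses your own home directory; printing the returned dictionary gives the exact paths to assign to SIL_NLP_CACHE_EXPERIMENT_DIR and SIL_NLP_CACHE_PROJECT_DIR.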
See the wiki for information on setting up and running experiments. The most important pages for getting started are the ones on file structure, model configuration, and running experiments. A lot of the instructions are specific to NMT, but are still helpful starting points for doing other things like alignment.
See this page for information on using the VS Code debugger.
If you need to use a tool that is supported by SILNLP but is not installable as a Python library (which is probably the case if you get an error like "RuntimeError: eflomal is not installed."), follow the appropriate instructions here.
Linux / macOS users: To set environment variables permanently, add each variable as a new line to the .bashrc file (Linux) or .profile file (macOS) in your home directory with the format:
export VAR="VAL"
Close and reopen any open terminals for the changes to take effect.
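If you have several variables to add, the export lines can be generated rather than typed by hand. The helper below is a minimal sketch; the name export_lines is hypothetical. It assumes the values should be taken literally, so it escapes the characters that remain special inside double quotes in a POSIX shell. A value such as $HOME would therefore not be expanded.

```python
def export_lines(variables):
    """Format a dict of variables as `export VAR="VAL"` lines.

    Hypothetical helper, not part of SILNLP. Values are treated as
    literal strings, so shell expansion inside them is disabled.
    """
    lines = []
    for name, value in variables.items():
        escaped = value
        # Backslash must be escaped first so later escapes are not doubled.
        for ch in ("\\", '"', "$", "`"):
            escaped = escaped.replace(ch, "\\" + ch)
        lines.append(f'export {name}="{escaped}"')
    return lines


if __name__ == "__main__":
    example = {
        "SIL_NLP_DATA_PATH": "/silnlp",
        "CLEARML_API_HOST": "https://api.sil.hosted.allegro.ai",
    }
    print("\n".join(export_lines(example)))
```

The printed lines can be pasted into .bashrc or .profile. Avoid passing real credentials through a script whose output is logged or shared.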
Windows:
- Open Settings and go to the System tab.
- Under the "Device Specifications" section, in "Related links", click "Advanced system settings".
- Click "Environment Variables".
- In the "System Variables" section, click "New".
- Enter the name and value of the variable and click "OK". Repeat for as many variables as you need.
- Click "OK" on the Environment Variables page to save your changes.
- Close and reopen any open command prompt terminals for the changes to take effect.
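After reopening a terminal on any platform, you can confirm that the variables named in this guide are visible to Python. The following is a minimal sketch, not part of SILNLP; the names REQUIRED_VARS and missing_vars are hypothetical. It reports only which names are unset or empty and never prints the values, since several of them are credentials.

```python
import os

# Variable names taken from the setup steps in this guide.
REQUIRED_VARS = (
    "SIL_NLP_DATA_PATH",
    "SIL_NLP_CACHE_EXPERIMENT_DIR",
    "SIL_NLP_CACHE_PROJECT_DIR",
    "CLEARML_API_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_vars(environ=os.environ):
    """Return the names in REQUIRED_VARS that are unset or empty in `environ`."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All variables set.")
```

If a variable still shows as missing after you have set it, the terminal or VS Code window was most likely opened before the change was saved.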
If you need to run the .NET versions of the Machine alignment models, you will need to install .NET Core SDK 8.0. After installing, run dotnet tool restore.
- Windows: .NET Core SDK
- Linux: Installation instructions can be found here.
- macOS: Installation instructions can be found here.
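To confirm that an 8.0 SDK is installed before running dotnet tool restore, you can inspect the output of dotnet --list-sdks, which prints one installed SDK per line as a version followed by its install path. The sketch below is illustrative only; the function names has_sdk_8 and installed_sdks are hypothetical and not part of SILNLP.

```python
import subprocess


def has_sdk_8(list_sdks_output):
    """Return True if any line of `dotnet --list-sdks` output starts with "8.0."."""
    return any(
        line.strip().startswith("8.0.") for line in list_sdks_output.splitlines()
    )


def installed_sdks():
    """Run `dotnet --list-sdks`; return its output, or "" if dotnet is not on PATH."""
    try:
        result = subprocess.run(
            ["dotnet", "--list-sdks"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    print("SDK 8.0 found" if has_sdk_8(installed_sdks()) else "SDK 8.0 not found")
```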