LinkedHypernymDataset in Docker

A Docker image of the current LHD extraction framework with all required dependencies (Gate, TreeTagger, etc.) can be built with the following docker build command:

docker build -t lhd:latest https://github.com/KIZI/LinkedHypernymsDataset.git#:docker

After the image has been built successfully, you can use it to run an extraction process with this command:

docker run --name lhd -d [-v </path/to/host/output>:/root/LinkedHypernymsDataset/data/output] --env-file <path-to-env-vars-file> lhd <language(en|de|nl)> <dbpedia-version>

Example:

docker run --name lhd -d --env-file examples/datasets_en lhd en 2015-10

or

docker run --name lhd -d -v /tmp/output:/root/LinkedHypernymsDataset/data/output --env-file examples/datasets_en lhd en 2015-10
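
The two run variants above can be wrapped in a small shell helper that checks the language code before starting the container. This is only a sketch: the function name lhd_run and its argument order are hypothetical and not part of the LHD project; the docker run options are the ones shown above.

```shell
# Start an extraction container after checking the language code.
# Arguments: <lang> <dbpedia-version> <env-file> [host-output-dir]
# NOTE: lhd_run is a hypothetical helper, not something shipped with LHD.
lhd_run() {
    lang=$1
    version=$2
    env_file=$3
    out_dir=${4:-}

    # The image accepts en, de or nl as the language argument.
    case "$lang" in
        en|de|nl) ;;
        *)
            echo "unsupported language: $lang (expected en, de or nl)" >&2
            return 2
            ;;
    esac

    if [ -n "$out_dir" ]; then
        # Mount a host directory so the output is written outside the container.
        docker run --name lhd -d \
            -v "${out_dir}:/root/LinkedHypernymsDataset/data/output" \
            --env-file "$env_file" lhd "$lang" "$version"
    else
        docker run --name lhd -d --env-file "$env_file" lhd "$lang" "$version"
    fi
}
```

For example, lhd_run en 2015-10 examples/datasets_en /tmp/output corresponds to the second example command above.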

Once the container has been started from the image, the extraction process runs in the background. It can take several hours or days, depending on the number of available cores and the size of the input datasets. When the extraction completes, the container holds all linked hypernym datasets for the selected language in the data/output directory. You then only need to copy the datasets from the container to your local disk (alternatively, mount this directory to the host with the -v volume option so the files are written there directly):

docker cp lhd:/root/LinkedHypernymsDataset/data/output ./
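
Since the copy is only useful after the extraction has finished, a script can block until the container stops. The sketch below relies on docker wait, which blocks until a container stops and prints its exit code. It assumes that the container stops when the extraction ends and that a non-zero exit code indicates failure; neither is stated by the LHD documentation. The function name lhd_wait is hypothetical.

```shell
# Block until the extraction container stops, then report the outcome.
# Argument: [container-name], defaulting to "lhd" as passed to --name.
# NOTE: lhd_wait is a hypothetical helper; treating a non-zero exit code
# as a failed extraction is an assumption.
lhd_wait() {
    container=${1:-lhd}

    # docker wait blocks until the container stops and prints its exit code.
    code=$(docker wait "$container") || return 1

    if [ "$code" -eq 0 ]; then
        echo "extraction finished"
    else
        echo "extraction stopped with exit code $code" >&2
        return 1
    fi
}
```

While waiting, docker logs -f lhd can be used in another terminal to follow the container output.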

The output directory is copied to your local disk. It contains the basic LHD datasets along with auxiliary files created in the different steps of the extraction process. The most important datasets, which you are probably looking for, are:

  • <lang>.lhd.core.<version>.nt
  • <lang>.lhd.raw.<version>.nt
  • <lang>.lhd.extension.<version>.nt
  • <lang>.lhd.inference.<version>.nt

See the result section of the docs. If you only need these files, you may prefer the following copy commands (<lang> is the selected language and <version> is the DBpedia version used):

docker cp lhd:/root/LinkedHypernymsDataset/data/output/<lang>.lhd.core.<version>.nt ./
docker cp lhd:/root/LinkedHypernymsDataset/data/output/<lang>.lhd.raw.<version>.nt ./
docker cp lhd:/root/LinkedHypernymsDataset/data/output/<lang>.lhd.extension.<version>.nt ./
docker cp lhd:/root/LinkedHypernymsDataset/data/output/<lang>.lhd.inference.<version>.nt ./
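
The four copy commands differ only in the dataset kind, so they can be generated in a loop. The following is a minimal sketch; the function names lhd_dataset_paths and lhd_copy_datasets are hypothetical, and the paths follow the <lang>.lhd.<kind>.<version>.nt pattern listed above.

```shell
# Print the container paths of the four main LHD datasets, one per line.
# Arguments: <lang> <version>, e.g. "en 2015-10".
# NOTE: both functions are hypothetical helpers, not part of LHD.
lhd_dataset_paths() {
    lang=$1
    version=$2
    for kind in core raw extension inference; do
        printf '%s\n' "/root/LinkedHypernymsDataset/data/output/${lang}.lhd.${kind}.${version}.nt"
    done
}

# Copy the four datasets out of the container, stopping at the first failure.
# Arguments: <lang> <version> [dest-dir] [container-name]
lhd_copy_datasets() {
    dest=${3:-.}
    container=${4:-lhd}
    # The paths contain no whitespace as long as <lang> and <version> do not.
    for path in $(lhd_dataset_paths "$1" "$2"); do
        docker cp "${container}:${path}" "$dest" || return 1
    done
}
```

For example, lhd_copy_datasets en 2015-10 ./ issues the same four docker cp commands as above for the English 2015-10 datasets.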