gdcl - GoldenDict command-line interface written in Ruby

gdcl is a command-line interface for searching GoldenDict dictionaries. A request for a command-line version is currently the third most commented issue on the GoldenDict issue tracker. This script is a very rudimentary workaround to allow searching through groups of dictionaries until an official command-line interface is available.

As an example of a similar interface, StarDict has sdcv (StarDict Console Version), but it can only handle dictionaries in the StarDict format. For users of GoldenDict who have large collections of dictionaries in other formats (e.g. DSL or BGL), converting and maintaining two parallel sets of dictionaries is not a practical solution.

This script answers a practical need: the ability to search through groups of dictionaries from the command line over SSH. The script can be used to search dictionaries interactively, but it also has a non-interactive mode which allows results from GoldenDict dictionaries to be sent to standard output or used as part of a toolchain.

Currently, gdcl does not require an installation of GoldenDict, as it simply searches through subdirectories of existing dictionaries in the GoldenDict folder (which can be configured) and could conceivably be used to search through any collection of dictionaries in DSL format. However, the eventual goal of the project is to read preferences from GoldenDict's config file, support the full range of formats that GoldenDict can use and, ideally, to use GoldenDict's pre-made index files for faster searching.

Installation

To install you can either download the project source and run the script directly, or use a package manager to install the appropriate files for your distro. See below for more details and also the section on setup and configuration for how to customize your installation once you've downloaded the source files.

Installation from Distro Packages

User Packaged

Usage

Summary

Interactive search:

ruby gdcl.rb

Non-interactive search:

ruby gdcl.rb [group] [keyword]

A group name can also be specified without a keyword, i.e.:

ruby gdcl.rb [group]

This can be useful because it allows you to bind an alias that invokes the program and looks up words in a specific group. For example, you could bind gf to look up words in a subfolder of French dictionaries called fr, and gr to search in Russian dictionaries (subfolder ru):

alias gf='ruby /path/to/gdcl.rb fr'
alias gr='ruby /path/to/gdcl.rb ru'

This can be a good way of maintaining a large collection of dictionaries while still being able to search quickly and with minimal interaction.

See below for further configuration and usage details.

Setup and configuration

gdcl.rb

The main script for searching through dictionaries is called gdcl.rb.

There are a number of configuration options available in the config.yml file. By default, this file should be installed in the standard config folder under the user's home directory (i.e., in the folder ~/.config/gdcl). If gdcl can't find the file config.yml in that folder, it will look for it in $XDG_CONFIG_DIRS (i.e., /etc/xdg/gdcl), and failing that, the script folder (i.e., the same directory as the script executable). The ~/.config/gdcl folder and default config.yml file will be created if they do not already exist when you first run gdcl.
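The lookup order described above can be sketched roughly as follows. This is only an illustration, not code from gdcl itself: the helper name find_config and its keyword arguments are hypothetical, and the sketch assumes $XDG_CONFIG_DIRS is a colon-separated list of directories.

```ruby
# Hypothetical helper (not taken from gdcl's source): returns the path of the
# first config.yml found, checking the user config folder first, then each
# entry of XDG_CONFIG_DIRS, and finally the script folder.
def find_config(home:, xdg_dirs:, script_dir:)
  candidates = [File.join(home, ".config", "gdcl")]
  # Assumption: xdg_dirs is a colon-separated list such as "/etc/xdg".
  candidates += xdg_dirs.split(":").reject(&:empty?).map { |dir| File.join(dir, "gdcl") }
  candidates << script_dir

  candidates
    .map { |dir| File.join(dir, "config.yml") }
    .find { |path| File.file?(path) } # nil when no config file exists anywhere
end
```

A call might look like find_config(home: Dir.home, xdg_dirs: ENV.fetch("XDG_CONFIG_DIRS", "/etc/xdg"), script_dir: "/path/to/gdcl"), with nil signalling that a default config.yml still needs to be created.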

The options available in config.yml are commented and should be self-explanatory. They are listed below for reference:

  • dict_dir: Dictionary folder (The location of your GoldenDict dictionaries folder; can be any folder, but set to ~/.goldendict/dic by default)
  • group: Group name (A subfolder of the directory specified in dict_dir above, containing a group of dictionaries to be searched together; default blank -- if you specify a value here, gdcl will never ask interactively for a group name, and will use the specified group by default)
  • kword: Keyword to search for (Use this to specify a keyword in the script; if not specified here, gdcl will search for a term provided either interactively or on the command line)
  • interactive_search: Interactive search (Set to false for non-interactive search, e.g. to pipe or redirect the search results; defaults to false if a group and keyword are specified as command-line parameters)
  • header_footer: Header and footer information (Set to false to turn off header and footer information, i.e.: dictionary name and number of hits for search term)
  • pager_off: Don't prompt to open results in pager (Set to true to turn off the pager prompt for all searches)
  • case_off: Case insensitive search (Set this to true if you want all searches to ignore character case)
  • history: Log search history (Uncomment this line if you want to log a record of your searches to a text file)
  • logfile: Alternate logfile location (A directory in which to store the history logfile; the default is a file called history.txt in the gdcl config folder, i.e., ~/.config/gdcl)
  • temp_dir: Temporary working directory (The directory where gdcl will store files)
  • search_term: Search pattern (Specify a pattern to search for; default is headwords starting with keyword, but strict matches or any other regex are also supported)
  • del_dict: Excluded dictionaries (Optionally exclude the specified dictionaries from search results)
  • markup: DSL Markup Options (Defaults to removing dsl dictionary markup in results; to display markup, comment out this line and uncomment the line markup = "")
  • markup_replace: DSL Markup Replacement String (Change this if you want to replace dsl markup with some other string)

See also the Options section for more details on how to specify some of these as command-line options.

Note: You don't need to set up or configure gdcl if you just have one or more dictionaries in a single folder that you want to search. You can specify the folder to look in with the -d option, or just navigate to the location of the dictionary folder in your terminal and execute gdcl with -d using . (a single period) to represent the current directory:

ruby /location/of/gdcl.rb -d .

When prompted to enter a group name, just use . again.

Searching

By default, invoking gdcl with the command ruby gdcl.rb will search interactively. Command prompts will ask you to specify a group of dictionaries to search in out of a list of available groups, and then a keyword to look for. Results will be displayed immediately to standard output.

In interactive mode, after the search results have finished displaying, there is an option to view the results in a paging program (by default less). This is helpful if there are many results or if results exceed the terminal buffer size.

Alternatively, you can use non-interactive mode to search and send results to a file or to other programs. gdcl will default to non-interactive mode if a group and keyword are specified as command-line parameters:

ruby gdcl.rb [group] [keyword]

For example, if you want to search for the term aardvark in the en dictionary group, you can use:

ruby gdcl.rb en aardvark

As always, it is a good practice to quote or escape search strings, and this is mandatory for terms that contain e.g. spaces:

ruby gdcl.rb en "monkey wrench"
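If you build gdcl command lines from another Ruby script, the standard Shellwords module can take care of the escaping for you. The following is a minimal sketch: gdcl_command is a hypothetical helper, and the script path gdcl.rb is assumed to be in the current directory.

```ruby
require "shellwords"

# Hypothetical helper: builds a shell-safe command line for a non-interactive
# search. Shellwords.join escapes each argument, so spaces survive the shell.
def gdcl_command(group, keyword, script: "gdcl.rb")
  Shellwords.join(["ruby", script, group, keyword])
end

puts gdcl_command("en", "monkey wrench")
```

Splitting the resulting string again with Shellwords.split gives back the original four arguments, which is a quick way to check that a term with spaces stays a single argument.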

To redirect dictionary search results to a file:

ruby gdcl.rb en "monkey wrench" > output.txt

Regular expressions are supported in search terms. Let's say you are looking for the word "test" in a collection of dictionaries. By default gdcl searches for headwords in the dictionary that begin with the search string, but this might give too many results ("testament", "testimony", "testing" etc). To find only words that strictly match the word "test", you could use test$.

As another example, searching for arm.....o or arm.*o$ will both find the word "armadillo".
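The matching behaviour described above can be illustrated with a short sketch. It assumes the search term is treated as a regular expression anchored at the start of each headword; the helper match_headwords and the word list are made up for the example and are not gdcl's actual implementation.

```ruby
# Hypothetical helper: returns the headwords that start with the given
# pattern. The term is interpreted as a regular expression, so "test$"
# only matches the exact word "test".
def match_headwords(headwords, term, case_off: false)
  flags = case_off ? Regexp::IGNORECASE : 0
  pattern = Regexp.new("\\A(?:#{term})", flags)
  headwords.grep(pattern)
end

words = %w[test testament testimony testing armadillo attest]

p match_headwords(words, "test")      # headwords beginning with "test"
p match_headwords(words, "test$")     # strict match only
p match_headwords(words, "arm.....o") # five arbitrary characters between "arm" and "o"
```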

Options

Most default options can be configured in the user's config.yml file (see the Setup and configuration section for more details on setting up the config.yml file).

There are also a number of settings that can be specified on the fly as command-line options. Use ruby gdcl.rb -h to print a list of all available command-line options. Currently, gdcl supports the following options:

  • -c GROUP, --names [GROUP] (List all dictionaries in specified group by canonical name)
  • -C, --case-off (Enable case insensitive search)
  • -d DIRECTORY, --dict-directory DIRECTORY (Directory in which to look for dictionaries)
  • -g, --groups (Print a list of all available dictionary groups)
  • -h, --help (Print help message)
  • -H, --history (Record search term history in a log file)
  • -i FILENAMES, --ignore FILENAMES (List of dictionaries to ignore while searching)
  • -l GROUP, --list GROUP (List all dictionaries in specified group by filename)
  • -L, --logfile DIRECTORY (Directory in which to store search log)
  • -m, --markup (Don't strip DSL markup from output)
  • -n, --no-headers (Remove headers and footers from results output)
  • -p, --pager-off (Don't prompt to open results in pager)
  • -r, --restrict FILENAMES (Restrict search to FILENAMES)

Most of these can be combined, e.g.: ruby gdcl.rb -nm -d /path/to/dictionaries to search in /path/to/dictionaries and print out results with no headers or footers and without stripping DSL markup.
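To show how combined short flags such as -nm are typically handled in Ruby, here is a minimal sketch using the standard OptionParser library. It covers only three of the options listed above, the helper parse_options and the keys of the options hash are hypothetical, and gdcl's own option handling may differ.

```ruby
require "optparse"

# Hypothetical sketch: parses -n, -m and -d and returns the resulting
# settings together with the remaining positional arguments.
def parse_options(argv)
  options = { headers: true, markup: false, dict_dir: nil }

  parser = OptionParser.new do |o|
    o.on("-n", "--no-headers", "Remove headers and footers") { options[:headers] = false }
    o.on("-m", "--markup", "Don't strip DSL markup") { options[:markup] = true }
    o.on("-d", "--dict-directory DIRECTORY", "Dictionary directory") { |dir| options[:dict_dir] = dir }
  end

  # parse leaves argv untouched and returns the non-option arguments.
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_options(%w[-nm -d /path/to/dictionaries en aardvark])
p options
p rest
```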

Some options provide information that can be supplied to other options. For example, you can use -g to get a list of available groups, and then print out a list of all dictionaries in one of those groups using the -l option. The results of -l can, in turn, be used to specify a list of dictionaries to ignore with the -i option.

An in-depth look at usage of some of these options is below.

Listing groups and dictionaries

The -g and -l options list all the groups and dictionaries within a group, respectively, that gdcl knows about. The group names are essentially the names of subfolders in your GoldenDict dictionaries folder (or whichever folder you have specified in config.yml).

The dictionary names provided by -l GROUP are in fact the raw filenames (minus *.dsl.dz extension) of each of the dictionaries in GROUP. This is useful for accessing other command-line options that take dictionary names in this format, such as restrict and ignore.
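Conceptually, both listings boil down to reading the directory tree. The sketch below shows one way this could be done in Ruby; list_groups and list_dictionaries are hypothetical helpers written for illustration, not gdcl's actual code.

```ruby
# Hypothetical helper: group names are the subfolders of the dictionary folder.
def list_groups(dict_dir)
  Dir.children(dict_dir)
     .select { |entry| File.directory?(File.join(dict_dir, entry)) }
     .sort
end

# Hypothetical helper: dictionary names are the filenames in a group,
# minus the .dsl.dz extension.
def list_dictionaries(dict_dir, group)
  Dir.glob("*.dsl.dz", base: File.join(dict_dir, group))
     .map { |file| File.basename(file, ".dsl.dz") }
     .sort
end
```

With a file sv/myswedishdictfile.dsl.dz in the dictionary folder, list_dictionaries(dict_dir, "sv") would return an array containing myswedishdictfile.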

If you are more interested in the canonical name (i.e., the name specified in the first line of a DSL file) of dictionaries in a given group, you should use the -c option instead of -l.

For example, let's say you have a Swedish dictionary contained in a file called myswedishdictfile.dsl.dz, located in a subfolder sv, and the first line of the DSL file looks like this:

#NAME "Fancy Swedish Dictionary - Min extraordinärt svenska ordbok"

Calling ruby gdcl.rb -l sv will give the following output:

myswedishdictfile

Whereas calling ruby gdcl.rb -c sv will output the full name:

Fancy Swedish Dictionary - Min extraordinärt svenska ordbok

To get a list of dictionary filenames mapped to canonical dictionary names, you can use the -c and -l options together. Following the example above, you could use the following command to list filenames and canonical names of dictionaries in the sv folder:

ruby gdcl.rb -c -l sv

And the resulting output would be:

myswedishdictfile	Fancy Swedish Dictionary - Min extraordinärt svenska ordbok

(Note: if you are piping this output to another program or redirecting it to a file, the two fields are separated by a tab character.)
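Because the two fields are tab-separated, the combined output is easy to consume from another script. A minimal sketch, assuming one dictionary per line; parse_name_map is a hypothetical helper name.

```ruby
# Hypothetical helper: turns "filename<TAB>canonical name" lines into a Hash
# mapping each filename to its canonical name. Lines without a tab are skipped.
def parse_name_map(output)
  output.each_line
        .map { |line| line.chomp.split("\t", 2) }
        .select { |fields| fields.length == 2 }
        .to_h
end

names = parse_name_map("myswedishdictfile\tFancy Swedish Dictionary\n")
puts names["myswedishdictfile"]
```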

Ignoring and restricting dictionaries

You can ignore certain dictionaries in a group with -i or, conversely, restrict your search to a subset of dictionaries in a certain group with -r. For example, to ignore a dictionary called jedict.dsl.dz in the jp group, use:

ruby gdcl.rb -i jedict jp

The dictionary you specified will be excluded from your search, and results from all other dictionaries in the group will be shown instead.

If you want to only show results from jedict.dsl.dz, you can use the following command:

ruby gdcl.rb -r jedict jp

This can be really useful if you have groups with a large number of dictionaries (particularly collections of example sentences or encyclopedias), and you only want to do a quick lookup of a single term.

Both of these options can take comma separated lists of dictionaries you want to ignore or restrict your search to, for example:

ruby gdcl.rb -r oed,longman,webster en
ruby gdcl.rb -i wikipedia,encyclopaedia_britannica,good_writing_guide en

Each dictionary in the list is ignored (with --ignore) or searched (with --restrict), provided it exists in the group.

Tip: When using -r, you can supply a partial filename to search only in the dictionary (or dictionaries) that match the given string. For example, if you have a dictionary called supercooldict.dsl.dz, you could search only in that dictionary by entering:

ruby gdcl.rb -r super

On the other hand, if you have a collection of dictionaries in group es called e.g., collins_spanish-english.dsl.dz, collins_spanish-french.dsl.dz, collins_spanish-verbs.dsl.dz, collins_french-spanish.dsl.dz, etc., you could restrict your search to only those dictionaries containing collins in the title by using the command:

ruby gdcl.rb -r collins es
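The effect of the two options on a group's dictionary list can be sketched as follows. This is an illustration under stated assumptions rather than gdcl's real logic: select_dictionaries is a hypothetical helper, -r is modelled as a substring match (as in the tip above), and -i is assumed to require the exact filename.

```ruby
# Hypothetical helper: applies comma-separated restrict and ignore lists to
# the dictionary filenames of a group.
def select_dictionaries(names, restrict: nil, ignore: nil)
  restrict_list = restrict.to_s.split(",")
  ignore_list = ignore.to_s.split(",")

  selected = names
  unless restrict_list.empty?
    # Assumption: a restrict entry matches any filename that contains it.
    selected = selected.select { |name| restrict_list.any? { |part| name.include?(part) } }
  end
  # Assumption: an ignore entry must equal the filename exactly.
  selected.reject { |name| ignore_list.include?(name) }
end

es = %w[collins_spanish-english collins_spanish-verbs oxford_spanish]
p select_dictionaries(es, restrict: "collins")
p select_dictionaries(es, ignore: "oxford_spanish")
```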

Logging search history

If logging is enabled (it's off by default), gdcl will save a record of all search terms to a history file, located by default in the gdcl configuration directory (~/.config/gdcl/history.txt). This can be useful for, e.g., studying or learning new vocabulary.

There are a couple of ways to enable logging of search terms, and you can also specify an alternate directory in which to save the history file.

If you want gdcl to always record your search terms, you should enable logging in the gdcl config file. Look in config.yml for a line containing # :history: true and uncomment it to turn logging on for all searches. If you only want gdcl to record search terms selectively, you can use the -H option on the command line to turn logging on for a specific search or set of searches (if you use -H, logging will remain in effect until you exit the program).

To specify an alternate directory, uncomment the # :logfile: line in config.yml and specify a directory of your choice. You can also specify a different directory using -L and the directory name when running gdcl. This could be useful for, e.g., recording new vocabulary from different sources in different history files.
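As a rough illustration of what logging amounts to, the sketch below appends each search term to a history.txt file in a chosen directory. The helper name log_search and the one-term-per-line format are assumptions made for the example, not details taken from gdcl's source.

```ruby
require "fileutils"

# Hypothetical helper: appends a search term to history.txt in log_dir,
# creating the directory first if it does not exist yet.
def log_search(term, log_dir)
  FileUtils.mkdir_p(log_dir)
  File.open(File.join(log_dir, "history.txt"), "a") do |file|
    file.puts(term) # assumed format: one search term per line
  end
end
```

Opening the file in append mode means earlier entries are preserved, so separate directories can hold separate vocabulary lists.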

For example, let's say you were reading a Swedish crime novel and you wanted to look up vocabulary as you read. You could open up an instance of gdcl and send a record of your search terms to a separate file with the following command:

ruby gdcl.rb -H -L novel_vocab.txt

Later you are watching an interesting Swedish documentary series so you open up gdcl to record vocabulary in a new file:

ruby gdcl.rb -H -L documentary_vocab.txt

Finally, you have to finish your Chinese homework, and you want to look up new words as you go along:

ruby gdcl.rb -H -L homework_vocab.txt

Note that the search history file only contains a record of your search terms, not the search results themselves. If you want to save the full text of search results, you should use non-interactive mode and redirect the results to a file, e.g.:

ruby gdcl.rb mygroup "my search term" > search_results.txt

Forvo audio pronunciations

Looking up and playing back audio pronunciations from Forvo is supported by gdcl using the -f option and mplayer. This requires registering for a Forvo API key, which is free for non-commercial educational use.

Once you have a key, you just need to copy it into your gdcl config file in your user home directory (i.e. ~/.config/gdcl/config.yml) under the section "forvo key". Uncomment the line # :forvo_key: "" and add your key between the quotation marks "". Now you can look up pronunciations by running gdcl.rb with the -f option.

The basic format for a Forvo audio lookup is:

ruby gdcl.rb -f [lang_code] [word_to_be_pronounced]

You'll need to supply the 2-letter ISO 639 language code and a word or phrase to pronounce. You can find a full list of the supported codes here.

For example, if you wanted to find the pronunciation of the word "сегодня" in Russian, you would enter:

ruby gdcl.rb -f ru сегодня

The last argument should probably be in quotes to avoid problems -- this also allows for pronunciation of phrases and other terms with spaces:

ruby gdcl.rb -f sv "Johannes Robert Rydberg"

The script will immediately begin playback of all the available pronunciations it found.

gdcl only supports playback of pronunciation audio. For more full-featured access to the Forvo API, including listing track info and saving pronunciation audio, check out forvo-cl.

Supported formats

gdcl currently supports compressed dictionary files in ABBYY Lingvo .dsl dictionary format (i.e., files ending in the extension .dsl.dz) as well as online pronunciation audio files from Forvo.com (see the section above on using gdcl to look up pronunciations). Support for other formats and online dictionaries is planned for future releases.

To do

Features that need to be implemented:

  • Read search and dictionary preferences from GoldenDict config file
  • Search using GoldenDict's existing index files
  • Dictzip support (i.e. search dictionaries in place rather than needing to unzip them to tmp folder)
  • bgl, dict and other formats support
  • Online dictionaries support (Wikipedia, Wiktionary etc)

License

MIT -- see LICENSE file for details.