lemma is a command-line utility that provides the lemmatized forms (stems) of words in natural language text.
$ echo "Don't be amazed if you see my eyes always wandering." | lemma
do Do
not n't
be be
amaze amazed
if if
you you
see see
I my
eye eyes
always always
wander wandering
For more information about natural language processing, check out Chapter 7 of the Flight School Guide to Swift Strings.
- macOS 10.12+
Install lemma with Homebrew using the following command:
$ brew install flight-school/formulae/lemma
Text can be read from either standard input or file arguments. Tagged words are written to standard output on separate lines.
$ echo "walking" | lemma
walk walking
$ echo "gesagt" | lemma
sagen gesagt
$ lemma
This text is being typed into standard input.
this This
text text
be is
be being
type typed
into into
standard standard
input input
$ cat calvino.txt
One reads alone, even in another's presence.
$ lemma calvino.txt
one One
read reads
alone alone
even even
in in
another another
presence presence
lemma can be chained with Unix text processing commands, like cut, sort, uniq, comm, grep, sed, and awk.
$ lemma calvino.txt | cut -f1
one
read
alone
even
in
another
presence
$ echo "She has to have had a reason" | lemma | awk '!a[$1]++'
she She
have has
to to
a a
reason reason
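Along the same lines, lemma frequencies can be tallied with cut, sort, and uniq. The sketch below is only an illustration: the helper names count_lemmas and sample_output are hypothetical, and printf stands in for real lemma output so the snippet runs without lemma installed. The stand-in lines for "have" and "had" are assumed from the deduplicated output above, not copied from an actual run.

```shell
#!/bin/sh
# Hypothetical helper: count how often each lemma (first tab-separated
# column) occurs, most frequent first, ties broken alphabetically.
count_lemmas() {
    cut -f1 | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2 "\t" $1 }'
}

# Stand-in for: echo "She has to have had a reason" | lemma
# (assumed output; replace with the real pipeline when lemma is installed).
sample_output() {
    printf 'she\tShe\nhave\thas\nto\tto\nhave\thave\nhave\thad\na\ta\nreason\treason\n'
}

counts=$(sample_output | count_lemmas)
printf '%s\n' "$counts"
```

With lemma installed, the same helper could presumably be used as `lemma calvino.txt | count_lemmas`.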
Lemmatized words are written to standard output on separate lines. Each line consists of the lemma followed by a tab (\t), followed by the original word.
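Because each line is a tab-separated pair, it can be split in a shell loop by setting IFS to a tab. The following is a minimal sketch rather than part of lemma itself: printf stands in for real lemma output, and the variable names base, word, and changed are illustrative assumptions.

```shell
#!/bin/sh
# Split each output line into lemma and original word on the tab character.
# printf stands in for real lemma output (e.g. `lemma calvino.txt`).
TAB=$(printf '\t')
changed=$(
    printf 'walk\twalking\ntext\ttext\nbe\tis\n' |
    while IFS="$TAB" read -r base word; do
        # Report only words whose surface form differs from the lemma.
        if [ "$base" != "$word" ]; then
            printf '%s -> %s\n' "$word" "$base"
        fi
    done
)
printf '%s\n' "$changed"
```

Filtering on `base != word` is one way to list only the words that lemmatization actually changed.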
lemma uses NLTagger when available, falling back on NSLinguisticTagger for older versions of macOS.
MIT
Mattt (@mattt)