The Unix tool pwgen can create pronounceable passwords using the -A0 flags. These make it easy to type and remember high-entropy passwords that are shorter than a passphrase. However, they are not of the highest quality:
- estimating entropy based on compression shows around 3.4 bits per character
- there is no entropy estimate based on generation method
- many un-phonetic segments like x<vowel> and large vowel clusters
- it only supports English phonetics
This tool instead uses an extremely simple bigram model to generate passwords.
- because the generator is a Markov chain, it has a standard entropy estimate based on a state ensemble
- supports any character set you have a word list for, though pronounceability will vary by language
Also, in some experiments with a 10,000-word dictionary, it showed:
- fewer un-phonetic segments
- higher entropy of around 3.9 bits per character
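To make the idea of a bigram model concrete, here is a minimal sketch of training on a word list and then walking the resulting chain. It is not the code in this repo: the `Model` type, the `train`, `total`, `pick` and `generate` functions, and the `^` boundary marker are all hypothetical names chosen for illustration, and the sketch assumes `^` never appears in the word list. Randomness is passed in as a closure so the logic stays testable; the generator in `main` is a placeholder and is not secure.

```rust
use std::collections::BTreeMap;
use std::iter;

/// Hypothetical marker for a word boundary (assumed absent from the word list).
const BOUNDARY: char = '^';

/// For each token, how often each successor token follows it.
type Model = BTreeMap<char, BTreeMap<char, u32>>;

/// Count character bigrams, with a boundary before and after every word.
fn train(words: &[&str]) -> Model {
    let mut model = Model::new();
    for word in words {
        if word.is_empty() {
            continue; // an empty line would create a boundary-to-boundary bigram
        }
        let mut prev = BOUNDARY;
        for next in word.chars().chain(iter::once(BOUNDARY)) {
            *model.entry(prev).or_default().entry(next).or_insert(0) += 1;
            prev = next;
        }
    }
    model
}

/// Total number of observed successors of `prev`.
fn total(model: &Model, prev: char) -> u32 {
    model.get(&prev).map_or(0, |m| m.values().sum())
}

/// Pick a successor of `prev` in proportion to its count.
/// `r` must be uniform in 0..total(model, prev).
fn pick(model: &Model, prev: char, r: u32) -> Option<char> {
    let mut remaining = r;
    for (&next, &count) in model.get(&prev)? {
        if remaining < count {
            return Some(next);
        }
        remaining -= count;
    }
    None
}

/// Emit `len` characters by walking the chain. `rng(n)` must return a
/// uniform value in 0..n; real passwords need a cryptographic source.
fn generate(model: &Model, len: usize, mut rng: impl FnMut(u32) -> u32) -> String {
    let mut out = String::new();
    let mut emitted = 0;
    let mut prev = BOUNDARY;
    while emitted < len {
        let n = total(model, prev);
        if n == 0 {
            break; // empty model
        }
        match pick(model, prev, rng(n)) {
            Some(BOUNDARY) => prev = BOUNDARY, // word ended, start a new one
            Some(c) => {
                out.push(c);
                emitted += 1;
                prev = c;
            }
            None => break, // rng returned a value out of range
        }
    }
    out
}

fn main() {
    let model = train(&["church", "chart", "rust", "trust", "such"]);
    // Placeholder randomness (a linear congruential generator). NOT secure:
    // it only keeps this sketch free of external crates.
    let mut state: u32 = 12345;
    let password = generate(&model, 10, |n| {
        state = state.wrapping_mul(1664525).wrapping_add(1013904223);
        (state >> 16) % n
    });
    println!("{password}");
}
```

With only five training words the output is mostly recombined fragments of those words; a 10k word list gives the chain far more transitions to choose from.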
Check out this repo:
git clone https://github.com/off-by-one/pwgram.git
cd pwgram
Make sure Rust is installed.
To run without installing, use cargo run --bin <binary-name> --. The binary arguments go after the double dash.
cargo run --bin pwgram -- models/english-10k.pwgram
To make a new bigram model from a wordlist instead of using mine, find a wordlist and use pwgram-train.
cargo run --bin pwgram-train -- [wordlist-file-path] > [model-filename].pwgram
cargo run --bin pwgram -- [model-filename].pwgram
For other options, explore the -h option for each binary: you can adjust the entropy during password generation, and set custom delimiters or multigraphs during training.
To install pwgram, build release versions using
cargo build -r
Then place target/release/[pwgram,pwgram-train] somewhere in your $PATH. Use them as above, but with cargo run --bin <binary-name> -- replaced with just the binary name.
This repo has a few 10k English word lists. A word list should be formatted as one word per line. I prefer all lowercase characters, but capitalized characters will be counted as distinct tokens and used appropriately (e.g. if only first characters are capitalized, the start of each ‘word’ boundary will be capitalized, which probably means the first letter of your password will sometimes be capitalized).
If your model generates a lot of truncated multigraphs (e.g. ch, followed by something that makes sense after h but not after ch), retrain it with those multigraphs passed as a comma-separated list to the --multigraphs flag. A decent list for English is -m sh,ph,th,ch.
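As a rough illustration of what treating a multigraph as a single token means, the sketch below splits a word greedily, trying longer multigraphs first. The `tokenize` function is hypothetical and the greedy longest-match rule is an assumption; the tokenizer in this repo may work differently.

```rust
use std::cmp::Reverse;

/// Split `word` into tokens, treating each listed multigraph as one token.
/// Longer multigraphs are tried first, so "thr" would win over "th".
fn tokenize(word: &str, multigraphs: &[&str]) -> Vec<String> {
    let mut sorted: Vec<&str> = multigraphs
        .iter()
        .copied()
        .filter(|m| !m.is_empty())
        .collect();
    sorted.sort_by_key(|m| Reverse(m.len()));

    let mut tokens = Vec::new();
    let mut rest = word;
    while let Some(first) = rest.chars().next() {
        // Use a multigraph if one matches here, otherwise a single character.
        let len = match sorted.iter().find(|m| rest.starts_with(**m)) {
            Some(m) => m.len(),
            None => first.len_utf8(),
        };
        tokens.push(rest[..len].to_string());
        rest = &rest[len..];
    }
    tokens
}

fn main() {
    let multigraphs = ["sh", "ph", "th", "ch"];
    for word in ["church", "thrush"] {
        println!("{word}: {:?}", tokenize(word, &multigraphs));
    }
}
```

Once ch is a single token, the bigram that follows it is conditioned on ch as a whole and not on the trailing h, which is what avoids the truncated sequences described above.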
Set your entropy at the desired level. NIST sets the minimum at around 40 bits. There’s no strong reason to go above 100 bits: that is sufficient for a long-term encryption key, even with a somewhat weak KDF.
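For a feel of what these targets mean in password length, the arithmetic can be sketched as below. It assumes the average per-character figures quoted earlier (about 3.9 bits for this tool, about 3.4 for pwgen) hold for the password being generated; `chars_needed` is a hypothetical helper, not part of pwgram.

```rust
/// Smallest number of characters whose total entropy reaches `target_bits`,
/// assuming each character contributes `bits_per_char` on average.
fn chars_needed(target_bits: f64, bits_per_char: f64) -> u32 {
    (target_bits / bits_per_char).ceil() as u32
}

fn main() {
    for target in [40.0, 100.0] {
        println!(
            "{target} bits: {} chars at 3.9 bits/char, {} chars at 3.4 bits/char",
            chars_needed(target, 3.9),
            chars_needed(target, 3.4)
        );
    }
}
```

Under those assumptions a 40-bit password is about 11 characters and a 100-bit one about 26.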
Convenient install locations for binaries are $HOME/bin/ or $HOME/.bin/. I keep the bigram models in $HOME/.config/pwgram/, and have aliases to generate passwords using them.
Finally, don’t overdo it. Please use a password manager. The goal here is to make things easier, so this should either be used to make passwords that you have to type on machines where the password manager is not installed or not available, or to make the password to your machine or password manager. There’s usually no good reason to have more than one or two memorized at any time.
The entropy estimator effectively counts the entropy of N tokens, given all possible ways this model could generate N tokens. Because some tokens are multiple characters, this is an underestimate: the true entropy would count the ways to generate M characters, where M >= N. However, it has experimentally been pretty close to the entropy estimated via compression ratio, so it’s probably good enough.
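One standard way to compute the entropy of N tokens drawn from a Markov chain is the chain rule: the entropy of the first token, plus the expected entropy of each following transition. The sketch below does that for a small transition matrix. Whether pwgram computes its estimate this way is an assumption; `entropy_bits` and `sequence_entropy` are hypothetical names, and the probabilities in `main` are made up.

```rust
/// Shannon entropy, in bits, of a probability distribution.
fn entropy_bits(dist: &[f64]) -> f64 {
    dist.iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.log2())
        .sum()
}

/// Joint entropy, in bits, of the first `n` tokens of a Markov chain.
/// `start[i]` is the probability that the first token is i, and
/// `trans[i][j]` is the probability that token j follows token i.
fn sequence_entropy(start: &[f64], trans: &[Vec<f64>], n: usize) -> f64 {
    if n == 0 {
        return 0.0;
    }
    let mut dist = start.to_vec();
    let mut total = entropy_bits(&dist);
    for _ in 1..n {
        // Chain rule: add H(next | current), weighted by where we are now.
        total += dist
            .iter()
            .zip(trans)
            .map(|(&p, row)| p * entropy_bits(row))
            .sum::<f64>();
        // Advance the token distribution one step.
        let mut next = vec![0.0; dist.len()];
        for (&p, row) in dist.iter().zip(trans) {
            for (slot, &q) in next.iter_mut().zip(row) {
                *slot += p * q;
            }
        }
        dist = next;
    }
    total
}

fn main() {
    // Toy two-token chain; the probabilities are invented for illustration.
    let start = [0.5, 0.5];
    let trans = vec![vec![0.9, 0.1], vec![0.5, 0.5]];
    let n = 12;
    let bits = sequence_entropy(&start, &trans, n);
    println!("{n} tokens: {bits:.2} bits, {:.2} bits per token", bits / n as f64);
}
```

Dividing the result by the number of characters actually emitted, not by the number of tokens, gives a per-character figure that can be compared with a compression-based estimate.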
Multigraphs are an annoying sticking point. A trigram (or more generally n-gram) model would do better, but it takes more training data to achieve the same entropy per character. The current plan for the next major version is to provide a training helper that helps the user identify multigraph tokens if they choose to pretrain a model.
Also, I am still learning Rust, and made a lot of iffy choices partly to learn how the features work. The bigram and tokenize internals will probably change a lot if I ever get around to it.