pwgram

Why

The Unix tool pwgen can create pronounceable passwords using the -A0 flags. These make it easy to type and remember high entropy passwords which are shorter than a passphrase. However, they are not of the highest quality.

estimating entropy based on compression shows around 3.4 bits per character
there is no entropy estimate based on generation method
many un-phonetic segments like x<vowel> and large vowel clusters
it only supports English phonetics

This tool instead uses an extremely simple bigram model to generate passwords.

being a Markov chain, the model allows a standard entropy estimate using a state ensemble
supports any character set you have a word list for, though pronounceability will vary by language

Also, in some experiments with a 10k-word dictionary, it showed:

fewer un-phonetic segments
higher entropy of around 3.9 bits per character
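
As a rough illustration of the kind of entropy estimate a bigram Markov chain allows, here is a minimal sketch. It is not pwgram's actual code: the function names, the per-character tokens, and the '^' and '$' word-boundary markers are all assumptions made for the example. It counts transitions from a word list, then averages the Shannon entropy of each state's successors, weighted by how often that state was seen in training.

```rust
use std::collections::HashMap;

/// Count bigram transitions between characters. '^' and '$' are
/// hypothetical start/end markers chosen for this sketch only.
fn bigram_counts(words: &[&str]) -> HashMap<char, HashMap<char, u32>> {
    let mut counts: HashMap<char, HashMap<char, u32>> = HashMap::new();
    for word in words {
        let mut prev = '^';
        for next in word.chars().chain(std::iter::once('$')) {
            *counts.entry(prev).or_default().entry(next).or_insert(0) += 1;
            prev = next;
        }
    }
    counts
}

/// Shannon entropy, in bits, of the next-token distribution of one state.
fn state_entropy(successors: &HashMap<char, u32>) -> f64 {
    let total: u32 = successors.values().sum();
    successors
        .values()
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Average bits per token: each state's entropy weighted by the share of
/// training transitions that left that state.
fn entropy_rate(counts: &HashMap<char, HashMap<char, u32>>) -> f64 {
    let grand_total: u32 = counts.values().flat_map(|s| s.values()).sum();
    counts
        .values()
        .map(|s| {
            let visits: u32 = s.values().sum();
            (visits as f64 / grand_total as f64) * state_entropy(s)
        })
        .sum()
}

fn main() {
    // Tiny stand-in word list; a real model would use thousands of words.
    let words = ["ab", "ac"];
    let counts = bigram_counts(&words);
    println!("{:.3} bits per token", entropy_rate(&counts));
}
```

With the two toy words above, only the state 'a' has any choice (b or c), so almost all of the transitions contribute zero bits. A real word list spreads probability over many more successors per state.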

How

Check out this repo:

git clone https://github.com/off-by-one/pwgram.git
cd pwgram

Make sure Rust is installed.

To run without installing, use cargo run --bin <binary-name> --. The binary arguments go after the double dash.

cargo run --bin pwgram -- models/english-10k.pwgram

To make a new bigram model from a wordlist instead of using mine, find a wordlist and use pwgram-train.

cargo run --bin pwgram-train -- [wordlist-file-path] > [model-filename].pwgram
cargo run --bin pwgram -- [model-filename].pwgram

For other options, explore the -h option for each binary - you can adjust the entropy during password generation, and set custom delimiters or multigraphs during training.

To install pwgram, build release versions using

cargo build -r

Then place target/release/pwgram and target/release/pwgram-train somewhere in your $PATH. Use them as above, but with cargo run --bin <binary-name> -- replaced by just the binary name.

Tips

This repo has a few 10k English word lists. Word lists should be formatted as one word per line. I prefer all lowercase characters, but capitalized characters will be counted as distinct tokens and used appropriately (e.g. if only first chars are capitalized, the start of each ‘word’ boundary will be capitalized, which probably means the first letter of your password will sometimes be capitalized).
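
If your source list has mixed case, stray whitespace, or blank lines, a small cleanup pass before training can help. The sketch below is only an illustration: normalize_wordlist is a hypothetical helper, not something pwgram provides, and lowercasing everything is just my preference from above rather than a requirement.

```rust
/// Turn raw word-list text (one word per line) into trimmed, lowercase
/// words, skipping blank lines. Hypothetical helper, not part of pwgram.
fn normalize_wordlist(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(str::to_lowercase)
        .collect()
}

fn main() {
    let raw = "Apple\n\n  pear \nKIWI\n";
    // Print one cleaned word per line, ready to save as a word list.
    for word in normalize_wordlist(raw) {
        println!("{}", word);
    }
}
```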

If you end up with a lot of truncated multigraphs generated by your model (e.g. ch, then something that makes sense after h but not ch), retrain it with that multigraph using the --multigraphs flag, as a comma-separated list. A decent list for English is -m sh,ph,th,ch.
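
To see why declaring multigraphs helps, here is a minimal sketch of greedy longest-match tokenization. It is an assumption about how such a tokenizer could work, not a copy of pwgram's internals, and the tokenize function is hypothetical. With "ch" treated as one token, the model learns what follows "ch" rather than what follows "h".

```rust
/// Split a word into tokens, preferring the longest listed multigraph at
/// each position and falling back to a single character otherwise.
fn tokenize(word: &str, multigraphs: &[&str]) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut rest = word;
    while let Some(first) = rest.chars().next() {
        // Longest multigraph that prefixes the remaining text, if any.
        let matched = multigraphs
            .iter()
            .filter(|m| !m.is_empty() && rest.starts_with(**m))
            .max_by_key(|m| m.len());
        let len = match matched {
            Some(m) => m.len(),
            None => first.len_utf8(),
        };
        tokens.push(rest[..len].to_string());
        rest = &rest[len..];
    }
    tokens
}

fn main() {
    // The same list suggested above for English.
    let multigraphs = ["sh", "ph", "th", "ch"];
    println!("{:?}", tokenize("church", &multigraphs));
    println!("{:?}", tokenize("church", &[]));
}
```

Without the multigraph list, "church" becomes six single-character tokens, and the bigram after "c" only ever sees "h".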

Set your entropy at the desired level. NIST sets the minimum at around 40 bits. There’s no strong reason to go above 100 bits - that is sufficient for a long-term encryption key, even with a somewhat weak KDF.
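
For a feel of what these targets mean in password length, the arithmetic is just the target divided by the bits per character, rounded up. The sketch below uses the rough per-character figures quoted earlier in this readme (3.4 and 3.9); chars_needed is a hypothetical helper, and real output length will vary because the generator works in tokens, not characters.

```rust
/// Characters needed to reach `target_bits` of entropy when each
/// character contributes roughly `bits_per_char` bits.
fn chars_needed(target_bits: f64, bits_per_char: f64) -> usize {
    (target_bits / bits_per_char).ceil() as usize
}

fn main() {
    // 3.4 and 3.9 are the approximate per-character figures from this readme.
    for bits_per_char in [3.4, 3.9] {
        for target_bits in [40.0, 100.0] {
            println!(
                "{} bits at {} bits/char: about {} chars",
                target_bits,
                bits_per_char,
                chars_needed(target_bits, bits_per_char)
            );
        }
    }
}
```

At around 3.9 bits per character this works out to roughly 11 characters for 40 bits and 26 characters for 100 bits.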

Convenient install locations for binaries are $HOME/bin/ or $HOME/.bin/. I keep the bigram models in $HOME/.config/pwgram/, and have aliases to generate passwords using them.

Finally, don’t overdo it. Please use a password manager. The goal here is to make things easier, so this should either be used to make passwords that you have to type on machines where the password manager is not installed or not available, or to make the password to your machine or password manager. There’s usually no good reason to have more than one or two memorized at any time.

Limitations

The entropy estimator effectively counts the entropy of N tokens, given all possible ways this model could generate N tokens. Because some tokens are multiple characters, this is an underestimate: the true entropy would be how many ways there are to generate M characters, where M >= N. However, it has experimentally been pretty close to the entropy estimated via compression ratio, so it’s probably good enough.

Multigraphs are an annoying sticking point. A trigram (or more generally n-gram) model would do better, but such models take more training data to achieve the same entropy per character. The current plan for the next major version is to provide a training helper that helps the user identify multigraph tokens if they choose to train their own model.

Also, I am still learning Rust, and made a lot of iffy choices partly to learn how particular features work. The bigram and tokenize internals will probably change a lot if I ever get around to it.
