snippets

Python snippets for educational purposes

Statistics

charcount.py

cat text.txt | python statistics/charcount.py > charcount.out

Time it!

time cat text.txt | python statistics/charcount.py > charcount.out

charcount_fancy.py

The same as charcount.py, but it also prints a log message after every 100,000th line processed. If an integer argument N is supplied, it outputs only the characters that occur at least N times. The argument affects only the output; it does not reduce the script's running time.

time cat text.txt | python statistics/charcount_fancy.py 10 > charcount_fancy.out
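The behaviour described above (count every character, log progress periodically, filter the output by a minimum count) can be sketched roughly as follows. This is an illustration and not the code of charcount_fancy.py itself: the function names `count_chars` and `filter_min_count`, the log message wording, and the choice to ignore the trailing newline are all assumptions.

```python
import sys
from collections import Counter


def count_chars(lines, log_every=100000, log=sys.stderr):
    """Count characters over an iterable of lines (hypothetical helper)."""
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        # Assumption: the trailing newline is not counted as a character.
        counts.update(line.rstrip("\n"))
        if line_no % log_every == 0:
            # Progress goes to the log stream so it does not mix with results.
            print(f"{line_no} lines processed", file=log)
    return counts


def filter_min_count(counts, min_count=1):
    """Keep only characters seen at least min_count times (hypothetical helper)."""
    return {ch: n for ch, n in counts.items() if n >= min_count}


# Small in-memory demonstration instead of reading a real file.
sample = ["aab\n", "abc\n"]
counts = count_chars(sample, log_every=1)
for ch, n in sorted(filter_min_count(counts, 2).items()):
    print(f"{ch}\t{n}")
```

The filter runs only after all lines have been counted, which is why a threshold changes the output without shortening the run. To use the sketch on real input, `sys.stdin` could be passed as `lines` and `int(sys.argv[1])` as `min_count`.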

diacritic_stats.py

Statistics on Hungarian diacritics:

number of tokens
number of types
ratio of words with at least one diacritic
lexdif: average number of words that map to the same latinized word (word with the diacritics removed)

The input is expected to contain one word per line. Example usage:

cat words | python statistics/diacritic_stats.py

Output format:

8351 tokens, 3737 types
1.00214075462 lexdif, 0.490360435876 diacritic ratio
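A rough sketch of how these statistics could be computed is shown below. It is not the code of diacritic_stats.py: the names `latinize` and `diacritic_stats` are hypothetical, and diacritics are removed here via Unicode NFD decomposition, which may differ from the script's own mapping. Two readings are assumed because they are consistent with the sample output above: lexdif is averaged over types (3745 / 3737), and the diacritic ratio is computed over tokens (4095 / 8351).

```python
import unicodedata
from collections import Counter, defaultdict


def latinize(word):
    """Strip diacritics by decomposing and dropping combining marks (assumed method)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_stats(words):
    """Return token/type counts, lexdif and diacritic ratio (hypothetical helper)."""
    tokens = Counter(words)
    n_tokens = sum(tokens.values())
    n_types = len(tokens)
    if n_types == 0:
        return {"tokens": 0, "types": 0, "lexdif": 0.0, "diacritic_ratio": 0.0}

    # Group the types by their latinized form.
    groups = defaultdict(set)
    for word in tokens:
        groups[latinize(word)].add(word)

    # Assumption: lexdif is the mean, over types, of the size of the type's group.
    lexdif = sum(len(groups[latinize(word)]) for word in tokens) / n_types
    # Assumption: the ratio is token-based, so repeated words count each time.
    with_diacritic = sum(n for word, n in tokens.items() if latinize(word) != word)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "lexdif": lexdif,
        "diacritic_ratio": with_diacritic / n_tokens,
    }


stats = diacritic_stats(["kor", "kór", "kör", "alma", "alma"])
print(f"{stats['tokens']} tokens, {stats['types']} types")
print(f"{stats['lexdif']} lexdif, {stats['diacritic_ratio']} diacritic ratio")
```

In the small example, kor, kór and kör all latinize to kor, so each of those three types belongs to a group of size 3, while alma stands alone.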
