Logs downloader

Tested with Python 3.8+

Tools to download phoenix replays from tenhou.net.

For example, these logs can be useful for machine learning.

This repo contains two main scripts:

1. Download and store log IDs. Game IDs can be obtained either from a yearly archive (e.g., https://tenhou.net/sc/raw/scraw2009.zip) or from the latest phoenix games page (https://tenhou.net/sc/raw/list.cgi).
2. Download log content for already collected log IDs.

Installation

Install the requirements with the command pip install -r requirements.txt

Download log IDs

The first step is to add a list of log IDs to the DB.

Historical logs (per year)

For example, suppose we want to download game IDs for 2009 (keep in mind that phoenix games only started to appear in 2009).

Download https://tenhou.net/sc/raw/scraw2009.zip manually and save it as temp/scraw2009.zip.

Input command:

python main.py -a id -y 2009 --from_archive

Output:

Preparing the list of games...
Found 80156 games
Temp folder was removed
Inserting new IDs to the database...
Done
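
As a rough illustration of what "preparing the list of games" might involve, the sketch below pulls log IDs out of text lines. It assumes, without confirmation from this repo, that each archive line carries a link containing log=<ID> and that IDs look like 2009010100gm-00a9-0000-1a2b3c4d. The function name extract_log_ids and the sample line are hypothetical.

```python
import re

# Assumed ID shape: 10 digits (date and hour), "gm", then three dash-separated groups.
LOG_ID_RE = re.compile(r"log=(\d{10}gm-[0-9a-f]{4}-\d{4}-[0-9a-f]{8})")


def extract_log_ids(lines):
    """Return unique log IDs found in the given lines, in first-seen order."""
    seen = set()
    ids = []
    for line in lines:
        for log_id in LOG_ID_RE.findall(line):
            if log_id not in seen:
                seen.add(log_id)
                ids.append(log_id)
    return ids


# Hypothetical sample line, not taken from a real archive.
sample = ['00:05 | 21 | <a href="http://tenhou.net/0/?log=2009010100gm-00a9-0000-1a2b3c4d">log</a>']
print(extract_log_ids(sample))
```

Deduplicating while preserving order keeps repeated runs over the same archive from producing duplicate entries; the actual script may handle this at the database level instead.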

Latest log IDs

To download game IDs from 1 January of the current year up to seven days before the current day, specify the -s flag:

python main.py -a id -s -p db/2021.db
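
The date range covered by the -s flag can be sketched as follows. This is only an illustration of the arithmetic described above; the helper name start_of_year_range is hypothetical, and treating both ends as inclusive is an assumption.

```python
from datetime import date, timedelta


def start_of_year_range(today):
    """Return (first_day, last_day) for the range covered by -s.

    Assumes both ends are inclusive; the real script may differ.
    """
    first_day = date(today.year, 1, 1)
    last_day = today - timedelta(days=7)
    return first_day, last_day


first, last = start_of_year_range(date(2021, 3, 15))
print(first, last)  # 2021-01-01 2021-03-08
```

Note that during the first week of January last_day falls before first_day, so under this reading the range would be empty.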

To download only the log IDs from the last 7 days:

python main.py -a id -p db/2021.db

You can add this command to cron (for example, to run every hour), and it will add new log IDs to the DB.

Download yakuman log IDs

You can download the IDs of hanchans in which a yakuman was scored for a specific year and month with this command:

python download_yakuman_game_ids.py -y 2006 -m 10

The IDs will be saved to the db/yakuman/2006/10.db file.

After that, you can download content for these IDs with this command:

python main.py -a content -p db/yakuman/2006/10.db -l 100000 -t 10 --strip
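
To check what a generated file contains before downloading content, something like the following may help. It assumes the .db files are SQLite databases, which this README does not state explicitly. list_tables is a hypothetical helper, and no table or column names are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return (table_name, row_count) pairs for a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names are read from the database itself and quoted as identifiers.
        return [
            (name, connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0])
            for name in names
        ]
    finally:
        connection.close()


# Example call (path taken from the command above):
# list_tables("db/yakuman/2006/10.db")
```

Keep in mind that sqlite3.connect creates an empty file when the path does not exist, so a mistyped path yields an empty result rather than an error.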

Download log content

To download log content for already downloaded IDs, use this command:

python main.py -a content -y 2009 -l 50 -t 3 --strip

Here -l is how many items to download and -t is the number of threads to use.

Note that Tenhou allows only one thread for downloading logs: https://x.com/tsuno_s/status/1804487739657580636
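
The meaning of -l and -t can be illustrated with a small sketch. This is not the repo's implementation: download_batch is a hypothetical name, and the fetch function is supplied by the caller, so no real Tenhou URL is assumed. The default of one thread follows the restriction mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def download_batch(log_ids, fetch, limit, threads=1):
    """Fetch at most `limit` logs using `threads` workers.

    `fetch` is a caller-supplied function mapping a log ID to its content.
    """
    batch = list(log_ids)[:limit]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the input IDs.
        contents = list(pool.map(fetch, batch))
    return dict(zip(batch, contents))


# Stand-in fetch function, for demonstration only.
results = download_batch(["id-1", "id-2", "id-3"], lambda log_id: log_id.upper(), limit=2)
print(results)  # {'id-1': 'ID-1', 'id-2': 'ID-2'}
```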

Validate that downloaded logs can be parsed

You can validate that all downloaded logs can be parsed with this command:

python validate.py -y 2009

This script also contains an example of parsing log content into separate tags.
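
For orientation, splitting log content into tags could look roughly like the sketch below. It is not the code from validate.py. It assumes the content is an XML-like string of tags with double-quoted attributes, and the sample string is a simplified, hypothetical example rather than a real log.

```python
import re

# Matches opening or self-closing tags; closing tags such as </mjloggm> are skipped.
TAG_RE = re.compile(r"<\s*([A-Za-z0-9]+)([^<>]*?)/?>")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def split_tags(content):
    """Return a list of (tag_name, attributes_dict) in document order."""
    return [
        (name, dict(ATTR_RE.findall(attrs)))
        for name, attrs in TAG_RE.findall(content)
    ]


# Hypothetical, simplified sample.
sample = '<mjloggm ver="2.3"><GO type="169" lobby="0"/><T12/><D12/></mjloggm>'
for name, attrs in split_tags(sample):
    print(name, attrs)
```

A regular expression is used here instead of an XML parser only to keep the sketch short; a real parser may be more robust against unusual attribute values.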
