Disk space usage of pipelines #157

Open
tmcgilchrist opened this issue Oct 18, 2023 · 1 comment

Comments

@tmcgilchrist
Member

ocaml-docs-ci uses the concept of epochs to track sets of package documentation that interlink and share common dependencies. Each epoch is self-contained and in production can grow to 100G or more.

Once an epoch is no longer used, it can be removed and its disk space reclaimed. Outside the epochs, two directories, compile and prep, also take up a lot of disk space, but there is no clean way to reclaim the unused space in them. For example, on the live docs.ci.ocaml.org server the data directory looks like this:

$ ls -l
total 940
drwxr-xr-x 4 localadmin localadmin   4096 Oct 12 17:50 compile
-rw-r--r-- 1 root       root       459760 Jan  3  2023 content.current
-rw-r--r-- 1 root       root       450418 Jan  3  2023 content.live
drwxr-xr-x 3 localadmin localadmin   4096 Aug 11 10:58 epoch-51470fc39ff110dfccafc8f0f1d882ec
drwxr-xr-x 3 localadmin localadmin   4096 Oct 12 17:50 epoch-5daeecab2ad7a2d07a12742d4cc0ab6f
drwxr-xr-x 3 localadmin localadmin   4096 Aug 11 12:23 epoch-804419edbd92ec13bc06b69d15132959
drwxr-xr-x 3 localadmin localadmin   4096 Oct 12 17:51 epoch-ae8bf595b8594945ee40c58377e03730
drwxr-xr-x 3 localadmin localadmin   4096 Jul 20 09:44 epoch-c96e56dbd72372ea5f56e27e667243ea
lrwxrwxrwx 1 root       root           44 Aug 16 05:48 html-current -> /data/epoch-ae8bf595b8594945ee40c58377e03730
-rw-r--r-- 1 root       root         1365 Aug 16 05:48 html-current.log
lrwxrwxrwx 1 root       root           44 Sep  7 13:56 html-live -> /data/epoch-ae8bf595b8594945ee40c58377e03730
-rw-r--r-- 1 root       root         1028 Sep  7 13:56 html-live.log
drwxr-xr-x 2 root       root         4096 Aug 25  2022 linked
lrwxrwxrwx 1 root       root           44 Aug 16 05:48 linked-current -> /data/epoch-5daeecab2ad7a2d07a12742d4cc0ab6f
-rw-r--r-- 1 root       root         1690 Aug 16 05:48 linked-current.log
lrwxrwxrwx 1 root       root           44 Sep  7 13:56 linked-live -> /data/epoch-5daeecab2ad7a2d07a12742d4cc0ab6f
-rw-r--r-- 1 root       root          585 Sep  7 13:56 linked-live.log
lrwxrwxrwx 1 root       root            9 Jul 21  2021 live -> html-live
drwxr-xr-x 3 localadmin localadmin   4096 Oct 12 17:45 prep

$ du -skhc *
159G	compile
452K	content.current
440K	content.live
94G	epoch-51470fc39ff110dfccafc8f0f1d882ec
28G	epoch-5daeecab2ad7a2d07a12742d4cc0ab6f
59G	epoch-804419edbd92ec13bc06b69d15132959
47G	epoch-ae8bf595b8594945ee40c58377e03730
103G	epoch-c96e56dbd72372ea5f56e27e667243ea
0	html-current
4.0K	html-current.log
0	html-live
4.0K	html-live.log
4.0K	linked
0	linked-current
4.0K	linked-current.log
0	linked-live
4.0K	linked-live.log
0	live
551G	prep
1.1T	total

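Both directories appear to contain universes, but those universes aren't linked to a specific epoch and so can't easily be pruned. Together they account for about 710G of the 1.1T total (159G in compile, 551G in prep), more than all five epochs combined.
Further investigation is needed to work out how to reclaim this space, or how to organise these directories into epochs so that it can be reclaimed along with them.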
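A natural starting point is a set difference: a universe under prep or compile that no epoch still in use refers to is an orphan and can be deleted. The OCaml sketch below assumes that universes are directories named by their hash and that the hashes each kept epoch uses have already been collected; list_dir and orphans are hypothetical helpers, not part of ocaml-docs-ci.

```ocaml
(* Sketch: find universes on disk that no live epoch references.
   Assumptions: universes are directories named by their hash
   (e.g. prep/universes/<hash>, compile/u/<hash>), and the caller has
   already collected, for each epoch to keep, the hashes it uses. *)

module S = Set.Make (String)

(* Names of the entries in [dir], or [] if [dir] is not a directory. *)
let list_dir dir =
  if Sys.file_exists dir && Sys.is_directory dir then
    Array.to_list (Sys.readdir dir)
  else []

(* Hashes in [on_disk] that appear in none of the [referenced] lists
   (one list per epoch to keep). The result is sorted. *)
let orphans ~on_disk ~referenced =
  let keep =
    List.fold_left
      (fun acc hashes -> S.union acc (S.of_list hashes))
      S.empty referenced
  in
  S.elements (S.diff (S.of_list on_disk) keep)
```

For example, orphans ~on_disk:(list_dir "/data/prep/universes") ~referenced:... would yield the deletion candidates for prep. The hard part, deciding how each epoch records the universes it depends on, is exactly the open question above.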

@tmcgilchrist
Member Author

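Initial tooling for pruning the prep and compile directories is available in #161.
Running it on staging freed up 83G of orphaned universes (212G total before, 129G after), mostly from prep (99G down to 26G) with the rest from compile (32G down to 23G).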

# du -skhc /data/*
32G	/data/compile
27G	/data/epoch-097e46a4d589b9e34ed2903beecd1a04
16G	/data/epoch-3820829bb005c559218fffb16ee32f3b
17G	/data/epoch-410108220dc0168ea4d9bd697dfa8e34
56K	/data/epoch-7b9b14524382caea73504bb5a36d8879
40K	/data/epoch-8550e0f943a1074985306ec2c9d21822
25G	/data/epoch-9baa5939aca1673d141e85b6ecd1e770
0	/data/html-current
4.0K	/data/html-current.log
0	/data/html-live
4.0K	/data/html-live.log
4.0K	/data/linked
0	/data/linked-current
4.0K	/data/linked-current.log
0	/data/linked-live
4.0K	/data/linked-live.log
99G	/data/prep
212G	total
# /usr/local/bin/epoch --base-dir /data
Files to be deleted in prep/universes
000266a0eb75736fb01067c065abfdcf
00065d52d8d3a4ed06de725186e78055
0008e7ff536074039c951738c9386eff
001351655c7802dc23d2b7508c32f206
0014e3e571ed5319e5db4b6c3dc31c48
001ff6efc0fbeb22ffde83cc265ceb41
00221d8f37762a32b03e84334594560a
0024ed8660c331065b18a08bbe205a2c
00259e1f647baf2c1493e52d0d9ff07d
0035a1d03fc9aaa713d6aac6d1fe7fe8
... plus 20959 more

Deleting 20969 files in /data/prep/universes
✔️ │███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████│ 20969/20969
Files to be deleted in compile/u
000266a0eb75736fb01067c065abfdcf
0014e3e571ed5319e5db4b6c3dc31c48
001ff6efc0fbeb22ffde83cc265ceb41
00259e1f647baf2c1493e52d0d9ff07d
003cafc54a9826c27f9a6a44bbbe45d2
004e68d419cbbf2cf6e49edbee60da35
0062d3b633c68a7b15fd49012400b3e5
006b995bf91c4616f744e3f862ac63cd
007077bb760550510c5a3a9cf34bbefe
0078206d4374af0bf15247fd078daced
... plus 6323 more

Deleting 6333 files in /data/compile/u
✔️ │█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████│ 6333/6333

After running epoch prune:

# du -skhc /data/*
23G	/data/compile
27G	/data/epoch-097e46a4d589b9e34ed2903beecd1a04
16G	/data/epoch-3820829bb005c559218fffb16ee32f3b
17G	/data/epoch-410108220dc0168ea4d9bd697dfa8e34
56K	/data/epoch-7b9b14524382caea73504bb5a36d8879
40K	/data/epoch-8550e0f943a1074985306ec2c9d21822
25G	/data/epoch-9baa5939aca1673d141e85b6ecd1e770
0	/data/html-current
4.0K	/data/html-current.log
0	/data/html-live
4.0K	/data/html-live.log
4.0K	/data/linked
0	/data/linked-current
4.0K	/data/linked-current.log
0	/data/linked-live
4.0K	/data/linked-live.log
26G	/data/prep
129G	total
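The pruning output above follows a dry-run-then-delete pattern: list the first few candidates, summarise the rest, then remove them. A minimal OCaml sketch of that pattern follows; preview and prune are hypothetical names, the real epoch tool may be structured differently, and the removal function is passed in by the caller rather than implemented here.

```ocaml
(* Sketch of the dry-run-then-delete pattern seen in the output above.
   [preview] and [prune] are hypothetical names; the real epoch tool's
   implementation and exact output format may differ. *)

(* First [n] elements of a list (fewer if the list is shorter). *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Lines to print before deleting: a header, up to [limit] entries, then a
   "... plus N more" line when there are more entries than [limit]. *)
let preview ?(limit = 10) ~dir entries =
  let total = List.length entries in
  let header = Printf.sprintf "Files to be deleted in %s" dir in
  let more =
    if total > limit then [ Printf.sprintf "... plus %d more" (total - limit) ]
    else []
  in
  (header :: take limit entries) @ more

(* Apply [remove] to every entry unless [dry_run] is set, and return the number
   of entries that were (or would have been) removed. [remove] is supplied by
   the caller, e.g. a recursive delete that does not follow symlinks. *)
let prune ~dry_run ~remove entries =
  if not dry_run then List.iter remove entries;
  List.length entries
```

With 20969 entries and the default limit of 10, preview lists ten hashes followed by "... plus 20959 more", matching the staging output for prep/universes.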

The next step is to provide an epoch validate command (e.g. epoch validate epoch-c96e56dbd72372ea5f56e27e667243ea) that checks that an epoch is internally consistent and has no missing universes. Epoch pruning assumes that no jobs are active and that the symlinks and universes on disk are consistent with each other. This doesn't always hold: running jobs may be changing the disk layout while pruning happens.
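One possible shape for the validate check is sketched below in OCaml, assuming (not confirmed here) that an epoch refers to its universes through symlinks, so a reference is broken when its target no longer exists. validate and validate_dir are hypothetical names; the check is injected as a predicate so the core logic can be tested without touching the disk.

```ocaml
(* Sketch of an "epoch validate" check. Assumption: an epoch points at its
   universes via symlinks, so a reference is broken when its target is gone.
   Sys.file_exists follows symlinks, so it is false for a dangling link. *)

(* [Ok ()] if every reference resolves, otherwise [Error missing] listing the
   references whose target does not exist. [exists] is injected for testing. *)
let validate ~exists refs =
  match List.filter (fun r -> not (exists r)) refs with
  | [] -> Ok ()
  | missing -> Error missing

(* Validate the entries directly under an epoch directory on disk.
   Only one level is checked; a full check would walk the whole epoch tree. *)
let validate_dir dir =
  Sys.readdir dir
  |> Array.to_list
  |> List.map (Filename.concat dir)
  |> validate ~exists:Sys.file_exists
```

Running a check like this before pruning, and refusing to prune when it reports errors, would catch some of the inconsistencies described above, though not changes made by a job that starts after the check.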

After that, the epoch CLI tool needs a way to diff two epochs of the same type.
For example:

# epoch diff /data/epoch-9baa5939aca1673d141e85b6ecd1e770 /data/epoch-8550e0f943a1074985306ec2c9d21822

Shows a rich diff of which package versions are present in one epoch but absent from the other. This is particularly useful for the html epochs used by ocaml.org.
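At its core, such a diff is two set differences over (package, version) pairs. The OCaml sketch below assumes each epoch has already been reduced to a list of such pairs (reading them from the epoch directory is not shown); the diff record and print_diff are hypothetical and only illustrate the idea, not the eventual output format.

```ocaml
(* Sketch of the core of "epoch diff": which package versions are present in
   one epoch but absent from the other. Assumption: each epoch has already been
   reduced to a list of (package, version) pairs; reading those from disk is
   not shown. *)

module PV = Set.Make (struct
  type t = string * string (* (package name, version) *)
  let compare = compare
end)

type diff = {
  only_in_first : (string * string) list;  (* absent from the second epoch *)
  only_in_second : (string * string) list; (* absent from the first epoch *)
}

let diff ~first ~second =
  let a = PV.of_list first and b = PV.of_list second in
  { only_in_first = PV.elements (PV.diff a b);
    only_in_second = PV.elements (PV.diff b a) }

(* Print in a unified-diff-like style, using opam's name.version notation. *)
let print_diff d =
  List.iter (fun (p, v) -> Printf.printf "- %s.%s\n" p v) d.only_in_first;
  List.iter (fun (p, v) -> Printf.printf "+ %s.%s\n" p v) d.only_in_second
```

Sorting falls out of Set.elements, so the output is stable across runs, which helps when comparing diffs over time.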

# epoch diff /data/epoch-9baa5939aca1673d141e85b6ecd1e770 /data/epoch-8550e0f943a1074985306ec2c9d21822 --summary

Shows a quick summary of how many package versions are added, missing, or new between the two epochs.

# epoch diff /data/epoch-9baa5939aca1673d141e85b6ecd1e770 /data/epoch-8550e0f943a1074985306ec2c9d21822 --package lwt

Shows the same diff for a single package: which of its versions are added, missing, or new between the two epochs.
New package versions are ones that were not in opam-repository when the first epoch was created.
Added package versions are ones that already existed and now have documentation built.
Missing package versions are ones that did exist in opam-repository but have no successful build.
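The three categories above can be turned into a small classification function, which then drives both --summary and --package. The OCaml sketch below assumes that, for every package version, it is already known whether it was in opam-repository when the first epoch was created and whether its docs built in each epoch; the entry record, the Unchanged case, and the precedence between categories are illustrative assumptions.

```ocaml
(* Sketch of the classification behind "epoch diff --summary" and
   "--package". Assumption: for every package version we already know
   whether it was in opam-repository when the first epoch was created and
   whether its docs built successfully in each epoch. *)

type entry = {
  package : string;
  version : string;
  in_repo_at_first : bool; (* in opam-repository when the first epoch was made *)
  built_in_first : bool;   (* docs built successfully in the first epoch *)
  built_in_second : bool;  (* docs built successfully in the second epoch *)
}

type status = New | Added | Missing | Unchanged

(* The precedence (New, then Added, then Missing) is an assumption. *)
let classify e =
  if not e.in_repo_at_first then New
  else if e.built_in_second && not e.built_in_first then Added
  else if not e.built_in_second then Missing
  else Unchanged

type summary = { added : int; missing : int; new_versions : int }

(* Counts per category, optionally restricted to a single package. *)
let summary ?package entries =
  let entries =
    match package with
    | None -> entries
    | Some p -> List.filter (fun e -> e.package = p) entries
  in
  let count s = List.length (List.filter (fun e -> classify e = s) entries) in
  { added = count Added; missing = count Missing; new_versions = count New }
```

Under these assumptions, summary entries would back --summary and summary ~package:"lwt" entries the per-package view; the rich diff would list the classified entries instead of counting them.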
