
CI: Show useful data about token URLs #192

Open · wants to merge 12 commits into base: master
29 changes: 28 additions & 1 deletion .github/workflows/check_links.yml
@@ -39,13 +39,40 @@ jobs:
key: cache-lychee-${{ github.sha }}
restore-keys: cache-lychee-

# Extract and check URLs directly from token files
# Extract and check URLs directly from token files (checking dumped file did not work)
- name: Check token art URLs
uses: lycheeverse/lychee-action@v2
with:
args: '${{env.lychee_args}} -- tokens.xml'
jobSummary: true

# Extract picture URLs from tokens.xml
- name: Extract URLs
id: tokens_pic_urls
uses: lycheeverse/lychee-action@v2
with:
args: '--dump --exclude www.w3.org -- tokens.xml'
output: lychee/out.md
jobSummary: false

# Analyse extracted links (1/2)
- name: List duplicated image links
if: steps.tokens_pic_urls.outcome == 'success'
shell: bash
# Remove empty lines | strip query strings after .jpg (Scryfall cache keys) | sort | count and list duplicates
run: |
echo "🪞 **Duplicated Image Links**" >> $GITHUB_STEP_SUMMARY
grep . lychee/out.md | sed 's/\.jpg?.*/.jpg/' | sort | uniq -cd >> $GITHUB_STEP_SUMMARY

# Analyse extracted links (2/2)
- name: List image hosting sources
if: steps.tokens_pic_urls.outcome == 'success'
shell: bash
# Extract 3rd field (domain name) | remove empty lines | sort | count duplicates and list with numbers | sort descending
run: |
echo "📊 **Image Hosting Statistics**" >> $GITHUB_STEP_SUMMARY
awk -F/ '{print $3}' lychee/out.md | grep . | sort | uniq -c | sort -nr >> $GITHUB_STEP_SUMMARY
Member:

Why is grep . here? Are there empty lines? Why does it run sort twice?

Member Author (@tooomm), Mar 27, 2024:

> Why is grep . here? Are there empty lines?

There used to be empty lines in the output when we had tokens or sets added without image links. That is currently not the case, but since we do not validate for empty/missing links, it could happen again.

I can also remove it, though. I am actually not sure whether empty lines would even mess with the result or not.
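A quick sketch suggests they would: without grep ., blank lines sort together and uniq -cd reports them as one extra, bogus duplicate entry. The sample file below is made up and only mimics the shape of lychee's dump output.

```shell
#!/usr/bin/env sh
# Made-up stand-in for lychee/out.md: one real duplicate plus two blank lines.
printf 'https://example.com/a.jpg\n\nhttps://example.com/a.jpg\n\nhttps://example.com/b.jpg\n' > out.md

echo '--- without grep . : the blank lines surface as a bogus duplicate'
sort out.md | uniq -cd

echo '--- with grep . : blank lines are dropped before counting'
grep . out.md | sort | uniq -cd
```

So keeping grep . is a cheap guard in case empty links ever reappear.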


> Why does it run sort twice?

You mean sorting descending directly before the count + list step? I simply did not think about combining them.

Member:

I guess the comment on it could be more explicit: it takes the third column from the file (the domain), removes empty lines, sorts, then counts the duplicates, listing them with their count before the URL, and finally sorts again, this time numerically descending by that count.
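A toy run of that pipeline makes each stage visible; the domains below are invented for illustration:

```shell
#!/usr/bin/env sh
# Made-up dump file; splitting "https://host/path" on "/" puts the host in field 3.
printf 'https://cards.scryfall.io/x.jpg\nhttps://i.imgur.com/y.jpg\nhttps://cards.scryfall.io/z.jpg\n' > out.md

# column 3 | drop blanks | group | count per domain | order by count, descending
awk -F/ '{print $3}' out.md | grep . | sort | uniq -c | sort -nr
```

This prints cards.scryfall.io with count 2 on the first line, then i.imgur.com with count 1.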

Member Author:

Updated the wording of the comment.

The sorting needs to be split: the first sort brings identical domain names next to each other so that the counting works (uniq only collapses adjacent lines). Only after the entries are combined and given their counts can the list be sorted descending by count.
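The adjacency requirement is easy to demonstrate: uniq only collapses neighbouring lines, and the counts that sort -nr orders by do not exist until uniq -c has added them.

```shell
#!/usr/bin/env sh
# Unsorted: the two "b" lines are not adjacent, so uniq -c counts them separately.
printf 'b\na\nb\n' | uniq -c    # three output lines: 1 b, 1 a, 1 b

# Sorted first: the duplicates become adjacent and collapse into one line with count 2.
printf 'b\na\nb\n' | sort | uniq -c    # two output lines: 1 a, 2 b
```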


# Always save cache
- name: Save lychee cache
uses: actions/cache/save@v4