Archive - populate with archived shows #154

Open
9 tasks
gerbrent opened this issue Jul 22, 2022 · 15 comments

@gerbrent
Collaborator

gerbrent commented Jul 22, 2022

This is a big one, but essential for our 1.0 Milestone

For all shows listed on jupiterbroadcasting.com's Archive sub-menu:

  • scrape show metadata: title, description, hosts, etc.
  • create sub-menu items for each Show
  • create Show page w all episodes listed (w pagination)
  • implement Show host/guest pages - but NOT NECESSARILY also listed in all JB hosts/guests listings
  • ...?
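For illustration only, the "scrape show metadata" step could start from something like the Python sketch below. It uses only the standard library and pulls the page title and the description meta tag out of an HTML string. The sample markup is hypothetical and not taken from the real site, and hosts and other fields would need selectors that depend on the actual page structure.

```python
from html.parser import HTMLParser


class ShowMetaParser(HTMLParser):
    """Collect the <title> text and the description <meta> tag of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that sits inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def scrape_show_metadata(html):
    """Return a dict with the title and description found in the HTML string."""
    parser = ShowMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description.strip(),
    }


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<title>Beer Is Tasty</title>
<meta name="description" content="A show about beer.">
</head><body></body></html>"""

print(scrape_show_metadata(SAMPLE_PAGE))
```

A real scraper would also have to fetch the pages and handle shows that have no description at all, which this sketch returns as an empty string.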

Potential Problems

Some shows....

  • ...listed in the archive do not feature Show artwork, only per-episode artwork
    • design Show artwork - can be fairly simple & derivative of current episode art
  • ...have a Fireside listing, some do not.
  • ...are no longer hosted on the JB Network, do have archive items, but are still actively produced shows with the same Fireside site
  • ...feature outdated artwork branding
    • i.e. Choose Linux, User Error feature Linux Academy branding
    • modify branded artwork to modern JB design guidelines
  • ...do not follow the current JB Episode naming conventions

Ideas Worth Discussing

Filesystem

  • Preferred file structure could be /content/archive/show/beer-is-tasty, keeping /content/show for active shows, i.e. cleaner for everyday use
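A small Python sketch of that layout, assuming a scraper decides where to write each show based on an archived flag. The function name and the "coder-radio" slug are made up for the example, and paths are relative to the repo root.

```python
from pathlib import PurePosixPath


def show_content_dir(slug, archived):
    """Return the content directory for a show, relative to the repo root."""
    base = PurePosixPath("content")
    if archived:
        # Archived shows live under content/archive/show/<slug>.
        return base / "archive" / "show" / slug
    # Active shows stay under content/show/<slug>.
    return base / "show" / slug


print(show_content_dir("beer-is-tasty", archived=True))   # content/archive/show/beer-is-tasty
print(show_content_dir("coder-radio", archived=False))    # content/show/coder-radio
```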

Sub-menu?

I feel the sub-menu approach is a fine one, with all archived shows being presented under the "Archive" menu item.
Open to discussion.

Media Files?

Hosts/Guests

implement Show host/guest pages - but NOT NECESSARILY also listed in all JB hosts/guests listings

We do not necessarily want all previous hosts and guests to be listed on the main Hosts and Guests pages - possibly only listing active show Hosts/Guests there. Worth discussion and decision/input further from JB.

Show Artwork

  • some currently feature this nice touch, for inspiration:
    [image]

..... more?

@gerbrent gerbrent added the meta-issue Tracks many related sub-tasks/issues (with check list in the description) label Jul 22, 2022
@gerbrent gerbrent added this to the JB.com 1.0 milestone Jul 22, 2022
@kbondarev
Collaborator

Oooo that’s a big one 😨

I have a lot of thoughts and questions on this as I’ve been dreading it in the back of my mind. Mostly it’s all related to scraping.

No particular order to these items, numbers are for ease of referencing. Some of these are for my own reference for the future 🧠

  • 1) 📝 Can someone compile a list of ALL the shows?
    • 🔢 Possibly with number of episodes each show had.
    • 📆 And even better, the estimated dates between which each show was active (from my initial look at archive.org, it will probably be good to have that info)
  • 2) 🎶 Media files (audio/video) - which ones does JB have possession of? Which ones are lost and MUST be searched for through archive.org?
    • is there a possibility that we could find the show-notes of an episode 📄 , but no actual episode audio file?
    • Should YouTube be a backup target for scraping? How far back does the catalogue go there?
  • 3) About a week or two ago (seems like ages) I found some collections on archive.org made by “JupiterBroadcasting” 🚀 user and some by “ChrisLAS” 🔥 user.
    • I think it was mostly LAS 🐧
    • it would be good to make a list of any known collections
  • 4) 📣 Anyone who has any experience with archive.org please raise your hand now! 🤚

@gerbrent
Collaborator Author

Here's another helpful way Wes and I have been exploring the scope of this item:

  • The archive could potentially be added after the 1.0 Milestone, but ultimately must be added within a reasonable time-frame.
  • At this point, the scraping can potentially be simplified to obtaining a "map" of the known items and their relationships/pointers to external sources. i.e. We only need to capture that which will be lost by the change to Scale Engine, but not necessarily to other sources like archive.org.
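One way to picture such a "map", as a rough Python sketch: each known item carries a list of pointers to wherever it can still be found, and a helper flags the items whose only known copies sit on hosts that are going away. The class, the function, and both hostnames below are hypothetical and only stand in for whatever the real sources turn out to be.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ArchiveItem:
    show: str
    episode: str
    # URLs where this item can still be found (media, show notes, etc.)
    sources: list = field(default_factory=list)


def at_risk_items(items, retiring_hosts):
    """Return items whose every known source is on a host that is going away.

    Items with no known source at all are not returned here; they need a
    separate manual search.
    """
    retiring = set(retiring_hosts)
    flagged = []
    for item in items:
        hosts = {urlparse(url).hostname for url in item.sources}
        if hosts and hosts <= retiring:
            flagged.append(item)
    return flagged


# Hypothetical data: "old-host.example" stands in for a host being retired.
ITEMS = [
    ArchiveItem("Show A", "1", ["https://old-host.example/a1.mp3"]),
    ArchiveItem("Show A", "2", ["https://old-host.example/a2.mp3",
                                "https://archive.org/details/a2"]),
    ArchiveItem("Show B", "1", []),
]

for item in at_risk_items(ITEMS, {"old-host.example"}):
    print(item.show, item.episode)
```

Only the first item is flagged in this example, since the second also has a copy elsewhere and the third has no known source to lose.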

Hopefully that relieves a little tension and invokes some ideas..

@gerbrent
Collaborator Author

essential for our 1.0 Milestone

Realizing I said this on the very first line of this ticket, but am now challenging that concept in the comment above. Happy to keep challenging it!

@reesericci
Collaborator

Should BSD Now still be listed? I don't see why it should if it still runs without the JB Banner.

@gerbrent
Collaborator Author

gerbrent commented Aug 5, 2022

@StefanS-O proposed that we create a separate repo and hugo instance for the JB Archive with a few distinct advantages:

  • build times for the main JB site would be independent of the archived shows, i.e. likely 2x the speed of consolidated builds
  • the Archive repo can be experimented on without impacting the main JB repo
  • Hosts and Guest entries/profiles would be duplicated, but can be an advantage since archived Hosts/Guests are then not present in the main JB Site listings.
  • a separate scraper strategy can be applied if needed
  • the archive would indeed be the source-of-truth for archived items. JB repo would have Fireside as source-of-truth at this time.
  • When a current show is "archived", it would simply be copy-pasted from one repo to the other.
  • The new Archive repo would be cloned from the current JB repo, and benefit from all its implementations, customizations, and changes. The repo would be archive.jupiterbroadcasting.com here on GitHub
  • the media player will need to be something other than the Podverse embed, since the archive shows are not in the Indexes
  • hosted at archive.jupiterbroadcasting.com

work is currently being done on this strategy, just figured I would keep you all updated on the direction of the Archive.

@gerbrent
Collaborator Author

gerbrent commented Aug 5, 2022

Should BSD Now still be listed? I don't see why it should if it still runs without the JB Banner.

BSD Now is an important part of the JB Network's history, and we are happy to list it in the JB Archives. (hoping that answers your Q.)

@kbondarev kbondarev self-assigned this Aug 12, 2022
@gerbrent gerbrent pinned this issue Aug 15, 2022
@gerbrent
Collaborator Author

gerbrent commented Aug 15, 2022

@StefanS-O proposed that we create a separate repo and hugo instance for the JB Archive with a few distinct considerations:

  • build times for the main JB site would be independent of the archived shows, i.e. likely 2x the speed of consolidated builds
  • the Archive repo can be experimented on without impacting the main JB repo
  • Hosts and Guest entries/profiles would be duplicated, but can be an advantage since archived Hosts/Guests are then not present in the main JB Site listings.
  • a separate scraper strategy can be applied to the archive (esp since it's a one-time-only operation)
  • the archive would indeed be the source-of-truth for archived items. JB repo would have Fireside as source-of-truth at this time.
  • When a current show is "archived", it would simply be copy-pasted from one repo to the other.
  • The new Archive repo would be cloned from the current JB repo, and benefit from all its implementations, customizations, and changes.
  • the media player will need to be something other than the Podverse embed, since the archive shows are not in the Podverse Indexes

all the above open to feedback!

@reclaimingmytime
Contributor

I think we should also consider the use case of moving shows out of the archive. Coder Radio was "archived" for a few months (September 2019 - August 2020) and is now active again. Maybe the steps above are done in reverse?

@gerbrent
Collaborator Author

gerbrent commented Aug 15, 2022

I'm happy to challenge the above assumption about needing a distinct Archive repo (w the help of in-person discussions w @ironicbadger ):

  • build times for the main JB site would be independent of the archived shows, i.e. likely 2x the speed of consolidated builds

At the current build times of 43 seconds, I'm not sure this is much of a worry...

  • Hosts and Guest entries/profiles would be duplicated, but can be an advantage since archived Hosts/Guests are then not present in the main JB Site listings.

This is a big disadvantage, as duplication is generally problematic and difficult to maintain. An option would be to have a "retired" flag in profiles (this may already be implemented, I recall discussing it w @kbondarev but my searches have failed.. #125 related) to keep current and past hosts distinct.
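As a rough sketch of the "retired" flag idea (in Python rather than in the site's templates, and with made-up profile data): the main Hosts/Guests listing would skip anyone flagged as retired, while an archive listing could still include them. Treating a profile without the flag as not retired is an assumption.

```python
def listed_people(people, include_retired=False):
    """Return the profiles to list; retired ones only when asked for."""
    return [
        person for person in people
        if include_retired or not person.get("retired", False)
    ]


# Hypothetical profiles, not real hosts.
PEOPLE = [
    {"name": "Host One"},
    {"name": "Host Two", "retired": True},
    {"name": "Guest Three", "retired": False},
]

print([p["name"] for p in listed_people(PEOPLE)])
print([p["name"] for p in listed_people(PEOPLE, include_retired=True)])
```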

  • a separate scraper strategy can be applied to the archive (esp since it's a one-time-only operation)

This is essential. The current show-scraper is used for current and future shows hosted on Fireside. The archive-scraper would be essentially for one-time-use to pull archived data from the current WP site. @StefanS-O already has a working model (to be shared) and @kbondarev has been working on/banging his head on a wall with this challenge too ; )

  • When a current show is "archived", it would simply be copy-pasted from one repo to the other.

A much better strategy might be a simple archived label, or current, or whatever suits. If current=true, show in main menu and main shows page. If archived / current=false, show in archive section.
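Sketched in Python with hypothetical show data, the flag-based split could look like this. The `current` field name follows the wording above; treating a show with no flag as current is an assumption made so that existing shows would not need to be touched.

```python
def split_shows(shows):
    """Split shows into (current, archived) based on their 'current' flag."""
    current = [s for s in shows if s.get("current", True)]
    archived = [s for s in shows if not s.get("current", True)]
    return current, archived


# Hypothetical show metadata.
SHOWS = [
    {"title": "Show A", "current": True},
    {"title": "Show B", "current": False},
    {"title": "Show C"},  # no flag: assumed to be current
]

main_menu, archive_section = split_shows(SHOWS)
print([s["title"] for s in main_menu])        # ['Show A', 'Show C']
print([s["title"] for s in archive_section])  # ['Show B']
```

With this approach, archiving a show or bringing it back would be a one-line metadata change rather than moving files between repos.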

  • the media player will need to be something other than the Podverse embed, since the archive shows are not in the Podverse Indexes

Again, a label/variable in the Show metadata could define which media player should be used for the episodes of this show.
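A minimal Python sketch of that lookup, assuming a `player` key in the Show metadata. The key name, the player names, and the default are all hypothetical.

```python
KNOWN_PLAYERS = {"podverse", "html5"}  # hypothetical player names
DEFAULT_PLAYER = "podverse"            # assumed default when a show sets nothing


def player_for(show):
    """Return the player name configured for a show, or the default."""
    player = show.get("player", DEFAULT_PLAYER)
    if player not in KNOWN_PLAYERS:
        raise ValueError(
            f"unknown player {player!r} for show {show.get('title')!r}"
        )
    return player


print(player_for({"title": "Show A"}))                     # podverse
print(player_for({"title": "Show B", "player": "html5"}))  # html5
```

Failing loudly on an unknown value is a design choice here, so that a typo in the metadata shows up at build time instead of producing a page without a player.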

let's discuss!

@kbondarev
Collaborator

I recall discussing it w @kbondarev but my searches have failed.. #125 related) to keep current and past hosts distinct.

@gerbrent I think you meant this one #223

@StefanS-O
Collaborator

StefanS-O commented Aug 17, 2022

So I created a new repo here:

https://github.com/JupiterBroadcasting/archive.jupiterbroadcasting.com

It works the same way as the main site. I also extracted the JB Hugo theme to its own repository here:

https://github.com/JupiterBroadcasting/jb-hugo-theme

I added the Github Actions and deployment (to my server at the moment):

https://archive.jupiterbroadcasting.net/

There is also a child theme in the archive that can be used to override or add functionality of the JB theme. It is part of the archive repo, because it is specific to that:

https://github.com/JupiterBroadcasting/archive.jupiterbroadcasting.com/tree/main/themes/archive/layouts/partials

What needs to be done:

  1. delete the theme from the main site and replace it with a git submodule of jb-hugo-theme. The community needs to be informed that this will change, because it will mess with their current local setup
  2. re-add stuff like the audio / video player, because for archived shows we don't have the live player
  3. scrape the old site

@StefanS-O StefanS-O modified the milestones: JB.com 1.0, JB.com 2.0 Aug 21, 2022
@gerbrent
Collaborator Author

gerbrent commented Sep 7, 2022

Note Archive menu item temporarily disabled: #399

@StefanS-O
Collaborator

Not sure if I mentioned it before somewhere, but I did a static export of the old site before we switched to Hugo:

https://original.jupiterbroadcasting.net/

@StefanS-O StefanS-O self-assigned this Mar 13, 2023
@reclaimingmytime
Contributor

reclaimingmytime commented Jun 21, 2023

I noticed 20 episodes of Jupiter EXTRAS have already been removed, both on extras.show (currently fireside.fm) and https://www.jupiterbroadcasting.com/show/jupiter-extras/. The gap isn't as noticeable on Fireside, because it seems like the episode numbers have been removed from the list.

The episodes are:

2 15 16 20 23 25 30 37 39 40 46 47 49 52 53 54 55 58 60 72

Based on the pattern I see in these episodes on YouTube, I'm sure there are internal reasons for removing them that shouldn't be discussed here. I just want to let the next archiver know in case they're looking for these missing episodes.
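For the next archiver, a small Python sketch of how gaps like these could be spotted from a list of the episode numbers that are still present. The function name and the sample numbers are made up for the example.

```python
def missing_episode_numbers(present, first=1, last=None):
    """Return the episode numbers between first and last that are not present.

    If last is not given, the highest present number is used, so episodes
    removed from the very end of a show cannot be detected.
    """
    present = set(present)
    if last is None:
        last = max(present, default=first - 1)
    return [n for n in range(first, last + 1) if n not in present]


print(missing_episode_numbers([1, 3, 4, 7]))  # [2, 5, 6]
```

Passing `last` explicitly is needed whenever the final episode number is known from another source.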

Also related to issue #22.

@gerbrent
Collaborator Author

Great observations, and yep - intentional and all is well.
