This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

SitemapIndexReader implementation #640

Open
wants to merge 3 commits into
base: main
from

Conversation

@kapil-malik commented Nov 16, 2023

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Implemented a loader for sitemap_index URLs. It uses SitemapLoader internally. Many websites do not publish a single sitemap; they publish a sitemap index that points to multiple sitemaps. This class loads the documents from every sitemap listed in the index with a single loader call.
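
For readers unfamiliar with the format, here is a minimal sketch of the idea, not the code in this PR. It assumes the standard sitemaps.org index layout (`<sitemapindex>` containing `<sitemap><loc>` entries). The names `extract_sitemap_urls`, `load_from_index`, and the `load_sitemap` callback are hypothetical and stand in for whatever the per-sitemap loader does.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Small inline sample so the sketch runs without network access.
SAMPLE_INDEX = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>
""".strip()


def extract_sitemap_urls(index_xml: str) -> List[str]:
    """Return the child sitemap URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def load_from_index(
    index_xml: str, load_sitemap: Callable[[str], List[str]]
) -> List[str]:
    """Call a per-sitemap loader (hypothetical callback) for each child sitemap
    and concatenate the results in index order."""
    documents: List[str] = []
    for sitemap_url in extract_sitemap_urls(index_xml):
        documents.extend(load_sitemap(sitemap_url))
    return documents


if __name__ == "__main__":
    # Stub loader: a real one would fetch and parse each sitemap.
    docs = load_from_index(SAMPLE_INDEX, lambda url: [f"document from {url}"])
    print(docs)
```

Passing the per-sitemap loader as a callback keeps the sketch independent of any particular reader class; in practice the existing single-sitemap loader would fill that role.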

A few examples of websites with a sitemap_index -

Fixes # (issue)

Type of Change

Please delete options that are not relevant.

  • New Loader/Tool
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)

Suggested Checklist:

  • I have added a library.json file if a new loader/tool was added
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@EmanuelCampos
Collaborator

@kapil-malik the linting and tests are failing, and the branch has a merge conflict. Can you resolve these, please?
