Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import confluence documents #1160

Closed
wants to merge 3 commits into from

Conversation

makelinux
Copy link
Contributor

@makelinux makelinux commented May 14, 2024

yaml example:

document:
  confluence:
    host: https://spaces.redhat.com
    spaces:
      - name: telco5gocp
        pages:
          - title: Debugging an OCP host
            version: 22

Closes #1152
Depends on: instructlab/schema#17

@makelinux makelinux force-pushed the confluence branch 4 times, most recently from 637b946 to e0e212d Compare May 14, 2024 18:00
.gitmodules Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
src/instructlab/utils.py Outdated Show resolved Hide resolved
src/instructlab/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

The lint failures are going to be fixed by #1162.

src/instructlab/config.py Outdated Show resolved Hide resolved
src/instructlab/utils.py Outdated Show resolved Hide resolved
@mergify mergify bot added the testing Relates to testing label May 15, 2024
@makelinux makelinux force-pushed the confluence branch 2 times, most recently from a1cac92 to 87ebdd8 Compare May 15, 2024 14:49
@russellb russellb added the hold In-progress PR. Tag should be removed before merge. label May 15, 2024
Copy link
Member

@russellb russellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed a hold since this should depend on backend support, as well.

As for these changes, I think it would be nice to consider making this an optional dependency, since it would not be part of the most common flow.

@bjhargrave
Copy link
Contributor

There is also the issue of knowledge stability. In the current git-repo-with-knowledge, a commit SHA is provided so the knowledge can be reliably collected at SDG/training time which matches the contribution time. This is important. The example here of using confluence do not seem to have a way to pin the knowledge to a specific revision like git does with a commit SHA. So I think this is a concern for repeatability and safety (since the knowledge could be poisoned after the contribution is made but before the SDG and training are done).

@russellb
Copy link
Member

Adding a new knowledge source seems like a good candidate for a design doc in the dev-docs repo to answer some of these key design questions. The addition spans many components, so a project-wide design doc there makes sense to me.

@mergify mergify bot added the ci-failure PR has at least one CI failure label May 16, 2024
@mergify mergify bot removed the ci-failure PR has at least one CI failure label May 16, 2024
@mergify mergify bot added the ci-failure PR has at least one CI failure label May 16, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels May 16, 2024
@makelinux
Copy link
Contributor Author

I've added Confluence versioning.

Copy link
Contributor

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@makelinux What would a functional test for this look like? Thanks!

@makelinux makelinux requested a review from tiran May 23, 2024 16:49
@mergify mergify bot added the ci-failure PR has at least one CI failure label May 23, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels May 23, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels May 23, 2024
@mergify mergify bot removed the ci-failure PR has at least one CI failure label May 23, 2024
Copy link
Member

@hickeyma hickeyma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @makelinux for pushing this capability. As @russellb mentioned in #1160 (comment), do you mins pushing a design doc for this capability?

@makelinux
Copy link
Contributor Author

Hi, @russellb , @hickeyma, here is the requested document Confluence document source design notes

@makelinux
Copy link
Contributor Author

I placed a hold since this should depend on backend support, as well.
@russellb , Confluence pages usually are private, unlike Wikipedia. What do you mean by backend support?

@makelinux makelinux requested a review from hickeyma May 27, 2024 03:42
tests/test_utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits otherwise this feels ready to go.

src/instructlab/utils.py Outdated Show resolved Hide resolved
Depends on updated schema.

Test with

 pytest -k test_get_confluence_docs

Signed-off-by: Costa Shulyupin <[email protected]>
@russellb
Copy link
Member

I placed a hold since this should depend on backend support, as well.
@russellb , Confluence pages usually are private, unlike Wikipedia. What do you mean by backend support?

I mean if there's an expectation that we would accept any PRs into instructlab/taxonomy with pointers to confluence, then we have some backend code that does weekly model builds out of that repo that would need to understand it. If the intent is that this would only be for private taxonomy forks, then I'd agree only this PR is necessary.

I see a design doc is up now (thank you!) so let's also get that merged first -- instructlab/dev-docs#64

@bjhargrave
Copy link
Contributor

If the intent is that this would only be for private taxonomy forks, then I'd agree only this PR is necessary.

The rub is that ilab command and the public taxonomy repo share the same schema (from the schema repo). So to add this PR in ilab would require a schema update which would also be then in the public taxonomy repo which means people can submit such qna.yaml files which would not fail schema validation. Otherwise we would need to diverge on the schema such that ilab uses an superset schema than the public taxonomy repo. But even this can create disharmony since people could develop a qna.yaml which ilab diff approves of but will fail schema validation in a PR to the public taxonomy repo.

Copy link

This pull request has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Aug 27, 2024
Copy link
Contributor

mergify bot commented Aug 27, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @makelinux please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added needs-rebase This Pull Request needs to be rebased dependencies Relates to dependencies labels Aug 27, 2024
@RedHat-Israel RedHat-Israel closed this by deleting the head repository Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD Affects CI/CD configuration container Affects containization aspects dependencies Relates to dependencies documentation Improvements or additions to documentation hold In-progress PR. Tag should be removed before merge. needs-rebase This Pull Request needs to be rebased stale testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import Confluence documents
7 participants