confluence
Signed-off-by: Costa Shulyupin <[email protected]>
makelinux committed Jul 18, 2024
1 parent 8fde6f7 commit d37ada8
Showing 1 changed file with 42 additions and 0 deletions.
42 changes: 42 additions & 0 deletions docs/confluence-doc-source.md
@@ -0,0 +1,42 @@
# Confluence document source

Importing information from Confluence is crucial for fine-tuning models on internal documentation.
Many companies use Confluence to store their internal documents.
Fine-tuned models can be employed within these companies and shared externally without compromising the internal documentation itself.
Therefore, importing information from Confluence benefits both companies and the broader community.

## Interfaces

The `document` section of the qna.yaml file specifies:

- Confluence Host: The base URL of the Confluence instance.
- Space: The Confluence space key where the documents reside.
- Page titles: The titles of the Confluence pages to fetch.
- Version: The version of the Confluence page (optional).

A qna.yaml file can define a single host with multiple spaces and pages;
each page can carry an optional version.
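
The shape described above (one host, several spaces, several pages per space, an optional version per page) can be sketched as plain Python data classes. This is only an illustration: the class names, the dictionary keys (`host`, `spaces`, `key`, `pages`, `title`, `version`), and the helper `parse_confluence_source` are hypothetical and are not taken from the InstructLab schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class PageRef:
    """One Confluence page to fetch (hypothetical model)."""
    title: str
    version: Optional[int] = None  # None is taken to mean "latest version"


@dataclass
class SpaceRef:
    """A Confluence space key and the pages requested from it."""
    key: str
    pages: List[PageRef] = field(default_factory=list)


@dataclass
class ConfluenceSource:
    """A single Confluence host with any number of spaces."""
    host: str
    spaces: List[SpaceRef] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values that are not absolute http(s) URLs.
        parsed = urlparse(self.host)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a valid base URL: {self.host!r}")


def parse_confluence_source(raw: dict) -> ConfluenceSource:
    """Build the model from an already-parsed mapping (assumed key names)."""
    spaces = [
        SpaceRef(
            key=space["key"],
            pages=[
                PageRef(title=page["title"], version=page.get("version"))
                for page in space.get("pages", [])
            ],
        )
        for space in raw.get("spaces", [])
    ]
    return ConfluenceSource(host=raw["host"], spaces=spaces)
```

Keeping the version on the page rather than on the space mirrors the field list above, where the version is described as a property of the page.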

Confluence credentials in config.yaml:
- Username
- [Token](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/)
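
As a rough sketch of how the two credential fields might be held in memory once config.yaml has been parsed, the class below keeps the token out of its `repr` so it does not leak into logs. The class name, the dictionary keys `username` and `token`, and the `from_config` helper are assumptions made for illustration, not the actual layout of config.yaml.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfluenceCredentials:
    """Username plus API token (hypothetical container)."""
    username: str
    token: str = field(repr=False)  # excluded from repr() to avoid leaking it

    @classmethod
    def from_config(cls, section: dict) -> "ConfluenceCredentials":
        # `section` is assumed to be the already-parsed credentials mapping.
        missing = [k for k in ("username", "token") if not section.get(k)]
        if missing:
            raise ValueError(f"missing Confluence credential(s): {', '.join(missing)}")
        return cls(username=section["username"], token=section["token"])
```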

## Changes across modules

- [Configuration module](https://github.com/instructlab/instructlab/blob/main/src/instructlab/config.py)
  defines the structure and validation rules for the config.yaml file.
- [Schema module](https://github.com/instructlab/schema)
  defines the structure and validation rules for the qna.yaml file.
- [SDG utilities module](https://github.com/instructlab/sdg/blob/main/src/instructlab/sdg/utils/taxonomy.py)
  fetches the documents.
- [Unit tests](https://github.com/instructlab/instructlab/tree/main/tests)
  for the changes above.

## Additional External Packages

The implementation relies on the following external packages:

- [atlassian-python-api](https://atlassian-python-api.readthedocs.io/)
A Python library to interact with Atlassian products, including Confluence.
- [markdownify](https://pypi.org/project/markdownify/)
A library to convert HTML content to Markdown for processing Confluence page content.
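
One way the two packages could fit together is sketched below: look a page up by space and title, read its storage-format HTML, and convert it to Markdown. The function names `fetch_page_markdown` and `make_client` are hypothetical. The calls `Confluence(url=..., username=..., password=...)`, `get_page_by_title`, and `get_page_by_id` come from atlassian-python-api, but how a historical `version` is served depends on the Confluence instance, so treat that branch as an assumption to verify.

```python
from typing import Callable, Optional


def make_client(host: str, username: str, token: str):
    """Create a Confluence client; the API token is passed as the password."""
    from atlassian import Confluence  # imported lazily: external dependency
    return Confluence(url=host, username=username, password=token)


def _default_to_markdown(html: str) -> str:
    from markdownify import markdownify  # imported lazily: external dependency
    return markdownify(html)


def fetch_page_markdown(
    client,
    space: str,
    title: str,
    version: Optional[int] = None,
    to_markdown: Callable[[str], str] = _default_to_markdown,
) -> str:
    """Return the body of one Confluence page converted to Markdown."""
    # expand="body.storage" asks the REST API to include the page HTML.
    page = client.get_page_by_title(space, title, expand="body.storage")
    if not page:
        raise LookupError(f"page {title!r} not found in space {space!r}")
    if version is not None:
        # Assumption: the instance returns the requested historical version.
        page = client.get_page_by_id(
            page["id"], expand="body.storage", version=version
        )
    html = page["body"]["storage"]["value"]
    return to_markdown(html)
```

Passing the client and the converter as arguments keeps the function testable without network access: a stub client and a trivial converter are enough to exercise both the latest-version and pinned-version paths.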
