Conflex is an application that is designed to crawl any Confluence Wiki and translate the information into structured data in a database that can be easily queried.
This application is written using python v3.6. It is advisable but not necessary to run the application
using virtualenv. Please install the required libraries through pip using: pip install -r requirements.txt
To run the application simply pass the containing folder to the python interpreter and the __main__.py
will be
invoked. i.e. python /path/to/conflex
.
There are two modes of operation for the conflex application. (Other than the below modes, a change to the configuration file will cause a full recheck to occur, this can take some time).
In this mode, all pages on the wiki are rechecked to ensure that they meet the criteria set out in the config file.
To enable this mode, set the --full-sync
flag when running the main application.
(See Command Line Arguments section for more details).
In this mode, the application will only perform delta updates on pages that it currently knows about/that are stored in the local database. This means that new pages added to the wiki will not be synced. Also, pages that no longer meet the criteria set out in the config file will not be removed.
It is recommended to run the application overnight for a Full Delta Sync (15 minutes) as this mode of operation takes a much longer time than the Half Delta Sync. Running the below configuration file takes an average of 5 minutes on a Half Delta Sync, so this can be run more often.
The following are important usage notes for the conflex application:
- Pages that are to be retrieved from the wiki should be child pages of the space Home. This is due to the fact that pages are undiscoverable if they exist in a space but are not a child of the space Home.
- Linking in the configuration file must be correctly written as these are parsed in the Confluence API return
_links
section and will be undiscoverable otherwise. - Heading/Title matching only supports full text matches i.e. no regex.
The application supports the following command line arguments:
Argument | Description |
---|---|
--application-inventory | run the application inventory existing database update script. |
--bigquery | run the Google Big Query update application. |
--config | the location of the configuration file to run the application with. If none is provided, one in the folder where the script is run will be used. |
--datastore | run the Google DataStore update application. |
--full-sync | runs the application in full sync mode. i.e. pages are checked to ensure they meet the criteria in the config file. |
--half-sync | runs the application in half sync mode. i.e. no new pages will be added to the database. (This is the default application behaviour. |
--version | return the current application version. |
The application supports documentation through Sphinx. To generate the Documentation run make html
. Currently the
only way to do this is through the Makefile
.
The configuration file is written in YAML. Currently the conflex application only supports a file in the application
directory named config.yaml
.
confluence:
host: https://wiki.auckland.ac.nz
username:
password:
mysql:
host: 127.0.0.1
port: 3306
database: conflex
username:
password:
table_prefix: wiki
logging:
level: INFO
wiki:
spaces:
APPLCTN:
pages:
labels:
application:
panels:
- Overview
- Roadmap
- Infrastructure
page_properties:
- inv_item_info
- itil_stakeholders
- nfr
url:
- tinyui
- webui
pages:
labels:
support_model:
page_properties:
- support_model
- support_tiers
url:
- webui
inv_architecture_overview:
page_properties:
- inv_architecture_info
url:
- webui
REFARCH:
pages:
titles:
Reference Architecture:
pages:
titles:
Information Architecture:
pages:
titles:
Information Entities:
pages:
labels:
entity_description:
page_properties:
- entity_description
- entity_caudit
Technology Architecture:
pages:
labels:
service-area-metadata:
headings:
$ref: '#/schemas/REFARCH/service-area/headings'
pages:
labels:
domain-metadata:
headings:
$ref: '#/schemas/REFARCH/domain-brick-element/headings'
pages:
labels:
brick-metadata:
headings:
$ref: '#/schemas/REFARCH/domain-brick-element/headings'
pages:
labels:
element-metadata:
headings:
$ref: '#/schemas/REFARCH/domain-brick-element/headings'
schemas:
REFARCH:
service-area:
headings:
- Definition
- Principles
- Standards
- Guidelines
- Security
- Monitoring
- Resilience
- Recovery
- Future
domain-brick-element:
headings:
- Overview
- Principles
- Standards
- Guidelines
- Security
- Monitoring
- Resilience
- Recovery
- Future
The application supports the following levels of logging:
- CRITICAL
- ERROR
- WARNING
- INFO
- DEBUG
- NOTSET
The example configuration above contains a configuration definition for wiki
. This configures the conflex application
so that it knows where and what information the application is to retrieve. The wiki
configuration adheres to the
following design guideline.
wiki:
spaces:
[LIST OF SPACE NAMES]:
pages:
[PAGE NAVIGATION TYPES]:
[Page NAVIGATION VALUE]:
[PAGE INFORMATION RETRIEVAL TYPE]:
- [PAGE INFORMATION RETRIEVAL VALUE]
Keyword | Description |
---|---|
pages | Wiki pages. When using the pages descriptor, only the specified pages will be crawled for information. To get information from a child page, a child pages definition is required (This will also ensure that in the database, the parent /app of the child page will be set to the id of the parent page. The pages descriptor should only have navigation types as its immediate values. |
spaces | Wiki spaces. There should only ever be one spaces definition in the wiki configuration. |
Keyword | Description |
---|---|
labels | Find a wiki page which is labelled with the following label. |
titles | Find a wiki page with the following title. |
Keyword | Description |
---|---|
headings | Gets information from the heading. |
page | Gets information from the entire page. |
page_properties | Gets key-value pairs from the page properties macro. |
panels | Gets information from within a panel element in a wiki page. (For the APPLCTN space, the Overview panel will read up to the following words before stopping: The application is accessible from these locations: . |
url | Retrieves URL information about a page. Sub options are tinyui and webui . |
The configuration file supports the use of the $ref
keyword for reusable configuration elements.
To reference a definition, use the $ref keyword:
$ref: 'reference to definition'
This works in a similar way to how Swagger implements the $ref
tag. Please see their
documentation for more details. Currently conflex is only able to
reference definitions within the current configuration document.