Docs: Adding a meltano postgres connection #1226

Merged
merged 2 commits into from
Apr 12, 2024
143 changes: 137 additions & 6 deletions apps/docs/docs/contribute/connect-data.md
@@ -266,15 +266,146 @@ In the future we intend to improve the experience of adding a plugin to the
pipeline, but for now these docs are consistent with the current state of the
pipeline.

## Airbyte Connectors
## Connecting external databases

---
The easiest way to connect data to OSO is to use our Airbyte connector or
Singer.io tap integration through meltano. This section provides the details
necessary to add a connector or a tap from an existing postgres database into
our system. Other databases or data sources should be similar.

### Setting up your postgres database for connection

We will set up the postgres connection to use Change Data Capture (CDC), which
is suggested for very large databases. You will need to have the following in
order to connect your postgres database to OSO for replication.

Deploy Airbyte connectors to index data from new sources.
- `wal_level` must be set to `logical`
- You need to create a user with a username of your choosing and share the
  associated credentials with a maintainer at OSO
- You need to grant `REPLICATION` privileges to that user
- You need to create a replication slot
- You need to create a publication for OSO for the tables you wish to have
  replicated

#### Setting your `wal_level`

:::warning
This section is a work in progress.
Please ensure that you understand what changing the `wal_level` will do for your
database system requirements and/or performance.
:::

- Go over creating a simple airbyte connector to a sample API
- Go over creating a simple airbyte connector to an already existing public dataset
Before you begin, it's possible your settings are already correct. To check your
`wal_level` settings, run the following query:

```SQL
SHOW wal_level;
```

The output would look something like this from `psql`:

```
wal_level
-----------
logical
```

If the output doesn't show `logical` but some other value instead, you will need
to change it. Please ensure that this `wal_level` change is actually what you
want for your database. Setting this value to `logical` will likely affect
performance, as it increases the disk writes made by the database process. Note
that `wal_level` is only read at server start, so the new value takes effect
after the database server is restarted. If you are comfortable with this, you
can change the `wal_level` by executing the following:

```SQL
ALTER SYSTEM SET wal_level = logical;
```
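
To confirm the change was picked up, a check along the following lines can help. This is a minimal sketch that reads the standard `pg_settings` view. The two extra settings, `max_replication_slots` and `max_wal_senders`, are not discussed in this guide; they are included on the assumption that at least one free slot and one WAL sender are needed for replication.

```SQL
-- pending_restart is true when a changed value is still waiting for a restart
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');
```

After the restart, `wal_level` should report `logical` and `pending_restart` should be false.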

#### Creating a user for OSO

To create a user, choose a username and password. Here we've chosen `oso_user`
as the username and `somepassword` as a placeholder password:

```SQL
CREATE USER oso_user WITH PASSWORD 'somepassword';
```

#### Granting replication privileges

The user we just created will need replication privileges:

```SQL
ALTER USER oso_user WITH REPLICATION;
```
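
The following sketch checks that the role now carries the replication attribute. It also grants read access, on the assumption that the connector needs `SELECT` on the replicated tables and that those tables live in the `public` schema; neither point is stated in this guide, so adjust or skip the grants to fit your setup.

```SQL
-- Confirm the role exists and has the REPLICATION attribute
SELECT rolname, rolreplication
FROM pg_roles
WHERE rolname = 'oso_user';

-- Assumption: the tables to replicate are in the public schema
GRANT USAGE ON SCHEMA public TO oso_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO oso_user;
```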

#### Create a replication slot

Create a replication slot for OSO to read from. Here we named it `oso_slot`, but
it can have any name.

```SQL
SELECT * FROM pg_create_logical_replication_slot('oso_slot', 'pgoutput');
```
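
To verify that the slot exists, you can query the standard `pg_replication_slots` view, as sketched below. Keep in mind that a logical slot that is never consumed causes the server to retain WAL, so it may be worth dropping the slot if the connection is abandoned; the drop statement is left commented out as a reminder rather than a step in this guide.

```SQL
-- The slot should be listed with plugin pgoutput and slot_type logical
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots
WHERE slot_name = 'oso_slot';

-- Only if the slot is no longer needed:
-- SELECT pg_drop_replication_slot('oso_slot');
```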

#### Create a publication

For the final step, we will create the publication, which publishes changes
from a specific table or tables. The tables should already exist. If they do
not, you will need to create them _before_ creating the publication. Once you've
ensured that the table or tables in question have been created, run the
following to create the publication:

_This assumes that you're creating the publication for table1 and table2._

```SQL
CREATE PUBLICATION oso_publication FOR TABLE table1, table2;
```

You can also create a publication for _all_ tables. To do this run the following
query:

```SQL
CREATE PUBLICATION oso_publication FOR ALL TABLES;
```

For more details about this command see: https://www.postgresql.org/docs/current/sql-createpublication.html
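
As a quick sanity check, the standard `pg_publication_tables` view lists the tables a publication covers. The sketch below assumes the publication was named `oso_publication` as in the examples above.

```SQL
-- Each replicated table should appear as one row
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'oso_publication';
```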

### Adding your postgres replication data to the OSO meltano configuration

Assuming that you've created the publication, you're now ready to connect your
postgres data source to OSO.

#### Add the extractor to `meltano.yml`

The `meltano.yml` YAML file details all of the required configuration for the
meltano "extractors", which are either airbyte connectors or singer.io taps.

For postgres data sources we use the postgres airbyte connector. Underneath the
`extractors:` section, add the following as a new list item (you should choose a
name other than `tap-my-postgres-datasource`):

```yaml
extractors:
  # ... other items may be above
  # Choose any arbitrary name tap-# that is related to your datasource
  - name: tap-my-postgres-datasource
    inherit_from: tap-postgres
    variant: airbyte
    pip_url: git+https://github.com/MeltanoLabs/tap-airbyte-wrapper.git
    config:
      airbyte_config:
        jdbc_url_params: "replication=postgres"
        ssl_mode: # Update with your SSL configuration
          mode: enable
        schemas: # Update with your schemas
          - public
        replication_method:
          plugin: pgoutput
          method: CDC
          # Use the name of the publication you created (e.g. oso_publication)
          publication: publication_name
          # Must match the replication slot you created
          replication_slot: oso_slot
          initial_waiting_seconds: 5
```

#### Send the read-only credentials to OSO maintainers

For now, once this is all completed, it is best to open a pull request. An OSO
maintainer will then reach out with a method to accept the read-only credentials.