Fix: skip tests for sources without test accounts (#379)
* skip facebook tests

* skip matomo tests

* skip personio tests

* replace postgres with duckdb

* return kafka, skip strapi
AstrakhantsevaAA authored Mar 5, 2024
1 parent 00b612c commit e7c3226
Showing 10 changed files with 70 additions and 32 deletions.
27 changes: 16 additions & 11 deletions sources/facebook_ads/README.md
@@ -1,5 +1,10 @@
# Facebook Ads

> **Warning!**
>
> This source is a Community source and was tested only once. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

This Facebook Ads `dlt` verified source and pipeline example loads data to your preferred destination using the Facebook Marketing API. It supports loading data from multiple endpoints, giving you flexibility in which data you retrieve. The following endpoints are available for loading data with this verified source:
| Endpoint | Description |
| --- | --- |
@@ -24,44 +29,44 @@ To read about grabbing the Facebook Ads credentials and configuring the verified

1. Open `.dlt/secrets.toml`.
2. Enter the `access_token`:

```toml
# put your secret values and credentials here. do not share this file and do not push it to github
[sources.facebook_ads]
access_token="set me up!"
```

3. Enter credentials for your chosen destination as per the [docs](https://dlthub.com/docs/dlt-ecosystem/destinations/).
4. Open `.dlt/config.toml`.
```toml
[sources.facebook_ads]
account_id = "1430280281077689"
```

5. Replace the value of the account ID with your own.

## Run the pipeline example

1. Install the necessary dependencies by running the following command:

```bash
pip install -r requirements.txt
```

2. Now run the pipeline with the command:

```bash
python3 facebook_ads_pipeline.py
python facebook_ads_pipeline.py
```

3. To make sure that everything is loaded as expected, use the command:

```bash
dlt pipeline <pipeline_name> show
```

For example, the pipeline_name for the above pipeline example is `facebook_ads`; you may also use any custom name instead. A minimal run script is sketched below.
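
For orientation, here is a minimal sketch of what a run script like `facebook_ads_pipeline.py` might contain, assuming the source module exposes `facebook_ads_source()` (as the tests in this commit do); the destination and dataset names are illustrative:

```python
import dlt

from facebook_ads import facebook_ads_source

# account_id is read from .dlt/config.toml and access_token from .dlt/secrets.toml
pipeline = dlt.pipeline(
    pipeline_name="facebook_ads",
    destination="duckdb",  # or any other supported destination
    dataset_name="facebook_ads_data",
)
load_info = pipeline.run(facebook_ads_source())
print(load_info)
```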



💡 To explore additional customizations for this pipeline, we recommend referring to the official `dlt` Facebook Ads documentation. It provides comprehensive information and guidance on how to further customize and tailor the pipeline to suit your specific needs. You can find the Facebook Ads verified source documentation in [Setup Guide: Facebook Ads](https://dlthub.com/docs/dlt-ecosystem/verified-sources/facebook_ads).
36 changes: 21 additions & 15 deletions sources/matomo/README.md
@@ -1,5 +1,11 @@
# Matomo

> **Warning!**
>
> This source is a Community source and was tested only once. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

Matomo is a free and open source web analytics platform that allows website owners and businesses to gain detailed insights into the performance of their websites and applications. With this verified source you can easily extract data from Matomo and seamlessly load it into your preferred destination. This verified source supports the following endpoints:

| Endpoint | Description |
@@ -12,10 +18,10 @@ Matomo is a free and open source web analytics platform that allows website owne
## Initialize the pipeline with Matomo verified source
To get started with your data pipeline, follow these steps:
```bash
dlt init matomo bigquery
dlt init matomo duckdb
```

Here, we chose BigQuery as the destination. Alternatively, you can also choose redshift, duckdb, or any of the other [destinations.](https://dlthub.com/docs/dlt-ecosystem/destinations/)
Here, we chose DuckDB as the destination. Alternatively, you can also choose redshift, bigquery, or any of the other [destinations.](https://dlthub.com/docs/dlt-ecosystem/destinations/)

## Grab Matomo credentials

@@ -25,51 +31,51 @@ To obtain the Matomo credentials, please refer to the [full documentation here.]

1. Open `.dlt/secrets.toml`.
2. Enter the API key:

```toml
# put your secret values and credentials here. do not share this file and do not push it to github
[sources.matomo]
api_token= "access_token" # please set me up!
```

3. Enter credentials for your chosen destination as per the [docs](https://dlthub.com/docs/dlt-ecosystem/destinations/).
4. Inside the **`.dlt`** folder, you'll find a file called **`config.toml`**, where you can securely store your pipeline configuration details.

Here's what the config.toml looks like:

```toml
[sources.matomo]
url = "Please set me up !" # please set me up!
queries = ["a", "b", "c"] # please set me up!
site_id = 0 # please set me up!
live_events_site_id = 0 # please set me up!
```

5. Replace the values of `url` and `site_id` with the ones you copied above. This ensures that your data pipeline can access the required Matomo resources.
6. In order to track live events for a website, the `live_events_site_id` parameter must be set to the same value as the `site_id` parameter for that website, as in the example below.
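
For instance, a filled-in `config.toml` might look like this (the URL and query name are hypothetical placeholders):

```toml
[sources.matomo]
url = "https://my-matomo.example.com"  # hypothetical Matomo instance URL
queries = ["visits_summary"]           # hypothetical report query
site_id = 3
live_events_site_id = 3                # same as site_id to track live events for this site
```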

## Run the pipeline

1. Install the necessary dependencies by running the following command:

```bash
pip install -r requirements.txt
```

2. Now run the pipeline with the command:

```bash
python3 matomo_pipeline.py
python matomo_pipeline.py
```

3. To make sure that everything is loaded as expected, use the command:

```bash
dlt pipeline <pipeline_name> show
```

For example, the pipeline_name for the above pipeline is `matomo`; you may also use any custom name instead. A minimal run script is sketched below.
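
As a rough sketch (not a definitive implementation), `matomo_pipeline.py` might run the reports source like this, assuming the module exposes `matomo_reports()` with its configuration injected from the `.dlt` files:

```python
import dlt

from matomo import matomo_reports

# api_token comes from .dlt/secrets.toml; url, queries, and site_id from .dlt/config.toml
pipeline = dlt.pipeline(
    pipeline_name="matomo",
    destination="duckdb",
    dataset_name="matomo_data",
)
load_info = pipeline.run(matomo_reports())
print(load_info)
```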



💡 To explore additional customizations for this pipeline, we recommend referring to the official `dlt` Matomo documentation. It provides comprehensive information and guidance on how to further customize and tailor the pipeline to suit your specific needs. You can find the `dlt` Matomo documentation in [Setup Guide: Matomo.](https://dlthub.com/docs/dlt-ecosystem/verified-sources/matomo)
5 changes: 5 additions & 0 deletions sources/personio/README.md
@@ -1,5 +1,10 @@
# Personio

> **Warning!**
>
> This source is a Community source and was tested only once. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

[Personio](https://personio.de/) is human resources management software that helps businesses streamline HR processes, including recruitment, employee data management, and payroll, in one platform.
1 change: 1 addition & 0 deletions sources/sql_database/requirements.txt
@@ -1,2 +1,3 @@
sqlalchemy>=1.4
dlt>=0.3.5
pymysql>=1.0.0
18 changes: 12 additions & 6 deletions sources/strapi/README.md
@@ -1,5 +1,11 @@
# Strapi

> **Warning!**
>
> This source is a Community source, and we have never tested it. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

Strapi is a headless CMS (Content Management System) that allows developers to create powerful API-driven content management systems without having to write a lot of custom code.

Since the endpoints available in Strapi depend on the API setup you have created, you need to know which endpoints you will be ingesting in order to get data from Strapi into your warehouse.
@@ -21,10 +27,10 @@ To grab the API token and initialise the verified source, read the [following do
```toml
# put your secret values and credentials here. do not share this file and do not upload it to github.
[sources.strapi]
api_secret_key = "please set me up!" # your api_secret_key
domain = "please set me up!" # your strapi domain
```

3. When you run the Strapi project, a new tab opens in the browser; the URL in the address bar of that tab is the domain. For example, the domain is `my-strapi.up.railway.app` if the URL in the address bar is `http://my-strapi.up.railway.app`.
4. Enter credentials for your chosen destination as per the [docs.](https://dlthub.com/docs/dlt-ecosystem/destinations/)

@@ -34,19 +40,19 @@ To grab the API token and initialise the verified source, read the [following do
```bash
pip install -r requirements.txt
```

2. Now you can run the pipeline using the command:
```bash
python3 strapi_pipeline.py
```

3. To ensure that everything is loaded as expected, use the command:
```bash
dlt pipeline <pipeline_name> show
```

For example, the pipeline_name for the above pipeline is `strapi_pipeline`; you can use any custom name instead. A minimal run script is sketched below.
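
For reference, a minimal sketch of what `strapi_pipeline.py` might look like, assuming the source module exposes `strapi_source(endpoints=...)`; the `athletes` endpoint is a hypothetical collection name, so substitute the endpoints from your own Strapi setup:

```python
import dlt

from strapi import strapi_source

# api_secret_key and domain are read from .dlt/secrets.toml
pipeline = dlt.pipeline(
    pipeline_name="strapi_pipeline",
    destination="duckdb",
    dataset_name="strapi_data",
)
# "athletes" is a placeholder; list the endpoints defined in your Strapi project
load_info = pipeline.run(strapi_source(endpoints=["athletes"]))
print(load_info)
```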



💡 To explore additional customizations for this pipeline, we recommend referring to the official dlt Strapi verified source documentation. It provides comprehensive information and guidance on how to further customize and tailor the pipeline to suit your specific needs. You can find the dlt Strapi documentation in [Setup Guide: Strapi.](https://dlthub.com/docs/dlt-ecosystem/verified-sources/strapi)
6 changes: 6 additions & 0 deletions tests/facebook_ads/test_facebook_ads_source.py
@@ -15,6 +15,7 @@
from tests.utils import ALL_DESTINATIONS, assert_load_info, load_table_counts


@pytest.mark.skip("We don't have a Facebook Ads test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_load_all_ads_object(destination_name: str) -> None:
pipeline = dlt.pipeline(
@@ -40,6 +41,7 @@ def test_load_all_ads_object(destination_name: str) -> None:
assert all(c > 0 for c in table_counts.values())


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_set_ads_fields() -> None:
s = facebook_ads_source()
# get only ids for ads
@@ -51,6 +53,7 @@ def test_set_ads_fields() -> None:
assert "id" in ad


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_select_only_ads_with_state() -> None:
s = facebook_ads_source()
all_ads = list(s.with_resources("ads"))
@@ -61,6 +64,7 @@ def test_select_only_ads_with_state() -> None:
assert len(all_ads) > len(disapproved_ads)


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_enrich_objects_multiple_chunks() -> None:
s = facebook_ads_source()
s.campaigns.bind(fields=("id",))
@@ -81,12 +85,14 @@ def test_enrich_objects_multiple_chunks() -> None:
assert "name" in full_campaigns[0]


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_load_insights() -> None:
# just load 1 past day with attribution window of 7 days - that will re-acquire last 8 days + today
i_daily = facebook_insights_source(initial_load_past_days=1)
assert len(list(i_daily)) == 0


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_load_insights_weekly() -> None:
i_weekly = facebook_insights_source(initial_load_past_days=1, time_increment_days=7)
assert len(list(i_weekly)) == 0
1 change: 1 addition & 0 deletions tests/kinesis/test_kinesis.py
@@ -105,6 +105,7 @@ def test_kinesis_incremental(kinesis_client: Any) -> None:
pipeline_name="kinesis_test",
destination="duckdb",
dataset_name="kinesis_test_data",
full_refresh=True,
)
test_id = str(uuid.uuid4())

5 changes: 5 additions & 0 deletions tests/matomo/test_matomo_source.py
@@ -66,6 +66,7 @@
]


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_reports(destination_name: str) -> None:
"""
@@ -84,6 +85,7 @@ def test_reports(destination_name: str) -> None:
_check_pipeline_has_tables(pipeline, ALL_TABLES_REPORTS)


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_visits(destination_name: str) -> None:
"""
@@ -118,6 +120,7 @@ def test_visits(destination_name: str) -> None:
assert diff_count >= 0 and diff_count < 5


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_visits_with_visitors(destination_name: str) -> None:
"""
@@ -269,6 +272,7 @@ def test_remove_active_visits(
assert result == expected_visits


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_incrementing_reports(destination_name: str) -> None:
"""
@@ -308,6 +312,7 @@ def test_incrementing_reports(destination_name: str) -> None:
)


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_start_date(destination_name: str) -> None:
"""
1 change: 1 addition & 0 deletions tests/personio/test_personio_client.py
@@ -39,6 +39,7 @@ def get_metadata(endpoint, headers, params):
]


@pytest.mark.skip("We don't have a Personio test account.")
@pytest.mark.parametrize("endpoint, params, offset_by_page", endpoints_data)
def test_client(endpoint, params, offset_by_page, client):
headers = {"Authorization": f"Bearer {client.access_token}"}
2 changes: 2 additions & 0 deletions tests/personio/test_personio_source.py
@@ -5,6 +5,7 @@
from tests.utils import ALL_DESTINATIONS, assert_load_info, load_table_counts


@pytest.mark.skip("We don't have a Personio test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_all_resources(destination_name: str) -> None:
pipeline = dlt.pipeline(
@@ -26,6 +27,7 @@ def test_all_resources(destination_name: str) -> None:
assert table_counts["absences"] > 1000


@pytest.mark.skip("We don't have a Personio test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_incremental_endpoints(destination_name: str) -> None:
# do the initial load
