Fix: skip tests for sources without test accounts (#379)
* skip facebook tests

* skip matomo tests

* skip personio tests

* replace postgres with duckdb

* restore kafka, skip strapi
AstrakhantsevaAA authored and rudolfix committed Mar 6, 2024
1 parent 7abe4e8 commit 2c4e492
Showing 10 changed files with 70 additions and 32 deletions.
27 changes: 16 additions & 11 deletions sources/facebook_ads/README.md
@@ -1,5 +1,10 @@
# Facebook Ads

> **Warning!**
>
> This source is a Community source and was tested only once. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

This Facebook dlt verified source and pipeline example loads data to a preferred destination using the Facebook Marketing API. It supports loading data from multiple endpoints, providing flexibility in the data you can retrieve. The following endpoints are available for loading data with this verified source:
| Endpoint | Description |
| --- | --- |
@@ -24,44 +29,44 @@ To read about grabbing the Facebook Ads credentials and configuring the verified

1. Open `.dlt/secrets.toml`.
2. Enter the `access_token`:

```toml
# put your secret values and credentials here. do not share this file and do not push it to github
[sources.facebook_ads]
access_token="set me up!"
```

3. Enter credentials for your chosen destination as per the [docs](https://dlthub.com/docs/dlt-ecosystem/destinations/).
4. Open `.dlt/config.toml`.
```toml
[sources.facebook_ads]
account_id = "1430280281077689"
```

5. Replace the value of `account_id` with your own.

## Run the pipeline example

1. Install the necessary dependencies by running the following command:

```bash
pip install -r requirements.txt
```

2. Now the pipeline can be run using the command:

```bash
-python3 facebook_ads_pipeline.py
+python facebook_ads_pipeline.py
```

3. To make sure that everything is loaded as expected, use the command:

```bash
dlt pipeline <pipeline_name> show
```

For example, the pipeline_name for the above pipeline example is `facebook_ads`; you may also use any custom name instead.



💡 To explore additional customizations for this pipeline, we recommend referring to the official `dlt` Facebook Ads documentation. It provides comprehensive information and guidance on how to further customize and tailor the pipeline to suit your specific needs. You can find the Facebook Ads verified source documentation in [Setup Guide: Facebook Ads](https://dlthub.com/docs/dlt-ecosystem/verified-sources/facebook_ads).
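
The run step above assumes a `facebook_ads_pipeline.py` script in the source folder. As a minimal sketch of what such a script might contain (the import path and `dataset_name` are assumptions for illustration, not taken from this commit):

```python
# Minimal sketch of a facebook_ads pipeline script; import path and
# dataset_name are assumptions, not verified against this commit.
import dlt
from facebook_ads import facebook_ads_source  # assumed import path

pipeline = dlt.pipeline(
    pipeline_name="facebook_ads",  # the pipeline_name the README refers to
    destination="duckdb",
    dataset_name="facebook_ads_data",  # assumed name
)
# access_token and account_id are read from .dlt/secrets.toml and .dlt/config.toml
load_info = pipeline.run(facebook_ads_source())
print(load_info)
```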
36 changes: 21 additions & 15 deletions sources/matomo/README.md
@@ -1,5 +1,11 @@
# Matomo

> **Warning!**
>
> This source is a Community source and was tested only once. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

Matomo is a free and open source web analytics platform that allows website owners and businesses to gain detailed insights into the performance of their websites and applications. With this verified source you can easily extract data from Matomo and seamlessly load it into your preferred destination. This verified source supports the following endpoints:

| Endpoint | Description |
@@ -12,10 +18,10 @@ Matomo is a free and open source web analytics platform that allows website owne
## Initialize the pipeline with Matomo verified source
To get started with your data pipeline, follow these steps:
```bash
-dlt init matomo bigquery
+dlt init matomo duckdb
```

-Here, we chose BigQuery as the destination. Alternatively, you can also choose redshift, duckdb, or any of the other [destinations.](https://dlthub.com/docs/dlt-ecosystem/destinations/)
+Here, we chose DuckDB as the destination. Alternatively, you can also choose redshift, bigquery, or any of the other [destinations.](https://dlthub.com/docs/dlt-ecosystem/destinations/)

## Grab Matomo credentials

@@ -25,51 +31,51 @@ To obtain the Matomo credentials, please refer to the [full documentation here.]

1. Open `.dlt/secrets.toml`.
2. Enter the API key:

```toml
# put your secret values and credentials here. do not share this file and do not push it to github
[sources.matomo]
api_token= "access_token" # please set me up!
```

3. Enter credentials for your chosen destination as per the [docs](https://dlthub.com/docs/dlt-ecosystem/destinations/).
4. Inside the **`.dlt`** folder, you'll find a file called **`config.toml`**, where you can store your pipeline configuration details.

Here's what the config.toml looks like:

```toml
[sources.matomo]
url = "Please set me up !" # please set me up!
queries = ["a", "b", "c"] # please set me up!
site_id = 0 # please set me up!
live_events_site_id = 0 # please set me up!
```

5. Replace the values of `url` and `site_id` with the ones you copied above. This will ensure that your data pipeline can access the required Matomo resources.
6. To track live events for a website, the `live_events_site_id` parameter must be set to the same value as the `site_id` parameter for that website.

## Run the pipeline

1. Install the necessary dependencies by running the following command:

```bash
pip install -r requirements.txt
```

2. Now the pipeline can be run using the command:

```bash
-python3 matomo_pipeline.py
+python matomo_pipeline.py
```

3. To make sure that everything is loaded as expected, use the command:

```bash
dlt pipeline <pipeline_name> show
```

For example, the pipeline_name for the above pipeline is `matomo`; you may also use any custom name instead.



💡 To explore additional customizations for this pipeline, we recommend referring to the official dlt Matomo documentation. It provides comprehensive information and guidance on how to further customize and tailor the pipeline to suit your specific needs. You can find the dlt Matomo documentation in [Setup Guide: Matomo.](https://dlthub.com/docs/dlt-ecosystem/verified-sources/matomo)
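
Since the init example now targets DuckDB, loads can also be verified straight from the database file. A hedged sketch, assuming dlt's default `<pipeline_name>.duckdb` file naming:

```python
# Sketch: inspect what matomo_pipeline.py loaded into DuckDB.
# The file name assumes dlt's default <pipeline_name>.duckdb convention.
import duckdb

conn = duckdb.connect("matomo.duckdb")
print(conn.sql("SHOW ALL TABLES"))  # schemas and tables created by dlt
conn.close()
```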
5 changes: 5 additions & 0 deletions sources/personio/README.md
@@ -1,5 +1,10 @@
# Personio

> **Warning!**
>
> This source is a Community source and was tested only once. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

[Personio](https://personio.de/) is human resources management software that helps businesses
streamline HR processes, including recruitment, employee data management, and payroll, in one
platform.
1 change: 1 addition & 0 deletions sources/sql_database/requirements.txt
@@ -1,2 +1,3 @@
sqlalchemy>=1.4
dlt>=0.3.5
pymysql>=1.0.0
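
The new `pymysql` pin suggests the `sql_database` tests now exercise a MySQL backend through SQLAlchemy. A hedged sketch of how that driver comes into play (the import path and the string-credentials call are assumptions; connection values are placeholders):

```python
# Sketch: pymysql is the driver behind "mysql+pymysql" SQLAlchemy URLs.
# Import path and credential handling are assumptions; values are placeholders.
import dlt
from sql_database import sql_database  # assumed import path

pipeline = dlt.pipeline(
    pipeline_name="sql_database",
    destination="duckdb",
    dataset_name="sql_data",  # assumed name
)
source = sql_database(credentials="mysql+pymysql://user:password@localhost:3306/mydb")
print(pipeline.run(source))
```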
18 changes: 12 additions & 6 deletions sources/strapi/README.md
@@ -1,5 +1,11 @@
# Strapi

> **Warning!**
>
> This source is a Community source, and we have never tested it. Currently, we **don't test** it on a regular basis.
> If you have any problem with this source, ask for help in our [Slack Community](https://dlthub.com/community).

Strapi is a headless CMS (Content Management System) that allows developers to create powerful API-driven content management systems without having to write a lot of custom code.

Since the endpoints available in Strapi depend on the API setup you created, you need to be aware of which endpoints you will be ingesting in order to get data from Strapi into your warehouse.
Expand All @@ -21,10 +27,10 @@ To grab the API token and initialise the verified source, read the [following do
```toml
# put your secret values and credentials here. do not share this file and do not upload it to github.
[sources.strapi]
api_secret_key = "please set me up!" # your api_secret_key
domain = "please set me up!" # your strapi domain
```

3. When you run the Strapi project and a new tab opens in the browser, the URL in the address bar of that tab is the domain. For example, the domain is `my-strapi.up.railway.app` if the URL in the address bar is `http://my-strapi.up.railway.app`.
4. Enter credentials for your chosen destination as per the [docs.](https://dlthub.com/docs/dlt-ecosystem/destinations/)

@@ -34,19 +40,19 @@ To grab the API token and initialise the verified source, read the [following do
```bash
pip install -r requirements.txt
```

2. Now you can run the pipeline using the command:
```bash
python3 strapi_pipeline.py
```

3. To ensure that everything is loaded as expected, use the command:
```bash
dlt pipeline <pipeline_name> show
```

For example, the pipeline_name for the above pipeline is `strapi_pipeline`; you can use any custom name instead.



💡 To explore additional customizations for this pipeline, we recommend referring to the official dlt Strapi verified source documentation. It provides comprehensive information and guidance on how to further customize and tailor the pipeline to suit your specific needs. You can find the dlt Strapi documentation in [Setup Guide: Strapi.](https://dlthub.com/docs/dlt-ecosystem/verified-sources/strapi)
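
Because the available endpoints depend on your Strapi setup, the source is driven by the endpoint names you pass to it. A hedged sketch (the `endpoints` parameter, the import path, and the `athletes` collection are assumptions for illustration):

```python
# Sketch: load a user-defined Strapi collection; "athletes" is a placeholder
# endpoint name and the endpoints parameter is an assumption.
import dlt
from strapi import strapi_source  # assumed import path

pipeline = dlt.pipeline(
    pipeline_name="strapi_pipeline",
    destination="duckdb",
    dataset_name="strapi_data",  # assumed name
)
print(pipeline.run(strapi_source(endpoints=["athletes"])))
```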
6 changes: 6 additions & 0 deletions tests/facebook_ads/test_facebook_ads_source.py
@@ -15,6 +15,7 @@
from tests.utils import ALL_DESTINATIONS, assert_load_info, load_table_counts


@pytest.mark.skip("We don't have a Facebook Ads test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_load_all_ads_object(destination_name: str) -> None:
pipeline = dlt.pipeline(
@@ -40,6 +41,7 @@ def test_load_all_ads_object(destination_name: str) -> None:
assert all(c > 0 for c in table_counts.values())


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_set_ads_fields() -> None:
s = facebook_ads_source()
# get only ids for ads
@@ -51,6 +53,7 @@ def test_set_ads_fields() -> None:
assert "id" in ad


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_select_only_ads_with_state() -> None:
s = facebook_ads_source()
all_ads = list(s.with_resources("ads"))
@@ -61,6 +64,7 @@ def test_select_only_ads_with_state() -> None:
assert len(all_ads) > len(disapproved_ads)


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_enrich_objects_multiple_chunks() -> None:
s = facebook_ads_source()
s.campaigns.bind(fields=("id",))
@@ -81,12 +85,14 @@ def test_enrich_objects_multiple_chunks() -> None:
assert "name" in full_campaigns[0]


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_load_insights() -> None:
# just load 1 past day with attribution window of 7 days - that will re-acquire last 8 days + today
i_daily = facebook_insights_source(initial_load_past_days=1)
assert len(list(i_daily)) == 0


@pytest.mark.skip("We don't have a Facebook Ads test account.")
def test_load_insights_weekly() -> None:
i_weekly = facebook_insights_source(initial_load_past_days=1, time_increment_days=7)
assert len(list(i_weekly)) == 0
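
All of the markers added in this file skip unconditionally. A hedged alternative sketch: skip only when credentials are absent, so the tests would re-enable themselves once a test account exists (the environment variable name follows dlt's `SOURCES__<SOURCE>__<KEY>` convention and is an assumption):

```python
# Sketch: conditional skip instead of a hard skip; the env var name is an
# assumption based on dlt's SOURCES__<SOURCE>__<KEY> naming convention.
import os

import pytest

requires_facebook_account = pytest.mark.skipif(
    "SOURCES__FACEBOOK_ADS__ACCESS_TOKEN" not in os.environ,
    reason="We don't have a Facebook Ads test account.",
)

@requires_facebook_account
def test_set_ads_fields() -> None:
    ...  # body as in the original test
```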
1 change: 1 addition & 0 deletions tests/kinesis/test_kinesis.py
@@ -105,6 +105,7 @@ def test_kinesis_incremental(kinesis_client: Any) -> None:
pipeline_name="kinesis_test",
destination="duckdb",
dataset_name="kinesis_test_data",
full_refresh=True,
)
test_id = str(uuid.uuid4())
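
The one functional change in this file is `full_refresh=True`. In the dlt version current at this commit, that flag gives each run a fresh, suffixed dataset, so repeated test runs don't collide with state left by earlier ones; a minimal sketch:

```python
# Sketch: full_refresh=True suffixes dataset_name per run, so each test run
# loads into a fresh dataset instead of accumulating state across runs.
import dlt

pipeline = dlt.pipeline(
    pipeline_name="kinesis_test",
    destination="duckdb",
    dataset_name="kinesis_test_data",
    full_refresh=True,
)
print(pipeline.dataset_name)  # suffixed, e.g. kinesis_test_data_20240306120000
```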

5 changes: 5 additions & 0 deletions tests/matomo/test_matomo_source.py
@@ -66,6 +66,7 @@
]


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_reports(destination_name: str) -> None:
"""
@@ -84,6 +85,7 @@ def test_reports(destination_name: str) -> None:
_check_pipeline_has_tables(pipeline, ALL_TABLES_REPORTS)


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_visits(destination_name: str) -> None:
"""
@@ -118,6 +120,7 @@ def test_visits(destination_name: str) -> None:
assert diff_count >= 0 and diff_count < 5


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_visits_with_visitors(destination_name: str) -> None:
"""
@@ -269,6 +272,7 @@ def test_remove_active_visits(
assert result == expected_visits


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_incrementing_reports(destination_name: str) -> None:
"""
@@ -308,6 +312,7 @@ def test_incrementing_reports(destination_name: str) -> None:
)


@pytest.mark.skip("We don't have a Matomo test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_start_date(destination_name: str) -> None:
"""
1 change: 1 addition & 0 deletions tests/personio/test_personio_client.py
@@ -39,6 +39,7 @@ def get_metadata(endpoint, headers, params):
]


@pytest.mark.skip("We don't have a Personio test account.")
@pytest.mark.parametrize("endpoint, params, offset_by_page", endpoints_data)
def test_client(endpoint, params, offset_by_page, client):
headers = {"Authorization": f"Bearer {client.access_token}"}
2 changes: 2 additions & 0 deletions tests/personio/test_personio_source.py
@@ -5,6 +5,7 @@
from tests.utils import ALL_DESTINATIONS, assert_load_info, load_table_counts


@pytest.mark.skip("We don't have a Personio test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_all_resources(destination_name: str) -> None:
pipeline = dlt.pipeline(
@@ -26,6 +27,7 @@ def test_all_resources(destination_name: str) -> None:
assert table_counts["absences"] > 1000


@pytest.mark.skip("We don't have a Personio test account.")
@pytest.mark.parametrize("destination_name", ALL_DESTINATIONS)
def test_incremental_endpoints(destination_name: str) -> None:
# do the initial load
