-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Cloud API: Demonstrate cluster mgmt. on behalf of a Jupyter Notebook
- Loading branch information
Showing
4 changed files
with
246 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,203 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"# CrateDB Cloud Import\n", | ||
"\n", | ||
"This is an example notebook demonstrating how to load data from\n", | ||
"files using the [Import API] interface of [CrateDB Cloud] into\n", | ||
"a [CrateDB Cloud Cluster].\n", | ||
"\n", | ||
"The supported file types are CSV, JSON, Parquet, optionally with\n", | ||
"gzip compression. They can be acquired from the local filesystem,\n", | ||
"or from remote HTTP and AWS S3 resources.\n", | ||
"\n", | ||
"[CrateDB Cloud]: https://cratedb.com/docs/cloud/\n", | ||
"[CrateDB Cloud Cluster]: https://cratedb.com/docs/cloud/en/latest/reference/services.html\n", | ||
"[Import API]: https://community.cratedb.com/t/importing-data-to-cratedb-cloud-clusters/1467\n", | ||
"\n", | ||
"## Setup\n", | ||
"\n", | ||
"To install the client SDK, use `pip`." | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"outputs": [], | ||
"source": [ | ||
"#!pip install 'cratedb-toolkit'" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Configuration\n", | ||
"\n", | ||
"The notebook assumes you are appropriately authenticated to the CrateDB Cloud\n", | ||
"platform, for example using `croud login --idp azuread`. To inspect the list\n", | ||
"of available clusters, run `croud clusters list`.\n", | ||
"\n", | ||
"For addressing a database cluster, and obtaining corresponding credentials,\n", | ||
"the program uses environment variables, which you can define interactively,\n", | ||
"or store them within a `.env` file.\n", | ||
"\n", | ||
"You can use those configuration snippet as a blueprint. Please adjust the\n", | ||
"individual settings accordingly.\n", | ||
"```shell\n", | ||
"CRATEDB_CLOUD_CLUSTER_NAME=Hotzenplotz\n", | ||
"CRATEDB_USERNAME='admin'\n", | ||
"CRATEDB_PASSWORD='H3IgNXNvQBJM3CiElOiVHuSp6CjXMCiQYhB4I9dLccVHGvvvitPSYr1vTpt4'\n", | ||
"```" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Acquire Database Cluster\n", | ||
"\n", | ||
"As a first measure, acquire a resource handle, which manages a CrateDB Cloud\n", | ||
"cluster instance.\n", | ||
"\n", | ||
"For effortless configuration, it will obtain configuration settings from\n", | ||
"environment variables as defined above." | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from cratedb_toolkit import ManagedCluster, InputOutputResource\n", | ||
"\n", | ||
"cluster = ManagedCluster.from_env().start()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Import Data\n", | ||
"\n", | ||
"From the [NAB Data Corpus], import the \"realKnownCause\" dataset. The dataset includes\n", | ||
"temperature sensor data of an internal component of an industrial machine, with known\n", | ||
"anomaly causes. The first anomaly is a planned shutdown of the machine. The second\n", | ||
"anomaly is difficult to detect and directly led to the third anomaly, a catastrophic\n", | ||
"failure of the machine.\n", | ||
"\n", | ||
"On this topic, we also recommend the notebook about [MLflow and CrateDB], where the\n", | ||
"same dataset is used for time series anomaly detection and forecasting.\n", | ||
"\n", | ||
"[NAB Data Corpus]: https://github.com/numenta/NAB/tree/master/data\n", | ||
"[MLflow and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/mlops-mlflow" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"\u001B[36m==> Info: \u001B[0mStatus: REGISTERED (Your import job was received and is pending processing.)\n", | ||
"\u001B[36m==> Info: \u001B[0mDone importing 22.70K records\n", | ||
"\u001B[32m==> Success: \u001B[0mOperation completed.\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": "CloudJob(info={'cluster_id': '09be10b6-7d78-497b-842e-fbb47642d398', 'compression': 'none', 'dc': {'created': '2023-11-17T18:54:04.070000+00:00', 'modified': '2023-11-17T18:54:04.070000+00:00'}, 'destination': {'create_table': True, 'table': 'nab-machine-failure'}, 'file': None, 'format': 'csv', 'id': '56051dc3-ee8e-4a38-9066-73bcd427d05a', 'progress': {'bytes': 0, 'details': {'create_table_sql': None}, 'failed_files': 0, 'failed_records': 0, 'message': 'Import succeeded', 'percent': 100.0, 'processed_files': 1, 'records': 22695, 'total_files': 1, 'total_records': 22695}, 'schema': {'type': 'csv'}, 'status': 'SUCCEEDED', 'type': 'url', 'url': {'url': 'https://github.com/crate/cratedb-datasets/raw/main/machine-learning/timeseries/nab-machine-failure.csv'}}, found=True, _custom_status=None, _custom_message=None)" | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# Define data source.\n", | ||
"url = \"https://github.com/crate/cratedb-datasets/raw/main/machine-learning/timeseries/nab-machine-failure.csv\"\n", | ||
"source = InputOutputResource(url=url)\n", | ||
"\n", | ||
"# Invoke import job. Without `target` argument, the destination\n", | ||
"# table name will be derived from the input file name.\n", | ||
"cluster.load_table(source=source)" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Query Data\n", | ||
"\n", | ||
"In order to inspect if the dataset has been imported successfully, run an SQL\n", | ||
"command sampling a few records." | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": "[{'timestamp': 1386021900000, 'value': 80.78327674},\n {'timestamp': 1386024000000, 'value': 81.37357535},\n {'timestamp': 1386024600000, 'value': 80.18124978},\n {'timestamp': 1386030300000, 'value': 82.88189183},\n {'timestamp': 1386030600000, 'value': 83.57965349}]" | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# Query data.\n", | ||
"cluster.query('SELECT * FROM \"nab-machine-failure\" LIMIT 5;')" | ||
], | ||
"metadata": { | ||
"collapsed": false | ||
} | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 2 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython2", | ||
"version": "2.7.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters