Skip to content

Latest commit

 

History

History
executable file
·
82 lines (58 loc) · 4.68 KB

README.md

File metadata and controls

executable file
·
82 lines (58 loc) · 4.68 KB

Kusto Copy

** This project is suspended right now. The code isn't at a usable level yet. **

This project aims at building a tool to provide copying capabilities between Azure Data Explorer (Kusto) databases. This enables scenarios such as table copy, cluster migration, replication across region (BCDR) & others.

The solution is based on exporting data on one side and ingesting it on the other. The tool takes care of orchestrating the data movement. It is fault tolerant (i.e. the tool can be interupted and restarted) and guarantees exact copies. It also tries to give a consistant replica at one point in time using database cursors and ingestion-time windows.

Limitations

The tool has the following known limitations:

  • Is bound by the capacity of the source cluster to export data and the destination clusters to ingest the data
  • Number of databases: the tool isn't optimized to work with lots of small databases totalling a huge amount (1000+) of tables
  • The tool doesn't track purges, row deletions or extent drops
  • Update policies aren't taken into account ; tables are replicated assuming it is all "original content"
  • If some records have no ingestion-time (e.g. ingestion time policy was disabled), records won't be copied over

Getting started

Kusto Copy is a command line interface (CLI) tool.

You can find the binary executable here for Linux, Windows & Mac.

Here is an example of usage:

kusto-copy -l https://vpdeltalake.blob.core.windows.net/copy/laptop/d16 -s https://help.kusto.windows.net/ -d https://vpdeltalake.eastus.kusto.windows.net --db Samples --tables-include StormEvents demo_make_series1 ConferenceSessions US_States

That CLI invocation is configured as followed:

The CLI will start copying the data from the source to destination cluster. You can monitor the process by querying the KC_Status view in the destination database. That view is surfacing the checkpoint file in the data lake folder.

Here are the command line options:

Short name Long name Is required Description
v verbose No Set output to verbose messages.
- continuous No Continuous run. If set, runs continuously. Otherwise, stop after first batch.
l lake Yes Data Lake (ADLS gen 2) folder URL or Kusto-style connection string
s source Yes Source Cluster Query Connection String
d destination Yes Destination Cluster Query Connection String
- db Yes Database to copy
- tables-include No Tables to include (default: copies all tables)
- tables-exclude No Tables to exclude (default: copies all tables)
- query-slots No Number of concurrent queries / commands on the clusters (default: %10 of query capacity)
- export-slots No # export slots to use on source cluster (default: %100 of export capacity)
- ingestion-slots No Number of concurrent ingestions on the clusters (default: %100 of ingestion capacity)
r rpo No Recovery Point Objectives: the target timespan between 2 iterations (default: 5 minutes)
b backfillHorizon No Backfill horizon: how long in the past should we start? (default: no horizon, i.e. backfill everything)

Authentication

Kusto Copy leverages Kusto Connection strings.

This can be used to specify the authentication mechanism to connect to the source cluster, the destination cluster and even the data lake.

If no authentication mechanism is specified, the Azure Default will be used, which go through the followings:

  • Azure.Identity.EnvironmentCredential
  • Azure.Identity.ManagedIdentityCredential
  • Azure.Identity.SharedTokenCacheCredential
  • Azure.Identity.VisualStudioCredential
  • Azure.Identity.VisualStudioCodeCredential
  • Azure.Identity.AzureCliCredential
  • Azure.Identity.InteractiveBrowserCredential

A popular authentication alternative is to use a service principal. The connection string would then look as followed:

Data Source={serviceUri};Database=NetDefaultDB;Fed=True;AppClientId={applicationClientId};AppKey={applicationKey};Authority Id={authority}

Look at Kusto Connection strings for more details / alternatives.