Double-onboard a data source
Last modified on 21-Jun-24
To scan your data for quality, Soda must connect to a data source using connection configurations (host, port, login credentials, etc.). You define these either in Soda Cloud, when you onboard the data source via a Soda Agent, or in a configuration YAML file that you reference when you run programmatic or CLI scans with Soda Library. Soda Cloud treats each data source you onboard as an independent resource and displays scan results and failed row samples for all data sources, regardless of how you onboarded them.
However, data sources that you connect via a Soda Agent using the guided workflow in Soda Cloud support several features that data sources connected via Soda Library do not, including no-code checks, the anomaly dashboard, and automated monitoring for anomalies and schema changes.
If you have onboarded a data source via Soda Library but you wish to take advantage of the features available to Soda Agent-onboarded data sources, you can double-onboard an existing data source.
See also: Soda overview
See also: Choose a flavor of Soda
See also: Add a new data source in Soda Cloud
Prerequisites
- You installed Soda Library, you have configured it to connect to your data source, and you have run at least one scan programmatically or via the Soda Library CLI.
- You have deployed a self-hosted Soda Agent Helm chart in a Kubernetes cluster in your cloud services environment,
  OR
  someone with Soda Admin privileges in your organization's Soda Cloud account has navigated to their avatar > Organization Settings and checked the box to Enable Soda-hosted Agent; see Set up a Soda-hosted agent.
- You have access to the connection configurations (host, port, login credentials, etc.) for your data source.
- Your data source is compatible with a Soda Agent; refer to the tables below.
Self-hosted agent
Amazon Athena, Amazon Redshift, Azure Synapse, ClickHouse, Databricks SQL, Denodo, Dremio, DuckDB, GCP BigQuery, Google CloudSQL, IBM DB2, MotherDuck, MS SQL Server¹, MySQL, OracleDB, PostgreSQL, Presto, Snowflake, Trino, Vertica
¹ MS SQL Server with Windows Authentication does not work with a Soda Agent out-of-the-box.
Soda-hosted agent
BigQuery, Databricks SQL, MS SQL Server, MySQL, PostgreSQL, Redshift, Snowflake
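To check the compatibility prerequisite in a script, for example before deciding which kind of agent to request, you could encode the tables above as a small lookup. This is a minimal sketch: `agent_options` is a hypothetical helper, not part of Soda Library, and the name sets are transcribed from this page, so they may fall out of date. The Soda-hosted names are normalized to match the self-hosted table (BigQuery as GCP BigQuery, Redshift as Amazon Redshift) so that one name works for both lookups.

```python
# Hypothetical helper: the sets below are transcribed from the compatibility
# tables on this page and may drift as Soda adds support for more data sources.
SELF_HOSTED_AGENT = {
    "Amazon Athena", "Amazon Redshift", "Azure Synapse", "ClickHouse",
    "Databricks SQL", "Denodo", "Dremio", "DuckDB", "GCP BigQuery",
    "Google CloudSQL", "IBM DB2", "MotherDuck", "MS SQL Server", "MySQL",
    "OracleDB", "PostgreSQL", "Presto", "Snowflake", "Trino", "Vertica",
}

# The Soda-hosted table lists "BigQuery" and "Redshift"; they are normalized
# here to the self-hosted table's names so both sets use the same keys.
SODA_HOSTED_AGENT = {
    "GCP BigQuery", "Databricks SQL", "MS SQL Server", "MySQL",
    "PostgreSQL", "Amazon Redshift", "Snowflake",
}


def agent_options(source: str, windows_auth: bool = False) -> list[str]:
    """Return the agent types that list `source` as compatible."""
    # MS SQL Server with Windows Authentication does not work with a Soda Agent
    # out-of-the-box, so this sketch treats that combination as unsupported.
    if source == "MS SQL Server" and windows_auth:
        return []
    options = []
    if source in SELF_HOSTED_AGENT:
        options.append("self-hosted")
    if source in SODA_HOSTED_AGENT:
        options.append("Soda-hosted")
    return options


if __name__ == "__main__":
    for name in ("PostgreSQL", "Vertica", "SQLite"):
        print(name, agent_options(name))
```

An empty result means neither table lists the data source, in which case double-onboarding via a Soda Agent is not an option for it.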
Onboard an existing data source
- Log in to Soda Cloud, then navigate to your avatar > Data Sources.
- From the list of data sources connected to your Soda Cloud account, click to select and open the one you onboarded via Soda Library and now wish to double-onboard via a Soda Agent.
- Follow the guided workflow to onboard the existing data source via a Soda Agent. Start by using the dropdown to select the Default Scan Agent that will connect to the data source.
- Complete the guided steps to:
  - define a schedule for your default scan definition
  - provide connection configuration details for the data source, such as name, schema, and login credentials, and test the connection to the data source
  - profile the datasets in the data source to gather basic metadata about the contents of each
  - identify the datasets to which you wish to apply automated monitoring for anomalies and schema changes
  - assign ownership roles for the data source and its datasets
- Save your changes, then navigate to the Datasets page and select a dataset in the data source you just double-onboarded.
- (Optional) If you have requested preview access to the anomaly dashboard, follow the instructions to activate it for the dataset.
- (Optional) Click Add Check and begin adding no-code checks to the dataset.
Known issue: Double-onboarding a data source renders Soda Library API keys invalid. After you double-onboard a data source, if you run a programmatic or CLI scan of that data source using Soda Library, an error indicates that the API keys are invalid. As a workaround, generate new API keys in Soda Cloud, then replace the old API key values in your configuration YAML with the newly generated ones.
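If you maintain several configuration YAML files, you can script the key replacement in the workaround. The following minimal sketch assumes that the keys appear in the `soda_cloud` block as plain `api_key_id:` and `api_key_secret:` lines, which is how Soda Library configuration files typically declare them. `replace_api_keys` is a hypothetical helper, and the key values in the demo are placeholders.

```python
import re


def replace_api_keys(config_text: str, new_key_id: str, new_key_secret: str) -> str:
    """Return config_text with the api_key_id and api_key_secret values replaced.

    Hypothetical helper: assumes each key appears exactly once as a plain
    `key: value` line (as in a `soda_cloud:` block) and preserves indentation.
    """
    updated = config_text
    for key, value in (("api_key_id", new_key_id), ("api_key_secret", new_key_secret)):
        # Match the whole line; group 1 keeps the indentation and "key: " prefix.
        pattern = re.compile(rf"^([ \t]*{key}:[ \t]*).*$", re.MULTILINE)
        # A function replacement keeps backslashes in the new value literal.
        updated, count = pattern.subn(lambda m, v=value: m.group(1) + v, updated)
        if count != 1:
            raise ValueError(f"expected exactly one '{key}:' line, found {count}")
    return updated


if __name__ == "__main__":
    # Placeholder configuration; substitute your own file contents.
    config = (
        "soda_cloud:\n"
        "  host: cloud.soda.io\n"
        "  api_key_id: old-key-id\n"
        "  api_key_secret: old-key-secret\n"
    )
    print(replace_api_keys(config, "new-key-id", "new-key-secret"))
```

To apply it to a file, read the contents with `pathlib.Path.read_text()`, pass them through `replace_api_keys`, and write the result back with `write_text()`. The function raises `ValueError` instead of appending keys when either line is missing or duplicated, so a file without the expected `soda_cloud` keys fails loudly rather than silently.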
Go further
- Learn more about automating anomaly detection for observability.
- Need help? Join the Soda community on Slack.
Documentation always applies to the latest version of Soda products