Metadata¶
Every time dbt Cloud runs a project, it generates and stores information about the project. The metadata includes details about your project’s models, sources, and other nodes along with their execution results. With the dbt Cloud Discovery API, you can query this comprehensive information to gain a better understanding of your DAG and the data it produces.
By leveraging the metadata in dbt Cloud, you can create systems for data monitoring and alerting, lineage exploration, and automated reporting. This can help you improve data discovery, data quality, and pipeline operations within your organization.
The metadata property on the dbtCloudClient class contains a single method, query, that allows a user to interact with the Discovery API.
If you're unfamiliar either with the Schema to query or even how to write a GraphQL query, I highly recommend going to the dbt Cloud Discovery API playground. You'll be able to interactively explore the Schema while watching it write a GraphQL query for you!
Usage¶
The metadata property on the dbtCloudClient class has a single method to pass a query string and variables that will be submitted in the payload with the query. It's important to note that as a default this package will use the beta endpoint at https://metadata.cloud.getdbt.com/beta/graphql (or your particular host). As of this writing, there are many more beta fields that allow for a user to retrieve performance, lineage, recommendations, and much more! If you don't want to use the beta endpoint, construct your dbtCloudClient as follows:
from dbtc import dbtCloudClient
# Assuming I have `DBT_CLOUD_SERVICE_TOKEN` as an env var
client = dbtCloudClient(use_beta_endpoint=False)
# Now all calls to the metadata service will use https://metadata.<host>.com/graphql
client.metadata.query(...)
query¶
Query the Discovery API
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query |
str
|
The GraphQL query to execute. |
required |
variables |
Dict
|
Dictionary containing the variables to include in the payload of the request. Defaults to None. |
None
|
max_pages |
int
|
The max number of pages to paginate through when Defaults to None. |
None
|
paginated_request_to_list |
bool
|
When paginating through a request, the elements of the list within each request will be combined into a single list of dictionaries. Defaults to True. |
True
|
Returns:
| Type | Description |
|---|---|
Union[List[Dict], Dict]
|
Union[List[Dict], Dict]: description |
Examples:
Assuming that client is an instance of dbtCloudClient
query = '''
query ($environmentId: BigInt!, $first: Int!) {
environment(id: $environmentId) {
definition {
metrics(first: $first) {
edges {
node {
name
description
type
formula
filter
tags
parents {
name
resourceType
}
}
}
}
}
}
}
'''
variables = {'environmentId': 1, 'first': 500}
data = client.metadata.query(query, variables)
client.metadata.query(query, variables)