Metadata¶
Every time dbt Cloud runs a project, it generates and stores information about the project. The metadata includes details about your project’s models, sources, and other nodes along with their execution results. With the dbt Cloud Discovery API, you can query this comprehensive information to gain a better understanding of your DAG and the data it produces.
By leveraging the metadata in dbt Cloud, you can create systems for data monitoring and alerting, lineage exploration, and automated reporting. This can help you improve data discovery, data quality, and pipeline operations within your organization.
The metadata
property on the dbtCloudClient
class contains a single method, query
, that allows a user to interact with the Discovery API.
If you're unfamiliar either with the Schema to query or even how to write a GraphQL query, I highly recommend going to the dbt Cloud Discovery API playground. You'll be able to interactively explore the Schema while watching it write a GraphQL query for you!
Usage¶
The metadata
property on the dbtCloudClient
class has a single method to pass a query
string and variables
that will be submitted in the payload with the query
. It's important to note that as a default this package will use the beta endpoint at https://metadata.cloud.getdbt.com/beta/graphql
(or your particular host). As of this writing, there are many more beta fields that allow for a user to retrieve performance, lineage, recommendations, and much more! If you don't want to use the beta endpoint, construct your dbtCloudClient
as follows:
Warning
If you do end up not using the beta endpoint, only the query
method will work properly.
from dbtc import dbtCloudClient
# Assuming I have `DBT_CLOUD_SERVICE_TOKEN` as an env var
client = dbtCloudClient(use_beta_endpoint=False)
# Now all calls to the metadata service will use https://metadata.<host>.com/graphql
client.metadata.query(...)
column_lineage¶
Retrieve column lineage for a given node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
node_unique_id |
str
|
The unique id of the node. |
required |
max_depth |
int
|
(int): The max depth to traverse the lineage. Defaults to None. |
None
|
column_name |
str
|
(str): The column name to filter by. Defaults to None. |
None
|
is_error |
bool
|
(bool): Whether to return only error nodes. Defaults to False. |
False
|
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.column_lineage(environment_id, "model.tpch.dim_customers")
dbtc metadata column-lineage --environment-id 1 --unique-id "model.tpch.dim_customers"
longest_executed_models¶
Retrieve the longest executed models for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
start_date |
str
|
The start date in the format YYYY-MM-DD. |
required |
end_date |
str
|
The end date in the format YYYY-MM-DD. |
required |
limit |
int
|
The max number of models to return. Defaults to 5. |
5
|
job_limit |
int
|
The max number of jobs to return for each model. Defaults to 5. |
5
|
job_id |
int
|
The job id to filter by. Defaults to None. |
None
|
order_by |
str
|
The order by clause. One of "AVG" or "MAX". Defaults to "MAX". |
'MAX'
|
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.longest_executed_models(environment_id, "2024-01-23", "2024-01-24")
dbtc metadata longest-executed-models --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"
mesh_projects¶
Retrieve mesh projects for a given account.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
account_id |
int
|
The account id. |
required |
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.mesh_projects(1)
dbtc metadata mesh-projects --account-id 1
model_execution_history¶
Retrieve model execution history for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
start_date |
str
|
The start date in the format YYYY-MM-DD. |
required |
end_date |
str
|
The end date in the format YYYY-MM-DD. |
required |
unique_id |
str
|
The unique id of the model. |
required |
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.model_execution_history(1, "2024-01-23", "2024-01-24", "model.tpch.dim_customers")
dbtc metadata model-execution-history --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24" --unique-id "model.tpch.dim_customers"
model_job_information¶
Retrieve model job information for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
start_date |
str
|
The start date in the format YYYY-MM-DD. |
required |
end_date |
str
|
The end date in the format YYYY-MM-DD. |
required |
unique_id |
str
|
The unique id of the model. |
required |
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.model_job_information(1, "2024-01-23", "2024-01-24", "model.tpch.dim_customers")
dbtc metadata model-job-information --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24" --unique-id "model.tpch.dim_customers"
most_executed_models¶
Retrieve the most executed models for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
start_date |
str
|
The start date in the format YYYY-MM-DD. |
required |
end_date |
str
|
The end date in the format YYYY-MM-DD. |
required |
limit |
int
|
The max number of models to return. Defaults to 5. |
5
|
job_limit |
int
|
The max number of jobs to return for each model. Defaults to 5. |
5
|
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.most_executed_models(1, "2024-01-23", "2024-01-24")
dbtc metadata most-executed-models --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"
most_failed_models¶
Retrieve the most failed models for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
start_date |
str
|
The start date in the format YYYY-MM-DD. |
required |
end_date |
str
|
The end date in the format YYYY-MM-DD. |
required |
limit |
int
|
The max number of models to return. Defaults to 5. |
5
|
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.most_failed_models(1, "2024-01-23", "2024-01-24")
dbtc metadata most-failed-models --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"
most_models_test_failures¶
Retrieve the most models with test failures for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
start_date |
str
|
The start date in the format YYYY-MM-DD. |
required |
end_date |
str
|
The end date in the format YYYY-MM-DD. |
required |
limit |
int
|
The max number of models to return. Defaults to 5. |
5
|
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.most_models_test_failures(1, "2024-01-23", "2024-01-24")
dbtc metadata most-test-failures --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"
public_models¶
Retrieve public models for a given account.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
account_id |
int
|
The account id. |
required |
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.public_models(1)
dbtc metadata public-models --account-id 1
query¶
Query the Discovery API
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query |
str
|
The GraphQL query to execute. |
required |
variables |
Dict
|
Dictionary containing the variables to include in the payload of the request. Defaults to None. |
None
|
max_pages |
int
|
The max number of pages to paginate through when Defaults to None. |
None
|
paginated_request_to_list |
bool
|
When paginating through a request, the elements of the list within each request will be combined into a single list of dictionaries. Defaults to False. |
False
|
Returns:
Type | Description |
---|---|
Union[List[Dict], Dict]
|
Union[List[Dict], Dict]: description |
Examples:
Assuming that client
is an instance of dbtCloudClient
query = '''
query ($environmentId: BigInt!, $first: Int!) {
environment(id: $environmentId) {
definition {
metrics(first: $first) {
edges {
node {
name
description
type
formula
filter
tags
parents {
name
resourceType
}
}
}
}
}
}
}
'''
variables = {'environmentId': 1, 'first': 500}
data = client.metadata.query(query, variables)
client.metadata.query(query, variables)
recommendations¶
Retrieve recommendations for a given environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
environment_id |
int
|
The environment id. |
required |
first |
int
|
The max number of recommendations to return. Defaults to 10. |
500
|
severity |
List[str]
|
The severity levels to filter by. Defaults to None. |
None
|
categories |
List[str]
|
The categories to filter by. Defaults to None. |
None
|
rule_names |
List[str]
|
The rule names to filter by. Defaults to None. |
None
|
unique_ids |
List[str]
|
The unique ids to filter by. Defaults to None. |
None
|
Examples:
Assuming that client
is an instance of dbtCloudClient
client.metadata.recommendations(1)
dbtc metadata recommendations --environment-id 1