Skip to content

Metadata

Every time dbt Cloud runs a project, it generates and stores information about the project. The metadata includes details about your project’s models, sources, and other nodes along with their execution results. With the dbt Cloud Discovery API, you can query this comprehensive information to gain a better understanding of your DAG and the data it produces.

By leveraging the metadata in dbt Cloud, you can create systems for data monitoring and alerting, lineage exploration, and automated reporting. This can help you improve data discovery, data quality, and pipeline operations within your organization.

The metadata property on the dbtCloudClient class contains a single method, query, that allows a user to interact with the Discovery API.

If you're unfamiliar either with the Schema to query or even how to write a GraphQL query, I highly recommend going to the dbt Cloud Discovery API playground. You'll be able to interactively explore the Schema while watching it write a GraphQL query for you!

Usage

The metadata property on the dbtCloudClient class has a single method to pass a query string and variables that will be submitted in the payload with the query. It's important to note that as a default this package will use the beta endpoint at https://metadata.cloud.getdbt.com/beta/graphql (or your particular host). As of this writing, there are many more beta fields that allow for a user to retrieve performance, lineage, recommendations, and much more! If you don't want to use the beta endpoint, construct your dbtCloudClient as follows:

Warning

If you do end up not using the beta endpoint, only the query method will work properly.

from dbtc import dbtCloudClient

# Assuming I have `DBT_CLOUD_SERVICE_TOKEN` as an env var
client = dbtCloudClient(use_beta_endpoint=False)

# Now all calls to the metadata service will use https://metadata.<host>.com/graphql
client.metadata.query(...)

column_lineage

Retrieve column lineage for a given node.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
node_unique_id str

The unique id of the node.

required
max_depth int

(int): The max depth to traverse the lineage. Defaults to None.

None
column_name str

(str): The column name to filter by. Defaults to None.

None
is_error bool

(bool): Whether to return only error nodes. Defaults to False.

False

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.column_lineage(environment_id, "model.tpch.dim_customers")

dbtc metadata column-lineage --environment-id 1 --unique-id "model.tpch.dim_customers"

longest_executed_models

Retrieve the longest executed models for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
start_date str

The start date in the format YYYY-MM-DD.

required
end_date str

The end date in the format YYYY-MM-DD.

required
limit int

The max number of models to return. Defaults to 5.

5
job_limit int

The max number of jobs to return for each model. Defaults to 5.

5
job_id int

The job id to filter by. Defaults to None.

None
order_by str

The order by clause. One of "AVG" or "MAX". Defaults to "MAX".

'MAX'

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.longest_executed_models(environment_id, "2024-01-23", "2024-01-24")

dbtc metadata longest-executed-models --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"

mesh_projects

Retrieve mesh projects for a given account.

Parameters:

Name Type Description Default
account_id int

The account id.

required

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.mesh_projects(1)

dbtc metadata mesh-projects --account-id 1

model_execution_history

Retrieve model execution history for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
start_date str

The start date in the format YYYY-MM-DD.

required
end_date str

The end date in the format YYYY-MM-DD.

required
unique_id str

The unique id of the model.

required

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.model_execution_history(1, "2024-01-23", "2024-01-24", "model.tpch.dim_customers")

dbtc metadata model-execution-history --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24" --unique-id "model.tpch.dim_customers"

model_job_information

Retrieve model job information for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
start_date str

The start date in the format YYYY-MM-DD.

required
end_date str

The end date in the format YYYY-MM-DD.

required
unique_id str

The unique id of the model.

required

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.model_job_information(1, "2024-01-23", "2024-01-24", "model.tpch.dim_customers")

dbtc metadata model-job-information --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24" --unique-id "model.tpch.dim_customers"

most_executed_models

Retrieve the most executed models for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
start_date str

The start date in the format YYYY-MM-DD.

required
end_date str

The end date in the format YYYY-MM-DD.

required
limit int

The max number of models to return. Defaults to 5.

5
job_limit int

The max number of jobs to return for each model. Defaults to 5.

5

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.most_executed_models(1, "2024-01-23", "2024-01-24")

dbtc metadata most-executed-models --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"

most_failed_models

Retrieve the most failed models for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
start_date str

The start date in the format YYYY-MM-DD.

required
end_date str

The end date in the format YYYY-MM-DD.

required
limit int

The max number of models to return. Defaults to 5.

5

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.most_failed_models(1, "2024-01-23", "2024-01-24")

dbtc metadata most-failed-models --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"

most_models_test_failures

Retrieve the most models with test failures for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
start_date str

The start date in the format YYYY-MM-DD.

required
end_date str

The end date in the format YYYY-MM-DD.

required
limit int

The max number of models to return. Defaults to 5.

5

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.most_models_test_failures(1, "2024-01-23", "2024-01-24")

dbtc metadata most-test-failures --environment-id 1 --start-date "2024-01-23" --end-date "2024-01-24"

public_models

Retrieve public models for a given account.

Parameters:

Name Type Description Default
account_id int

The account id.

required

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.public_models(1)

dbtc metadata public-models --account-id 1

query

Query the Discovery API

Parameters:

Name Type Description Default
query str

The GraphQL query to execute.

required
variables Dict

Dictionary containing the variables to include in the payload of the request. Defaults to None.

None
max_pages int

The max number of pages to paginate through when Defaults to None.

None
paginated_request_to_list bool

When paginating through a request, the elements of the list within each request will be combined into a single list of dictionaries. Defaults to False.

False

Returns:

Type Description
Union[List[Dict], Dict]

Union[List[Dict], Dict]: description

Examples:

Assuming that client is an instance of dbtCloudClient

query = '''
query ($environmentId: BigInt!, $first: Int!) {
environment(id: $environmentId) {
    definition {
    metrics(first: $first) {
        edges {
        node {
            name
            description
            type
            formula
            filter
            tags
            parents {
            name
            resourceType
            }
        }
        }
    }
    }
}
}
'''
variables = {'environmentId': 1, 'first': 500}
data = client.metadata.query(query, variables)
client.metadata.query(query, variables)

recommendations

Retrieve recommendations for a given environment.

Parameters:

Name Type Description Default
environment_id int

The environment id.

required
first int

The max number of recommendations to return. Defaults to 10.

500
severity List[str]

The severity levels to filter by. Defaults to None.

None
categories List[str]

The categories to filter by. Defaults to None.

None
rule_names List[str]

The rule names to filter by. Defaults to None.

None
unique_ids List[str]

The unique ids to filter by. Defaults to None.

None

Examples:

Assuming that client is an instance of dbtCloudClient

client.metadata.recommendations(1)

dbtc metadata recommendations --environment-id 1