An open source JSON spec for lineage. Declare your pipelines, get beautiful graphs.
{
  "pipelines": [{
    "name": "etl-job",
    "input_sources": ["raw_events"],
    "output_sources": ["cleaned_events"]
  }]
}
Write a pipeviz.json describing your pipelines and data sources.
# each team versions their own pipeviz.json
# merge for org-wide view
jq -s '{ pipelines: map(.pipelines // []) | add, datasources: map(.datasources // []) | add }' team-*.json > pipeviz.json
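For teams without jq, the same merge can be sketched in Python. This is a minimal illustration, not part of Pipeviz: the function names `merge_configs` and `merge_files` are hypothetical, and the `team-*.json` pattern is taken from the jq command.

```python
import json
from pathlib import Path


def merge_configs(configs):
    """Concatenate the pipelines and datasources arrays of several configs."""
    merged = {"pipelines": [], "datasources": []}
    for config in configs:
        # Mirror jq's `// []`: a missing or null key contributes nothing.
        merged["pipelines"].extend(config.get("pipelines") or [])
        merged["datasources"].extend(config.get("datasources") or [])
    return merged


def merge_files(pattern="team-*.json", directory="."):
    """Load every file matching the pattern and merge them in sorted order."""
    paths = sorted(Path(directory).glob(pattern))
    return merge_configs(json.loads(p.read_text()) for p in paths)


if __name__ == "__main__":
    # Write the org-wide view next to the team files.
    Path("pipeviz.json").write_text(json.dumps(merge_files(), indent=2))
```

Like the jq command, this sketch only carries over `pipelines` and `datasources`, and it does not deduplicate entries that appear in more than one team file.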
OpenLineage/Marquez need agents, a metadata store, and scheduler integration. Atlas wants a governance platform. dbt couples you to its framework. Pipeviz needs one JSON file.
?url=your-json-url
Only pipelines is required. Clusters and datasources are auto-created when referenced.
| Field | Type | Required | Description |
|---|---|---|---|
| pipelines | array | Yes | Jobs that transform or move data |
| datasources | array | No | Tables, files, streams, APIs |
| clusters | array | No | Groups for visual organization |
{
  "clusters": [...],
  "pipelines": [...],
  "datasources": [...]
}
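To see which datasources and clusters a configuration relies on without declaring them, something like the following sketch may help. It is an assumption about what "auto-created when referenced" covers: it only looks at `input_sources`, `output_sources`, and `cluster` fields, and the function name `implicit_names` is hypothetical. How Pipeviz itself resolves references is not shown here.

```python
def implicit_names(config):
    """Return (datasources, clusters) that are referenced but not declared."""
    declared_sources = {d["name"] for d in config.get("datasources", [])}
    declared_clusters = {c["name"] for c in config.get("clusters", [])}

    used_sources, used_clusters = set(), set()
    # "pipelines" is the only required top-level key.
    for pipeline in config["pipelines"]:
        used_sources.update(pipeline.get("input_sources", []))
        used_sources.update(pipeline.get("output_sources", []))
        if "cluster" in pipeline:
            used_clusters.add(pipeline["cluster"])
    for source in config.get("datasources", []):
        if "cluster" in source:
            used_clusters.add(source["cluster"])

    return (
        sorted(used_sources - declared_sources),
        sorted(used_clusters - declared_clusters),
    )
```

Running this before committing a pipeviz.json is one way to catch a misspelled datasource name, since a typo would otherwise show up as a new, separate node.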
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier |
| description | string | No | |
| input_sources | array | No | Datasources read from |
| output_sources | array | No | Datasources written to |
| upstream_pipelines | array | No | Pipeline or group names (orange edges) |
| cluster | string | No | Cluster name |
| group | string | No | Collapse into group node |
| schedule | string | No | Free text |
| tags | array | No | |
| links | object | No | name to URL |
{
  "name": "user-enrichment",
  "description": "Enriches user data with events",
  "input_sources": ["raw_users", "events"],
  "output_sources": ["enriched_users"],
  "upstream_pipelines": ["data-ingestion"],
  "cluster": "user-processing",
  "group": "etl-jobs",
  "schedule": "Every 2 hours",
  "tags": ["user-data", "ml"],
  "links": {
    "airflow": "https://...",
    "docs": "https://..."
  }
}
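One way to read a pipeline entry is as a set of graph edges. The sketch below derives them from `input_sources`, `output_sources`, and `upstream_pipelines`. The edge labels (`reads`, `writes`, `depends`) and the function name `pipeline_edges` are made up for illustration and are not part of the spec.

```python
def pipeline_edges(pipeline):
    """List (from_node, to_node, kind) edges implied by one pipeline entry."""
    name = pipeline["name"]
    edges = []
    # Data flows from each input datasource into the pipeline...
    for source in pipeline.get("input_sources", []):
        edges.append((source, name, "reads"))
    # ...and from the pipeline into each output datasource.
    for target in pipeline.get("output_sources", []):
        edges.append((name, target, "writes"))
    # Pipeline-to-pipeline dependencies are a separate kind of edge.
    for upstream in pipeline.get("upstream_pipelines", []):
        edges.append((upstream, name, "depends"))
    return edges
```

For the `user-enrichment` example this yields two `reads` edges, one `writes` edge, and one `depends` edge from `data-ingestion`.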
Auto-created when referenced. Define explicitly to add metadata.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier |
| description | string | No | |
| type | string | No | snowflake, postgres, kafka, s3... |
| owner | string | No | |
| cluster | string | No | |
| tags | array | No | |
| metadata | object | No | Arbitrary key-value |
| links | object | No | name to URL |
| attributes | array | No | Column-level lineage |
{
  "name": "raw_users",
  "description": "Raw user data from prod",
  "type": "snowflake",
  "owner": "data-team@company.com",
  "cluster": "user-processing",
  "tags": ["pii", "users"],
  "metadata": {
    "size": "2.1TB",
    "record_count": "45M"
  },
  "links": {
    "snowflake": "https://...",
    "docs": "https://..."
  }
}
Auto-created when referenced. Define explicitly for nesting.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier |
| description | string | No | |
| parent | string | No | Parent cluster for nesting |
{
  "name": "realtime",
  "description": "Real-time processing cluster",
  "parent": "order-processing"
}
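Nesting via `parent` can be resolved by walking up from a cluster to its root. The following is a small sketch under the assumption that an undeclared parent is simply treated as a root. The function name `cluster_path` is hypothetical, and the cycle check is an added safeguard rather than documented behavior.

```python
def cluster_path(name, clusters):
    """Return the chain of cluster names from the outermost parent to `name`."""
    parents = {c["name"]: c.get("parent") for c in clusters}
    path, seen = [], set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cluster cycle at {current!r}")
        seen.add(current)
        path.append(current)
        # Undeclared clusters have no known parent, so the walk stops there.
        current = parents.get(current)
    return list(reversed(path))
```

With the `realtime` example, `cluster_path("realtime", clusters)` gives the path from `order-processing` down to `realtime`.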
Add attributes to a datasource. Supports nesting for structs/objects. Reference upstream with source::attr or source::parent::child.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Column/field name |
| from | string or array | No | Upstream refs |
| attributes | array | No | Nested child attributes |
{
  "name": "enriched_users",
  "attributes": [
    { "name": "user_id", "from": "raw_users::id" },
    {
      "name": "location",
      "from": "raw_users::address",
      "attributes": [
        { "name": "city", "from": "raw_users::address::city" },
        { "name": "zip", "from": "raw_users::address::zip" }
      ]
    }
  ]
}
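The `source::attr` references can be turned into column-level edges with a short recursive walk. This sketch assumes `from` is either a single string or a list of strings, as the field table states. The helper names `parse_ref` and `attribute_edges` are hypothetical, and naming the target side with the same `::` notation is a choice made here for readability.

```python
def parse_ref(ref):
    """Split 'source::parent::child' into ('source', ['parent', 'child'])."""
    source, *path = ref.split("::")
    if not source or not path or not all(path):
        raise ValueError(f"malformed attribute reference: {ref!r}")
    return source, path


def attribute_edges(datasource):
    """List (upstream_ref, target_ref) pairs for every attribute with a 'from'."""
    edges = []

    def walk(attributes, prefix):
        for attribute in attributes:
            path = prefix + [attribute["name"]]
            refs = attribute.get("from", [])
            if isinstance(refs, str):
                refs = [refs]  # normalise the single-string form
            for ref in refs:
                parse_ref(ref)  # raises on a malformed reference
                edges.append((ref, "::".join(path)))
            # Nested attributes extend the target path by one level.
            walk(attribute.get("attributes", []), path)

    walk(datasource.get("attributes", []), [datasource["name"]])
    return edges
```

Applied to the `enriched_users` example, the nested `city` attribute produces an edge from `raw_users::address::city` to `enriched_users::location::city`.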
{
  "clusters": [
    { "name": "etl", "description": "ETL pipelines" }
  ],
  "pipelines": [
    {
      "name": "user-enrichment",
      "description": "Enriches user data with events",
      "input_sources": ["raw_users", "events"],
      "output_sources": ["enriched_users"],
      "cluster": "etl",
      "schedule": "Every 2 hours",
      "tags": ["user-data"],
      "links": { "airflow": "https://..." }
    }
  ],
  "datasources": [
    {
      "name": "raw_users",
      "type": "snowflake",
      "owner": "data-team@company.com",
      "attributes": [
        { "name": "id" },
        { "name": "first" },
        { "name": "last" }
      ]
    },
    {
      "name": "enriched_users",
      "type": "snowflake",
      "attributes": [
        { "name": "user_id", "from": "raw_users::id" },
        { "name": "full_name", "from": ["raw_users::first", "raw_users::last"] }
      ]
    }
  ]
}