
Pipeviz

Easy, elegant lineage from a single .json

An open source JSON spec for lineage. Declare your pipelines, get beautiful graphs.

{
  "pipelines": [{
    "name": "etl-job",
    "input_sources": ["raw_events"],
    "output_sources": ["cleaned_events"]
  }]
}
Stack Agnostic
SQL, Spark, Kafka, APIs, shell scripts. Just JSON.
Zero Dependencies
One HTML file. No backend, no build step. Host anywhere.
Federated
Each team owns their JSON. Merge with jq for the org-wide view.
Column-Level Lineage
Track field-level provenance. See where each attribute comes from.
How it works
1
Define
Write a pipeviz.json describing your pipelines and data sources
2
Load
Drop your file here, or host both files together on any static server
3
Explore
Click through the graph, trace dependencies, export DOT for other tools
Merging configs
# each team versions their own pipeviz.json
# merge for org-wide view
jq -s '{
  pipelines: map(.pipelines // []) | add,
  datasources: map(.datasources // []) | add
}' team-*.json > pipeviz.json
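For teams that prefer not to depend on jq, the same merge can be sketched in Python (the function name is illustrative, not part of Pipeviz):

```python
def merge_configs(configs):
    """Concatenate each team's pipelines and datasources into one
    org-wide config, mirroring the jq merge above.
    `configs` is a list of parsed pipeviz.json dicts (e.g. from json.load)."""
    return {
        "pipelines": sum((c.get("pipelines", []) for c in configs), []),
        "datasources": sum((c.get("datasources", []) for c in configs), []),
    }

# Two hypothetical team configs:
team_a = {"pipelines": [{"name": "etl-job"}]}
team_b = {"datasources": [{"name": "raw_events"}]}
merged = merge_configs([team_a, team_b])
```

Like the jq version, missing `pipelines` or `datasources` keys are treated as empty lists, so partial team configs merge cleanly.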
Why

OpenLineage/Marquez need agents, a metadata store, and scheduler integration. Atlas wants a whole governance platform. dbt couples you to its framework. Pipeviz needs one JSON file.

Auto-load: ?url=your-json-url
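For example, assuming the viewer and config are hosted together (hypothetical host and filenames):

```
https://example.com/pipeviz.html?url=https://example.com/pipeviz.json
```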

Pipeline graph legend:
■ = Pipeline
■ = Group
● = Data Source
□ = Cluster
→ = Data flow
→ = Dependency
Attribute lineage legend:
▢ = Data Source
· = Attribute
→ = Derived from
Generated Graphviz DOT
Pipeviz JSON Spec

Only pipelines is required. Clusters and datasources are auto-created when referenced.

Root
Field | Type | Required | Description
pipelines | array | Yes | Jobs that transform or move data
datasources | array | No | Tables, files, streams, APIs
clusters | array | No | Groups for visual organization
{
  "clusters": [...],
  "pipelines": [...],
  "datasources": [...]
}
Pipeline
Field | Type | Required | Description
name | string | Yes | Unique identifier
description | string | No |
input_sources | array | No | Datasources read from
output_sources | array | No | Datasources written to
upstream_pipelines | array | No | Pipeline or group names (orange edges)
cluster | string | No | Cluster name
group | string | No | Collapse into group node
schedule | string | No | Free text
tags | array | No |
links | object | No | name to URL
{
  "name": "user-enrichment",
  "description": "Enriches user data with events",
  "input_sources": ["raw_users", "events"],
  "output_sources": ["enriched_users"],
  "upstream_pipelines": ["data-ingestion"],
  "cluster": "user-processing",
  "group": "etl-jobs",
  "schedule": "Every 2 hours",
  "tags": ["user-data", "ml"],
  "links": {
    "airflow": "https://...",
    "docs": "https://..."
  }
}
Datasource

Auto-created when referenced. Define explicitly to add metadata.

Field | Type | Required | Description
name | string | Yes | Unique identifier
description | string | No |
type | string | No | snowflake, postgres, kafka, s3...
owner | string | No |
cluster | string | No |
tags | array | No |
metadata | object | No | Arbitrary key-value
links | object | No | name to URL
attributes | array | No | Column-level lineage
{
  "name": "raw_users",
  "description": "Raw user data from prod",
  "type": "snowflake",
  "owner": "data-team@company.com",
  "cluster": "user-processing",
  "tags": ["pii", "users"],
  "metadata": {
    "size": "2.1TB",
    "record_count": "45M"
  },
  "links": {
    "snowflake": "https://...",
    "docs": "https://..."
  }
}
Cluster

Auto-created when referenced. Define explicitly for nesting.

Field | Type | Required | Description
name | string | Yes | Unique identifier
description | string | No |
parent | string | No | Parent cluster for nesting
{
  "name": "realtime",
  "description": "Real-time processing cluster",
  "parent": "order-processing"
}
Attribute Lineage

Add attributes to a datasource. Supports nesting for structs/objects. Reference upstream with source::attr or source::parent::child.

Field | Type | Required | Description
name | string | Yes | Column/field name
from | string or array | No | Upstream refs
attributes | array | No | Nested child attributes
{
  "name": "enriched_users",
  "attributes": [
    { "name": "user_id", "from": "raw_users::id" },
    {
      "name": "location",
      "from": "raw_users::address",
      "attributes": [
        { "name": "city", "from": "raw_users::address::city" },
        { "name": "zip", "from": "raw_users::address::zip" }
      ]
    }
  ]
}
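The source::attr reference format splits mechanically on the :: separator; a minimal sketch (the helper name is illustrative, not part of the spec):

```python
def parse_ref(ref):
    """Split an upstream reference like 'raw_users::address::city'
    into (datasource_name, attribute_path)."""
    parts = ref.split("::")
    return parts[0], parts[1:]

src, path = parse_ref("raw_users::address::city")
```

A flat reference like raw_users::id yields a single-element path; nested references yield one path segment per level of the struct.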
Full Example
{
  "clusters": [
    { "name": "etl", "description": "ETL pipelines" }
  ],
  "pipelines": [
    {
      "name": "user-enrichment",
      "description": "Enriches user data with events",
      "input_sources": ["raw_users", "events"],
      "output_sources": ["enriched_users"],
      "cluster": "etl",
      "schedule": "Every 2 hours",
      "tags": ["user-data"],
      "links": { "airflow": "https://..." }
    }
  ],
  "datasources": [
    {
      "name": "raw_users",
      "type": "snowflake",
      "owner": "data-team@company.com",
      "attributes": [
        { "name": "id" },
        { "name": "first" },
        { "name": "last" }
      ]
    },
    {
      "name": "enriched_users",
      "type": "snowflake",
      "attributes": [
        { "name": "user_id", "from": "raw_users::id" },
        { "name": "full_name", "from": ["raw_users::first", "raw_users::last"] }
      ]
    }
  ]
}
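The auto-creation rule for datasources can be made concrete with a short sketch: any name referenced in input_sources or output_sources but not defined under datasources gets a node anyway (the function name is illustrative; this is a reading of the spec, not the viewer's actual code):

```python
def implicit_datasources(config):
    """Return names referenced by pipelines but not explicitly
    defined under 'datasources' -- these are auto-created."""
    defined = {d["name"] for d in config.get("datasources", [])}
    referenced = set()
    for p in config.get("pipelines", []):
        referenced.update(p.get("input_sources", []))
        referenced.update(p.get("output_sources", []))
    return sorted(referenced - defined)
```

In the full example above, raw_users and enriched_users are defined explicitly (to carry metadata and attributes), while events is only referenced, so it would be auto-created.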