The Satsuma CLI is built for AI agents. 21 parser-backed commands let your agent slice workspaces, trace lineage, extract metadata, and pull NL intent — all from the parse tree, with 100% deterministic results.
Humans use it too — validate, fmt, and lint are your day-to-day workflow commands. But the CLI's real power is as the structural backbone your agent reasons on top of.
The CLI extracts facts. The agent interprets them. The CLI also powers the VS Code extension's diagnostics, lineage views, and code intelligence.
```
# Teach your agent about Satsuma in one command
$ satsuma agent-reference >> AGENTS.md

# Agent can now query your workspace
$ satsuma summary examples/sfdc-to-snowflake/pipeline.stm
Workspace Summary  examples/sfdc-to-snowflake/pipeline.stm
  Schemas: 8  Mappings: 5  Metrics: 2  Fragments: 1
  Files scanned: 6

$ satsuma lineage --from loyalty_sfdc
loyalty_sfdc
  -> sat_customer_demographics (loyalty to demographics)
  -> mart_customer_360 (demographics to mart)
```
satsuma agent-reference prints a compact prompt that teaches any AI agent the Satsuma grammar, the CLI commands, and the recommended workflow patterns. Append it to your agent's instructions file and you're done.
Append the reference to whatever file your agent reads at startup. Works with Claude Code, Copilot, Cursor, Windsurf, or any agent that reads a system prompt from a file.
```
# Claude Code
$ satsuma agent-reference >> CLAUDE.md

# GitHub Copilot
$ satsuma agent-reference >> .github/copilot-instructions.md

# Cursor
$ satsuma agent-reference >> .cursor/rules/satsuma.mdc

# Or paste into any conversation
$ satsuma agent-reference | pbcopy
```
The reference covers:
.stm files
The CLI gives agents token-efficient structural queries instead of dumping entire files into context. Agents compose these primitives into higher-level workflows — the CLI extracts the facts, the agent reasons over them.
Your agent can trace any data element from source to destination across the entire workspace. It starts with lineage for schema-level paths, then drills into arrows for field-level detail.
When an arrow is classified as nl, the agent reads the natural-language intent and interprets it. When it's structural, the transform pipeline is fully specified — no interpretation needed.
- `lineage --from` traces all downstream consumers of a schema
- `arrows --as-source` follows each field through its transforms
- `nl` extracts NL intent at `[nl]` hops for the agent to interpret
- `where-used` finds every reference to a schema or fragment
```
# Schema-level: where does this data go?
$ satsuma lineage --from loyalty_sfdc
loyalty_sfdc
  -> sat_customer_demographics
  -> mart_customer_360

# Field-level: trace LoyaltyTier through transforms
$ satsuma arrows loyalty_sfdc.LoyaltyTier --as-source --json
{
  "target": "sat_customer_demographics.loyalty_tier",
  "classification": "structural",
  "transform": "UPPER | TRIM"
}

# Follow the chain — next hop is NL
$ satsuma arrows sat_customer_demographics.loyalty_tier --as-source
[nl] loyalty_tier -> mart_customer_360.tier_label
  "Map Gold/Silver/Bronze to internal codes"
# Agent reads the NL and decides what to do
```
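That hop-by-hop loop is easy for an agent to automate. A minimal Python sketch, using arrow records shaped like the `--json` output above; the in-memory `ARROWS` dict is a stand-in for actual CLI calls, not part of the CLI itself:

```python
# Stand-in for per-field `satsuma arrows <field> --as-source --json` calls.
# Record shapes are copied from the example output above.
ARROWS = {
    "loyalty_sfdc.LoyaltyTier": {
        "target": "sat_customer_demographics.loyalty_tier",
        "classification": "structural",
        "transform": "UPPER | TRIM",
    },
    "sat_customer_demographics.loyalty_tier": {
        "target": "mart_customer_360.tier_label",
        "classification": "nl",
        "transform": "Map Gold/Silver/Bronze to internal codes",
    },
}

def trace(field: str) -> list[dict]:
    """Follow --as-source arrows hop by hop until the chain ends."""
    hops = []
    while field in ARROWS:
        arrow = ARROWS[field]
        hops.append(arrow)
        field = arrow["target"]
    return hops

chain = trace("loyalty_sfdc.LoyaltyTier")
# Structural hops are fully specified; NL hops are what the agent must read.
needs_review = [a for a in chain if a["classification"] == "nl"]
```

The agent trusts the `structural` hop outright and spends its reasoning budget only on the entries in `needs_review`.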
Satsuma uses natural-language strings for intent that can't be expressed as deterministic pipelines. The nl command extracts these verbatim — notes, transform descriptions, and comments — so your agent can analyze, critique, or summarize them.
Agents use this to review business logic, check if NL descriptions match the structural transforms around them, identify ambiguities, or generate documentation from the intent strings.
@ref references in NL are machine-extractable — the CLI traces them
```
# All NL content in a mapping
$ satsuma nl 'demographics to mart'
mart_customer_360.full_name [transform]
  "Concatenate first and last name from `@ref sat_customer_demographics`"
mart_customer_360.tier_label [transform]
  "Map Gold/Silver/Bronze to internal codes per the tier mapping in `@ref lookup_tiers`"
mart_customer_360 [note]
  "This mart combines demographic and loyalty data into a single customer view for BI"

# Field-level NL only
$ satsuma nl mart_customer_360.email
  "Hash with SHA-256 before loading into the mart. Original plaintext stays in the sat."
```
Satsuma's metadata system is open-ended — any token can be a tag. Your agent uses meta and find --tag to extract these, then combines them with your organisation's guidelines to drive code generation.
For example, if your team follows the Data Vault standard, your agent reads pk, bk, hash_diff tags from schema metadata and generates hub, satellite, and link DDL accordingly. If your team uses pii and encrypt tags, the agent knows to emit encryption logic.
The CLI doesn't know what hash_diff means — it just extracts the tag. Your agent, armed with your org's standards doc, interprets it.
```
# Read metadata on a target schema
$ satsuma meta hub_customer
hub_customer
  customer_hk (pk, hash_key)
  customer_bk (bk, required)
  load_date (required)
  record_source (required)

# Find all PII fields across the workspace
$ satsuma find --tag pii --json
[
  { "schema": "loyalty_sfdc", "field": "Email", "tags": ["pii", "encrypt"] },
  { "schema": "loyalty_sfdc", "field": "SSN", "tags": ["pii", "encrypt", "mask"] }
]

# Agent reads your org's Data Vault standard,
# sees pk + hash_key, and generates:
#   CREATE TABLE hub_customer (
#     customer_hk BINARY(32) NOT NULL,
#     customer_bk VARCHAR(255) NOT NULL,
#     ...
```
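For instance, an agent holding your org's Data Vault standard might render the `meta` output above into DDL. A sketch: the field/tag pairs are copied from the example, while the type and nullability rules (BINARY(32) for `hash_key`, NOT NULL for `pk`/`required`) are assumed org conventions that the CLI knows nothing about:

```python
# Field/tag pairs as extracted from `satsuma meta hub_customer` above.
FIELDS = [
    ("customer_hk", ["pk", "hash_key"]),
    ("customer_bk", ["bk", "required"]),
    ("load_date", ["required"]),
    ("record_source", ["required"]),
]

def hub_ddl(table: str, fields: list) -> str:
    """Render tagged fields into CREATE TABLE DDL per assumed org rules."""
    cols = []
    for name, tags in fields:
        # Org convention (assumed): hash keys are BINARY(32), rest VARCHAR.
        sql_type = "BINARY(32)" if "hash_key" in tags else "VARCHAR(255)"
        null = " NOT NULL" if {"pk", "required"} & set(tags) else ""
        cols.append(f"  {name} {sql_type}{null}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"

ddl = hub_ddl("hub_customer", FIELDS)
```

Swap in your own standard (typed dates, hash-diff columns, audit fields) and the same pattern holds: tags in, generated code out.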
For complex analysis, graph --json exports the complete semantic graph in a single call — all nodes, edges, field-level data flow, and unresolved NL arrows. The agent loads it once and reasons offline, without round-trips.
- `--schema-only` and `--no-nl` reduce payload for large workspaces
- `--namespace` scopes the export to a single namespace
- the `unresolved_nl` section surfaces all NL arrows awaiting interpretation
```
$ satsuma graph examples/sfdc-to-snowflake/pipeline.stm --json
{
  "nodes": [
    { "name": "loyalty_sfdc", "type": "schema" },
    { "name": "sat_customer_demographics", ... },
    { "name": "mart_customer_360", ... }
  ],
  "schema_edges": [
    { "from": "loyalty_sfdc", "to": "sat_customer_demographics", "role": "source" },
    ...
  ],
  "edges": [ ... ],
  "unresolved_nl": [ ... ]
}

# Narrow scope for large workspaces
$ satsuma graph examples/sfdc-to-snowflake/pipeline.stm --json --namespace warehouse
```
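Once the export is loaded, reach questions become plain graph traversal. A sketch assuming only the `schema_edges` shape shown above; the two edge records are copied from the example, and everything else is ordinary BFS:

```python
from collections import defaultdict, deque

# A `graph --json` export, trimmed to schema_edges (shape as shown above).
graph = {
    "schema_edges": [
        {"from": "loyalty_sfdc", "to": "sat_customer_demographics", "role": "source"},
        {"from": "sat_customer_demographics", "to": "mart_customer_360", "role": "source"},
    ],
}

# Build an adjacency list once; answer many questions offline.
adj = defaultdict(list)
for e in graph["schema_edges"]:
    adj[e["from"]].append(e["to"])

def downstream(schema: str) -> set:
    """Every schema reachable from `schema`, via BFS over schema_edges."""
    seen, queue = set(), deque([schema])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

One CLI call, then arbitrarily many impact queries with no further round-trips.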
"What breaks if I change this field?"
```
$ satsuma arrows loyalty_sfdc.LoyaltyTier --as-source --json
$ satsuma arrows sat_customer_demographics.loyalty_tier --as-source --json
$ satsuma nl mart_customer_360.loyalty_tier
```
"Does PII survive through the pipeline unencrypted?"
```
$ satsuma find --tag pii --json
$ satsuma arrows loyalty_sfdc.Email --as-source --json
$ satsuma nl mart_customer_360.email
```
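A sketch of how an agent might stitch those outputs together. The PII records copy the `find --tag pii --json` example; the `downstream_transforms` dict stands in for per-field `arrows`/`nl` calls, and treating a mention of SHA as "encrypted" is an illustrative heuristic, not CLI behaviour:

```python
# From `satsuma find --tag pii --json` (shapes copied from the example).
pii_fields = [
    {"schema": "loyalty_sfdc", "field": "Email", "tags": ["pii", "encrypt"]},
    {"schema": "loyalty_sfdc", "field": "SSN", "tags": ["pii", "encrypt", "mask"]},
]

# Stand-in for per-field `arrows --as-source` / `nl` lookups.
downstream_transforms = {
    "loyalty_sfdc.Email": "Hash with SHA-256 before loading into the mart.",
    # No transform recorded for SSN -- exactly the gap we want to catch.
}

findings = []
for f in pii_fields:
    path = f'{f["schema"]}.{f["field"]}'
    transform = downstream_transforms.get(path, "")
    # Illustrative rule: encrypt-tagged PII must show hashing somewhere downstream.
    if "encrypt" in f["tags"] and "sha" not in transform.lower():
        findings.append(path)
```

Each entry in `findings` is a PII field tagged `encrypt` with no hashing visible in its downstream transforms, ready for the agent to report.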
"Which target fields have no mapping?"
```
$ satsuma fields mart_customer_360 --unmapped-by 'demographics to mart'
$ satsuma fields mart_customer_360 --unmapped-by 'online to mart'
```
"Match source to target and write the mapping"
```
$ satsuma match-fields --source loyalty_sfdc --target sat_customer
$ satsuma nl sat_customer
$ satsuma meta sat_customer.country_code
```
These three commands are the human side of the CLI. Run them before committing, in CI, or as editor commands. They're also used by the VS Code extension under the hood.
"Is my workspace well-formed?"
Checks parse errors, undefined schema references, missing fields, and invalid paths. Run it before every commit.
```
$ satsuma validate examples/sfdc-to-snowflake/pipeline.stm
valid — 0 errors, 0 warnings

$ satsuma validate --json
{ "valid": true, "errors": 0, "warnings": 0 }
```
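The JSON form slots straight into a CI gate. A minimal sketch assuming the report shape shown above; how you capture the CLI's stdout (a subprocess call, a pipe) depends on your pipeline and is not shown:

```python
import json

def ci_gate(report_json: str) -> int:
    """Turn a `validate --json` report into a process exit code.

    Nonzero on any error or warning, so the build fails fast.
    """
    report = json.loads(report_json)
    return 0 if report["valid"] and report["warnings"] == 0 else 1

# Report string copied from the example output above.
status = ci_gate('{ "valid": true, "errors": 0, "warnings": 0 }')
```

Failing on warnings as well as errors is a policy choice; relax the condition if your team treats warnings as advisory.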
"One canonical style, zero config."
Opinionated formatter backed by the tree-sitter CST. Semantics-preserving. Use --check in CI.
```
$ satsuma fmt examples/sfdc-to-snowflake/pipeline.stm
Formatted 1 file(s)

$ satsuma fmt --check examples/sfdc-to-snowflake/pipeline.stm
0 file(s) would be reformatted

$ satsuma fmt --diff mapping.stm
```
"Does this follow best practices?"
Policy and convention checks with --fix for safe autofix. Catches hidden NL dependencies and duplicate definitions.
```
$ satsuma lint --fix
Fixed 2 issue(s)

$ satsuma lint --rules
hidden-source-in-nl (fixable)
unresolved-nl-ref
duplicate-definition
```
Every arrow the CLI returns carries a classification derived from CST node types. This tells the agent whether it needs to interpret the transform or can trust the syntax.
| Classification | Meaning | Agent action |
|---|---|---|
| `structural` | Deterministic pipeline | None — fully specified |
| `nl` | Natural-language string | Read and interpret intent |
| `mixed` | Pipeline steps + NL strings | Review the NL portion |
| `none` | Bare `src -> tgt` | None |
| `nl-derived` | Implicit from `@ref` | Verify referenced field exists |
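The table above reduces to a small dispatch on the agent side. A sketch; the handler strings are illustrative agent actions, not CLI output:

```python
# One action per classification value, mirroring the table above.
ACTIONS = {
    "structural": "trust: pipeline is fully specified",
    "nl": "interpret the natural-language intent",
    "mixed": "review the NL portion of the pipeline",
    "none": "trust: bare src -> tgt",
    "nl-derived": "verify the @ref'd field exists",
}

def agent_action(arrow: dict) -> str:
    """Pick the follow-up action for one arrow record from `arrows --json`."""
    return ACTIONS[arrow["classification"]]

action = agent_action({"classification": "nl"})
```

Because the classification comes from CST node types rather than from a model, this dispatch is deterministic: the agent never has to guess whether a transform needs interpretation.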
Download a prebuilt package from the v0.7.0 release and install globally with npm.
```
# Universal — works on macOS, Linux, and Windows (WASM-based)
$ npm install -g https://github.com/thorbenlouw/satsuma-lang/releases/download/v0.7.0/satsuma-cli-v0.7.0.tgz

$ satsuma --help
satsuma <command> [options]
21 commands available
```
```
# Or track the latest release instead of pinning a version
$ npm install -g https://github.com/thorbenlouw/satsuma-lang/releases/download/latest/satsuma-cli-latest.tgz
```
All commands accept --json for structured output and --help for usage details. Many support --compact for minimal output.
Complete reference for every command in the CLI. See the full reference on GitHub for detailed usage and examples.
Block-level extraction — retrieve whole blocks or workspace-level summaries.
Workspace overview — schemas, mappings, metrics, and counts.
Full schema definition from the parse tree.
Full definition of a schema decorated with metric metadata — grain, slice, filter, and measure fields.
Full mapping with all arrows and transforms.
Find all fields carrying a metadata tag (pii, encrypt, etc.).
Schema-level graph traversal, forward or backward.
All references to a schema, fragment, or transform.
All //! and //? comments across the workspace.
Keyword-ranked block extraction (heuristic fuzzy search).
Fine-grained extraction — slice below block level for arrows, NL, metadata, and fields.
All arrows for a field, with transform classification.
NL content — notes, transforms, comments — extracted verbatim.
Metadata entries — tags, constraints, annotations.
Field list with types. Supports --unmapped-by.
Normalized name comparison between source and target schemas.
Full semantic graph export in a single call.
Complete graph with nodes, edges, and field-level data flow.
Schema-level adjacency list (minimal payload).
Topology only, omit field-level edges.
Formatting, validation, linting, and structural comparison.
Opinionated, zero-config formatter. --check for CI.
Parse errors and semantic reference checks.
Policy checks with --fix autofix.
Structural comparison of two workspace snapshots.
Bootstrap your AI agent.
Print the AI Agent Reference for embedding in agent instructions.
Transform strings, notes, and comments are extracted verbatim. The CLI never assesses whether an NL transform is correct or complete.
There are no impact, coverage, or audit commands. These are agent workflows built from primitives.
The CLI is deterministic, fast, and reproducible. Same input, same output, every time.
Commands take explicit structural arguments. The agent decides which commands to call based on the user's question.
Install the CLI, run satsuma agent-reference >> AGENTS.md, and your agent can query your data mappings in seconds.