Data Copilot: Layered Text-to-SQL Architecture

This document defines a robust, cost-effective pipeline for converting natural language questions into executable SQL queries. It uses a layered, multi-agent approach to handle complexity, minimize token usage, and ensure accuracy through specialized agents and human-in-the-loop (HITL) checkpoints.

Overview

Traditional "one-shot" Text-to-SQL often fails on complex schemas or ambiguous questions. This architecture breaks the problem into five distinct layers, each with a narrow focus.

flowchart TD
    User([User Question]) --> Router[1. Workspace Router]
    Router --> Intent[2. Intent Agent]
    Intent --> Table[3. Table Agent]
    Table --> Prune[4. Column Prune Agent]
    Prune --> SQL[5. SQL Generator]
    SQL --> Output[/SQL Query/]

    subgraph HITL [Human-in-the-Loop]
        TableCheck{Verify Tables?}
        PruneCheck{Verify Columns?}
    end

    Table -.-> TableCheck
    TableCheck -- Approved --> Prune
    Prune -.-> PruneCheck
    PruneCheck -- Approved --> SQL
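
The control flow above can be sketched as a plain function chain with optional review callbacks. This is a minimal sketch: agent internals are stubbed, and the function and parameter names are illustrative, not part of the actual implementation.

```python
# Sketch of the five layers chained as plain functions with optional HITL
# checkpoints; agent internals are stubbed and names are illustrative.
def run_pipeline(question, router, intent_agent, table_agent,
                 prune_agent, sql_agent, review=None):
    """Chain the layers; `review(stage, value)` may override intermediate output."""
    workspace = router(question)                    # 1. Workspace Router
    intent = intent_agent(question, workspace)      # 2. Intent Agent
    tables = table_agent(intent)                    # 3. Table Agent
    if review is not None:                          # HITL: verify tables
        tables = review("tables", tables)
    pruned = prune_agent(tables, intent)            # 4. Column Prune Agent
    if review is not None:                          # HITL: verify columns
        pruned = review("columns", pruned)
    return sql_agent(intent, pruned)                # 5. SQL Generator
```

Because each layer is a plain callable, any single layer can be swapped for a stronger model (or a human) without touching the rest of the chain.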

Layers & Interfaces

1. Workspace Router

  • Role: Identifies which "workspace" (database or domain) the question belongs to (e.g., Finance, Home Automation, Inventory).
  • Benefit: Prevents the LLM from being overwhelmed by the entire repository's schema.
  • Model Recommendation: Small/Fast (e.g., qwen3:0.6b or qwen3:1.7b via Ollama).
  • Interface:
    {
      "workspace_id": "inventory",
      "description": "Home Inventory and Assets",
      "db_connection_string": "sqlite:///home_inventory.db"
    }
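
As a degenerate baseline (and a cheap unit-test stand-in for the LLM classifier), the routing contract can be sketched with lexical overlap against workspace descriptions. The `route_workspace` helper and the `"unsure"` fallback value are illustrative assumptions, not the production router.

```python
import re

# Lexical workspace router sketch: pick the workspace whose one-line
# description shares the most words with the question. A real router
# would use a small LLM or embeddings; this is a test-friendly baseline.
def route_workspace(question, workspaces):
    """workspaces maps workspace_id -> one-line description."""
    q_words = set(re.findall(r"[a-z]+", question.lower()))
    best_id, best_overlap = "unsure", 0
    for ws_id, description in workspaces.items():
        overlap = len(q_words & set(re.findall(r"[a-z]+", description.lower())))
        if overlap > best_overlap:
            best_id, best_overlap = ws_id, overlap
    return best_id
```

Returning `"unsure"` when nothing matches is what makes the human-clarification fallback (see Failure Modes below) possible.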
    

2. Intent Agent

  • Role: Refines the raw user question into a structured intent, identifying metrics, time ranges, and filters.
  • Input: User question + Workspace context.
  • Output: JSON object with intent parameters.
  • Interface:
    {
      "metrics": ["quantity", "purchase_price"],
      "dimensions": ["category"],
      "filters": {"room": "Kitchen"},
      "time_range": "last_30_days"
    }
    

3. Table Agent

  • Role: Selects the minimal set of tables required to answer the intent.
  • HITL Point: Optional user approval of selected tables to prevent joined-table explosions.
  • Pruning Strategy: Semantic search over table descriptions (RAG) instead of dumping all table names.
  • Interface:
    {
      "selected_tables": ["items", "categories"],
      "rationale": "Need 'items' for quantity/price and 'categories' for grouping."
    }
    

4. Column Prune Agent

  • Role: For the selected tables, identifies only the columns needed for the query.
  • Benefit: Drastically reduces the prompt size for the final SQL generation, staying within small model context limits.
  • HITL Point: Verification of critical columns (e.g., "Are you sure you want 'net_price' instead of 'gross_price'?").
  • Interface:
    [
      {
        "table_name": "items",
        "columns": ["name", "quantity", "purchase_price", "category_id", "room"]
      },
      {
        "table_name": "categories",
        "columns": ["id", "name"]
      }
    ]
    

5. SQL Generator

  • Role: Produces the final SQL query using the pruned schema and refined intent.
  • Input: Intent JSON + Pruned Schema (Tables + Columns).
  • Output: Valid SQL string.
  • Interface:
    {
      "sql": "SELECT c.name, SUM(i.quantity) FROM items i JOIN categories c ON i.category_id = c.id WHERE i.room = 'Kitchen' GROUP BY c.name;"
    }
    

Data Contracts (Interfaces)

To ensure interoperability between layers, we use standardized Pydantic models (see Skeleton Implementation).

| Interface | Input | Output |
|-----------|-------|--------|
| Router | query: str | WorkspaceContext (ID, DB Path) |
| Intent | query, WorkspaceContext | IntentOutput (Metrics, Filters, Time) |
| Table | IntentOutput | TableSelection (Table List, Rationale) |
| Prune | tables: List, IntentOutput | List[PrunedSchema] (Table + Columns) |
| SQL | IntentOutput, List[PrunedSchema] | sql: str |
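
A minimal sketch of these contracts using stdlib dataclasses so the snippet runs without dependencies; the actual implementation would use Pydantic models with validation, but the field names follow the interfaces above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stdlib dataclass sketch of the layer contracts. The real implementation
# uses Pydantic models, which add parsing and validation on top of these shapes.
@dataclass
class WorkspaceContext:
    workspace_id: str
    description: str
    db_connection_string: str

@dataclass
class IntentOutput:
    metrics: List[str]
    dimensions: List[str] = field(default_factory=list)
    filters: dict = field(default_factory=dict)
    time_range: Optional[str] = None

@dataclass
class TableSelection:
    selected_tables: List[str]
    rationale: str

@dataclass
class PrunedSchema:
    table_name: str
    columns: List[str]
```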

Cost & Model Routing

To maintain a "free/cheap-first" stack, route each layer to the cheapest model class that can pass that layer's tests, then escalate only the failed layer with the same pruned context. Treat model IDs as configuration, not hard-coded business logic, because hosted catalogs and local model availability change over time.

| Layer | Free / cheap primary route | Escalation route | Token budget control | Rationale |
|-------|----------------------------|------------------|----------------------|-----------|
| Router | Local Ollama small classifier such as qwen3:1.7b or another compact open model | Local qwen3:8b | Workspace names + 1-line descriptions only | Simple domain classification should not require paid tokens. |
| Intent | Local qwen3:8b, llama3.1:8b, or equivalent small instruct model | Low-cost hosted model such as OpenAI GPT-5.4 mini-class or Claude Haiku-class | User question + workspace policy + metric glossary | Metric/date extraction needs stronger reasoning than routing but remains compact. |
| Table Selection | Hosted free/developer-plan open model on Groq or local 8B model over table summaries | Low-cost hosted mini/Haiku-class model | Top 10 table cards, each with owner, grains, joins, and metric tags | Wrong-table errors are expensive, so this layer gets stronger reasoning and HITL review. |
| Column Pruning | Local qwen3:8b with JSON/schema output validation | Claude Haiku-class or OpenAI mini-class model | Selected table DDL plus PK/FK/metric comments only | Structured pruning is cheap to verify and keeps SQL prompts small. |
| SQL Generation | Low-cost hosted mini/Haiku-class model | Larger hosted SQL-capable model only after validation failure | Pruned schema + intent JSON + dialect rules; no full database dump | SQL syntax and dialect reliability are worth a small paid call after pruning. |
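
Keeping model IDs as configuration rather than hard-coded logic can be as simple as a per-layer routing table. The model IDs and the `pick_model` helper below are illustrative placeholders, not recommendations.

```python
# Model routing as data, not code. Each layer names a cheap primary route
# and an escalation route; IDs here are illustrative placeholders.
MODEL_ROUTES = {
    "router": {"primary": "ollama/qwen3:1.7b", "escalation": "ollama/qwen3:8b"},
    "intent": {"primary": "ollama/qwen3:8b",   "escalation": "hosted/mini"},
    "table":  {"primary": "groq/open-8b",      "escalation": "hosted/mini"},
    "prune":  {"primary": "ollama/qwen3:8b",   "escalation": "hosted/mini"},
    "sql":    {"primary": "hosted/mini",       "escalation": "hosted/large"},
}

def pick_model(layer, attempt):
    """First attempt uses the cheap primary route; any retry escalates."""
    route = MODEL_ROUTES[layer]
    return route["primary"] if attempt == 0 else route["escalation"]
```

Because only the failed layer is retried with the same pruned context, escalation cost stays proportional to the failure, not to the whole pipeline.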

Human correction points

Human correction is intentionally placed before the pipeline pays for larger-context SQL generation:

  1. Table review: The Table Agent returns selected tables, confidence, and a rationale. If confidence is low, too many tables are selected, or a domain expert spots a missing fact table, the reviewer can replace the table list and continue without rerunning routing or intent extraction.
  2. Column / metric review: The Column Prune Agent returns per-table column lists and the metric assumptions it inferred. A reviewer can correct ambiguous fields such as gross_price versus net_price, add required join keys, or send the question back for clarification.
  3. Post-validation review: If SQL validation or sample-result checks fail, show the user the intent, table list, pruned schema, generated SQL, and validator error together so the correction targets the smallest broken layer.

Token Control & Schema Pruning Strategy

To minimize costs and stay within the context limits of smaller models, the following pruning strategies are used:

  1. Semantic Table Filtering: Instead of sending all table schemas, we use an embedding-based search (RAG) to find the top N (usually 3-5) most relevant tables for the intent.
  2. Column-Level Pruning: Once tables are selected, only the primary keys, foreign keys, and columns identified by the Column Prune Agent are included in the final SQL generation prompt.
    • Example: For a table orders with 50 columns, if the intent is "Total sales by month", the pruned schema sent to the SQL Generator only contains [id, total_amount, created_at].
  3. Schema metadata: We include short comments in the schema (e.g., -- type: categorical) rather than raw data samples to keep token counts low.
  4. Fallback Routing: If a query fails on a local model, route only that layer to a more capable hosted model with the same pruned schema. Avoid rerunning the whole pipeline unless the validator proves the earlier interface is wrong.
  5. Token ceilings per layer: Enforce hard maximum prompt sizes for router, intent, table, prune, and SQL layers. If a layer exceeds its budget, summarize or retrieve fewer schema cards before escalating to a bigger model.
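
A sketch of ceiling enforcement, assuming a crude chars/4 token estimate and relevance-sorted schema cards; a real implementation would use the target model's tokenizer, and the ceiling values below are illustrative.

```python
# Per-layer prompt ceilings (illustrative values). Schema "cards" are
# assumed pre-sorted by relevance and dropped lowest-relevance-first
# until the prompt fits the layer's budget.
TOKEN_CEILINGS = {"router": 500, "intent": 1500, "table": 2500,
                  "prune": 3000, "sql": 4000}

def estimate_tokens(text):
    return len(text) // 4  # crude heuristic: ~4 characters per token

def fit_schema_cards(layer, base_prompt, cards):
    """Keep as many leading (most relevant) cards as fit within the budget."""
    budget = TOKEN_CEILINGS[layer] - estimate_tokens(base_prompt)
    kept, used = [], 0
    for card in cards:
        cost = estimate_tokens(card)
        if used + cost > budget:
            break
        kept.append(card)
        used += cost
    return kept
```

If `fit_schema_cards` returns fewer cards than the layer needs to answer the question, that is the signal to summarize cards or escalate, rather than silently overflowing the context.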

Failure Modes & Mitigation

  1. Wrong Domain (Router Failure):
    • Symptom: Question about "eggs" routed to "Home Office" instead of "Grocy".
    • Mitigation: Provide an "Unsure" fallback that triggers a human clarification.
  2. Wrong Table / Table Explosion (Table Agent Failure):
    • Symptom: Selecting an event-log table instead of the canonical fact table, or selecting 10+ tables for a simple query.
    • Mitigation: Hard limit on table count (e.g., max 4), table-grain metadata, and HITL verification before column pruning.
  3. Wrong Metric Definition:
    • Symptom: User asks for "spend" or "total sales" but the schema has subtotal, tax, discount, gross_price, and net_price.
    • Mitigation: Maintain a metric glossary, require the Intent Agent to emit metric assumptions, and ask for clarification when multiple metric definitions match.
  4. Unsafe or Overbroad SQL:
    • Symptom: Generated SQL uses SELECT *, unbounded date ranges, write statements, or cross-workspace joins.
    • Mitigation: Run a SQL policy validator before execution, reject write statements, require date limits for large fact tables, and execute only against read-only credentials.
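
The SQL policy check can be sketched as a pattern-based validator. This is illustrative only: a production validator should parse the SQL with a real parser (e.g., sqlglot) rather than regex-match, and the rule set below is a minimal subset of the policies described above.

```python
import re

# Minimal SQL policy validator sketch: rejects write statements, SELECT *,
# and unbounded queries on large fact tables. Pattern-matching can
# false-positive on identifiers; a parser-based check is the real fix.
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create)\b", re.I)

def validate_sql(sql, large_tables=frozenset()):
    """Return a list of policy violations (empty list means the query passes)."""
    violations = []
    if WRITE_KEYWORDS.search(sql):
        violations.append("write statement not allowed")
    if re.search(r"select\s+\*", sql, re.I):
        violations.append("SELECT * not allowed")
    mentions_large = any(re.search(rf"\b{t}\b", sql, re.I) for t in large_tables)
    bounded = bool(re.search(r"\b(where|limit)\b", sql, re.I))
    if mentions_large and not bounded:
        violations.append("large table query must be bounded (WHERE or LIMIT)")
    return violations
```

Running this before execution (and always against read-only credentials) means an unsafe generation costs a retry, not a data incident.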

When NOT to use Text-to-SQL

  • High-stakes Financial Audits: Where 100% precision is required without human review.
  • Extremely Wide Tables: Tables with 500+ columns (requires heavy RAG-based column selection first).
  • Non-Relational Complex Joins: When the data resides across multiple incompatible silos (use a Multi-Agent RAG instead).
  • Highly Nested JSON: If the database stores critical business logic inside deeply nested JSONB/JSON columns that require complex extraction paths, LLMs often struggle without specialized "flattening" agents.

Contribution Metadata

  • Last reviewed: 2026-05-06
  • Confidence: high
  • Related Issues: #186