Instructor¶
What it is¶
Instructor is a library for extracting structured data from Large Language Models (LLMs), available for Python, TypeScript, Go, Ruby, and other languages. It relies on schema-validation tools (Pydantic in Python, and equivalents such as Zod in TypeScript) so that LLM outputs conform to a strict, typed structure.
What problem it solves¶
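It addresses the unpredictability of LLM outputs. Instead of raw text that may be hard to parse, you get validated, type-safe objects; when the initial output fails validation, Instructor automatically retries by re-asking the model with the validation errors. Validation constrains the shape of the output and the rules you define, so it catches malformed or out-of-range values, but it does not by itself guarantee that the extracted content is factually correct.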
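To make "validated, type-safe objects" concrete, here is a minimal sketch that uses only Pydantic, with no LLM call. The two raw strings are hypothetical model outputs invented for illustration; the point is that a well-formed reply becomes a typed object while a malformed one raises an error that names the failing field.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


# Hypothetical raw model outputs: one well-formed, one with a non-numeric age.
good_output = '{"name": "Jason", "age": 25}'
bad_output = '{"name": "Jason", "age": "twenty-five"}'

# A valid reply is parsed into a typed object.
user = User.model_validate_json(good_output)
print(user.name, user.age)

# An invalid reply is rejected instead of silently flowing into application code.
try:
    User.model_validate_json(bad_output)
    bad_output_rejected = False
except ValidationError as exc:
    bad_output_rejected = True
    # The error records which field failed, which a re-ask prompt can quote back.
    print(exc.errors()[0]["loc"])
```

This is the validation step in isolation; Instructor's contribution is wiring it to the model call and the retry.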
Where it fits in the stack¶
Category: Frameworks / Data Extraction
Typical use cases¶
- Reliable Data Extraction: Converting messy natural language (e.g., customer support emails) into structured database records.
- Type-Safe LLM Integration: Ensuring LLM outputs can be directly used in application logic without complex parsing or regex.
- Quality Gates: Implementing validation rules (e.g., "age must be positive", "response must not contain profanity") that are enforced via LLM retries.
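The "age must be positive" quality gate from the list above could be expressed as a Pydantic validator, roughly as follows. The `Customer` model and the `check` helper are hypothetical names used only for this sketch; it runs without an LLM and shows just the rule that would trigger a retry.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Customer(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, value: int) -> int:
        # Runs after the value has been parsed as an int.
        if value <= 0:
            raise ValueError("age must be positive")
        return value


def check(payload: dict) -> str:
    """Return 'ok', or the first validation message for a candidate payload."""
    try:
        Customer(**payload)
        return "ok"
    except ValidationError as exc:
        return exc.errors()[0]["msg"]


print(check({"name": "Ana", "age": 30}))
print(check({"name": "Ana", "age": -3}))
```

When a model like this is passed as `response_model`, a failed validator is what prompts Instructor to re-ask the LLM.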
Strengths¶
- Schema-First: Define what you want using standard types (Pydantic, Zod, etc.).
- Automatic Retries: Built-in logic to re-prompt the LLM when validation fails.
- Multi-Provider: Works with OpenAI, Anthropic, Gemini, DeepSeek, Ollama, and many others.
- Lightweight: Focuses on structured output rather than being a full agent orchestration framework.
- Type Inference: Excellent IDE support and autocompletion for extracted data.
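The automatic-retry idea can be sketched as a small loop: validate the reply, and on failure append the error to the conversation and ask again. This is an illustration of the pattern, not Instructor's actual implementation. `extract_with_retries` and `fake_model` are hypothetical, and the stub stands in for a real LLM call so the example runs offline.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


def extract_with_retries(
    ask_model: Callable[[list[dict]], str],
    messages: list[dict],
    max_retries: int = 2,
) -> User:
    """Call ask_model until its output validates as User or retries run out."""
    history = list(messages)
    for attempt in range(max_retries + 1):
        raw = ask_model(history)
        try:
            return User.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries:
                raise
            # Feed the failed output and the error back so the model can correct it.
            history.append({"role": "assistant", "content": raw})
            history.append(
                {"role": "user", "content": f"Fix these validation errors:\n{exc}"}
            )
    raise RuntimeError("unreachable")


# Stub standing in for an LLM: wrong on the first call, right on the second.
replies = iter(['{"name": "Jason", "age": "unknown"}', '{"name": "Jason", "age": 25}'])
calls = []


def fake_model(history: list[dict]) -> str:
    calls.append(len(history))  # record how much context each call received
    return next(replies)


user = extract_with_retries(
    fake_model, [{"role": "user", "content": "Jason is 25 years old."}]
)
print(user)
```

In Instructor itself the retry budget is set with the `max_retries` argument on the `create` call, so a loop like this does not need to be written by hand.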
Limitations¶
- Narrow Focus: It is not a general-purpose agent framework (like CrewAI or AutoGen). It handles structured extraction only, so agent orchestration has to come from elsewhere.
- Schema Dependency: Requires defining formal schemas upfront, which might be overkill for simple text-to-text tasks.
Getting started¶
Installation (Python)¶
pip install instructor
Basic Extraction Example¶
import instructor
from pydantic import BaseModel
from openai import OpenAI


class User(BaseModel):
    name: str
    age: int


# Wrap the OpenAI client to add Instructor functionality
# (the OpenAI client reads OPENAI_API_KEY from the environment)
client = instructor.from_openai(OpenAI())

# response_model tells Instructor which schema to validate the reply against
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Jason is 25 years old."}],
)

print(user.name)  # "Jason"
print(user.age)  # 25
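It can help to look at the JSON Schema that Pydantic derives from the model, since a schema of this kind is what describes the expected structure to the provider. The sketch below only inspects the schema locally; how it is transmitted (tool call, JSON mode, and so on) depends on the provider and mode and is not shown here.

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# Pydantic v2 generates a JSON Schema dict from the type annotations.
schema = User.model_json_schema()
print(json.dumps(schema, indent=2))

required_fields = schema["required"]
age_type = schema["properties"]["age"]["type"]
```

Both fields are listed as required because neither has a default, and `int` maps to the JSON Schema type `integer`.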
Related tools / concepts¶
- PydanticAI
- Vercel AI SDK (uses Zod for similar patterns in TS)
- DSPy
- Structured Output Pattern
Sources / references¶
Contribution Metadata¶
- Last reviewed: 2026-05-08
- Confidence: high