Instructor¶
What it is¶
Instructor is a multi-language library (Python, TypeScript, Go, Ruby, etc.) designed specifically for extracting structured data from Large Language Models (LLMs). It uses Pydantic (in Python) and similar schema-validation tools to ensure LLM outputs follow a strict, typed structure.
What problem it solves¶
It addresses the unpredictability of LLM output formats. Instead of receiving raw text that might be hard to parse, you get validated, type-safe objects: Instructor checks each response against your schema and, if validation fails, automatically re-asks the model with the validation error. Validation catches malformed or out-of-range values; it does not by itself verify that the extracted facts are true.
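The validate-and-re-ask cycle can be sketched without any LLM dependency. This is a minimal illustration of the pattern, not Instructor's actual implementation: `extract_with_retries` and `fake_model` are hypothetical names, and a dataclass with a `__post_init__` check stands in for a Pydantic schema.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int

    def __post_init__(self) -> None:
        # Stand-in for schema validation: reject wrong types and bad ranges.
        if not isinstance(self.name, str) or not isinstance(self.age, int):
            raise ValueError("name must be a string and age an integer")
        if self.age <= 0:
            raise ValueError("age must be positive")


def extract_with_retries(ask_model, prompt: str, max_retries: int = 2) -> User:
    """Call the model, validate its reply, and re-ask with the error on failure."""
    messages = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_retries + 1):
        reply = ask_model(messages)
        try:
            return User(**json.loads(reply))
        except (ValueError, TypeError) as exc:
            # Feed the validation error back so the model can correct itself.
            last_error = exc
            messages.append({"role": "assistant", "content": reply})
            messages.append(
                {"role": "user", "content": f"Fix this error and answer again: {exc}"}
            )
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts") from last_error


def fake_model(messages):
    # Hypothetical stand-in for an LLM call: wrong at first, correct after a re-ask.
    if len(messages) == 1:
        return '{"name": "Jason", "age": "twenty-five"}'
    return '{"name": "Jason", "age": 25}'


user = extract_with_retries(fake_model, "Jason is 25 years old.")
```

Here the first reply fails the type check, the error text is appended to the conversation, and the second reply validates. The caller only ever sees a `User` object or an exception.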
Where it fits in the stack¶
Category: Frameworks / Data Extraction
Typical use cases¶
- Reliable Data Extraction: Converting messy natural language (e.g., customer support emails) into structured database records.
- Type-Safe LLM Integration: Ensuring LLM outputs can be directly used in application logic without complex parsing or regex.
- Quality Gates: Implementing validation rules (e.g., "age must be positive", "response must not contain profanity") that are enforced via LLM retries.
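A quality gate such as "age must be positive" is usually expressed as an ordinary Pydantic validator. The sketch below assumes Pydantic v2; the `Customer` model and the `validate_reply` helper are illustrative names, and no LLM is called. The error text it returns is the kind of message a retry loop would hand back to the model.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Customer(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, value: int) -> int:
        # The rule lives on the schema, so every parsed reply passes through it.
        if value <= 0:
            raise ValueError("age must be positive")
        return value


def validate_reply(raw_json: str):
    """Return (customer, None) on success or (None, error_text) on failure."""
    try:
        return Customer.model_validate_json(raw_json), None
    except ValidationError as exc:
        return None, str(exc)


ok, ok_error = validate_reply('{"name": "Jason", "age": 25}')
bad, bad_error = validate_reply('{"name": "Jason", "age": -3}')
```

The second call yields no object and an error string that names the failing field and rule.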
Strengths¶
- Schema-First: Define what you want using standard types (Pydantic, Zod, etc.).
- Automatic Retries: Built-in logic to re-prompt the LLM when validation fails.
- Multi-Provider: Works with OpenAI, Anthropic, Gemini, DeepSeek, Ollama, and many others.
- Lightweight: Focuses on structured output rather than being a full agent orchestration framework.
- Type Inference: Excellent IDE support and autocompletion for extracted data.
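To make "schema-first" concrete: with Pydantic v2, one class can serve both as the JSON Schema that describes the expected output and as the typed object your code receives. The sketch below shows only the Pydantic side and does not call Instructor; the field descriptions are illustrative.

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")


# JSON Schema derived from the class; a structured-output library can pass
# this to the provider to describe the expected response shape.
schema = User.model_json_schema()

# The same class parses and type-checks the returned data.
user = User.model_validate({"name": "Jason", "age": 25})
```

Because `user.age` is declared as `int`, editors and type checkers can autocomplete and check it like any other attribute.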
Limitations¶
- Narrow Focus: It is not a general-purpose agent framework (like CrewAI or AutoGen). It does structured extraction exceptionally well, but orchestration, memory, and multi-agent coordination must come from other tools.
- Schema Dependency: Requires defining formal schemas upfront, which might be overkill for simple text-to-text tasks.
When to use it¶
- When you need reliable, type-safe data extraction from LLMs for use in programmatic workflows.
- When you want a lightweight solution that integrates easily with your existing OpenAI/Anthropic/Gemini client code.
- When you need to enforce complex validation rules and automatic retries on LLM outputs using schema-based validation.
When not to use it¶
- For open-ended creative writing or chat where a strict schema is not necessary or possible.
- If you need a comprehensive framework for managing complex multi-agent conversations and memory (consider LangGraph or CrewAI).
Getting started¶
Installation (Python)¶
pip install instructor
Basic Extraction Example¶
import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# Patch the OpenAI client to add Instructor functionality.
# from_provider() expects a "provider/model" string, so an existing
# client object is wrapped with from_openai() instead.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,  # Instructor validates the reply against this schema
    messages=[{"role": "user", "content": "Jason is 25 years old."}],
)

print(user.name)  # "Jason"
print(user.age)  # 25
Related tools / concepts¶
- PydanticAI
- Vercel AI SDK (uses Zod for similar patterns in TS)
- DSPy
- Structured Output Pattern
- LiteLLM
- Firebase Genkit
- Extraction and Classification
- Date Extraction
Instructor v2 and Latest Features¶
Released in May 2026, Instructor v2 introduced a major internal rewrite focused on a provider-owned architecture. While maintaining backward compatibility, it significantly improved extensibility and type-checking performance.
Key Updates in v2:¶
- Unified Provider Interface: Streamlined from_provider() function for consistent cross-provider client initialization.
- Improved Streaming: Enhanced support for partial responses and real-time list processing.
- Semantic Validation: Built-in support for validating LLM outputs against subjective criteria using LLM-based evaluators.
- llms.txt Support: Adoption of the llms.txt standard to make documentation instantly readable by AI agents.
Example: Semantic Validation¶
from typing import Annotated

from instructor import SemanticValidator
from pydantic import BaseModel, BeforeValidator

# `client` is assumed to be the Instructor-patched client from the basic example.
class Response(BaseModel):
    answer: Annotated[
        str,
        BeforeValidator(
            SemanticValidator(
                openai_client=client,
                statement="The answer must be polite and helpful",
            )
        ),
    ]

# If the LLM generates a rude response, validation fails and Instructor retries automatically
Sources / references¶
Contribution Metadata¶
- Last reviewed: 2026-06-06
- Confidence: high