Instructor

What it is

Instructor is a multi-language library (Python, TypeScript, Go, Ruby, etc.) designed specifically for extracting structured data from Large Language Models (LLMs). It uses Pydantic (in Python) and similar schema-validation tools to ensure LLM outputs follow a strict, typed structure.

What problem it solves

It addresses the unpredictability of free-form LLM output. Instead of returning raw text that is hard to parse, Instructor returns validated, type-safe objects, and it automatically re-asks the model when the initial output fails validation. Validation checks that the output conforms to the schema; it does not by itself guarantee that the extracted values are factually correct.
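The validate-then-re-ask loop can be illustrated with a standard-library sketch. This is not Instructor's implementation: `parse_user`, `extract_with_retries`, and `fake_model` are hypothetical names introduced here, and the "model" is a canned stand-in so the example runs offline.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Parse and validate raw model text; raise ValueError on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("field 'name' must be a string")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=name, age=age)


def extract_with_retries(ask_model, prompt: str, max_retries: int = 2) -> User:
    """Call ask_model, validate the reply, and re-ask with the error on failure."""
    messages = [prompt]
    last_error = None
    for _ in range(max_retries + 1):
        raw = ask_model(messages)
        try:
            return parse_user(raw)
        except ValueError as exc:
            last_error = exc
            # Feed the validation error back so the next attempt can correct it.
            messages.append(f"Previous reply was invalid ({exc}). Reply with JSON only.")
    raise ValueError(f"no valid output after {max_retries + 1} attempts: {last_error}")


# Hypothetical stand-in for an LLM: the first reply is malformed, the second is valid.
_replies = iter(["Jason is 25", '{"name": "Jason", "age": 25}'])


def fake_model(messages):
    return next(_replies)


user = extract_with_retries(fake_model, "Extract the user: Jason is 25 years old.")
print(user)
```

The point of the sketch is the shape of the loop: the validation error is appended to the conversation, so each retry carries feedback rather than repeating the same prompt.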

Where it fits in the stack

Category: Frameworks / Data Extraction

Typical use cases

  • Reliable Data Extraction: Converting messy natural language (e.g., customer support emails) into structured database records.
  • Type-Safe LLM Integration: Ensuring LLM outputs can be directly used in application logic without complex parsing or regex.
  • Quality Gates: Implementing validation rules (e.g., "age must be positive", "response must not contain profanity") that are enforced via LLM retries.
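A rule such as "age must be positive" could be expressed on the schema itself. The following is a minimal sketch assuming Pydantic v2 (`field_validator`); the validator name `age_must_be_positive` is illustrative. It shows only the validation side and runs without calling any model; when such a model is used as a `response_model`, a failure of this kind is what would trigger a retry.

```python
from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, value: int) -> int:
        # Reject zero and negative ages; the message is reported in the error.
        if value <= 0:
            raise ValueError("age must be positive")
        return value


# A valid payload passes through unchanged.
valid = User.model_validate({"name": "Jason", "age": 25})

# An invalid payload raises ValidationError carrying the rule's message.
try:
    User.model_validate({"name": "Jason", "age": -3})
    error_message = None
except ValidationError as exc:
    error_message = exc.errors()[0]["msg"]

print(valid)
print(error_message)
```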

Strengths

  • Schema-First: Define what you want using standard types (Pydantic, Zod, etc.).
  • Automatic Retries: Built-in logic to re-prompt the LLM when validation fails.
  • Multi-Provider: Works with OpenAI, Anthropic, Gemini, DeepSeek, Ollama, and many others.
  • Lightweight: Focuses on structured output rather than being a full agent orchestration framework.
  • Type Inference: Excellent IDE support and autocompletion for extracted data.

Limitations

  • Narrow Focus: It is not a general-purpose agent framework (like CrewAI or AutoGen). It does structured extraction well but leaves agent orchestration to other tools.
  • Schema Dependency: Requires defining formal schemas upfront, which might be overkill for simple text-to-text tasks.

Getting started

Installation (Python)

pip install instructor

Basic Extraction Example

import instructor
from pydantic import BaseModel
from openai import OpenAI

class User(BaseModel):
    name: str
    age: int

# Wrap the OpenAI client so that create() accepts response_model
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Jason is 25 years old."}],
)

print(user.name) # "Jason"
print(user.age)  # 25

Contribution Metadata

  • Last reviewed: 2026-05-08
  • Confidence: high