Paperless-ngx

What it is

Paperless-ngx is a community-supported document management system (DMS) that transforms your physical documents into a searchable online archive.

What problem it solves

It eliminates paper clutter by providing a central, digital repository for all your documents. It handles OCR (Optical Character Recognition) automatically, making scanned PDFs and images full-text searchable, and uses machine learning to suggest tags, correspondents, and document types.

Where it fits in the stack

Category: Services / Document Management. It serves as the primary "Intake & Storage" layer for scanned and digital documents in a homelab or small office.

Typical use cases

  • Digitizing household bills, receipts, and medical records.
  • Storing and indexing technical manuals and whitepapers.
  • Managing a paperless office with automated tagging and classification.
  • Providing a searchable backend for AI agents to query household data.

Strengths

  • Automated OCR: High-quality text extraction from images and PDFs.
  • Machine Learning: Learns your tagging patterns over time.
  • Searchable Archive: Fast full-text search with support for complex queries.
  • Flexible Ingestion: Supports consumption folders, email polling, and a REST API.
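
As a rough illustration of the consumption-folder route, the sketch below copies a scanned file into the directory that is mounted as the consume volume. The helper name `drop_into_consume` and the `-1`, `-2` suffix scheme for name collisions are hypothetical choices for this example, not part of Paperless-ngx.

```python
import shutil
import tempfile
from pathlib import Path


def drop_into_consume(source: Path, consume_dir: Path) -> Path:
    """Copy `source` into `consume_dir`, avoiding name collisions."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    target = consume_dir / source.name
    counter = 1
    while target.exists():
        # Hypothetical naming scheme: scan.pdf -> scan-1.pdf, scan-2.pdf, ...
        target = consume_dir / f"{source.stem}-{counter}{source.suffix}"
        counter += 1
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    # Temporary directories stand in for a scanner output folder and the
    # host side of the consume volume, so the demo touches no real data.
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "scan.pdf"
        scan.write_bytes(b"%PDF-1.4 placeholder")
        print(drop_into_consume(scan, Path(tmp) / "consume"))
```

In a real setup, `consume_dir` would point at the host directory mapped to the container's consume path.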

Limitations

  • Hardware Intensive: OCR is CPU-heavy, so working through a large backlog can be slow on modest hardware.
  • Complexity: Setting up reliable email ingestion or custom workflows requires some configuration.

When to use it

  • When you want to go paperless and need a robust way to organize scanned documents.
  • When you want to host your own document archive privately.

When not to use it

  • For managing highly collaborative real-time documents (use Nextcloud or Google Docs).
  • If you only have a handful of documents and don't need OCR or advanced searching.

Getting started

Installation (Docker Compose)

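services:
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    depends_on:
      - db
      - redis
    ports:
      - "8000:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: db
  db:
    image: postgres:16
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      # The postgres image will not start without a password. These values
      # match the Paperless-ngx defaults; for anything beyond testing, change
      # the password and set PAPERLESS_DBPASS on the webserver to match.
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  redis:
    image: redis:7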

Hello World (API)

You can ingest a document via the API using curl:

curl -X POST http://localhost:8000/api/documents/post_document/ \
  -H "Authorization: Token your_api_token" \
  -F "document=@/path/to/my_document.pdf" \
  -F "title=My First Document"
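
The same upload can be sketched in Python using only the standard library. The helper names `build_multipart` and `build_upload_request` are hypothetical; the endpoint, the `document` and `title` fields, and the token header are taken from the curl command above. The example only builds the request, so it runs without a server.

```python
import mimetypes
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part named "document"."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        payload,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, token: str, filename: str,
                         payload: bytes, title: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST to the post_document endpoint."""
    body, content_type = build_multipart({"title": title}, filename, payload)
    return urllib.request.Request(
        f"{base_url}/api/documents/post_document/",
        data=body,
        headers={"Authorization": f"Token {token}", "Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    request = build_upload_request(
        "http://localhost:8000", "your_api_token",
        "my_document.pdf", b"%PDF-1.4 placeholder", "My First Document",
    )
    print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running instance.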

CLI examples

paperless-ngx manage document_exporter

Exports all documents and metadata to a directory (replace paperless-webserver with the name of your Paperless-ngx container):

docker exec -it paperless-webserver python3 manage.py document_exporter /usr/src/paperless/export
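
After an export, a quick sanity check can confirm that the expected number of records was written. The sketch below assumes the export directory contains a `manifest.json` holding a JSON list of records that each carry a `model` key; verify that layout against your own export before relying on it. The function name `summarize_manifest` is hypothetical, and the demo runs against synthetic data.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_manifest(export_dir: Path) -> Counter:
    """Count exported records per model name found in manifest.json."""
    manifest = json.loads((export_dir / "manifest.json").read_text(encoding="utf-8"))
    # Assumed layout: a list of records, each carrying a "model" key.
    return Counter(record.get("model", "unknown") for record in manifest)


if __name__ == "__main__":
    # Self-contained demo against a synthetic manifest, not real export data.
    with tempfile.TemporaryDirectory() as tmp:
        sample = [
            {"model": "documents.document", "pk": 1},
            {"model": "documents.document", "pk": 2},
            {"model": "documents.tag", "pk": 1},
        ]
        (Path(tmp) / "manifest.json").write_text(json.dumps(sample), encoding="utf-8")
        for model, count in summarize_manifest(Path(tmp)).most_common():
            print(f"{model}: {count}")
```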

paperless-ngx manage document_renamer

Renames files based on their current metadata and your storage path template:

docker exec -it paperless-webserver python3 manage.py document_renamer

paperless-ngx manage document_index reindex

Rebuilds the search index, useful after bulk updates or if search seems inconsistent:

docker exec -it paperless-webserver python3 manage.py document_index reindex

API examples

Python (Listing Documents)

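import requests

url = "http://localhost:8000/api/documents/"
headers = {"Authorization": "Token your_api_token"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Surface auth or server errors instead of a KeyError below
documents = response.json()

# The list endpoint is paginated, so this prints only the first page.
for doc in documents['results']:
    print(f"ID: {doc['id']}, Title: {doc['title']}, Created: {doc['created']}")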
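
The listing endpoint returns results one page at a time, with a `next` field holding the URL of the following page (or null on the last one). A minimal sketch of walking every page is shown below; `fetch_json` and `iter_documents` are hypothetical helper names, and the fetch function is passed in so the loop can be exercised without a running server.

```python
import json
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str, token: str) -> dict:
    """GET a URL with token authentication and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_documents(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every document, following the `next` link until it is empty."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")


if __name__ == "__main__":
    # Synthetic pages stand in for real API responses in this demo.
    fake_pages = {
        "page-1": {"results": [{"id": 1, "title": "Invoice"}], "next": "page-2"},
        "page-2": {"results": [{"id": 2, "title": "Receipt"}], "next": None},
    }
    for doc in iter_documents("page-1", fake_pages.__getitem__):
        print(doc["id"], doc["title"])
```

Against a live instance, the call would look like `iter_documents("http://localhost:8000/api/documents/", lambda u: fetch_json(u, "your_api_token"))`.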

Python (Fetching Document Metadata)

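import requests

doc_id = 123  # Replace with the ID of an existing document
url = f"http://localhost:8000/api/documents/{doc_id}/"
headers = {"Authorization": "Token your_api_token"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # A missing document returns 404 rather than JSON with 'content'
print(response.json()['content']) # Prints the OCR'd text content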
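
For full-text search, the documents endpoint accepts a `query` parameter. The sketch below only builds the search URL; the helper name `search_url` is hypothetical, and the `page_size` parameter is included on the assumption that your instance accepts it for pagination.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, page_size: int = 25) -> str:
    """Build a full-text search URL for the documents endpoint."""
    # urlencode handles spaces and special characters in the search terms.
    params = urlencode({"query": query, "page_size": page_size})
    return f"{base_url}/api/documents/?{params}"


if __name__ == "__main__":
    print(search_url("http://localhost:8000", "electricity bill"))
```

The resulting URL can be requested with the same `Authorization` header as the examples above, and matches are returned under `results`.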

Licensing and cost

  • Open Source: Yes (GPL-3.0)
  • Cost: Free
  • Self-hostable: Yes

Related tools

  • n8n (For automation workflows)
  • Vikunja (For task management linked to docs)
  • Tika (Optional content extraction for Office documents)
  • Nextcloud (Alternative storage)
  • Docspell (Alternative DMS)

Contribution Metadata

  • Last reviewed: 2026-06-05
  • Confidence: high