Paperless-ngx¶
What it is¶
Paperless-ngx is a community-supported document management system (DMS) that transforms your physical documents into a searchable online archive.
What problem it solves¶
It eliminates paper clutter by providing a central, digital repository for all your documents. It handles OCR (Optical Character Recognition) automatically, making scanned PDFs and images full-text searchable, and uses machine learning to suggest tags, correspondents, and document types.
Where it fits in the stack¶
Category: Services / Document Management. It serves as the primary "Intake & Storage" layer for scanned and digital documents in a homelab or small office.
Typical use cases¶
- Digitizing household bills, receipts, and medical records.
- Storing and indexing technical manuals and whitepapers.
- Managing a paperless office with automated tagging and classification.
- Providing a searchable backend for AI agents to query household data.
Strengths¶
- Automated OCR: Extracts text from images and PDFs using OCRmyPDF (Tesseract), with no manual step per document.
- Machine Learning: Learns your tagging patterns over time.
- Searchable Archive: Fast full-text search with support for complex queries.
- Flexible Ingestion: Supports consumption folders, email polling, and a REST API.
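The consumption-folder route listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of Paperless-ngx itself: the helper name drop_into_consume is hypothetical, and the consume directory is assumed to be the host folder mounted at /usr/src/paperless/consume in the Docker Compose example further down.

```python
import shutil
from pathlib import Path


def drop_into_consume(source: Path, consume_dir: Path) -> Path:
    """Copy a scanned file into the consumption folder and return its new path.

    Hypothetical helper: Paperless-ngx only needs the file to appear in the
    folder it watches; everything else here is a convenience.
    """
    if not source.is_file():
        raise FileNotFoundError(source)
    consume_dir.mkdir(parents=True, exist_ok=True)
    target = consume_dir / source.name
    # Refuse to overwrite a file that is still waiting to be consumed.
    if target.exists():
        raise FileExistsError(target)
    shutil.copy2(source, target)
    return target
```

For example, drop_into_consume(Path("scan.pdf"), Path("./consume")) would stage scan.pdf for the next consumer run, assuming ./consume is the mounted folder.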
Limitations¶
- Resource Intensive: OCR is CPU-bound, so large backlogs can take a long time on low-power hardware.
- Complexity: Reliable email ingestion and custom workflows need deliberate configuration and testing.
When to use it¶
- When you want to go paperless and need a robust way to organize scanned documents.
- When you want to host your own document archive privately.
When not to use it¶
- For real-time collaborative editing of living documents (use Nextcloud or Google Docs instead).
- If you only have a handful of documents and don't need OCR or advanced searching.
Getting started¶
Installation (Docker Compose)¶
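services:
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    # Fixed name so the `docker exec` commands in "CLI examples" resolve.
    container_name: paperless-webserver
    depends_on:
      - db
      - redis
    ports:
      - "8000:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: db
  db:
    image: postgres:16
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      # Matches the Paperless-ngx default database name, user, and password.
      # Change these (and set PAPERLESS_DBPASS to match) for real deployments.
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  redis:
    image: redis:7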
Hello World (API)¶
You can ingest a document via the API using curl:
curl -X POST http://localhost:8000/api/documents/post_document/ \
  -H "Authorization: Token your_api_token" \
  -F "document=@/path/to/my_document.pdf" \
  -F "title=My First Document"
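The same upload can be sketched in Python using only the standard library. This is a rough equivalent of the curl call above under stated assumptions: build_multipart and upload_document are hypothetical helper names, the token and base URL are placeholders, and the response body is treated as opaque text (to my understanding, recent versions return the ID of the consumption task, but verify this against your instance).

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict, file_field: str, filename: str, payload: bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type). Hypothetical helper for illustration.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {mime}".encode(),
        b"",
        payload,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_document(base_url: str, token: str, path: Path, title: str) -> str:
    """POST a file to the post_document endpoint and return the raw response text."""
    # The file goes in the "document" field, mirroring the curl example.
    body, content_type = build_multipart(
        {"title": title}, "document", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/documents/post_document/",
        data=body,
        method="POST",
        headers={"Authorization": f"Token {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

A call such as upload_document("http://localhost:8000", "your_api_token", Path("/path/to/my_document.pdf"), "My First Document") mirrors the curl example. The upload is asynchronous: a successful response means the document was queued for consumption, not that OCR has finished.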
CLI examples¶
paperless-ngx manage document_exporter¶
Exports all documents and metadata to a directory:
docker exec -it paperless-webserver python3 manage.py document_exporter /usr/src/paperless/export
paperless-ngx manage document_renamer¶
Renames files based on their current metadata and your storage path template:
docker exec -it paperless-webserver python3 manage.py document_renamer
paperless-ngx manage document_index reindex¶
Rebuilds the search index, useful after bulk updates or if search seems inconsistent:
docker exec -it paperless-webserver python3 manage.py document_index reindex
API examples¶
Python (Listing Documents)¶
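import requests

url = "http://localhost:8000/api/documents/"
headers = {"Authorization": "Token your_api_token"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Fail loudly on 401/403/5xx instead of parsing an error body
documents = response.json()

# The endpoint is paginated; "results" holds the current page only.
for doc in documents["results"]:
    print(f"ID: {doc['id']}, Title: {doc['title']}, Created: {doc['created']}")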
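Because the listing reads from a "results" key, a single request returns only one page. The sketch below walks every page and optionally applies a full-text search. It assumes the response carries a "next" URL for the following page and that the list endpoint accepts a "query" parameter, which matches my understanding of the Paperless-ngx REST API but should be verified against your version. The names fetch_json and iter_documents are hypothetical, and it uses only the standard library so it runs without extra dependencies.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_json(url: str, token: str) -> dict:
    """GET a URL with token authentication and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_documents(
    base_url: str,
    token: str,
    query: Optional[str] = None,
    fetch: Callable[[str, str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every document, following the "next" link until it is empty.

    `fetch` is injectable so the paging logic can be exercised without a server.
    """
    url = f"{base_url.rstrip('/')}/api/documents/"
    if query:
        # Assumed parameter name for full-text search.
        url += "?" + urllib.parse.urlencode({"query": query})
    while url:
        page = fetch(url, token)
        yield from page["results"]
        url = page.get("next")  # None (or missing) on the last page
```

For example, iterating over iter_documents("http://localhost:8000", "your_api_token", query="invoice") would yield each matching document dictionary in turn.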
Python (Fetching Document Metadata)¶
import requests

doc_id = 123  # ID of an existing document
url = f"http://localhost:8000/api/documents/{doc_id}/"
headers = {"Authorization": "Token your_api_token"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
print(response.json()["content"])  # Prints the OCR'd text content
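The same response also carries the document's metadata alongside the OCR text. The following sketch condenses one document dictionary into a short summary. It assumes, as I understand the API, that "tags" is a list of numeric tag IDs rather than names. summarize_document is a hypothetical helper, and tag_names is a hypothetical ID-to-name mapping you would build yourself (for instance from the tags endpoint).

```python
def summarize_document(doc: dict, tag_names: dict) -> dict:
    """Reduce a document dictionary to a few human-readable fields.

    Unknown tag IDs are rendered as "#<id>" so nothing is silently dropped.
    """
    return {
        "id": doc["id"],
        "title": doc["title"],
        "created": doc.get("created"),
        "tags": [tag_names.get(tag_id, f"#{tag_id}") for tag_id in doc.get("tags", [])],
        # Length of the OCR'd text; 0 if the document has no content yet.
        "characters": len(doc.get("content") or ""),
    }
```

Calling summarize_document(response.json(), {1: "finance"}) on the response above would return a small dictionary suitable for logging or for handing to another tool.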
n8n (Document Ingestion Workflow)¶
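This JSON snippet sketches an n8n HTTP Request node that uploads a file to Paperless-ngx. Parameter names vary between HTTP Request node versions, so check the node's settings after import; Paperless-ngx expects the file in the multipart form field named "document", as in the curl example above.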
{
  "nodes": [
    {
      "parameters": {
        "method": "POST",
        "url": "http://paperless:8000/api/documents/post_document/",
        "authentication": "genericCredentialType",
        "genericAuthType": "httpHeaderAuth",
        "sendBinaryData": true,
        "binaryPropertyName": "data",
        "bodyParametersUi": {
          "parameter": [
            {
              "name": "title",
              "value": "={{$node[\"Read File\"].binary.data.fileName}}"
            }
          ]
        }
      },
      "name": "Upload to Paperless",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 3,
      "position": [450, 300]
    }
  ]
}
Licensing and cost¶
- Open Source: Yes (GPL-3.0)
- Cost: Free
- Self-hostable: Yes
Related tools / concepts¶
- n8n — Automate document processing and metadata updates.
- Vikunja — Link tasks to relevant documents for workflow management.
- Immich — Manage visual assets alongside document archives.
- Tika — Parses Office documents and similar formats during consumption when the optional Tika integration is enabled.
- Authentik — Secure DMS access with SSO and MFA.
- Nextcloud — Synchronize document folders with mobile devices.
- Linkwarden — Archive web pages as PDFs for indexing in Paperless.
Sources / References¶
Backlog¶
- [x] Perform quarterly technical freshness audit.
Contribution Metadata¶
- Confidence: high
- Last reviewed: 2026-05-26