Reference Implementation: Paperless Tag Taxonomy
What it is
A hierarchical tagging system designed for Paperless-ngx that organizes personal and household documents into actionable categories. It balances organizational needs (folders/categories) with workflow states (status/actions).
What problem it solves
Flat document storage quickly becomes unmanageable as volume grows. Without a standardized taxonomy, users struggle to find files, and automated agents cannot reliably trigger specific workflows (like paying a bill or extracting a warranty). This taxonomy provides the "semantic hooks" necessary for both humans and machines to navigate the archive.
Where it fits in the stack
The taxonomy sits at the Organization/Metadata layer of the document management system. It acts as the primary index used by Search, Automated Workflows (n8n, Python scripts), and AI Agents to filter and process documents.
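As a rough illustration of how a script might use the taxonomy as an index, the sketch below builds a document-list URL that filters by tag. It assumes the Paperless-ngx REST API exposes a `tags__id__all` filter on `/api/documents/` (check this against your instance's API documentation); the base URL and the numeric tag IDs are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_document_query(base_url: str, tag_ids: list[int]) -> str:
    """Return a URL listing documents that carry ALL of the given tag IDs.

    Assumption: the server accepts a comma-separated `tags__id__all`
    query parameter on /api/documents/.
    """
    ids = ",".join(str(tag_id) for tag_id in sorted(tag_ids))
    # safe="," keeps the commas readable instead of percent-encoding them.
    query = urlencode({"tags__id__all": ids}, safe=",")
    return f"{base_url.rstrip('/')}/api/documents/?{query}"


# Hypothetical IDs: 3 = needs-action, 7 = Finance/Bill
url = build_document_query("http://paperless.local:8000/", [7, 3])
print(url)
```

Keeping URL construction in a pure function makes it easy to test without a running server; the actual HTTP call and authentication are left to whichever client the automation already uses.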
Typical use cases
- Workflow Automation: Moving a document from `inbox` to `needs-action` to trigger a reminder.
- Tax Preparation: Quickly retrieving all documents tagged with `Keep-7-years` or `Finance/Bill`.
- Legacy Preservation: Categorizing scanned physical photos and historical records for long-term archiving.
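The workflow-automation case above can be sketched as a small state transition over a document's tag set. This is a minimal sketch, assuming a document holds at most one status tag at a time (the taxonomy does not state this explicitly); the function name `set_status` is hypothetical.

```python
STATUS_TAGS = {"inbox", "needs-action", "processed", "automation-failed"}


def set_status(tags: set[str], new_status: str) -> set[str]:
    """Return a new tag set with any existing status tag replaced.

    Category and retention tags are left untouched.
    """
    if new_status not in STATUS_TAGS:
        raise ValueError(f"unknown status tag: {new_status!r}")
    return (tags - STATUS_TAGS) | {new_status}


before = {"inbox", "Finance/Bill", "Keep-7-years"}
after = set_status(before, "needs-action")
print(sorted(after))
```

Returning a new set rather than mutating the input keeps the original tags available, which is convenient if the update to the DMS fails and has to be retried.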
Strengths
- Action-Oriented: Clearly separates "State" (what needs to be done) from "Category" (what the document is).
- Extensible: The `Category/Subcategory` pattern lets new categories be added without breaking existing logic.
- Machine-Readable: Simple, consistent naming conventions are easy for LLMs and scripts to parse.
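To show why the `Category/Subcategory` convention is easy for scripts to handle, here is a minimal parsing sketch. It assumes a single `/` separates category from subcategory and that tags without a `/` (such as status tags) have no subcategory; `ParsedTag` and `parse_tag` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    category: str
    subcategory: Optional[str]


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag on the first '/' into category and subcategory."""
    category, separator, subcategory = tag.partition("/")
    return ParsedTag(category, subcategory if separator else None)


def tags_in_category(tags: list[str], category: str) -> list[str]:
    """Return the tags that belong to the given top-level category."""
    return [tag for tag in tags if parse_tag(tag).category == category]


print(parse_tag("Admin/Warranty"))
print(tags_in_category(["Admin/Warranty", "Finance/Bill", "Admin/Manual"], "Admin"))
```

Because the parser only relies on the separator, a new subcategory added later under an existing category is picked up without any code change.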
Limitations
- Maintenance: Requires discipline to ensure every document is tagged correctly (unless fully automated).
- Tool Support: While ideal for Paperless-ngx, other DMS tools may have different tagging limitations.
When to use it
- When setting up a new Paperless-ngx instance.
- When designing automated "Scan-to-Action" pipelines.
- For managing multi-generational family archives.
When not to use it
- For extremely small document sets (under 100 files) where a simple search is sufficient.
- If using a DMS that relies entirely on full-text search without robust tagging support.
Core Status Tags
- `inbox`: Document just arrived, needs manual or auto review.
- `needs-action`: Requires a human to perform a task (e.g. pay bill).
- `processed`: Automation has finished its work (e.g. calendar event created).
- `automation-failed`: LLM or script hit an error.
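One way a script could guard against inconsistent tagging is to check that each document carries exactly one of the four status tags. The "exactly one" rule is an assumption made for this sketch and is not stated by the taxonomy; the `Status` enum and `current_status` function are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class Status(str, Enum):
    INBOX = "inbox"
    NEEDS_ACTION = "needs-action"
    PROCESSED = "processed"
    AUTOMATION_FAILED = "automation-failed"


def current_status(tags: Iterable[str]) -> Status:
    """Return the document's status, or raise if it has none or several."""
    tag_set = set(tags)
    found = [status for status in Status if status.value in tag_set]
    if len(found) != 1:
        names = [status.value for status in found]
        raise ValueError(f"expected exactly one status tag, found {names}")
    return found[0]


print(current_status(["processed", "Finance/Bill"]))
```

A document that fails this check is a natural candidate for the `automation-failed` tag or for manual review.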
Category Tags
- `Admin/Warranty` (receipts/consumer protection)
- `Admin/Manual` (product manuals/troubleshooting)
- `Finance/Bill`
- `School/Correspondence`
- `Health/Record`
- `Admin/Government`
History & Archive Tags
- `History/Family-Record`: Letters, journals, family trees.
- `History/Photo-Archive`: Scanned physical photos.
- `History/Genealogy`: Birth/death certificates (historic), census records.
Retention Tags
- `Keep-7-years`: Tax-related.
- `Keep-forever`: Birth certificates, deeds.
- `Ephemeral`: Coupons, flyers.
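The retention tags lend themselves to a simple date calculation. The sketch below computes the earliest date on which a document could be reviewed for deletion. Several details are assumptions and not part of the taxonomy: the 30-day window for `Ephemeral`, the precedence order when more than one retention tag is present, and the function names.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

EPHEMERAL_DAYS = 30  # hypothetical; the taxonomy does not define a period


def add_years(start: date, years: int) -> date:
    """Add calendar years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def review_date(tags: Iterable[str], created: date) -> Optional[date]:
    """Return the earliest deletion-review date, or None for Keep-forever."""
    tag_set = set(tags)
    # Assumed precedence: the longest retention wins.
    if "Keep-forever" in tag_set:
        return None
    if "Keep-7-years" in tag_set:
        return add_years(created, 7)
    if "Ephemeral" in tag_set:
        return created + timedelta(days=EPHEMERAL_DAYS)
    raise ValueError("document has no retention tag")


print(review_date(["Keep-7-years", "Finance/Bill"], date(2020, 3, 15)))
```

Raising an error for untagged documents, instead of silently choosing a default, surfaces gaps in tagging before anything is deleted.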
Related tools / concepts
- Paperless-ngx: The implementation platform for this taxonomy.
- Scan-to-Task Playbook: A workflow that uses these tags to trigger tasks.
- Warranty Extraction: Uses the `Admin/Warranty` tag as a trigger.
- Manual Metadata Schema: Uses the `Admin/Manual` tag.
- Webhook Ingestion: How documents and tags enter the system.
- n8n: The engine that processes tags and triggers actions.
- Home Admin Agent Architecture: The "brain" that interacts with the tagged archive.
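The tag-as-trigger relationships listed above could be wired up with a small dispatch table. This is only an illustrative sketch: the handler functions are hypothetical stand-ins that return a label, whereas real handlers would call the corresponding workflow (for example an n8n webhook).

```python
from typing import Callable, Iterable


def extract_warranty(doc_id: int) -> str:
    # Placeholder for the Warranty Extraction workflow.
    return f"warranty-extraction:{doc_id}"


def apply_manual_schema(doc_id: int) -> str:
    # Placeholder for the Manual Metadata Schema workflow.
    return f"manual-metadata:{doc_id}"


HANDLERS: dict[str, Callable[[int], str]] = {
    "Admin/Warranty": extract_warranty,
    "Admin/Manual": apply_manual_schema,
}


def dispatch(doc_id: int, tags: Iterable[str]) -> list[str]:
    """Run every handler whose trigger tag is on the document."""
    # Sorting makes the execution order deterministic.
    return [HANDLERS[tag](doc_id) for tag in sorted(set(tags)) if tag in HANDLERS]


print(dispatch(42, ["inbox", "Admin/Warranty"]))
```

Adding a new trigger then amounts to adding one entry to `HANDLERS`, which mirrors how the taxonomy itself grows by adding tags.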
Sources / References
Contribution Metadata
- Confidence: high
- Last reviewed: 2026-05-11