Reference Implementation: Paperless Tag Taxonomy

What it is

A hierarchical tagging system designed for Paperless-ngx that organizes personal and household documents into actionable categories. It balances organizational needs (folders/categories) with workflow states (status/actions).

What problem it solves

Flat document storage quickly becomes unmanageable as volume grows. Without a standardized taxonomy, users struggle to find files, and automated agents cannot reliably trigger specific workflows (like paying a bill or extracting a warranty). This taxonomy provides the "semantic hooks" necessary for both humans and machines to navigate the archive.

Where it fits in the stack

The taxonomy sits at the Organization/Metadata layer of the document management system. It acts as the primary index used by Search, Automated Workflows (n8n, Python scripts), and AI Agents to filter and process documents.

Typical use cases

  • Workflow Automation: Moving a document from inbox to needs-action to trigger a reminder.
  • Tax Preparation: Quickly retrieving all documents tagged with Keep-7-years or Finance/Bill.
  • Legacy Preservation: Categorizing scanned physical photos and historical records for long-term archiving.
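
The tax-preparation case above amounts to selecting every document that carries at least one tag from a given set. A minimal sketch in plain Python follows. The `Document` class, the `select_by_any_tag` helper, and the sample archive are hypothetical stand-ins for illustration, not part of Paperless-ngx.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Document:
    """Hypothetical in-memory stand-in for a stored document."""
    title: str
    tags: Set[str] = field(default_factory=set)


def select_by_any_tag(documents: Iterable[Document], wanted: Iterable[str]) -> List[Document]:
    """Return the documents that carry at least one of the wanted tags."""
    wanted_set = set(wanted)
    return [doc for doc in documents if doc.tags & wanted_set]


# Sample data, invented for this example.
archive = [
    Document("Electricity bill", {"Finance/Bill", "processed"}),
    Document("Tax return", {"Keep-7-years"}),
    Document("Pizza coupon", {"Ephemeral"}),
]

tax_docs = select_by_any_tag(archive, {"Keep-7-years", "Finance/Bill"})
print([doc.title for doc in tax_docs])
```

In a real deployment the same selection would more likely be done through the DMS's own search or filter features; the sketch only shows the logic the tags make possible.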

Strengths

  • Action-Oriented: Clearly separates "State" (what needs to be done) from "Category" (what the document is).
  • Extensible: The Category/Subcategory pattern lets new categories and subcategories be added without breaking existing logic.
  • Machine-Readable: Simple, consistent naming conventions are easy for LLMs and scripts to parse.

Limitations

  • Maintenance: Requires discipline to ensure every document is tagged correctly (unless fully automated).
  • Tool Support: The taxonomy is designed for Paperless-ngx; other DMS tools may impose different tagging limitations.

When to use it

  • When setting up a new Paperless-ngx instance.
  • When designing automated "Scan-to-Action" pipelines.
  • For managing multi-generational family archives.

When not to use it

  • For extremely small document sets (under 100 files) where a simple search is sufficient.
  • If using a DMS that relies entirely on full-text search without robust tagging support.

Core Status Tags

  • inbox: Document has just arrived and needs manual or automated review.
  • needs-action: Requires a human to perform a task (e.g., pay a bill).
  • processed: Automation has finished its work (e.g., a calendar event was created).
  • automation-failed: An LLM or script hit an error while processing the document.
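
One way a script might move a document between these states is to swap the status tag while leaving all other tags untouched. The sketch below assumes a document carries at most one status tag at a time, which the taxonomy does not state explicitly; `set_status` is a hypothetical helper name.

```python
from typing import Iterable, Set

# The four core status tags listed above.
STATUS_TAGS = frozenset({"inbox", "needs-action", "processed", "automation-failed"})


def set_status(tags: Iterable[str], new_status: str) -> Set[str]:
    """Return a new tag set with any existing status tag replaced by new_status.

    Assumption: a document should hold exactly one status tag at a time.
    Category and retention tags are preserved as-is.
    """
    if new_status not in STATUS_TAGS:
        raise ValueError(f"unknown status tag: {new_status!r}")
    return (set(tags) - STATUS_TAGS) | {new_status}


# Example: a bill leaves the inbox and waits for a human to pay it.
updated = set_status({"inbox", "Finance/Bill"}, "needs-action")
print(sorted(updated))
```

Rejecting unknown status names early keeps typos such as `need-action` from silently creating a fifth state.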

Category Tags

  • Admin/Warranty (receipts/consumer protection)
  • Admin/Manual (product manuals/troubleshooting)
  • Finance/Bill
  • School/Correspondence
  • Health/Record
  • Admin/Government
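
Because every category tag follows the Category/Subcategory pattern, a script can split it on the slash and group tags by their top-level category. This is a rough sketch; the function names are hypothetical, and tags without a slash (such as status tags) are simply skipped.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def split_category_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Split 'Category/Subcategory' into its two parts.

    Returns None for tags that do not follow the pattern, e.g. 'inbox'.
    """
    category, separator, subcategory = tag.partition("/")
    if not separator or not category or not subcategory:
        return None
    return category, subcategory


def group_by_category(tags: Iterable[str]) -> Dict[str, List[str]]:
    """Group subcategories under their top-level category, keeping input order."""
    groups: Dict[str, List[str]] = {}
    for tag in tags:
        parsed = split_category_tag(tag)
        if parsed is None:
            continue  # not a category tag
        category, subcategory = parsed
        groups.setdefault(category, []).append(subcategory)
    return groups


print(group_by_category(["Admin/Warranty", "Admin/Manual", "Finance/Bill", "inbox"]))
```

A new tag such as a further `Admin/...` subcategory would land in the existing `Admin` group without any change to this code, which is the extensibility the pattern is meant to provide.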

History & Archive Tags

  • History/Family-Record: Letters, journals, family trees.
  • History/Photo-Archive: Scanned physical photos.
  • History/Genealogy: Historic birth and death certificates, census records.

Retention Tags

  • Keep-7-years: Tax-related documents.
  • Keep-forever: Birth certificates, deeds.
  • Ephemeral: Coupons, flyers.
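
Retention tags can also drive cleanup scripts. The sketch below reads the number of years out of a tag shaped like `Keep-7-years` and computes the earliest date a document could be reviewed for disposal. Generalizing to `Keep-N-years` is an assumption, since only `Keep-7-years` appears in the list above. `Ephemeral` is rejected because the taxonomy defines no duration for it. Both function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Assumed pattern: 'Keep-<N>-years'. Only Keep-7-years is defined in the taxonomy.
_KEEP_YEARS = re.compile(r"Keep-(\d+)-years")


def retention_years(tag: str) -> Optional[int]:
    """Return the retention period in years, or None for Keep-forever."""
    if tag == "Keep-forever":
        return None
    match = _KEEP_YEARS.fullmatch(tag)
    if match is None:
        raise ValueError(f"no retention period defined for tag: {tag!r}")
    return int(match.group(1))


def earliest_disposal_date(tag: str, created: date) -> Optional[date]:
    """Return created + N years, or None if the document is kept forever."""
    years = retention_years(tag)
    if years is None:
        return None
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


print(earliest_disposal_date("Keep-7-years", date(2020, 3, 15)))
```

Counting from the document's creation date is itself a simplification; a household might prefer to count from the end of the relevant tax year.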

Contribution Metadata

  • Confidence: high
  • Last reviewed: 2026-05-11