Standards and Conventions

This document defines the technical standards that tools must follow to interoperate within this automation stack.

Taxonomy

The knowledge base uses a stable set of top-level categories. Do not create new top-level sections unless strictly necessary.

| Category | Location | What belongs here |
| --- | --- | --- |
| AI & Knowledge | tools/ai_knowledge/ | General AI tools, knowledge management, LLM products |
| Frameworks | tools/frameworks/ | Libraries for building LLM apps (LangChain, LlamaIndex, etc.) |
| Providers | tools/providers/ | Companies offering LLM APIs or managed AI services |
| Agents | tools/agents/ | Agent frameworks and autonomous AI tools |
| Orchestration | tools/orchestration/ | Workflow automation, multi-agent routing, pipeline tools |
| Infrastructure | tools/infrastructure/ | Inference engines, vector DBs, serving stacks, quantisation |
| Benchmarking | tools/benchmarking/ | Eval frameworks, benchmarks, leaderboards |
| Development & Ops | tools/development_ops/ | AI-assisted coding tools and IDEs |
| Patterns | knowledge_base/patterns/ | Recurring design patterns (RAG, tool calling, routing, etc.) |
| Playbooks | playbooks/ | Step-by-step workflow guides |

Deduplication Rules

  • One canonical page per tool/framework/provider. All other mentions must link to that canonical page.
  • Before creating a new page, search the repo for the tool name, URL, and common aliases.
  • If a source maps to an existing page, update that page rather than creating a new one.
  • Merge duplicates rather than creating parallel pages.
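The pre-creation search can be automated. The sketch below is a hypothetical helper (not part of the repo's tooling) that scans all Markdown pages for a tool's name, URL, and aliases:

```python
import pathlib

def find_existing_pages(repo_root: str, terms: list[str]) -> list[str]:
    """Return Markdown files under repo_root that mention any search term.

    `terms` should include the tool name, its URL, and common aliases.
    """
    hits = []
    for path in pathlib.Path(repo_root).rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if any(term.lower() in text for term in terms):
            hits.append(str(path))
    return sorted(hits)
```

If this returns any hits, update the existing page instead of creating a new one.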

Source Classification Tags

Items in new-sources.md use these tags: tool · framework · provider · paper/article · tutorial/guide · benchmark/eval · infrastructure · analysis

Daily Intake Log Format

New-source intake is daily-file based:

  • Index: docs/new-sources.md
  • Daily files: docs/new-sources/YYYY-MM-DD.md
  • Required table header: Title | URL | Tags | Status | Canonical Page | Notes
  • Allowed statuses: new, integrated, duplicate, needs-more-info, low-confidence

Validation is enforced by scripts/validate_new_sources.py in CI.
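A minimal sketch of the kind of checks such a validator might perform — this is illustrative only, not the actual contents of scripts/validate_new_sources.py:

```python
ALLOWED_STATUSES = {"new", "integrated", "duplicate", "needs-more-info", "low-confidence"}
REQUIRED_HEADER = ["Title", "URL", "Tags", "Status", "Canonical Page", "Notes"]

def validate_daily_log(text: str) -> list[str]:
    """Return a list of problems found in one daily intake file's table."""
    errors = []
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return ["no table found"]
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    if header != REQUIRED_HEADER:
        errors.append(f"bad header: {header}")
    for line in rows[2:]:  # skip the header and separator rows
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) >= 4 and cells[3] not in ALLOWED_STATUSES:
            errors.append(f"unknown status: {cells[3]}")
    return errors
```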

Naming Conventions

  • Tags (Paperless): kebab-case. Lowercase only. Prefix status tags with s: and category tags with c:.
  • Workflows (n8n): [Trigger Source] -> [Primary Action]. Example: IMAP -> Paperless Intake.
  • Prompts: Versioned using SemVer. Store as Markdown files in reference-implementations/llm-prompts/.
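The Paperless tag convention can be checked with a short regular expression; `is_valid_tag` is an illustrative helper, not an existing script:

```python
import re

# Kebab-case, lowercase only; status tags take an "s:" prefix,
# category tags a "c:" prefix.
TAG_RE = re.compile(r"^(?:[sc]:)?[a-z0-9]+(?:-[a-z0-9]+)*$")

def is_valid_tag(tag: str) -> bool:
    """True if `tag` follows the Paperless tag naming convention."""
    return bool(TAG_RE.fullmatch(tag))
```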

Document Lifecycle States

  1. Ingested: Raw file received by the system.
  2. OCRed: Searchable layer added (via OCRmyPDF).
  3. Classified: Assigned a document type and category tags.
  4. Actioned: Any extracted tasks/events have been synced to external systems.
  5. Archived: Document is moved to long-term storage or its final tag state.
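The five states can be modelled as an ordered enum. Note that the strict one-step-forward rule in `can_advance` is an assumption added for illustration, not stated policy:

```python
from enum import Enum

class LifecycleState(Enum):
    """Ordered states a document moves through in the pipeline."""
    INGESTED = 1
    OCRED = 2
    CLASSIFIED = 3
    ACTIONED = 4
    ARCHIVED = 5

def can_advance(current: LifecycleState, target: LifecycleState) -> bool:
    """Assumed rule: a document moves exactly one step forward at a time."""
    return target.value == current.value + 1
```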

Minimal Metadata Schema

Every document processed by AI should attempt to populate:

  • extraction_date: ISO8601 timestamp of when the AI ran.
  • source_origin: Email, Scan, or Webhook.
  • action_required: Boolean.
  • due_date: If applicable.
  • confidence_score: 0.0 to 1.0 (from the LLM).
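An illustrative record following this schema — the field names come from this document, the values are invented:

```python
import json
from datetime import datetime, timezone

metadata = {
    "extraction_date": datetime(2026, 2, 25, 9, 30, tzinfo=timezone.utc).isoformat(),
    "source_origin": "Email",          # one of: Email, Scan, Webhook
    "action_required": True,
    "due_date": "2026-03-15",          # only when applicable
    "confidence_score": 0.92,          # 0.0 to 1.0, reported by the LLM
}

print(json.dumps(metadata, indent=2))
```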

Interoperability

  • Data Format: Cross-tool communication should use JSON wherever possible.
  • Dates: Always use ISO8601 timestamps with an explicit UTC offset.
  • IDs: Use the internal ID of the source system (e.g. Paperless document_id) in the metadata of the destination system (e.g. GCal event description).
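For example, a payload passed from Paperless-ngx to a calendar step might carry the source ID along in the event description. The key names here are illustrative, not a fixed contract:

```python
import json

# Hypothetical cross-tool payload: ISO8601 date with UTC offset,
# and the source-system ID embedded in the destination's metadata.
event = {
    "title": "Pay electricity invoice",
    "due": "2026-03-15T00:00:00+00:00",
    "description": "paperless_document_id=1234",
}
payload = json.dumps(event)
```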

What "Done" Means

An automated flow is considered "done" when:

  1. The primary action is completed (event created/task synced).
  2. The source document is updated with a processed or actioned tag.
  3. No errors were logged in the orchestration engine (n8n).
  4. If a critical failure occurred, a notification was sent to a human review channel.
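One way to encode this definition of done as a predicate — a sketch with invented flag names, reading conditions 3 and 4 as "no error, or a human was notified about the failure":

```python
def flow_is_done(primary_action_ok: bool,
                 source_tagged: bool,
                 error_logged: bool,
                 human_notified: bool) -> bool:
    """A run counts as done when the action succeeded, the source carries
    its processed/actioned tag, and any logged failure reached a human."""
    return primary_action_ok and source_tagged and (not error_logged or human_notified)
```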

AI-Authored Documentation Metadata (Required)

For AI-authored updates to knowledge pages (docs/tools/, docs/services/, docs/knowledge_base/, docs/architecture/, docs/playbooks/, and docs/reference-implementations/), include:

  • Last reviewed: ISO date (YYYY-MM-DD)
  • Confidence: high, medium, or low
  • Sources / References: at least one URL

Recommended section format:

```markdown
## Sources / References
- [Official docs](https://example.com)

## Contribution Metadata
- Last reviewed: 2026-02-25
- Confidence: medium
```

These requirements are enforced by scripts/check_docs_contract.py in pull-request CI.
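The metadata contract lends itself to simple pattern checks. The following sketch illustrates checks of this kind; it is not the actual contents of scripts/check_docs_contract.py:

```python
import re

def check_docs_contract(markdown: str) -> list[str]:
    """Return problems found in an AI-authored knowledge page."""
    errors = []
    if not re.search(r"^- Last reviewed: \d{4}-\d{2}-\d{2}$", markdown, re.M):
        errors.append("missing or malformed 'Last reviewed' date")
    if not re.search(r"^- Confidence: (high|medium|low)$", markdown, re.M):
        errors.append("missing or invalid 'Confidence' level")
    if "## Sources / References" not in markdown or "http" not in markdown:
        errors.append("missing Sources / References with at least one URL")
    return errors
```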

Multi-Agent KnowledgeOps Contract

To ensure consistency when multiple autonomous agents or humans contribute to the knowledge base, the following "KnowledgeOps" contract is enforced:

1. Contribution Metadata

Every AI-authored or AI-updated document must include the Contribution Metadata section with Last reviewed, Confidence, and Sources / References.

2. Catalog Consistency

New tools must be added to data/all_tools.json and mkdocs.yml before the PR is considered "Done".

3. CI Gates

  • validate_new_sources.py: Ensures daily logs are valid and no duplicate URLs exist.
  • check_docs_contract.py: Enforces metadata and section requirements.
  • check_catalog_consistency.py: Verifies that data/all_tools.json stays in sync with the filesystem.
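A catalog-consistency check could compare the pages on disk against the catalog entries. This sketch assumes each entry in data/all_tools.json has a "path" field; the real schema may differ:

```python
import json
import pathlib

def catalog_mismatches(repo_root: str) -> dict:
    """Report tool pages missing from the catalog, and catalog entries
    whose pages are missing from disk."""
    root = pathlib.Path(repo_root)
    catalog = json.loads((root / "data" / "all_tools.json").read_text())
    listed = {entry["path"] for entry in catalog}
    on_disk = {str(p.relative_to(root)) for p in (root / "docs" / "tools").rglob("*.md")}
    return {
        "missing_from_catalog": sorted(on_disk - listed),
        "missing_from_disk": sorted(listed - on_disk),
    }
```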

4. No Placeholder Policy

Avoid TBD or TODO in merged documents. If information is missing, use a minimal description or skip the section.

Document Extraction Audit Trail

To maintain accountability and traceability for AI-processed documents, every extraction result must be logged with the following audit metadata:

  • llm_provider: e.g., OpenAI, Ollama.
  • llm_model: The specific model version (e.g., gpt-4o-2024-05-13, llama3.1:8b).
  • prompt_version: The SemVer version of the prompt used (from reference-implementations/llm-prompts/).
  • timestamp: ISO8601 UTC.
  • raw_input_hash: SHA256 of the input text/file to verify against later changes.

This metadata should be stored in the document's metadata (e.g., Paperless-ngx document notes or a dedicated sidecar JSON file).
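Assembling the audit record is straightforward; `build_audit_record` is an illustrative helper showing the fields listed above:

```python
import hashlib
from datetime import datetime, timezone

def build_audit_record(raw_input: str, provider: str, model: str,
                       prompt_version: str) -> dict:
    """Assemble the audit metadata for one extraction run."""
    return {
        "llm_provider": provider,
        "llm_model": model,
        "prompt_version": prompt_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "raw_input_hash": hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
    }
```

Re-hashing the stored input later and comparing against raw_input_hash detects any drift between what was logged and what the model actually saw.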