Infrastructure

Inference engines, serving stacks, quantisation tools, vector databases, and deployment infrastructure for AI/LLM workloads.

Contents

Tool — What it does
Aphrodite Engine — Inference engine forked from vLLM for local use
ClawRouter — Agent-native routing layer for OpenClaw model selection
ExLlamaV2 — Optimized GPTQ/EXL2 inference for consumer GPUs
llama.cpp — Lightweight local inference runtime for quantized LLMs
LiteLLM — Unified LLM API proxy
LocalAI — Self-hosted OpenAI-compatible local inference platform
MLX — Apple's array framework for ML on Apple Silicon
OpenPipe — Data-driven fine-tuning platform
Ollama — Local LLM inference server
SGLang — Fast structured generation runtime from LMSYS
Supabase — Postgres-first backend platform for app and workflow state
Text Generation Inference (TGI) — Hugging Face's production inference server
vLLM — High-throughput LLM serving engine (PagedAttention)
ZSE — Fast cold-start LLM inference engine
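Several of the servers above (Ollama, LocalAI, vLLM, LiteLLM, TGI) expose an OpenAI-compatible `/v1/chat/completions` endpoint, so a single client can target any of them by swapping the base URL. A minimal sketch using only the Python standard library; the model name and port in the usage comment are assumptions and depend on which server you run:

```python
import json
import urllib.request

def build_chat_payload(model: str, prompt: str) -> dict:
    """Build the JSON body shared by OpenAI-compatible chat endpoints."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST a chat completion to any OpenAI-compatible server and
    return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (assumes a local server is running, e.g. a vLLM or LocalAI
# instance on port 8000 serving a model named "llama3"):
# print(chat("http://localhost:8000", "llama3", "Hello!"))
```

Because the request shape is identical across these backends, routing layers like LiteLLM can sit in front of several of them behind one API.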

Sub-categories

  • Inference engines — vLLM, TGI, llama.cpp, MLX, etc.
  • Vector databases — Pinecone, Weaviate, Milvus, Qdrant, etc.
  • Serving & routing — Load balancers, model routers, API gateways
  • Quantisation & optimisation — GGUF, GPTQ, AWQ, etc.
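The quantisation formats in the last bullet trade precision for memory: weight footprint is roughly parameter count × bits per weight ÷ 8. A quick sketch of that arithmetic for a hypothetical 7B-parameter model, ignoring per-group scale overhead, activations, and KV cache:

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), ignoring
    quantisation scale/zero-point overhead, activations, and KV cache."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # hypothetical 7B-parameter model
for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_footprint_gb(n, bits):.1f} GB")
# FP16: 14.0 GB
# 8-bit: 7.0 GB
# 4-bit: 3.5 GB
```

This back-of-envelope estimate is why 4-bit formats (GGUF Q4 variants, GPTQ, AWQ) are the usual choice for fitting larger models onto consumer GPUs.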