Home Lab Hardware Reference¶
A hardware-scoped reference for a two-machine home lab stack. Each machine has a distinct role determined by its compute profile: the TrueNAS server handles persistent services, GPU-accelerated workloads, and inference at smaller model sizes; the MacBook Pro M5 handles large-model development inference, taking advantage of Apple Silicon's unified memory architecture.
Hardware inventory¶
| TrueNAS Server | MacBook Pro M5 | |
|---|---|---|
| CPU | 12 cores | Apple M5 |
| RAM | 96 GB DDR4 | 48 GB unified |
| GPU | RTX 4060 8 GB VRAM | Metal (unified) |
| Storage | 80 TB ZFS | SSD |
| Ollama endpoint | 192.168.0.5:30068 |
localhost:11434 |
| Primary role | Persistent services, NVENC, 3-8B LLMs, image gen | Large-model inference, development |
RTX 4060 8 GB — practical limits¶
The RTX 4060 has 8 GB VRAM. This is enough for many useful GPU workloads but rules out running large LLMs in fp16.
| Workload | Recommended tool | Config / model | Fits? |
|---|---|---|---|
| LLM inference | Ollama / llama.cpp | 3-8B models, Q4_K_M or Q5_K_S GGUF | ✅ Comfortable |
| LLM inference | Ollama | 7-8B Q4_K_M (e.g. Llama 3.2 8B, Qwen2.5 7B) | ✅ ~4-5 GB |
| LLM inference | Ollama / vLLM (AWQ) | 13-14B AWQ 4-bit | ⚠️ Tight (~7-8 GB) |
| LLM inference | Any | 13B+ fp16 / 30B+ any | ❌ Exceeds VRAM |
| Video transcoding | Jellyfin NVENC | Any resolution, H.264/H.265/AV1 | ✅ Dedicated encoder |
| Audio transcription | Whisper large-v3 | Faster-Whisper batched, FP16 | ✅ ~4-5 GB |
| Image generation | ComfyUI | SD 1.5, SDXL (fp16) | ✅ 4-8 GB |
| Image generation | ComfyUI | Flux-schnell (fp8/nf4) | ⚠️ Tight, use --lowvram |
| Image generation | ComfyUI | Flux-dev (fp16) | ❌ Needs 16-24 GB |
| Embeddings | Ollama / LocalAI | nomic-embed-text, mxbai-embed | ✅ <1 GB |
Key rule for LLMs on the RTX 4060: Use Q4_K_M or Q5_K_S GGUF quantization. The model weight footprint at Q4_K_M is approximately (parameters × 0.5 GB) — so a 7B model needs ~3.5-4 GB, leaving headroom for KV cache.
M5 48 GB — practical limits¶
Apple Silicon's unified memory architecture means all 48 GB is addressable by the Metal GPU backend. There is no separate VRAM pool. This makes the M5 a better local LLM host than any single consumer NVIDIA GPU for models in the 30-40B range.
| Workload | Recommended tool | Config / model | Fits? |
|---|---|---|---|
| LLM inference | Ollama (Metal) | Up to ~40B Q4_K_M | ✅ Comfortable |
| LLM inference | MLX | Qwen3.5 32B, Llama 3.3 70B Q4 | ✅ 22-40 GB |
| LLM inference | LM Studio | Any GGUF ≤ 40B | ✅ GUI with Metal |
| Fine-tuning (LoRA) | MLX / Unsloth | 7-13B models | ✅ Sufficient RAM |
| Image generation | ComfyUI (--use-pytorch-mps) |
SDXL, Flux-dev | ✅ All 48 GB available |
| Audio transcription | mlx-whisper | large-v3 via CoreML | ✅ Fast |
| Agentic development | Ollama + LM Studio | Any model ≤ 40B | ✅ Ideal dev machine |
| 70B models | Ollama (Metal) | Q4_K_M (~40 GB) | ⚠️ Tight, leaves ~8 GB |
| 70B fp16 | Any | 140 GB needed | ❌ Not viable |
Model size quick reference¶
| Model size | RTX 4060 8 GB | M5 48 GB | Recommended format |
|---|---|---|---|
| 1-3B | ✅ Comfortable | ✅ Comfortable | GGUF Q8 or MLX |
| 7-8B | ✅ Q4/Q5 only | ✅ Comfortable | GGUF Q4_K_M or MLX 4-bit |
| 13-14B | ⚠️ Q4 tight | ✅ Comfortable | GGUF Q4_K_M or MLX 4-bit |
| 30-34B | ❌ Not viable | ✅ Comfortable | MLX 4-bit or GGUF Q4 |
| 70B | ❌ Not viable | ⚠️ Q4 tight (~40 GB) | MLX 4-bit |
| 70B+ | ❌ Not viable | ❌ Not viable | Multi-GPU / cloud |
Workload routing guide¶
Run on TrueNAS (RTX 4060): - Persistent services (Jellyfin, Whisper, Immich, n8n, Paperless-ngx) - Small/medium LLM inference (up to 8B) available 24/7 via Ollama - NVENC video transcoding — does not consume VRAM, runs in parallel with LLMs - Local image generation (ComfyUI with SDXL/Flux-schnell) - All background automation and document processing
Run on MacBook M5: - Large-model development sessions (30-40B models) - Interactive agent development with LM Studio or Ollama - MLX fine-tuning experiments - Any task requiring a model too large for the RTX 4060
Unify both endpoints with LiteLLM:
model_list:
- model_name: "fast"
litellm_params:
model: "ollama/qwen2.5:7b"
api_base: "http://192.168.0.5:30068"
- model_name: "large"
litellm_params:
model: "ollama/qwen3.5:32b"
api_base: "http://localhost:11434"
Related tools / concepts¶
- Ollama — LLM serving on both machines; TrueNAS GPU passthrough documented there
- Jellyfin — NVENC hardware transcoding configuration
- Whisper — GPU benchmark table (RTX 3060 → RTX 4090 reference points)
- ComfyUI — Local image generation; hardware requirements table
- MLX — Apple Silicon inference framework for M5
- LM Studio — GUI LLM client with Metal backend for M5
- Local LLMs — Model hardware requirements and backend comparison
- LiteLLM — Proxy to unify TrueNAS + MacBook endpoints
- Immich — Photo management with ML features on TrueNAS
Sources / references¶
- NVIDIA RTX 4060 Specifications
- Apple M5 Chip Overview
- Ollama Hardware Requirements
- llama.cpp VRAM estimation
Contribution Metadata¶
- Last reviewed: 2026-06-08
- Confidence: high