Home Lab Hardware Reference¶

A hardware-scoped reference for a two-machine home lab stack. Each machine has a distinct role determined by its compute profile: the TrueNAS server handles persistent services, GPU-accelerated workloads, and inference at smaller model sizes; the MacBook Pro M5 handles large-model development inference, taking advantage of Apple Silicon's unified memory architecture.

Hardware inventory¶

	TrueNAS Server	MacBook Pro M5
CPU	12 cores	Apple M5
RAM	96 GB DDR4	48 GB unified
GPU	RTX 4060 8 GB VRAM	Metal (unified)
Storage	80 TB ZFS	SSD
Ollama endpoint	`192.168.0.5:30068`	`localhost:11434`
Primary role	Persistent services, NVENC, 3-8B LLMs, image gen	Large-model inference, development

RTX 4060 8 GB — practical limits¶

The RTX 4060 has 8 GB VRAM. This is enough for many useful GPU workloads but rules out running large LLMs in fp16.

Workload	Recommended tool	Config / model	Fits?
LLM inference	Ollama / llama.cpp	3-8B models, Q4_K_M or Q5_K_S GGUF	✅ Comfortable
LLM inference	Ollama	7-8B Q4_K_M (e.g. Llama 3.2 8B, Qwen2.5 7B)	✅ ~4-5 GB
LLM inference	Ollama / vLLM (AWQ)	13-14B AWQ 4-bit	⚠️ Tight (~7-8 GB)
LLM inference	Any	13B+ fp16 / 30B+ any	❌ Exceeds VRAM
Video transcoding	Jellyfin NVENC	Any resolution, H.264/H.265/AV1	✅ Dedicated encoder
Audio transcription	Whisper large-v3	Faster-Whisper batched, FP16	✅ ~4-5 GB
Image generation	ComfyUI	SD 1.5, SDXL (fp16)	✅ 4-8 GB
Image generation	ComfyUI	Flux-schnell (fp8/nf4)	⚠️ Tight, use `--lowvram`
Image generation	ComfyUI	Flux-dev (fp16)	❌ Needs 16-24 GB
Embeddings	Ollama / LocalAI	nomic-embed-text, mxbai-embed	✅ <1 GB

Key rule for LLMs on the RTX 4060: Use Q4_K_M or Q5_K_S GGUF quantization. The model weight footprint at Q4_K_M is approximately (parameters × 0.5 GB) — so a 7B model needs ~3.5-4 GB, leaving headroom for KV cache.

M5 48 GB — practical limits¶

Apple Silicon's unified memory architecture means all 48 GB is addressable by the Metal GPU backend. There is no separate VRAM pool. This makes the M5 a better local LLM host than any single consumer NVIDIA GPU for models in the 30-40B range.

Workload	Recommended tool	Config / model	Fits?
LLM inference	Ollama (Metal)	Up to ~40B Q4_K_M	✅ Comfortable
LLM inference	MLX	Qwen3.5 32B, Llama 3.3 70B Q4	✅ 22-40 GB
LLM inference	LM Studio	Any GGUF ≤ 40B	✅ GUI with Metal
Fine-tuning (LoRA)	MLX / Unsloth	7-13B models	✅ Sufficient RAM
Image generation	ComfyUI (`--use-pytorch-mps`)	SDXL, Flux-dev	✅ All 48 GB available
Audio transcription	mlx-whisper	large-v3 via CoreML	✅ Fast
Agentic development	Ollama + LM Studio	Any model ≤ 40B	✅ Ideal dev machine
70B models	Ollama (Metal)	Q4_K_M (~40 GB)	⚠️ Tight, leaves ~8 GB
70B fp16	Any	140 GB needed	❌ Not viable

Model size quick reference¶

Model size	RTX 4060 8 GB	M5 48 GB	Recommended format
1-3B	✅ Comfortable	✅ Comfortable	GGUF Q8 or MLX
7-8B	✅ Q4/Q5 only	✅ Comfortable	GGUF Q4_K_M or MLX 4-bit
13-14B	⚠️ Q4 tight	✅ Comfortable	GGUF Q4_K_M or MLX 4-bit
30-34B	❌ Not viable	✅ Comfortable	MLX 4-bit or GGUF Q4
70B	❌ Not viable	⚠️ Q4 tight (~40 GB)	MLX 4-bit
70B+	❌ Not viable	❌ Not viable	Multi-GPU / cloud

Workload routing guide¶

Run on TrueNAS (RTX 4060): - Persistent services (Jellyfin, Whisper, Immich, n8n, Paperless-ngx) - Small/medium LLM inference (up to 8B) available 24/7 via Ollama - NVENC video transcoding — does not consume VRAM, runs in parallel with LLMs - Local image generation (ComfyUI with SDXL/Flux-schnell) - All background automation and document processing

Run on MacBook M5: - Large-model development sessions (30-40B models) - Interactive agent development with LM Studio or Ollama - MLX fine-tuning experiments - Any task requiring a model too large for the RTX 4060

Unify both endpoints with LiteLLM:

model_list:
  - model_name: "fast"
    litellm_params:
      model: "ollama/qwen2.5:7b"
      api_base: "http://192.168.0.5:30068"
  - model_name: "large"
    litellm_params:
      model: "ollama/qwen3.5:32b"
      api_base: "http://localhost:11434"

Point all agents at the LiteLLM proxy and route by model alias.

Ollama — LLM serving on both machines; TrueNAS GPU passthrough documented there
Jellyfin — NVENC hardware transcoding configuration
Whisper — GPU benchmark table (RTX 3060 → RTX 4090 reference points)
ComfyUI — Local image generation; hardware requirements table
MLX — Apple Silicon inference framework for M5
LM Studio — GUI LLM client with Metal backend for M5
Local LLMs — Model hardware requirements and backend comparison
LiteLLM — Proxy to unify TrueNAS + MacBook endpoints
Immich — Photo management with ML features on TrueNAS

Sources / references¶

Contribution Metadata¶

Last reviewed: 2026-06-08
Confidence: high