Tube Archivist¶

What it is¶

Tube Archivist is a self-hosted YouTube archive that allows you to index and download YouTube videos, metadata, and comments to your own server.

What problem it solves¶

YouTube videos can be deleted, made private, or censored at any time. Tube Archivist provides a way to build a permanent, offline, and searchable library of your favorite YouTube content, ensuring you always have access to the information and entertainment you value.

Where it fits in the stack¶

Category: Services / Media Management. It serves as a content preservation layer, sitting alongside tools like Plex or Jellyfin but specialized for YouTube content and metadata.

Typical use cases¶

Archiving educational channels or tutorials for offline reference.
Saving high-quality versions of favorite music videos or documentaries.
Building a private "YouTube" experience without ads or tracking.
Keeping a record of comments and metadata for research purposes.

Automated Subscriptions¶

Tube Archivist can automatically monitor and download new videos from your favorite YouTube channels or playlists.

Add Subscription: Go to the Subscriptions tab and enter the URL of a YouTube channel or playlist.
Configure: Set the download frequency (e.g., daily) and the maximum number of videos to keep.
Automate: Tube Archivist will periodically poll YouTube and download any new content that matches your criteria.
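
The polling behaviour described under Automated Subscriptions can be pictured with a small sketch. This is not Tube Archivist's actual code: the `Subscription` class and `plan_poll` function are hypothetical names, and the sketch assumes that "maximum number of videos to keep" means only the newest videos are retained.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Subscription:
    """Hypothetical in-memory model of one subscribed channel or playlist."""
    url: str
    max_videos: int  # assumed meaning: retain only the newest N videos
    archived_ids: List[str] = field(default_factory=list)  # oldest first


def plan_poll(sub: Subscription, feed_ids: List[str]) -> Tuple[List[str], List[str]]:
    """Return (ids_to_download, ids_to_drop) for one polling cycle.

    feed_ids lists the video IDs currently published, oldest first.
    """
    known = set(sub.archived_ids)
    new_ids = [vid for vid in feed_ids if vid not in known]
    combined = sub.archived_ids + new_ids
    # Keep only the newest max_videos entries of the combined list.
    keep = set(combined[-sub.max_videos:]) if sub.max_videos > 0 else set()
    to_download = [vid for vid in new_ids if vid in keep]
    to_drop = [vid for vid in sub.archived_ids if vid not in keep]
    return to_download, to_drop


sub = Subscription(url="https://www.youtube.com/@example", max_videos=3,
                   archived_ids=["a", "b"])
to_download, to_drop = plan_poll(sub, ["a", "b", "c", "d"])
print(to_download)  # ['c', 'd']
print(to_drop)      # ['a']
```

Returning a plan instead of acting on it keeps the decision logic easy to test separately from any network or disk access.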

Strengths¶

Comprehensive Metadata: Downloads thumbnails, descriptions, comments, and subtitles.
Powerful Search: Features an integrated Elasticsearch-based search engine for finding content within your archive.
Automation: Can be configured to automatically monitor and download new videos from specific channels or playlists.
Self-Hosted: Full control over your data and hardware.
Metadata Persistence: Supports embedding all indexed metadata directly into media files, enabling index reconstruction from the files themselves.

Limitations¶

Storage Intensive: High-quality video archives can consume terabytes of storage quickly.
Resource Usage: Requires separate Redis and Elasticsearch containers alongside the main application; Elasticsearch in particular is memory-intensive.
Maintenance: Changes on YouTube's side can occasionally break downloads until yt-dlp, and the Tube Archivist image that bundles it, is updated.
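
To put the Storage Intensive point into numbers, here is a rough back-of-the-envelope sketch. The function name `estimate_storage_gb` and the bitrates are illustrative assumptions; real bitrates vary widely with codec, resolution, and content.

```python
def estimate_storage_gb(hours_of_video: float, avg_bitrate_mbps: float) -> float:
    """Estimate disk usage in gigabytes (10**9 bytes) for a given runtime.

    avg_bitrate_mbps is the combined video and audio bitrate in megabits per second.
    """
    bytes_per_second = avg_bitrate_mbps * 1_000_000 / 8
    total_bytes = bytes_per_second * hours_of_video * 3600
    return total_bytes / 1_000_000_000


# Illustrative bitrates only, not measurements.
for label, mbps in [("1080p at ~5 Mbit/s", 5.0), ("2160p at ~20 Mbit/s", 20.0)]:
    print(f"{label}: {estimate_storage_gb(1000, mbps):,.0f} GB per 1000 hours")
```

Under these assumptions, 1000 hours of video comes to roughly 2 TB at 5 Mbit/s and 9 TB at 20 Mbit/s, before thumbnails, subtitles, and index data.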

When to use it¶

When you want to ensure permanent access to specific YouTube content.
When you want to watch YouTube content without an internet connection.
When you need to search across video descriptions and comments at scale.

When not to use it¶

If you only need to download an occasional video (use a simple CLI tool like yt-dlp).
If you have very limited server resources (RAM/CPU/Storage).
If you prefer a lightweight, single-binary solution without the complexity of multiple containers.

Getting started¶

Installation (Docker Compose)¶

Tube Archivist requires two companion services, Redis and Elasticsearch, which run as separate containers in the same Compose project.

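services:
  tubearchivist:
    container_name: tubearchivist
    restart: always
    image: bbilly1/tubearchivist
    ports:
      - 8000:8000
    volumes:
      - /path/to/media:/youtube   # downloaded media
      - /path/to/cache:/cache     # thumbnails and temporary download files
    environment:
      - ES_URL=http://archivist-es:9200
      # Older releases read REDIS_HOST=archivist-redis instead of REDIS_CON;
      # check the installation docs for the version you run.
      - REDIS_CON=redis://archivist-redis:6379
      - HOST_UID=1000
      - HOST_GID=1000
      - TA_HOST=http://localhost:8000   # address you use to reach the UI
      - TA_USERNAME=admin
      - TA_PASSWORD=password            # placeholder, replace before first start
      - ELASTIC_PASSWORD=verysecret     # must match the value set on archivist-es
    depends_on:
      - archivist-es
      - archivist-redis

  archivist-redis:
    image: redis   # older releases used redis/redis-stack-server
    restart: always

  archivist-es:
    image: bbilly1/tubearchivist-es
    restart: always
    environment:
      - "ELASTIC_PASSWORD=verysecret"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"   # JVM heap; raise for large archives
      - "xpack.security.enabled=true"
      - "discovery.type=single-node"
      - "path.repo=/usr/share/elasticsearch/data/snapshot"
    volumes:
      - /path/to/es:/usr/share/elasticsearch/data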
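
The passwords in the compose file above are placeholders. One way to generate stronger values is sketched below using Python's standard `secrets` module; the helper name `make_env_lines` is hypothetical, and the variable names are the ones used in the compose file.

```python
import secrets
from typing import List


def make_env_lines(names: List[str], nbytes: int = 24) -> List[str]:
    """Return NAME=value lines with an independent random value per name."""
    return [f"{name}={secrets.token_urlsafe(nbytes)}" for name in names]


# Variable names taken from the compose file; the values are random on each run.
for line in make_env_lines(["TA_PASSWORD", "ELASTIC_PASSWORD"]):
    print(line)
```

The output could be saved to a `.env` file next to the compose file and referenced with `${TA_PASSWORD}` and `${ELASTIC_PASSWORD}`, so that every service using the Elasticsearch password reads the same value.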

Hello World¶

Start the Docker Compose stack with docker compose up -d and wait for all three containers to come up.
Navigate to http://localhost:8000.
Log in with the credentials defined in your environment variables.
Go to Downloads, paste a YouTube URL, and click Index and Download.

Latest Project Status (May 2026)¶

Deno Integration: Tube Archivist now includes the Deno runtime in its container to leverage the latest yt-dlp improvements and workarounds for YouTube's bot detection mechanisms.
Enhanced Metadata Embedding: Supports bulk embedding of indexed metadata into media files as an additional backup. In the event of an index loss, the library can be reconstructed from these embedded tags.
Redownload & Multi-select: Introduced UI improvements for bulk actions and a "Redownload" feature to easily update or fix existing media files.
PO Token Support: Integrates with PO token providers (like bgutil-ytdlp-pot-provider) to mitigate 403 errors and other bot detection issues.

CLI examples¶

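Tube Archivist is primarily managed via the web UI, but you can run commands inside the container for maintenance. Available management commands vary between releases, so list them first and confirm that the ones below exist in your version.

# List the management commands available in your installed version
docker exec tubearchivist python manage.py help

# Force an immediate rescan of the media directory
docker exec tubearchivist python manage.py rescan

# Manually update yt-dlp within the container
# (the change is lost when the container is recreated; pulling a newer image is the durable fix)
docker exec tubearchivist pip install -U yt-dlp

# Check the status of the background task worker
docker exec tubearchivist python manage.py check_worker

# Trigger a scan of channel tabs (shorts, streams, videos)
docker exec tubearchivist python manage.py ta_index_channel_tabs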

Advanced: `yt-dlp` Quality Control¶

In the Settings > Application > Download Format section, you can fine-tune how videos are selected and sorted.

Prefer AV1 (better compression efficiency), then sort by resolution, frame rate, and bitrate. Enter this in Format Sort:

codec:av1,res,fps,br

Limit resolution to 1080p and prefer MP4 video with M4A audio, falling back to the best available format. Enter this in Format:

bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[height<=1080]/best
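
Before saving these values, it can help to preview what they select by running yt-dlp directly. The sketch below only builds the command line. `build_preview_command` is a hypothetical helper, while `-f`, `-S`, and `--print` are standard yt-dlp options. It assumes yt-dlp is installed wherever the command is eventually run.

```python
import shlex
from typing import List, Optional

FORMAT_SORT = "codec:av1,res,fps,br"
FORMAT = "bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[height<=1080]/best"


def build_preview_command(url: str, fmt: str = FORMAT,
                          sort: Optional[str] = None) -> List[str]:
    """Build a yt-dlp command that prints the selected format without downloading."""
    cmd = ["yt-dlp", "-f", fmt]
    if sort:
        cmd += ["-S", sort]
    # --print implies simulation, so nothing is downloaded.
    cmd += ["--print", "%(format_id)s %(resolution)s %(vcodec)s", url]
    return cmd


cmd = build_preview_command("https://www.youtube.com/watch?v=dQw4w9WgXcQ",
                            sort=FORMAT_SORT)
print(shlex.join(cmd))  # shell-quoted, ready to paste into a terminal
```

Comparing the printed format ID, resolution, and codec with and without the sort string shows whether the AV1 preference changes the selection for a given video.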

Advanced: Metadata Verification¶

If you enable Embed Metadata, Tube Archivist writes a JSON object into the media file's metadata tags. You can verify this using standard tools.

Using ffprobe (to see the ta tag):

ffprobe -v quiet -show_entries format_tags -of json my_video.mp4 \
  | jq -r '.format.tags.ta'

Using Python (mutagen):

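import json
from mutagen.mp4 import MP4

video = MP4("my_video.mp4")
# Tube Archivist metadata is stored in a custom (freeform) atom
TA_KEY = "----:com.tubearchivist:ta"
if video.tags is None or TA_KEY not in video.tags:
    raise SystemExit("no embedded Tube Archivist metadata found")
# Freeform atoms hold a list of byte strings; the first one carries the JSON
metadata = json.loads(video.tags[TA_KEY][0].decode("utf-8"))
print(f"Title: {metadata['title']}")
print(f"Channel: {metadata['channel_name']}")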

API examples¶

Python (Get all videos)¶

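Tube Archivist provides a REST API. You can find your API token in the web UI settings.

import requests

TA_URL = "http://localhost:8000/api"
HEADERS = {"Authorization": "Token YOUR_API_TOKEN"}

response = requests.get(f"{TA_URL}/video/", headers=HEADERS, timeout=30)
response.raise_for_status()  # fail early on a bad token or wrong URL
payload = response.json()

# The video list is returned under "data" (paging details sit under "paginate");
# each video carries its channel as a nested object.
for video in payload["data"]:
    print(f"Title: {video['title']}, Channel: {video['channel']['channel_name']}")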

Curl (Trigger a download)¶

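# The endpoint takes a JSON body listing video IDs; confirm the shape in the API docs for your version
curl -X POST \
     -H "Authorization: Token <your_api_token>" \
     -H "Content-Type: application/json" \
     -d '{"data": [{"youtube_id": "dQw4w9WgXcQ", "status": "pending"}]}' \
     "http://localhost:8000/api/download/"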

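When queueing several videos from a script, it can be convenient to build the request body from plain YouTube URLs. The sketch below assumes the download endpoint accepts a JSON body of the form `{"data": [{"youtube_id": ..., "status": "pending"}]}`; that shape is an assumption to check against the API documentation for your version. `extract_video_id` and `build_download_payload` are hypothetical helper names.

```python
import json
from typing import List
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Extract the video ID from common YouTube watch and short-link URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        video_id = ""
    if not video_id:
        raise ValueError(f"no video ID found in {url!r}")
    return video_id


def build_download_payload(urls: List[str]) -> str:
    """Serialize a queue request; the payload shape is an assumption."""
    items = [{"youtube_id": extract_video_id(u), "status": "pending"} for u in urls]
    return json.dumps({"data": items})


print(build_download_payload(["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]))
```

The resulting string can be passed as the request body with a `Content-Type: application/json` header.
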
Related tools¶

Plex: For streaming your archived content to TV/Mobile.
Jellyfin: Open-source media server.
Audiobookshelf: For managing YouTube podcasts or audio-only archives.
Changedetection.io: To monitor YouTube channels for changes.
SearXNG: To search for YouTube content privately before archiving.
n8n: For advanced automation of video ingestion.
Home Assistant: For automating notifications about new downloads.
Tailscale: For securely accessing your archive from anywhere.

Backlog¶

[x] Perform quarterly technical freshness audit (May 2026).

Contribution Metadata¶

Confidence: high
Last reviewed: 2026-05-26

Sources / References¶

https://www.tubearchivist.com/
https://docs.tubearchivist.com/
https://github.com/tubearchivist/tubearchivist
https://github.com/yt-dlp/yt-dlp