OpenHuman TokenJuice Explained — Save Up to 80% on Token Costs

2026-05-24 · ~7 min read

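TokenJuice is OpenHuman's built-in token compression engine. It compresses context before each request to reduce the number of tokens sent to LLM APIs, which can cut API costs by up to 80% at the highest compression level. Because stronger compression discards more detail, the level is configurable so you can match it to how critical the conversation is.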

How TokenJuice Works

Instead of sending raw context to the API, TokenJuice applies several compression strategies:

  • Semantic deduplication — removes redundant information from memory context
  • Smart summarization — compresses long conversation histories into concise summaries
  • Priority pruning — keeps high-importance memories while trimming less relevant ones
  • Structural compression — optimizes JSON and code formatting without losing meaning
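As a rough illustration of three of the strategies above, here is a minimal Python sketch. It is not TokenJuice's actual implementation: the `Memory` type, the `importance` score, and every function name are hypothetical. Exact-match deduplication stands in for the semantic version, and a word count stands in for a real tokenizer.

```python
import json
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float  # hypothetical score; higher means more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def deduplicate(memories: list[Memory]) -> list[Memory]:
    # Exact-match deduplication on normalized text. A semantic version would
    # compare meaning (for example with embeddings) rather than strings.
    seen: set[str] = set()
    unique: list[Memory] = []
    for memory in memories:
        key = " ".join(memory.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(memory)
    return unique


def prune_by_priority(memories: list[Memory], budget: int) -> list[Memory]:
    # Greedily keep the most important memories that fit in the token budget,
    # then restore their original order.
    ranked = sorted(enumerate(memories), key=lambda pair: -pair[1].importance)
    kept: list[tuple[int, Memory]] = []
    used = 0
    for index, memory in ranked:
        cost = estimate_tokens(memory.text)
        if used + cost <= budget:
            kept.append((index, memory))
            used += cost
    return [memory for _, memory in sorted(kept, key=lambda pair: pair[0])]


def minify_json(raw: str) -> str:
    # Structural compression: drop insignificant whitespace, keep the data.
    return json.dumps(json.loads(raw), separators=(",", ":"))
```

Applied in sequence (deduplicate, then prune to a budget, then minify any JSON payloads), each step only removes material, so the output can never contain more tokens than the input under the same estimator.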

Configuration

TokenJuice is enabled by default. Configure it in config.toml:

    [tokenjuice]
    enabled = true
    compression_level = "medium"  # low, medium, high, max
    max_context_tokens = 32000
    preserve_recent = true

Compression Levels

  • Low (~20% savings) — minimal compression, best for critical conversations
  • Medium (~50% savings) — balanced compression, recommended default
  • High (~70% savings) — aggressive compression, good for routine queries
  • Max (~80% savings) — maximum compression, for high-volume, low-criticality use

Real-World Savings

A user sending 1,000 API requests per day to GPT-4o at the default context size would spend approximately $50/day. With TokenJuice at medium compression (~50% savings), the same usage drops to about $25/day; at max compression (~80% savings), to about $10/day.
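The arithmetic behind these figures can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that assumes cost scales linearly with the approximate savings percentages listed under Compression Levels. The low and high figures are extrapolated from those percentages rather than quoted from the example above, and the function name is made up for illustration.

```python
# Approximate savings per compression level, taken from the list above.
SAVINGS = {"low": 0.20, "medium": 0.50, "high": 0.70, "max": 0.80}

BASELINE_DAILY_COST = 50.0  # dollars per day without compression


def estimated_daily_cost(baseline: float, level: str) -> float:
    # Assumes the whole bill shrinks in proportion to the savings rate.
    return round(baseline * (1.0 - SAVINGS[level]), 2)


for level in SAVINGS:
    print(f"{level}: ${estimated_daily_cost(BASELINE_DAILY_COST, level):.2f}/day")
# low: $40.00/day
# medium: $25.00/day
# high: $15.00/day
# max: $10.00/day
```

In practice output tokens are billed too and are not affected by compressing the request context, so actual savings on a full bill may be lower than this linear estimate.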

When to Adjust TokenJuice

  • Lower compression — when discussing complex projects where every detail matters
  • Higher compression — for casual conversations, quick queries, and routine tasks