Feature Guide
OpenHuman TokenJuice Explained — Save Up to 80% on Token Costs
2026-05-24 · 7 min read
TokenJuice is OpenHuman's built-in token compression engine. It intelligently reduces the number of tokens sent to LLM APIs, saving you up to 80% on API costs without sacrificing response quality.
How TokenJuice Works
Instead of sending raw context to the API, TokenJuice applies several compression strategies:
- Semantic deduplication — removes redundant information from memory context
- Smart summarization — compresses long conversation histories into concise summaries
- Priority pruning — keeps high-importance memories while trimming less relevant ones
- Structural compression — optimizes JSON and code formatting without losing meaning
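To make the strategies above concrete, here is a minimal Python sketch of three of them: deduplication, priority pruning, and structural compression. It is an illustration only, not TokenJuice's actual implementation. The `Memory` class, the `importance` score, the word-count token estimate, and the use of exact-match (rather than semantic) deduplication are all assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical memory item; not TokenJuice's real data model."""
    text: str
    importance: float  # assumed score, higher means more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def deduplicate(memories: list[Memory]) -> list[Memory]:
    # Exact-match dedup on case- and whitespace-normalized text.
    # A semantic version would compare embeddings instead.
    seen: set[str] = set()
    unique = []
    for m in memories:
        key = " ".join(m.text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(m)  # keep the first occurrence
    return unique


def prune_by_priority(memories: list[Memory], budget: int) -> list[Memory]:
    # Greedily keep the most important memories that fit the token budget,
    # then return them in their original order.
    kept = []
    used = 0
    for idx, m in sorted(enumerate(memories), key=lambda p: -p[1].importance):
        cost = estimate_tokens(m.text)
        if used + cost <= budget:
            kept.append(idx)
            used += cost
    return [memories[i] for i in sorted(kept)]


def compact_json(raw: str) -> str:
    # Structural compression: drop insignificant whitespace, keep the data.
    return json.dumps(json.loads(raw), separators=(",", ":"))
```

Running `deduplicate` before `prune_by_priority` means the token budget is never spent on repeated items, which is why the two steps are kept separate in this sketch.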
Configuration
TokenJuice is enabled by default. Configure it in config.toml:
    [tokenjuice]
    enabled = true
    compression_level = "medium"  # low, medium, high, max
    max_context_tokens = 32000
    preserve_recent = true
Compression Levels
- Low (~20% savings) — minimal compression, best for critical conversations
- Medium (~50% savings) — balanced compression, recommended default
- High (~70% savings) — aggressive compression, good for routine queries
- Max (~80% savings) — maximum compression, for high-volume, low-criticality use
Real-World Savings
A user sending 1,000 API requests per day to GPT-4o at the default context size would spend approximately $50/day. With TokenJuice at medium compression (~50% savings), the same usage drops to about $25/day; at max compression (~80% savings), to about $10/day.
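The arithmetic behind these figures is a straight proportional reduction. The sketch below assumes, for simplicity, that the whole $50/day bill scales linearly with the number of tokens sent. `estimated_daily_cost` is a hypothetical helper, and the savings fractions are the approximate values listed under Compression Levels.

```python
def estimated_daily_cost(baseline_usd: float, savings_fraction: float) -> float:
    """Daily cost after compression, assuming cost scales linearly with tokens sent."""
    if not 0.0 <= savings_fraction <= 1.0:
        raise ValueError("savings_fraction must be between 0 and 1")
    return baseline_usd * (1.0 - savings_fraction)


baseline = 50.0  # USD per day, from the example above
levels = [("low", 0.20), ("medium", 0.50), ("high", 0.70), ("max", 0.80)]
for level, savings in levels:
    print(f"{level}: ${estimated_daily_cost(baseline, savings):.2f}/day")
```

Under these assumptions the four levels come out to roughly $40, $25, $15, and $10 per day.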
When to Adjust TokenJuice
- Lower compression — when discussing complex projects where every detail matters
- Higher compression — for casual conversations, quick queries, and routine tasks