OpenHuman Guide

Model Setup

OpenHuman Ollama Setup — Run Local LLMs for Free, No API Costs

2026-05-25~8 min read

One of OpenHuman's most powerful features is the ability to run entirely offline using local LLMs through Ollama. This gives you complete privacy, zero API costs, and full offline capability.

What is Ollama?

Ollama is a tool that lets you run large language models on your own hardware. It supports dozens of open-source models including Llama 3, Mistral, Qwen, DeepSeek, and many more. When paired with OpenHuman, you get a fully local AI assistant.

Installation

Install Ollama

Download from ollama.com and install:

  • macOS: Download the .dmg and drag to Applications
  • Linux: curl -fsSL https://ollama.com/install.sh | sh
  • Windows: Download and run the installer

Download a Model

Open a terminal and pull a model. For a balance of quality and speed:

ollama pull llama3.2:3b

For better results with more RAM:

ollama pull llama3.1:8b

Configure OpenHuman to Use Ollama

Edit config.toml to point OpenHuman to your local Ollama instance:

[models.ollama] provider = "openai" api_key = "ollama" base_url = "http://localhost:11434/v1" model = "llama3.2" [model_routing] fast_model = "ollama" reasoning_model = "ollama"

Setting Ollama as Your Only Model

If you want to run entirely offline with no API dependency:

[models] [models.ollama] provider = "openai" api_key = "ollama" base_url = "http://localhost:11434/v1" model = "llama3.2" [model_routing] reasoning_model = "ollama" fast_model = "ollama" vision_model = ""

Performance Tips

  • RAM matters: 3B models run on 4GB RAM, 8B needs 8GB+, 70B needs 32GB+
  • GPU acceleration: Ollama automatically uses NVIDIA GPUs (CUDA) or Apple Metal
  • Reduce context: In TokenJuice settings, set max_context_tokens = 8192 for smaller models
  • Quantized models: Use Q4 or Q5 quantizations for faster inference on consumer hardware

Pros and Cons

AspectLocal (Ollama)Cloud API
CostFreePay per token
Privacy100% localData leaves device
Offline❌ Requires internet
SpeedDepends on hardwareFast (GPU clusters)
QualitySmaller modelsFrontier models

Troubleshooting

Connection refused

Make sure Ollama is running (ollama serve). Check that the base_url ports match.

Slow responses

Try a smaller model (3B instead of 8B). Close other applications. Reduce context window in TokenJuice settings.