OpenHuman Ollama Setup — Run Local LLMs for Free, No API Costs

One of OpenHuman's most powerful features is the ability to run entirely offline using local LLMs through Ollama. This gives you complete privacy, zero API costs, and full offline capability.

What is Ollama?

Ollama is a tool that lets you run large language models on your own hardware. It supports dozens of open-source models including Llama 3, Mistral, Qwen, DeepSeek, and many more. When paired with OpenHuman, you get a fully local AI assistant.

Installation

Install Ollama

Download from ollama.com and install:

macOS: Download the .dmg and drag to Applications
Linux: curl -fsSL https://ollama.com/install.sh | sh
Windows: Download and run the installer

Download a Model

Open a terminal and pull a model. For a balance of quality and speed:

ollama pull llama3.2:3b

For better results with more RAM:

ollama pull llama3.1:8b

Configure OpenHuman to Use Ollama

Edit config.toml to point OpenHuman to your local Ollama instance:

[models.ollama] provider = "openai" api_key = "ollama" base_url = "http://localhost:11434/v1" model = "llama3.2" [model_routing] fast_model = "ollama" reasoning_model = "ollama"

Setting Ollama as Your Only Model

If you want to run entirely offline with no API dependency:

[models] [models.ollama] provider = "openai" api_key = "ollama" base_url = "http://localhost:11434/v1" model = "llama3.2" [model_routing] reasoning_model = "ollama" fast_model = "ollama" vision_model = ""

Performance Tips

RAM matters: 3B models run on 4GB RAM, 8B needs 8GB+, 70B needs 32GB+
GPU acceleration: Ollama automatically uses NVIDIA GPUs (CUDA) or Apple Metal
Reduce context: In TokenJuice settings, set max_context_tokens = 8192 for smaller models
Quantized models: Use Q4 or Q5 quantizations for faster inference on consumer hardware

Pros and Cons

Aspect	Local (Ollama)	Cloud API
Cost	Free	Pay per token
Privacy	100% local	Data leaves device
Offline	✅	❌ Requires internet
Speed	Depends on hardware	Fast (GPU clusters)
Quality	Smaller models	Frontier models

Troubleshooting

Connection refused

Make sure Ollama is running (ollama serve). Check that the base_url ports match.

Slow responses

Try a smaller model (3B instead of 8B). Close other applications. Reduce context window in TokenJuice settings.