Models

Harness runs a stack of small, specialized models on your device for screen understanding and memory, and routes to frontier language models for reasoning. The on-device models are what let your screen and your memory stay local.

On-device models

These run in your browser through transformers.js and onnxruntime-web, using WebGPU where it is available and falling back to WASM where it is not. Nothing they process leaves your machine.
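As a rough illustration of the WebGPU-then-WASM fallback, the sketch below picks a backend from a navigator-like object. The names `Backend`, `NavigatorLike`, and `pickBackend` are hypothetical and are not taken from Harness's code; the only fact relied on is that browsers with WebGPU expose it as `navigator.gpu`.

```typescript
// Hypothetical helper for illustration only; not Harness's actual code.
type Backend = "webgpu" | "wasm";

// Minimal structural type so the sketch also compiles outside a browser.
interface NavigatorLike {
  gpu?: unknown;
}

function pickBackend(nav: NavigatorLike | undefined): Backend {
  // Browsers with WebGPU expose it as `navigator.gpu`; anything else falls back to WASM.
  const hasWebGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? "webgpu" : "wasm";
}
```

In a browser this would be called with the global `navigator` object. A more careful check would also await `navigator.gpu.requestAdapter()`, which can resolve to `null` even when `navigator.gpu` exists. The resulting string could then be handed to whatever device setting the runtime exposes, such as the `device` option in transformers.js v3.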

Role	Model	Notes
Vision embedding	CLIP ViT-B/16	Used for scene-change gating and visual search
OCR	PaddleOCR PP-OCRv5	INT8 path for weaker hardware
Dense-text embedding	harness-embedder-v1	Fine-tuned by Harness; used for semantic screen search
Memory embedding	bge-base-en-v1.5	Used for semantic recall over your memories
Retrieval reranker	harness-reranker-v1	Fine-tuned by Harness; used for reranking recall results
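Scene-change gating with a vision embedding can be pictured as comparing the embeddings of consecutive frames and doing further work only when they diverge. The sketch below is one plausible way to do that, not a description of how Harness implements it. The function names and the 0.95 threshold are assumptions chosen for illustration.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything rather than dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical gate: true when the new frame differs enough to be reprocessed.
// The default threshold is an illustrative value, not a documented setting.
function sceneChanged(prev: number[] | null, next: number[], threshold = 0.95): boolean {
  if (prev === null) return true; // the first frame always counts as a change
  return cosineSimilarity(prev, next) < threshold;
}
```

A gate like this lets costlier steps such as OCR be skipped while the screen is visually unchanged.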

Fine-tuned models

Harness trained and evaluated its own retrieval models, and they run in the pipeline today:

harness-embedder-v1 is the dense screen-retrieval embedder, fine-tuned to make semantic screen search sharper than an off-the-shelf embedder.
harness-reranker-v1 is the episodic-recall reranker, fine-tuned to order recall results by what you actually want back.

Both are published on Hugging Face and run on-device like the rest of the stack.
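An embedder and a reranker are commonly combined in two stages: the embedder shortlists candidates by vector similarity, then the reranker reorders that shortlist. The sketch below shows that general pattern only. The source does not say how Harness wires its models together, so `Candidate`, `retrieveThenRerank`, and the `rerankScore` callback are hypothetical names, and the embeddings are assumed to be L2-normalized.

```typescript
interface Candidate {
  id: string;
  text: string;
  embedding: number[];
}

// Dot product; equals cosine similarity when both vectors are L2-normalized (assumed here).
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) sum += a[i] * b[i];
  return sum;
}

// Stage 1 keeps the top-k candidates by embedding similarity.
// Stage 2 reorders that shortlist with a caller-supplied (hypothetical) reranker score.
function retrieveThenRerank(
  queryEmbedding: number[],
  candidates: Candidate[],
  k: number,
  rerankScore: (candidate: Candidate) => number,
): Candidate[] {
  const shortlist = candidates
    .map((c) => ({ c, score: dot(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
  return shortlist
    .map((c) => ({ c, score: rerankScore(c) }))
    .sort((x, y) => y.score - x.score)
    .map((x) => x.c);
}
```

The split keeps the expensive scoring step confined to a short list, since a reranker typically scores each query and candidate pair individually.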

Language models

For language reasoning, Harness routes to frontier models through providers rather than running the large models on your device. You choose how that inference runs: a local model, your own Venice or Bankr key, or managed inference. The heavier deep-agent reasoning path routes to stronger frontier models when a turn calls for it. See The deep-agent brain and Plans.
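The three inference choices can be modeled as a discriminated union, which forces every mode to be handled explicitly. This is a sketch for illustration only: `InferenceMode`, its field names, and `describeMode` are hypothetical and do not reflect Harness's actual configuration format.

```typescript
// Hypothetical shape for the three inference choices; field names are illustrative.
type InferenceMode =
  | { kind: "local"; model: string }
  | { kind: "own-key"; provider: "venice" | "bankr"; apiKey: string }
  | { kind: "managed" };

// Exhaustive switch: adding a new mode without handling it here becomes a compile error.
function describeMode(mode: InferenceMode): string {
  switch (mode.kind) {
    case "local":
      return `local model ${mode.model}`;
    case "own-key":
      return `${mode.provider} with your own key`;
    case "managed":
      return "managed inference";
  }
}
```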
