Aspen Documentation

Everything you need to run private, local AI on your own machine — install, choose a model, use tools, and connect the developer API.

Last updated 2026-06-25

Overview

Aspen is private AI that runs entirely on your own hardware. Instead of sending your prompts to a company's servers, Aspen runs an open large language model locally and gives you a clean app to chat with it. Nothing leaves your device unless you explicitly turn on an online tool such as web search.

Aspen has three parts that work together: the model (an open LLM such as Llama, Qwen, DeepSeek, or Mistral that runs on your machine), a local gateway (an OpenAI-compatible server on your computer that handles requests, tools, and memory), and the apps (desktop for Mac and Windows, plus a free iPhone app that connects back to your own machine).

There is no cloud and no account. Core chat and coding work fully offline once a model is downloaded.

Install

Mac and Windows

Download the free app from runonaspen.com and open it. No terminal or configuration is required. On first launch, Aspen detects your hardware and recommends a model to download.

Prefer the command line? One command installs everything and adds Aspen to your apps menu:

curl -fsSL https://runonaspen.com/install.sh | sh

Windows may show a "Windows protected your PC" warning on first run because Aspen is from an independent developer. Click More info, then Run anyway. It is safe.

iPhone

Install "Aspen Local AI" free from the App Store. The phone app connects to the AI running on your own computer, so you can use your private models from anywhere.

Quickstart

1. Open Aspen. It detects your hardware and suggests a model that fits.
2. Let the recommended model download (one time). Smaller models download and load faster.
3. Type a question in the chat box and press Enter. The reply streams back, generated on your machine.
4. Try voice, attach an image, or ask it to write and run code, all locally.

That is the whole setup. Everything after this is optional configuration for power users and developers.

Choosing a model

Aspen runs the latest open models and shows a library you can browse in Settings. Each model lists its size and the memory it needs, and Aspen flags any that may be too large for your machine.

Rough hardware guide

8GB RAM: small models around 3B parameters.
16GB RAM: 7–8B models, a great all-round sweet spot.
32GB RAM: 13–14B models.
64GB+ RAM: 30B+ models for the strongest local quality.
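
The guide above can be read as a simple lookup from installed RAM to a model size class. The sketch below is only an illustration of that reading: the thresholds are copied from the list, while the function name and return strings are made up here and are not part of Aspen, which makes its own recommendation on first launch.

```python
# Thresholds copied from the rough hardware guide; names are illustrative only.
RAM_GUIDE = [
    (64, "30B+"),
    (32, "13-14B"),
    (16, "7-8B"),
    (8, "around 3B"),
]


def suggest_model_size(ram_gb: float) -> str:
    """Return the largest model class the guide lists for this much RAM."""
    for min_ram_gb, size_class in RAM_GUIDE:
        if ram_gb >= min_ram_gb:
            return size_class
    return "below the guide's smallest tier"


print(suggest_model_size(16))  # 7-8B
print(suggest_model_size(24))  # 7-8B (falls back to the 16GB tier)
```

A machine that sits between two tiers falls back to the lower one, which is the conservative choice when other apps are also using memory.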

For most people a mid-sized Qwen or Llama model is the best default for chat, tool use, and coding. Aspen can update to a newer, better model automatically as the open ecosystem improves.

Quantized models use less memory and run faster with little quality loss — Aspen uses sensible quantized versions by default.
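
To see why quantization matters, a back-of-the-envelope estimate helps: the weights alone take roughly (number of parameters) x (bits per weight) / 8 bytes. The sketch below applies that arithmetic. It is an approximation, not Aspen's own sizing logic; real memory use is higher because of the context cache and runtime overhead, and the exact quantization formats Aspen ships are not specified on this page.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare one 7B model at three common precisions.
for bits in (16, 8, 4):
    print(f"7B at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

By this estimate a 7B model drops from about 14 GB at 16-bit to about 3.5 GB at 4-bit, which is consistent with 7–8B models fitting comfortably on a 16GB machine.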

Chat & artifacts

The chat works like any modern AI assistant: ask questions, brainstorm, write and edit text, or get coding help. Responses stream in real time, generated locally.

When you ask Aspen to build something on the web — a page, a small app, a visualization — it renders a live artifact with a preview panel right in the chat, so you can see and run the result immediately.

Voice

Aspen includes a hands-free voice mode with a natural neural voice. Speak your question and hear the answer back, with speech handled on your machine.

Vision

Attach a photo or screenshot and Aspen reads it with a local vision model. Ask it to describe an image, critique a design, extract text, or explain a chart. The image never leaves your machine.

Tools

Tools let your local model do things beyond chat. Every tool runs on your own computer and uses your own network; nothing is routed through Aspen's servers. Toggle each one in Settings.

Web search — current information from the live web, with the source cited. Runs from your machine and your IP.
Read web page — fetch and read the text of a specific URL.
Run commands — execute shell commands to clone repos, read and write files, and run scripts (works best with larger models).
Download files — fetch a file to work with locally.
Calculator and date/time — quick deterministic helpers.
Git — clone, status, and commit/push helpers.

For real-time questions (weather, news, prices), enable Web Search so the model answers from live results instead of memory.

Memory (World Model)

Aspen can build a "World Model" — a set of facts about you, learned from your conversations, that makes its answers more personal and context-aware. It is stored as a plain file on your own computer.

After each conversation, your local model quietly extracts useful facts (name, job, preferences, projects) and prepends them to new chats so the AI remembers who you are. You can view, edit, or delete any fact at any time.
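
Conceptually, prepending facts to a new chat can be as simple as placing them in a system message ahead of the user's first message. The sketch below shows that idea only. The function name, the wording of the system message, and the sample facts are all hypothetical; this page does not document Aspen's actual file format or prompt layout.

```python
from typing import Dict, List

Message = Dict[str, str]


def prepend_facts(facts: List[str], messages: List[Message]) -> List[Message]:
    """Return a new message list with stored facts first, as a system message."""
    if not facts:
        return list(messages)  # nothing remembered yet, leave the chat as-is
    summary = "Known facts about the user:\n" + "\n".join(f"- {fact}" for fact in facts)
    return [{"role": "system", "content": summary}] + list(messages)


# Made-up example facts, standing in for whatever the local model extracted.
facts = ["Name: Sam", "Prefers concise answers"]
chat = prepend_facts(facts, [{"role": "user", "content": "Plan my week"}])
print(chat[0]["content"])
```

Because the facts travel as ordinary text in the prompt, editing or deleting a fact changes what the next chat sees.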

100% local — these facts never leave your machine and are never sent to any server.

Privacy

Privacy is the whole point. The model runs on your hardware, there is no server in the middle, and your conversations are never transmitted or used for training.

The only time anything touches the network is when you explicitly use an online tool such as web search — and that request goes out from your own machine and IP, not through Aspen. You stay in control.

Developer API

Aspen runs a local gateway that speaks the OpenAI API format. Point any OpenAI-style client at your Aspen endpoint and it works unchanged — your tools now run against your own private AI.

Endpoints

Same machine: http://localhost:4000/v1
From anywhere: a private HTTPS tunnel URL Aspen can generate for you, so your phone and other apps reach your machine securely.
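
Since the two endpoints differ only in their base URL, client code can pick one at startup and leave everything else unchanged. The sketch below is one way to do that. The ASPEN_BASE_URL variable is a hypothetical name chosen for this example (Aspen does not read it), and rejecting non-HTTPS remote URLs is a design choice of the sketch, based on the tunnel being described as HTTPS.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

LOCAL_BASE_URL = "http://localhost:4000/v1"  # same-machine endpoint listed above


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Use ASPEN_BASE_URL (hypothetical variable) if set, else the local endpoint."""
    url = env.get("ASPEN_BASE_URL", "").strip()
    if not url:
        return LOCAL_BASE_URL
    parsed = urlparse(url)
    is_local = parsed.hostname in ("localhost", "127.0.0.1")
    # Plain HTTP is only acceptable when the request never leaves this machine.
    if parsed.scheme != "https" and not is_local:
        raise ValueError("remote endpoints should use HTTPS")
    return url.rstrip("/")


print(resolve_base_url({}))  # http://localhost:4000/v1
```

The returned string can be passed as the base URL of any OpenAI-style client.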

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="YOUR-ASPEN-KEY",
)

resp = client.chat.completions.create(
    model="local",   # the model name shown in the Aspen app
    messages=[{"role": "user", "content": "Hello from my own machine"}],
)
print(resp.choices[0].message.content)
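
The call above waits for the complete reply. OpenAI-style servers can also stream a reply as server-sent events when the request sets stream to true. Whether the Aspen gateway supports this over the API is an assumption here; this page only states that the app streams. The sketch below parses that wire format from a list of sample lines, so it runs without a server. The function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker in the OpenAI format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Sample lines shaped like OpenAI streaming chunks (not captured from Aspen).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello
```

With the OpenAI Python SDK the same effect comes from passing stream=True to client.chat.completions.create and reading chunk.choices[0].delta.content from each chunk.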

JavaScript (fetch)

const r = await fetch("http://localhost:4000/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR-ASPEN-KEY",
  },
  body: JSON.stringify({
    model: "local",
    messages: [{ role: "user", content: "Hello" }],
  }),
});
const data = await r.json();
console.log(data.choices[0].message.content);

API key tiers

Owner: full access, including computer use and shared memory. Use it only on devices that belong to you.
Family / member: its own private memory plus safe tools; no computer use.
Anonymous guest: chat and safe tools only, ephemeral, safe to share widely.
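
The tiers above can be summarized as a small capability table. The sketch below is only a reading aid: the field names and tier keys are invented for this example, and the "none" memory label for guests is inferred from the word "ephemeral" rather than stated on this page. It is not how Aspen represents keys internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyTier:
    computer_use: bool
    memory: str  # "shared", "private", or "none" (illustrative labels)
    ephemeral: bool


TIERS = {
    "owner": KeyTier(computer_use=True, memory="shared", ephemeral=False),
    "member": KeyTier(computer_use=False, memory="private", ephemeral=False),
    "guest": KeyTier(computer_use=False, memory="none", ephemeral=True),
}


def may_use_computer(tier: str) -> bool:
    """Unknown tier names are treated as having no computer-use access."""
    tier_info = TIERS.get(tier)
    return tier_info is not None and tier_info.computer_use


print(may_use_computer("owner"))  # True
print(may_use_computer("guest"))  # False
```

Denying by default for unknown tiers mirrors the intent of the list: only the owner key should be able to control the computer.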

Aspen works with the OpenAI and Anthropic SDKs, LangChain, Cursor, Continue.dev, n8n, Zapier, and similar tools — anything that accepts a custom base URL and key.

Connectors (MCP)

Aspen supports connectors built on the open Model Context Protocol (MCP), letting your local AI work with services like GitHub. Access tokens are encrypted and stay on your device.

The Aspen device

The Aspen device is an optional, dedicated machine for running the largest models around the clock without using your own computer. You never need it to use Aspen — the free app runs well on a modern Mac or PC.

About 1 petaflop of AI performance
128GB unified memory
Runs models up to roughly 200B parameters
Silent and always on
About 5.9" x 5.9" x 2"

It is available by preorder with a $1 deposit.

Troubleshooting

Windows "protected your PC" warning

This warning is normal for a new app from an independent developer, and the installer is safe to run. Click More info, then Run anyway. If the download was blocked, right-click the file, choose Properties, check Unblock, then run it.

A model is slow or crashes

The model is probably large relative to your memory. Pick a smaller or more quantized model, close memory-heavy apps, or use a machine with more RAM. Aspen flags models that may be too big for your hardware.

It says it cannot get current information

Enable the Web Search tool in Settings so the model can answer real-time questions from the live web with cited sources.
