Your AI. On a USB drive. No cloud. No login. No surveillance.
Plug it into any machine on Earth — it wakes up knowing exactly who you are,
what you're working on, and where you left off. Dormant, not dead.
VECTOR is a self-contained AI runtime that lives on a USB drive. The model weights, the memory system, the runtime binary — all on the stick. Plug in, wake up, unplug, carry on.
| | Cloud AI | VECTOR |
|---|---|---|
| Data ownership | Theirs. You agreed to the ToS. | Yours. Physical media. |
| Memory | Wiped every session. | Permanent. Grows with you. |
| Access | Depends on their pricing page. | Unconditional. Forever. |
| Connectivity | Required. Always. | Never required. |
| Surveillance | Every token logged. | Zero. Nothing leaves the drive. |
| Shutdown risk | They can pull the plug. | You'd have to destroy the drive. |
Three layers. Runtime, model, memory. Everything the system needs to think, remember, and respond. Nothing else.
Self-contained binary per OS. Sets OLLAMA_MODELS env var to USB path. Starts local REST API on localhost:11434. Zero system install required.
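Because the runtime is just Ollama's standard REST API served from localhost, any stdlib HTTP client can talk to it. A minimal sketch against Ollama's documented /api/generate endpoint — the model name "mistral" is illustrative, and this assumes the on-drive server is already running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """JSON body for a single non-streaming generation call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST to the on-drive Ollama server and return the response text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(ask("mistral", "Say hello in five words."))
```

Nothing in that exchange leaves the machine: the request, the weights, and the response all live on localhost.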
4-bit quantized GGUF. ~4GB on disk. Loads to host RAM at startup. CPU-only inference — runs on any machine with 8GB+ RAM.
Persistent vector store writing directly to /memory/vectors/. Semantic retrieval. Survives every unplug. Gets denser with every session.
Not fake context stuffing. Real semantic long-term memory that survives unplugging, machine changes, and months between sessions.
Full current session history passed with every message. Coherent multi-turn reasoning within a session. Up to 32k tokens depending on model.
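Passing the full session history with every message means trimming once it approaches the context window. A sketch of the trimming step, with token cost approximated by whitespace words — a real implementation would use the model's tokenizer, and the 32k limit varies by model:

```python
def trim_history(messages, max_tokens=32_000):
    """Keep the most recent messages whose rough token total fits the window.
    Token cost is approximated as whitespace-separated words."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk newest-first
        cost = len(msg["content"].split())
        if total + cost > max_tokens:
            break                        # older messages fall off the front
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order
```

Older turns dropped here aren't lost — anything memorable has already been extracted into long-term memory.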
After every exchange, VECTOR calls itself to extract memorable facts. Embeds and writes to ChromaDB. Retrieved semantically on future sessions — even months later.
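The extraction pass can be sketched as two pieces: a prompt sent back to the local model, and a write into ChromaDB's persistent store. The prompt wording and the "facts" collection name are illustrative assumptions; the ChromaDB calls (`PersistentClient`, `get_or_create_collection`, `Collection.add`) are the library's actual API:

```python
EXTRACT_PROMPT = (
    "Extract durable facts about the user from this exchange as a JSON "
    "list of short strings. Return [] if nothing is worth remembering.\n\n{exchange}"
)

def extraction_prompt(exchange: str) -> str:
    """Prompt the model calls itself with after each exchange (wording illustrative)."""
    return EXTRACT_PROMPT.format(exchange=exchange)

def store_facts(facts, db_path="/memory/vectors"):
    """Embed and persist extracted facts on the drive. chromadb import is
    deferred so the rest of the module works without it installed."""
    import uuid
    import chromadb
    client = chromadb.PersistentClient(path=db_path)
    col = client.get_or_create_collection("facts")
    col.add(ids=[str(uuid.uuid4()) for _ in facts], documents=list(facts))
```

ChromaDB embeds the documents on insert and writes straight to the drive, which is why unplugging loses nothing.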
USB-A or USB-C. Windows auto-surfaces it in Explorer. Mac mounts it to /Volumes. Linux auto-mounts it, or mount it by hand with sudo. The drive is fully self-contained from this moment forward. Internet not consulted.
Double-click start.bat on Windows. Run ./start.sh on Mac/Linux. Launcher detects OS, sets OLLAMA_MODELS to the USB path, fires the Ollama server process pointing entirely at on-drive weights. ~15s on USB 3.1+.
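The launcher's whole job fits in a few lines: find the drive, point OLLAMA_MODELS at it, spawn the bundled server. A sketch — the bin/ and models/ directory layout on the drive is an assumption, but OLLAMA_MODELS is the real environment variable Ollama reads for its weights directory:

```python
import os
import platform
import subprocess
import sys
from pathlib import Path

def drive_root() -> Path:
    """The USB root is wherever this launcher lives."""
    return Path(__file__).resolve().parent

def ollama_env(root: Path) -> dict:
    """Server environment: point OLLAMA_MODELS at the on-drive weights."""
    env = dict(os.environ)
    env["OLLAMA_MODELS"] = str(root / "models")
    return env

def serve(root: Path) -> subprocess.Popen:
    """Start the bundled per-OS Ollama binary, serving entirely from the drive."""
    name = "ollama.exe" if platform.system() == "Windows" else "ollama"
    return subprocess.Popen([str(root / "bin" / name), "serve"], env=ollama_env(root))

if __name__ == "__main__":
    sys.exit(serve(drive_root()).wait())
```

Because the environment override is per-process, the host machine's own Ollama install (if any) is never touched.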
VECTOR reads profile.json, retrieves the last session summary, and greets you with context. "Hey Arjun — we left off on your RAG pipeline Tuesday. Continue?" That's not a gimmick. That's ChromaDB.
Every message fires semantic recall, injects memories, generates a response, then silently extracts new facts in the background. You just talk. It accumulates. Over time it knows your stack, your projects, your opinions, your shortcuts.
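The recall-and-inject step of that loop can be sketched as below. `collection.query(query_texts=..., n_results=...)` is ChromaDB's real query API; the injected prompt format is an illustrative assumption:

```python
def recall(collection, query: str, k: int = 4):
    """Semantic recall: the k stored facts nearest this message.
    `collection` is a ChromaDB collection; query_texts/n_results is its API."""
    hits = collection.query(query_texts=[query], n_results=k)
    return hits["documents"][0]

def inject(memories, user_msg: str) -> str:
    """Prepend retrieved memories so the model answers with context."""
    if not memories:
        return user_msg
    block = "\n".join(f"- {m}" for m in memories)
    return f"Known about the user:\n{block}\n\nUser: {user_msg}"
```

The user never sees this scaffolding — only the answer that happens to fit.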
Type 'exit'. Ollama shuts down cleanly. All memory already written — ChromaDB is synchronous. Pull the drive. VECTOR goes dormant. Carries everything to the next machine, the next city, the next session.
VECTOR knows you're on Python 3.12, FastAPI, deploying to Fly.io, and that you hate ORMs. No re-explaining your context every single session. Ask the question. Get the answer that fits.
50 conversations in — it knows your sentence length, your second-person default, your aversion to em dashes. No style guide. No prompt engineering. Just learned.
Ask "what did I decide about the auth system?" and it knows — because you mentioned it three sessions ago and it stored the fact. Not RAG on your documents. RAG on your mind.
Studying for a cert or going through a textbook? VECTOR tracks what confused you last time, what you got right, what to drill next. Adaptive learning with no app subscription.
These are the ideas that make you go "wait, that's actually possible?" — most of them are. Some of them are already being imagined by people who need them.
Investigative reporter in an authoritarian country. VECTOR on an encrypted USB. Interviews analyzed, sources protected, story drafted — all offline, all local. If seized, drive wiped in seconds. No cloud logs. No API calls to subpoena. The AI cannot testify against you because it has never touched a network.
Field clinic in a disaster zone. Zero connectivity. VECTOR pre-loaded with WHO protocols, drug interaction tables, and surgical procedure guides embedded as vector documents. Field surgeon asks "patient has X and Y — contraindicated meds?" Gets an answer in 3 seconds from a local model. Not replacing clinical judgment. Augmenting it under pressure with no wifi for 400 miles.
Run VECTOR every day for two years. It knows how you think across every domain. Set a system prompt: "respond as me." Now you have a synthetic self — not a chatbot pretending, but a model shaped by 2 years of your actual reasoning patterns. Let it answer emails while you sleep. Write first drafts that sound like you. The existential crisis is free of charge.
Solo sailor on a 6-month Pacific crossing. Antarctic research station. Moon base. VECTOR on a ruggedized drive. No Starlink budget. No comms. An intelligent companion that remembers your conversations from Day 1, watches mental health patterns across months, acts as therapist, navigator, logbook assistant — zero uplink required for any of it.
A 40-year-old SCADA system. The one engineer who understood every quirk just retired. His knowledge: gone. Now imagine VECTOR briefed by that engineer for 2 years before he left — every undocumented behavior, every tribal workaround. New operators plug it in. "Why does Tank 4 alarm at 3am on cold nights?" VECTOR knows. Because it was told. Institutional memory on a stick.
Spend one year briefing VECTOR about your life — beliefs, reasoning, relationships, fears, inside jokes. Seal the drive. Give it to your kid with instructions: open in 20 years. Not a video. Not a letter. An AI they can have an actual conversation with. Ask questions to. Argue with. The most intimate thing you could possibly leave behind. This one isn't a hack. It's just deeply human.
What if drives talked to each other? A peer-to-peer mesh of VECTORs — different users, fully opt-in — exchanging anonymized memory fragments over a local LAN or encrypted relay. You learn something useful. My VECTOR learns it too. No central server. No platform owner. No API key. Distributed collective intelligence where nobody is the product. The Fediverse, but for personal AI memory. This is either the future or a terrible idea. We need to build it to find out.
| Spec | Minimum | Recommended |
|---|---|---|
| Capacity | 32 GB | 128 GB |
| Standard | USB 3.0 | USB 3.2 Gen 2 |
| Read speed | 100 MB/s | 400+ MB/s |
| Form factor | USB-A | USB-C + adapter |
| Picks | — | SanDisk Extreme Pro, Samsung T7 |
| RAM | Model | Performance |
|---|---|---|
| 8 GB | Phi-3 Mini | Functional. Slow. |
| 16 GB | Mistral 7B | Sweet spot. |
| 32 GB | Llama 3 13B | Strong reasoning. |
| 64 GB+ | Llama 3 70B | Near GPT-4. |
M1/M2/M3/M4 Macs use unified memory — GPU and CPU share the same RAM pool. A 32GB MacBook Pro runs 30B+ models comfortably. VECTOR on Apple Silicon is the reference experience; everything else is merely acceptable.
Ollama + Mistral 7B // basic ChromaDB // CLI chat loop // profile.json // Mac + Linux
LLM-powered fact extraction // semantic retrieval // multi-session injection // deduplication
PyInstaller bundle // Windows support // GUI launcher // model downloader wizard
File system agent // sandboxed code execution // scheduled tasks // Playwright automation
Drive encryption // multi-model support // optional mesh sync // full docs // MIT licensed