Your AI. On a USB drive. No cloud. No login. No surveillance.
Plug it into any machine on Earth — it wakes up knowing exactly who you are,
what you're working on, and where you left off. Dormant, not dead.
VECTOR is a self-contained AI runtime that lives on a USB drive. The model weights, the memory system, the runtime binary — all on the stick. Plug in, wake up, unplug, carry on.
| | Cloud AI | VECTOR |
|---|---|---|
| Data ownership | Theirs. You agreed to the ToS. | Yours. Physical media. |
| Memory | Wiped every session. | Permanent. Grows with you. |
| Access | Depends on their pricing page. | Unconditional. Forever. |
| Connectivity | Required. Always. | Never required. |
| Surveillance | Every token logged. | Zero. Nothing leaves the drive. |
| Shutdown risk | They can pull the plug. | You'd have to destroy the drive. |
Three layers. Runtime, model, memory. Everything the system needs to think, remember, and respond. Nothing else.
Self-contained binary per OS. Sets OLLAMA_MODELS env var to USB path. Starts local REST API on localhost:11434. Zero system install required.
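Because the runtime is just Ollama's standard REST API served from localhost, any stdlib HTTP client can talk to it. A minimal sketch against Ollama's documented /api/generate endpoint — the model name "mistral" is illustrative, and this assumes the on-drive server is already running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """JSON body for a single non-streaming generation call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST to the on-drive Ollama server and return the response text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(ask("mistral", "Say hello in five words."))
```

Nothing in that exchange leaves the machine: the request, the weights, and the response all live on localhost.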
4-bit quantized GGUF. ~4GB on disk. Loads to host RAM at startup. CPU-only inference — runs on any machine with 8GB+ RAM.
Persistent vector store writing directly to /memory/vectors/. Semantic retrieval. Survives every unplug. Gets denser with every session.
Not fake context stuffing. Real semantic long-term memory that survives unplugging, machine changes, and months between sessions.
Full current session history passed with every message. Coherent multi-turn reasoning within a session. Up to 32k tokens depending on model.
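Passing the full session history with every message means trimming once it approaches the context window. A sketch of the trimming step, with token cost approximated by whitespace words — a real implementation would use the model's tokenizer, and the 32k limit varies by model:

```python
def trim_history(messages, max_tokens=32_000):
    """Keep the most recent messages whose rough token total fits the window.
    Token cost is approximated as whitespace-separated words."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk newest-first
        cost = len(msg["content"].split())
        if total + cost > max_tokens:
            break                        # older messages fall off the front
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order
```

Older turns dropped here aren't lost — anything memorable has already been extracted into long-term memory.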
After every exchange, VECTOR calls itself to extract memorable facts. Embeds and writes to ChromaDB. Retrieved semantically on future sessions — even months later.
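The extraction pass can be sketched as two pieces: a prompt sent back to the local model, and a write into ChromaDB's persistent store. The prompt wording and the "facts" collection name are illustrative assumptions; the ChromaDB calls (`PersistentClient`, `get_or_create_collection`, `Collection.add`) are the library's actual API:

```python
EXTRACT_PROMPT = (
    "Extract durable facts about the user from this exchange as a JSON "
    "list of short strings. Return [] if nothing is worth remembering.\n\n{exchange}"
)

def extraction_prompt(exchange: str) -> str:
    """Prompt the model calls itself with after each exchange (wording illustrative)."""
    return EXTRACT_PROMPT.format(exchange=exchange)

def store_facts(facts, db_path="/memory/vectors"):
    """Embed and persist extracted facts on the drive. chromadb import is
    deferred so the rest of the module works without it installed."""
    import uuid
    import chromadb
    client = chromadb.PersistentClient(path=db_path)
    col = client.get_or_create_collection("facts")
    col.add(ids=[str(uuid.uuid4()) for _ in facts], documents=list(facts))
```

ChromaDB embeds the documents on insert and writes straight to the drive, which is why unplugging loses nothing.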
USB-A or USB-C. Windows auto-surfaces it in Explorer. Mac mounts it to /Volumes. Linux auto-mounts it, or mount it by hand with sudo. The drive is fully self-contained from this moment forward. Internet not consulted.
Double-click start.bat on Windows. Run ./start.sh on Mac/Linux. Launcher detects OS, sets OLLAMA_MODELS to the USB path, fires the Ollama server process pointing entirely at on-drive weights. ~15s on USB 3.1+.
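The launcher's whole job fits in a few lines: find the drive, point OLLAMA_MODELS at it, spawn the bundled server. A sketch — the bin/ and models/ directory layout on the drive is an assumption, but OLLAMA_MODELS is the real environment variable Ollama reads for its weights directory:

```python
import os
import platform
import subprocess
import sys
from pathlib import Path

def drive_root() -> Path:
    """The USB root is wherever this launcher lives."""
    return Path(__file__).resolve().parent

def ollama_env(root: Path) -> dict:
    """Server environment: point OLLAMA_MODELS at the on-drive weights."""
    env = dict(os.environ)
    env["OLLAMA_MODELS"] = str(root / "models")
    return env

def serve(root: Path) -> subprocess.Popen:
    """Start the bundled per-OS Ollama binary, serving entirely from the drive."""
    name = "ollama.exe" if platform.system() == "Windows" else "ollama"
    return subprocess.Popen([str(root / "bin" / name), "serve"], env=ollama_env(root))

if __name__ == "__main__":
    sys.exit(serve(drive_root()).wait())
```

Because the environment override is per-process, the host machine's own Ollama install (if any) is never touched.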
VECTOR reads profile.json, retrieves the last session summary, and greets you with context. "Hey Arjun — we left off on your RAG pipeline Tuesday. Continue?" That's not a gimmick. That's ChromaDB.
Every message fires semantic recall, injects memories, generates a response, then silently extracts new facts in the background. You just talk. It accumulates. Over time it knows your stack, your projects, your opinions, your shortcuts.
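The recall-and-inject step of that loop can be sketched as below. `collection.query(query_texts=..., n_results=...)` is ChromaDB's real query API; the injected prompt format is an illustrative assumption:

```python
def recall(collection, query: str, k: int = 4):
    """Semantic recall: the k stored facts nearest this message.
    `collection` is a ChromaDB collection; query_texts/n_results is its API."""
    hits = collection.query(query_texts=[query], n_results=k)
    return hits["documents"][0]

def inject(memories, user_msg: str) -> str:
    """Prepend retrieved memories so the model answers with context."""
    if not memories:
        return user_msg
    block = "\n".join(f"- {m}" for m in memories)
    return f"Known about the user:\n{block}\n\nUser: {user_msg}"
```

The user never sees this scaffolding — only the answer that happens to fit.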
Type 'exit'. Ollama shuts down cleanly. All memory already written — ChromaDB is synchronous. Pull the drive. VECTOR goes dormant. Carries everything to the next machine, the next city, the next session.
VECTOR knows you're on Python 3.12, FastAPI, deploying to Fly.io, and that you hate ORMs. No re-explaining your context every single session. Ask the question. Get the answer that fits.
50 conversations in — it knows your sentence length, your second-person default, your aversion to em dashes. No style guide. No prompt engineering. Just learned.
Ask "what did I decide about the auth system?" and it knows — because you mentioned it three sessions ago and it stored the fact. Not RAG on your documents. RAG on your mind.
Studying for a cert or going through a textbook? VECTOR tracks what confused you last time, what you got right, what to drill next. Adaptive learning with no app subscription.
These are the ideas that make you go "wait, that's actually possible?" — most of them are. Some of them are already being imagined by people who need them.
Investigative reporter in an authoritarian country. VECTOR on an encrypted USB. Interviews analyzed, sources protected, story drafted — all offline, all local. If seized, drive wiped in seconds. No cloud logs. No API calls to subpoena. The AI cannot testify against you because it has never touched a network.
Field clinic in a disaster zone. Zero connectivity. VECTOR pre-loaded with WHO protocols, drug interaction tables, and surgical procedure guides embedded as vector documents. Field surgeon asks "patient has X and Y — contraindicated meds?" Gets an answer in 3 seconds from a local model. Not replacing clinical judgment. Augmenting it under pressure with no wifi for 400 miles.
Run VECTOR every day for two years. It knows how you think across every domain. Set a system prompt: "respond as me." Now you have a synthetic self — not a chatbot pretending, but a model shaped by 2 years of your actual reasoning patterns. Let it answer emails while you sleep. Write first drafts that sound like you. The existential crisis is free of charge.
Solo sailor on a 6-month Pacific crossing. Antarctic research station. Moon base. VECTOR on a ruggedized drive. No Starlink budget. No comms. An intelligent companion that remembers your conversations from Day 1, watches mental health patterns across months, acts as therapist, navigator, logbook assistant — zero uplink required for any of it.
A 40-year-old SCADA system. The one engineer who understood every quirk just retired. His knowledge: gone. Now imagine VECTOR briefed by that engineer for 2 years before he left — every undocumented behavior, every tribal workaround. New operators plug it in. "Why does Tank 4 alarm at 3am on cold nights?" VECTOR knows. Because it was told. Institutional memory on a stick.
Spend one year briefing VECTOR about your life — beliefs, reasoning, relationships, fears, inside jokes. Seal the drive. Give it to your kid with instructions: open in 20 years. Not a video. Not a letter. An AI they can have an actual conversation with. Ask questions to. Argue with. The most intimate thing you could possibly leave behind. This one isn't a hack. It's just deeply human.
What if drives talked to each other? A peer-to-peer mesh of VECTORs — different users, fully opt-in — exchanging anonymized memory fragments over a local LAN or encrypted relay. You learn something useful. My VECTOR learns it too. No central server. No platform owner. No API key. Distributed collective intelligence where nobody is the product. The Fediverse, but for personal AI memory. This is either the future or a terrible idea. We need to build it to find out.
| Spec | Minimum | Recommended |
|---|---|---|
| Capacity | 32 GB | 128 GB |
| Standard | USB 3.0 | USB 3.2 Gen 2 |
| Read speed | 100 MB/s | 400+ MB/s |
| Form factor | USB-A | USB-C + adapter |
| Picks | — | SanDisk Extreme Pro, Samsung T7 |
| RAM | Model | Performance |
|---|---|---|
| 8 GB | Phi-3 Mini | Functional. Slow. |
| 16 GB | Mistral 7B | Sweet spot. |
| 32 GB | Llama 3 13B | Strong reasoning. |
| 64 GB+ | Llama 3 70B | Near GPT-4. |
M1/M2/M3/M4 Macs use unified memory — GPU and CPU share the same RAM pool. A 32GB MacBook Pro runs 30B+ models comfortably. VECTOR on Apple Silicon is the reference experience; everything else is merely acceptable.
Ollama + Mistral 7B // basic ChromaDB // CLI chat loop // profile.json // Mac + Linux
LLM-powered fact extraction // semantic retrieval // multi-session injection // deduplication
PyInstaller bundle // Windows support // GUI launcher // model downloader wizard
File system agent // sandboxed code execution // scheduled tasks // Playwright automation
Drive encryption // multi-model support // optional mesh sync // full docs // MIT licensed