Build it by voice.
Watch your AI drive it.
Voice Mirror is a voice-native IDE. Describe what you want, watch it render live in the App Preview, and the in-app AI sees and drives the running app — clicking, typing, and reading it to fix its own work.
unsigned alpha build — SmartScreen: “More info” → “Run anyway” · all releases
This is Voice Mirror
A full IDE, a live App Preview, and a voice-driven AI agent — in one native Rust window.
The loop
voice → build → see → fix
- 01 / voice
Say what you want
Wake word, push-to-talk, or call mode. Describe the app or the change in plain words.
- 02 / build
It writes & runs it
The AI edits the code, starts the dev server, and launches your app live in the App Preview.
- 03 / see
You both watch it run
One live surface, true to size. The AI reads the same running app you're looking at.
- 04 / fix
It drives & repairs
It clicks and types through the app, catches what's broken, and fixes the code — then speaks back.
One surface. Two engines.
Voice Mirror exposes every running app as the same accessibility tree of @ref handles. The AI clicks and types by ref — and never has to know whether it's driving a web app or a native window.
Web, Tauri & Electron
CDP
Streamed live over the Chrome DevTools Protocol, the same channel that drives Voice Mirror's browser automation.
Native Windows apps
UI Automation
Notepad, Calculator, Settings, and Win32/WPF/Qt apps, all driven through UI Automation with the very same tools and refs.
$ same tools · same @refs · the AI can't tell which engine is underneath — Windows-first during alpha
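To make the "one surface, two engines" idea concrete, here is a minimal sketch of an agent routine that acts only on refs. It is an illustration in Python, not Voice Mirror's Rust code: the `Engine` interface, the `FakeEngine` stand-in, the `@e1` ref format, and every function name are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Node:
    ref: str   # handle the agent acts on; the "@e1" format is an assumption
    role: str  # e.g. "button", "textbox"
    name: str  # accessible name


class Engine(Protocol):
    """What the agent needs from any backend (hypothetical interface)."""

    def snapshot(self) -> list[Node]: ...
    def click(self, ref: str) -> None: ...
    def type_text(self, ref: str, text: str) -> None: ...


class FakeEngine:
    """In-memory stand-in for a real backend such as CDP or UI Automation."""

    def __init__(self, label: str, elements: list[tuple[str, str]]) -> None:
        self.label = label
        self.actions: list[str] = []  # records what was driven, for inspection
        self._nodes = {
            f"@e{i}": Node(f"@e{i}", role, name)
            for i, (role, name) in enumerate(elements, start=1)
        }

    def snapshot(self) -> list[Node]:
        return list(self._nodes.values())

    def click(self, ref: str) -> None:
        self.actions.append(f"click {self._nodes[ref].name}")

    def type_text(self, ref: str, text: str) -> None:
        self.actions.append(f"type {text!r} into {self._nodes[ref].name}")


def find_ref(engine: Engine, role: str, name: str) -> str:
    """Look up a ref by role and accessible name in the engine's snapshot."""
    for node in engine.snapshot():
        if node.role == role and node.name == name:
            return node.ref
    raise LookupError(f"no {role} named {name!r}")


def submit_greeting(engine: Engine) -> None:
    """Agent-side routine: identical calls whichever engine is underneath."""
    engine.type_text(find_ref(engine, "textbox", "Message"), "hello")
    engine.click(find_ref(engine, "button", "Send"))


# Two engines exposing the same elements, driven by the same routine.
elements = [("textbox", "Message"), ("button", "Send")]
web = FakeEngine("cdp", elements)
native = FakeEngine("uia", elements)
for engine in (web, native):
    submit_greeting(engine)
```

`submit_greeting` never inspects `engine.label`, so the recorded actions come out identical for both stand-ins. A real backend would translate each ref into a CDP call or a UI Automation call behind the same three methods.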
A full agent, not a chat box
The see-and-drive loop rides on a complete desktop agent: it hears you, sees your screen, runs your terminal, and remembers across sessions.
Voice control
Wake word ("Hey Claude"), push-to-talk, or always-on call mode. Local Whisper in, local Kokoro out — Edge voices when you're online.
Built-in IDE
CodeMirror editor, file tree, command palette, language-server smarts, integrated terminals, and a dev-server manager.
Screen awareness
"What's this error on my screen?" — it captures your display and reasons about whatever you're looking at.
Browser automation
Real CDP control — navigate, click, fill, screenshot, read the DOM. Not "search and summarize."
Persistent memory
A three-tier memory with hybrid semantic + keyword search. Preferences and project context carry across sessions.
45 MCP tools
Five groups — core, capture, memory, browser, n8n — loaded on demand for any MCP-aware agent.
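The hybrid semantic + keyword search mentioned under Persistent memory can be pictured with a small sketch. This is an illustration in Python under stated assumptions, not the app's implementation: the weighted blend controlled by `alpha`, the word-overlap keyword score, and the toy two-dimensional vectors standing in for real embeddings are all made up for the example, and the three memory tiers are not modeled.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(
    query: str,
    query_vec: list[float],
    memories: list[tuple[str, list[float]]],
    alpha: float = 0.5,
) -> list[str]:
    """Rank memories by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in memories
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked]


# Toy 2-d vectors stand in for real embeddings (values invented for illustration).
memories = [
    ("user prefers dark theme", [0.9, 0.1]),
    ("project uses pnpm for installs", [0.1, 0.9]),
    ("editor font size is 14", [0.7, 0.3]),
]
results = hybrid_search("dark theme", [1.0, 0.0], memories)
```

With these toy values the exact-match memory ranks first because it scores well on both signals, and the font-size memory still outranks the unrelated one on vector similarity alone. That is the usual reason to blend the two: keywords catch exact terms, vectors catch related meaning.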
Any model. Switch without restart.
Run Claude Code for agentic coding, OpenCode to unlock 75+ models, or a local LLM for full privacy. CLI agents and cloud APIs, all behind one interface.
Rust under the hood. CPU or GPU.
The whole backend (windowing, voice pipeline, MCP server) is native Rust on Tauri 2. No Electron, no Node runtime, no Python. The voice stack runs on CPU and works fully offline, and it uses your GPU when one is available to speed up dictation and speech.
Ready to build by voice?
Voice Mirror is free and open source, in active alpha on Windows. Grab the installer, or join the Discord and build along.
unsigned alpha installer — SmartScreen: “More info” → “Run anyway” · nightly channel