Building a Real Voice Assistant with Tools
By Neel Vora
This post walks through how I built a real voice assistant with tool calling, and where it fits in the rest of my work.
Building voice-driven characters like Geary, Charleen, and Humphrey for museums taught me how critical voice UX is in high-traffic public spaces, and those lessons shaped this assistant.
I wanted to build something more than a chat box: a voice assistant that could actually do things. Here is how I designed a full conversational assistant with:
- Real tool calling
- Speech recognition on the client
- Text-to-speech with multiple voices
- A clean agent loop that supports memory and context
Goals
My goals were simple:
- Build a real assistant that feels responsive and natural
- Support real actions like fetching weather, math, time zones, and search
- Use a simple tool calling architecture that is easy to extend
Architecture
The system is split into three layers:
- Frontend for speech recognition and UI events
- Tool router that handles function calls from the model
- Backend conversation engine powered by OpenAI and a lightweight memory system
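To make the split concrete, here is a rough sketch of the data that flows between those three layers. The type names are illustrative placeholders, not the exact shapes from my codebase.

```typescript
// Illustrative shapes only; the real project's types may differ.

// What the frontend sends once speech recognition (or manual text) finishes.
interface UserTurn {
  sessionId: string;
  text: string; // final transcript or typed input
}

// A tool invocation the model requests and the tool router resolves.
interface ToolCall {
  name: "weather" | "search" | "calculator" | "timezones" | "session_notes";
  args: Record<string, unknown>;
}

// What the conversation engine sends back to the frontend.
interface AssistantTurn {
  text: string;           // reply text to render and speak
  toolCalls: ToolCall[];  // surfaced in the UI as tool call badges
  audioUrl?: string;      // TTS output, when requested
}
```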
Speech input
I used the browser's SpeechRecognition API, with a fallback to manual text input. The assistant starts listening when you press the mic button.
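A minimal sketch of that capture flow, assuming a `#mic` button and a `#fallback-text` input in the page; `sendToAssistant` is a hypothetical stand-in for whatever posts the transcript to the backend.

```typescript
// Minimal sketch: press the mic button to dictate, or type into the text box.
// `sendToAssistant` is a placeholder for the call into the backend.
declare function sendToAssistant(text: string): void;

// SpeechRecognition is still prefixed in Chromium-based browsers.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const micButton = document.querySelector<HTMLButtonElement>("#mic")!;
const textInput = document.querySelector<HTMLInputElement>("#fallback-text")!;

if (SpeechRecognitionImpl) {
  const recognition = new SpeechRecognitionImpl();
  recognition.interimResults = true; // enables a live transcript preview

  recognition.onresult = (event: any) => {
    const result = event.results[event.results.length - 1];
    if (result.isFinal) sendToAssistant(result[0].transcript);
  };

  micButton.addEventListener("click", () => recognition.start());
} else {
  // Fallback: no speech support, so submit typed text on Enter.
  textInput.addEventListener("keydown", (e) => {
    if (e.key === "Enter" && textInput.value.trim()) {
      sendToAssistant(textInput.value.trim());
      textInput.value = "";
    }
  });
}
```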
Tool calling
The tools I implemented:
- Weather
- Search
- Calculator
- Timezones
- Session notes
The model chooses which tool to call, and my backend routes the call to the matching handler.
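Here is a hedged sketch of how that routing can look with the OpenAI chat completions tool-calling API. The handler names, schemas, and model name are placeholder assumptions; only the weather tool is spelled out, and the others follow the same pattern.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Placeholder handlers: each tool name maps to one async function.
const handlers: Record<string, (args: any) => Promise<string>> = {
  get_weather: async ({ city }) => `(stub) weather for ${city}`,
  // calculator, search, timezones, session_notes follow the same shape
};

// JSON schemas the model sees; it decides which tool (if any) to call.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

async function handleTurn(userText: string): Promise<string | null> {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "user", content: userText },
  ];

  const first = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model name
    messages,
    tools,
  });

  const reply = first.choices[0].message;
  if (!reply.tool_calls) return reply.content; // no tool needed

  // Route each requested call to its handler and feed the result back.
  messages.push(reply);
  for (const call of reply.tool_calls) {
    if (call.type !== "function") continue;
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: await handlers[call.function.name](
        JSON.parse(call.function.arguments)
      ),
    });
  }

  // Second pass: the model writes the final answer using the tool results.
  const second = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    tools,
  });
  return second.choices[0].message.content;
}
```

In this shape, adding a new tool is just a new schema plus a new handler entry, which is what keeps the architecture easy to extend.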
TTS
OpenAI's TTS API provides natural-sounding voices, and I let the user choose which one the assistant speaks with.
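A minimal sketch of the server-side call with a user-selected voice, using the speech endpoint from the OpenAI Node SDK. The model name and voice list are assumptions; check the current API reference for what is actually available.

```typescript
import OpenAI from "openai";
import { writeFile } from "node:fs/promises";

const openai = new OpenAI();

// A few of the voices the UI lets the user pick from (assumed subset).
type Voice = "alloy" | "echo" | "nova" | "shimmer";

async function speak(text: string, voice: Voice): Promise<void> {
  const response = await openai.audio.speech.create({
    model: "tts-1", // assumed model name
    voice,
    input: text,
  });

  // The SDK returns a fetch-style Response; write the MP3 bytes to disk.
  const audio = Buffer.from(await response.arrayBuffer());
  await writeFile("reply.mp3", audio);
}
```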
UI design
I aimed for clarity:
- Big mic button
- Transcript preview
- Tool call badges
- Clean message bubbles
What this project demonstrates
This project demonstrates that I understand:
- Agent design
- Tool calling ergonomics
- Real-time UX
- Speech interfaces
And I plan to extend it with streaming audio output.
Keep exploring
Thanks for reading! If you found this useful, check out my other posts or explore the live demos in my AI Lab.