Voice Assistant with Tools
A conversational AI agent that responds to natural voice commands through browser-based speech recognition. It can search the web, check weather, perform calculations, tell time across timezones, and remember notes. Responses stream back as text and can be read aloud using OpenAI text-to-speech voices.

Overview
I built this voice assistant as a hands-on exploration of OpenAI tool calling and browser speech APIs. The goal was to create something that feels like a real assistant: you speak, it understands, it takes action, and it responds naturally.
The assistant supports five built-in tools: web search, weather lookup, math calculations, timezone-aware time queries, and a simple note-taking system. When you ask a question like "What is the weather in Austin?" or "Remember to call mom tomorrow", the model decides which tool to invoke, executes it, and weaves the result into a conversational response.
How It Works
Speech Recognition
The frontend uses the Web Speech API (SpeechRecognition) to capture voice input directly in the browser. As you speak, interim transcripts appear in real time. Once the recognizer detects a pause, it marks the transcript as final, and the frontend sends it to the backend.
Tool Calling
The backend defines each tool as a JSON schema that describes its name, purpose, and parameters. When a user message arrives, I send it to GPT-4o-mini with the tool definitions attached. If the model decides a tool is needed, it returns a structured tool call instead of a plain response. I execute that tool, feed the result back into the conversation, and let the model generate the final answer.
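The round trip described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The schema uses the function-tool format accepted by the OpenAI Chat Completions API, but TOOL_SCHEMAS, TOOL_REGISTRY, run_tool_call, and the stubbed get_time handler are hypothetical names, and the model call itself is left out so the snippet runs offline.

```python
import json

# Tool definition in the function-tool format accepted by Chat Completions.
# The description and parameter names here are illustrative assumptions.
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_time",
            "description": "Return the current date and time in a timezone.",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "IANA name, e.g. America/Chicago",
                    }
                },
                "required": ["timezone"],
            },
        },
    }
]


def get_time_stub(timezone: str) -> str:
    # Stand-in handler so the example needs no clock or network access.
    return f"(current time in {timezone})"


# Maps tool names from the schema to the Python callables that implement them.
TOOL_REGISTRY = {"get_time": get_time_stub}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and wrap the result as a `tool` role message."""
    name = tool_call["function"]["name"]
    # The model returns arguments as a JSON-encoded string, not a dict.
    args = json.loads(tool_call["function"]["arguments"])
    handler = TOOL_REGISTRY.get(name)
    result = handler(**args) if handler else f"Unknown tool: {name}"
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": str(result)}


# Shape of a tool call as it appears on an assistant message (values made up).
example_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_time", "arguments": '{"timezone": "America/Chicago"}'},
}
tool_message = run_tool_call(example_call)
```

Appending tool_message to the conversation and calling the model a second time is the step that lets it turn the raw tool output into a conversational answer.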
The tools I implemented:
- get_weather: Fetches current conditions for any city using a weather API
- web_search: Runs a web search and returns summarized results
- calculate: Evaluates math expressions safely
- get_time: Returns the current date and time in any timezone
- remember_note: Saves a note to session memory with an optional category
Text-to-Speech
Once the assistant generates a response, it can optionally speak it aloud. I send the response text to the OpenAI TTS endpoint and stream the audio back to the browser. Users can choose from six voice options (alloy, echo, fable, onyx, nova, shimmer) in the settings panel.
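As a rough sketch of that step, the snippet below prepares a request for the OpenAI speech endpoint and reads the audio back in chunks. The helper names, the standard-library HTTP client, the chunk size, and the default model "tts-1" are assumptions made for illustration; the project may well use the official SDK and a different model.

```python
import json
import urllib.request

# The six voices offered in the settings panel.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, voice: str, api_key: str,
                         model: str = "tts-1") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the speech endpoint."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    body = json.dumps({"model": model, "voice": voice, "input": text})
    return urllib.request.Request(
        SPEECH_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_audio(request: urllib.request.Request, chunk_size: int = 4096):
    """Yield audio bytes as they arrive so playback can start early."""
    with urllib.request.urlopen(request) as response:
        while chunk := response.read(chunk_size):
            yield chunk
```

Validating the voice before the request goes out keeps a bad settings value from turning into an API error, and yielding chunks lets the backend forward audio to the browser without buffering the whole clip.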
Streaming Responses
Tech Stack
Attribution