By Neel Vora

Voice Assistant with Tools

A conversational AI agent that responds to natural voice commands through browser-based speech recognition. It can search the web, check weather, perform calculations, tell time across timezones, and remember notes. Responses stream back as text and can be read aloud using OpenAI text-to-speech voices.

Next.js 16 · React 19 · TypeScript · OpenAI API · Tool Calling · Speech Recognition · Text-to-Speech · Streaming
[Screenshot: Voice Assistant with Tools]

Overview

I built this voice assistant as a hands-on exploration of OpenAI tool calling and browser speech APIs. The goal was to create something that feels like a real assistant: you speak, it understands, it takes action, and it responds naturally.

The assistant supports five built-in tools: web search, weather lookup, math calculations, timezone-aware time queries, and a simple note-taking system. When you ask a question like "What is the weather in Austin?" or "Remember to call mom tomorrow", the model decides which tool to invoke, executes it, and weaves the result into a conversational response.

How It Works

Speech Recognition

The frontend uses the Web Speech API (SpeechRecognition) to capture voice input directly in the browser. As you speak, interim transcripts appear in real time. Once the browser detects a pause, it finalizes the transcript and sends it to the backend.
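The wiring described above can be sketched roughly as follows. `createRecognizer` is a hypothetical helper (not the project's actual code), and the constructor is injected so the same logic works with either `SpeechRecognition` or the webkit-prefixed variant:

```typescript
// Minimal sketch of configuring the Web Speech API, assuming a browser
// that exposes SpeechRecognition (or webkitSpeechRecognition).
type TranscriptHandler = (text: string, isFinal: boolean) => void;

interface RecognitionLike {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: any) => void) | null;
  start(): void;
  stop(): void;
}

function createRecognizer(
  Ctor: new () => RecognitionLike,
  onTranscript: TranscriptHandler,
): RecognitionLike {
  const rec = new Ctor();
  rec.continuous = true;     // keep listening across pauses
  rec.interimResults = true; // surface partial transcripts as the user speaks
  rec.lang = "en-US";
  rec.onresult = (event) => {
    // The browser appends results; the last one carries the newest speech.
    const result = event.results[event.results.length - 1];
    onTranscript(result[0].transcript, result.isFinal);
  };
  return rec;
}
```

Injecting the constructor also makes the helper easy to unit-test with a mock in place of the real browser API.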

Tool Calling

The backend defines each tool as a JSON schema that describes its name, purpose, and parameters. When a user message arrives, I send it to GPT-4o-mini with the tool definitions attached. If the model decides a tool is needed, it returns a structured tool call instead of a plain response. I execute that tool, feed the result back into the conversation, and let the model generate the final answer.
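The schema-plus-dispatch flow can be sketched like this. The schema shape follows the OpenAI chat completions `tools` format; `dispatchToolCall` and the `get_time` handler shown here are illustrative stand-ins, not the project's exact code:

```typescript
// Sketch of declaring a tool as a JSON schema and dispatching the
// model's structured tool call to a handler.
type ToolHandler = (args: Record<string, unknown>) => Promise<string> | string;

const toolHandlers: Record<string, ToolHandler> = {
  // Example handler: timezone-aware time lookup via the Intl machinery.
  get_time: ({ timezone }) =>
    new Date().toLocaleString("en-US", { timeZone: String(timezone) }),
};

const toolSchemas = [
  {
    type: "function",
    function: {
      name: "get_time",
      description: "Returns the current date and time in a given IANA timezone",
      parameters: {
        type: "object",
        properties: { timezone: { type: "string" } },
        required: ["timezone"],
      },
    },
  },
];

// When the model returns a tool call, execute it; the result is then
// appended to the conversation as a `role: "tool"` message so the model
// can generate the final answer.
async function dispatchToolCall(call: {
  function: { name: string; arguments: string };
}): Promise<string> {
  const handler = toolHandlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  return handler(JSON.parse(call.function.arguments));
}
```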

The tools I implemented:

  • get_weather: Fetches current conditions for any city using a weather API
  • web_search: Runs a web search and returns summarized results
  • calculate: Evaluates math expressions safely
  • get_time: Returns the current date and time in any timezone
  • remember_note: Saves a note to session memory with an optional category
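"Safe" evaluation for the calculate tool can be done several ways; one common approach, shown here purely as an illustration (not necessarily this project's implementation), is a small recursive-descent parser instead of `eval()`:

```typescript
// Tiny expression evaluator: + - * /, parentheses, decimals, unary minus.
// No eval(), so arbitrary code in the input can never execute.
function calculate(expr: string): number {
  let pos = 0;
  const peek = () => expr[pos];
  const skipWs = () => { while (expr[pos] === " ") pos++; };

  function parseNumber(): number {
    skipWs();
    const start = pos;
    if (expr[pos] === "-") pos++; // unary minus
    while (/[\d.]/.test(expr[pos] ?? "")) pos++;
    const n = Number(expr.slice(start, pos));
    if (Number.isNaN(n)) throw new Error(`Bad number at position ${start}`);
    return n;
  }

  function parseFactor(): number {
    skipWs();
    if (peek() === "(") {
      pos++; // consume "("
      const v = parseExpr();
      skipWs();
      if (expr[pos++] !== ")") throw new Error("Missing )");
      return v;
    }
    return parseNumber();
  }

  function parseTerm(): number {
    let v = parseFactor();
    skipWs();
    while (peek() === "*" || peek() === "/") {
      const op = expr[pos++];
      const rhs = parseFactor();
      v = op === "*" ? v * rhs : v / rhs;
      skipWs();
    }
    return v;
  }

  function parseExpr(): number {
    let v = parseTerm();
    skipWs();
    while (peek() === "+" || peek() === "-") {
      const op = expr[pos++];
      const rhs = parseTerm();
      v = op === "+" ? v + rhs : v - rhs;
      skipWs();
    }
    return v;
  }

  const result = parseExpr();
  skipWs();
  if (pos !== expr.length) throw new Error(`Unexpected "${expr[pos]}"`);
  return result;
}
```

Because the grammar is explicit, precedence (`*`/`/` binding tighter than `+`/`-`) falls out of the call structure, and anything outside the grammar is rejected with an error.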

Text-to-Speech

Once the assistant generates a response, it can optionally speak it aloud. I send the response text to OpenAI's TTS endpoint and stream the audio back to the browser. Users can choose from six voice options (alloy, echo, fable, onyx, nova, shimmer) in the settings panel.
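A server-side call to that endpoint might look like the sketch below. The URL and body shape follow OpenAI's `audio/speech` API; the `tts-1` model name and the helper names are assumptions for illustration:

```typescript
// Sketch of building and issuing an OpenAI text-to-speech request.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

function buildTtsRequest(text: string, voice: Voice, apiKey: string) {
  return {
    url: "https://api.openai.com/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // "tts-1" is an assumed model name for this sketch.
      body: JSON.stringify({ model: "tts-1", voice, input: text }),
    },
  };
}

// On the server, the audio bytes can be piped straight back to the client.
async function speak(text: string, voice: Voice, apiKey: string) {
  const { url, init } = buildTtsRequest(text, voice, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  return res.body; // ReadableStream of audio bytes
}
```

Separating request construction from the `fetch` call keeps the body shape easy to test without hitting the network.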

Streaming Responses

Rather than waiting for the complete reply, the assistant streams the model's output to the browser as it is generated, so the response text appears incrementally.

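Since responses stream back as text, the client has to turn the raw stream into displayable deltas. A minimal sketch, assuming the OpenAI chat completions SSE format (`data: {...}` lines terminated by `data: [DONE]`); `extractDeltas` is an illustrative helper, not the project's exact code:

```typescript
// Parse a chunk of OpenAI-style server-sent events into text deltas.
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const json = JSON.parse(payload);
    const content = json.choices?.[0]?.delta?.content;
    if (typeof content === "string") deltas.push(content);
  }
  return deltas;
}
```

On the frontend, each delta is appended to the visible message as it arrives, which is what makes the response feel immediate.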
Tech Stack

Next.js 16 · React 19 · TypeScript · OpenAI API · Tool Calling · Speech Recognition · Text-to-Speech · Streaming

Attribution

Role: Project Creator
Company: Personal Project
