AI Voice Chat with OpenAI, Google Gemini & ElevenLabs

September 1, 2025

Aladuddin Aladin

This workflow enables real-time voice conversations with AI by combining speech-to-text, LLM-based responses, and text-to-speech. It uses OpenAI Whisper for audio transcription, Google Gemini for intelligent contextual replies, and ElevenLabs for natural-sounding voice output.

The template includes memory management to maintain conversation context across turns, ensuring more natural and coherent AI interactions. Perfect for building voice assistants, chatbots, or interactive AI companions.

✨ Features

  • 🎙 Voice Input via Webhook – Accepts user audio messages.
  • 📝 Speech-to-Text (OpenAI Whisper) – Converts spoken input into text.
  • 🧠 Conversation Memory – Maintains chat history for contextual responses.
  • 🤖 Google Gemini LLM – Generates smart, context-aware replies.
  • 🔊 Text-to-Speech (ElevenLabs) – Delivers AI responses in natural voices.
  • 🔄 Webhook Response – Sends back generated audio in real-time.
  • Customizable – Swap ElevenLabs with OpenAI TTS or other providers.

About the author

Alauddin Aladin is an AI Automation expert helping businesses streamline operations, boost productivity, and scale effortlessly using tools like Make.com and n8n. With over a decade of experience in digital systems and automation strategy, Alauddin empowers entrepreneurs to save time and grow smarter through intelligent workflows and AI-driven solutions.

Leave a Comment