This workflow enables real-time voice conversations with an AI by chaining speech-to-text, an LLM, and text-to-speech. It uses OpenAI Whisper to transcribe audio, Google Gemini to generate context-aware replies, and ElevenLabs to produce natural-sounding voice output.
The template includes memory management that carries conversation context across turns, so replies stay coherent from one message to the next. It is a good starting point for voice assistants, chatbots, and interactive AI companions.
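To illustrate what carrying context across turns can look like, here is a minimal sketch of a sliding-window memory. The `ConversationMemory` class, its `max_messages` parameter, and the prompt format are hypothetical and chosen for illustration; the template's own memory node may store and trim history differently.

```python
from collections import deque
from typing import Deque, Dict


class ConversationMemory:
    """Hypothetical helper: keeps only the most recent messages of a chat."""

    def __init__(self, max_messages: int = 10) -> None:
        # A bounded deque drops the oldest message automatically once full.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, text: str) -> None:
        """Record one message, e.g. role='user' or role='assistant'."""
        self._messages.append({"role": role, "text": text})

    def as_prompt(self) -> str:
        """Render the retained history as plain text to prepend to the LLM prompt."""
        return "\n".join(f"{m['role']}: {m['text']}" for m in self._messages)
```

A bounded window keeps the prompt size predictable, at the cost of forgetting older turns.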
✨ Features
- 🎙 Voice Input via Webhook – Accepts user audio messages.
- 📝 Speech-to-Text (OpenAI Whisper) – Converts spoken input into text.
- 🧠 Conversation Memory – Maintains chat history for contextual responses.
- 🤖 Google Gemini LLM – Generates smart, context-aware replies.
- 🔊 Text-to-Speech (ElevenLabs) – Delivers AI responses in natural voices.
- 🔄 Webhook Response – Sends the generated audio back in real time.
- ⚡ Customizable – Swap ElevenLabs for OpenAI TTS or another provider.
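
The features above form a single request/response pipeline: audio in, transcription, LLM reply with history, speech synthesis, audio out. The sketch below shows that flow under stated assumptions. `VoicePipeline` and the three stub functions are hypothetical names, and the stubs stand in for the Whisper, Gemini, and ElevenLabs calls rather than reproducing their APIs. Because each stage is an injected callable, replacing the text-to-speech provider means passing a different `synthesize` function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, text) pairs

Transcriber = Callable[[bytes], str]        # speech-to-text stage
Responder = Callable[[History, str], str]   # LLM stage: (history, user text) -> reply
Synthesizer = Callable[[str], bytes]        # text-to-speech stage


@dataclass
class VoicePipeline:
    """Hypothetical sketch of one webhook turn: audio in, audio out."""

    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer
    history: History = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        user_text = self.transcribe(audio_in)
        # The responder sees a copy of earlier turns, so it cannot mutate memory.
        reply_text = self.respond(list(self.history), user_text)
        self.history.append(("user", user_text))
        self.history.append(("assistant", reply_text))
        return self.synthesize(reply_text)


# Stub stages for local experimentation; real ones would call the providers.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(history: History, user_text: str) -> str:
    turn = len(history) // 2 + 1
    return f"turn {turn}: you said {user_text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_respond, fake_synthesize)
```

Calling `pipeline.handle_turn(audio_bytes)` once per incoming webhook request returns the bytes to send back in the webhook response.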