
ElevenLabs, Twilio, and the future of voice AI for businesses.

What we learned building Okto, our bilingual AI agent.

Prashant Khadka · August 2024 · 5 min read

Why we built Okto

Octoways needed a customer support agent that could handle enquiries in both English and Nepali — without routing every question to a human. The use case was clear. The technology stack, less so.

We evaluated Twilio, ElevenLabs, Azure Cognitive Services, and a handful of regional SIP providers. The result is Okto: a bilingual voice and text AI agent that handles inbound customer queries, product questions, and meeting scheduling.

What ElevenLabs gets right

Voice quality. The gap between ElevenLabs and every other TTS provider we evaluated is not subtle. For a customer-facing agent, voice quality is trust. An agent that sounds robotic erodes confidence. An agent that sounds natural builds it.

The integration with language models is clean. You stream text from the LLM into the TTS pipeline, and the latency is low enough that conversation flow feels natural — not like waiting for a response.
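One common way to get this kind of latency is to buffer the LLM's streamed tokens and hand them to the TTS engine at phrase boundaries, rather than waiting for the full reply. This is a minimal sketch under that assumption, not Okto's actual pipeline. `stream_to_tts`, `send_to_tts`, `BOUNDARIES`, and `min_chars` are hypothetical names. `send_to_tts` stands in for whatever streaming TTS client you use and is not part of the ElevenLabs SDK. The boundary set includes the Devanagari danda (`।`), which Nepali uses as a full stop.

```python
from typing import Callable, Iterable, List

# Characters that can end a speakable chunk. "।" (U+0964, the Devanagari
# danda) is the Nepali full stop; the others cover English punctuation.
BOUNDARIES = {".", "!", "?", "।", ",", ";", ":"}


def stream_to_tts(
    tokens: Iterable[str],
    send_to_tts: Callable[[str], None],  # hypothetical TTS client callback
    min_chars: int = 20,
) -> None:
    """Forward streamed LLM tokens to a TTS callback in speakable chunks.

    A chunk is sent once it ends on a boundary character and is at least
    `min_chars` long, so speech can start on the first phrase while the
    model is still generating the rest of the reply.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        chunk = buffer.strip()
        # chunk[-1:] is "" for an empty chunk, which is never a boundary.
        if len(chunk) >= min_chars and chunk[-1:] in BOUNDARIES:
            send_to_tts(chunk)
            buffer = ""
    # Send whatever is left once the LLM stream ends.
    if buffer.strip():
        send_to_tts(buffer.strip())


if __name__ == "__main__":
    chunks: List[str] = []
    tokens = ["Namaste! ", "Thanks for calling", " Octoways. ",
              "How can I help", " you today?"]
    stream_to_tts(tokens, chunks.append, min_chars=10)
    print(chunks)
    # prints ['Namaste! Thanks for calling Octoways.', 'How can I help you today?']
```

The `min_chars` floor is a trade-off. Flushing on every comma starts speech sooner, but each TTS request then sees less context, which can make intonation choppier. This naive version will also flush early on abbreviations such as "Dr.".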

Voice AI is not about what it says. It's about whether you believe it.

What still needs work

Bilingual switching mid-conversation remains genuinely hard. Most models are slow to detect that the speaker has changed language. Twilio's international call routing also has quirks that require careful configuration, particularly for inbound calls from certain regions.
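For the text channel, a cheap first pass is to skip model-based detection and check the script instead. Nepali is written in Devanagari and English in Latin letters, so counting characters per message is near-instant. This is a sketch under that assumption, not how Okto does it. `detect_language` and `track_language` are hypothetical helpers. The heuristic has clear limits. Romanized Nepali typed in Latin letters will be labeled English, and on voice calls it only helps once speech has already been transcribed.

```python
from typing import List, Optional

# Unicode Devanagari block (U+0900 to U+097F), the script used for Nepali.
DEVANAGARI = range(0x0900, 0x0980)


def detect_language(text: str) -> Optional[str]:
    """Guess "ne" or "en" from which script dominates the letters in `text`.

    Returns None when there is nothing to judge by (digits, punctuation).
    """
    devanagari = latin = 0
    for ch in text:
        if ord(ch) in DEVANAGARI:
            devanagari += 1
        elif ch.isascii() and ch.isalpha():
            latin += 1
    if devanagari == 0 and latin == 0:
        return None
    return "ne" if devanagari > latin else "en"


def track_language(utterances: List[str], default: str = "en") -> List[str]:
    """Label each utterance, carrying the last known language forward
    when an utterance contains no letters."""
    current = default
    labels: List[str] = []
    for text in utterances:
        detected = detect_language(text)
        if detected is not None:
            current = detected
        labels.append(current)
    return labels


if __name__ == "__main__":
    turns = ["What are your office hours?", "धन्यवाद", "123", "ok thanks"]
    print(track_language(turns))
    # prints ['en', 'ne', 'ne', 'en']
```

Carrying the previous label forward keeps a reply such as "123" from resetting the conversation to the default language.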

The space is moving fast, and what's hard today may well be solved by the time you read this. The principle holds: invest in the voice experience as seriously as the visual one.

Voice is the interface nobody designed for. Start now.


Let's talk

Ready to add intelligence to your business?

Tell us what you're building. We'll point you to the right arm — or build a new one.

200+ clients in production · 8 AI products shipping · 10+ years engineering
hello@octoways.com

Kathmandu · Replies within 1 business day