AI/ML Engineer - Voice AI / Speech Models
Title: AI/ML Engineer – Voice AI / Speech Models / Real-Time Audio Systems
Tech Stack: Python, PyTorch, GCP, Vertex AI, Cloud Run, STT, TTS, Transformers, Audio ML, Real-Time Inference
What You’ll Do:
Join a fast-scaling AI startup building a next-generation voice AI platform that powers real-time enterprise communication systems in sectors such as banking, fintech, and customer operations.
You will work on the core AI layer behind real-time conversational systems, helping design, train, optimise, and deploy production-grade speech models used in live environments.
This is a highly applied AI role focused on taking models from research into production, with strong ownership across training, evaluation, inference, and deployment.
Key responsibilities include:
- Training and fine-tuning speech models across STT, TTS, and conversational voice AI use cases
- Building scalable ML pipelines for training, inference, and deployment
- Improving latency, speech quality, and real-time performance of voice systems
- Working on modern deep learning architectures for speech and audio understanding
- Deploying production ML systems on GCP using services like Vertex AI and Cloud Run
- Designing evaluation frameworks around accuracy, latency, robustness, and conversational quality
- Collaborating closely with engineering and product teams to ship real-world AI features into production systems
Who They Are:
A well-funded AI startup building advanced voice infrastructure and conversational AI products for enterprise customers across the Middle East.
The company operates at the intersection of speech AI, real-time systems, and enterprise automation, building technology across voice agents, speech infrastructure, workflow automation, and AI-powered communications.
Their products are already being deployed into enterprise environments, and the team is now scaling aggressively across AI research and engineering.
This is an opportunity to join an early-stage, high-growth environment where engineers have direct ownership, fast feedback loops, and real impact on product direction.
Requirements:
- 5+ years of experience building ML or deep learning systems in production
- Strong Python engineering skills
- Hands-on experience with PyTorch or PyTorch Lightning
- Strong understanding of deep learning fundamentals, transformers, optimisation, and sequence models
- Experience deploying ML systems end-to-end in cloud environments
- Comfortable working in startup environments with fast iteration cycles
- Speech AI / Voice AI experience (ASR/STT, TTS, conversational AI)
- Experience with real-time or low-latency inference systems
- Exposure to distributed training or large-scale model optimisation
- Understanding of telephony, VoIP, or communications systems
- Arabic or multilingual speech model experience