Lead DevOps Engineer

Observe AI · Bangalore, India
Full-time · Lead · Posted 4 hours ago

About this role

About Us

Observe.AI is the AI Agents platform for customer experience, designed to help organizations deliver faster, smarter, and more efficient customer service at scale. The platform enables businesses to deploy specialized AI agents that autonomously execute work across the full CX lifecycle, from handling customer conversations to supporting frontline teams and optimizing operations. Each AI agent is purpose-built for a specific role and equipped to understand context, make decisions, take action, and continuously improve outcomes. This allows organizations to increase resolution speed, elevate service quality, and reduce operational costs while freeing their frontline teams to focus on higher-value work. Built on a CX-native foundation, Observe.AI helps leading brands like DoorDash, Affordable Care, Signify Health, and Verida improve customer satisfaction, boost agent productivity, and deliver consistent, scalable performance across every customer interaction.

Why Join Us

Joining Observe.AI as a Lead DevOps Engineer puts you at the forefront of AI and cloud infrastructure, where you'll own and scale the systems that power real-world customer interactions. You'll drive high-impact initiatives such as GPU orchestration, self-hosting, and low-latency AI deployments while working closely with ML teams to productionize cutting-edge models. With end-to-end ownership, a modern tech stack, and the opportunity to shape MLOps best practices, this role offers strong technical leadership, tangible business impact, and accelerated growth in a fast-scaling AI company.

What you'll be doing

Manage Self-Hosted Tools: Lead the transition from managed services to self-hosted Elasticsearch, Prometheus, and other critical infrastructure components to optimize performance and cost.
Optimize AI Infrastructure: Work closely with ML engineers and data scientists to deploy and scale AI/ML models efficiently, ensuring high availability and low-latency inference.
Infrastructure Scalability & Reliability: Design and implement scalable, fault-tolerant systems capable of handling large-scale AI workloads, distributed training, and high-throughput data pipelines.
Technology Evaluation & Implementation: Continuously assess and introduce new technologies that improve automation, reliability, and security in AI model deployment and training pipelines.
CI/CD for AI Workflows: Enhance and automate ML model deployment pipelines using MLOps best practices and tools such as Kubeflow, MLflow, and Argo Workflows.
Observability & Monitoring: Implement and improve monitoring, logging, and alerting strategies tailored for AI workloads, using Prometheus, Grafana, ELK, OpenTelemetry, and similar tools.
Security Best Practices: Implement security measures for AI data pipelines, model storage, and cloud infrastructure.
Mentorship & Best Practices: Set high standards by implementing DevOps and MLOps best practices and mentoring team members to raise the technical bar.

What you bring to the role

6+ years of experience in DevOps, SRE, or Cloud Infrastructure roles, preferably in AI or data-intensive environments.
Strong expertise in Kubernetes (EKS or AKS preferred) for deploying AI workloads and managing GPU and non-GPU clusters.
Experience self-hosting services such as Elasticsearch, Prometheus, Grafana, and Kafka.
Hands-on expertise in Infrastructure as Code (Terraform, CloudFormation).
Deep understanding of cloud platforms (AWS, Azure, GCP) and AI-focused services such as AWS SageMaker, Vertex AI, or Azure ML.
Strong automation and scripting skills in Python, Bash, or Go.
Experience with CI/CD tools (Jenkins, GitHub Actions, ArgoCD, etc.), with a focus on AI model deployment.
Strong leadership and mentorship skills to guide DevOps and ML teams.
FinOps expertise in optimizing GPU and AI cloud compute costs.
Familiarity with service meshes (Istio, Linkerd) and API gateways.
Knowledge of compliance frameworks (SOC 2, ISO 27001, etc.) as they apply to AI data pipelines.

Perks & Benefits

Excellent medical insurance options and free online doctor consultations
Yearly privilege and sick leave as per the Karnataka S&E Act
Generous holidays (national and festive), plus recognition and parental leave policies
Learning & Development fund to support your continuous learning and professional development
Fun events to build culture across the organization
Flexible benefit plans for tax exemptions (e.g., meal card, PF)

Our Commitment to Inclusion and Belonging

Observe.AI is an Equal Employment Opportunity employer that proudly pursues and hires a diverse workforce. Observe.AI does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender identity, sexual orientation, disability, age, military or veteran status, or any other basis protected by applicable local, state, or federal laws or prohibited by Company policy. Observe.AI also strive
