GSPANN is hiring an AI Platform Engineer with 5+ years of experience to build and scale enterprise AI platforms and agentic AI solutions. The role focuses on developing retrieval-augmented generation (RAG) pipelines, managing Kubernetes-based AI deployments, and enabling secure, observable, and high-performance AI systems. The position is open in Hyderabad, Gurugram, Pune, Noida, and Bangalore, and offers the opportunity to work on AI and platform engineering initiatives.
Description
Roles and Responsibilities
- Build, maintain, and scale the NOVA agentic AI platform and enterprise AI gateway using LiteLLM (an open-source LLM proxy and gateway).
- Design and optimize RAG pipelines, including data ingestion, embedding generation, and vector database management on Google Cloud Platform (GCP) and Microsoft Azure.
- Deploy and operate AI services on Kubernetes clusters, including Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE), using CI/CD pipelines with tools such as Jenkins, GitHub Actions, and Opsera.
- Implement monitoring and observability solutions using Prometheus, Grafana, and OpenTelemetry to ensure system reliability and performance.
- Automate infrastructure provisioning and management using Terraform, Helm, and GitOps practices, while meeting security, governance, and compliance standards.
- Develop Model Context Protocol (MCP) servers, automation scripts, agent workflows, Software Development Kits (SDKs), and Application Programming Interfaces (APIs) to support internal platform and engineering teams.
Skills and Experience
- 5+ years of experience in Platform Engineering or DevOps, including 2+ years working with AI, machine learning (ML), or large language model (LLM) platforms.
- Strong hands-on experience with Kubernetes, CI/CD pipelines, and cloud platforms such as GCP or Microsoft Azure.
- Strong programming skills in Python or TypeScript.
- Experience with LLM routing, cost optimization, and observability tools.
- Experience with frameworks and tools such as LangChain, LlamaIndex, LangGraph, LiteLLM, MCP, and Backstage.
- Experience working with vector databases and building scalable, enterprise-grade AI platforms.
- Strong understanding of LLM cost optimization strategies and agentic workflow design.