AI Platform Engineer

Artificial Intelligence, Retrieval-Augmented Generation (RAG), Prometheus/Grafana, Microsoft Azure, GitHub, Jenkins, Bash Script, Python, Continuous Integration / Continuous Delivery (CI/CD), Kubernetes

Description

GSPANN is hiring an AI Platform Engineer with 5+ years of experience to build and scale enterprise AI platforms and agentic AI solutions. The role focuses on developing RAG pipelines, managing Kubernetes-based AI deployments, and keeping AI systems secure, observable, and high-performing. The position is open in Hyderabad, Gurugram, Pune, Noida, and Bangalore, and involves work on current AI and platform engineering initiatives.

Roles and Responsibilities

  • Build, maintain, and scale the NOVA agentic AI platform and enterprise AI gateway using LiteLLM (an LLM proxy and gateway).
  • Design and optimize RAG pipelines, including data ingestion, embeddings generation, and vector database management on Google Cloud Platform (GCP) and Microsoft Azure.
  • Deploy and operate AI services on Kubernetes clusters, including Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE), using CI/CD pipelines with tools such as Jenkins, GitHub Actions, and Opsera.
  • Implement monitoring and observability solutions using Prometheus, Grafana, and OpenTelemetry to ensure system reliability and performance.
  • Automate infrastructure provisioning and management using Terraform, Helm, and GitOps practices, while meeting security, governance, and compliance standards.
  • Develop Model Context Protocol (MCP) servers, automation scripts, agent workflows, Software Development Kits (SDKs), and Application Programming Interfaces (APIs) to support internal platform and engineering teams.

Skills and Experience

  • 5+ years of experience in Platform Engineering or DevOps, along with 2+ years of experience working with AI/ML/LLM (artificial intelligence, machine learning, large language model) platforms.
  • Strong hands-on experience with Kubernetes, CI/CD pipelines, and cloud platforms such as GCP or Microsoft Azure.
  • Strong programming skills in Python or TypeScript.
  • Experience with LLM routing, cost optimization, and observability tools.
  • Experience with frameworks and tools such as LangChain, LlamaIndex, LangGraph, LiteLLM, Model Context Protocol (MCP), and Backstage.
  • Experience with vector databases and with building scalable, enterprise-grade AI platforms.
  • Strong understanding of LLM cost optimization strategies and agentic workflow design.
