
Cloud DevOps

HighPoints Technologies India Private Limited
Bangalore
13-23 LPA
Posted 4 Mar 2026
FULL TIME
Kubernetes
Terraform
Grafana
Prometheus
MLOps

Job Description

Role summary

• Senior Cloud/Platform Engineer on Oracle Cloud Infrastructure (OCI), focused on secure, reliable delivery of AI/ML and LLM workloads. Own IaC/GitOps, the Kubernetes platform, MLOps/LLM serving, service mesh and progressive delivery, and production SLOs.

 

Experienced candidates with 10+ years of DevOps engineering experience who also have at least one of the following: working experience implementing AI solutions (Agentic AI, MCP, Python, Machine Learning (ML)); experience implementing a service mesh (Istio), specifically around blue/green deployments; or extensive, expert-level observability experience, for example implementing/configuring Grafana/Prometheus.

Must have qualifications

• OCI platform (4+ years overall cloud, 2+ years hands-on with OCI): OKE, OCIR, OCI API Gateway/WAF, Vault, Logging/Monitoring/Alarms, Identity Domains/IAM, VCN/NSGs.

• IaC/GitOps (3+ years): Terraform (OCI provider), Helm/Kustomize; Git-based workflows; CI/CD with Jenkins or GitHub Actions; artifact/version promotion across environments.

• Kubernetes at scale (3+ years): cluster/node pool design, autoscaling, upgrade strategy, RBAC, network policies, Ingress/Gateway controllers, secrets management.

• Linux and networking: solid Linux admin (SELinux bonus), TCP/HTTP, TLS/mTLS, DNS, load balancing; container image hardening and SBOM awareness.

• Programming/automation: proficient in Python and Bash; working knowledge of Terraform HCL and at least one of Go/Ansible. Comfortable writing reusable modules and pipelines. SQL basics for troubleshooting/data checks.

• Oracle Database integration: connectivity patterns (ATP/ADW), Wallets, connection pooling, secrets rotation, and performance-aware app connectivity.

• Observability and SLOs: Prometheus/Grafana or OCI Monitoring, OpenTelemetry traces, logs/metrics/traces correlation, alerting on latency/error budgets/capacity.

• Security and compliance: mTLS, least-privilege IAM, KMS/Vault for secrets, audit trails, change management.

• Service mesh and progressive delivery: Istio (or OCI Service Mesh) traffic policies, retries/timeouts/circuit breakers, and hands-on blue/green, canary, and A/B testing.

• Communication and teamwork: clear written runbooks/diagrams, ability to drive incident/postmortem processes.

AI/ML and LLM delivery (required exposure)

• LLM/RAG fundamentals: retrieval patterns, vector search integration, prompt/config management, guardrails/safety filters, offline/online evaluations.

• MCP (Model Context Protocol): concepts (tools/resources), building and operating MCP servers on Kubernetes; secure tool/resource exposure, auditability, and RAG via MCP resources.

• Vector databases/indices: pgvector, OpenSearch/Elastic, Milvus, Pinecone (or equivalent); hybrid search patterns and embedding pipelines.

• Certifications: OCI Architect Professional strongly preferred, plus one of CKA/CKS, and an AI/ML or Data Science professional certification.

Key responsibilities

• Design, build, and operate OCI-based Kubernetes platforms for AI/ML/LLM services with strong security, observability, and reliability.

• Implement and manage IaC/GitOps for repeatable environments, model/inference deployments, and traffic policies.

• Enable progressive delivery (blue/green, canary, A/B) with metric-gated rollouts and fast rollback.

• Stand up and optimize LLM serving stacks, vector search, and RAG pipelines; enforce guardrails and monitor quality/cost SLOs.

• Integrate Oracle Databases and OCI services securely; manage secrets, credentials, and network segmentation.

• Establish SLOs, dashboards, runbooks, and incident/DR procedures; lead operational readiness reviews and postmortems.

 

Additional Details:

Work mode: WFO

Work type: Contract

Work location: Bangalore
