Nomiso
SRE Engineer
Gurgaon ₹5-8 LPA Posted 30 Jun 2025
FULL TIME
Docker
Kubernetes
Terraform
Grafana
Prometheus
Job Description
Roles and Responsibilities:
- Monitor application and infrastructure metrics; build dashboards and alerts (Prometheus, Grafana, ELK).
- Automate health checks, incident remediation, and reliability guardrails.
- Manage on-call rotations, conduct root cause analysis, and implement postmortem action plans.
- Define and track SLOs, SLIs, and error budgets.
- Use chaos engineering and resilience testing to ensure fault tolerance.
Must Have Skills:
- 4-5 years of experience managing production-grade Kubernetes clusters and cloud-native platforms.
- Proficiency in Linux system internals, containers, and networking.
- Scripting and automation expertise in Python, Go, or Shell.
- Familiarity with incident management, runbooks, and observability standards.
- Exposure to service discovery, DNS routing, and load balancing is a bonus.
Qualification:
- BE/BTech/MCA/ME/MTech/MS in Computer Science or a related technical field, or equivalent practical experience.