Athenahealth Technology Private Limited
Senior Site Reliability Engineer
Chennai ₹5-8 LPA Posted 12 Jun 2025
FULL TIME
Ansible
Puppet
Terraform
Grafana
Prometheus
Job Description
Job Responsibilities
- Provision and manage physical and virtual Linux machines using tools such as Puppet, Ansible, and Terraform
- Engage closely with sister teams to assume ownership of various system lifecycle tasks
- Automate away toil and/or create empowerment processes for transitioning high-urgency work to the NOC's rapid response team
- Build automated monitoring & observability using tools such as Prometheus/Alertmanager, Icinga, Grafana, etc.
- Participate in all Agile/Scrum ceremonies, including daily stand-ups, sprint planning, backlog grooming, etc.
- Participate in the team's on-call rotation (expected to begin late 2024, early 2025)
- Work closely with internal teams to integrate new monitoring & alerts into the NOC using Perl scripting to author custom parsing & mapping rules
- Develop metrics and observability dashboards which can be used to measure and track various success measures for the team & the business
Typical Qualifications
- 5+ years of professional experience delivering SaaS solutions, preferably in a hybrid cloud environment
- Bachelor's or Master's degree in a Computer Science / Engineering program
- Proven experience using query languages to deliver observability solutions
- Proficiency working with one or more configuration management tools (Puppet, Chef, Ansible, etc.)
- Admin-level expertise with a Unix-based operating system
- Proven ops background using cloud-native best practices
- Proven proficiency with one or more scripting languages (Python, Ruby, Perl, Java, etc.)
- Proficiency working with Git & Atlassian suite or similar
- Proficiency working with containerized environments is a plus
- Experience creating technical documentation & standard operating procedures (SOPs)