
Senior Site Reliability Engineer

Athenahealth Technology Private Limited
Chennai
5-8 LPA
Posted 12 Jun 2025
FULL TIME
Ansible
Puppet
Terraform
Grafana
Prometheus

Job Description

Job Responsibilities  

  • Provision and manage physical and virtual Linux machines using tools such as Puppet, Ansible, and Terraform
  • Engage closely with sister teams to assume ownership of various system lifecycle tasks
  • Automate away toil, or create processes that empower the NOC's rapid response team to take over high-urgency work
  • Build automated monitoring & observability using tools such as Prometheus/Alertmanager, Icinga, Grafana, etc.
  • Participate in all Agile/Scrum ceremonies, including daily stand-ups, sprint planning, backlog grooming, etc.
  • Participate in the team's on-call rotation (expected to begin late 2024, early 2025)
  • Work closely with internal teams to integrate new monitoring & alerts into the NOC using Perl scripting to author custom parsing & mapping rules
  • Develop metrics and observability dashboards that track success measures for the team & the business

 

Typical Qualifications 

  • 5+ years of professional experience delivering SaaS solutions, preferably in a hybrid cloud environment
  • Bachelor's or Master's degree in a Computer Science / Engineering program
  • Proven experience using query languages to deliver observability solutions
  • Proficiency working with one or more configuration management tools (Puppet, Chef, Ansible, etc.)
  • Admin-level expertise with a Unix-based operating system
  • Proven ops background using cloud-native best practices
  • Proven proficiency with one or more scripting languages (Python, Ruby, Perl, Java, etc.)
  • Proficiency working with Git and the Atlassian suite, or similar tools
  • Proficiency working with containerized environments is a plus
  • Experience creating technical documentation & standard operating procedures (SOPs)