Site Reliability Engineer

Aqilea
Hyderabad | 4-9 LPA | Posted 16 Jun 2025
FULL TIME
Git
Incident Management
Problem Management
Infrastructure Management
Version Control

Job Description

Key Responsibilities

  • Software & Automation : Develop, test, and maintain high-quality software, frameworks, and automation tools to improve system reliability and reduce manual effort.
  • System Design : Collaborate with product and engineering teams to design scalable and resilient infrastructure solutions.
  • Incident Management : Lead incident response efforts, troubleshoot issues, and participate in on-call rotations. Create and maintain runbooks.
  • Observability & Monitoring : Implement observability strategies using tools like Grafana, Splunk, and other APM/monitoring solutions. Define SLIs/SLOs for applications and infrastructure.
  • CI/CD & DevOps : Design and maintain CI/CD pipelines using GitHub Actions. Promote DevOps best practices across teams.
  • Infrastructure Management : Build and optimize cloud and/or on-prem infrastructure with a focus on scalability, performance, and reliability.
  • Security & Compliance : Ensure systems are secure and comply with industry standards. Collaborate with security teams to implement necessary controls.
  • Collaboration & Reviews : Participate in code reviews, sprint ceremonies, and design discussions to ensure high-quality deliverables.
  • Knowledge Sharing : Document processes and systems comprehensively. Mentor junior engineers and contribute to knowledge-sharing initiatives.

Required Skills & Qualifications

  • Programming/Scripting : Proficiency in one or more languages, such as Python.
  • Cloud Expertise : Good working knowledge of at least one major cloud platform (Microsoft Azure or GCP).
  • CI/CD : Hands-on experience with GitHub Actions and version control using Git.
  • Observability : Experience with monitoring tools such as Grafana, Splunk, and Prometheus.
  • Agile Methodology : Solid understanding of Agile/Scrum processes.
  • Incident & Problem Management : Willingness to work in operations and manage live issues.
  • Certifications : Azure Fundamentals (AZ-900) or equivalent required.
  • Soft Skills : Strong problem-solving, communication, and collaboration skills.