Aqilea
Site Reliability Engineer
Hyderabad ₹4-9 LPA Posted 16 Jun 2025
FULL TIME
Git
Incident Management
Problem Management
Infrastructure Management
Version Control
Job Description
Key Responsibilities
- Software & Automation: Develop, test, and maintain high-quality software, frameworks, and automation tools to improve system reliability and reduce manual effort.
- System Design: Collaborate with product and engineering teams to design scalable and resilient infrastructure solutions.
- Incident Management: Lead incident response efforts, troubleshoot issues, and participate in on-call rotations. Create and maintain runbooks.
- Observability & Monitoring: Implement observability strategies using tools like Grafana, Splunk, and other APM/monitoring solutions. Define SLIs/SLOs for applications and infrastructure.
- CI/CD & DevOps: Design and maintain CI/CD pipelines using GitHub Actions. Promote DevOps best practices across teams.
- Infrastructure Management: Build and optimize cloud and/or on-prem infrastructure with a focus on scalability, performance, and reliability.
- Security & Compliance: Ensure systems are secure and comply with industry standards. Collaborate with security teams to implement necessary controls.
- Collaboration & Reviews: Participate in code reviews, sprint ceremonies, and design discussions to ensure high-quality deliverables.
- Knowledge Sharing: Document processes and systems comprehensively. Mentor junior engineers and contribute to knowledge-sharing initiatives.
Required Skills & Qualifications
- Programming/Scripting: Proficiency in one or more languages such as Python.
- Cloud Expertise: Good working knowledge of at least one major cloud platform (Microsoft Azure or GCP).
- CI/CD: Hands-on experience with GitHub Actions and version control using Git.
- Observability: Experience with monitoring tools such as Grafana, Splunk, Prometheus, etc.
- Agile Methodology: Solid understanding of Agile/Scrum processes.
- Incident & Problem Management: Willingness to work in operations and manage live issues.
- Certifications: Azure Fundamentals (AZ-900) or equivalent required.
- Soft Skills: Strong problem-solving, communication, and collaboration skills.