Teamware Solutions
Site Reliability Engineer (SRE)
Hyderabad | ₹4-7 LPA | Posted 16 Jul 2025
FULL TIME
DevOps
Grafana
ELK Stack
Prometheus
Linux
Job Description
Key Responsibilities:
- Design, build, and maintain scalable, highly available, and resilient infrastructure.
- Develop automation tools and scripts to improve operational efficiency and reduce manual intervention.
- Monitor system performance and availability with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
- Participate in on-call rotations and incident response, troubleshooting production issues promptly.
- Collaborate with development teams to improve deployment pipelines, reliability, and observability.
- Implement and enforce best practices in deployment, configuration management, and incident management.
- Analyze root causes of system failures and recommend solutions to prevent recurrence.
- Support capacity planning, disaster recovery, and security compliance initiatives.
- Document systems, processes, and incident post-mortems clearly and thoroughly.
Qualifications and Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.
- 3+ years of experience in site reliability, systems engineering, DevOps, or related roles.
- Strong programming/scripting skills in languages such as Python, Go, Bash, or Ruby.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Proficiency with containerization (Docker) and orchestration tools (Kubernetes).
- Knowledge of CI/CD pipelines and infrastructure as code (Terraform, Ansible, CloudFormation).
- Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, ELK Stack).
- Solid understanding of Linux/Unix systems, networking, and security best practices.
- Strong problem-solving skills and ability to work under pressure.
Desirable Skills and Certifications:
- Certifications such as AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, or Certified Kubernetes Administrator (CKA).
- Experience with chaos engineering and performance tuning.
- Knowledge of database systems and caching technologies.
- Exposure to Agile and DevOps methodologies.
- Strong communication skills and ability to work collaboratively in cross-functional teams.