Site Reliability Engineer (SRE)

Teamware Solutions
Hyderabad
4-7 LPA
Posted 16 Jul 2025
FULL TIME
DevOps
Grafana
ELK Stack
Prometheus
Linux

Job Description

Key Responsibilities:

  • Design, build, and maintain scalable, highly available, and resilient infrastructure.
  • Develop automation tools and scripts to improve operational efficiency and reduce manual intervention.
  • Monitor system performance and availability using monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
  • Participate in on-call rotations and incident response, troubleshooting production issues promptly.
  • Collaborate with development teams to improve deployment pipelines, reliability, and observability.
  • Implement and enforce best practices in deployment, configuration management, and incident management.
  • Analyze root causes of system failures and recommend solutions to prevent recurrence.
  • Support capacity planning, disaster recovery, and security compliance initiatives.
  • Document systems, processes, and incident post-mortems clearly and thoroughly.

Qualifications and Requirements:

  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.
  • 3+ years of experience in site reliability, systems engineering, DevOps, or related roles.
  • Strong programming/scripting skills in languages such as Python, Go, Bash, or Ruby.
  • Experience with cloud platforms such as AWS, Azure, or GCP.
  • Proficiency with containerization (Docker) and orchestration tools (Kubernetes).
  • Knowledge of CI/CD pipelines and infrastructure as code (Terraform, Ansible, CloudFormation).
  • Familiarity with monitoring, logging, and alerting tools (Prometheus, Grafana, ELK Stack).
  • Solid understanding of Linux/Unix systems, networking, and security best practices.
  • Strong problem-solving skills and ability to work under pressure.

Desirable Skills and Certifications:

  • Certifications such as AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, or Certified Kubernetes Administrator (CKA).
  • Experience with chaos engineering and performance tuning.
  • Knowledge of database systems and caching technologies.
  • Exposure to Agile and DevOps methodologies.
  • Strong communication skills and ability to work collaboratively in cross-functional teams.
