Alter Domus
Site Reliability Engineer at Alter Domus
Hyderabad ₹5-7 LPA Posted 25 Jun 2025
FULL TIME
Windows
Javascript
Event Management
Networking
Linux
Job Description
We are looking for an experienced and motivated DevOps Engineer to join our Site Reliability Engineering (SRE) team. This role involves spearheading the Grafana Cloud and Backstage implementations as part of our Observability project. The ideal candidate will bring a blend of technical expertise in observability tools, strong problem-solving skills, and a passion for creating efficient, reliable systems.
Key Responsibilities:
- Configure and manage data sources, including Prometheus and Azure Monitor, to build dashboards in Grafana.
- Collaborate with DevOps engineers, system administrators, and software developers to understand monitoring requirements and design robust observability solutions.
- Customize and extend Grafana functionalities by developing and implementing plugins and scripts.
- Enhance visualizations for observability solutions to meet organizational needs.
- Optimize dashboard performance and usability by fine-tuning data queries.
- Troubleshoot and resolve issues related to Grafana configuration, data ingestion, and visualizations.
- Participate in the administration, maintenance, and development of observability tools, including Grafana and the ELK stack.
- Troubleshoot network communication problems and ensure smooth operations.
- Support Backstage implementation to enhance developer experience within the organization.
Required Skills:
- Familiarity with Event Management and Application Monitoring concepts.
- Experience in building and enhancing visualizations for observability solutions.
- Proficiency with observability tools such as Grafana, Prometheus, Dynatrace, Splunk, Azure Monitor, or AWS CloudWatch.
- Expertise in scripting with one or more of the following languages: Unix Shell, Windows PowerShell, JavaScript, Python, or Go.
- Strong problem-solving and analytical skills, with the ability to troubleshoot complex network communication issues.
- Hands-on experience with the administration, maintenance, and development of Grafana or the ELK stack.
- 5-7 years of domain experience in monitoring or related fields.
- Comfortable working with both Windows and Linux command lines.
- Excellent communication and collaboration skills, with the ability to work effectively within a team and interact with stakeholders.
Core/Must-Have Skills
- Observability Subject Matter Expertise (SME)
- Prometheus
- Azure Monitor
- Grafana
- OpenTelemetry
Good-to-Have Skills
- Proficiency in Unix Shell, Windows PowerShell, JavaScript, Python, or Go.
- Familiarity with Backstage implementation.
- Experience troubleshooting network communication problems.