BMW TechWorks India
SRE/OPS professional
Chennai ₹7-10 LPA Posted 22 Aug 2025
FULL TIME
Splunk
Bash
Grafana
ITIL
Azure
Job Description
- 6+ years of experience in IT operations or a similar role
- Willing and able to travel internationally (twice a year)
Monitor and Operate IT Products:
- Perform regular and sporadic operational tasks to ensure optimal performance of IT services
- Own and maintain the Regular OPS Tasks list, refining sporadic tasks based on input from the Operations Experts (OE) network
Manage IT Service Continuity:
- Prepare for and attend emergency exercises (EE), reviewing outcomes and deriving follow-up tasks
- Communicate findings and improvements to the OE network
Manage Availability:
- Participate in 'Gamedays' and backup/restore test sessions, practicing and executing backup and restore processes
- Own the recovery and backup plan, reviewing success and identifying follow-up tasks
Manage Capacity:
- Monitor cluster capacity using prepared dashboards and coordinate with the DevOps team for any issues
- Plan and execute capacity extensions as needed
Manage Service Configuration:
- Oversee service configuration management using ITSM tools
Manage Events:
- Observe dashboards and alerts, initiate root cause analysis (RCA), and create tasks for the DevOps team
- Provide proactive feedback and maintain monitoring and alerting solutions
Manage Problems:
- Conduct root cause analysis and manage known issues, creating Jira defects for further assistance if required
Enable Changes:
- Create changes and sync them with the team, assisting with releases and deployment plans
Manage Service Requests and Incidents:
- Observe and resolve service requests and incidents, creating Jira tasks for the DevOps team as necessary
Manage Knowledge:
- Create, use, and extend knowledge articles, ensuring availability and consistency
In a future setup, you will take part in 24/7 on-call rotations with teams around the world and be able to restore systems efficiently.
Must have technical skills
- Strong understanding of IT service management principles and practices
- Proficiency in monitoring and management tools (e.g., dashboards, alerting systems)
- Strong analytical and problem-solving abilities, particularly in IT service management
- Experience in conducting root cause analysis (RCA) and managing known issues
- Experience in performing regular and sporadic operational tasks to ensure optimal performance of IT services
- Ability to manage IT service continuity, availability, and capacity effectively
- Experience with change management processes, including creating and syncing changes with teams
- Ability to plan and execute capacity extensions and backup/restore processes
- Any additional responsibilities assigned in the Agile Working Model (AWM) Charter
Good to have technical skills
- Experience with IT service management frameworks (e.g., ITIL, SRE practices)
- Familiarity with cloud platforms (e.g., Azure) and their operational management
- Experience with automation tools (e.g., Ansible, Puppet, Terraform) and scripting languages (e.g., Python, Bash) to streamline operational tasks
- Understanding of DevOps methodologies and practices, including CI/CD (Continuous Integration/Continuous Deployment) processes
- Knowledge of network protocols, configurations, and troubleshooting to support IT infrastructure
- Understanding of IT security best practices and compliance requirements to ensure secure operations
- Skills in data analysis and visualization tools (e.g., Splunk, Grafana) to interpret operational metrics and trends
- Above-board work ethics