SRE/OPS professional

BMW TechWorks India
Chennai, 7-10 LPA, posted 22 Aug 2025
FULL TIME
Splunk
Bash
Grafana
ITIL
Azure

Job Description

  • 6+ years of experience in IT operations or a similar role
  • Willing and able to travel internationally (twice a year)

Monitor and Operate IT Products:

  • Perform regular and sporadic operational tasks to ensure optimal performance of IT services
  • Own and maintain the Regular OPS Tasks list, refining sporadic tasks based on input from the Operations Experts (OE) network

Manage IT Service Continuity:

  • Prepare for and attend emergency exercises (EE), reviewing outcomes and deriving follow-up tasks
  • Communicate findings and improvements to the OE network

Manage Availability:

  • Participate in 'Gamedays' and backup/restore test sessions, practicing and executing backup and restore processes
  • Own the recovery and backup plan, reviewing success and identifying follow-up tasks

Manage Capacity:

  • Monitor cluster capacity using prepared dashboards and coordinate with the DevOps team for any issues
  • Plan and execute capacity extensions as needed

Manage Service Configuration:

  • Oversee service configuration management using ITSM tools

Manage Events:

  • Observe dashboards and alerts, perform root cause analysis (RCA), and create tasks for the DevOps team
  • Provide proactive feedback and maintain monitoring and alerting solutions

Manage Problems:

  • Conduct root cause analysis and manage known issues, creating Jira defects for further assistance if required

Enable Changes:

  • Create and sync changes with the team, assisting with releases and deployment plans

Manage Service Requests and Incidents:

  • Observe and resolve service requests and incidents, creating Jira tasks for the DevOps team as necessary

Manage Knowledge:

  • Create, use, and extend knowledge articles, ensuring availability and consistency

In a future setup, you will take part in 24/7 on-call rotations with teams around the world and must be able to restore systems efficiently.

Must-have technical skills

  • Strong understanding of IT service management principles and practices
  • Proficiency in monitoring and management tools (e.g., dashboards, alerting systems)
  • Strong analytical and problem-solving abilities, particularly in IT service management
  • Experience in conducting root cause analysis (RCA) and managing known issues
  • Experience in performing regular and sporadic operational tasks to ensure optimal performance of IT services
  • Ability to manage IT service continuity, availability, and capacity effectively
  • Experience with change management processes, including creating and syncing changes with teams
  • Ability to plan and execute capacity extensions and backup/restore processes
  • Willingness to take on any additional responsibilities assigned in the Agile Working Model (AWM) Charter

Good-to-have technical skills

  • Experience with IT service management frameworks (e.g., ITIL, SRE practices)
  • Familiarity with cloud platforms (e.g., Azure) and their operational management
  • Experience with automation tools (e.g., Ansible, Puppet, Terraform) and scripting languages (e.g., Python, Bash) to streamline operational tasks
  • Understanding of DevOps methodologies and practices, including CI/CD (Continuous Integration/Continuous Deployment) processes
  • Knowledge of network protocols, configurations, and troubleshooting to support IT infrastructure
  • Understanding of IT security best practices and compliance requirements to ensure secure operations
  • Skills in data analysis and visualization tools (e.g., Splunk, Grafana) to interpret operational metrics and trends
  • Above-board work ethic
