
Systems Lead - Service Management

Epsilon Data Management
Gurgaon | 6-10 LPA | Posted 3 Jun 2025
FULL TIME
Incident Management
SQL
Service Management
Troubleshooting
Agile

Job Description

Your Impact

  • 24x7 Application Support: Provide 24x7 support for customer applications, ensuring high availability and performance.
  • Performance Monitoring: Monitor application performance and system health using appropriate tools and techniques.
  • Incident Management & Resolution: Handle production incidents, including comprehensive troubleshooting, root cause analysis (RCA), and timely resolution.
  • Primary Contact: Act as the primary point of contact for all production support issues related to customer applications.
  • Prioritization: Prioritize and manage incidents based on severity (P1-P3) and business impact.
  • Coordination & Deployment: Coordinate with development teams for issue resolution and deploy fixes as required.
  • Application-Level Debugging: Conduct detailed application-level debugging to identify and resolve issues.
  • Diagnostic Analysis: Analyze logs, trace files, and other diagnostic data to pinpoint problem areas.
  • Fix Implementation: Implement and test fixes and patches in a timely manner.
  • Root Cause Analysis (RCA): Perform in-depth root cause analysis for recurring and complex problems.
  • Preventive Measures & Documentation: Document findings and implement preventive measures to avoid future occurrences. Share RCA reports with relevant stakeholders and ensure learnings are incorporated into the application lifecycle.
  • Escalation Handling: Handle escalations from junior team members and ensure timely resolution. Communicate effectively with stakeholders during high-impact incidents. Maintain detailed documentation of escalations, including actions taken and outcomes.
  • Team Leadership & Mentorship: Lead and mentor a team of production support engineers, providing guidance and support.
  • Training & Knowledge Sharing: Conduct regular training sessions and knowledge sharing to upskill team members. Ensure adherence to best practices and support processes.
  • Process Improvement: Identify and implement process improvements to enhance support efficiency and effectiveness.
  • Documentation: Develop and maintain support documentation, including incident reports, standard operating procedures, and knowledge base articles.
  • Communication: Communicate clearly and effectively with stakeholders, including technical teams, management, and business users. Provide regular status updates and post-incident reviews.
  • On-call Rotation: Participate in on-call rotations as required.
  • Compliance: Ensure compliance with ITIL and other relevant standards and frameworks.

Qualifications

Your Skills & Experience

  • Education: Bachelor's degree in Computer Science, Engineering, or a related field.
  • Experience: 6-9.5 years of hands-on experience in application production support.
  • Debugging & Incident Management: Proven experience in application-level debugging and incident management.
  • Issue Management: Strong background in handling P1-P3 issues and managing multiple issues simultaneously.
  • Domain Expertise: Capital Markets domain experience in Trade Surveillance production support is a must; broader Surveillance domain experience is good to have.
  • Support Response: The role responds to support requests received via email, chat, or phone.
  • End-User Communication: Ability to communicate with end users to gather the details needed to resolve support requests.
  • ServiceNow: Proficiency in using ServiceNow to create, maintain, and close incidents and problems.
  • ITSM Processes: Understanding and ability to follow service management and change management processes.
  • Jenkins: Understanding of how to use Jenkins, including running deployments with it.
  • Documentation Updates: Ability to perform updates to process documents in response to support incidents.
  • External Coordination: Ability to engage external teams and external vendors to resolve technology and business issues.
  • Querying Skills: Ability to write SQL and NoSQL queries to help resolve incidents.
  • Networking Fundamentals: Understanding of basic networking, internet protocols, and encryption.
  • Programming Exposure: Exposure to Java or Python.
  • KX & Time Series: Exposure to KX and to time-series databases.
  • Reporting: Ability to provide status reports and share information in standups.

Additional Information

  • Join the team to sharpen your skills and broaden the ways you collaborate.
  • Make an impact on clients and their businesses directly through your work.
  • Gender-neutral policy.
  • 18 paid holidays throughout the year.
  • Generous parental leave and new parent transition program.
  • Flexible work arrangements.
  • Employee Assistance Programs to support wellness and well-being.
