Epsilon Data Management
Systems Lead_Service Management
Gurgaon ₹6-10 LPA Posted 3 Jun 2025
FULL TIME
Incident Management
SQL
Service Management
Troubleshooting
Agile
Job Description
Your Impact
- 24x7 Application Support: Provide 24x7 support for customer applications, ensuring high availability and performance.
- Performance Monitoring: Monitor application performance and system health, utilizing appropriate tools and techniques.
- Incident Management & Resolution: Handle production incidents, including comprehensive troubleshooting, root cause analysis (RCA), and timely resolution.
- Primary Contact: Act as the primary point of contact for all production support issues related to customer applications.
- Prioritization: Prioritize and manage incidents based on severity (P1-P3) and business impact.
- Coordination & Deployment: Coordinate with development teams for issue resolution and deploy fixes as required.
- Application-Level Debugging: Conduct detailed application-level debugging to identify and resolve issues.
- Diagnostic Analysis: Analyze logs, trace files, and other diagnostic data to pinpoint problem areas.
- Fix Implementation: Implement and test fixes and patches in a timely manner.
- Root Cause Analysis (RCA): Perform in-depth root cause analysis for recurring and complex problems.
- Preventive Measures & Documentation: Document findings and implement preventive measures to avoid future occurrences. Share RCA reports with relevant stakeholders and ensure learnings are incorporated into the application lifecycle.
- Escalation Handling: Handle escalations from junior team members and ensure timely resolution. Communicate effectively with stakeholders during high-impact incidents. Maintain detailed documentation of escalations, including actions taken and outcomes.
- Team Leadership & Mentorship: Lead and mentor a team of production support engineers, providing guidance and support.
- Training & Knowledge Sharing: Conduct regular training sessions and knowledge sharing to upskill team members. Ensure adherence to best practices and support processes.
- Process Improvement: Identify and implement process improvements to enhance support efficiency and effectiveness.
- Documentation: Develop and maintain support documentation, including incident reports, standard operating procedures, and knowledge base articles.
- Communication: Communicate clearly and effectively with stakeholders, including technical teams, management, and business users. Provide regular status updates and post-incident reviews.
- On-call Rotation: Participate in on-call rotations as required.
- Compliance: Ensure compliance with ITIL and other relevant standards and frameworks.
Qualifications
Your Skills & Experience
- Education: Bachelor's degree in Computer Science, Engineering, or a related field.
- Experience: 6-9.5 years of hands-on experience in application production support.
- Debugging & Incident Management: Proven experience in application-level debugging and incident management.
- Issue Management: Strong background in handling P1-P3 issues and managing multiple issues simultaneously.
- Domain Expertise: Experience in Trade Surveillance production support within the Capital Markets domain is a must; Surveillance domain experience is good to have.
- Support Response: The aim of the role is to respond to support requests via email, chat, or call.
- End-User Communication: Ability to communicate with end-users to obtain support details.
- ServiceNow: Proficiency in using ServiceNow to create, maintain, and close incidents and problems.
- ITSM Processes: Understanding and ability to follow service management and change management processes.
- Jenkins: Understanding of how to use Jenkins, including for deployments.
- Documentation Updates: Ability to perform updates to process documents in response to support incidents.
- External Coordination: Ability to engage external teams and external vendors to resolve technology and business issues.
- Querying Skills: Ability to write SQL and NoSQL queries to help resolve incidents.
- Networking Fundamentals: Understanding of basic networking, internet protocols, and encryption.
- Programming Exposure: Exposure to Java or Python.
- KX & Time Series: Exposure to KX and time series databases.
- Reporting: Ability to provide status reports and share information in standups.
Additional Information
- Join the team to sharpen your skills and expand your collaborative methods.
- Make an impact on clients and their businesses directly through your work.
- Gender Neutral Policy
- 18 paid holidays throughout the year.
- Generous parental leave and new parent transition program.
- Flexible work arrangements.
- Employee Assistance Programs to help with wellness and well-being.