Teamware Solutions
MONITORING TOOL - L2
Mumbai ₹4-7 LPA Posted 16 Jul 2025
FULL TIME
Zendesk
Azure
AWS
Jira
Job Description
Key Responsibilities:
- Monitoring Tool Support:
- Provide L2 support for various monitoring tools (e.g., Nagios, Zabbix, Splunk, Prometheus, SolarWinds, AppDynamics, New Relic).
- Troubleshoot and resolve escalated alerts, incidents, and issues related to system performance, application health, network connectivity, and infrastructure availability.
- Collaborate with L1 support teams to assist in the diagnosis and resolution of simpler issues.
- Incident & Problem Management:
- Handle escalated incidents from L1 support, providing root cause analysis (RCA) and resolution.
- Track and maintain records of incidents, problems, and resolutions within the ticketing system (e.g., ServiceNow, Jira).
- Ensure SLA compliance for issue resolution and follow up on tickets to meet agreed-upon timelines.
- Alert Management:
- Review and manage monitoring alerts for critical systems, servers, databases, and applications.
- Ensure alerts are appropriately categorized and routed for resolution.
- Investigate and respond to false positives or irrelevant alerts to maintain the integrity of the monitoring system.
- Performance Monitoring & Reporting:
- Continuously monitor system health, application performance, and network traffic to proactively identify issues before they affect services.
- Maintain and improve monitoring dashboards to reflect the current health of the environment.
- Generate regular reports for system performance and uptime, providing recommendations for improvements or preventive actions.
- Tool Configuration & Optimization:
- Assist in the configuration and tuning of monitoring tools to ensure they provide meaningful and actionable data.
- Customize monitoring thresholds, alerts, and notifications to align with the organization's operational needs.
- Continuously improve the monitoring setup to ensure that it effectively supports the evolving infrastructure and application stack.
- Documentation & Knowledge Sharing:
- Document troubleshooting procedures, known issues, and best practices for the monitoring tools.
- Share knowledge and insights with L1 support teams to improve their troubleshooting capabilities.
- Maintain user manuals or standard operating procedures (SOPs) for monitoring tool management and escalation processes.
- Collaboration & Communication:
- Collaborate with DevOps, System Admins, and Network Engineers to resolve infrastructure or application performance issues.
- Communicate effectively with internal teams regarding ongoing incidents, resolution timelines, and potential impacts on services.
- Proactive System Improvements:
- Work with the IT Operations team to identify and implement proactive measures that improve overall system performance and reduce downtime.
- Provide input for optimizing monitoring thresholds, reducing false alarms, and implementing new monitoring solutions or features.
Required Qualifications:
- 2-5 years of experience in L2 support or operations with monitoring tools.
- Strong understanding of IT infrastructure, including servers, databases, networks, and applications.
- Hands-on experience with monitoring tools (e.g., Nagios, Zabbix, Prometheus, Splunk, AppDynamics, New Relic).
- Experience working with alert management systems and troubleshooting complex issues.
- Familiarity with cloud environments (AWS, Azure, GCP) and the related monitoring tools.
- Solid understanding of system performance metrics and the ability to identify and troubleshoot issues based on performance data.
- Experience using ticketing systems (e.g., ServiceNow, Jira, Zendesk) for incident management and tracking.
- Proficiency in Linux/Unix and Windows Server operating systems.
- Scripting knowledge (e.g., Bash, Python, PowerShell) for automating monitoring tasks and alerts.
- Good understanding of networking concepts (e.g., DNS, HTTP, TCP/IP) and their impact on monitoring.