UKG
Sr Principal Site Reliability Engineer
Pune ₹10-12 LPA Posted 4 Jun 2025
FULL TIME
JavaScript
Azure
Java
AWS
Python
Job Description
About the Role
- Site Reliability Engineers at UKG are team members with a breadth of knowledge encompassing all aspects of service delivery.
- They develop software solutions to enhance, harden, and support our service delivery processes.
- This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation.
- Site Reliability Engineers must have a passion for learning and evolving with current technology trends.
- They strive to innovate and are relentless in their pursuit of a flawless customer experience.
- They have an automate-everything mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.
Primary/Essential Duties and Key Responsibilities
- Engage in and improve the lifecycle of services from conception to end of life (EOL), including system design consulting and capacity planning.
- Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks.
- Support the services, product, and engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response.
- Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis.
- Collaborate closely with engineering professionals within the organization to deliver reliable services.
- Identify and eliminate operational toil by treating operational challenges as software engineering problems.
- Actively participate in incident response, including on-call responsibilities.
- Partner with stakeholders to influence and help drive the best possible technical and business outcomes.
- Guide junior team members and serve as a champion for Site Reliability Engineering.
Required Qualifications
- Degree in engineering or a related technical discipline, and 10+ years of experience in SRE.
- Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java).
- Knowledge of cloud-based applications and containerization technologies.
- Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
- Ability to analyze current technology utilized and engineering practices within the company and develop steps and processes to improve and expand upon them.
- Working experience with industry-standard tools such as Terraform and Ansible.
Experience, Education, Certification, License and Training
- Must have hands-on experience working in an engineering or cloud role.
- Experience with public cloud platforms (e.g., GCP, AWS, Azure).
- Experience in configuration and maintenance of applications & systems infrastructure.
- Experience with distributed system design and architecture.
- Experience building and managing CI/CD pipelines.