Ifintalent Global Private Limited
Site Reliability Engineer
Bangalore ₹5-7 LPA Posted 10 Apr 2025
FULL TIME
Jenkins
Linux
AWS
Python
Job Description
Job Summary:
We are seeking a highly skilled and experienced Site Reliability Engineer to join our engineering team.
The ideal candidate will be an expert in AWS cloud services, automation using Python, and infrastructure as code practices. You will be responsible for ensuring the reliability and scalability of our cloud-based applications and services, developing automation solutions, and maintaining our CI/CD pipelines.
Key Responsibilities:
- Design, develop, and maintain scalable, automated, user-friendly systems, tools, and processes.
- Build infrastructure as code using Terraform or Python.
- Write and maintain robust, high-quality Chef cookbooks to manage our infrastructure configuration.
- Develop scripts and automation using Python to streamline the deployment and management of AWS cloud services.
- Collaborate with development teams to document and improve processes and drive operational excellence.
- Monitor the health and performance of services, and respond to system issues as necessary.
- Participate in on-call rotations, providing support and incident management for production systems.
- Keep abreast of new AWS features and services to enhance our cloud infrastructure and automation tools.
- Document systems, processes, and decisions for both technical and non-technical audiences.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent experience.
- 5+ years of experience working with large-scale single-cloud or multi-cloud infrastructure.
- Strong experience with AWS cloud services and managing production environments in AWS.
- Proficiency in writing infrastructure as code using Chef, Python, and other relevant technologies.
- Experience with Jenkins, including creating Jenkins pipelines to automate business-as-usual (BAU) tasks.
- Solid understanding of Linux/Unix administration.
- Knowledge of IT operations best practices for an always-up, always-available service.