
Sr Site Reliability Engineer

FTD India Private Limited
Hyderabad | 12-16 LPA | Posted 9 Mar 2026
FULL TIME
Jenkins
Google Cloud Platform
CI/CD
Kubernetes
Java

Job Description

The Senior Site Reliability Engineer (SRE) will enable FTD to efficiently deliver and operate high-quality, secure software at scale. As a senior contributor, you will collaborate with cross-functional teams to promote DevOps principles and practices and implement world-class self-service. You will incubate and proliferate SRE principles and practices to ensure the stability and reliability of our commerce platforms. You will take a lead role in managing the health of applications and infrastructure.

This position supports a hybrid work model, including onsite presence in our Hyderabad, India office as needed. Occasional on-call and overtime work will be required, but it is generally not expected to be significant.

KEY RESPONSIBILITIES:

  • Maintain availability, performance, and scalability of critical services and production environments.
  • Collaborate closely with developers to design reliable applications and improve deployment practices (you will be embedded in the Development team but report to the infrastructure leader).
  • Break down walls and build trust between developers and infrastructure teams.
  • Participate in end-to-end application ownership throughout the CI/CD process, including automated testing, observability, dependency management, and other operational concerns.
  • Improve CI/CD pipeline reliability, traceability, and security.
  • Build automation for provisioning, configuration, deployments, and incident response.
  • Improve observability using metrics, logs, distributed tracing, dashboards, and alerting.
  • Participate in on-call rotation, lead incident response, and drive root cause analysis.
  • Conduct capacity planning, chaos testing, and reliability reviews.
  • Implement infrastructure-as-code using Terraform, Helm, Jenkins, GitHub Actions, etc.
  • Optimize CI/CD pipelines and ensure safe, repeatable deployments (e.g. Argo CD).
  • Champion SRE principles: SLIs/SLOs, error budgets, toil reduction, problem management, blameless postmortems.
  • Embrace a culture of enablement, customer service, continuous improvement, transparency, and fiscal responsibility.
  • Perform other duties as directed.

KNOWLEDGE, SKILLS AND ABILITIES:

  • 5+ years designing, developing, delivering, and operating scalable, available, high-performance applications (Java, Node.js, etc.) and infrastructure
  • Bachelor's or advanced degree in Computer Science, Information Systems, or a related field
  • Familiarity with modern application languages and concepts, with hands-on e-commerce software development experience preferred
  • Google Professional Cloud Architect or similar certification desired
  • Advanced hands-on experience with continuous integration and delivery/deployment methodologies and technologies
  • Advanced experience with compute, networking, security, storage, monitoring, logging, database, and other technologies in Google Cloud Platform or a similar major cloud environment
  • Strong experience with containerization (e.g. Docker), Kubernetes, and Infrastructure as Code (Terraform preferred)
  • Working knowledge of Helm and Service Mesh (e.g. Istio)
  • Proficient understanding of microservices principles and orchestration
  • Excellence in navigating and prioritizing multiple simultaneous responsibilities of varying scope and complexity
  • Ability to effectively articulate technical concepts to audiences at all organizational levels via oral, written, and non-verbal communication
  • Demonstrated desire and ability to be self-directed, take ownership of issues, and establish a high level of credibility
  • Ability to work well independently and within dynamic, cross-functional teams
  • Excellent understanding of Internet concepts, technologies, and protocols (TCP/IP, DNS, HTTP, TLS/SSL, etc.)
  • Experience with rapid detection and resolution of technical issues using various monitoring and application performance management tools
  • Proficiency with shell scripting, Python and/or other scripting languages in a Linux environment
  • Ability to operate effectively under pressure, both independently and in collaboration with other resources
  • Ability to rapidly learn new technologies via mentoring, formal training, independent research and testing
  • A genuine desire and willingness to share knowledge effectively with others
