Manager, Platform Engineer
Job Description
Role Overview
As a Platform Engineer on our DevXOps Platform team, you'll play a crucial role in building, maintaining, and evolving the foundational platforms that empower our product development teams. Our diverse platform ecosystem includes technologies like JFrog Artifactory, JFrog Xray, SonarQube, GitHub Actions Runners, Crossplane, Backstage, and Ansible Automation Platform. You'll work at the intersection of infrastructure-as-code, cloud-native technologies, and DevOps principles, designing, building, and supporting highly available, scalable, and secure platforms. Your efforts will enable our development teams with robust CI/CD pipelines, slick self-service capabilities, and comprehensive observability solutions.
What will you do in this role
- Platform Design & Implementation: Design and implement robust platform solutions using Infrastructure as Code (IaC) tools like Terraform and Ansible.
- Collaboration & Performance: Collaborate closely with development teams to ensure application performance, scalability, and reliability across our platforms.
- Cloud Infrastructure Expertise: Leverage your strong understanding of AWS architecture and components (EKS, EC2, RDS, IAM, VPC networking, CloudWatch/CloudTrail) to design, implement, and manage our cloud infrastructure efficiently.
- Monitoring & Troubleshooting: Implement and manage comprehensive logging, monitoring, and alerting solutions using tools such as Prometheus, Grafana, Loki, and Alertmanager, and cloud-native services like CloudWatch and Log Analytics. You'll also monitor and troubleshoot complex issues across our platform stack.
- CI/CD Pipeline Development: Design, build, and maintain robust Continuous Integration and Continuous Deployment (CI/CD) pipelines using tools like GitHub Actions. This includes managing Software Development Life Cycle (SDLC) documentation and promoting a culture of automation.
- Containerized Solutions: Design, implement, and maintain containerized solutions using Kubernetes and Docker, including managing deployments on EKS and other Kubernetes environments.
- Architectural Contributions: Contribute to architectural discussions and technical decisions for the DXO Platform, ensuring alignment with organizational goals and best practices.
- Automation & Self-Service Enablement: Develop and maintain automation solutions using tools like Ansible, Crossplane, and custom scripts. Empower development teams with self-service capabilities, enabling them to provision resources and deploy applications efficiently.
- Production Support: Provide expert support for production systems, ensuring high availability, reliability, and optimal performance for all our platforms.
- Security & Compliance: Implement and maintain security best practices across all platforms, ensuring compliance with organizational policies and industry standards.
- Accessibility & Automation: Automate testing procedures and document defects to ensure accessibility standards are met across our platforms.
What should you have
- 5+ years of experience in Platform Engineering, DevOps, Infrastructure Engineering, or Site Reliability Engineering (SRE) roles.
- Proven hands-on experience designing, building, and operating production systems on AWS, plus familiarity with Azure.
- Deep proficiency in Linux and strong scripting skills in Python or similar languages.
- Extensive experience with CI/CD methodologies and tools, specifically GitHub Actions or GitLab CI.
- Extensive experience with Ansible.
- Solid understanding of containerization technologies (e.g., Podman, Docker) and container orchestration (e.g., Kubernetes, OpenShift).
- Strong grasp of infrastructure-as-code principles with experience in tools like Terraform.
- Experience with monitoring and logging tools, including Prometheus, Grafana, the ELK stack (Elasticsearch, Logstash, Kibana), CloudWatch, and Log Analytics.
- Familiarity with version control systems, especially Git, and practices like GitOps and semantic versioning.
- Familiarity with artifact management tools such as JFrog Artifactory or Nexus.
- Understanding of network protocols and security best practices.
- Excellent communication skills, with the ability to mentor engineers, write clear documentation, and present technical concepts effectively.
Soft Skills
- Problem-Solving: Strong problem-solving and troubleshooting abilities.
- Communication: Excellent communication skills, both verbal and written.
- Collaboration: A collaborative mindset, eager to work effectively within a team and across departments.
- Continuous Learning: Proactive in learning new technologies and adapting to evolving industry trends.
- Adaptability: Ability to thrive and contribute effectively in fast-paced environments.
- Passion: A genuine passion for building and optimizing robust, scalable platforms.
Good-to-Have Skills
- Certifications in AWS, Red Hat Ansible Automation Platform, or Kubernetes.
- Experience with JFrog Artifactory and JFrog Xray administration.
- Knowledge of OpenShift Operators and Kubernetes CRDs.
- Experience integrating ServiceNow or other ITSM platforms into CI/CD workflows.
- Working knowledge of Backstage for service catalog and integration.
- Expertise in Terraform, Packer, or Crossplane.