Infrastructure Engineer

We are building a unified alert management platform for ADP SRE teams that enables 'Alerts as Code' across multiple monitoring systems. The solution uses an internal Infrastructure-as-Code orchestration tool that provides a cohesive wrapper over Pulumi and Terraform. This lets teams manage alerts for the Mosaic, Hubble, and Prometheus platforms through version-controlled, auditable configurations.

Core Responsibilities

You will be responsible for designing and implementing the alert management infrastructure, ensuring seamless integration across monitoring platforms.

Implementation: Develop Pkl configurations that orchestrate both the Pulumi and Terraform executors for alert management across environments.
Multi-Platform Alert Support: Implement alert creation workflows for Mosaic (via Pulumi), Hubble (via Terraform), and Prometheus monitoring platforms.
State Management: Configure and manage state backends using HCS (HybridCloud Services) or S3 with appropriate security controls and ClientConnect authentication.
Validation & Automation: Build automated validation pipelines to ensure alert configurations include required metadata (title, playbook location, severity, ownership, description) before deployment.
Change Management Integration: Implement approval workflows and CODEOWNERS-based enforcement to ensure alert changes require appropriate stakeholder review.
Documentation & Onboarding: Create comprehensive documentation and templates to enable self-service alert creation for ADP SRE teams, Product Engineering, and LOB SREs.

Required Technical Skills

Infrastructure as Code: Strong experience with Terraform (HCL) and/or Pulumi (TypeScript, Go, or Python). Must be comfortable managing IaC state and have a solid understanding of declarative infrastructure patterns.
Configuration Languages: Familiarity with Pkl, or willingness to learn the Client's Pkl configuration language quickly.
Monitoring & Observability: Understanding of monitoring concepts including alert thresholds, severity levels, routing, and playbook integration. Experience with enterprise monitoring platforms is essential.
Cloud Infrastructure: Experience with AWS services, particularly S3 (state storage) and DynamoDB (state locking).
Version Control & CI/CD: Strong Git workflow skills, experience with code review processes, and experience integrating infrastructure changes into CI/CD pipelines.
Python: Proficiency in Python for scripting, automation, and potential integration work.

Preferred Qualifications (Nice-to-Have)

SRE Tools: Experience with Prometheus and Grafana.
SRE Background: Experience in Site Reliability Engineering, particularly around alert management, on-call workflows, and incident response.
Template Systems: Familiarity with Jinja templating or similar systems for generating IaC configurations.

Tech Stack

Python (scripting and automation), Terraform (HCL) and/or Pulumi (TypeScript, Go, or Python), Pkl (or a comparable configuration language such as HCL, Jsonnet, or CUE), Prometheus/Hubble/Mosaic, and Git