Infrastructure Engineer

Artech Infosystems Private Limited
Remote | 25-28 LPA | Posted 30 Mar 2026
FULL TIME
Scripting
Pulumi
Automation
Python

Job Description

Infrastructure Engineer

We are building a unified alert management platform for ADP SRE teams that enables 'Alerts as Code' across multiple monitoring systems. The solution uses an internal Infrastructure-as-Code orchestration tool that provides a cohesive wrapper over Pulumi and Terraform, enabling teams to manage alerts for the Mosaic, Hubble, and Prometheus platforms through version-controlled, auditable configurations.

Core Responsibilities

You will be responsible for designing and implementing the alert management infrastructure, ensuring seamless integration across monitoring platforms.

  • Implementation: Develop and configure Pkl configurations to orchestrate both Pulumi and Terraform executors for alert management across environments.
  • Multi-Platform Alert Support: Implement alert creation workflows for Mosaic (via Pulumi), Hubble (via Terraform), and Prometheus monitoring platforms.
  • State Management: Configure and manage state backends using HCS (HybridCloud Services) or S3 with appropriate security controls and ClientConnect authentication.
  • Validation & Automation: Build automated validation pipelines to ensure alert configurations include required metadata (title, playbook location, severity, ownership, description) before deployment.
  • Change Management Integration: Implement approval workflows and CODEOWNERS-based enforcement to ensure alert changes require appropriate stakeholder review.
  • Documentation & Onboarding: Create comprehensive documentation and templates to enable self-service alert creation for ADP SRE teams, Product Engineering, and LOB SREs.
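
As an illustration of the Validation & Automation responsibility above, here is a minimal Python sketch of a pre-deployment metadata check. The field names (`title`, `playbook`, `severity`, `owner`, `description`), the allowed severity values, and the `validate_alert` helper are assumptions made for this example; they are not taken from the actual platform or its Pkl schema.

```python
# Hypothetical pre-deployment check: every alert definition must carry
# the required metadata before it is handed to Pulumi or Terraform.

# Assumed field names, mirroring the metadata listed in the posting.
REQUIRED_FIELDS = ("title", "playbook", "severity", "owner", "description")

# Assumed severity vocabulary; the real platform may use different levels.
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}


def validate_alert(alert: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = alert.get(field)
        # A field counts as present only if it is a non-blank string.
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")

    severity = alert.get("severity")
    if isinstance(severity, str) and severity.strip():
        if severity.strip().lower() not in ALLOWED_SEVERITIES:
            errors.append(f"unknown severity: {severity}")
    return errors


if __name__ == "__main__":
    example = {
        "title": "High error rate on checkout API",
        "playbook": "https://example.invalid/playbooks/checkout-errors",
        "severity": "critical",
        "owner": "team-checkout-sre",
        "description": "5xx rate above threshold for 5 minutes.",
    }
    problems = validate_alert(example)
    print("OK" if not problems else "\n".join(problems))
```

A check of this shape could run as a CI step on each pull request, failing the pipeline whenever the returned list is non-empty, so that incomplete alert definitions never reach deployment.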

Required Technical Skills

  • Infrastructure as Code: Strong experience with Terraform (HCL) and/or Pulumi (TypeScript, Go, or Python). Must be comfortable managing IaC state and understanding declarative infrastructure patterns.
  • Configuration Languages: Familiarity with Pkl, or willingness to quickly learn the client's Pkl configuration language.
  • Monitoring & Observability: Understanding of monitoring concepts including alert thresholds, severity levels, routing, and playbook integration. Experience with enterprise monitoring platforms is essential.
  • Cloud Infrastructure: Experience with AWS services (S3, DynamoDB for state locking).
  • Version Control & CI/CD: Strong Git workflows, experience with code review processes, and CI/CD pipeline integration for infrastructure changes.
  • Python: Proficiency in Python for scripting, automation, and potential integration work.

Preferred Qualifications (Nice-to-Have)

  • SRE Tools: Experience with Prometheus and Grafana.
  • SRE Background: Experience in Site Reliability Engineering, particularly around alert management, on-call workflows, and incident response.
  • Template Systems: Familiarity with Jinja templating or similar systems for generating IaC configurations.

Tech Stack

Python (scripting and automation), Terraform (HCL) and/or Pulumi (TypeScript, Go, or Python), Pkl (HCL, Jsonnet, or CUE), Prometheus/Hubble/Mosaic, and Git