DevOps Engineer

AI Front Desk

Full-time

United States

Job description

DevOps Engineer

About AI Front Desk

AI Front Desk is an AI business software platform that enables organizations to build and operate AI-powered features through a unified, extensible architecture. Our platform supports multi-model decision making, workflow orchestration, and human-in-the-loop systems while maintaining strong governance, security, and compliance standards. We power production AI systems used across multiple industries and operate at scale in real-world business environments.

Role Overview

We are seeking a DevOps Engineer to join our infrastructure team and support the development and operation of our AWS-based cloud platform. This role focuses on building reliable, scalable, and secure infrastructure for an AI-driven SaaS platform. You will work across Kubernetes, CI/CD, infrastructure automation, observability, and data services while partnering closely with backend, frontend, and ML engineers to ensure smooth deployments, high availability, and strong operational foundations.

This is a hands-on role for an engineer who enjoys ownership, systems thinking, and improving how complex systems run in production.

Key Responsibilities

Cloud Infrastructure and Operations

  • Design, deploy, and maintain AWS infrastructure supporting production workloads
  • Operate Kubernetes clusters on AWS EKS for containerized services
  • Manage core AWS services including compute, networking, storage, and databases
  • Implement high availability, backup, and disaster recovery strategies
  • Monitor and optimize infrastructure performance, reliability, and cost

Kubernetes and Containerization

  • Manage Kubernetes deployments, services, ingress, and autoscaling
  • Support container image builds and registry management
  • Apply Kubernetes security best practices including RBAC and network policies
  • Partner with engineering teams to improve deployment patterns and resource efficiency

CI/CD and Automation

  • Build and maintain CI/CD pipelines for application and infrastructure deployments
  • Implement multi-environment deployment workflows for development, staging, and production
  • Support deployment strategies such as rolling, blue-green, or canary releases
  • Improve deployment reliability, rollback strategies, and release visibility

Infrastructure as Code

  • Define and manage infrastructure using Terraform or CloudFormation
  • Maintain versioned, repeatable infrastructure configurations
  • Manage infrastructure changes, state, and drift detection
  • Automate provisioning and updates across environments

Observability and Reliability

  • Implement and maintain monitoring, logging, and alerting systems
  • Support metrics, dashboards, and tracing for platform services
  • Respond to incidents and participate in on-call rotations
  • Contribute to reliability improvements and operational best practices

Collaboration and Documentation

  • Work closely with engineering teams on infrastructure and deployment needs
  • Document system architecture, operational procedures, and runbooks
  • Participate in infrastructure planning and platform evolution discussions

Required Experience

  • 2 to 4 years of hands-on experience operating AWS-based production systems
  • Experience working with Kubernetes and Docker in real-world environments
  • Experience building and maintaining CI/CD pipelines
  • Infrastructure as Code experience using Terraform or CloudFormation
  • Strong Linux fundamentals and understanding of networking concepts
  • Experience working in cross-functional engineering teams

Preferred Experience

  • Experience with service meshes, API gateways, or event streaming platforms
  • Familiarity with observability tools such as Prometheus, Grafana, or OpenTelemetry
  • Experience managing PostgreSQL or Redis in cloud environments
  • Exposure to security best practices, secrets management, and compliance requirements
  • Interest in AI or data-intensive platform infrastructure

Technical Stack

Our current and evolving stack includes AWS, Kubernetes on EKS, Docker, Terraform, PostgreSQL, Redis, Kafka, and modern observability tooling. We expect experience with some of these technologies, not necessarily all of them.

What We Offer

  • Opportunity to work on production AI platform infrastructure
  • Collaborative environment with experienced engineers
  • Competitive compensation and benefits
  • Professional development and growth opportunities
  • Flexible work arrangements
  • Direct impact on real-world AI deployments

Salary Range: $85,000 to $110,000

Location: Remote (periodic in-person collaboration in Austin, TX is required)

Employment Type: Full-time

Experience Level: Mid-level

AI Front Desk is an equal opportunity employer. We are committed to building a diverse and inclusive team.

Job Type: Full-time

Pay: $85,000.00 - $110,000.00 per year

Benefits:

  • Dental insurance
  • Health insurance
  • Paid time off

Application Question(s):

  • Do you have experience working with Kubernetes and Docker in real-world environments? Please explain.
  • Do you have experience building and maintaining CI/CD pipelines? Please explain.
  • Do you have Infrastructure as Code experience using Terraform or CloudFormation? Please explain.
  • Do you have strong Linux fundamentals and understanding of networking concepts? Please explain.
  • Do you have experience working in cross-functional engineering teams? Please explain.

Experience:

  • Software engineering: 2 years (Required)

Work Location: Remote