Red Hat, Inc.
11.03.26

Remote Forward Deployed Engineer - AI Inference

$190K–$313K/year

About the Role

We are seeking a Remote Forward Deployed Engineer - AI Inference to join the vLLM and LLM-D Engineering team at Red Hat. In this role, you will do more than build software: you will be the bridge between our cutting-edge inference platform (LLM-D and vLLM) and our customers' most critical production environments. You will work directly with customer engineering teams to deploy, optimize, and scale distributed Large Language Model (LLM) inference systems, solving "last mile" infrastructure challenges that defy off-the-shelf solutions and ensuring that massive models run with low latency and high throughput on complex Kubernetes clusters.

What You'll Do

  • Orchestrate Distributed Inference: Deploy and configure LLM-D and vLLM on Kubernetes clusters, setting up advanced deployments like disaggregated serving and KV-cache aware routing.
  • Optimize for Production: Go beyond standard deployments by running performance benchmarks, tuning vLLM parameters, and configuring intelligent inference routing policies to meet SLOs for latency and throughput (a minimal benchmark sketch follows this list).
  • Code Side-by-Side: Collaborate with customer engineers to write production-quality code (Python/Go/YAML) that integrates our inference engine into their existing Kubernetes ecosystem.
  • Solve the "Unsolvable": Debug complex interactions between model architectures, hardware accelerators, and Kubernetes networking.
  • Feedback Loop: Act as the "Customer Zero" for our core engineering teams, channeling field learnings back to product development.
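
To make the "Optimize for Production" bullet concrete, here is a minimal sketch of the kind of latency/throughput probe a forward deployed engineer might run against a vLLM OpenAI-compatible endpoint. The base URL and model name are assumptions; adjust them for the cluster under test.

```python
"""Minimal latency/throughput probe for a vLLM OpenAI-compatible endpoint.

Assumptions: the server is reachable at http://localhost:8000 and serves
the model named below; substitute your own endpoint and model.
"""
import time

import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed endpoint
MODEL = "meta-llama/Llama-3.1-8B-Instruct"         # assumed model name


def probe(prompt: str, max_tokens: int = 128) -> tuple[float, int]:
    """Send one completion request; return (latency in seconds, tokens generated)."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    resp.raise_for_status()
    latency = time.perf_counter() - start
    tokens = resp.json()["usage"]["completion_tokens"]
    return latency, tokens


if __name__ == "__main__":
    latency, tokens = probe("Explain KV caching in one paragraph.")
    print(f"latency={latency:.2f}s throughput={tokens / latency:.1f} tok/s")
```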

Requirements

  • 8+ years of engineering experience in Backend Systems, SRE, or Infrastructure Engineering.
  • Deep Kubernetes expertise, fluent in K8s primitives and experienced with stateful workloads and high-performance networking.
  • Proficiency in Python and Go for systems programming.
  • Experience with Infrastructure as Code tools like Helm and Terraform.
  • Understanding of AI inference, including KV caching and continuous batching in vLLM (see the sketch after this list).
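
As a reference point for the last requirement, the sketch below uses vLLM's offline Python API, where continuous batching and paged KV-cache management happen inside the engine. It assumes `pip install vllm`, a CUDA GPU, and access to the model named below (an assumption; substitute any supported checkpoint).

```python
from vllm import LLM, SamplingParams

# Assumed model name; any checkpoint vLLM supports will do.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

# The engine schedules all prompts together: new sequences join the running
# batch as others finish (continuous batching), and KV-cache blocks are paged
# rather than pre-allocated per sequence.
outputs = llm.generate(
    ["What is disaggregated serving?", "Why route requests on KV-cache locality?"],
    params,
)
for out in outputs:
    print(out.outputs[0].text.strip())
```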

Nice to Have

  • Experience contributing to open-source AI infrastructure projects.
  • Knowledge of Envoy Proxy or Inference Gateway (IGW).
  • Familiarity with model optimization techniques such as quantization (a hedged example follows this list).
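
One way quantization shows up in day-to-day work is loading pre-quantized weights into vLLM. A minimal sketch, assuming the AWQ checkpoint named below exists (an assumption; vLLM also accepts other schemes such as "gptq"):

```python
from vllm import LLM

# quantization="awq" asks the engine to load 4-bit AWQ weights, trading a
# small accuracy cost for a large reduction in GPU memory footprint.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")  # assumed repo
```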

What We Offer

  • Comprehensive medical, dental, and vision coverage.
  • 401(k) with employer match.
  • Paid time off and holidays.
  • Flexible Spending Account for healthcare and dependent care.
  • Paid parental leave plans for all new parents.

Language Requirements

  • English: C1 (advanced)
Why This Job: 8.5 of 10

This Remote Forward Deployed Engineer position at Red Hat offers hands-on work on open-source AI inference infrastructure (vLLM and LLM-D) directly alongside customer engineering teams. With a competitive salary and comprehensive benefits, it is an attractive role for experienced infrastructure engineers.

Who Will Succeed Here

  • Proficient in Python and Go, with hands-on experience deploying AI inference workloads on Kubernetes and Terraform-managed infrastructure across varied production environments.
  • A strong problem-solving mindset focused on inference performance, able to troubleshoot complex systems remotely and keep customer-facing applications highly available and reliable.
  • Demonstrated experience managing cloud infrastructure with Helm and Terraform in a remote setting, with the self-motivation to collaborate effectively across distributed teams.

Learning Resources

Python for Data Science Handbook (guide)

Career Path

Remote Forward Deployed Engineer - AI Inference (now) → AI Solutions Architect (1-2 years) → Director of AI Engineering (3-5 years)

Market Overview

  • Market size (2024): $10.4B
  • Annual growth: 11.2%
  • AI adoption: 45%
  • Investment: +78%
  • Labour demand: +30%
  • Average salary: $135K

Skills & Requirements

  • Required: Python, Go, Kubernetes
  • Growing in demand: Machine Learning, Cloud Computing, Data Engineering
  • Declining: Perl, Ruby on Rails

Domain Trends

Rise of AI Inference Optimization
With AI inference applications growing, 60% of organizations are investing in optimizing their inference pipelines to enhance performance and reduce latency.
Shift to Cloud-Native Architectures
Over 50% of companies are transitioning to cloud-native architectures, leveraging Kubernetes and microservices for scalable AI deployments.
Increased Focus on MLOps
MLOps adoption has surged by 70% as companies seek to streamline AI deployment processes, making skills in Terraform and Helm increasingly valuable.

All job postings are automatically gathered by algorithms. We do not review or verify listings; be careful when applying, and do not sign in with iCloud or Google services.