Machine Learning Engineer — Foundation Models & Systems

Forecareer

Full-time

San Francisco, CA

Job description

Overview

We are hiring a Machine Learning Engineer / Research Engineer for a well-funded AI infrastructure company building next-generation foundation models and high-performance model serving systems.

This role sits at the intersection of research and production, spanning large-scale pretraining, post-training (RL), evaluation environments, and deployment/inference optimization. The team works hands-on across the full model lifecycle and operates at large scale.

About the Role

As a Research-Oriented ML Engineer, you’ll work across the full foundation-model stack:

Large-scale pretraining and scaling
Post-training and reinforcement learning
Sandbox environments for evaluation and agent learning
Deployment and inference optimization for production systems

You’ll move quickly from ideas to working systems, contribute production-grade infrastructure, and help deliver models that power real-world applications at scale.

What You’ll Work On

This role spans multiple tracks. Candidates may focus on one area or contribute across several.

Pretraining & Scaling

Train large foundation models across massive, heterogeneous datasets
Design stable training recipes and scaling strategies for new architectures
Improve throughput, memory efficiency, and utilization on large GPU clusters
Build and maintain distributed, fault-tolerant training pipelines

Post-Training & Reinforcement Learning

Develop post-training pipelines (SFT, preference optimization, RLHF / RLAIF, RL)
Curate and generate targeted datasets to improve specific model capabilities
Build reward models and evaluation loops for iterative improvement
Explore inference-time learning and compute-aware techniques

Sandbox Environments & Evaluation

Build scalable sandbox environments for agent learning and evaluation
Design automated evaluations for reasoning, tool use, and safety
Create offline and online environments that support RL-style training
Instrument systems for observability, reproducibility, and fast iteration

Deployment & Inference Optimization

Optimize inference latency and throughput for large models
Build high-performance serving pipelines (batching, KV caching, quantization)
Improve end-to-end efficiency, cost, and reliability in production
Profile and optimize runtime bottlenecks, GPU kernels, and memory behavior

Ideal Candidate Profile

Technical Strength

Strong software engineering fundamentals (robust, performant systems)
Experience training or serving large neural networks (LLMs or similar)
Solid understanding of modern deep learning methods and literature
Comfort working in high-performance, GPU-based, distributed environments

Relevant Experience (one or more)

Large-scale distributed training (FSDP, ZeRO, Megatron-style systems)
Post-training pipelines (SFT, RLHF / RLAIF, eval loops)
Building RL environments, simulators, or agent frameworks
Inference optimization, model compression, quantization, profiling
Large-scale data pipelines for internet-scale ingestion and cleaning
Owning production ML systems end-to-end (monitoring, reliability)

Research Orientation

Ability to propose, test, and iterate on research ideas quickly
Strong experimental discipline: metrics, ablations, reproducibility
Builder mindset — turning ideas into working code and measurable results

Education

MS or PhD in Computer Science, Machine Learning, AI, Mathematics, or related field

Benefits

Competitive salary and meaningful equity
Medical, dental, and vision coverage
401(k)
Flexible time off
Daily meals and snacks (on-site)

Equal Opportunity

This employer is committed to building a diverse and inclusive team and is an equal opportunity employer.

Pay: $200,000.00 - $275,000.00 per year

Benefits:

401(k)
Dental insurance
Flexible schedule
Health insurance
Paid time off
Vision insurance

Work Location: In person

Apply