Forward Deployed Engineer, RL Environments

Labelbox · San Francisco, CA · $140k - $200k
Full-time · Junior · Posted 6 days ago

About this role

Shape the Future of AI

At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.

About Labelbox

We're the only company offering three integrated solutions for frontier AI development:

- Enterprise Platform & Tools: advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale
- Frontier Data Labeling Service: specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models
- Expert Marketplace: connecting AI teams with highly skilled annotators and domain experts for flexible scaling

Why Join Us

- High-Impact Environment: we operate like an early-stage startup, focusing on impact over process. You'll take on expanded responsibilities quickly, with career growth directly tied to your contributions.
- Technical Excellence: work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.
- Innovation at Speed: we celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.
- Continuous Growth: every role requires continuous learning and evolution. You'll be surrounded by curious minds solving complex problems at the frontier of AI.
- Clear Ownership: you'll know exactly what you're responsible for and have the autonomy to execute. We empower people to drive results through clear ownership and metrics.

The Role

We're hiring a Forward Deployed Engineer to own the design, development, and operationalization of reinforcement learning environments.
You'll build the sandboxed, reproducible execution environments that AI agents interact with during training and evaluation: terminal-based task benchmarks, browser and computer-use environments, and tool-augmented agentic workspaces.

This is a hands-on engineering role. You'll write production-quality infrastructure code, integrate with open-source RL tooling, and work closely with our data operations team to ensure environments are robust, observable, and ready for human annotators and model agents alike. You won't be doing ML research, but you'll need to deeply understand how RL training loops consume environments and where the bottlenecks live.

What You'll Do

- Design, build, and maintain sandboxed RL environments for agentic AI training, including terminal emulators, browser automation harnesses, computer-use simulators, and tool-augmented workspaces (e.g., environments built on frameworks like TerminalBench, OSWorld, and Tau-bench)
- Develop reproducible, containerized execution environments (Docker, VMs, lightweight sandboxes) that support deterministic task rollouts and reward signal collection
- Integrate with and extend open-source agentic tooling and custom CLI/API harnesses to enable multi-step agent interaction
- Build instrumentation and observability layers (structured logging, trajectory capture, state snapshotting) so training runs and human annotation sessions produce clean, auditable data
- Collaborate with data operations to design task curricula and evaluation protocols that stress-test model capabilities across environment types
- Own environment deployment and reliability: CI/CD pipelines, automated testing of environment configurations, and monitoring for drift or breakage across versions
- Rapidly prototype new environment types as client and internal requirements evolve, moving from spec to working system in days, not weeks

What We're Looking For

Required

- 2+ years of professional software engineering experience, with strong fundamentals in Python and at least one systems-level language (Go, Rust, C++)
- Demonstrated experience with containerization and sandboxing (Docker, Podman, Firecracker, or similar) in production or near-production contexts
- Familiarity with RL concepts: MDPs, reward shaping, episode structure, observation/action spaces. You don't need to have trained models, but you need to understand what an environment must provide to an RL training loop
- Experience building or maintaining developer tooling, CLI tools, or infrastructure automation
- Comfort working with browser automation frameworks or terminal interaction tooling
- Strong debugging instincts: you can trace failures across process boundaries, container layers, and network calls
- Ability to read and implement from academic papers and open-source benchmark repositories without extensive hand-holding

Preferred

- Direct experience building or contributing to RL environments (Gymnasium/Gym, PettingZoo, or custom environment implementations)
- Experience with agentic AI evaluation frameworks (SWE-bench, WebArena, OSWorld,
