Senior Simulation Data Engineer
Full-time
Senior
About this role
About us
PhysicsX is a deep-tech company with roots in numerical physics and Formula One, dedicated to accelerating hardware innovation at the speed of software.
We are building an AI-driven simulation software stack for engineering and manufacturing across advanced industries. By enabling high-fidelity, multi-physics simulation through AI inference across the entire engineering lifecycle, PhysicsX unlocks new levels of optimization and automation in design, manufacturing, and operations — empowering engineers to push the boundaries of possibility. Our customers include leading innovators in Aerospace & Defense, Materials, Energy, Semiconductors, and Automotive.
Note: We are currently recruiting for multiple positions; however, please only apply for the role that best aligns with your skillset and career goals.
The Role
The Senior Simulation Data Engineer will extend and operate the infrastructure that powers our research Data Factory. You will be responsible for the end-to-end pipeline: from geometry preparation and simulation orchestration through validation, post-processing, and delivery to downstream ML training systems, using PhysicsX platform orchestration services where synergies exist.
This role sits at the intersection of HPC engineering and data engineering. You will orchestrate long-running CFD simulations at scale, build robust data pipelines, and ensure that every simulation we produce meets rigorous quality standards.
Team Context
In this role, you will be vertically embedded in Research, working daily with:
Research Scientists who define data requirements and quality standards
ML Engineers who consume Data Factory outputs for model training
ML Infrastructure Engineers who are accountable for downstream training infrastructure
You will have end-to-end responsibilities over the Data Factory, with the autonomy to make architectural decisions and the responsibility to keep data flowing reliably.
Horizontally, you will be part of an infrastructure engineering group responsible for infrastructure across the company.
What you will do
Simulation Orchestration
Extend and operate the Data Factory infrastructure that orchestrates thousands of CFD simulations per day on cloud compute
Design and operate job scheduling systems that maximize throughput while handling failures gracefully
Build monitoring and alerting to detect simulation failures, convergence issues, and resource bottlenecks early
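Handling failures gracefully in long-running simulation campaigns typically means retrying transient solver or infrastructure errors rather than losing the job. As a minimal sketch only (the function names and retry policy here are illustrative, not part of the PhysicsX stack), a retry wrapper with exponential backoff might look like:

```python
import time

def run_with_retries(submit_job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run a simulation job, retrying transient failures with exponential backoff.

    `submit_job` is any callable that raises on failure; `sleep` is injectable
    so a scheduler (or a test) can avoid real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure for alerting
            sleep(base_delay * 2 ** (attempt - 1))  # back off 1s, 2s, 4s, ...
```

A real scheduler would distinguish retryable errors (preempted node, network blip) from permanent ones (diverged solve), but the shape of the control flow is the same.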
Data Pipeline Engineering
Build high-performance data pipelines that move simulation outputs from solver results to ML-ready training data
Implement geometry preprocessing workflows (mesh preparation, morphing, watertightness validation)
Design and operate post-processing pipelines: surface decimation, field interpolation, format conversion
Optimize I/O performance for large mesh datasets
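One of the geometry checks mentioned above, watertightness validation, can be reduced to a simple topological test: a triangle mesh is closed if and only if every edge is shared by exactly two faces. This is a self-contained sketch of that idea (in practice a library such as PyVista would be used on real mesh data):

```python
from collections import Counter

def is_watertight(faces):
    """Check whether a triangle mesh is closed (watertight).

    `faces` is a list of (i, j, k) vertex-index triples. The mesh is
    watertight iff every undirected edge appears in exactly two triangles.
    """
    edge_counts = Counter()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edge_counts[frozenset((a, b))] += 1  # undirected edge
    return all(count == 2 for count in edge_counts.values())
```

For example, the four faces of a tetrahedron pass the check, while a lone triangle (all edges open) fails, and would be rejected before meshing or solving.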
Data Quality and Validation
Implement comprehensive validation checks at every pipeline stage: solver convergence, physical field bounds, post-processing fidelity
Build systems that capture and quarantine bad data before it reaches training pipelines
Track and report data quality metrics across the entire Data Factory
Work towards full provenance: training samples should be traceable back to their source geometry and simulation configuration
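To make the validation-and-quarantine idea concrete, here is a minimal sketch (the field names and bounds are hypothetical) of per-sample checks for NaNs and physical field bounds; a non-empty issue list would route the sample to quarantine with the reasons recorded for provenance:

```python
import math

def validate_sample(fields, bounds):
    """Return a list of quality issues for one simulation sample.

    `fields` maps field names to flat lists of values; `bounds` maps field
    names to (lo, hi) physical limits. An empty result means the sample may
    proceed to training; anything else means quarantine.
    """
    issues = []
    for name, values in fields.items():
        if any(math.isnan(v) for v in values):
            issues.append(f"{name}: contains NaN")
        lo, hi = bounds.get(name, (-math.inf, math.inf))
        if any(not (lo <= v <= hi) for v in values if not math.isnan(v)):
            issues.append(f"{name}: values outside [{lo}, {hi}]")
    return issues
```

In a production pipeline the same checks would run stage by stage (solver convergence first, post-processing fidelity last) so bad data is caught as early and cheaply as possible.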
Integration and Delivery
Deliver validated datasets to downstream ML training infrastructure in formats optimized for efficient data loading
Design data versioning and cataloging systems that support reproducible training runs
Work closely with ML Infrastructure Engineers to ensure smooth handoff between data production and model training
Support multi-dataset training workflows
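One common way to make dataset versioning support reproducible training runs is content addressing: derive the version ID from a canonical encoding of the dataset's manifest, so identical content always yields the same ID. A small illustrative sketch (the manifest layout here is assumed, not prescribed):

```python
import hashlib
import json

def dataset_version(manifest):
    """Derive a deterministic version ID from a dataset manifest.

    `manifest` maps sample IDs to metadata (e.g. source geometry and solver
    configuration hashes). Hashing a canonical JSON encoding means the same
    content always yields the same ID, so a training run can pin, and later
    reproduce, exactly the data it saw.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Because the keys are sorted before hashing, two manifests with the same entries in a different order get the same ID, while any change to a sample's provenance changes it.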
What you bring to the table
Ability to scope and effectively deliver projects, prioritising activity as needed.
Problem-solving skills and the ability to analyse issues, identify causes, and recommend solutions quickly.
Excellent collaboration and communication skills, especially in a research setting. You can translate "the model isn't converging" into infrastructure hypotheses and solutions, and can bridge technical abstractions with implementations.
5+ years of experience in data engineering, HPC engineering, or simulation infrastructure.
Strong experience with orchestration systems: SLURM, Kubernetes, Temporal
Production data pipeline experience: you've built and operated pipelines that process large volumes of data reliably
Proficiency in Python for pipeline development and automation
Systems engineering fundamentals: Linux, networking, storage systems, performance debugging
Experience with cloud infrastructure, ideally CoreWeave or similar GPU/HPC-focused clouds
Background in HPC for simulation engineering: experience with CFD, FEA, or similar computational workflows (STAR-CCM+, OpenFOAM, ANSYS, etc.)
Experience with geometry processing: mesh manipulation, CAD formats, PyVista
Familiarity with scientific data formats: HDF5, VTK, NetCDF