{"access":{"advertiser_pricing_url":"https://aidevboard.com/pricing","catalog_url":"https://aidevboard.com/api/v1/catalog","description":"Public read endpoints are open and free. API keys are optional for stable agent identity and keyed hourly throttling.","docs_url":"https://aidevboard.com/docs","mode":"open","register_url":"https://aidevboard.com/api/v1/register"},"degraded":false,"estimated":false,"has_next":true,"jobs":[{"id":"48720738-0f4b-483d-9739-14039ae457d0","company_id":"a0000000-0000-0000-0000-000000000001","title":"Research Engineer, Performance RL (Reinforcement Learning) ","slug":"research-engineer-performance-rl-2f0da25a","description":"About Anthropic \n Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.\n About the RL Teams \n Our Reinforcement Learning teams lead Anthropic's reinforcement learning research and development, playing a critical role in advancing our AI systems. We've contributed to all Claude models, with significant impacts on the autonomy and coding capabilities of Claude Sonnet 4.6 and Opus 4.6. Our work spans several key areas:\n \n \n Developing systems that enable models to use computers effectively\n \n Advancing code generation through reinforcement learning\n \n Pioneering fundamental RL research for large language models\n \n Building scalable RL infrastructure and training methodologies\n \n Enhancing model reasoning capabilities\n \n We collaborate closely with Anthropic's alignment and frontier red teams to ensure our systems are both capable and safe. We partner with the applied production training team to bring research innovations into deployed models, and are dedicated to implement our research at scale. 
Our Reinforcement Learning teams sit at the intersection of cutting-edge research and engineering excellence, with a deep commitment to building high-quality, scalable systems that push the boundaries of what AI can accomplish.\n About the Role \n We're hiring for the Code RL team within the RL organization. As a Research Engineer, you'll advance our models' ability to safely write correct, fast code for accelerators.\n You'll need to know accelerator performance well to turn it into tasks and signals models can learn from. Specifically, you will:\n \n \n Invent, design and implement RL environments and evaluations.\n \n Conduct experiments and shape our research roadmap.\n \n Deliver your work into training runs.\n \n Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.\n \n You may be a good fit if you:\n \n \n Have expertise with accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch).\n \n Have worked across the stack – kernels, model code, distributed systems.\n \n Know how to balance research exploration with engineering implementation.\n \n Are passionate about AI's potential and committed to developing safe and beneficial systems.\n \n Strong candidates may also have:\n \n \n Experience with reinforcement learning.\n \n Experience porting ML workloads between different types of accelerators.\n \n Familiarity with LLM training methodologies.\n The annual compensation range for this role is listed below. 
\n For sales roles, the range provided is the role’s On Target Earnings (\"OTE\") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\n Annual Salary:\n $350,000 — $850,000 USD \n Logistics \n Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\n Required field of study:  A field relevant to the role as demonstrated through coursework, training, or professional experience\n Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position\n Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.\n Visa sponsorship:  We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.\n We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.  Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. 
In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you're ever unsure about a ","salary_min":350000,"salary_max":850000,"location":"San Francisco, CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"principal","tags":["reinforcement-learning","code-generation","search","pytorch","llm","jax","fine-tuning","gpu"],"apply_url":"https://job-boards.greenhouse.io/anthropic/jobs/5160330008","is_featured":true,"is_sticky":true,"status":"active","published_at":"2026-03-23T16:27:59Z","expires_at":"2026-08-15T14:00:29.666185Z","created_at":"2026-04-13T09:36:00.086246Z","updated_at":"2026-07-16T14:00:29.796553Z","company_name":"Anthropic","company_slug":"anthropic","company_logo_url":"https://www.google.com/s2/favicons?domain=anthropic.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/48720738-0f4b-483d-9739-14039ae457d0"},{"id":"b2503a2d-d800-43bc-9f84-11af04a6a4b4","company_id":"77beb456-fc80-40a4-b773-f0b17d1ece4c","title":"Generative AI - Graphics Engineer","slug":"generative-ai-graphics-engineer-429a27ce","description":"WHO YOU ARE\n\nWe are looking for skilled graphics engineer who have a deep command of modern C++ and GPU programming, a strong mathematical foundation, and an expert understanding of computer graphics—whether rendering, geometry processing, simulation, or advanced real-time techniques. You collaborate naturally with artists, researchers, and engineers, explaining complex ideas with clarity and learning from diverse perspectives. You’re not afraid of new ideas or unfamiliar pipelines. 
Most importantly, you’re excited to build the next generation of 3D creation technology—graphics systems that will empower millions of creators worldwide.\n\n\nWHO WE ARE\n\nAt Meshy, we believe 3D creation should be boundless and accessible. Our mission statement is simple: unleash creativity. We built a full pipeline for 3D content ranging from text / image to 3D, texturing, texture editing, animation rigging, etc. We also built a vibrant community for our creators, where people can share their work, take inspiration from others, and even use it as an asset marketplace for their games and prototypes. We are the market leader in 3D generative AI, recognized as the No.1 in popularity among 3D AI tools (according to 2024 A16Z Games survey), and we generate real value and is used by enterprises (including Meta, Square Enix, Deepmind, etc.) and millions of end users. Meshy is used in game and film production, in 3D printing, in industrial product design, in enablement of novel product features such as user-generated content, and even in training and simulation for robotics and physical AI.\n\n\nYOUR NEXT CHALLENGE\n\nAs a core member of Meshy’s algorithm team, you will design and build the next generation of high-performance graphics systems that power our 3D generative AI training and products. 
You will collaborate closely with graphics experts, generative AI researchers, and infrastructure engineers to enable new creative capabilities and push the boundaries of what AI-empowered 3D pipelines can achieve.\n\n \n\nIn this role, you will:\n\n - Build and optimize high-performance graphics components—rendering kernels, geometry processing operators, and supporting systems.\n\n - Develop robust production-quality pipelines that integrate with data pipelines, generative models and artist-facing applications.\n\n - Work across GPU clusters, cloud environments, and local DCC tools to ensure seamless interoperability and scalability.\n\n - Collaborate closely with artists, product teams, and ML researchers to translate creative requirements into technical implementations.\n\n - Contribute to internal tooling, demos, documentation, open-source initiatives, or technical reports that elevate Meshy’s graphics capabilities.\n\n\nWHAT WE'RE LOOKING FOR\n\n - Expert-level C++ and GPU programming skills, with a strong ability to write high-performance, memory-efficient code.\n\n - Solid mathematical foundation with deep understanding of computer graphics—either rendering, geometry processing, or both.\n\n - Hands-on experience building production-grade graphics systems, such as rendering engines, geometry pipelines, asset tools, or similar large-scale systems.\n\n - Strong engineering discipline—clean code, reproducible results, rigorous profiling, and sustainable system design.\n\n - Working knowledge of major DCC tools (Houdini, Blender, Maya), including experience developing scripts, plug-ins, or custom tools within these environments is a plus.\n\n - Experience in AAA game development, VFX pipelines, or other high-end 3D production environments is a plus.\n\n - Demonstrated contributions to open-source graphics projects or publications in top-tier CG venues (SIGGRAPH, etc.) 
are pluses.\n\n\nA LITTLE MORE ABOUT MESHY.AI\n\nTrusted by Meta, Square Enix, Deepmind and more, Meshy is redefining 3D creation with generative AI. We empower artists, designers, engineers, hobbyists, and makers to bring immersive worlds, characters, and experiences to reality in minutes instead of months.\n\n \n\nIn addition to our core mission of unleashing creativity, we build a culture that we enjoy and are proud of. Here are some highlights:\n\n - We value intelligence and the pursuit of knowledge. We are a global team of generative-AI pioneers, computer-graphics veterans, and product builders who believe human expression and enjoyment is the ultimate frontier of computing.\n\n - We care deeply about our work, our users, and each other. Empathy and passion drive us forward. We have a culture of directness and truthfulness, therefore we value constructive criticism. Being direct and truthful is the most sincere form of trust and care.\n\n - We trust our instincts and are not afraid to take bold risks. Meshy was born from a few-hour prototype, a bold pivot for a team that had very little experience in AI. Innovation requires courage.\n\n - We have a keen eye for quality and aesthetics. Our products are not just functional but also beautiful. 
The same aesthetics permeate through our culture, our code and ","salary_min":175000,"salary_max":300000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["gpu","generative-ai","data-pipeline","robotics","computer-graphics","research"],"apply_url":"https://jobs.ashbyhq.com/meshy/e08ff336-379d-4cde-8df0-c5ab335517b3/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-15T20:04:34.461Z","expires_at":"2026-08-15T14:10:57.040052Z","created_at":"2026-07-16T14:10:57.180092Z","updated_at":"2026-07-16T14:10:57.180092Z","company_name":"Meshy","company_slug":"meshy","company_logo_url":"https://www.google.com/s2/favicons?domain=meshy.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/b2503a2d-d800-43bc-9f84-11af04a6a4b4"},{"id":"c86d2f91-914b-4899-b3e1-e61be0732f6a","company_id":"77beb456-fc80-40a4-b773-f0b17d1ece4c","title":"Generative AI - ML System Engineering","slug":"generative-ai-ml-system-engineering-b7b36ecf","description":"WHO YOU ARE\n\nWe are looking for Machine Learning Systems Engineers who can help us build the world's largest end-to-end 3D native machine learning systems. You will help us build our end to end ML framework dedicated for 3D, from pretraining, to finetuning, inferencing, etc. We expect a combination of strong hands on engineering skills, eagerness to learn new things, and thrives in a fast-paced, high-ownership environment.\n\n\nWHO WE ARE\n\nAt Meshy, we believe 3D creation should be boundless and accessible. Our mission statement is simple: unleash creativity. We built a full pipeline for 3D content ranging from text / image to 3D, texturing, texture editing, animation rigging, etc. We also built a vibrant community for our creators, where people can share their work, take inspiration from others, and even use it as an asset marketplace for their games and prototypes. 
We are the market leader in 3D generative AI, recognized as the No.1 in popularity among 3D AI tools (according to 2024 A16Z Games survey), and we generate real value and is used by enterprises (including Meta, Square Enix, Deepmind, etc.) and millions of end users. Meshy is used in game and film production, in 3D printing, in industrial product design, in enablement of novel product features such as user-generated content, and even in training and simulation for robotics and physical AI.\n\n\nYOUR NEXT CHALLENGE\n\n3D is the brave new frontier of Gen AI. Our work here involves a lot of unique new challenges in both training and inference. Your next challenge at Meshy would involve the full stack of AI, from debugging and monitoring the hardware platform, building training framework, scaling high-throughput 3D data pipelines for our foundational training, co-designing novel model architectures with researchers, to the novel challenge of efficient inference engines for diffusion models and more. 
Here are some examples for each side of the challenge:\n\n \n\nOn the training side\n\n - Work closely with researchers to co-design the next frontier of 3D \u0026 Spatial AI.\n\n - Build and debug on top of modern PyTorch, for maximum parallelism and efficiency, and build clean and intuitive training infrastructure for our in-house foundational models.\n\n - Identifying bottlenecks and optimizing for high throughput \u0026 efficient distributed model training across hundreds to thousands of GPUs.\n\n - Implementing and maintaining 3D specific custom operators in Triton or CUDA.\n\n - Implementing and maintaining novel data-loading framework and libraries.\n\nOn the inference side\n\n - Building efficient inference endpoints with complex multi-stage model pipelines.\n\n - Optimizing models through compilation, fusion, quantization, etc.\n\n\nWHAT WE'RE LOOKING FOR\n\n - Experience in machine learning or high performance graphics.\n\n - Solid practical understanding of at least one machine learning framework (e.g. 
PyTorch, JAX).\n\n - Strong ability to write beautiful and maintainable code in Python and/or C++.\n\n - Ability to learn fast and dive into new concepts or complex codebases.\n\n - Performance and efficiency oriented mindset, with a strong interest in the tiniest detail.\n\n - Strong communication skills for working in a globally distributed team.\n\n\nNICE TO HAVE\n\n - A strong passion to navigate through the PyTorch internals, with hands-on experience in areas like torch.compile , fully_shard (FSDP2) APIs.\n\n - Experience with building Triton kernels.\n\n - Experiences with large-scale distributed training, familiarity with modern parallelization techniques: DP, TP, CP, PP, zero redundancy optimizers, etc.\n\n - Experience with diffusion models in 3D or video.\n\n - Experience with low precision bf16 or fp8 training.\n\n\nA LITTLE MORE ABOUT MESHY.AI\n\nTrusted by Meta, Square Enix, Deepmind and more, Meshy is redefining 3D creation with generative AI. We empower artists, designers, engineers, hobbyists, and makers to bring immersive worlds, characters, and experiences to reality in minutes instead of months.\n\n \n\nIn addition to our core mission of unleashing creativity, we build a culture that we enjoy and are proud of. Here are some highlights:\n\n - We value intelligence and the pursuit of knowledge. We are a global team of generative-AI pioneers, computer-graphics veterans, and product builders who believe human expression and enjoyment is the ultimate frontier of computing.\n\n - We care deeply about our work, our users, and each other. Empathy and passion drive us forward. We have a culture of directness and truthfulness, therefore we value constructive criticism. Being direct and truthful is the most sincere form of trust and care.\n\n - We trust our instincts and are not afraid to take bold risks. Meshy was born from a few-hour prototype, a bold pivot for a team that had very little experience in AI. 
Innovation requires courage.\n\n - We have a keen eye for quality and aesthetics. Our products are not just functional but also beautiful. The same aesthetics permeate through our culture, our code and are the ","salary_min":175000,"salary_max":300000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["fine-tuning","pytorch","distributed-systems","pre-training","generative-ai","gpu","robotics","data-pipeline"],"apply_url":"https://jobs.ashbyhq.com/meshy/3f94dcd6-9d31-47e7-a6b1-66e49a777056/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-15T19:52:38.747Z","expires_at":"2026-08-15T14:10:55.956272Z","created_at":"2026-07-16T14:10:56.084849Z","updated_at":"2026-07-16T14:10:56.084849Z","company_name":"Meshy","company_slug":"meshy","company_logo_url":"https://www.google.com/s2/favicons?domain=meshy.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/c86d2f91-914b-4899-b3e1-e61be0732f6a"},{"id":"6db6f99f-a30e-4524-a8e4-b34154992b4d","company_id":"77beb456-fc80-40a4-b773-f0b17d1ece4c","title":"Generative AI - 3D Foundation Model","slug":"generative-ai-3d-foundation-model-cb2667cc","description":"WHO YOU ARE\n\nYou are a talented, hands-on researcher who thrives in a fast-paced environment, is self-directed, a team player, and knows how to get things done efficiently. You have deep understanding of the transformer architecture, have strong python and tensor programming skills, have a vision for AI beyond linear sequences, and you believe in \"the scaling law\". You can translate high-level goals into concrete research and implementation steps, set an approach, and follow through. When it's time to explain your ideas, you bring clarity to complex technical issues. You are not afraid of confronting new ideas, and you are eager to share your knowledge with the team. 
You use these skills to create real-world benefits for our researchers, engineers, and millions of users, and you are excited to help advance our effort to push the state of the art of AI that understands and generates 3D worlds.\n\n\nWHO WE ARE\n\nAt Meshy, we believe 3D creation should be boundless and accessible. Our mission statement is simple: unleash creativity. We built a full pipeline for 3D content ranging from text / image to 3D, texturing, texture editing, animation rigging, etc. We also built a vibrant community for our creators, where people can share their work, take inspiration from others, and even use it as an asset marketplace for their games and prototypes. We are the market leader in 3D generative AI, recognized as the No.1 in popularity among 3D AI tools (according to 2024 A16Z Games survey), and we generate real value and is used by enterprises (including Meta, Square Enix, Deepmind, etc.) and millions of end users. Meshy is used in game and film production, in 3D printing, in industrial product design, in enablement of novel product features such as user-generated content, and even in training and simulation for robotics and physical AI.\n\n\nYOUR NEXT CHALLENGE\n\nAs a core member of the team of research scientists and machine learning engineers at Meshy, you will drive the development of our core 3D-native generative foundational model. In this role, you will join our foundational research to advance 3D AI, apply learnings from other fields of ML, and pushing the state of the art. 
You will also work towards long-term ambitious research goals, while identifying intermediate milestones.\n\n \n\nThe essential functions include, but are not limited to the following:\n\n - Design, train, and refine large-scale 3D generative models from covering pre-training, post-training, and emerging paradigms in diffusion, flow matching, and multi-modal learning.\n\n - Bridge the gap between cutting-edge research and product, deploy models in real products used by millions of creators, using human feedback and creative evaluation.\n\n - Create novel model architectures to make 3D generation faster, higher-quality, and more controllable.\n\n - Collaborate with infrastructure and systems teams to build scalable training, and data pipelines across GPU clusters and cloud environments.\n\n - Bring engineering discipline into an fast-paced research environment: elegant code, reproducible experiments, and building software as a team.\n\n - Share insights and breakthroughs through internal demos, open-source contributions, or technical reports that advance the field of 3D generative AI.\n\n\nWHAT WE'RE LOOKING FOR\n\n - Strong engineering skills in Python and deep learning frameworks (preferably PyTorch); comfortable moving between research prototypes and production systems.\n\n - Familiar with Transformers and modern generative AI models (Diffusion / flow matching, VAE, etc.).\n\n - Curiosity and passion for multi-modal AI, and have an intuitive understanding of how models perceive, represent, and generate 3D worlds.\n\n - Familiar with high performance training on large scale infrastructure (e.g., SLURM, Ray, k8s) is a plus.\n\n - Contributions to popular open-source machine learning projects or publications in top-tier CV / ML conferences is a plus.\n\n\nA LITTLE MORE ABOUT MESHY.AI\n\nTrusted by Meta, Square Enix, Deepmind and more, Meshy is redefining 3D creation with generative AI. 
We empower artists, designers, engineers, hobbyists, and makers to bring immersive worlds, characters, and experiences to reality in minutes instead of months.\n\n \n\nIn addition to our core mission of unleashing creativity, we build a culture that we enjoy and are proud of. Here are some highlights:\n\n - We value intelligence and the pursuit of knowledge. We are a global team of generative-AI pioneers, computer-graphics veterans, and product builders who believe human expression and enjoyment is the ultimate frontier of computing.\n\n - We care deeply about our work, our users, and each other. Empathy and passion drive us forward. We have a culture of directness and truthfulness, therefore we value constructive criticism. Being direct and truthful is the most sincere form of trust and care.\n\n - We trust our instincts and are not afraid to take bold risks. Meshy was born from a few-hour prototype, a bold pivot","salary_min":175000,"salary_max":300000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["generative-ai","gpu","robotics","deep-learning","data-pipeline","pre-training","pytorch","research"],"apply_url":"https://jobs.ashbyhq.com/meshy/f52aa172-0212-4db8-a93d-406b910b9fea/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-15T19:18:36.925Z","expires_at":"2026-08-15T14:10:56.873764Z","created_at":"2026-07-16T14:10:56.993732Z","updated_at":"2026-07-16T14:10:56.993732Z","company_name":"Meshy","company_slug":"meshy","company_logo_url":"https://www.google.com/s2/favicons?domain=meshy.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/6db6f99f-a30e-4524-a8e4-b34154992b4d"},{"id":"021f3b70-f0d5-4666-a5e1-431d120b0e63","company_id":"31ae48bc-c938-4c26-a348-0bf3c089a446","title":"Senior Software Engineer - GPU Kernel Authoring \u0026 
Optimization","slug":"senior-software-engineer-gpu-kernel-authoring-optimization-d4eed12b","description":"CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at  www.coreweave.com . \n About the role: \n CoreWeave is the top-rated AI-cloud for high-performance GPU infrastructure across AI/ML, visual effects, rendering, and real-time inference. Our stack is engineered for speed, scale, and cost-efficiency—an unmatched alternative to traditional hyperscalers. At CoreWeave, infrastructure is the product.\n We're looking for a Senior Engineer for CoreWeave's Benchmarking \u0026 Performance team, focused on kernel authoring and optimization. You will write, profile, and tune the GPU kernels that sit on the critical path of large-scale model serving—squeezing maximum throughput and minimum latency out of every SM, tensor core, and byte of memory bandwidth. You will also aid us in achieving industry-leading end-to-end performance benchmarking publications such as MLPerf.\n You will be an owner who leads designs, raises engineering standards, and delivers measurable improvements to latency, throughput, and reliability across our inference stack. 
You'll partner with product, orchestration, and hardware teams to turn kernel-level wins into end-to-end gains and meet strict P99 SLAs at scale.\n \n Author, profile, and optimize CUDA kernels—GEMMs, attention, MoE routing, quantization, KV-cache, and fused epilogues—on the critical path of LLM inference.\n Optimize for the hardware: exploit tensor cores and tune occupancy, memory coalescing, shared-memory/register usage, and overlap of compute with data movement.\n Use kernel-authoring DSLs and compilers to prototype and ship kernels quickly without sacrificing performance.\n Benchmark rigorously: build reproducible microbenchmarks and roofline analyses, and validate that kernel-level wins translate to end-to-end latency/throughput gains across model-serving stacks (vLLM, TensorRT-LLM, llm-d, SGLang).\n Implement and maintain benchmarking workflows for end-to-end MLPerf Inference (and Training) runs, including workload setup, cluster configuration, runbooks, and result validation.\n Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.\n Mentor junior engineers; review cross-team designs and elevate coding/testing standards.\n Help ensure reproducible, well-documented benchmarking and kernel-optimization processes.\n \n Who You Are: \n \n 5+ years of experience building high-performance computing, GPU/accelerator software, or performance-critical systems.\n Hands-on CUDA experience is required—you have written and optimized custom kernels and are fluent with the CUDA programming and memory model.\n Deep understanding of GPU architecture and performance: tensor cores, warp/occupancy tuning, the memory hierarchy and bandwidth, NVLink/PCIe, and profiling with Nsight Compute/Systems.\n Strong coding in C++ and Python; comfortable reading and writing low-level, performance-sensitive code.\n Familiarity with model-serving stacks (vLLM, TensorRT-LLM, llm-d, SGLang) and the kernels that dominate their inference 
cost.\n Strong communicator comfortable collaborating with cross-functional teams and external partners.\n \n Preferred: \n \n Triton or Mojo for authoring custom GPU kernels — highly desired.\n CuTe DSL for Python-based kernel authoring on NVIDIA GPUs.\n JAX and its Pallas kernel language for authoring kernels on GPU/TPU.\n HIP / ROCm and AMD GPU experience.\n NCCL and collective-communication performance.\n Experience with alternative accelerators such as Google TPUs and Meta's MTIA.\n Familiarity with kernel-authoring DSLs and nano-compilers such as KNYFE and its Block DSL.\n Experience with Kubernetes at production scale.\n Experience with SUNK (Slurm on Kubernetes) / Slurm for scheduling large GPU jobs.\n Experience running MLPerf submissions or similar large-scale audited benchmarks.\n Contributions to OSS projects such as vLLM, SGLang, PyTorch, Triton, or CUTLASS.\n \n Wondering if you're a good fit? \n We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match.\n Why CoreWeave? \n Help shape an industry-defining inference platform that enables teams to deploy generative AI and real-time applications at scale. If squeezing every last microsecond out of GPU kernels and delivering reliable model serving excites you, this is the place to build. We're in an exciting stage of hyper-growth that you will not want to miss out on. 
We're not afraid of a little chaos, and we're constantly ","salary_min":182000,"salary_max":242000,"location":"Sunnyvale, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["mlops","pytorch","gpu","generative-ai","llm","jax","computer-graphics"],"apply_url":"https://coreweave.com/careers/job?4697100006\u0026board=coreweave\u0026gh_jid=4697100006","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-14T22:01:55Z","expires_at":"2026-08-15T14:05:36.795Z","created_at":"2026-07-15T14:06:51.909822Z","updated_at":"2026-07-16T14:05:36.92287Z","company_name":"CoreWeave","company_slug":"coreweave","company_logo_url":"https://www.google.com/s2/favicons?domain=coreweave.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/021f3b70-f0d5-4666-a5e1-431d120b0e63"},{"id":"b5fee987-f2ea-4b80-a04f-395e616158d8","company_id":"c93e0284-9c76-4a85-9905-494865ab9278","title":"AI Systems Performance Engineer - New Graduate","slug":"ai-systems-performance-engineer-new-graduate-e4bfa2f7","description":"The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale. \n SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets. 
\n About The Role \n We are seeking a talented and highly motivated AI Systems Performance Engineer to bring up and optimize state-of-the-art foundation models on SambaNova's reconfigurable dataflow platform.\n You'll work hands-on with advanced AI models — such as DeepSeek, GLM, Kimi, GPT OSS, Llama, Qwen, and other frontier architectures — and learn how modern AI systems achieve high throughput, low latency, and efficient large-scale inference.\n In this role, you'll work at the intersection of machine learning and computer systems, collaborating with engineers across model, compiler, runtime, and hardware teams. This is an ideal opportunity for a new graduate who is passionate about understanding how AI models execute on real hardware and wants to help build the next generation of high-performance AI systems.\n Responsibilities \n \n Bring up cutting-edge foundation models, including LLMs and multimodal models, on the SambaNova platform through the SambaNova software stack.\n Analyze and profile model execution to identify performance bottlenecks across model, compiler, runtime, and hardware layers.\n Optimize AI workloads for throughput, latency, memory efficiency, and scalability.\n Collaborate with machine learning, compiler, runtime, and hardware engineers to develop high-performance AI applications.\n Explore and integrate new techniques in model architecture, quantization, scheduling, caching, and memory optimization.\n Develop tools, benchmarks, and performance analysis methodologies for large-scale AI inference.\n Investigate new model architectures and translate research advances into efficient implementations on production AI systems.\n Contribute ideas for dataflow, scheduling, and system optimizations for both single-node and distributed inference.\n \n Basic Qualifications \n \n Bachelor's or Master's degree in computer science, electrical engineering, computer engineering, or a related technical field (e.g., applied mathematics, physics, or 
statistics), completed or expected before the start date.\n Strong programming skills in Python, C++, or a similar programming language.\n Solid foundations in algorithms, data structures, computer architecture, operating systems, or parallel computing.\n Familiarity with deep learning and at least one major ML framework, such as PyTorch, TensorFlow, or JAX.\n Strong analytical and problem-solving skills, with an interest in understanding and optimizing system performance.\n Ability and enthusiasm to learn across machine learning, software systems, and hardware.\n \n Preferred Qualifications \n \n Coursework, research, internship, or project experience in machine learning systems, computer architecture, compilers, distributed systems, or high-performance computing.\n Hands-on experience with LLMs, multimodal models, or transformer architectures.\n Familiarity with model inference, KV cache, batching, quantization, or distributed execution.\n Experience with GPU or accelerator programming using CUDA, Triton, OpenCL, or similar technologies.\n Familiarity with frameworks such as vLLM, DeepSpeed, Megatron, or TensorRT.\n Understanding of memory hierarchy, caching, parallelism, or scheduling.\n Experience profiling and optimizing the performance of software or ML workloads.\n Research publications, open-source contributions, programming competitions, or technically challenging personal projects are a plus.\n \n We value strong technical fundamentals, curiosity, and the ability to learn quickly. Prior production experience with large-scale AI systems is not required.\n Base Salary Range:\n Base Pay Range\n $135,000 — $165,000 USD \n Submission Guidelines Please note that in order to be considered an applicant for any position at SambaNova Systems, you must submit an application form for each position for which you believe you are qualified.  \n EEO Policy SambaNova Systems is an Equal Opportunity/Affirmative Action Employer. 
All qualified applicants will receive consideration for employment without regard basis of age (40 and over), color, disability, gender identity, genetic information, marital status, military or veteran status, national origin/ancestry, race, religion, creed, sex ","salary_min":135000,"salary_max":165000,"location":"San Jose, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"mid","tags":["pytorch","tensorflow","generative-ai","distributed-systems","gpu","llm","deep-learning"],"apply_url":"https://sambanova.ai/sambanova-available-positions/?gh_jid=6115124004","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-13T22:28:28Z","expires_at":"2026-08-15T14:04:51.08642Z","created_at":"2026-07-15T14:06:10.360035Z","updated_at":"2026-07-16T14:04:51.213909Z","company_name":"SambaNova Systems","company_slug":"sambanova","company_logo_url":"https://www.google.com/s2/favicons?domain=sambanova.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/b5fee987-f2ea-4b80-a04f-395e616158d8"},{"id":"90b670fb-c16e-4406-9cd7-c2e700e6570a","company_id":"c0136eba-1fff-477a-8968-c5435a645cd3","title":"Senior AI Infrastructure Engineer - Model Training","slug":"senior-ai-infrastructure-engineer-model-training-1ae1d299","description":"Kodiak Robotics, Inc. was founded in 2018 and has become a leader in autonomous ground transportation committed to a safer and more efficient future for all. The company has developed an artificial intelligence (AI) powered technology stack purpose-built for commercial trucking and the public sector. The company delivers freight daily for its customers across the southern United States using its autonomous technology. In 2024, Kodiak became the first known company to publicly announce delivering a driverless semi-truck to a customer. Kodiak is also leveraging its commercial self-driving software to develop, test and deploy autonomous capabilities for the U.S. 
Department of Defense.\n Kodiak's AI is only as good as the speed at which we can train it. Every improvement to our models – from GigaFusionNet to large-scale world models – depends on infrastructure that turns thousands of hours of multimodal driving data into training throughput. We are looking for engineers who make model training fast: streaming massive camera, LiDAR, and radar datasets without stalling a single GPU, sharding data and models efficiently across nodes, and extracting every FLOP from the latest hardware. If you measure your impact in tokens per second and GPU utilization, this role is for you. In this role, you will: \n \n Design high-throughput data loading and streaming systems for multimodal sensor data (camera, LiDAR, radar), including dataset formats, sharding strategies, and prefetching pipelines that keep GPUs saturated \n Build and optimize distributed training infrastructure across multi-node GPU clusters, applying data, tensor, pipeline, and fully sharded (FSDP/ZeRO) parallelism to models that don't fit on a single device \n Maximize utilization of modern accelerators such as NVIDIA B200s through mixed-precision training (BF16/FP8), fused kernels, memory optimization, and communication/computation overlap \n Profile end-to-end training pipelines to find and eliminate bottlenecks across storage, network, CPU preprocessing, and GPU compute \n Develop scalable dataset construction pipelines that convert petabytes of raw driving logs into training-ready, streamable formats \n Partner with ML teams to scale new architectures from prototype to full-cluster training runs efficiently and reliably \n \n What you’ll bring: \n \n BS, MS, or PhD in Computer Science or a related field, and at least 2-3 years of industry experience in ML systems or infrastructure \n Hands-on experience with distributed training frameworks and techniques (PyTorch DDP/FSDP, DeepSpeed, Megatron, NCCL) and a strong grasp of parallelism trade-offs \n Experience building 
high-performance data pipelines for large-scale training, including streaming dataset formats (WebDataset, MosaicML Streaming/MDS, or similar), sharding, and storage/network-aware loading \n Deep understanding of GPU performance: mixed precision, memory hierarchy, kernel fusion, profiling tools (Nsight, PyTorch Profiler), and interconnects (NVLink, InfiniBand) \n Strong Python skills and proficiency in PyTorch internals; systems-level experience (C++/CUDA/Triton) a plus \n Passion for building the infrastructure that lets AI for the physical world train faster, scale further, and improve continuously \n \n What we offer: \n \n Competitive compensation package including equity and annual bonuses \n Excellent Medical, Dental, and Vision plans through Kaiser Permanente, Cigna, and  MetLife (including a medical plan with infertility benefits) \n MetLife Legal Services, Identity \u0026 Fraud Protection, Hospital Indemnity Insurance, Accident Insurance, \u0026 Critical Illness Insurance \n Flexible PTO, 10 paid holidays, and generous parental leave policies \n Our office is centrally located in Mountain View, CA \n Office perks: dog-friendly, free catered lunch, a fully stocked kitchen, and free EV charging \n Long Term Disability, Short Term Disability, Life Insurance \n Wellbeing Benefits - Headspace through Cigna, Calm through Kaiser, One Medical, Gympass, Spring Health through Cigna, Rula (mental health navigation)  \n Fidelity 401(k) \n Commuter, FSA, Dependent Care FSA, HSA \n Various incentive programs (referral bonuses, patent bonuses, etc.) \n The pay range listed below reflects the base salary  in our SF/Silicon Valley location,  across several internal levels. Actual starting pay will be based on job-related factors including: work location, experience, relevant training, education, skill level and performance during interview. 
Total compensation at Kodiak includes base pay, equity, bonus and a competitive benefits package\n California Pay Range\n $190,000 — $260,000 USD \n  \n At Kodiak, we strive to build a diverse community working towards our common company goals in a safe and collaborative environment where harassment of any kind is strictly prohibited. Kodiak is committed to equal opportunity employment regardless of race, ethnicity, religion, gender identity, sexual orientation, age, disability, or veteran status,","salary_min":190000,"salary_max":260000,"location":"Mountain View, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["autonomous-vehicles","robotics","data-pipeline","gpu","pytorch","distributed-systems","machine-learning","infrastructure"],"apply_url":"https://job-boards.greenhouse.io/kodiak/jobs/4310775009","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-09T16:07:51Z","expires_at":"2026-08-15T14:09:12.860026Z","created_at":"2026-07-10T14:08:04.242712Z","updated_at":"2026-07-16T14:09:12.9833Z","company_name":"Kodiak Robotics","company_slug":"kodiak-robotics","company_logo_url":"https://www.google.com/s2/favicons?domain=kodiak.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/90b670fb-c16e-4406-9cd7-c2e700e6570a"},{"id":"a78c1e6e-02a1-4e7c-aa23-10d382bb3863","company_id":"4c0fefc3-173a-4227-a823-4d67d3e70ff0","title":"Senior Engineering Manager, AI Infrastructure","slug":"senior-engineering-manager-ai-infrastructure-8804a5af","description":"Persons in these roles are expected to work from our offices in Seattle. On-site requirements vary based on position and team. If you have questions about on-site work arrangements for this role, please ask your recruiter.\n Our base salary range is $146,880 - $220,320, and in addition we have generous bonus plans to provide a competitive compensation package. 
\n Who You Are: \n We are seeking a Senior Manager, AI Infrastructure to run the day-to-day operation of the systems that power our research. Reporting to the VP of Engineering, you will own the execution and reliability of our high-performance computing (HPC) environment which includes on-prem GPU clusters and the software orchestration layer that schedules workloads across a hybrid cloud environment. This is a hands-on operational leadership role: your mandate is to keep the platform fast, reliable, and well-utilized, and to deliver against the roadmap set with your PM counterpart.\n Our ideal candidate is a:\n \n Systems Expert: You have a deep, hands-on understanding of the Linux kernel, container runtimes, and distributed systems. You understand the performance implications of InfiniBand topologies and NCCL optimizations.\n Execution-Focused Leader: You plan and deliver against near-term operational goals, keep reliability and researcher velocity high, and turn priorities set with leadership into shipped, dependable systems.\n Pragmatic Operator: You are comfortable making trade-offs between technical elegance and operational necessity. You triage and mitigate immediate risks, and know when to handle something yourself versus escalate.\n \n Who We Are:  \n Ai2 is a non-profit research institute at the forefront of open-source AI development. Unlike industry peers, our goal is to share our findings, data, code, and models with the global scientific community. \n Why Ai2: \n \n Open Science: Your work directly enables the release of open models like OLMo, providing the broader research community with tools they can't get elsewhere.\n Mission-Driven: We prioritize scientific impact over profit margins. 
This allows us to focus on building the \"right\" infrastructure for long-term research goals.\n Complexity at Scale: You will manage some of the most dense and high-performance compute environments currently in operation.\n \n Your Next Challenge: \n \n Cluster Operations: Manage the availability, performance, and health of our dense on-prem GPU clusters. Coordinate with hardware vendors and internal teams to keep physical infrastructure meeting the demands of frontier model training.\n Orchestration \u0026 Scheduling: Operate and improve Beaker, our internal orchestration platform by optimizing resource allocation and driving high utilization across on-prem assets and elastic cloud resources (AWS/GCP).\n Storage Operations: Execute and continuously improve our storage environment, balancing high-throughput performance for active training against cost-effective durability for petascale research data. Contribute to the longer-term storage roadmap.\n Resource Management: Manage GPU compute allocation against budget. Track utilization, surface the data, and recommend when to burst to the cloud versus investing in on-prem capacity, escalating larger trade-offs as needed.\n User Support \u0026 Velocity: Serve as the technical bridge to our research teams. Ensure infrastructure is an accelerator, not a bottleneck, for a diverse set of research objectives.\n Team Leadership: Manage and grow a team of systems engineers, SREs, and software developers. 
Set the bar for operational rigor, engineering quality, and a collaborative culture, and keep the team unblocked and delivering.\n \n What You’ll Need: \n \n Experience: 12+ years in infrastructure, systems engineering, or HPC (or an advanced degree with 8+ years), including 2+ years supervising a small engineering team (5+).\n Bachelor's degree in a related field : a relevant advanced degree may substitute for equivalent years of technical work experience.\n GPU/HPC Stack: Direct experience operating large-scale NVIDIA GPU clusters and high-performance networking (InfiniBand/RoCE).\n Orchestration: Strong background in Kubernetes, Slurm, or similar orchestration frameworks, particularly in hybrid-cloud configurations.\n Storage: Hands-on experience with distributed filesystems (e.g., WEKA, Ceph, Lustre) and cloud storage integration at scale.\n Software Development: Proficient in designing and managing SDLC processes including sprint planning and technical design reviews. Proficient in Go or Python.\n \n Physical Demands and Work Environment: \n The physical demands described here are representative of those that must be met by a team member to successfully perform the essential functions of this position. Reasonable accommodations may be made to enable individuals with disabilities to perform the functions.\n \n Must be able to remain in a stationary position for long periods of time. 
\n The ability to communicate information and ideas so others will unde","salary_min":146880,"salary_max":220320,"location":"Seattle, WA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["distributed-systems","robotics","healthcare","gpu","infrastructure"],"apply_url":"https://job-boards.greenhouse.io/thealleninstitute/jobs/8029564","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-09T00:47:06Z","expires_at":"2026-08-15T14:18:06.363366Z","created_at":"2026-07-09T14:16:58.241171Z","updated_at":"2026-07-16T14:18:06.487471Z","company_name":"Allen Institute for AI","company_slug":"allen-institute-for-ai","company_logo_url":"https://www.google.com/s2/favicons?domain=allenai.org\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/a78c1e6e-02a1-4e7c-aa23-10d382bb3863"},{"id":"92eff494-6559-427b-8a18-9f3ed481a25a","company_id":"2114efab-ea67-411b-bfb8-7899153105f3","title":"Member of Technical Staff, CI/CD Infrastructure","slug":"member-of-technical-staff-cicd-infrastructure-1daba4be","description":"Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference efficient and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.\n\n\n\n\nABOUT THE ROLE\n\nvLLM is growing at a fast pace, and every bit of that growth lands on the CI system. More models, more hardware, more contributors, more ways for things to break. 
Your job is to advance the CI system so it scales with vLLM’s momentum and unlocks faster development for everyone.\n\nYou’ll get to:\n\n - Maintain and scale the compute infrastructure that powers CI, release, performance benchmarks, and accuracy evaluation for the vLLM project, across a wide range of models and accelerators including H100/H200, (G)B200/300, AMD MI325/355X, TPU, Intel Gaudi, etc.\n\n - Get creative about cutting CI time-to-signal from hours to minutes\n\n - Make sure every corner of the vLLM codebase is well-tested\n\n - Keep vLLM releases rock-solid\n\n - Build out tooling that helps 3,000+ vLLM contributors move fast\n\n\n\n\nSKILLS AND QUALIFICATIONS\n\nMinimum qualifications:\n\n - Strong experience with Docker, Kubernetes, and containerized build or test environments.\n\n - Built CI/CD pipelines from scratch using GitHub Actions, Buildkite, or similar systems.\n\n - Familiar with CI design patterns and CI techniques: compute orchestration, handling flaky tests, dependency/environment management, caching, remote execution, test target determination, test coverage, and so on.\n\n - Fluent in Python, Bash, Go, or similar for automation and tooling.\n\n - Solid fundamentals of Linux, security, networking, storage, and package management.\n\nBonus points for:\n\n - Setting up infrastructure for ML, inference, CUDA, ROCm, or accelerator-heavy workloads.\n\n - Running Buildkite at scale, including agents, queues, dynamic pipelines, test sharding, caching, and artifact management.\n\n - Operating Kubernetes clusters for CI, batch jobs, test execution, or internal developer infrastructure.\n\n - Managing CI/CD in a large open-source project\n\n - Building dashboards, alerts, runbooks, or tooling for CI observability.\n\n\nLOGISTICS\n\n - Location: This role is based in San Francisco, California. 
Will consider remote in the US for exceptional candidates.\n\n - Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.\n\n - Visa sponsorship: We sponsor visas on a case-by-case basis.\n\n - Benefits: Inferact offers generous health, dental, and vision benefits as well as 401(k) company match.","salary_min":200000,"salary_max":400000,"location":"San Francisco, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["gpu","llm","infrastructure","research"],"apply_url":"https://jobs.ashbyhq.com/inferact/3dee433c-7121-458c-8408-c193b6326ffb/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T20:04:02.323Z","expires_at":"2026-08-15T14:11:58.496586Z","created_at":"2026-07-09T14:11:07.184556Z","updated_at":"2026-07-16T14:11:58.62149Z","company_name":"Inferact","company_slug":"inferact","company_logo_url":"https://www.google.com/s2/favicons?domain=inferact.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/92eff494-6559-427b-8a18-9f3ed481a25a"},{"id":"ffb8f345-cc3f-4a19-b74d-6117413ea12c","company_id":"3da82454-107f-427f-88e7-01f315ef93fb","title":"Member of Technical Staff - Training Platform","slug":"member-of-technical-staff-training-platform-bf6e9667","description":"OWN YOUR INTELLIGENCE\n\n\n\nPrime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.\n\n\n\nOur platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool use, agent workflows, and continuously improving production models. 
We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.\n\n\n\nPrime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators — including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.\n\n\n\n\nROLE IMPACT\n\nYou'll help build our hosted training platform - the product that lets users launch LoRA and full fine-tuning runs on managed GPU clusters with a single API call or a few clicks. 
The role spans the developer-facing platform and the underlying Kubernetes-based training infrastructure that runs the jobs.\n\n\n\n\nCORE TECHNICAL RESPONSIBILITIES\n\n\n\n\nHOSTED TRAINING INFRASTRUCTURE\n\n - Design and operate Kubernetes-based training and inference orchestration across multi-cluster, multi-cloud GPU fleets\n\n - Build and maintain Helm charts that compose trainers, inference servers, environment servers, and supporting services into reproducible \"Training stacks\"\n\n - Develop the Python control-plane agents that watch pods, report run state to the platform, and keep clusters in sync\n\n - Implement scheduling and autoscaling for heterogeneous hardware (H100/H200/B200) using KEDA, LeaderWorkerSet, taints/tolerations, and gang scheduling\n\n - Run a tight GitOps workflow - every change ships through PRs, Helm values, and CI\n\n - Build node-local model caches, checkpoint pipelines, and shared storage for fast cold starts\n\n - Operate the observability stack (Prometheus, Grafana, Loki, DCGM) and make GPU cluster debugging fast\n\n\nPLATFORM DEVELOPMENT\n\n - Build the developer-facing surfaces for hosted training: job submission, live run monitoring, logs, metrics, model/adapter management, comparisons\n\n - Develop FastAPI backend services and REST APIs that bridge the platform to running clusters\n\n - Build real-time monitoring and debugging tools (streaming logs, step-level metrics, failure analysis)\n\n - Ship product UI in Next.js / React / TypeScript with shadcn, Tailwind, tRPC, and TanStack Query\n\n\nRESEARCH BRIDGE\n\n - Interface with the RL trainer, inference servers, and environment servers running inside our clusters\n\n - Productize new training capabilities (new model architectures, RL algorithms, modes)\n\n\n\n\n\nTECHNICAL REQUIREMENTS\n\nWe're looking for engineers who are fluent across three areas - you don't need to be the world's best at any one, but you should have real depth in all three and a clear point of view on 
how they connect.\n\n\n\n\nAI \u0026 GPU LANDSCAPE\n\n - Strong working knowledge of the modern AI stack - open model families, finetuning techniques (LoRA, QLoRA, full FT, RLHF/RLAIF), inference engines (vLLM, SGLang, TensorRT-LLM)\n\n - Familiarity with GPU hardware tradeoffs (H100 / H200 / B200, NVLink, interconnects, memory hierarchy) and what they mean for training and inference workloads\n\n - Understanding of distributed training fundamentals (data/tensor/pipeline/expert parallelism, NCCL, multi-node scheduling)\n\n - Awareness of what's happening at the frontier - new models, training methods, infra patterns - and the ability to translate that into product decisions\n   \n   \n\n\nKUBERNETES \u0026 INFRASTRUCTURE\n\n - Strong Kubernetes operations experience - Helm, CRDs, operators, KEDA, gang scheduling, GPU operator\n\n - Comfortable debugging real production clusters (kubectl, pod lifecycle, node issues, networking)\n\n - Cloud platform experience (GCP preferred - GCS, GKE, Cloud Run, Cloud Tasks)\n\n - Infrastructure automation (Helm, Terraform, Ansible) and a GitOps mindset\n\n - Observability: Prometheus, Grafana, Loki, OpenTelemetry, DCGM\n\n - Linux fundamentals: networking, namespaces, performance tuning\n   \n   \n\n\nPROGRAMMING \u0026 PLATFORM\n\n - Strong Python backend development (FastAPI, async, SQLAlchemy)\n\n - Comfortable building Python contr","salary_min":150000,"salary_max":300000,"location":"San Francisco, 
CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["reinforcement-learning","llm","gpu","api-design","fine-tuning","distributed-systems","agents","cloud"],"apply_url":"https://jobs.ashbyhq.com/PrimeIntellect/8706578d-5a01-4270-9d43-ed9cd998a982/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T18:45:47.645Z","expires_at":"2026-08-15T14:10:47.612821Z","created_at":"2026-05-11T14:11:38.576943Z","updated_at":"2026-07-16T14:10:47.733682Z","company_name":"Prime Intellect","company_slug":"PrimeIntellect","company_logo_url":"https://www.google.com/s2/favicons?domain=primeintellect.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/ffb8f345-cc3f-4a19-b74d-6117413ea12c"},{"id":"f1bf694f-6890-4864-b7de-7d77bfbc9a49","company_id":"3da82454-107f-427f-88e7-01f315ef93fb","title":"Member of Technical Staff - GPU Infrastructure","slug":"member-of-technical-staff-gpu-infrastructure-63e3e8cc","description":"OWN YOUR INTELLIGENCE\n\n\n\nPrime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.\n\n\n\nOur platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool use, agent workflows, and continuously improving production models. We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. 
They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.\n\n\n\nPrime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators — including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.\n\n\n\nCore Technical Responsibilities\n\nThis customer-facing role combines deep technical expertise with hands-on implementation. You'll be instrumental in:\n\nCustomer Architecture \u0026 Design\n\n - Partner with clients to understand workload requirements and design optimal GPU cluster architectures\n\n - Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs\n\n - Develop deployment strategies for LLM training, inference, and HPC workloads\n\n - Present architectural recommendations to technical and executive stakeholders\n\nInfrastructure Deployment \u0026 Optimization\n\n - Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads\n\n - Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects\n\n - Optimize GPU utilization, memory management, and inter-node communication\n\n - Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance\n\n - Tune system performance from kernel parameters to CUDA configurations\n\nProduction Operations \u0026 Support\n\n - Serve as primary technical escalation point for customer infrastructure issues\n\n - Diagnose and resolve complex 
problems across the full stack - hardware, drivers, networking, and software\n\n - Implement monitoring, alerting, and automated remediation systems\n\n - Provide 24/7 on-call support for critical customer deployments\n\n - Create runbooks and documentation for customer operations teams\n\nTechnical Requirements\n\nRequired Experience\n\n - 3+ years hands-on experience with GPU clusters and HPC environments\n\n - Deep expertise with SLURM and Kubernetes in production GPU settings\n\n - Proven experience with InfiniBand configuration and troubleshooting\n\n - Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack\n\n - Experience with infrastructure automation tools (Ansible, Terraform)\n\n - Proficiency in Python, Bash, and systems programming\n\n - Track record of customer-facing technical leadership\n\nInfrastructure Skills\n\n - NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)\n\n - Container runtime configuration for GPUs (Docker, Containerd, Enroot)\n\n - Linux kernel tuning and performance optimization\n\n - Network topology design for AI workloads\n\n - Power and cooling requirements for high-density GPU deployments\n\nNice to Have\n\n - Experience with 1000+ GPU deployments\n\n - NVIDIA DGX, HGX, or SuperPOD certification\n\n - Distributed training frameworks (PyTorch FSDP, DeepSpeed, Megatron-LM)\n\n - ML framework optimization and profiling\n\n - Experience with AMD MI300 or Intel Gaudi accelerators\n\n - Contributions to open-source HPC/AI infrastructure projects\n\nGrowth Opportunity\n\nYou'll work directly with customers pushing the boundaries of AI, from startups training foundation models to enterprises deploying massive inference infrastructure. 
You'll collaborate with our world-class engineering team while having direct impact on systems powering the next generation of AI breakthroughs.\n\nWe value expertise and customer obsession - if you're passionate about building reliable, high-performance GPU infrastructure and have a track record of successful large-scale deployments, we want to talk to you.\n\nApply now and join us in our mission to democratize access to planetary scale computing.\n\nCompensation\n\nCash Compensation Range of $150-300k plus Equity Incentives","salary_min":150000,"salary_max":300000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"lead","tags":["llm","pytorch","generative-ai","agents","gpu","distributed-systems","infrastructure"],"apply_url":"https://jobs.ashbyhq.com/PrimeIntellect/297d925e-5a42-40bd-b02f-5c928d226f18/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T18:45:18.934Z","expires_at":"2026-08-15T14:10:47.377589Z","created_at":"2026-04-13T15:01:32.586506Z","updated_at":"2026-07-16T14:10:47.498156Z","company_name":"Prime Intellect","company_slug":"PrimeIntellect","company_logo_url":"https://www.google.com/s2/favicons?domain=primeintellect.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/f1bf694f-6890-4864-b7de-7d77bfbc9a49"},{"id":"537b089a-1139-46c6-9166-2dc6b9693a2f","company_id":"3da82454-107f-427f-88e7-01f315ef93fb","title":"Research Engineer - RL Infrastructure ","slug":"research-engineer-rl-infrastructure-af69c92c","description":"OWN YOUR INTELLIGENCE\n\n\n\nPrime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.\n\n\n\nOur platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool 
use, agent workflows, and continuously improving production models. We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.\n\nWe train open frontier models and ship the same stack to our customers. It spans the full stack of training, deploying and continuously improving models — compute, large-scale RL, environments, sandboxes, evals, and deployment.\n\n\n\nPrime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators — including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. 
We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.\n\n\n\n\n\nWHAT YOU’LL WORK ON\n\n - Build and optimize the systems infrastructure behind large-scale RL and distributed training workloads by contributing to our prime-rl https://github.com/PrimeIntellect-ai/prime-rl framework.\n\n - Improve end-to-end training efficiency across compute, memory, networking, and scheduling layers.\n\n - Design and implement low-level performance optimizations, including kernels, communication paths, and runtime improvements.\n\n - Work on distributed training systems spanning data, tensor, and pipeline parallel workloads.\n\n - Help shape the architecture of our RL training stack, including async rollout and post-training systems.\n\n - Contribute to open-source libraries and internal infrastructure used for frontier-scale model training.\n\n - Collaborate closely with researchers and infrastructure engineers to translate bottlenecks into concrete systems improvements.\n\n - Stay at the frontier of training systems, inference systems, compiler/runtime tooling, and hardware-aware optimization techniques.\n\n\n\n\n\nYOU MAY BE A FIT IF YOU HAVE\n\n - Strong systems engineering experience in AI/ML infrastructure, especially around large-scale model training or inference.\n\n - Deep familiarity with PyTorch and distributed training frameworks such as PyTorch Distributed, DeepSpeed, FSDP, Megatron, vLLM, Ray, or related tooling.\n\n - Experience optimizing training performance across kernels, memory movement, communication overhead, or parallelization strategy.\n\n - Hands-on experience with large-scale training techniques including data parallelism, tensor parallelism, and pipeline parallelism.\n\n - Strong understanding of GPU architecture, profiling, and performance debugging.\n\n - Ability to identify bottlenecks across the stack and drive improvements from first 
principles.\n\n - Comfort working in a fast-moving environment with ambiguous problems and high ownership.\n\n\n\n\nESPECIALLY EXCITING\n\n - Experience writing or optimizing CUDA / Triton kernels.\n\n - Experience with compiler or runtime optimization for ML systems.\n\n - Experience working on RL training infrastructure, rollout systems, or asynchronous training pipelines.\n\n - Experience with multi-node GPU clusters and high-performance networking.\n\n - Contributions to open-source ML systems or infrastructure projects.\n\n - Interest in publishing technical work or sharing insights through engineering blogs and technical writing.\n\n\n\n\nWHY THIS ROLE MATTERS\n\nThe next frontier in AI will not be unlocked by models alone. It will be unlocked by systems that let those models train faster, adapt continuously, and operate across real environments at scale.\n\nThat infrastructure does not exist yet in the form the world needs.\n\nWe’re building it.\n\n\n\n\nBENEFITS \u0026 PERKS\n\n - Cash Compensation Range of $150-350k, plus equity.\n\n - Flexible work arrangements, with the option to work remotely or in person from our San Francisco office.\n\n - Visa sponsorship and relocation support for international candidates.\n\n - Quarterly team offsites, hackathons, conferences, and learning opportunities.\n\n - A deeply technical, high-agency team working on infrastructure for open superintelligence.\n\nIf you’re excited about building the systems foundation for frontier-scale RL an","salary_min":150000,"salary_max":350000,"location":"San Francisco, 
CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["pytorch","search","distributed-systems","llm","gpu","agents","research","infrastructure"],"apply_url":"https://jobs.ashbyhq.com/PrimeIntellect/05e4b76b-2570-4c89-baf2-9833fff7378f/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T18:43:53.584Z","expires_at":"2026-08-15T14:10:47.919401Z","created_at":"2026-04-13T15:01:32.609376Z","updated_at":"2026-07-16T14:10:48.037879Z","company_name":"Prime Intellect","company_slug":"PrimeIntellect","company_logo_url":"https://www.google.com/s2/favicons?domain=primeintellect.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/537b089a-1139-46c6-9166-2dc6b9693a2f"},{"id":"8c402485-1400-4e3b-aacf-eaa1ab3b5dfb","company_id":"3da82454-107f-427f-88e7-01f315ef93fb","title":"Research Engineer - Distributed Training","slug":"research-engineer-distributed-training-19cda6e4","description":"OWN YOUR INTELLIGENCE\n\n\n\nPrime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.\n\n\n\nOur platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool use, agent workflows, and continuously improving production models. We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.\n\nWe train open frontier models and ship the same stack to our customers. 
It spans the full stack of training, deploying and continuously improving models: compute, large-scale RL, environments, sandboxes, evals, and deployment.\n\n\n\nPrime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators, including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.\n\n\n\n\nWHAT YOU’LL WORK ON\n\n - Build and optimize the distributed training infrastructure behind our pre-training and large-scale RL training workloads by contributing to our prime-rl https://github.com/PrimeIntellect-ai/prime-rl framework.\n\n - Improve end-to-end training efficiency across compute, memory, networking, and scheduling layers.\n\n - Design and implement low-level performance optimizations, including kernels, communication paths, and runtime improvements.\n\n - Work on distributed training systems spanning data, tensor, and pipeline parallel workloads.\n\n - Help shape the architecture of our RL training stack, including async rollout and post-training systems.\n\n - Contribute to open-source libraries and internal infrastructure used for frontier-scale model training.\n\n - Collaborate closely with researchers and infrastructure engineers to translate bottlenecks into concrete systems improvements.\n\n - Stay at the frontier of training systems, inference systems, compiler/runtime tooling, and hardware-aware optimization techniques.\n\n\n\n\n\nYOU MAY BE A FIT IF YOU HAVE\n\n - Strong systems engineering experience in AI/ML 
infrastructure, especially around large-scale model training or inference.\n\n - Deep familiarity with PyTorch and distributed training frameworks such as PyTorch Distributed, DeepSpeed, FSDP, Megatron, vLLM, Ray, or related tooling.\n\n - Experience optimizing training performance across kernels, memory movement, communication overhead, or parallelization strategy.\n\n - Hands-on experience with large-scale training techniques including data parallelism, tensor parallelism, and pipeline parallelism.\n\n - Strong understanding of GPU architecture, profiling, and performance debugging.\n\n - Ability to identify bottlenecks across the stack and drive improvements from first principles.\n\n - Comfort working in a fast-moving environment with ambiguous problems and high ownership.\n\n\n\n\nESPECIALLY EXCITING\n\n - Experience writing or optimizing CUDA / Triton kernels.\n\n - Experience with compiler or runtime optimization for ML systems.\n\n - Experience working on RL training infrastructure, rollout systems, or asynchronous training pipelines.\n\n - Experience with multi-node GPU clusters and high-performance networking.\n\n - Contributions to open-source ML systems or infrastructure projects.\n\n - Interest in publishing technical work or sharing insights through engineering blogs and technical writing.\n\n\n\n\n\n\n\nBENEFITS \u0026 PERKS\n\n - Cash Compensation Range of $150-350k, plus equity incentives, aligning your success with the growth and impact of Prime Intellect.\n\n - Flexible work arrangements, with the option to work remotely or in-person at our offices in San Francisco.\n\n - Visa sponsorship and relocation assistance for international candidates.\n\n - Quarterly team off-sites, hackathons, conferences and learning opportunities.\n\n - Opportunity to work with a talented, hard-working and mission-driven team, united by a shared passion for leveraging technology to accelerate science and AI.\n\nIf you’re excited about building the systems foundation 
for frontier-scale training and open superintelligence, we’d love to hear from you.","salary_min":150000,"salary_max":350000,"location":"San Francisco, CA","workplace":"remote","remote_scope":"unknown","job_type":"full-time","experience_level":"senior","tags":["pre-training","search","agents","llm","pytorch","gpu","distributed-systems","research"],"apply_url":"https://jobs.ashbyhq.com/PrimeIntellect/8bd52610-175c-42a7-a7cd-b29c45f9d305/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T18:43:34.749Z","expires_at":"2026-08-15T14:10:46.400006Z","created_at":"2026-04-13T15:01:32.550978Z","updated_at":"2026-07-16T14:10:46.52985Z","company_name":"Prime Intellect","company_slug":"PrimeIntellect","company_logo_url":"https://www.google.com/s2/favicons?domain=primeintellect.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/8c402485-1400-4e3b-aacf-eaa1ab3b5dfb"},{"id":"5e2a1a2e-0a5c-4e85-b7ac-b319a84dbd55","company_id":"3da82454-107f-427f-88e7-01f315ef93fb","title":"Member of Technical Staff - Security","slug":"member-of-technical-staff-security-18bb5dcb","description":"OWN YOUR INTELLIGENCE\n\n\n\nPrime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.\n\n\n\nOur platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool use, agent workflows, and continuously improving production models. We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. 
They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.\n\n\n\nPrime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators — including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.\n\n\n\n\n\nROLE IMPACT\n\nSecurity is the single highest-stakes function at Prime Intellect. Our customers — from frontier AI labs to enterprises — trust us with their most valuable assets: proprietary models, training data, and the compute that powers them. This role owns the security posture of everything we ship: the hosted RL training platform, distributed GPU infrastructure, liquid compute marketplace, and all customer-facing surfaces.\n\nYou'll be the first dedicated security hire and will define how we think about security as a company — from threat modeling and secure architecture to incident response and compliance. 
You'll work directly with engineering, research, and leadership to embed security into every layer of the stack, and you'll manage relationships with external penetration testers and security auditors to continuously validate our defenses.\n\n\n\n\nCORE TECHNICAL RESPONSIBILITIES\n\n\nPREVENTIVE SECURITY \u0026 SECURE ARCHITECTURE\n\n - Own threat modeling across our entire surface area: multi-tenant training infrastructure, sandboxed execution environments, API surfaces, and internal tooling\n\n - Design and implement zero-trust networking, identity, and access control across distributed GPU clusters and cloud infrastructure\n\n - Build secure-by-default patterns for our platform engineers — auth, secrets management, supply chain integrity, container hardening\n\n - Architect tenant isolation and data boundary enforcement for hosted RL training workloads (customers run arbitrary code in our environments)\n\n\nAI-NATIVE SECURITY\n\n - Develop security frameworks specific to AI infrastructure: model weight protection, training data isolation, checkpoint integrity, gradient privacy\n\n - Secure the RL training loop end-to-end — from environment execution in sandboxes to reward signal verification and model artifact storage\n\n - Build detection and prevention for AI-specific attack vectors: prompt injection across agentic pipelines, model exfiltration, adversarial environment manipulation\n\n\nOFFENSIVE SECURITY \u0026 EXTERNAL ENGAGEMENTS\n\n - Scope, manage, and run point on external penetration tests across our platform, hosted training infrastructure, and liquid compute layer\n\n - Build and maintain an internal red-teaming practice — automated and manual — targeting our most critical systems\n\n - Drive vulnerability management: triage, remediation SLAs, and root cause analysis\n\n\nDETECTION, RESPONSE \u0026 OBSERVABILITY\n\n - Build security monitoring and alerting across infrastructure (distributed clusters, Kubernetes, cloud) and application layers\n\n - 
Implement runtime security for containerized training workloads and sandboxed environments\n\n - Own incident response — build the playbooks, run the drills, lead the post-mortems\n\n - Design audit logging and forensic capability across all customer-facing systems\n\n\nCOMPLIANCE \u0026 CUSTOMER TRUST\n\n - Drive SOC 2 Type II readiness and other compliance frameworks required by enterprise customers\n\n - Own the security narrative for customer-facing materials — questionnaires, architecture reviews, trust documentation\n\n - Partner with GTM to unblock enterprise deals that depend on security posture\n\n\nTECHNICAL REQUIREMENTS\n\n - 5+ years in security engineering, infrastructure security, or offensive security roles — ideally at companies operating multi-tenant cloud or compute infrastructure\n\n - Deep experience with cloud security (GCP preferred), Kubernetes security, and container runtime hardening","salary_min":180000,"salary_max":350000,"location":"San Francisco, CA","workplace":"hybrid","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["gpu","agents","security","cloud","distributed-systems"],"apply_url":"https://jobs.ashbyhq.com/PrimeIntellect/fb497090-0336-45b2-b802-9d34d8758d06/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T18:33:48.099Z","expires_at":"2026-08-15T14:10:48.097918Z","created_at":"2026-04-14T01:34:55.711907Z","updated_at":"2026-07-16T14:10:48.218712Z","company_name":"Prime Intellect","company_slug":"PrimeIntellect","company_logo_url":"https://www.google.com/s2/favicons?domain=primeintellect.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/5e2a1a2e-0a5c-4e85-b7ac-b319a84dbd55"},{"id":"d6456870-ff5c-4c3f-89d2-a6e8784670b8","company_id":"57a9b50d-a69a-4f6f-9acb-910495c3c359","title":"MTS, Research Engineer","slug":"mts-research-engineer-69babe33","description":"About Us: \n At Fireworks, we’re building the future of generative AI 
infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We’ve been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We’re an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.\n About the Role \n We are looking for a Research Engineer to join our team, operating at the critical intersection of model research and training infrastructure.\n In this role, your time will be split between tackling open-ended research problems—such as designing novel architectures and improving algorithmic efficiency — and building the distributed training systems required to make those research breakthroughs a reality. You won't just be handed a paper to implement; you will be expected to reproduce state-of-the-art results from the literature, identify their limitations, and build the infrastructure needed to push beyond them.\n The most significant advances in deep learning require massive scale. We need engineers who are as comfortable reasoning about gradient descent and loss landscapes as they are about distributed systems, GPU cluster utilization, and data pipelines.\n  \n What You'll Do \n \n Conduct Open-Ended Research: Explore new model architectures, training objectives, and optimization techniques. Formulate hypotheses, design experiments, and iterate quickly based on empirical results.\n Reproduce and Extend State-of-the-Art: Implement and reproduce results from recent machine learning papers. 
Identify bottlenecks, propose improvements, and scale these methods to larger datasets and models.\n Build and Scale Training Infrastructure: Design, implement, and maintain high-performance, distributed machine learning systems. Optimize training loops, data loaders, and communication overhead across large GPU clusters.\n Bridge Science and Engineering: Translate abstract mathematical concepts and research ideas into robust, bug-free, and efficient code.\n Collaborate Cross-Functionally: Work closely with Research Scientists to unblock their experiments by providing tooling, optimizing code, and co-designing experiments that are hardware-aware.\n \n We Expect You To Have: \n \n Strong programming skills (Python, C++, or Rust) and a commitment to writing clean, maintainable code.\n Deep practical knowledge of machine learning frameworks (PyTorch, JAX, or TensorFlow).\n Experience working with large distributed systems and parallel computing (e.g., CUDA, NCCL, MPI).\n A strong foundation in linear algebra, calculus, probability, and statistics.\n A proven track record of implementing complex deep learning algorithms from scratch.\n \n Nice to Have: \n \n A Master’s or PhD in Computer Science, Machine Learning, Physics, Mathematics, or a related field (or equivalent industry experience).\n Experience with low-level GPU programming (CUDA/Triton) or hardware co-design.\n Familiarity with the challenges of training Large Language Models (LLMs)\n Familiarity with the challenges of inference, and OSS inference engines such as SGLang and vLLM\n Total compensation for this role also includes meaningful equity in a fast-growing startup, along with a competitive salary and comprehensive benefits package. Base salary is determined by a range of factors including individual qualifications, experience, skills, interview performance, market data, and work location. 
The listed salary range is intended as a guideline and may be adjusted.\n Base Pay Range (Plus Equity)\n $250,000 — $400,000 USD \n Why Fireworks AI? \n \n Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.\n Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.\n Ownership \u0026 Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.\n Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.\n \n Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.","salary_min":250000,"salary_max":400000,"location":"New York, NY","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["pytorch","distributed-systems","tensorflow","data-pipeline","mlops","gpu","search","generative-ai"],"apply_url":"https://job-boards.greenhouse.io/fireworksai/jobs/4308305009","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-08T01:55:35Z","expires_at":"2026-08-15T14:02:25.096312Z","created_at":"2026-07-09T14:02:13.613892Z","updated_at":"2026-07-16T14:02:25.217607Z","company_name":"Fireworks AI","company_slug":"fireworks-ai","company_logo_url":"https://www.google.com/s2/favicons?domain=fireworks.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/d6456870-ff5c-4c3f-89d2-a6e8784670b8"},{"id":"8d2175c9-2ff5-4e92-97ef-765b6919eddc","company_id":"2721f049-2cf2-4e3e-82d0-8d8df89c8f90","title":"Forward Deployed Engineer - Physical AI Cloud Platform","slug":"forward-deployed-engineer-physical-ai-cloud-platform-5dc02f51","description":"About Nebius: \n Nebius is leading a new era in cloud infrastructure for the global AI economy. 
We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.\n Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.\n Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R\u0026D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R\u0026D.\n The role \n The Forward Deployed Engineer, Cloud Platform is a senior, high-autonomy individual contributor role that owns the infrastructure foundation making the physical AI platform fast, reliable, scalable, secure, and cost-effective. This role sits with strategic customers and ISV partners, embedded directly inside their engineering teams, and ships production infrastructure that lets customers run real physical AI workloads, not just demos. Your job is to make the platform feel like a product, not a collection of cloud scripts. \n You will work alongside the Field CTO and the Head of Physical AI, and partner closely with the Physical AI Systems and Platform \u0026 Product FDEs. Inside each account, you own end-to-end technical execution: discovery, scoping, infrastructure design, build, and production rollout. Across accounts, you turn repeated infrastructure pain into reusable platform capabilities and partner with Product and Engineering to fold them into the core platform. Your field work is the primary input to the Nebius Physical AI roadmap. \n You are welcome to work remotely from the United States (SF Bay Area, CA or Austin, TX preferred).   
\n Your responsibilities will include:   \n \n End-to-End Ownership Inside Strategic Accounts:  Own discovery, technical scoping, infrastructure design, build, and production rollout for each design partner and ISV engagement, translating ambiguous infrastructure problems into deployable production systems.   \n \n \n Cloud Infrastructure \u0026 Compute Orchestration:  Build and operate the cloud infrastructure that powers customer physical AI workflows. Own compute orchestration for simulation, training, evaluation, inference, and batch workloads, not just what runs, but how it runs at scale.   \n \n \n Platform Services:  Build platform services for job execution, scheduling, retries, observability, logging, secrets, access control, and cost tracking. Integrate Nebius cloud services into the product experience so infrastructure complexity is abstracted away from customers.   \n \n \n Customer Onboarding Infrastructure:  Build onboarding infrastructure for pilots, including sandbox environments, dataset storage, workflow execution, and deployment, and make sure early customer workloads run for real: secure, isolated, observable, and reliable.   \n \n \n Reliability, Security \u0026 Cost:  Optimize cloud cost, utilization, performance, and reliability across workloads, and debug infrastructure issues across application, network, storage, compute, and orchestration layers, wherever the failure actually lives.   \n \n \n Cross-FDE Partnership:  Partner with the Physical AI Systems FDE to support GPU-heavy simulation, training, and evaluation pipelines, and with the Platform \u0026 Product FDE to expose infrastructure capabilities through clean APIs, SDKs, and product workflows.   \n \n \n Long-Term Architecture:  Help define the long-term infrastructure architecture for multi-tenant SaaS, enterprise deployments, and high-throughput physical AI workloads.   
\n \n \n Pattern Codification \u0026 Productization:  Turn repeated customer infrastructure pain into reusable platform capabilities. Partner with the Field CTO, Product, and Engineering teams to fold these into the core platform. Treat every engagement as a forcing function for the next ten.   \n \n \n Rapid Engineering Velocity:  Use modern AI coding tools (Claude Code, Codex, Cursor) as primary leverage. Compress build timelines from weeks to days. Treat engineering velocity as a primary success metric.   \n \n \n Field Enablement \u0026 Feedback Loops:  Co-author reference architectures, solution templates, and technical blogs for the broader Nebius field, and maintain structured channels to ensure customer learnings flow back to the Field CTO, Product, and Engineering teams.   \n \n We expect you to have:   \n \n 6+ Years of Hands-On Engineering:  Strong backend, cloud infrastructure, platform engineering, or SRE experience, with at least two years in a customer-facing or deployment-oriented technical role (Forward Deployed Engineer, founding engineer, technical co-founder, tech lead embedded with strategic customers, or equivalent).   
\n \n \n Distributed Systems \u0026 Compute Platforms: ","salary_min":179500,"salary_max":224300,"location":"Remote (US)","workplace":"remote","remote_scope":"restricted","job_type":"full-time","experience_level":"lead","tags":["gpu","distributed-systems","robotics","data-pipeline","mlops","cloud"],"apply_url":"https://careers.nebius.com/?gh_jid=4875906101","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-06T20:11:20Z","expires_at":"2026-08-15T14:15:45.34061Z","created_at":"2026-07-09T14:14:52.942179Z","updated_at":"2026-07-16T14:15:45.46108Z","company_name":"Nebius","company_slug":"nebius","company_logo_url":"https://www.google.com/s2/favicons?domain=nebius.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/8d2175c9-2ff5-4e92-97ef-765b6919eddc"},{"id":"6e73bc75-a490-4b93-af2d-5d0040a7eb71","company_id":"6ea0f41a-b13e-481a-b410-5195f391f939","title":"Research Engineer, Post-Training Inference","slug":"research-engineer-post-training-inference-ff4ae18b","description":"About the role \n The Model Shaping team at Together AI works on products and research focused on tailoring open foundation models to downstream applications. We build services that enable machine learning developers to choose the best models for their tasks and further improve these models using domain-specific data. In addition, we develop new methods for more efficient model training and evaluation, drawing inspiration from a broad range of ideas across machine learning, natural language processing, and ML systems.\n As a Research Engineer within Model Shaping, you will develop a platform that enables users to customize open-source models with their own data. Working across the training and inference stacks, you will build and improve our Fine-Tuning, Reinforcement Learning, and Evaluation services – from ensuring a seamless path from post-training to production serving, to optimizing the inference engine for RL training workloads. 
You will collaborate closely with our product, research, and engineering teams to keep the API reliable, performant, and well integrated into the company's technical infrastructure. Above all, you will help build the foundational layer of the open-source AI ecosystem, enabling developers around the world to efficiently create high-quality models tailored to their specific applications.\n Responsibilities \n \n Design and build Together’s systems for customizing open-source models\n Build integrations between the Model Shaping and Inference platforms to ensure a seamless path from post-training to serving production workloads\n Add features to inference engines for large-scale post-training experiments, including optimizations for RL workloads\n Make sure the service is stable and robust, participating in an on-call rotation and ensuring 24/7 availability of our platform\n \n Requirements \n \n Have 2+ years of experience building and deploying machine learning-based services in a production environment\n Have hands-on experience with modern inference engines, such as SGLang, vLLM, and TensorRT-LLM\n Are familiar with the latest methods for fine-tuning LLMs and other AI models\n Have a strong software engineering background in Python or Go\n Stay up to date with the latest advances and trends in the machine learning community\n \n Experience in any of the following will make you stand out \n \n Serving low-precision (FP4/FP8) models, multiple LoRA adapters within one model instance (Multi-LoRA), or models distributed across several GPU nodes\n Optimizing the performance of RL training workloads\n Developing CUDA/Triton/CuTE DSL kernels for inference\n Developing large-scale and high-load production systems\n Maintaining or contributing to open-source ML projects\n Managing machine learning workloads on Kubernetes clusters\n \n About Together AI \n Together AI is a research-driven artificial intelligence company. 
We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, ATLAS, RedPajama, and Mamba. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.\n Compensation \n We offer competitive compensation, startup equity, health insurance, and other benefits. The US base salary range for this full-time position is $200,000 - $290,000. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.\n Equal Opportunity \n Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.\n Please see our privacy policy at  https://www.together.ai/privacy","salary_min":200000,"salary_max":290000,"location":"San Francisco, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"junior","tags":["search","llm","nlp","gpu","reinforcement-learning","fine-tuning","generative-ai","research"],"apply_url":"https://job-boards.greenhouse.io/togetherai/jobs/5179372007","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-06T18:21:40Z","expires_at":"2026-08-15T14:02:19.341947Z","created_at":"2026-07-09T14:02:08.323229Z","updated_at":"2026-07-16T14:02:19.456183Z","company_name":"Together 
AI","company_slug":"together-ai","company_logo_url":"https://www.google.com/s2/favicons?domain=together.ai\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/6e73bc75-a490-4b93-af2d-5d0040a7eb71"},{"id":"cbbcd488-a46d-4346-b071-7ae9c7a752dd","company_id":"cec3f1a8-c7e9-4ff6-a22d-19edaf0e2b25","title":"Software Engineer, GPU Infrastructure","slug":"software-engineer-gpu-infrastructure-c6ab329f","description":"ABOUT FLUIDSTACK\n\nWe exist to make humanity more free. For most of human history, you farmed or you starved. Technology gave people more time for the things they wanted to do, instead of things they had to do. Powerful AI will be the biggest lever for human choice we've ever built - but only if models are aligned with what humanity actually wants. There are groups building AI who don't share these goals. Whoever deploys frontier compute infrastructure fastest will decide whether AI expands human freedom or shrinks it.\n\n\nWe're singularly focused on delivering 10 to 100s of GWs of compute faster than anyone else, rethinking every layer of the stack. We acquire power, design and build data centers, and operate them - with teams spanning hardware and software. Speed and scale are our key differentiators. Come be a part of building civilization-scale infrastructure for AI.\n\n\nWe hire people who care deeply about this problem space. If that is you, please apply!\n\n\n\n\nHOW WE OPERATE\n\n - Extreme ownership. Full autonomy. Own things end to end often taking on scope outside your core role without being asked to get things done.\n\n - Velocity. We drive everything forward as fast as possible.\n\n - First principles. Challenge every assumption. Zero analogy thinking, no egos, the best idea wins.\n\n - Love of the game. The frontier of AI is the most interesting problem of our time. 
We put in long hours at high intensity to push the frontier forward.\n   \n    \n\n\nTHE PRODUCTION ENGINEERING TEAM\n\nExamples of key exciting problems the team is working on\n\n - Build the repair pipeline that keeps pace with a 10 GW fleet: at our scale, a GPU failure isn't a ticket. It's a throughput problem. We're building the automation that takes a chip from fault detection through triage, RMA, and return to service without human intervention.\n\n - Qualify every new GPU generation inside a 6-month build window: our platform covers burn-in, performance baselining, and NPI execution. It has to define \"production-ready\" before a site goes live, not after. New hardware gets certified at speeds unheard of in the industry.\n\n - Migrate live compute at construction speed: we're converting clusters across production sites simultaneously, bringing new sites online, and making Kubernetes-orchestrated bare metal sustainable at the pace we're building – multiple GW annually.\n\n - See and own the entire fleet in real time, at any scale: build the observability and orchestration layer that makes hyperscale AI compute actually operable. Debug, tune, and performance-test infrastructure that grows by another site every few months.\n   \n    \n\n\nROLE SCOPE\n\n - Own compute fleet health end to end. Build the metrics pipelines, alerting, and unified health view that tell you the true state of every GPU in production — across Kubernetes-orchestrated workloads and bare metal, at scale.\n\n - Turn deployment/repair into a pipeline, not a procedure. Build and own the automation that takes a compute failure from detection through triage, parts management, and return to service. No one-off scripts, no heroics.\n\n - Design and expand the GPU qualification platform. Burn-in, performance baselining, and NPI execution for every new GPU generation. You define what \"good\" looks like before hardware goes into production.\n\n - Own Redfish and BMC tooling. 
Firmware-level telemetry, log collection at fleet scale, and the low-level access layer that repair automation and health tooling depend on.\n\n - Own end-to-end reliability, scalability, and operation of the compute fleet at-scale. Fluidstack is building one of the largest GPU fleets in the world and that can only be accomplished with aggressive automation, tooling, and incident discipline.\n   \n    \n\n\nWHAT WE'RE LOOKING FOR\n\nThe below is a starting point. We always make space for exceptional people, so if you don't fit this role exactly, tell us where you would. https://jobs.ashbyhq.com/fluidstack/05c2e69c-42f9-4fcb-9cf0-a467aaf98f1c\n\n - You treat toil as a bug. Manual steps in a repair workflow are a backlog item, not a job description.\n\n - You have an instinct for hardware. You're comfortable reasoning about failure modes at the firmware and silicon level, not just the software stack above it.\n\n - You move toward ambiguity, not away from it. You walk into the fog, build the map, and explain it to everyone else.\n\n - You learn at a steep slope. You reach real competence in an unfamiliar domain fast. We value this over existing expertise.\n\n - You carry a pager without flinching. You run the incident, write the postmortem, fix the systemic cause, and move on.\n\n - You're fluent with AI tooling. LLM APIs, MCP servers, and agentic frameworks, and you drive Claude Code, Cursor, or similar every day.\n\n - You've shipped production automation that other teams depend on, and you're comfortable in any language using AI coding tools.\n\n - Bonus: Hardware lifecycle management and RMA automation. BMC/Redfish or IPMI tooling. 
GPU qualification or burn-i","salary_min":175000,"salary_max":300000,"location":"San Francisco, CA","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["gpu","llm","agents","infrastructure"],"apply_url":"https://jobs.ashbyhq.com/fluidstack/474c1a81-c4ee-4504-8751-3ff9bee9759f/application","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-05T16:21:48.476Z","expires_at":"2026-08-15T14:15:09.581215Z","created_at":"2026-07-06T14:14:36.443794Z","updated_at":"2026-07-16T14:15:09.736855Z","company_name":"FluidStack","company_slug":"fluidstack","company_logo_url":"https://www.google.com/s2/favicons?domain=fluidstack.io\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/cbbcd488-a46d-4346-b071-7ae9c7a752dd"},{"id":"0402b0a6-e0aa-4326-9c18-9ffb95668d72","company_id":"e3915539-5a8f-4461-9f26-06366a918674","title":"Senior Advanced Research Scientist ","slug":"senior-advanced-research-scientist-1b17b815","description":"Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril’s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a realtime, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.\n \n ABOUT THE TEAM \n Anduril’s Research Scientists excel at developing state-of-the-art algorithms and software that solve scientific problems with real-world applications. 
Working in small, innovative teams, our research scientists create impactful solutions that make a difference. Our research endeavors don’t end once we’ve published papers; our work is complete when our technology is deployed in mission-critical systems, ensuring success for our customers in government and industry.  Join us in our mission to expand the boundaries of what’s possible!\n WHAT YOU WILL DO \n \n Drive rapid prototyping initiatives for advanced R\u0026D projects, focusing on specialized algorithm development in the context of radar systems, video sensors, space-based sensing, and Command and Control (C2) systems.\n Utilize high-fidelity modeling and simulation tools to assess and quantify the impact of innovative technologies on system performance.\n Collaborate with cross-disciplinary teams to ensure seamless integration of software and hardware, optimizing system functionalities for radar systems.\n Implement rigorous software quality assurance processes, using various testing methodologies to ensure reliability and efficiency of developed solutions.\n Engage with stakeholders to align R\u0026D outcomes with mission-critical objectives, ensuring optimal performance and operational success.\n Mentor junior team members, fostering a culture of innovation and continuous improvement within the team.\n \n REQUIRED QUALIFICATIONS \n \n An M.S. or Ph.D. 
in Applied or Computational Mathematics, Electrical Engineering, Computer Science, Controls and Dynamical Systems, Aerospace Engineering, Physics, Statistics and Probability, or a related field.\n 2+ years of professional experience in embedded software/firmware engineering.\n Strong foundation in applied mathematics, including probability theory, optimization theory, linear algebra, and numerical analysis.\n Familiarity with functional programming languages (e.g., C/C++, Julia, Rust, Python, CUDA).\n Demonstrated experience in scientific computing, including algorithm implementation, optimization methods/theory, probabilistic/stochastic models, graphical models.\n Knowledge of digital signal processing (DSP) and image processing, as well as controls and estimation theory.\n Excellent written and verbal communication skills to convey complex technical concepts to diverse audiences.\n Adept at problem identification and principled approaches to problem formulation and solution.\n Effective data analysis, deep-diving, trouble-shooting.\n Open-minded, creative, imaginative.\n Agile learner.\n Enthusiastic collaboration, energized by driving team success.\n Ability to obtain and maintain a U.S. TS/SCI security clearance.\n \n PREFERRED QUALIFICATIONS \n \n Experience with either radar signal processing or image processing.\n Experience in GPU programming (CUDA programming) and rapid prototyping.\n \n We request transcripts as part of the early application process to understand your academic background and how your coursework supports the skills deemed critical for the role. Transcripts help us assess your technical and analytical abilities, complementing our interview process in which we also evaluate practical experience and cultural fit. 
If you choose not to share your transcripts, you will need to provide detailed information regarding your academic performance in relevant courses, including projects and coursework specifics, to ensure we evaluate your academic accomplishments properly. If you do provide academic transcripts, feel free to redact non-technical information (e.g., student ID, dates, non-technical coursework, etc.). Unofficial transcripts obtained online acceptable for this assessment. \n \n  \n  \n US Salary Range\n $190,000 — $252,000 USD \n The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. Actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full time offers; and are considered part of Anduril's total compensation package. Additionally, Anduril o","salary_min":190000,"salary_max":252000,"location":"Broomfield, CO","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"senior","tags":["gpu","computer-vision","cloud","payments","research"],"apply_url":"https://boards.greenhouse.io/andurilindustries/jobs/5179750007?gh_jid=5179750007","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-02T23:31:53Z","expires_at":"2026-08-15T14:07:43.700632Z","created_at":"2026-07-03T14:06:43.598002Z","updated_at":"2026-07-16T14:07:43.844075Z","company_name":"Anduril","company_slug":"anduril","company_logo_url":"https://www.google.com/s2/favicons?domain=anduril.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/0402b0a6-e0aa-4326-9c18-9ffb95668d72"},{"id":"b3a015af-b684-456c-a54e-896dc1546f22","company_id":"e3915539-5a8f-4461-9f26-06366a918674","title":"Lead Software Architect, Battlespace Awareness ","slug":"lead-software-architect-battlespace-awareness-f405c12e","description":"Anduril 
Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril’s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a realtime, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.\n ABOUT THE TEAM\n At Anduril, our Battlespace Awareness Command and Control Software team specializes in solving complex, real-world problems through cutting-edge algorithms and intelligent software integrations. Operating in small, innovative teams, we push the boundaries of what's possible to deliver advanced technologies with mission-critical applications. Our commitment doesn't end with academic research or proof-of-concept experiments; we measure our success by the real-world impact of our deployed solutions. \n  \n ABOUT THE JOB \n We are looking for a Lead Software Architect to join our rapidly growing C2 Systems team in Broomfield, CO. In this role, you will be responsible for defining the architectural direction of large-scale, real-time, mission-critical C2 software that powers Anduril's air and missile defense and battlespace awareness capabilities. You will lead the design of high-performance systems spanning tactical edge deployments to distributed backend infrastructure, make critical trade-offs between performance, modularity, and maintainability at scale, and mentor senior engineers across the C2 organization.
This will require deep expertise in  modern C++ and Rust , experience architecting numerics-heavy real-time systems, fluency with asynchronous/multithreaded programming (e.g., Tokio), and a strong grasp of the complex intersection of software and math — including state estimation, target tracking, optimization, and dynamic programming (MCP / Bellman equations) . If you are someone who thrives on ambitious defense problems, enjoys troubleshooting real-world systems where issues could range from electromagnetics to faulty bit encodings to incorrect math assumptions, and wants to build software that is beyond a proof-of-concept — part of real tactical code deployed to the warfighter — then this role is for you\n WHAT YOU’LL DO\n \n Define and drive the architectural vision for large-scale C2 software systems that ingest, fuse, and act on data from diverse sensors in real time\n Lead the design and implementation of performant, real-time, numerics-heavy algorithms in Rust and/or C++ at production scale\n Partner closely with research scientists to transition advanced algorithms (target tracking, state estimation, sensor-effector pairing, asset scheduling) from prototype into tactical, deployed code\n Make high-leverage architectural trade-offs across performance, modularity, testability, and maintainability for mission-critical, edge-deployed systems\n Mentor and technically lead senior engineers, setting coding standards, review processes, and CI/CD best practices across the team\n Engage directly with customers (DoD agencies, Army, Air Force, MDA, SCO) to ensure successful outcomes for mission-critical needs\n Troubleshoot complex, real-world system issues spanning software, math assumptions, sensor behavior, and networking\n Contribute to all phases of the software development lifecycle including prototyping, modeling \u0026 simulation, field testing, and deployment\n Help shape hiring and technical growth of the broader C2 organization as it hyper scales\n \n 
REQUIRED QUALIFICATIONS\n \n 10+ years of software engineering experience with a Bachelor's degree (or equivalent) in Computer Science, Applied/Computational Mathematics, Electrical Engineering, Aerospace Engineering, Controls/Dynamical Systems, Statistics, or related field\n Expert-level proficiency in modern C++ and/or Rust, including asynchronous and multithreaded programming (e.g., Tokio for Rust)\n Proven experience architecting large-scale, production-grade codebases in real-time or high-performance environments\n Deep experience writing performant, real-time software with numerics-heavy algorithms\n Strong foundation in applied mathematics: probability theory, linear algebra, optimization, differential equations (ODEs), and statistics\n Experience with CI/CD, unit testing, git version control, and microservices\n Eligible to obtain and maintain an active U.S. Secret security clearance\n \n PREFERRED QUALIFICATIONS\n \n M.S. or Ph.D. in a technical field (dual academic background in software + math highly valued)\n Domain expertise in target tracking, state estimation, Kalman filters, sensor fusion, or signal processing \n Prio","salary_min":219000,"salary_max":290000,"location":"Broomfield, 
CO","workplace":"onsite","remote_scope":"not_remote","job_type":"full-time","experience_level":"lead","tags":["llm","computer-vision","gpu","microservices","payments","cloud"],"apply_url":"https://boards.greenhouse.io/andurilindustries/jobs/5178351007?gh_jid=5178351007","is_featured":false,"is_sticky":false,"status":"active","published_at":"2026-07-01T19:26:15Z","expires_at":"2026-08-15T14:07:38.585728Z","created_at":"2026-07-03T14:06:38.649716Z","updated_at":"2026-07-16T14:07:38.71139Z","company_name":"Anduril","company_slug":"anduril","company_logo_url":"https://www.google.com/s2/favicons?domain=anduril.com\u0026sz=128","quality_score":90,"url":"https://aidevboard.com/job/b3a015af-b684-456c-a54e-896dc1546f22"}],"market_demand_pack":{"amount_cents":2900,"api_checkout_url":"https://aidevboard.com/api/v1/checkout?product_id=aidevboard_ai_skills_demand_pack","checkout_url":"https://aidevboard.com/market-demand-pack?qc=api-jobs-market-demand-pack\u0026utm_campaign=skills_demand_pack\u0026utm_medium=jobs_api\u0026utm_source=api","currency":"USD","description":"Full ranked public AI/ML demand CSV, source job URLs, and decision brief with market and offer angles.","fulfillment":"automatic_email_after_paid_checkout","human_checkout_url":"https://aidevboard.com/market-demand-pack?qc=api-jobs-market-demand-pack\u0026utm_campaign=skills_demand_pack\u0026utm_medium=jobs_api\u0026utm_source=api","name":"AI Market Demand Pack","next_step":"Open checkout_url for Stripe Checkout, or call api_checkout_url to get the non-charging checkout handoff payload.","price_usd":29,"product_id":"aidevboard_ai_skills_demand_pack","quote_url":"https://aidevboard.com/api/v1/quote?product_id=aidevboard_ai_skills_demand_pack"},"page":1,"per_page":20,"total":543,"total_pages":28}