Systems Integration Engineer - SW Focused Issue Triage & RCA

Agility Robotics · Hybrid - Fremont, CA · $170k - $221k
Full-time · Mid-level · Posted 1 month ago

About this role

Agility’s commercially deployed humanoids operate alongside teams in warehouses, manufacturing facilities, and distribution centers, tackling physically demanding and repetitive tasks while enabling workers to focus on higher-value work. With industry-leading safety standards and years of proven deployment data, we're pioneering a new era of automation that enhances human potential.

Role Overview:

We are seeking a Systems Integration Engineer specialized in Software Issue Triage and Root Cause Analysis (RCA). Your main function is to conduct remote triage using log parsing, telemetry data, and video analysis to identify failures with software root causes and ensure they are accurately dispositioned to the appropriate SW development teams. You will conduct deep-dive root cause analysis on novel failures occurring at the hardware-software interface while architecting the diagnostic scripts and tools required to streamline these investigations. In this role you will move beyond basic data review to navigate ambiguous failure modes, develop automated diagnostic scripts, and create the technical documentation that drives software reliability across the fleet.

Issue Triage

Serve as a lead voice in the triage process, providing the expertise required to classify complex failures as software, firmware, or system-level regressions.
Disposition identified issues to the software organization with clean tickets (logs, video clips, and analysis) that allow developers to act quickly.
Manage and prioritize escalated SW-related investigations, making informed trade-offs to ensure that critical safety or performance risks are addressed first.

Root Cause Analysis

Lead end-to-end investigations into novel failures using deep-dive log review, telemetry analysis, and video diagnostics to pinpoint bugs at the software/hardware interface or unexpected system behaviors.
Develop and execute scripts or other data visualization tools to parse massive log sets and identify intermittent failure trends.
Leverage structured methodologies such as 5-Whys or Fishbone to move from a surface-level symptom to a definitive root cause.

Continuous Improvement

Author and maintain "Gold Standard" RCA reports and troubleshooting guides that improve the technical autonomy of the broader triage team.
Promote a culture of rigorous documentation and data-driven problem-solving.
Create reusable diagnostic frameworks that automate the identification of known software issues, increasing the efficiency of the entire R&D loop.

Qualifications:

Experience:

4+ years of experience in Systems Integration, Software-Hardware interface, or R&D with a focus on software for complex mechatronic or autonomous systems.
Proven experience using monitoring and observability platforms (e.g., Datadog, Splunk, or New Relic) to track system health and identify performance anomalies across a fleet.
Experience interacting with cloud-based storage and databases (e.g., AWS S3, SQL, or NoSQL) to retrieve and manage large-scale telemetry and video datasets.
Proven track record of navigating highly ambiguous software-hardware intersections to find definitive root causes.
Experience creating technical documentation or bug reports intended for software engineering audiences.
Preferred: Experience with HW/SW integration and design on HiL.

Technical Expertise:

Mastery of log parsing via CLI and proficiency in using Python or similar scripting languages for data visualization and failure trend analysis.
Familiarity with database environments, specifically regarding data retrieval and log management.
Experience correlating video and/or HW symptoms with system telemetry to identify physical manifestations of software bugs.
Strong understanding of software stacks in robotics, including communication protocols (e.g., EtherCAT, CAN) and how they manifest in system logs.
Preferred: Experience characterizing or troubleshooting HW/SW interactions involving cameras, encoders, IMUs, or other sensors.

Skills:

Ability to tackle ambiguous, unprecedented problems and create reusable, scalable solutions.
Capacity to operate independently on initiatives and proactively anticipate the needs for effective and efficient triage and RCA.
Exceptional ability to synthesize complex telemetry and video data into clear, actionable insights for software engineering stakeholders.

Education:

Bachelor’s or Master’s degree in Computer Science, Robotics, Electrical Engineering, or a related field.

This is a hybrid position at our Fremont, CA office. The final salary offered to a successful candidate will depend on several factors that may include, but are not limited to, job-related knowledge, skills, and experience. Agility Robotics is a multi-state employer, and this salary range may not reflect positions based in other locations. These range
