Principal AI Software Engineer, Enterprise AI Platform

Natera · Remote (US) · $174k - $218k
Full-time · Principal · Posted 4 months ago

About this role

Role Overview

The Enterprise AI Platform Engineer is responsible for the build and delivery of Natera's enterprise agentic AI platform, which will be used to prototype and build multiple agentic AI solutions with low code across Natera in a federated approach. This is a hands-on technical leadership role at the intersection of engineering excellence, low-code platform design, and applied GenAI engineering.

You will architect and build the core AI operating system that powers modular, low-code enterprise agentic automation, complete with agent templates, an agent orchestration engine, data and MCP connectors, prompt optimization capabilities, evaluation guardrails, abstracted AI services, and intelligent data extraction and reasoning capabilities. The platform empowers citizen developers, business analysts, and engineers to prototype and test AI-powered workflows through a low-code interface, while enabling developers to extend functionality through a pro-code framework.

Key Responsibilities

Platform Architecture & Core Infrastructure

- Design: Design and implement the core architecture of the Enterprise AI Platform: low-code, modular, scalable, and secure.
- Agentic Orchestration: Build the agent orchestration runtime, including task queues, state management, and inter-agent communication.
- Long-Running Workflows: Architect for long-running, resilient AI workflows, enabling agents to execute and monitor multi-step, asynchronous processes.
- Low-Code Abstraction: Develop APIs and services for automation, evaluation, and agent lifecycle management.
- Deployment: Establish DevOps practices, CI/CD pipelines, and configuration management to ensure smooth deployment at scale.

Low-Code/Pro-Code Experience

- Low-Code Interface: Build an intuitive visual builder that allows business users to compose agent workflows through drag-and-drop and configuration.
- Pro-Code Mode: Provide a developer extension layer where engineers can author and deploy agents in code (Python, TypeScript) directly into the same framework.
- Unified Runtime: Ensure both low-code and pro-code workflows share common infrastructure for orchestration, evaluation, and governance (see the unified-runtime sketch below).
- Transparency & Debugging: Surface workflow traces, model evaluations, and output explanations directly in the user interface.
- Experimentation & Versioning: Support iterative experimentation, evaluation-based comparison, and rollback through integrated version control.

Agentic Orchestration & Long-Running Agents

- Orchestration Engine: Build a robust orchestration system supporting both short-lived agent calls and long-running AI agents that persist over time to automate complex processes (see the checkpointing sketch below).
- Workflow Automation: Enable orchestration of multiple agents with shared state, scheduling, dependency resolution, and event-driven execution.
- Enterprise Integration: Connect agents to core enterprise systems to perform real-world actions securely.
- Autonomy & Resilience: Implement mechanisms for persistence, checkpointing, recovery, and human-in-the-loop interventions.
- Human-in-the-Loop Feedback: Design human-in-the-loop and self-assessment mechanisms for continuous prompt and workflow improvement.

AI Services & Capabilities

- MCP Integration: Integrate with the Model Context Protocol (MCP) to enable plug-and-play connectivity with external systems and actions (see the connector sketch below).
- Retrieval-Augmented Generation (RAG): Build services to retrieve information from unstructured data using vector databases and retrieval pipelines (see the retrieval sketch below).
- Prompt Optimization & Evaluation: Implement automated systems for prompt tuning, evaluation, and feedback loops to ensure reliable results (see the optimization-loop sketch below).
- Abstracted AI Services: Build modular APIs for AI services such as unstructured document processing, information retrieval, summarization, data extraction, content generation, and classification.
- Evaluation as a Core Layer: Architect an evaluation-first framework for monitoring and improving AI agent performance across all workflows.
- Reusable Components: Create shared, composable AI primitives (e.g., document loaders, semantic routers, extractors) to accelerate workflow design.

Governance, Security & Observability

- Compliance: Enforce governance, security, and compliance principles (SOC 2, HIPAA, GDPR) across all platform operations.
- Access & Audit: Implement RBAC, audit logging, and lineage tracking for all data and agent interactions.
- Observability: Build observability tools for tracing, cost monitoring, and system performance metrics.
- Guardrails: Integrate evaluation-based guardrails that detect hallucinations, bias, or policy violations in real time (see the guardrail sketch below).
- Metric Tracking: Create structured metrics dashboards (precision, recall, task success rate, cost efficiency) for every deployed agent.
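To make the unified-runtime requirement concrete, here is a minimal sketch of how low-code and pro-code authoring can converge on one execution path. Every name here (`AGENT_REGISTRY`, `agent_step`, `run_spec`, the step functions) is a hypothetical illustration, not Natera's actual platform API: pro-code engineers register Python functions in a shared registry, while the visual builder emits a declarative spec that references those same registered steps.

```python
from typing import Callable

# Hypothetical shared registry: the single runtime both authoring paths target.
AGENT_REGISTRY: dict[str, Callable[[dict], dict]] = {}

def agent_step(name: str):
    """Pro-code path: engineers register steps with a decorator."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        AGENT_REGISTRY[name] = fn
        return fn
    return register

@agent_step("classify_document")
def classify_document(ctx: dict) -> dict:
    ctx["label"] = "lab_report" if "specimen" in ctx["text"].lower() else "other"
    return ctx

@agent_step("route")
def route(ctx: dict) -> dict:
    ctx["queue"] = "genetics" if ctx["label"] == "lab_report" else "triage"
    return ctx

# Low-code path: the drag-and-drop builder would emit a declarative spec like
# this, referencing registered steps instead of embedding code.
workflow_spec = {"name": "intake", "steps": ["classify_document", "route"]}

def run_spec(spec: dict, ctx: dict) -> dict:
    """One runtime executes specs from either authoring path, so evaluation
    and governance hooks only ever need to wrap this single entry point."""
    for step_name in spec["steps"]:
        ctx = AGENT_REGISTRY[step_name](ctx)
    return ctx

print(run_spec(workflow_spec, {"text": "Specimen received for panel X"}))
```

Keeping the spec declarative is the design choice that makes governance tractable: since both paths pass through `run_spec`, tracing, evaluation, and RBAC checks can be enforced in one place.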
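The checkpointing sketch below illustrates the persistence and recovery requirement for long-running agents. It is a minimal sketch under stated assumptions: steps are plain functions, state is persisted to a local JSON file (a real platform would use a database plus task queues, retries, and inter-agent messaging), and all identifiers are hypothetical.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Callable

CHECKPOINT_PATH = "workflow_checkpoint.json"  # assumed local file for the demo

@dataclass
class WorkflowState:
    completed_steps: list = field(default_factory=list)
    data: dict = field(default_factory=dict)

def save_checkpoint(state: WorkflowState) -> None:
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump({"completed_steps": state.completed_steps, "data": state.data}, f)

def load_checkpoint() -> WorkflowState:
    if os.path.exists(CHECKPOINT_PATH):
        raw = json.load(open(CHECKPOINT_PATH))
        return WorkflowState(raw["completed_steps"], raw["data"])
    return WorkflowState()

def run_workflow(steps: dict[str, Callable[[dict], dict]]) -> WorkflowState:
    """Run steps in order, skipping any already recorded in the checkpoint,
    so an interrupted workflow resumes exactly where it left off."""
    state = load_checkpoint()
    for name, step in steps.items():
        if name in state.completed_steps:
            continue  # finished in a previous run; do not repeat side effects
        state.data = step(state.data)
        state.completed_steps.append(name)
        save_checkpoint(state)  # persist after every step
    return state

if __name__ == "__main__":
    steps = {
        "extract": lambda d: {**d, "doc": "raw text"},
        "summarize": lambda d: {**d, "summary": d["doc"][:8]},
        "notify": lambda d: {**d, "notified": True},
    }
    print(run_workflow(steps).data)
```

Persisting after each step is also what enables human-in-the-loop interventions: the process can exit at any boundary, wait for a reviewer, and be resumed simply by re-running.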
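For the MCP connector, here is a sketch assuming the open-source `mcp` Python SDK (`pip install mcp`); the server command, tool name, and arguments are placeholders for whatever enterprise connector the platform would expose, not anything specified in this posting.

```python
import asyncio

# Assumes the official `mcp` Python SDK; verify signatures against the
# installed version before relying on this sketch.
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder server: any MCP-compliant process exposing tools over stdio.
server = StdioServerParameters(command="python", args=["example_mcp_server.py"])

async def main() -> None:
    # stdio_client launches the server process and yields its I/O streams.
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover tools dynamically: this is the plug-and-play property,
            # since the platform needs no compile-time knowledge of the server.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Invoke a (placeholder) tool by name with structured arguments.
            result = await session.call_tool("lookup_order", {"order_id": "42"})
            print(result.content)

asyncio.run(main())
```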
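The retrieval sketch below shows the shape of a RAG service: embed the query, rank chunks by similarity, and stuff the top-k into the prompt as grounding context. The bag-of-words vectors and in-memory index are stand-ins so the example runs self-contained; a production service would use a real embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: term counts instead of a learned dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

DOCUMENTS = [
    "Checkpointing lets long-running agents resume after failures.",
    "RBAC restricts which users may invoke a deployed agent.",
    "Vector databases index embeddings for semantic retrieval.",
]
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]  # in-memory vector index

def retrieve(query: str, k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Stuff the top-k retrieved chunks into the model prompt as context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How do agents resume after failures?"))
```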
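The optimization-loop sketch below shows the feedback-loop idea behind automated prompt tuning: score candidate prompt templates against a small labeled eval set and keep the winner. The model stub, eval examples, and templates are all toy placeholders; a real system would call an LLM and use a much larger eval set.

```python
EVAL_SET = [
    {"input": "BRCA1 variant detected", "expected": "genetics"},
    {"input": "invoice overdue 30 days", "expected": "billing"},
]

PROMPT_VARIANTS = [
    "Classify the message: {input}",
    "You are a router. Reply with exactly one department for: {input}",
]

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call: the more constrained prompt "behaves" better.
    text = prompt.lower()
    if "exactly one department" in text:
        return "genetics" if "brca" in text else "billing"
    return "unknown"

def score(template: str) -> float:
    """Accuracy of a template over the eval set; the signal that drives tuning."""
    hits = sum(
        fake_model(template.format(input=ex["input"])) == ex["expected"]
        for ex in EVAL_SET
    )
    return hits / len(EVAL_SET)

best = max(PROMPT_VARIANTS, key=score)
print(f"best template (accuracy {score(best):.0%}): {best}")
```

The same loop generalizes to generated variants and human feedback: anything that can be scored against the eval set can be compared, versioned, and rolled back.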
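Finally, the guardrail sketch below shows an evaluation-first gate: every agent output passes checks before release, and each check result lands in a structured metrics log. The token-overlap groundedness heuristic and the SSN regex are deliberately crude stand-ins for real hallucination detectors and policy classifiers, and all names are hypothetical.

```python
import re

METRICS: list[dict] = []  # would feed the per-agent metrics dashboard

def grounded(output: str, context: str, threshold: float = 0.5) -> bool:
    # Crude groundedness check: how much of the output appears in the
    # retrieved context. Real systems would use an evaluator model.
    out_tokens = set(output.lower().split())
    ctx_tokens = set(context.lower().split())
    overlap = len(out_tokens & ctx_tokens) / max(len(out_tokens), 1)
    return overlap >= threshold

def policy_clean(output: str) -> bool:
    # Example policy rule: no raw SSNs in agent output (HIPAA-style redaction).
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", output) is None

def release(agent: str, output: str, context: str):
    """Gate output on all checks; record every result as a metric event."""
    checks = {"grounded": grounded(output, context), "policy": policy_clean(output)}
    METRICS.append({"agent": agent, **checks})
    if all(checks.values()):
        return output
    return None  # blocked: escalate to a human reviewer instead

ctx = "patient report summary ready for review"
print(release("summarizer", "report summary ready", ctx))     # passes
print(release("summarizer", "SSN 123-45-6789 on file", ctx))  # blocked
print(METRICS)
```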

Tech

llm · cloud · api-design · distributed-systems · search · healthcare · rag · embeddings