Founding AI Engineer

Contextual AI · Remote

Full-time · Mid-level · Posted 3 months ago

data-pipeline rag embeddings reinforcement-learning agents mlops fine-tuning llm

About this role

JOB DESCRIPTION

Remote (openness to a Berlin move is ideal) • Full-time • Pre-seed, $1.8M raised

We're offering significant “founding-team-level” equity.

TL;DR

We're building the first AI that can actually complete real design tasks on existing products: it understands context, design systems, and layout to generate production-quality work. We just raised a $1.8M pre-seed and are now building out our founding team. You'll architect the agentic intelligence system that makes this possible. This is a unique “founding team” opportunity with significant equity.

The Gap We're Filling

Right now there are two types of AI tools:

- Figma Make/Lovable/v0: Create designs from scratch. Can't touch your actual product.
- Cursor: Requires repo access. 90% of product stakeholders don't have and shouldn't have it.

The massive gap: no AI can actually iterate on real products, that is, log into your live app, understand your design system, analyze the surrounding context, and generate “senior-level quality” designs that match your product's language.

PMs are stuck waiting days for designers to add simple components. We're building the tool that finally lets non-technical product teams directly edit their live product's front-end. No repo access. No starting from scratch. Just: enter your URL, describe what you need, get production-ready designs.

Here's a 2-minute demo video: https://www.loom.com/share/8a2101511cc64eff8382adebe20e4173

Why This Is Genuinely Hard

Anyone can generate random UI. We're solving something no one else has:

- Context-aware design intelligence - Understanding a product's existing design system, component patterns, layout structure, and brand conventions, then generating new designs that feel native, not generic.
- Codifying design excellence - The principles, methods, and approaches used by world-class product designers aren't written down anywhere. Your job is to encode human design expertise (how elite designers actually think, what makes design feel right, why certain approaches work) into AI systems.
- Consistent quality at scale - Anyone can get one good output. We need every output to be good enough to ship. That means understanding context (design systems, product patterns, brand language), applying sophisticated reasoning, and hitting production quality every single time.
- Speed without compromise - Designers take days because quality demands it. We need to compress that thinking into seconds without losing what makes it good.
- The bar - Designs so good that PMs trust them over waiting for their designer. Not "good for AI" but actually good.

Your job: Shape agentic capabilities that consistently produce designs worth shipping.

What You'll Do

In one sentence: Build the agentic reasoning system that turns design expertise into production AI.

- Own our multi-agent orchestration layer: how specialized AI systems collaborate to think through design problems
- Design RAG pipelines that retrieve and apply design patterns at the right moments
- Build evaluation frameworks that measure "senior designer quality" at scale
- Experiment with frontier approaches: we need breakthrough results, not incremental gains
- Shape technical strategy: you're defining what's possible, not just executing

Success in 6 months: Our output consistently matches or exceeds what mid-level designers produce, in seconds instead of days.

You're a Great Fit If:

You've solved the hard problems, not just used the frameworks:

- You've designed RAG systems from scratch and made real architectural decisions about chunking strategies, embedding models, retrieval approaches, and re-ranking. You know why naive similarity search fails and what actually works.
- You've debugged agentic systems in production and dealt with loops, hallucinations, tool selection failures, and context window management. You know where multi-step reasoning breaks and how to make it reliable.
- You've built your own evaluation frameworks, not just vibes-based testing. You've measured quality systematically because you needed to know if changes actually improved things.
- You can explain what went wrong in past projects: you've hit the wall where "just add more context" stops working, where prompts become unmaintainable, where latency kills UX.
- Bonus points: Experience with reinforcement learning. We're building systems that learn from user choices and improve over time.

Technical requirements:

- Deep LLM expertise: You understand model inference, fine-tuning, context optimization, and prompt engineering at a production level, not just API calls
- ML infrastructure: Experience with MLOps/LLMOps practices, model training pipelines, and deployment at scale
- Core skills: Python, experience with neural architectures (transformers, RNNs, CNNs), data pipelines (ETL)
- Cloud platforms: Production experience with AWS/GCP/Azure
- Bonus: Reinforcement learning experience; we're building system