Apr 2, 2026
Building AI in Regulated Industries: Key Lessons from a Private a16z Event
AI Engineering

Notes from a closed-door panel with engineers from Ramp, Rippling, Harvey, and Valon
Your AI demo crushed it. The prototype works. The use case is obvious. And then the enterprise prospect in healthcare, finance, or legal goes quiet for six weeks, comes back with a 40-page security questionnaire, and your pilot dies in procurement.
Or maybe you shipped something. Users try it once, raise an eyebrow, and quietly go back to the old spreadsheet. "We trust the AI for some things, but not that decision."
Sound familiar?
Building AI in regulated industries is one of the most common sources of founder frustration I hear about right now, and also one of the least well-documented problems in the builder community. Most AI content assumes you're shipping a chatbot or a coding assistant where "move fast and fix it" is a viable strategy. In healthcare, finance, law, mortgage servicing, or immigration? That strategy gets you sued, dropped by your insurance, or blacklisted by the only ten enterprise buyers in your vertical who matter.
Last night, I got into a16z's curated, invite-only event on exactly this topic. Thousands of applicants, a small group selected. As a software engineer and AI founder who has been building in regulated domains for years, I went with a specific list of questions and left with a notebook full of real answers from engineers at Ramp, Rippling, Harvey, and Valon who are doing this at scale.
Here's everything worth keeping.
The gap is real, and it's an architecture problem
There's a 34-point gap between AI adoption in tech companies (92%) and in regulated industries (58%), according to a 2025 survey of 220+ organizations. That's not a "people are scared of AI" problem. As one panelist put it: "The AI adoption gap isn't a technology problem. It's an architecture problem."
Regulated industries face strict compliance mandates like HIPAA, PCI DSS, and FedRAMP that require security and compliance validation before deployment, not after. Tech companies can adopt cloud AI and fix security reactively. Financial services, healthcare, and government cannot.
But here's the flip side most people miss: that friction is also a moat. If you can crack a regulated vertical, you're playing a field most generalist AI companies cannot easily enter, and the stickiness once you're in is immense. The companies on this panel have figured out how to earn that access. Here's what they know.
Lesson 1: You don't have an AI strategy without a data strategy first
The most quotable line of the evening came from the Rippling team, drawing on prior experience at Palantir:
"A company does not have an AI strategy if it does not first have a data strategy. LLMs are only as good as the context and the resources and tools they can access."
This sounds obvious until you hear the backstory. At Palantir, the speaker spent years getting enterprise data pipelines in place for government and large enterprise clients. The one dataset they could never unlock, even with all that access? HRIS data. Who works at the company, who reports to whom, benefits, SSNs, tax allocations. Too regulated, too sensitive.
The insight is that unlocking that data (which Rippling does as its core product) enables entirely new categories of AI applications. Not incrementally better, but qualitatively different. When you can combine revenue data with people data, you stop showing a CRO "revenue by team" and start showing them "how individual quota attainment drove this org to outperform that one." That's a different product category entirely.
The practical order of operations before you think about which model to use:
1. What data does this decision actually require?
2. Where does that data live right now?
3. Is it structured, accessible, and fresh enough to trust?
If you're skipping to step 3 without steps 1 and 2, you're building on sand.
Lesson 2: Context engineering is the unlock everyone is sleeping on
Everyone talks about prompt engineering. The more precise and more powerful concept I heard throughout the evening was context engineering: assembling every piece of information a good decision requires into a single coherent context window, at decision time.
Ramp's policy agent is the clearest illustration. Instead of a manager reading 40 pages of company expense policy, comparing it to a transaction, and making a half-informed call, the model gets three things at once: the full transaction details, the complete policy document, and relevant historical approvals. The result is faster, more consistent, and more accurate decisions than humans produce. Not because the model is smarter than your manager, but because your manager was never reading all 40 pages. They were making calls on incomplete context. The model doesn't have that problem.
Think of it like a courtroom: the best lawyer wins not by being the cleverest person in the room, but by showing up with every piece of evidence organized and accessible. Context engineering is building that evidence prep into your AI system at a structural level.
Practical tip: Before building any AI feature, map out every data point a human expert would consult to make this decision well. Then design your system to assemble all of it into the context window at inference time. Missing data sources are not a prompting problem. They are a data infrastructure problem that has to be solved upstream.
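To make the tip concrete, here's a minimal sketch of that assembly step in the spirit of the Ramp example. Everything here is hypothetical — the `Transaction` shape, the prompt layout, and the decision labels are my illustration, not Ramp's actual system; a real version would pull each section from an expense ledger, a policy store, and an approvals index at inference time.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    merchant: str
    amount: float
    memo: str

def assemble_context(txn: Transaction,
                     policy_text: str,
                     similar_approvals: list[str]) -> str:
    """Assemble every input a good expense decision needs into one context.

    This is the context-engineering step: the model never has to guess at
    policy or precedent, because both are placed in front of it explicitly.
    """
    history = "\n".join(f"- {a}" for a in similar_approvals) or "- none on record"
    return (
        "## Transaction\n"
        f"Merchant: {txn.merchant}\nAmount: ${txn.amount:.2f}\nMemo: {txn.memo}\n\n"
        "## Company expense policy (full text)\n"
        f"{policy_text}\n\n"
        "## Relevant historical approvals\n"
        f"{history}\n\n"
        "Decide: APPROVE, REJECT, or ESCALATE, citing the policy section."
    )

context = assemble_context(
    Transaction("Acme Cloud", 1200.00, "Annual infra renewal"),
    policy_text="Software purchases over $1,000 require an approved vendor.",
    similar_approvals=["Acme Cloud $1,100 approved 2025-03 (renewal)"],
)
```

The point of the structure is that any missing section is immediately visible as a data infrastructure gap, not something you paper over with a cleverer prompt.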
One underrated step the panel surfaced: many companies don't even have a formalized policy document to begin with. The process has to start with interviewing stakeholders, pulling information out of people's heads, and consolidating it into a single written source of truth. AI plus a well-structured policy document beats humans. AI plus scattered, unwritten institutional knowledge? Not so much.
Lesson 3: Draw the deterministic boundary. This one is non-negotiable.
This was the most architecturally important insight of the evening and the most directly applicable for anyone building AI in regulated spaces.
The Valon team, who are rebuilding mortgage servicing infrastructure with AI, framed it clearly. There are two types of correctness in regulated systems:
Type 1: Deterministic correctness. If a customer doesn't have enough funds, the payment fails. No model judgment needed. No ambiguity acceptable.
Type 2: Judgment-based correctness. If a homeowner affected by a hurricane calls in to renegotiate their mortgage terms, there's a range of acceptable outcomes. This is where AI judgment belongs.
The key engineering insight: once the AI makes a Type 2 judgment, say, categorizing an ambiguous incoming payment, everything downstream must be deterministic. Ledger entries, journal entries, reconciliation, reporting. You hand off from AI reasoning to pure code the moment a decision is made.
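The handoff can be sketched in a few lines. This is my own illustration of the pattern, not Valon's code: the classifier stands in for the LLM's Type 2 judgment (stubbed with a keyword check here), and everything after it is exact, auditable code that rejects any category the model wasn't authorized to emit.

```python
from decimal import Decimal

def classify_payment(memo: str) -> str:
    """Type 2 judgment call. In production this would be an LLM deciding
    how to categorize an ambiguous incoming payment; stubbed for illustration."""
    return "principal_curtailment" if "extra" in memo.lower() else "regular_payment"

def post_to_ledger(category: str, amount: Decimal, ledger: dict) -> None:
    """Type 1 territory: pure deterministic code. The model's output is just
    a label; ledger math is exact arithmetic with a hard allowlist."""
    allowed = {"regular_payment", "principal_curtailment"}
    if category not in allowed:            # anything off-menu is rejected, not guessed at
        raise ValueError(f"unknown category: {category}")
    ledger.setdefault(category, Decimal("0"))
    ledger[category] += amount             # exact, replayable, auditable

ledger: dict = {}
category = classify_payment("extra toward principal")     # AI judgment ends here
post_to_ledger(category, Decimal("500.00"), ledger)       # deterministic from here down
```

The allowlist is the boundary made literal: the model can only hand the deterministic side a value the deterministic side already understands.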
The analogy I keep coming back to: a surgeon and an operating room. The surgeon makes the judgment call on how to proceed (the AI's domain). But the instruments are sterilized, the monitors are calibrated, the anesthesia dosing follows exact protocols. You do not ask the surgeon to also calibrate the equipment.
The practical payoff is significant. Giving your agent clean, well-defined interfaces for deterministic operations dramatically reduces the size and complexity of your system prompts, and accuracy improves sharply. One panelist mentioned their agent SOP shrank by roughly half once they stopped trying to teach the agent operations that should just be method calls.
Where things go wrong: Most regulated AI failures happen when LLM judgment creeps into operations that should be deterministic. That is where you get audit failures, compliance flags, and the kind of production incident that ends pilot conversations.
Lesson 4: Agent speciation is already happening. Pick your territory now.
Andrej Karpathy's concept of AI "speciation" came up in the panel. His 2025 year-in-review predicted that just as biological evolution produces animals specialized for specific ecological niches, the AI landscape will produce specialized agents for specific verticals and workflows. He called this one of the defining paradigm shifts of the year.
Ramp's team articulated what this means for product strategy with unusual clarity. General-purpose models will keep improving. But for finance operations specifically, no general model will have:
Deep integrations with the relevant financial data sources
Workflow-specific expertise and edge case handling built from years of customer data
Compliance guardrails tuned to the domain
Feedback loops from 50,000+ customers improving things like OCR stacks and policy agents over time
The same logic applies to legal AI (Harvey), mortgage servicing (Valon), and every other regulated vertical. The general model can draft a decent response. It cannot reliably navigate the specific judgment calls your domain requires without the specialized data, integrations, and feedback loops you build.
The honest question to sit with: what specialized knowledge, data, and feedback loop in your vertical can a general model simply not replicate? That's your moat. Build toward it deliberately rather than waiting for a foundation model to commoditize you.
Lesson 5: Domain knowledge is the hardest engineering problem nobody talks about
Great engineers and deep domain experts rarely overlap; the intersection of that Venn diagram is tiny. One panelist put it bluntly: draw the overlap of "great software engineers" and "people who deeply understand mortgage servicing," and you probably get one or two people in any given room.
So how do the best companies handle it?
Hire for curiosity over existing knowledge. Every company on this panel emphasized this. Valon, Harvey, Rippling, and Ramp all prioritize engineers who demonstrate genuine respect for the domain and a desire to go deep, even without prior expertise.
Embed engineers with domain experts. Valon flies engineers to Phoenix to sit with mortgage operators for a full week. Harvey has "Applied Legal Researchers" (ALRs) who participate in product design discussions, run customer implementations, and have been so effective that some have converted into PMs.
Build domain AI tools for your own engineers. This was the most underrated practical tip of the evening. Valon's CEO spent a single weekend downloading mortgage servicing rules from a non-user-friendly regulatory website and built a domain skill that engineers can now query directly. "How many days do I have to acknowledge a mortgage assistance application?" gets back a precise answer with the relevant regulatory citations. AI dramatically lowers the cost of domain knowledge acquisition for your engineering team. If you are not building internal tools like this, you are leaving significant leverage on the table.
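A weekend version of that internal tool could start as simply as the sketch below. To be clear about assumptions: the rule entries, summaries, and citations here are placeholders I made up for illustration, not real servicing regulations, and a production version would wrap an LLM with retrieval over the actual downloaded rules rather than keyword matching.

```python
# Minimal sketch of an internal domain-knowledge lookup tool.
# All rule content below is placeholder text, not real regulatory guidance.
RULES = [
    {"topic": "mortgage assistance application acknowledgment",
     "summary": "Acknowledge receipt within the applicable deadline.",
     "citation": "internal-rules.md section 2.1 (placeholder)"},
    {"topic": "escrow statement delivery",
     "summary": "Deliver the annual escrow statement on schedule.",
     "citation": "internal-rules.md section 4.3 (placeholder)"},
]

def lookup(question: str) -> list[dict]:
    """Return every rule whose topic shares a keyword with the question."""
    words = set(question.lower().split())
    return [r for r in RULES if words & set(r["topic"].split())]

hits = lookup("How many days to acknowledge a mortgage assistance application?")
```

Even this crude version changes the workflow: the answer an engineer gets back always carries a citation they can verify, which is the property that makes the tool trustworthy enough to use.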
Lesson 6: Forward-deployed engineering is the new enterprise motion
If you're building enterprise AI in complex regulated domains, you will eventually develop a forward-deployed engineering (FDE) function. This came up across both panel sessions and prompted the most audience questions.
The logic: you cannot reasonably expect enterprise customers to also be expert software engineers, understand your platform architecture, solve their domain-specific problem, and integrate everything themselves. FDE combines product, services, and engineering judgment into a single motion where the customer pays for outcomes, not for software licenses or service hours.
What separates good FDE from a services trap: the best FDE teams are obsessive about generalization. They resist building bespoke one-off solutions. Every customer engagement is a signal for what core product should build next. Think of it as a flywheel: FDE surfaces what enterprise customers actually need, core product builds it for everyone, and the cycle compounds.
For earlier-stage founders: you are probably your own FDE team right now. That's fine, and it's actually a feature. The discipline worth building now is a clear mental model for distinguishing "what's unique to this customer" from "what every customer in this vertical needs." Hold onto that distinction in every customer call.
The hiring conversation (this part got spicy)
Every panelist answered the hiring question with a different flavor, but the themes converged more than expected.
Drive, full stop (Ramp): Marc Andreessen wrote about hiring for drive back in 2009. It's more relevant now than ever. A highly driven person plus an AI co-pilot is an output multiplier that is genuinely hard to overstate. The way they screen for it: ask candidates about the hardest period they've worked. Quantify it. The delta between candidates is large, and it tells you more than any technical screen.
Curiosity, coachability, ambition (Rippling): In a world where factual recall is largely outsourced to LLMs, the traits that actually compound are the ability to keep up with rapidly evolving tooling, absorb feedback and change course, and bring genuine personal ambition to the work. Not being able to do something is not a red flag. Not responding well to feedback is.
High agency and ownership (Valon): The constraint at an early-stage startup is not the number of problems to solve. It's problem-solving throughput. High-agency people maximize that throughput. They don't wait to be unblocked. Screen for this explicitly, because it's not always correlated with credentials or seniority.
Generalists who can go deep (Harvey): For horizontal work that occasionally pulls you into vertical depth, you need people who can switch modes without losing momentum. Harvey operates at what was described as a "marathon-sized sprint" pace, 100% intensity constantly. Energy and decisiveness are load-bearing character traits.
One meta-observation across all four companies: no one mentioned evaluating for static knowledge. The ability to answer specific questions is increasingly table stakes. How you think, learn, and operate under pressure is the actual signal.
The meta-lesson that runs through everything: fit before force
Every success story from this panel involved AI fitting naturally into an existing workflow, rather than being pushed into one.
HR admins at Rippling are not grudgingly adopting AI. They are actively asking for more of it, because it frees them from being a help desk for "what is my PTO policy?" questions that probably should not have required a human in the first place. Ramp's policy agent works because it has more context than any human reviewer consistently had.
When users understand what the AI is doing, where it fits, and where it does not, adoption becomes a pull rather than a push. Resistance almost always means one of two things: you're asking AI to do something it shouldn't, or you haven't communicated the fit clearly enough to the people it affects.
This is, honestly, the most important thing to internalize before your next enterprise pilot conversation.
The TL;DR
If you're building AI in a regulated industry, here's the checklist:
Fix your data strategy first. No model compensates for fragmented, untrusted data infrastructure.
Practice context engineering, not just prompt engineering. Assemble complete decision context structurally, before you worry about the model.
Draw a hard deterministic boundary. AI makes the judgment call. Code handles everything downstream. Never blur this line.
Specialize in your vertical. General models won't have your integrations, guardrails, domain feedback loops, or institutional knowledge.
Build internal domain knowledge tools for your engineers. AI makes this cheap. A weekend of work can save months of onboarding ramp-up.
Build or plan for FDE as your enterprise motion. When deal complexity requires it, you need engineers operating inside customer environments.
Hire for drive, curiosity, and high agency. Raw knowledge is increasingly a commodity. How people think and operate under pressure is not.
Building AI in regulated industries is one of the hardest and most interesting engineering problems of this era. The compliance overhead is real, but so is the moat once you're through it.
If you're building in legal, healthcare, finance, mortgage, or any other vertically complex space, I'd genuinely love to compare notes. These problems are hard and the conversations get more useful the more specific they are.
If you want to follow along as I write more about production AI in high-stakes domains and the technical decisions that actually matter at this stage, subscribe to my newsletter. This is the first of many.