LLMs in Clinical Settings: What the FDA, HIPAA, and Your Hospital Client Actually Require

Shahid Mansuri | June 08, 2026 | 15 min read

In this guide, you’ll learn:

Where the FDA draws the line between a clinical LLM and a regulated medical device
Three HIPAA risks LLMs create that most compliance programs overlook
What hospital procurement teams review before approving an LLM product
A practical pre-deployment checklist to use before any hospital discussion

One scenario keeps repeating across HealthTech in the current wave of AI products: a founder or CTO builds a genuinely useful clinical LLM product. It summarizes notes, helps with documentation, and flags potential drug interactions. The demo goes well and the hospital is interested.

Then the IT security team sends a 90-question vendor questionnaire, and the clinical governance committee asks for the FDA regulatory classification and a HIPAA Business Associate Agreement.

Everything comes to a complete stop. Not because the product is bad, but because the team built an impressive AI product without first building a compliant one.

Expert Insight

Under HIPAA, unauthorized disclosure of PHI can lead to penalties ranging from $141 to over $2.1 million per violation. Healthcare data breaches affected over 168 million individuals in 2024, driven largely by large-scale cyberattacks on healthcare vendors and clearinghouses.

When an LLM is in that chain, the liability does not sit with the model. It sits with you.

This guide covers what the FDA, HIPAA, and your hospital clients actually require before an LLM goes anywhere near a clinical workflow.

Understand What Makes a Clinical LLM Different From Any Other LLM

Unlike general AI models trained on internet content, medical LLMs learn from PubMed literature (35 million+ biomedical citations), clinical documentation, and electronic health records. But the data the model was trained on is not what determines your compliance obligations. What determines your obligations is what the model does with patient data in deployment.

The moment your LLM touches, processes, generates, or transmits Protected Health Information (PHI), HIPAA applies. The moment it produces outputs that influence clinical decisions about specific patients, FDA oversight becomes relevant.

These are two separate regulatory questions that require separate answers.

LLM Use Case	HIPAA Applies?	FDA Oversight?
Ambient documentation (clinical note drafting)	Yes (processes PHI)	Generally no, if not decision support
Discharge summary generation	Yes	Generally no
Prior authorisation letter drafting	Yes	Generally no
Drug interaction checking	Yes	Yes (likely SaMD)
Diagnostic suggestion from patient history	Yes	Yes (SaMD, moderate to high risk)
Clinical risk stratification	Yes	Yes (SaMD)
Chatbot answering patient symptom questions	Yes	Depends on clinical specificity
Billing code suggestion (ICD-10)	Yes	Generally no
Research literature summarization (no PHI)	Only if PHI enters prompts or logs	No

The most dangerous assumption in HealthTech right now is that an LLM used "just for documentation" sits outside the regulatory perimeter. If it processes PHI, it is inside HIPAA. If its output influences a clinical decision, it may also be inside FDA oversight.

What HIPAA Actually Requires From Your LLM Architecture

Traditional HIPAA compliance focused on database access controls and encryption. LLMs introduce three distinct attack surfaces that standard compliance programmes do not automatically cover.

Attack Surface 1: PHI in Training Data

When you fine-tune a model on clinical notes, PHI can become embedded in model weights. Large language models can reproduce verbatim text from training data under adversarial prompting.

This means:

Any LLM fine-tuned on patient records without proper de-identification is a standing HIPAA violation, regardless of how the model performs clinically
De-identification must meet the HIPAA Safe Harbor standard (removing 18 specific identifiers) or Expert Determination (statistical proof that re-identification risk is very low)
If your model was pre-trained or fine-tuned by a third party on clinical data, you need documented evidence of how that data was handled before you can deploy the model for your own clinical use cases
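
As a rough illustration of the Safe Harbor idea, the sketch below strips a few pattern-shaped identifiers from a note before it reaches a fine-tuning set. The pattern set and the `redact` helper are assumptions made for this example, not a complete method: they cover only a handful of the 18 identifier types, and names or addresses in free text generally need a dedicated de-identification tool plus human review.

```python
import re

# Illustrative patterns only. Safe Harbor lists 18 identifier types, and
# names or addresses in free text cannot be caught reliably with regex.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each matched identifier with a [LABEL] tag and count the hits."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts


note = "Pt seen 03/14/2024, MRN: 448812, call 555-123-4567."
clean, counts = redact(note)
print(clean)
print(counts)
```

The per-label counts are worth keeping alongside the cleaned data, since they give you some of the documented evidence of data handling that a hospital or auditor may ask for.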

Attack Surface 2: PHI in Inference Calls

Every time your application sends patient data to an LLM API, that data is transmitted to the model provider's infrastructure. If that infrastructure is not covered by a signed Business Associate Agreement (BAA), the transmission is a HIPAA violation.

BAA status for major LLM providers as of 2026:

Provider	HIPAA BAA Available?	Notes
OpenAI API (direct)	No	No BAA available for direct API access
Azure OpenAI Service	Yes	BAA available through Microsoft enterprise agreements
Google Vertex AI (Gemini)	Yes	BAA available through Google Cloud healthcare agreements
Anthropic Claude API	Yes	BAA available for enterprise agreements
AWS Bedrock	Yes	BAA available through AWS healthcare agreements
Self-hosted open-source (Llama, Mistral)	Not applicable	You control the infrastructure. HIPAA applies to your setup, not a provider

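While the table above lists no BAA for direct OpenAI API access, Azure OpenAI Service offers access to OpenAI's models under a BAA. Similarly, Anthropic offers BAA coverage for its Claude models under enterprise agreements. There is no official "HIPAA certification" for vendors, so treat the signed BAA and the list of services it covers as the evidence that matters, and confirm current terms with each provider.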

The practical implication: if you are calling an LLM API directly and sending any patient data in the prompt, check whether a BAA exists. If it does not, you are in violation regardless of what your privacy policy says.
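
One way to make that check hard to forget is to enforce it in code before any request leaves your application. The sketch below is a minimal guard under stated assumptions: `Provider`, `MissingBAAError`, and `assert_can_send` are hypothetical names, and the `baa_signed` flag is assumed to come from your own contract records rather than from a vendor's marketing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    baa_signed: bool  # assumption: populated from your own contract records


class MissingBAAError(RuntimeError):
    """Raised when PHI is about to be sent to a provider without a BAA."""


def assert_can_send(provider: Provider, contains_phi: bool) -> None:
    """Block the call if the payload contains PHI and no BAA is on file."""
    if contains_phi and not provider.baa_signed:
        raise MissingBAAError(
            f"Refusing to send PHI to {provider.name}: no signed BAA on file"
        )


covered = Provider(name="example-covered-provider", baa_signed=True)
uncovered = Provider(name="example-uncovered-provider", baa_signed=False)

assert_can_send(covered, contains_phi=True)     # allowed
assert_can_send(uncovered, contains_phi=False)  # allowed, no PHI in the payload
```

Calling `assert_can_send(uncovered, contains_phi=True)` raises `MissingBAAError`, so a misconfigured provider fails loudly instead of silently transmitting PHI.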

Attack Surface 3: PHI in Model Outputs and Logs

LLM outputs that contain patient information are themselves PHI if they identify or could identify a specific patient. This means:

Application logs that capture LLM inputs and outputs must be treated as PHI
Any analytics or monitoring on LLM responses that stores raw text must be on HIPAA-compliant infrastructure
Caching of LLM responses that contain patient-specific content must follow HIPAA retention and deletion rules
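
A common leak is an application logger that records full prompts and responses by default. The sketch below shows one possible mitigation using Python's standard `logging` module: a filter that swaps raw text for a length and a short digest before the record is written. The `PromptScrubFilter` class and the `prompt` and `response` field names are assumptions for this example. A digest of low-entropy text may still be linkable to a patient, so this reduces exposure rather than removing the need for HIPAA-compliant log storage.

```python
import hashlib
import logging


class PromptScrubFilter(logging.Filter):
    """Replace raw prompt/response text on a log record with a placeholder."""

    SCRUBBED_FIELDS = ("prompt", "response")  # hypothetical field names

    def filter(self, record: logging.LogRecord) -> bool:
        for field in self.SCRUBBED_FIELDS:
            value = getattr(record, field, None)
            if isinstance(value, str):
                digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
                setattr(record, field, f"<redacted len={len(value)} sha256={digest}>")
        return True  # keep the record, minus the raw text


logger = logging.getLogger("llm.calls")
logger.addFilter(PromptScrubFilter())
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(message)s prompt=%(prompt)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Values passed via `extra` become attributes on the record, which the filter rewrites.
logger.info("inference complete", extra={"prompt": "John Doe has diabetes"})
```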

What HIPAA requires from your LLM infrastructure:

All PHI processed by the LLM is on HIPAA-eligible infrastructure with a signed BAA
Encryption in transit (TLS 1.2 minimum) for all API calls containing PHI
Encryption at rest for all stored inputs, outputs, and logs
Audit logging: who accessed what, when, and what the LLM produced
Role-based access controls on all systems that handle LLM inputs or outputs
A HIPAA risk assessment that specifically addresses your LLM use cases
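
The TLS 1.2 floor in the list above can be enforced on the client side rather than assumed. A minimal sketch using Python's standard `ssl` module follows; `make_tls_context` is a hypothetical helper name, and how you pass the context to your HTTP client depends on the library you use.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
print(context.minimum_version.name)
```

With the standard library, the context can be passed as `urllib.request.urlopen(url, context=context)`. Many third-party HTTP clients accept an `SSLContext` in a similar way, but check the documentation of the one you use.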

What the FDA Requires: The SaMD Question for LLMs

The FDA's approach to LLMs in clinical settings flows from its existing Software as a Medical Device (SaMD) framework. The key question is the same one covered in our guide on building compliant AI for HealthTech products: does the LLM output inform, drive, or replace a clinical decision about a specific patient?

For LLMs specifically, the FDA's Clinical Decision Support (CDS) Software guidance provides the clearest framework:

Non-Device CDS (generally not requiring FDA clearance):

The LLM displays standard medical reference information (like a drug reference tool)
The clinician can independently verify the basis of the recommendation without relying on the LLM
The recommendation does not acquire patient-specific data beyond basic demographics to operate

Device CDS (likely requiring FDA clearance):

The LLM acquires, processes, or analyses patient-specific data to provide a recommendation
The basis of the recommendation is not independently verifiable by the clinician using the LLM
The recommendation is intended to replace or reduce the clinical judgement required from the clinician

For LLM products specifically, this means:

LLM Feature	Non-Device or Device CDS?	Regulatory Implication
Summarising a patient record without clinical interpretation	Non-Device	No clearance required
Generating a differential diagnosis list from patient history	Device CDS	510(k) or De Novo likely required
Drafting a clinical note from ambient audio	Non-Device (if no clinical recommendation)	No clearance required
Flagging abnormal lab values with clinical interpretation	Device CDS	FDA clearance pathway needed
Answering patient symptom questions with clinical guidance	Device CDS if specific to patient	Clearance likely required
Extracting structured data from unstructured clinical text	Non-Device	No clearance required
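
The criteria above can be turned into a first-pass triage aid for product teams. The sketch below mirrors this article's simplified criteria only; `FeatureProfile` and `triage_cds` are hypothetical names, and the result is a prompt for a conversation with a regulatory specialist, not a regulatory determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureProfile:
    makes_clinical_recommendation: bool
    uses_patient_specific_data: bool
    basis_independently_verifiable: bool
    intended_to_replace_judgement: bool


def triage_cds(feature: FeatureProfile) -> tuple[str, list[str]]:
    """Return a rough label plus the reasons that triggered it."""
    if not feature.makes_clinical_recommendation:
        return "likely non-device", ["no clinical recommendation is produced"]
    reasons = []
    if feature.uses_patient_specific_data:
        reasons.append("analyses patient-specific data to recommend")
    if not feature.basis_independently_verifiable:
        reasons.append("clinician cannot independently verify the basis")
    if feature.intended_to_replace_judgement:
        reasons.append("intended to replace or reduce clinical judgement")
    if reasons:
        return "likely device CDS", reasons
    return "likely non-device CDS", ["reference-style output with a verifiable basis"]


# Record summarization without interpretation vs. a differential diagnosis list.
summarizer = FeatureProfile(False, True, True, False)
differential = FeatureProfile(True, True, False, False)
print(triage_cds(summarizer))
print(triage_cds(differential))
```

Returning the reasons alongside the label makes it easier to write the classification rationale that the clinical governance committee will ask for.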

The adaptive LLM problem:

Most modern LLMs can be updated, retrained, or prompted differently after deployment. This is the adaptive AI issue that the FDA's Predetermined Change Control Plan (PCCP) framework addresses.

If your LLM can change its clinical behaviour after deployment, you need a PCCP filed with the FDA as part of your clearance submission. Teams that deploy a fixed-version LLM and then update it without a PCCP are creating a significant regulatory problem.
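
A practical precondition for any change control plan is knowing when the deployed configuration has moved away from the one you validated. The sketch below compares a fingerprint of the approved setup against what is running. Because a changed system prompt can alter clinical behaviour as much as a changed model, the prompt is hashed as part of the fingerprint. The `fingerprint` and `detect_drift` helpers and all field names and values are assumptions for this example.

```python
import hashlib


def fingerprint(model_id: str, model_version: str, system_prompt: str) -> dict[str, str]:
    """Capture the parts of a deployment that can change clinical behaviour."""
    return {
        "model_id": model_id,
        "model_version": model_version,
        "prompt_sha256": hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
    }


def detect_drift(approved: dict[str, str], deployed: dict[str, str]) -> list[str]:
    """List the fingerprint fields where the deployment differs from approval."""
    return sorted(key for key in approved if approved[key] != deployed.get(key))


# Hypothetical values for illustration.
approved = fingerprint("clinical-summarizer", "2025-01-15", "Summarize the note.")
deployed = fingerprint("clinical-summarizer", "2025-06-01", "Summarize the note.")
print(detect_drift(approved, deployed))
```

Running a check like this at startup or in CI, and failing the deployment when the list is not empty, keeps silent model or prompt updates from reaching clinical users.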

What Your Hospital Client Actually Checks

FDA classification and HIPAA compliance are the regulatory requirements. But hospital clients add a layer of practical requirements on top of these that many product teams are not prepared for.

A typical hospital enterprise procurement for a clinical LLM product will include:

From the IT Security Team:

Evidence of HIPAA-eligible infrastructure and signed BAA with your LLM provider
SOC 2 Type II report for your product
Penetration test results within the last 12 months
Data flow diagram showing exactly where PHI goes: from EHR to your application, into the LLM API, and back
Confirmation of data residency (especially for UK NHS and GCC hospital systems)
Confirmation that no patient data is used to train or improve the model without explicit consent

From the Clinical Governance Committee:

FDA regulatory classification document confirming whether clearance is required and its status
Clinical validation data: how was the model tested and what were the results on your target patient population
Hallucination rate and what controls are in place to detect and handle incorrect outputs
Version control policy: what version of the model is deployed and how are updates managed
Audit trail: how do you log which model version produced which clinical output for which patient
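
The audit trail question is easier to answer if every inference writes one structured record. The sketch below shows what such a record might contain; `build_audit_record` and its field names are assumptions, and `patient_ref` is assumed to be an internal pseudonymous reference rather than a name or MRN. The output itself is stored elsewhere on encrypted infrastructure, and only its digest is kept here so the record can later be matched to the stored output.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(
    *,
    user_id: str,
    patient_ref: str,
    model_version: str,
    input_category: str,
    output_category: str,
    output_text: str,
) -> dict[str, str]:
    """Describe one inference without embedding the raw clinical text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "patient_ref": patient_ref,  # assumption: pseudonymous internal reference
        "model_version": model_version,
        "input_category": input_category,
        "output_category": output_category,
        "output_sha256": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
    }


record = build_audit_record(
    user_id="clinician-042",
    patient_ref="ref-7f3a",
    model_version="2025-01-15",
    input_category="ambient_audio_transcript",
    output_category="draft_clinical_note",
    output_text="Draft note text goes to encrypted storage, not the audit log.",
)
print(json.dumps(record, indent=2))
```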

From the Legal and Procurement Team:

HIPAA Business Associate Agreement
Data Processing Agreement (DPA) for UK and EU deployments under UK GDPR or EU GDPR
Indemnification terms covering AI-related clinical errors
Subprocessor list: every third party that handles PHI on your behalf, including the LLM provider

93% of healthcare organisations were hit by a cyber attack in the previous 12 months. 35% of respondents identified employee failure to follow policies as the main reason behind data loss. Hospital procurement teams know these numbers. Their questions reflect that awareness.

International Compliance: UK NHS and GCC Requirements

UK NHS requirements for LLM products:

NHS Digital's Data Security and Protection (DSP) Toolkit is the baseline compliance framework for any product handling NHS patient data. For LLM products specifically:

UK GDPR applies to all PHI processing (separate from EU GDPR post-Brexit)
The MHRA's SaMD guidance mirrors FDA logic for decision-support AI
NHS Digital's guidance on AI and data ethics applies to any LLM used in clinical care
Data processed for NHS patients must remain within UK or adequately protected jurisdictions
NICE evidence standards for AI products are increasingly referenced in NHS procurement

GCC requirements for LLM products:

Saudi Arabia's PDPL requires that patient data is processed on Saudi-resident infrastructure. An LLM API call that routes PHI through a US or EU data centre for a Saudi patient is non-compliant regardless of model quality.
UAE's MOHAP digital health framework is developing specific AI guidance. Current expectation is that PHI remains on UAE-based infrastructure.
Saudi Arabia's NCA ECC 2:2024 adds mandatory cybersecurity controls that extend to AI systems handling health data.
SFDA guidance for SaMD applies to any LLM that qualifies as a medical device under Saudi Arabia's framework.
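
Residency rules like these are easier to honour when routing fails closed. The sketch below picks an LLM endpoint by market and refuses to fall back to another region. The region names, URLs, `ALLOWED_REGIONS` table, and `pick_endpoint` helper are all hypothetical, and the policy is simplified: UK rules, for instance, also permit adequately protected jurisdictions.

```python
# Hypothetical region identifiers; substitute the ones your cloud provider uses.
ALLOWED_REGIONS: dict[str, set[str]] = {
    "UK": {"uk-south"},
    "KSA": {"ksa-central"},
    "UAE": {"uae-north"},
}


class ResidencyError(RuntimeError):
    """Raised when no compliant endpoint exists for a market."""


def pick_endpoint(market: str, endpoints: dict[str, str]) -> str:
    """Return an endpoint URL in an allowed region, or fail without fallback."""
    allowed = ALLOWED_REGIONS.get(market)
    if allowed is None:
        raise ResidencyError(f"no residency policy defined for market {market!r}")
    for region in sorted(allowed):
        if region in endpoints:
            return endpoints[region]
    raise ResidencyError(f"no endpoint deployed in an allowed region for {market!r}")


endpoints = {
    "us-east": "https://llm.us-east.example.com",
    "ksa-central": "https://llm.ksa-central.example.com",
}
print(pick_endpoint("KSA", endpoints))
```

With this configuration, `pick_endpoint("UAE", endpoints)` raises `ResidencyError` instead of quietly sending UAE patient data to the US endpoint.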

Pre-Deployment Checklist for Clinical LLM Products

Work through this before your next hospital conversation or procurement submission.

FDA SaMD classification completed. Documented as Device CDS or Non-Device CDS with rationale.
If SaMD: regulatory pathway identified (510(k), De Novo, or PMA).
If adaptive model: PCCP filed or in progress.
UK MHRA classification confirmed if selling to NHS.
SFDA/MOHAP classification confirmed if selling in GCC.
BAA signed with every LLM provider that receives PHI.
All infrastructure confirmed HIPAA-eligible (not just cloud provider, but specific services used).
Training data de-identified under Safe Harbor or Expert Determination if fine-tuned on clinical data.
Data flow diagram produced showing PHI flow from source to LLM to output to storage.
HIPAA risk assessment covers LLM-specific use cases explicitly.
UK GDPR DPA in place for NHS deployments.
Data residency confirmed per market (UK, KSA, UAE).
Encryption in transit (TLS 1.2 minimum) for all LLM API calls.
Encryption at rest for all stored inputs, outputs, and logs.
Audit logging per inference: model version, timestamp, user, input category, output category.
Model version control: you can answer which model version produced any specific output.
Hallucination detection or confidence scoring in output layer.
Role-based access controls on all systems handling LLM inputs and outputs.
SOC 2 Type II report available.
Penetration test completed within 12 months.
Clinical validation data documented with methodology and results.
FDA regulatory status document prepared.
HIPAA BAA template ready to sign.
Subprocessor list current and available.

Conclusion

Building an LLM for healthcare is no longer just an AI challenge. It is a compliance, security, and regulatory challenge from day one. A model that summarizes notes today can quickly become a regulated clinical decision support tool tomorrow, depending on how it is used.

At the same time, HIPAA obligations extend far beyond databases to include training data, prompts, outputs, logs, and every vendor involved in the AI workflow. Hospital buyers understand these risks and increasingly evaluate AI products through the lens of governance, security, and clinical safety rather than model performance alone.

Teams that address FDA classification, HIPAA requirements, data residency, auditability, and procurement readiness early gain a significant advantage. Before your next hospital conversation, ensure your LLM architecture is designed not only to deliver value but also to withstand regulatory scrutiny and enterprise due diligence.

Frequently Asked Questions

Do all clinical LLMs need FDA clearance?
No. Only LLMs that qualify as SaMD by driving clinical decisions about specific patients.

Can I send PHI to the OpenAI API directly?
No. OpenAI's direct API has no BAA available. Use Azure OpenAI instead.

Why are hallucinations a compliance risk?
LLMs can produce confident but incorrect outputs. Incorrect clinical information in a patient record creates liability.

Is UK GDPR different from EU GDPR?
Yes. They are separate post-Brexit. NHS deployments require UK GDPR compliance, not EU GDPR.

What is a PCCP?
A Predetermined Change Control Plan. Required when your FDA-cleared LLM will update its behaviour after deployment.

Can an LLM output be PHI even without the patient's name?
Yes, if the output could identify a patient through other included details (dates, conditions, provider names).

Who is liable when a clinical LLM gets something wrong?
Liability depends on your contract terms, FDA classification, and whether your outputs were presented as clinical recommendations.

What applies to health AI systems in Saudi Arabia?
NCA ECC 2:2024 applies to all health AI systems in KSA. Data must be on Saudi-resident infrastructure.
