
Glossary

PHI (Protected Health Information)

Any health information that can be linked to an individual through identifiers such as names, dates, addresses, medical record numbers, or biometric identifiers. HIPAA lists 18 specific identifier types.

PHI is the regulatory category that triggers HIPAA's safeguards. The definition is broader than most engineers expect. It covers diagnoses and treatments, and it also covers any of the 18 HIPAA identifiers when they are linked to health information: names, geographic subdivisions smaller than a state, dates more specific than a year, phone numbers, fax numbers, email addresses, SSNs, MRNs, health plan beneficiary numbers, account numbers, certificate and license numbers, vehicle IDs, device IDs, URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying number or code.
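As a rough illustration, the identifier categories above can be encoded as an enum with a simple screening check. The names `SafeHarborIdentifier` and `looks_like_phi` are hypothetical. The check is a triage heuristic, not a legal determination: health data with none of these identifiers can still be PHI if it is otherwise re-identifiable.

```python
from __future__ import annotations

from enum import Enum, auto


class SafeHarborIdentifier(Enum):
    """The 18 identifier categories HIPAA lists (hypothetical enum name)."""
    NAME = auto()
    GEOGRAPHIC_SUBDIVISION = auto()          # smaller than a state
    DATE = auto()                            # any date element more specific than a year
    PHONE_NUMBER = auto()
    FAX_NUMBER = auto()
    EMAIL_ADDRESS = auto()
    SSN = auto()
    MEDICAL_RECORD_NUMBER = auto()
    HEALTH_PLAN_BENEFICIARY_NUMBER = auto()
    ACCOUNT_NUMBER = auto()
    CERTIFICATE_OR_LICENSE_NUMBER = auto()
    VEHICLE_IDENTIFIER = auto()
    DEVICE_IDENTIFIER = auto()
    URL = auto()
    IP_ADDRESS = auto()
    BIOMETRIC_IDENTIFIER = auto()
    FULL_FACE_PHOTO = auto()
    OTHER_UNIQUE_IDENTIFIER = auto()


def looks_like_phi(has_health_info: bool, identifiers: set[SafeHarborIdentifier]) -> bool:
    # Triage heuristic only: health data linked to any listed identifier is treated as PHI.
    return has_health_info and bool(identifiers)


# An IP address logged next to a clinical note:
print(looks_like_phi(True, {SafeHarborIdentifier.IP_ADDRESS}))  # → True
```
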

The key practical points: an IP address logged alongside a clinical note is PHI. A cookie ID linked to a patient portal session is PHI. Free-text fields where users might paste anything should be presumed to contain PHI. Once data is PHI, it stays PHI until it has been formally de-identified using one of HIPAA's two methods, Safe Harbor or Expert Determination.
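One way to make "once PHI, always PHI" hold in code is a sticky classification flag that derived data inherits. This is a minimal sketch; `Record`, `initial_phi_flag`, `derive`, and `mark_deidentified` are hypothetical names. `mark_deidentified` only records that fields already went through Safe Harbor or Expert Determination elsewhere. It does not de-identify anything itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class DeidMethod(Enum):
    """HIPAA's two de-identification methods."""
    SAFE_HARBOR = "safe_harbor"
    EXPERT_DETERMINATION = "expert_determination"


@dataclass(frozen=True)
class Record:
    """Hypothetical record type carrying a sticky PHI flag."""
    fields: Dict[str, Any]
    is_phi: bool
    deid_method: Optional[DeidMethod] = None


def initial_phi_flag(has_health_info: bool, has_identifier: bool, has_free_text: bool) -> bool:
    # Free-text fields are presumed to contain PHI because users may paste anything.
    return has_free_text or (has_health_info and has_identifier)


def derive(parent: Record, **changes: Any) -> Record:
    # Anything computed from a record inherits its PHI status unchanged.
    return Record(
        fields={**parent.fields, **changes},
        is_phi=parent.is_phi,
        deid_method=parent.deid_method,
    )


def mark_deidentified(deidentified_fields: Dict[str, Any], method: DeidMethod) -> Record:
    # The only path that clears the flag: the caller must pass fields that have
    # already been de-identified and name the method used, for auditability.
    return Record(fields=dict(deidentified_fields), is_phi=False, deid_method=method)
```

The design choice here is that no ordinary transformation can drop the flag; clearing it requires an explicit call that names the method, which leaves a trail to audit.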

Most AI projects underestimate where PHI lives. A "health analytics" pipeline that ingests appointment timestamps and provider IDs is handling PHI. An LLM prompt log that captures user input is a PHI store. Designing the system to acknowledge this from day one is much cheaper than retrofitting compliance later.
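A sketch of acknowledging this in an LLM logging path, assuming a hypothetical `build_prompt_log_entry` helper that emits JSON log lines. Obvious structured identifiers are masked with simple regular expressions, but the entry is still flagged as PHI because names, dates written in prose, and other free-text details slip past patterns like these.

```python
import json
import re
from datetime import datetime, timezone

# Deliberately simple patterns for a few structured identifiers. Names, dates in
# prose, MRNs in free text, etc. are NOT caught, so the log is still treated as PHI.
_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask_identifiers(text: str) -> str:
    # Replace each match with a bracketed label, e.g. "[EMAIL]".
    for label, pattern in _PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def build_prompt_log_entry(user_id: str, prompt: str) -> str:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,         # still an identifier: keep the entry in PHI storage
        "prompt": mask_identifiers(prompt),
        "contains_phi": True,       # masking is defense in depth, not de-identification
    }
    return json.dumps(entry)
```

Masking reduces what leaks if the log is mishandled, but it is not one of HIPAA's de-identification methods, so the log still belongs in storage that meets HIPAA's safeguards.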
