In healthcare, data is pivotal for effective patient care, operational efficiency, and regulatory compliance. This data comes in various formats, each with unique characteristics and uses. Understanding these formats (structured, unstructured, and semi-structured) is essential for healthcare professionals and organizations that need to manage and use information efficiently. In this article, we explore each format in healthcare, highlighting examples, characteristics, and the key differences between them. Finally, we’ll walk you through Snowflake’s capabilities with regard to each format, along with some best practices and use cases specific to the AI Data Cloud for Healthcare and Life Sciences.
Structured Data in Healthcare
Structured data in healthcare is highly organized and formatted to be easily searchable and analyzable. This type of data is crucial for clinical decision support systems, patient management, and reporting.
Examples:
- Electronic Health Records (EHR): Structured data about patient demographics, medical history, medications, immunizations, lab results, and vital signs.
- Medical Billing and Coding: Data related to insurance claims, procedure codes (CPT), diagnosis codes (ICD), and billing records.
- Clinical Trials Data: Data collected during clinical trials, including patient enrollment, treatment plans, and outcomes.
Characteristics:
- Fixed Schema: Adheres to predefined models and standards (e.g., OMOP).
- Easily Searchable: Can be quickly retrieved and queried using clinical databases.
- High Level of Organization: Structured into rows and columns, making it suitable for statistical analysis and reporting.
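To make the fixed-schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a clinical database. The table name lab_results, its columns, the HBA1C test code, and every value are made up for illustration and do not come from OMOP or any real system.

```python
import sqlite3

# Hypothetical fixed schema: every row carries the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE lab_results (
        patient_id   INTEGER NOT NULL,
        test_code    TEXT    NOT NULL,
        value        REAL    NOT NULL,
        unit         TEXT    NOT NULL,
        collected_on TEXT    NOT NULL  -- ISO 8601 date
    )
    """
)

# Illustrative rows only; not real patient data.
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?, ?, ?)",
    [
        (1, "HBA1C", 6.1, "%", "2024-01-10"),
        (1, "HBA1C", 7.4, "%", "2024-04-12"),
        (2, "HBA1C", 5.4, "%", "2024-02-03"),
    ],
)


def patients_above(connection, test_code, threshold):
    """Return (patient_id, highest value) for patients whose peak result exceeds threshold."""
    cursor = connection.execute(
        """
        SELECT patient_id, MAX(value)
        FROM lab_results
        WHERE test_code = ?
        GROUP BY patient_id
        HAVING MAX(value) > ?
        ORDER BY patient_id
        """,
        (test_code, threshold),
    )
    return cursor.fetchall()


print(patients_above(conn, "HBA1C", 6.5))
```

Because every row shares the same columns and types, the question "which patients ever exceeded a threshold" is a single declarative query with no parsing step.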
Unstructured Data in Healthcare
Unstructured data in healthcare includes diverse information that does not fit into traditional databases but provides valuable insights into patient care and outcomes.
Examples:
- Clinical Notes: Doctors’ notes, discharge summaries, and progress notes.
- Medical Images: X-rays, MRIs, CT scans, and other diagnostic imaging.
- Pathology Reports: Detailed reports from laboratory tests and biopsies.
- Patient Correspondence: Emails, messages, and communication logs between patients and healthcare providers.
Characteristics:
- No Fixed Schema: Lacks a consistent format and structure.
- Flexibility: Can take various forms, such as text, images, and audio, capturing the complexity of patient information.
- Complexity: Requires advanced techniques like natural language processing (NLP) and image recognition to analyze.
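The gap between free text and queryable values can be illustrated with a deliberately simple sketch. It uses a regular expression, not real NLP, to pull blood pressure readings out of a clinical note. The note text, the pattern, and the function name are all invented for this example, and production systems would need far more robust techniques.

```python
import re

# Made-up clinical note; no schema tells us where the values live.
NOTE = (
    "Pt seen for follow-up. BP 142/91 today, down from 150/95 last visit. "
    "Reports occasional headaches. Continue lisinopril 10 mg daily."
)

# Naive pattern: two 2-3 digit numbers separated by a slash.
# It would also match unrelated ratios, which is one reason real NLP is needed.
BP_PATTERN = re.compile(r"\b(\d{2,3})\s*/\s*(\d{2,3})\b")


def extract_blood_pressures(text):
    """Return a list of (systolic, diastolic) integer pairs found in free text."""
    return [(int(systolic), int(diastolic)) for systolic, diastolic in BP_PATTERN.findall(text)]


print(extract_blood_pressures(NOTE))
```

Even in this toy case the code has to guess at structure that the text never declares, which is the core difficulty of working with unstructured data.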

Semi-Structured Data in Healthcare
Semi-structured data in healthcare has some organizational properties but does not conform to a rigid schema. It is often used to facilitate data exchange and integration across different systems.
Examples:
- C-CDA (Consolidated Clinical Document Architecture): Standard for the exchange of healthcare information that provides a common structure for clinical documents.
- FHIR Resources: Fast Healthcare Interoperability Resources (FHIR) for electronic exchange of healthcare information.
- Clinical Documentation: Structured fields within narrative text in EHRs.
Characteristics:
- Flexible Schema: Contains tags or markers that define data elements.
- Hierarchical: Often represented in a tree-like structure, allowing for complex data relationships.
- Interoperability: Facilitates communication and data exchange between disparate healthcare systems.
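The tagged, hierarchical nature of semi-structured data can be sketched with a small FHIR-style Patient resource in JSON. The resource below is a hand-written, heavily trimmed illustration, and the flatten helper is a hypothetical name introduced here, not part of any FHIR library.

```python
import json

# Trimmed, hand-written Patient resource for illustration only.
FHIR_PATIENT = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
  "birthDate": "1980-05-17"
}
"""


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf in a nested JSON structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, node


flat = dict(flatten(json.loads(FHIR_PATIENT)))
for path, value in flat.items():
    print(path, "=", value)
```

The keys act as the tags that define each data element, while the nesting (a list of names, each with a list of given names) is the tree-like structure that a flat table cannot express directly.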
Key Differences in Healthcare
- Structured Data: Essential for routine clinical tasks, regulatory reporting, and decision support; examples include patient records and billing information.
- Unstructured Data: Captures the richness and complexity of patient interactions and medical diagnostics; examples include clinical notes and diagnostic images.
- Semi-Structured Data: Balances structure and flexibility, supporting data integration and interoperability; examples include C-CDA documents and FHIR resources.
Use Cases in Healthcare
- Structured Data: Managing patient records, billing, and clinical trial data; supporting decision-making and regulatory compliance.
- Unstructured Data: Analyzing clinical notes for patient insights, interpreting medical images, and improving diagnostics through machine learning.
- Semi-Structured Data: Facilitating data exchange between EHR systems, integrating patient data from various sources, and ensuring interoperability.
Understanding the format of healthcare data is crucial for improving patient care, enhancing operational efficiency, and ensuring compliance with regulatory standards. Each type of data format presents unique challenges and opportunities in the healthcare domain.

Data Formats in Healthcare Standards of Interoperability
Healthcare standards of interoperability provide guidelines for implementing semi-structured data. However, the content is not always recognized as semi-structured when data is ingested or transformed. For example, HL7v2 and EDI messages are semi-structured, but each follows a structure unique to its standard. Another example is DICOM, the standard used to store medical images along with associated metadata about the patient and the image. All of these examples require format-specific processing to take advantage of the semi-structured nature of the data.
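As a rough illustration of why such formats need their own processing, the sketch below splits an X12-style EDI fragment into segments and elements. It assumes "~" as the segment terminator and "*" as the element separator; real interchanges declare their delimiters in the ISA header, which this sketch ignores. The fragment and the parse_x12 function name are made up for this example.

```python
def parse_x12(raw, segment_terminator="~", element_separator="*"):
    """Split an X12-style string into (segment_id, elements) tuples, in message order."""
    segments = []
    for chunk in raw.split(segment_terminator):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final terminator
        segment_id, *elements = chunk.split(element_separator)
        segments.append((segment_id, elements))
    return segments


# Made-up fragment: a name segment followed by a demographics segment.
SAMPLE = "NM1*IL*1*DOE*JANE****MI*123456789~DMG*D8*19800517*F~"

parsed = parse_x12(SAMPLE)
for segment_id, elements in parsed:
    print(segment_id, elements)
```

The meaning of each element depends on its position within its segment, so empty positions are kept rather than dropped. A generic JSON or XML parser would not recover this structure.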
Leveraging Structured and Unstructured Data in Healthcare with Snowflake and Hakkōda
Snowflake supports both structured (tabular) and semi-structured (JSON and XML) data formats. This capability facilitates processing healthcare data such as structured data ingested from Epic or conforming to OMOP, and semi-structured data in the FHIR (see FHIR Data Loader) or C-CDA formats.
For other healthcare standards of interoperability, Snowflake still provides support, but the data must be handled in a specific way to unlock the information it contains. It must first be loaded as unstructured data. For certain standards like HL7v2 and EDI, the order of the content within a message matters and must be maintained. Once the unstructured data is loaded, it can be processed and transformed into a fully structured format. This transformation is accomplished by leveraging Snowflake’s Snowpark capability, which not only converts the data into a fully structured format but also supports the dynamic nature of these standards. This flexibility allows the data to be expanded down to the lowest primitive level (sub-elements and subcomponents), enabling reporting, analysis, and data enrichment.
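The kind of expansion described above can be sketched in plain Python. This is not Snowpark code and not Hakkoda's implementation. It is only an illustration of the logic, assuming the default HL7v2 delimiters (| for fields, ~ for repetitions, ^ for components, & for subcomponents) and ignoring escape sequences. The message and the flatten_hl7v2 name are invented for the example.

```python
def flatten_hl7v2(message):
    """Return one row per primitive value:
    (segment_order, segment_id, field, repetition, component, subcomponent, value).
    """
    rows = []
    segments = [s for s in message.replace("\n", "\r").split("\r") if s]
    for seg_order, segment in enumerate(segments, start=1):
        fields = segment.split("|")
        seg_id = fields[0]
        if seg_id == "MSH":
            # MSH-1 is the field separator itself and MSH-2 holds the encoding
            # characters, so both are kept verbatim instead of being split.
            rows.append((seg_order, seg_id, 1, 1, 1, 1, "|"))
            rows.append((seg_order, seg_id, 2, 1, 1, 1, fields[1]))
            numbered = enumerate(fields[2:], start=3)
        else:
            numbered = enumerate(fields[1:], start=1)
        for field_no, field in numbered:
            for rep_no, repetition in enumerate(field.split("~"), start=1):
                for comp_no, component in enumerate(repetition.split("^"), start=1):
                    for sub_no, value in enumerate(component.split("&"), start=1):
                        if value:  # skip empty positions but keep their numbering
                            rows.append(
                                (seg_order, seg_id, field_no, rep_no, comp_no, sub_no, value)
                            )
    return rows


# Made-up two-segment message; segments are separated by carriage returns.
MESSAGE = "\r".join(
    [
        r"MSH|^~\&|LAB|HOSP|||202401101200||ORU^R01|MSG0001|P|2.5",
        "PID|1||12345^^^HOSP&1.2.3&ISO^MR||DOE^JANE",
    ]
)

rows = flatten_hl7v2(MESSAGE)
for row in rows:
    print(row)
```

Carrying the segment order and the positional indices on every row is what keeps the original ordering recoverable once the values land in a table.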
Snowflake also supports working with DICOM images. DICOM-formatted images can be processed for various purposes, from extracting the metadata in the images for analysis and reporting to running the images through an ML model to assist in anomaly detection. Hakkoda has published a great example of Snowflake’s DICOM support in action; see Implementing Image Classification on Snowflake.
Hakkoda offers flexible teams that ramp up or down as your organization’s needs change and as projects progress from building to monitoring. This means that we can start with a larger team when hands on the ground are needed most, then scale down over time to the key players needed to keep unlocking the information in your healthcare data.
Hakkoda offers all the skill sets your team would need: data engineers to set up the data, data scientists to build it, and industry experts and change management leads to take a solution focused on standards of interoperability from a great idea to a fully integrated tool that can be used in clinician workflows.
Are you ready to unlock your healthcare data? Hakkoda’s healthcare data experts are here to help. Let’s talk today.
The post Data Formats Demystified: Working with Structured, Semi-Structured, and Unstructured Data in Healthcare appeared first on Hakkoda.