Healthcare DWH & ETL Design

My engineering thesis - A data warehouse project in a hospital.

Project info


Data Engineering & Analysis

2019/12/01

Introduction

A brief project description or quote to summarize its goals.

“Design and implementation of a data warehouse on the example of a healthcare institution. Original, polish title: 'Projekt i implementacja hurtownia danych na przykładzie instytucji służby zdrowia.'”
— Arek

Explore the Project

Tips

Tech details - Go to general technical info about components (on this page)

GitHub - View the project repository

Docs - Open detailed documentation


About

Healthcare DWH & ETL Design

This project represents my engineering thesis, where I designed and implemented a data warehouse for a healthcare institution. The original Polish title is: "Projekt i implementacja hurtownia danych na przykładzie instytucji służby zdrowia."

ABSTRACT

In this thesis, I described the design and implementation of an OLAP system with a multidimensional data model based on data from the Hospital Information System (HIS). Complementary Oracle technologies were used for the project: their relational database in versions 11g and 18c, SQL, and its procedural extension - PL/SQL - which enabled the construction of the ETL process. In this thesis, I focused on designing and programming the database, and I also presented the analytical functions of the query language and the possibility of visualizing reports in the Oracle APEX application. The work is comprehensive, showing the successive stages of working with data, from extraction to preparation in the data warehouse and usage in the client application.

Thesis - Table of Contents

  1. Introduction
  2. Aim and Scope of the Thesis
  3. Overview of Key Theoretical Concepts
    3.1. Database
    3.1.1. Normalization
    3.1.2. Relational Database Management System
    3.1.3. SQL – Structured Query Language
    3.1.4. Oracle PL/SQL
    3.1.5. Database Modeling
    3.2. Logical and Hardware Architecture of the Application
    3.3. HIS – Hospital Information System
    3.4. Differences Between OLTP and OLAP Processing
    3.5. Data Warehouse
    3.6. OLAP Cube
    3.7. RAD Methodology
  4. System Architecture and Working Environment
    4.1. Applied Applications, Technologies, and Their Use
    4.2. Data Warehouse System Architecture
    4.3. Design and Implementation of the Multidimensional Data Warehouse Model
  5. ETL and Transitional Area
    5.1. ETL Process Specification
    5.2. Design of the Transitional Area Schema Model
    5.3. Technical Implementation of ETL
    5.4. Stage 0 – Table Cleaning
    5.5. Stage 1 – Extraction, Procedures of the EKSTRAKCJA_PKG Package
    5.6. Stage 2 – Data Transformation
    5.6.1. Fact Table: POBYT
    5.6.2. Dimension: CZAS_DIM
    5.6.3. Dimension: PATIENTS_DIM
    5.6.4. Dimension: GENDER-AGE
    5.6.5. Dimension: UNIT
    5.6.6. Dimension: TERYT
    5.6.7. Dimension: DOCTOR
    5.7. Stage 3 – Data Loading
    5.8. Task Scheduling
  6. Client Application and Predefined Analytical Queries
  7. References

Features

  • Data Warehouse Design: Designed a healthcare-specific data warehouse with an emphasis on analytics and reporting.
  • ETL Pipeline: Implemented an Extract, Transform, Load (ETL) process to populate the warehouse with hospital data.
  • Data Analysis: Utilized advanced PL/SQL and SQL analytic functions to analyze data for healthcare reporting and decision-making.
  • Oracle APEX: Deployed interactive dashboards and reports using Oracle Application Express (APEX).
  • Bulk Processing: Optimized ETL performance with bulk processing techniques.

Tech Stack

  • Database: Oracle DB 12c
  • ETL: PL/SQL, Oracle DBMS_SCHEDULER, CTAS, DBLINK
  • Frontend: Oracle APEX for building reports and dashboards
  • Development Tools: Oracle SQL Developer, Oracle Data Modeler, Oracle Linux

Project Overview

In General

Components

Quick Summary

  1. Data Warehouse

List of All Technologies

Grouped by Category

Category Technologies
DatabaseOracle DB 12c for data storage and analytics.
ETLPL/SQL with DBMS_SCHEDULER, CTAS, and DBLINK for bulk processing.
FrontendOracle APEX for building reports and dashboards.
Development ToolsOracle SQL Developer, Oracle Data Modeler, Linux CentOS.

Components Specification

Detailed Information Grouped by Components

Table of Contents:

  1. Data Warehouse

Details
  1. Data Warehouse

    Design and implementation of a data warehouse for healthcare data, including the ETL pipeline and Oracle APEX reporting.
    Features
    • Data Warehouse design with healthcare-specific data models.
    • ETL pipeline built with PL/SQL for data extraction, transformation, and loading.
    • Oracle APEX for interactive dashboards and reports.
    • Optimized for performance with bulk data processing techniques.
    Tech Stack
    Category Description
    Database Oracle DB 12c for data storage and analytics.
    ETL PL/SQL with DBMS_SCHEDULER, CTAS, and DBLINK for bulk processing.
    Frontend Oracle APEX for building reports and dashboards.
    Development Tools Oracle SQL Developer, Oracle Data Modeler, Linux CentOS.

Data Sources

  • ICD Domain Dictionary: International Classification of Diseases (ICD). Link. Quantity: .
  • Dummy Data: Generated dummy data. Link. Quantity: .

Additional Information

  • Domain: Data Engineering, Healthcare IT
  • Status: 0
  • Keywords: PL/SQL Data Modeler DWH ETL Analytic SQL Oracle DB APEX SQL Developer

Thumbnail

Chart