Analyzing 21M Hospital Records

Comprehensive ETL, data warehousing, processing & visualization project.

Project info


Data Engineering & Analysis

2024/07/01

Introduction

“A project analyzing 21,194,349 hospitalization records from Poland's NFZ (National Health Fund). It involves ETL, data warehousing, and visualization using SQL, shell scripts, SQL*Loader, and Apache Superset.”
— Arek

Explore the Project

Tips

Tech details - Go to general technical info about components (on this page)

GitHub - View the project repository

Docs - Open detailed documentation


Project Overview

In General

Components

Quick Summary

  1. ETL Pipeline
  2. BI Visualization
  3. Exploratory Data Analysis

List of All Technologies

Grouped by Category

Category  Technologies
Backend   Oracle SQL ✦ Shell Scripting ✦ Docker ✦ OVM
Reports   Apache Superset
Tool      SQLcl ✦ SQL Developer
IDE       Visual Studio Code

Components Specification

Detailed Information Grouped by Components

Table of Contents:

  1. ETL Pipeline
  2. BI Visualization
  3. Exploratory Data Analysis

Details
  1. ETL Pipeline

    Extract, transform, and load data from CSV files into an Oracle database.
    Use Cases
    • Data extraction from NFZ API.
    • Data transformation and cleansing using SQL.
    Tech Stack
    Category Description
    Backend Oracle SQL
    Backend Shell Scripting
  2. BI Visualization

    Create business intelligence dashboards with Apache Superset.
    Use Cases
    • Visualize hospitalization trends.
    Tech Stack
    Category Description
    Reports Apache Superset
    Tool SQLcl
    Backend Docker
    Backend OVM
  3. Exploratory Data Analysis

    Create EDA queries
    Use Cases
    Tech Stack
    Category Description
    Reports Apache Superset
    Tool SQLcl
    IDE Visual Studio Code
    Tool SQL Developer
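
As an illustration of the ETL Pipeline component above, the CSV-to-Oracle load could be driven by a small shell script that writes a SQL*Loader control file and assembles the matching `sqlldr` call. This is only a minimal sketch under stated assumptions, not the project's actual script: the function names `write_ctl` and `sqlldr_cmd`, the `ORA_CONN` variable, the comma delimiter, and any table or column names passed in are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a SQL*Loader control file for one CSV file.
# Table and column names are supplied by the caller; they are placeholders,
# not the project's real schema. A comma-delimited file with a single
# header line is assumed.
write_ctl() {
  local csv_file="$1" table="$2" columns="$3" ctl_file="$4"
  cat > "$ctl_file" <<EOF
OPTIONS (SKIP=1)
LOAD DATA
INFILE '${csv_file}'
APPEND
INTO TABLE ${table}
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(${columns})
EOF
}

# Print the sqlldr command that would load the file (dry run, nothing is executed).
# ORA_CONN is assumed to hold a connect string such as user/password@service.
sqlldr_cmd() {
  local ctl_file="$1"
  printf 'sqlldr userid=%s control=%s log=%s.log bad=%s.bad\n' \
    "${ORA_CONN:-user/password@service}" "$ctl_file" \
    "${ctl_file%.ctl}" "${ctl_file%.ctl}"
}
```

Calling `write_ctl hospitalizations_2022.csv stg_hospitalizations "admission_year, admission_month" load.ctl` followed by `sqlldr_cmd load.ctl` would produce a control file and print the command to review before running it. Printing the command instead of executing it keeps the sketch safe to try without a database.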

Data Sources

  • Dane.gov.pl: 'Dane dotyczące hospitalizacji rozliczonych JGP w latach 2019-2021 (contains hospitalizations 2017, 2018)' [Data on hospitalizations settled under JGP in 2019-2021]. Link. Quantity: >20M.
  • Dane.gov.pl: 'Dane dotyczące hospitalizacji rozliczonych JGP w latach 2022' [Data on hospitalizations settled under JGP in 2022]. Link. Quantity: >20M.
  • CEZ Polish HL7 Impl.: Discharge and admission modes. Link. Quantity: ~10.
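
Because the hospitalization data is loaded from CSV files, a row-count check before loading can help confirm that the extracted files add up to the expected record total. The following is a minimal sketch, assuming each file has one header line and one record per line; the function names `count_rows` and `count_total` are hypothetical and not taken from the project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count data rows in one CSV file, excluding the header line.
# Assumes one record per line (no newlines embedded in quoted fields).
count_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Sum the data rows across all CSV files given as arguments.
count_total() {
  local total=0 f
  for f in "$@"; do
    total=$(( total + $(count_rows "$f") ))
  done
  echo "$total"
}
```

The result of `count_total` over the downloaded files can then be compared with the row count of the target table after the load.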

Additional Information

  • Domain: Healthcare IT, Data Engineering
  • Status: In progress
  • Keywords: SQL, Shell Scripting, Apache Superset, SQL*Loader, ETL, BI, Healthcare IT

Thumbnail

Main dashboard