Analyzing 21M Hospital Records
Comprehensive ETL, data warehousing, processing & visualization project.
Project info
Data Engineering & Analysis
2024/07/01
Introduction
“Project analyzing 21,194,349 hospitalization records from Poland's National Health Fund (NFZ). Involves ETL, data warehousing, and visualization using SQL, shell scripts, SQL*Loader, and Apache Superset.”
Explore the Project
Tech details - Go to general technical info about components (on this page)
GitHub - View the project repository
Docs - Open detailed documentation
Project Overview
In General
Components
Quick Summary
- ETL Pipeline
- BI Visualization
- Exploratory data analysis
List of All Technologies
Grouped by Category
Category | Technologies |
---|---|
Backend | Oracle SQL ✦ Shell Scripting ✦ Docker ✦ OVM |
Reports | Apache Superset |
Tool | SQLcl ✦ SQL Developer |
IDE | Visual Studio Code |
Components Specification
Detailed Information Grouped by Components
Table of Contents:
- ETL Pipeline
- BI Visualization
- Exploratory data analysis
Details
ETL Pipeline
Extract, transform, load data from CSV to Oracle DB.
Use Cases
- Data extraction from NFZ API.
- Data transformation and cleansing using SQL.
Tech Stack
Category | Description |
---|---|
Backend | Oracle SQL |
Backend | Shell Scripting |
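The load step of such a pipeline could be scripted roughly as follows. This is a minimal sketch, not the project's actual scripts: the staging table `hospitalizations_stg`, its column names, the CSV path, the comma delimiter, and the `ORA_USER`/`ORA_DSN` environment variables are all assumptions made for illustration. The script only writes a SQL*Loader control file and prints the `sqlldr` call instead of running it, so it works without an Oracle installation.

```shell
# Write a SQL*Loader control file for one CSV file.
# Table and column names below are hypothetical placeholders.
write_control_file() {
  local csv_path="$1" ctl_path="$2"
  cat > "$ctl_path" <<EOF
OPTIONS (SKIP=1)
LOAD DATA
INFILE '${csv_path}'
APPEND
INTO TABLE hospitalizations_stg
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  admission_year,
  admission_month,
  provider_code,
  jgp_code,
  discharge_mode
)
EOF
}

# Print (rather than run) the sqlldr call for a given control file.
# ORA_USER and ORA_DSN are assumed environment variables; they are left
# unexpanded in the printed command on purpose.
print_load_command() {
  local ctl_path="$1"
  printf 'sqlldr userid="$ORA_USER@$ORA_DSN" control=%s log=%s direct=true\n' \
    "$ctl_path" "${ctl_path%.ctl}.log"
}

# Example run against a hypothetical CSV path.
ctl="$(mktemp -d)/hospitalizations.ctl"
write_control_file "data/hospitalizations_2019_2021.csv" "$ctl"
print_load_command "$ctl"
```

`OPTIONS (SKIP=1)` skips the CSV header row, and `APPEND` lets several yearly files be loaded into the same staging table one after another. Cleansing with SQL would then run against the staging table.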
BI Visualization
Create business intelligence dashboards with Apache Superset.
Use Cases
- Visualize hospitalization trends.
Tech Stack
Category | Description |
---|---|
Reports | Apache Superset |
Tool | SQLcl |
Backend | Docker |
Backend | OVM |
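Superset connects to a database through a SQLAlchemy URI. As a rough illustration, the helper below assembles one for an Oracle service. It assumes the `oracle+cx_oracle` dialect and driver are installed in the Superset container; the user, host, port, and service name are hypothetical values, not the project's configuration.

```shell
# Build a SQLAlchemy URI for an Oracle service name.
# Assumes the cx_Oracle driver is available to Superset (an assumption).
build_oracle_uri() {
  local user="$1" password="$2" host="$3" port="$4" service="$5"
  printf 'oracle+cx_oracle://%s:%s@%s:%s/?service_name=%s\n' \
    "$user" "$password" "$host" "$port" "$service"
}

# Hypothetical read-only account and container hostname.
build_oracle_uri "superset_ro" "change-me" "oracle-db" 1521 "XEPDB1"
```

Passwords containing characters such as `@` or `/` would need percent-encoding before being placed in the URI; the sketch does not handle that.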
Exploratory data analysis
Create exploratory data analysis (EDA) queries.
Use Cases
Tech Stack
Category | Description |
---|---|
Reports | Apache Superset |
Tool | SQLcl |
IDE | Visual Studio Code |
Tool | SQL Developer |
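An EDA query of the kind this component describes might be kept in a script file and run through SQLcl. The sketch below writes one such file; the table `hospitalizations` and the columns `admission_year` and `admission_month` are hypothetical names chosen for illustration, and the `ORA_USER`/`ORA_DSN` variables in the comment are likewise assumptions.

```shell
# Write a small EDA script; table and column names are hypothetical.
write_eda_script() {
  local sql_path="$1"
  cat > "$sql_path" <<'EOF'
-- Hospitalizations per year and month (hypothetical columns).
SELECT admission_year,
       admission_month,
       COUNT(*) AS hospitalizations
FROM hospitalizations
GROUP BY admission_year, admission_month
ORDER BY admission_year, admission_month;

EXIT;
EOF
}

eda_sql="$(mktemp -d)/eda_monthly_counts.sql"
write_eda_script "$eda_sql"

# With SQLcl installed, the script could then be run as, for example:
#   sql "$ORA_USER@$ORA_DSN" @"$eda_sql"
```

The same query could also serve as the dataset behind a Superset chart of hospitalization trends.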
Data Sources
- Dane.gov.pl: 'Dane dotyczące hospitalizacji rozliczonych JGP w latach 2019-2021' (hospitalizations settled under JGP, the Polish diagnosis-related groups, in 2019-2021; also contains hospitalizations from 2017 and 2018). Link. Quantity: >20M.
- Dane.gov.pl: 'Dane dotyczące hospitalizacji rozliczonych JGP w latach 2022' (hospitalizations settled under JGP in 2022). Link. Quantity: >20M.
- CEZ Polish HL7 Impl.: Discharge and admission modes. Link. Quantity: ~10.
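Before loading, the number of records in the downloaded CSV files can be checked against the expected total. The helper below is a minimal sketch that assumes each file has exactly one header line, ends with a newline, and contains no line breaks inside quoted fields; the demo files and their columns are made up.

```shell
# Count data rows (lines minus one header line) across CSV files.
count_records() {
  local total=0 lines file
  for file in "$@"; do
    # Arithmetic expansion strips any padding that wc may emit.
    lines=$(( $(wc -l < "$file") ))
    if [ "$lines" -gt 0 ]; then
      total=$(( total + lines - 1 ))
    fi
  done
  echo "$total"
}

# Demo with two tiny, made-up files.
demo_dir="$(mktemp -d)"
printf 'year,month,jgp\n2019,1,A01\n2019,2,A02\n' > "$demo_dir/a.csv"
printf 'year,month,jgp\n2022,1,B01\n' > "$demo_dir/b.csv"
count_records "$demo_dir"/*.csv   # prints 3
```

Run against the real source files, the result could be compared with the 21,194,349 records quoted in the introduction.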
Additional Information
- Domain: Healthcare IT, Data Engineering
- Status: In progress
- Keywords: SQL, Shell Scripting, Apache Superset, SQL*Loader, ETL, BI, Healthcare IT