Featured Project: Medical Insurance Fraud Detection
Built a machine learning model using Medicare datasets to detect fraudulent claims.
Objective
Develop a machine learning model, built on large public datasets, to detect fraudulent activity in Medicare claims, with the aim of reducing costs and improving compliance in the healthcare system.
Approach
Integrated multiple public datasets (CMS Part D, LEIE, and physician payment records), performed data cleansing and feature engineering, and applied classification algorithms including Random Forest, with emphasis on anomaly detection and geo-demographic analysis.
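As a rough illustration of the integration step, the sketch below labels prescriber records by checking each provider's NPI against an exclusion list, which is the kind of join that linking Part D data to LEIE implies. The records, the field names (`npi`, `total_claims`, `total_drug_cost`), and the derived `cost_per_claim` feature are invented for illustration and are not taken from the project's notebook.

```python
# Toy stand-ins for Part D prescriber rows and LEIE exclusions (all values invented).
prescribers = [
    {"npi": "1000000001", "total_claims": 120, "total_drug_cost": 8400.0},
    {"npi": "1000000002", "total_claims": 45, "total_drug_cost": 30150.0},
    {"npi": "1000000003", "total_claims": 300, "total_drug_cost": 15000.0},
]
excluded = [{"npi": "1000000002"}]


def label_providers(prescribers, excluded):
    """Return copies of the prescriber rows with a binary label and one ratio feature."""
    excluded_npis = {row["npi"] for row in excluded}  # set gives O(1) membership checks
    labeled = []
    for row in prescribers:
        out = dict(row)  # copy so the input rows are left untouched
        out["is_excluded"] = int(row["npi"] in excluded_npis)
        claims = row["total_claims"]
        # Hypothetical engineered feature: average drug cost per claim.
        out["cost_per_claim"] = row["total_drug_cost"] / claims if claims else 0.0
        labeled.append(out)
    return labeled


labeled = label_providers(prescribers, excluded)
for row in labeled:
    print(row["npi"], row["is_excluded"], round(row["cost_per_claim"], 2))
```

On the real datasets the same idea would more likely be expressed as a DataFrame join on the provider identifier. The plain-Python version is used here only to keep the example dependency-free.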
Results
Random Forest performed best among the classifiers tested, with an AUC of 0.72 (72%). The analysis identified fraud patterns across providers, patients, and regions, and pointed to a significant concentration of fraud in the Bay Area.
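For context on the AUC figure above, AUC can be read as the probability that a randomly chosen fraudulent case receives a higher model score than a randomly chosen legitimate one. The sketch below computes it directly from that definition. The labels and scores are made-up values, and `roc_auc` is a hypothetical helper written for this example, not the evaluation code used in the project.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = flagged as fraud, 0 = legitimate; scores are model outputs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.72 sits between the two. The pairwise loop is quadratic in the number of examples, so it is meant for explanation rather than for large datasets.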
Project Files
Datasets:
- CMS Part D Datasets: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Part-D-Prescriber.html
- LEIE Datasets: https://oig.hhs.gov/exclusions/exclusions_list.asp
- FDA Datasets: https://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm#collapseOne
- CMS Open Payments Datasets: https://www.cms.gov/OpenPayments/Explore-the-Data/Dataset-Downloads (retrieved June 23, 2020)
PDF – Project Summary: Comprehensive report outlining objectives, datasets, methodology, and key findings from the Medicare Fraud Detection project.
PPTX – Presentation: Slide deck summarizing project motivation, approach, visual insights, and results for a professional audience.
How to View and Understand the Project:
- Start with the PDF Summary: Read the Project Summary PDF to understand the objectives, datasets, methodology, and key findings. This document gives you the overall context before diving into details.
- Explore the Jupyter Notebook (IPYNB Code File): Open the IPYNB Notebook to review the full data exploration, feature engineering, and machine learning implementation. For convenience, both the live notebook link and a PDF are available.
- Review the Presentation Slides (PPTX): Finally, go through the Presentation PPTX for a concise, visual overview of the project motivation, analytical approach, and main results. It is ideal for a quick understanding.
👉 Recommended order: PDF → IPYNB → PPTX for the most complete learning experience.