Fatima-s_Portfolio

Fatima Nurmakhamadova - Data Analyst Portfolio

Hello! I am Fatima, and welcome to my portfolio! 🚀

About Me 🙋🏻‍♀️

Experienced Data Analyst | Proficient in Python, R, SQL, Power BI, and Tableau | Expertise in Advanced Data Analysis, Predictive Modeling, and Big Data Analytics |

🔍 Collaborated with the Head of Data Analytics at Free Float Media, contributing to the development of an automated algorithm that quantifies board members’ influence at global companies.

📊 Analyzed 1 million+ records for market basket analysis, driving strategic sales insights.

💡 Master’s in Analytics from Northeastern University, GPA: 3.97/4.00.

🌐 Multilingual - Fluent in Russian and Kyrgyz; Proficient in English, Arabic, and Turkish.

PROJECTS

Capstone Project: Predicting the Aftermath of Fraud on Board of Directors in the US (2017-2022)

DESCRIPTION:

This final capstone project, sponsored by Free Float Media, is an empirical study of board directors in US companies (2017-2022) that I completed with my teammates using more than 10 provided data sets. The goal is to analyze how boards of directors change structurally after their firms disclose controversial events, by predicting each director's likelihood of departure and identifying the factors that affect it. The report covers the literature review, research methodology and hypotheses, data analysis, and modeling. The main question we intended to answer in this analysis is:

Q: “How does each characteristic of an individual director influence their departure when their company is involved in controversial events?”

What I did?

SKILLS:

Hypotheses Testing, EDA, Data Cleaning, Data Transformation, Feature Engineering, Descriptive Analysis, Visualization, Writing Functions, Predictive Modeling, Classification

TECHNOLOGY:

Power BI, Excel, Python: Pandas, Numpy, Matplotlib, Seaborn, Sklearn, Statsmodels, Pycaret

XN Project: Analysis & Visualization of the Equipment Utilization Using PowerBI

DESCRIPTION:

This experiential learning project, sponsored by Simplex Solution, involved creating dashboards to be integrated into construction work management software for large contractors working for gas and electric utilities. My teammate (Xiaolu Shen) and I created dashboard reports and a set of drill-downs to help track and analyze equipment utilization. Most of the equipment is rented, while the rest is owned. The rent payment depends not on the time the equipment is turned on (utilization time) but on the total working hours of the crew (billing hours). Visualizing the data in Power BI therefore lets the client analyze it in more depth and discover idle time and inefficiencies, which helps in building a better business strategy to optimize equipment utilization and find cost-saving opportunities. The main question we intended to answer in this analysis is:

Q: “Is all of the equipment in use actually being utilized?”

What we did?

SKILLS:

EDA, Data Cleaning, Data Transformation, Descriptive Analysis, Dashboard Reports, Data Visualization, Utilization Analysis, Recommendations

TECHNOLOGY:

Power BI, Excel, Python
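The gap between billing hours and utilization time described above can be illustrated with a small calculation. This is only a minimal sketch, not the project's actual Power BI logic: the record layout, the field names (`billing_hours`, `utilization_hours`), and the equipment IDs are hypothetical and chosen for illustration.

```python
def summarize_utilization(records):
    """Aggregate billed vs. used hours per piece of equipment.

    `records` is a list of dicts with hypothetical keys:
    equipment_id, billing_hours (crew working hours the rent is based on),
    and utilization_hours (hours the equipment was actually turned on).
    """
    totals = {}
    for r in records:
        billed, used = totals.get(r["equipment_id"], (0.0, 0.0))
        totals[r["equipment_id"]] = (
            billed + r["billing_hours"],
            used + r["utilization_hours"],
        )

    summary = {}
    for equipment_id, (billed, used) in totals.items():
        summary[equipment_id] = {
            "billing_hours": billed,
            "utilization_hours": used,
            # Idle time: hours paid for while the equipment was not running.
            "idle_hours": max(billed - used, 0.0),
            # Share of billed hours in which the equipment was running.
            "utilization_rate": used / billed if billed > 0 else 0.0,
        }
    return summary


# Made-up sample data for illustration only.
records = [
    {"equipment_id": "excavator-01", "billing_hours": 8.0, "utilization_hours": 6.0},
    {"equipment_id": "excavator-01", "billing_hours": 8.0, "utilization_hours": 2.0},
    {"equipment_id": "generator-07", "billing_hours": 10.0, "utilization_hours": 0.0},
]
summary = summarize_utilization(records)
```

In this made-up sample, a utilization rate near zero (as for `generator-07`) is the kind of idle, billed equipment the dashboards are meant to surface.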

Market Basket Analysis for Online Retail Industry (Databricks)

Description:

This group project, which I did with my teammates, is based on a data set from a UK-based online sales store. It contains records of more than 1 million transactions from January 12, 2009, to September 12, 2011. We created a manual with instructions for data analysis, in particular Market Basket Analysis using the Apriori algorithm for the retail industry. The purpose of the report is to help analysts who have no experience with Databricks, a big data management tool, understand how to use it to tackle real big data. The main question we intended to answer in this analysis is:

“How can online retail companies create a better product bundle sales strategy by using Market Basket Analysis?”

What I did?

Skills: Data Cleaning, EDA, Data Transformation, Visualization, Market Basket Analysis, Association Rule Learning

Technology: Excel, Apache Spark, Databricks: Apriori Algorithm, PySpark, SQL
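To give a feel for what Market Basket Analysis with the Apriori algorithm computes, here is a minimal pure-Python sketch. It is not the Databricks/PySpark code from the project: the function names, thresholds, and the tiny sample baskets are assumptions made up for illustration, and the approach would not scale to a million transactions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                c = itemset - a
                confidence = sup / frequent[a]
                lift = confidence / frequent[c]
                if confidence >= min_confidence:
                    rules.append((a, c, sup, confidence, lift))
    return rules


# Made-up baskets for illustration only.
baskets = [
    {"tea", "mug"},
    {"tea", "mug", "spoon"},
    {"tea", "spoon"},
    {"mug"},
]
frequent = apriori(baskets, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.6)
```

Rules with high confidence and a lift above 1 are the usual candidates for product bundles, since they indicate items bought together more often than chance would suggest.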

Predicting Causes for Chicago Traffic Crashes (Classification)

Description: A group project that I did with my teammate (Min-Chi Tsai) based on real data obtained from the Chicago Data Portal. The goal of the project is to help the Chicago Police Department predict the traffic crash type and understand the causes that lead to it. We wanted to know whether there are similar or common patterns that might help predict traffic crashes. The main question we intended to answer in this analysis is “What factors affect the severity of the traffic crash type?”

Skills: Data Cleaning, Data Analysis, Descriptive Statistics, Visualization, Writing Functions, Feature Engineering, Predictive Modeling, Classification

Technology: Python: Pandas, Numpy, Matplotlib, Seaborn, Datetime, Sklearn, Statsmodels, Xgboost

Models Comparison

Bike Sharing Analysis - Tableau

Description: A group project that I did with my teammates based on a data set of bike rental information collected mainly in Washington, D.C., in 2011 and 2012. This project aims to analyze how demand for bike sharing changes across time periods and weather conditions, as well as the different behavior patterns of the 2 user types: casual users and registered users. Based on the findings, we also provided suggestions to the bike-sharing company from a product and marketing manager perspective. We raised questions from 2 angles to help us understand bike-sharing user behavior:

  1. What are the time patterns of bike-sharing? Any difference between the 2 user types?
  2. How do the weather conditions influence the number of bike rentals regarding specific user types?

Skills: EDA, Descriptive Statistics, Visualization, Time-Series Analysis, Frequency Distribution

Technology: Tableau: Heatmap, Bar Chart, Line Chart, Time-Series

Predictive Analysis of Telco Customer Churn using ML Models - R

Description: A group project that I did with my teammates based on data obtained from Kaggle. The goal of the project is to help a telecom company predict its customer churn rate and understand the causes that lead to it. This is done by building 2 supervised machine learning models: Logistic Regression and LASSO Regularized Regression. The main question we intended to answer in this analysis is “What are the significant predictor variables that affect the Telco customer churn rate?”

Skills: Data Cleaning, Data Analysis, Descriptive Statistics, Visualization, Writing Functions, Feature Engineering, Predictive Modeling, Classification

Technology: R: caret, ggplot2, gridExtra, pROC, psych, dplyr, tidyverse
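For reference, the difference between the two models can be written as an objective function. The formula below is the standard textbook form of L1-penalized (LASSO) logistic regression, and it assumes the LASSO model was fit on the binary churn outcome; the symbols are notation introduced here for illustration and are not taken from the project report.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}
\left\{ -\frac{1}{n} \sum_{i=1}^{n}
\Big[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \Big]
+ \lambda \sum_{j=1}^{d} |\beta_j| \right\},
\qquad
\pi_i = \frac{1}{1 + e^{-(\beta_0 + x_i^{\top} \beta)}}
```

Here $y_i \in \{0, 1\}$ marks whether customer $i$ churned, $x_i$ holds that customer's $d$ predictors, and $\lambda \ge 0$ is the penalty strength. Setting $\lambda = 0$ recovers plain logistic regression, while larger values shrink more coefficients exactly to zero, which is why LASSO is often used to narrow down the significant predictors.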

LR Model fitting on the Training Set