Hello! I am Fatima, and welcome to my portfolio! 🚀
Experienced Data Analyst | Proficient in Python, R, SQL, Power BI, and Tableau | Expertise in Advanced Data Analysis, Predictive Modeling, and Big Data Analytics
🔍 Collaborated with the Head of Data Analytics at Free Float Media, contributing to the development of an automated algorithm that quantifies board members’ influence at global companies.
📊 Analyzed 1 billion+ records for market basket analysis, driving strategic sales insights.
💡 Master’s in Analytics from Northeastern University, GPA: 3.97/4.00.
🌐 Multilingual - Fluent in Russian and Kyrgyz; Proficient in English, Arabic, and Turkish.
Python_Capstone_Fatima
EDA_PowerBI_Capstone_FatimaN.pdf
PPT_Capstone_Group7.pdf
Report_Capstone_Group7.pdf
DESCRIPTION:
This final capstone project, sponsored by Free Float Media, is an empirical study of board directors at US companies (2017-2022) that I carried out with my teammates using more than 10 provided data sets. The project goal is to analyze what happens to the structure of boards of directors after their firms disclose controversial events, by predicting each director's likelihood of departure and identifying the factors that affect it. The report covers our literature review, research methodology and hypotheses, data analysis, and modeling. The main question we intended to answer in this analysis is:
Q: “How does each characteristic of an individual director influence their departure when their company is involved in controversial events?”
What I did:
SKILLS:
Hypotheses Testing, EDA, Data Cleaning, Data Transformation, Feature Engineering, Descriptive Analysis, Visualization, Writing Functions, Predictive Modeling, Classification
TECHNOLOGY:
Power BI, Excel, Python: Pandas, Numpy, Matplotlib, Seaborn, Sklearn, Statsmodels, Pycaret
EDA_PowerBI_Capstone_FatimaN.pdf
DESCRIPTION:
This experiential learning project, sponsored by Simplex Solution, is about creating dashboards to be integrated into construction work management software for large contractors working for gas & electric utilities. It is a group project in which my teammate (Xiaolu Shen) and I created dashboard reports and a set of drill-downs to help track and analyze equipment utilization. Most of the equipment is rented; the rest is owned. The rent payment depends not on the time the equipment is turned on (utilization time) but on the total working hours of the crew (billing hours). Visualizing the data in Power BI therefore lets the client analyze it in more depth and discover idle time and inefficiencies, which helps in building a better business strategy to optimize equipment utilization and find cost-saving opportunities. The main question we intended to answer in this analysis is:
Q: “Is all the equipment in use actually being utilized?”
What we did:
SKILLS:
EDA, Data Cleaning, Data Transformation, Descriptive Analysis, Dashboard Reports, Data Visualization, Utilization Analysis, Recommendations
TECHNOLOGY:
Power BI, Excel, Python
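The gap between billing hours and utilization time described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the dashboard logic from the project: the `EquipmentDay` record, its field names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EquipmentDay:
    # Hypothetical record layout, assumed for illustration only.
    equipment_id: str
    billing_hours: float      # crew working hours the rental is charged for
    utilization_hours: float  # hours the equipment was actually turned on


def utilization_summary(records):
    """Total billed hours, used hours, idle hours, and utilization rate per equipment."""
    totals = {}
    for r in records:
        billed, used = totals.get(r.equipment_id, (0.0, 0.0))
        totals[r.equipment_id] = (billed + r.billing_hours, used + r.utilization_hours)

    summary = {}
    for equipment_id, (billed, used) in totals.items():
        summary[equipment_id] = {
            "billing_hours": billed,
            "utilization_hours": used,
            "idle_hours": billed - used,  # paid for, but not running
            "utilization_rate": used / billed if billed else 0.0,
        }
    return summary


# Made-up sample rows, not project data.
records = [
    EquipmentDay("excavator-1", 8.0, 6.0),
    EquipmentDay("excavator-1", 8.0, 2.0),
    EquipmentDay("generator-7", 10.0, 0.0),
]
summary = utilization_summary(records)
for equipment_id, stats in summary.items():
    print(equipment_id, stats)
```

Equipment with many billing hours and a low utilization rate is the kind of candidate a drill-down would surface for a cost-saving review.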
Final Databricks_ALY6110_GroupEpsilon.pdf
Final Report_ALY6110_GroupEpsilon.pdf
Final PPT_ALY6110_GroupEpsilon.pdf
Description:
A group project that I did with my teammates, based on a data set from a UK-based online sales store. It contains records of more than 1 million transactions from January 12, 2009, to September 12, 2011. We created a manual with instructions for doing data analysis for the retail industry, in particular Market Basket Analysis using the Apriori algorithm. The purpose of this report is to help analysts who have no experience with Databricks, a big data management tool, understand how to use it to tackle real big data. The main question we intended to answer in this analysis is:
“How to create a better product bundle sales strategy for online retail companies by using Market Basket Analysis?”
What I did:
Skills: Data Cleaning, EDA, Data Transformation, Visualization, Market Basket Analysis, Association Rule Learning
Technology: Excel, Apache Spark, Databricks: Apriori Algorithm, PySpark, SQL
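For readers new to Market Basket Analysis, here is a minimal plain-Python sketch of the Apriori idea: find itemsets that appear in enough baskets, then derive rules with their confidence and lift. It is not the Databricks notebook from the project; the function names, thresholds, and toy baskets are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singles}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, sup, confidence, lift))
    return rules


# Toy baskets, made up for this example.
baskets = [
    {"tea", "cup", "saucer"},
    {"tea", "cup"},
    {"cup", "saucer"},
    {"tea", "napkin"},
    {"cup", "saucer", "napkin"},
]
frequent = apriori(baskets, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(sup, 2), round(conf, 2), round(lift, 2))
```

Rules with high confidence and a lift above 1 are the usual starting point for bundle suggestions, since they mark items bought together more often than chance would predict.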
Final Python_ALY6140_group5.ipynb
Final Report_ALY6140_group5.pdf
Final PPT_ALY6140_group5.pdf
Description: A group project that I did with my teammate (Min-Chi Tsai), based on real data obtained from the Chicago Data Portal. The goal of the project is to help the Chicago Police Department predict the traffic crash type and understand the causes that lead to it. We wanted to know whether there are similar or common patterns that might help predict a traffic crash. The main question we intended to answer in this analysis is “What factors affect the severity of the traffic crash type?”
Skills: Data Cleaning, Data Analysis, Descriptive Statistics, Visualization, Writing Functions, Feature Engineering, Predictive Modeling, Classification
Technology: Python: Pandas, Numpy, Matplotlib, Seaborn, Datetime, Sklearn, Statsmodels, Xgboost
Tableau
Final Tableau Dashb_ALY6070_Group 7.twbx
Final Report_ALY6070_Group 7.pdf
Final PPT_ALY6070_Group 7.pdf
Description: A group project that I did with my teammates, based on a data set of bike rental information collected mainly in Washington, D.C., in 2011 and 2012. The project analyzes how demand for bike sharing changes across time periods and weather conditions, as well as the different behavior patterns of the 2 user types: casual users and registered users. Based on the findings, we also made suggestions to the bike-sharing company from a product and marketing manager's perspective. We raised questions from 2 angles to help us understand the behavior of bike-sharing users:
Skills: EDA, Descriptive Statistics, Visualization, Time-Series Analysis, Frequency Distribution
Technology: Tableau: Heatmap, Bar Chart, Line Chart, Time-Series
Final R Script_ALY6015_Alpha.R
Final PPT_ALY6015_Alpha.pdf
Description: A group project that I did with my teammates, based on data obtained from Kaggle. The goal of the project is to help a telecom company predict its customer churn rate and understand the causes that lead to churn. This is done by building 2 supervised machine learning models: logistic regression and LASSO-regularized regression. The main question we intended to answer in this analysis is “What are the significant predictor variables that affect the Telco customer churn rate?”
Skills: Data Cleaning, Data Analysis, Descriptive Statistics, Visualization, Writing Functions, Feature Engineering, Predictive Modeling, Classification
Technology: R: caret, ggplot2, gridExtra, pROC, psych, dplyr, tidyverse
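To show why LASSO helps pick out significant predictors, here is a small sketch of L1-penalized logistic regression fitted with proximal gradient descent. It is written in plain Python for consistency with the other examples on this page and is not the R script from the project; the function names, hyperparameters, and the tiny data set (one informative column, one noise column) are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.5, epochs=500):
    """Logistic regression with an L1 (LASSO) penalty on the weights, not on the intercept."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residual = predicted probability minus observed label, per row.
        residuals = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
            for row, yi in zip(X, y)
        ]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (the proximal step for L1).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Made-up data: column 0 separates churners (1) from non-churners (0), column 1 is noise.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
w, b = fit_l1_logistic(X, y)
print("weights:", w, "intercept:", b)
```

The weight on the noise column is driven to exactly zero while the informative column keeps a positive weight, which is the variable-selection behavior that sets LASSO apart from plain logistic regression.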