Posts with the tag: Data Analysis
Ethereum Token Recommender
Read in 1 minute ·Ethereum Token Recommender When using Ethereum, each user holds a unique wallet, and within that wallet the user holds the tokens they find attractive. Project Goal: Generate the top 5 relevant tokens for a wallet using collaborative filtering on Ethereum blockchain data from genesis onward. Tools: PySpark, Databricks, Delta Lake, S3, MLflow. Github Project Link
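As a rough illustration of the collaborative filtering idea, here is a minimal pure-Python sketch of item-based filtering over wallet holdings. It is not the project's PySpark pipeline: the `recommend` function, the wallet names, and the token symbols are all hypothetical, and cosine similarity over binary holdings is an assumed scoring choice.

```python
from collections import defaultdict
from math import sqrt


def recommend(holdings, wallet, top_n=5):
    """Rank tokens the wallet does not hold by similarity to tokens it holds.

    holdings: dict mapping a wallet id to the set of tokens it holds
    (hypothetical input format, chosen for illustration).
    """
    token_count = defaultdict(int)  # number of wallets holding each token
    pair_count = defaultdict(int)   # number of wallets holding both tokens
    for tokens in holdings.values():
        for a in tokens:
            token_count[a] += 1
            for b in tokens:
                if a != b:
                    pair_count[(a, b)] += 1

    owned = holdings[wallet]
    scores = defaultdict(float)
    for (a, b), together in pair_count.items():
        if a in owned and b not in owned:
            # Cosine similarity between the binary holding vectors of a and b.
            scores[b] += together / sqrt(token_count[a] * token_count[b])

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:top_n]]


# Made-up holdings, not real blockchain data.
holdings = {
    "wallet_1": {"AAA", "BBB", "CCC"},
    "wallet_2": {"AAA", "BBB", "DDD"},
    "wallet_3": {"AAA", "CCC"},
    "wallet_4": {"BBB", "DDD", "EEE"},
}
suggestions = recommend(holdings, "wallet_3")
```

Tokens that never co-occur with anything the wallet holds receive no score and are left out, so the list can be shorter than `top_n`.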
Metropolis Hastings Airlines
Read in 1 minute ·Metropolis Hastings Airlines An airline company is interested in examining vaccination trends among its travelers. It ultimately wants to know what percentage of its travelers are 1) fully vaccinated and boosted, 2) fully vaccinated but not boosted, 3) partially vaccinated, or 4) not vaccinated (i.e. there are K = 4 groups). The company wants to examine these trends for both domestic and international flights (i.e. J = 2 groups). It collected data from a random sample of travelers on recent flights and compiled it in the table below:
Graduation Rate Predictor
Read in 1 minute ·Graduation Rate Predictor The aim of this project is to model how government expenditures and labor appropriation impact secondary education graduation rates in New York State public schools. Our experiments show that funding does not exhibit diminishing returns; rather, the quality of the educational staff affects graduation rates. Our highest-performing model, an SVR, predicted graduation rate with a median squared error of 3.863. Github Project Link
Amazon Recommendation Classification
Read in 1 minute ·Amazon Recommendation Classification Amazon curates the buying experience for each user, using advanced algorithms and frequent itemset techniques to drive revenue. Beyond the recommendation algorithms, pessimistic or interested buyers consult the reviews posted below a product to gauge whether it is a “smart” purchase. Our goal is to accurately classify the review score as a function of the review summary and text. We used NLP techniques such as Non-Negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and Term Frequency–Inverse Document Frequency (TF-IDF) to classify the Amazon reviews.
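To make the TF-IDF step concrete, the sketch below computes weights by hand for a few made-up review snippets. It assumes the plain textbook variant (term frequency times log(N / document frequency)) and whitespace tokenization; the `tf_idf` helper and the sample reviews are hypothetical and are not taken from the project, which may use a library implementation with smoothing.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(N / docs containing term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up review snippets for illustration only.
reviews = ["great product great price", "terrible product", "great value"]
weights = tf_idf(reviews)
```

Terms that appear in few reviews (here "terrible") get a larger weight than terms shared across reviews (here "product"), which is what makes the resulting vectors useful as classifier features.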
Congressional Tweet Classification
Read in 1 minute ·Congressional Tweet Classification We extract, transform, and analyze over 857,000 records to classify a tweet’s owner as a Democrat or a Republican. We used logistic regression, which achieved 88.884 percent accuracy. We conclude that a tweet’s content can reveal whether its owner is a Democrat or a Republican. Github Project Link
Heart Illness Classification AutoEncoders
Read in 2 minutes ·Heart Illness Classification AutoEncoders In this project, you will work with LSTM-based autoencoders to classify human heartbeats for heart disease diagnosis. The dataset contains 5,000 time-series examples with 140 timesteps each. Each time series is an ECG or EKG signal that corresponds to a single heartbeat from a single patient with congestive heart failure. An electrocardiogram (ECG or EKG) is a test that checks how your heart is functioning by measuring its electrical activity.
BPL Player Type Price Optimization
Read in 2 minutes ·British Premier League Player Type Price Optimization This project explores the frequent line-ups of successful and unsuccessful clubs under financial constraints. To investigate this problem, the researchers (1) built a game-week-over-game-week linear optimization model, (2) used actual club squad rosters, and (3) conducted a dissimilarity analysis by drawing a network that shows the distance between maximized frequent itemsets and minimized itemsets. The researchers clustered the players by position, market value, and season points contributed, thereby assigning a Gold, Silver, or Bronze tier to each player in a given season.
Time-Series SARIMAX
Read in 1 minute ·Time-Series SARIMAX I analyzed Autocorrelation Functions (ACF) and Partial Autocorrelation Functions (PACF) to determine the ARIMA models' order. Additionally, I applied the differencing needed to make the data satisfy the stationarity assumption. In addition to appraisal processing, I conducted a brute-force grid search to identify the model order with the smallest AIC and BIC. Lastly, I produced a forecast of the succeeding year’s revenue with 95% confidence intervals.
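The differencing step and the two selection criteria can be sketched in a few lines of plain Python. The helpers `difference`, `aic`, and `bic` and the toy series are hypothetical, written only for illustration; they use the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and the project itself presumably relied on a time-series library rather than code like this.

```python
import math


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag.

    lag=1 removes a linear trend; lag=season_length removes a repeating
    seasonal pattern.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Toy data, not the revenue series from the post.
trend = [10, 12, 14, 16, 18]
seasonal = [1, 2, 3, 4, 2, 3, 4, 5]

first_diff = difference(trend)              # constant after one difference
seasonal_diff = difference(seasonal, lag=4)  # constant after a lag-4 difference
```

In a brute-force search, each candidate order would be fitted to the differenced data and the order with the smallest criterion value kept.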
Sandy Bartender Dashboard
Read in 1 minute ·Beach Side Bar Database Management System Welcome to Beach Side Bar, a drinking experience that topples the great Jon Taffer. At Beach Side Bar we pride ourselves on data-driven decision-making. We collect data on each transaction through our POS (Point of Sale) devices, which insert records into our data management software, Sandy. Sandy enables our business to thrive by determining inventory cycles, forecasting sales, and ensuring bartenders are serving at the highest level.