Piyush Verma

Data Scientist, Recovering Runner





Music recommendation based on collaborative filtering
R, Collaborative Filtering: user-based, item-based, singular value decomposition
As part of my capstone project, I used data from Last.FM, a music streaming website, describing users’ music preferences. I compared three recommenders (user-based collaborative filtering, item-based collaborative filtering, and singular value decomposition, SVD) for recommending new artists to a user based on how that user’s listening overlaps with other users’ listening. User-based collaborative filtering gave the best recommendations. Check out the final report here.
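To illustrate the idea behind user-based collaborative filtering, here is a minimal Python sketch (the project itself was done in R). It assumes a hypothetical toy table of play counts per user and artist, not the Last.FM data: it scores every other user by cosine similarity to the target user, keeps the `k` most similar, and ranks artists the target has not heard by similarity-weighted play counts. The names `PLAYS`, `cosine` and `recommend` are illustrative only.

```python
from math import sqrt

# Hypothetical toy play counts (user -> {artist: plays}); NOT the Last.FM data.
PLAYS = {
    "u1": {"Radiohead": 50, "Muse": 30, "Coldplay": 10},
    "u2": {"Radiohead": 40, "Muse": 25, "Portishead": 20},
    "u3": {"Coldplay": 60, "Adele": 40},
}


def cosine(a, b):
    """Cosine similarity between two sparse play-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, plays, k=2, n=3):
    """Recommend up to n unheard artists using the k most similar users."""
    target = plays[user]
    # Rank every other user by similarity to the target and keep the top k.
    neighbours = sorted(
        ((cosine(target, plays[other]), other) for other in plays if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for artist, count in plays[other].items():
            if artist not in target:
                # Similarity-weighted vote for artists the target has not heard.
                scores[artist] = scores.get(artist, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:n]


print(recommend("u1", PLAYS))
```

Here `u2` shares most of `u1`’s taste, so Portishead outranks Adele, which comes only from the weakly similar `u3`. Item-based filtering would instead compute similarities between artists (columns) rather than between users (rows).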

Claim risk analytics for an insurance company
R, Logistic Regression, Missing Value Imputation, XGBoost
Part of a data challenge to identify potentially risky new insurance policies from factors such as claim history and customer demographics, and to group risky customers into segments by their risk values so that they could be targeted in a marketing campaign.

Customer Segmentation for a retail supermarket
R, SQL, Tableau, K-medoids Clustering, Customer Value Model
Used Partitioning Around Medoids (PAM), an implementation of the k-medoids clustering algorithm (similar to k-means), to segment the customers (households) of a US-based supermarket. The data had three parts: transaction data (at the product level), customer demographic data, and product details. First, a Customer Value Model gave each household three attributes (Recency, Frequency and Monetary value), and the households were then clustered on these attributes.
Code files can be found here.
Tableau visualization can be found here.
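The Recency/Frequency/Monetary step above can be sketched in a few lines of Python. This is an illustration under assumptions, not the project’s R/SQL code: the transaction layout `(household_id, day_number, basket_id, amount_spent)`, the sample rows, and the function name `rfm` are all hypothetical. Recency is days since the last purchase, frequency is the number of distinct baskets, and monetary value is total spend.

```python
from collections import defaultdict

# Hypothetical product-level transactions: (household_id, day_number, basket_id, amount_spent).
TRANSACTIONS = [
    ("h1", 10, "b1", 25.0),
    ("h1", 10, "b1", 5.0),   # second product in the same basket
    ("h1", 95, "b2", 40.0),
    ("h2", 30, "b3", 12.5),
    ("h2", 60, "b4", 7.5),
    ("h2", 98, "b5", 20.0),
    ("h3", 5, "b6", 100.0),
]


def rfm(transactions, as_of_day):
    """Return {household: (recency_days, frequency, monetary)}."""
    last_day = {}
    baskets = defaultdict(set)
    spend = defaultdict(float)
    for household, day, basket, amount in transactions:
        last_day[household] = max(day, last_day.get(household, day))
        baskets[household].add(basket)  # count baskets, not product lines
        spend[household] += amount
    return {
        h: (as_of_day - last_day[h], len(baskets[h]), spend[h])
        for h in last_day
    }


print(rfm(TRANSACTIONS, as_of_day=100))
```

The resulting three-column table is what a clustering step would consume; in R that step is typically `cluster::pam`, and the RFM columns are usually standardized first so that spend does not dominate the distance.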

Predicting text using N-Grams
R, R Shiny, N-Grams, Text Mining
Built an interactive R Shiny web application in which a user enters a string of text and the application predicts the next word. The prediction uses Katz back-off, which estimates the conditional probability of the next word from the longest N-gram observed in the training text and backs off to shorter N-grams when a longer one has not been seen. This was the capstone project for the 10-course Data Science Specialization from Johns Hopkins University on Coursera.
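The back-off idea can be illustrated with a short Python sketch. Note that this is deliberately simpler than Katz back-off: it omits the discounting and back-off weights Katz uses to keep the probabilities normalized, and simply returns the most frequent continuation of the longest context (up to two preceding words) that was seen in training. The corpus and the names `build_counts` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def build_counts(corpus, max_n=3):
    """Map each context tuple (0 to max_n-1 words) to a Counter of next words."""
    tokens = corpus.lower().split()
    table = defaultdict(Counter)
    for n in range(1, max_n + 1):
        ctx_len = n - 1
        for i in range(len(tokens) - ctx_len):
            context = tuple(tokens[i:i + ctx_len])
            table[context][tokens[i + ctx_len]] += 1
    return table


def predict_next(text, table, max_n=3):
    """Simplified back-off: try the longest context first, then shorter ones."""
    words = text.lower().split()
    for ctx_len in range(min(max_n - 1, len(words)), -1, -1):
        context = tuple(words[len(words) - ctx_len:]) if ctx_len else ()
        if context in table:  # membership test does not insert into the defaultdict
            return table[context].most_common(1)[0][0]
    return None


table = build_counts("the cat sat on the mat and the cat sat on the hat")
print(predict_next("the cat", table))    # trigram context ("the", "cat") is known
print(predict_next("zebra cat", table))  # backs off to the bigram context ("cat",)
```

Both calls print `sat`; an unseen single word such as `"zebra"` falls all the way back to the unigram counts and returns the most common word, `the`.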

Classification of dysfunctional stores
R, K-means clustering, Hypothesis Testing, HR Analytics
Worked on a case study project with a retail client in Cincinnati. The aim was to identify stores at risk of becoming dysfunctional in the near future because of problems related to dissatisfied employees. The analysis used store-level data with employee-related attributes. The client received the recommendations well, and they were in line with the client’s expectations.

Classification of the type of exercise based on fitness device data
R, Random Forest, Linear Discriminant Analysis, Gradient Boosting Model
Used classification techniques to predict how well an exercise was performed (Best > Good > Medium > Bad > Worst) from data collected by fitness devices.

Process simulation study of order processing at the Starbucks at the University of Cincinnati
Arena software, Statistics, Simulation modelling
Used Arena software to simulate order processing at the Starbucks. The study recommended adding a beverage server during peak hours (11:30 AM to 1:00 PM), which reduced a customer’s total waiting time in the system from 9.6 minutes to 1.8 minutes (an improvement of 7.8 minutes). Adding an extra cashier or food server had no effect.