I am a data enthusiast with experience wrangling, modeling and visualizing data in its many shapes and forms. Aside from data-related ventures, you can catch me jamming on my Gibson LP Standard or shamelessly singing some musical theater tunes. I also love messing with bicycles, currently rocking an '86 Trek 620.
Areas of ownership:
Consumer Electronics | Automotive Integrations | Collections | Offline & Downloads
Predictive Maintenance: Implemented various anomaly detection methods to quantify the state-of-health of lead-acid car batteries and forecast battery death up to two months in advance.
Dealership Master Tables: Built and managed PySpark ETL pipelines for the source-of-truth Hive tables that track all dealership partnership data and feed various internal KPI dashboards.
Dealer Insights: Developed and productionized a consumer-facing visualization dashboard providing dealerships with actionable, data-driven insights to increase their customer retention.
Customer LTV model: Built a multi-class random forest classifier to predict the lifetime value of our customers with 96% accuracy and a micro-averaged F1 score of 0.96.
Venue data enrichment: Deduplicated over 5 million venue addresses using an HMM and fuzzy string similarity metrics, producing venue data assets for use in predictive models and other analyses. Implemented the search ranking methodology for our venue marketplace using these assets.
Customer insights: Collaborated with the Product team to define metrics and performed statistical analyses that uncovered product features highly correlated with the success of our customers. Presented insights to the company and influenced the product roadmap.
Created a real-time election forecasting web app for the Decision Desk at ABC News. Built a proprietary mathematical model that made us the first team to call multiple 2018 US midterm races.
Developed a rule-based anomaly detection dashboard that identified important features and the optimal way to visualize each feature.
Researched the most effective aesthetic choices in building scatter plots. Conducted an MTurk study that emulated scatter plots created by five common visualization tools and asked users to perform various interactive tasks.
Analyze and aggregate climate data from various sources and provide visualizations. Assist in the development of an open-source tool that allows composers and enthusiasts to create music driven by climate data.
Advanced Machine Learning · Deep Learning · Experimental Design · Distributed Computing · Time Series Analysis · Linear Regression · Bayesian Statistics
Overall GPA: 3.81 · Major GPA: 3.66
Probability · Statistics · Linear Regression · Numerical Linear Algebra · Stochastic Processes · Statistical Learning · Data Mining · Databases · Data Visualization