Hadoop and Spark
Galvanize Data Science Immersive
B.S. Physics, University of Rochester
B.A. Music, University of Rochester
Large Real Estate Analytics Company
Clustering Real Estate Data
Implemented several clustering approaches to organize and categorize real estate data. Wrote custom PySpark clustering routines that scale to the size and complexity of the data.
Online Music Download Service
Music Search Engine
Revamped an existing search system for music tracks using Elasticsearch and custom indexers to handle misspelled track titles and complex artist names with atypical tokenization. Incorporated popularity metrics into the indexing scheme to boost search results and improve retrieval relevance.
- Python: NumPy, pandas, scikit-learn, Matplotlib, SciPy, PySpark, NLTK
- Big Data: Spark, MLlib, Hadoop, AWS, EMR, SQL
- ML Techniques: NLP, PCA, SVD, SVM, KNN, Gradient Boosting, Neural Networks (MLP, CNN, RNN), Regression, Random Forests
- Clustering: K-Means, Hierarchical, Density-based