Karen Farbman

Data Science


Hadoop and Spark
Machine Learning


Galvanize Data Science Immersive
B.S. Physics, University of Rochester
B.A. Music, University of Rochester



Recent Projects

Large Real Estate Analytics Company
Clustering Real Estate Data
Implemented several clustering approaches to organize and categorize real estate data, writing custom PySpark clustering routines to handle the size and complexity of the data.

Online Music Download Service
Music Search Engine
Revamped an existing music-track search system using Elasticsearch and custom indexers to handle misspelled track titles and complex artist names with atypical tokenization. Incorporated popularity metrics into the indexing scheme to boost search results and improve retrieval relevance.

Technical Expertise

  • Python: NumPy, pandas, scikit-learn, Matplotlib, SciPy, PySpark, NLTK
  • Elasticsearch
  • Big Data: Spark, MLlib, Hadoop, AWS, EMR, SQL
  • ML Techniques: NLP, PCA, SVD, SVM, KNN, Gradient Boosting, Neural Networks (MLP, CNN, RNN), Regression, Random Forests
  • Clustering: K-Means, Hierarchical, Density-based