Karen Farbman

Data Science

Specialties

Hadoop and Spark
Elasticsearch
Machine Learning
Clustering

Education

Galvanize Data Science Immersive
B.S. Physics, University of Rochester
B.A. Music, University of Rochester

Location

Colorado

Recent Projects

Large Real Estate Analytics Company
Clustering Real Estate Data
Implemented several clustering approaches to organize and categorize real estate data, including custom PySpark clustering routines built to handle the size and complexity of the data.

Online Music Download Service
Music Search Engine
Revamped an existing search system for music tracks by using Elasticsearch and custom indexers to handle misspelled track names and complex artist names with atypical tokenization. Incorporated popularity metrics into the indexing scheme to boost popular tracks in search results and improve retrieval relevance.

Technical Expertise

  • NumPy, pandas, scikit-learn, Matplotlib, SciPy, PySpark, NLTK
  • Elasticsearch
  • Big Data: Spark, MLlib, Hadoop, AWS, EMR, SQL
  • ML Techniques: NLP, PCA, SVD, SVM, KNN, Gradient Boosting, Neural Networks (MLP, CNN, RNN), Regression, Random Forests
  • Clustering: K-Means, Hierarchical, Density-based