Hello, and thanks for visiting my webpage. I am a data scientist with 10+ years of experience in quantitative and data analytics.
I have significant experience in the full machine learning lifecycle: data gathering and analysis, modeling and production.
On this page you'll find an overview of my expertise, my data science projects, the tools I used, and links to the underlying code and data.
Experience with a wide variety of machine learning techniques, including classification, clustering, regression (linear and logistic), neural networks, ensemble and tree-based methods, and support vector machines.
Sentiment analysis, language models, topic modeling and deep learning.
Expertise in the full data science stack, including Pandas, Scikit-learn, TensorFlow and Keras.
Amazon Web Services and Google Cloud Platform experience.
Worked with Hadoop, Spark and PySpark in large organizations to implement modeling and big data solutions.
Below are some of the projects I've worked on. They cover a range of subjects and use a variety of data analysis and machine learning tools and techniques. Links to the GitHub repo with the Jupyter notebook and to the original data sources are provided.
Using a corpus of over 1 million news articles, we developed a TensorFlow-based model to summarize news headlines. The model used cutting-edge Python libraries for natural language processing and AI.
In this project I used Google Cloud Platform to analyze the usage of Ford GoBike bikes in San Francisco. The main focus was overall bike usage and how heavily the bikes were used during commuter hours. Visualizations were done in Matplotlib.
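As a rough illustration of the commuter-hour measure, here is a minimal sketch in plain Python. The sample timestamps, the commuter windows (weekdays, 7-9 am and 4-6 pm) and the function names are assumptions made for this example; they are not the project's actual data or code.

```python
from datetime import datetime

# Assumed commuter windows (hypothetical): trips starting on a weekday
# during the hours 7, 8, 16 or 17 count as commuter trips.
COMMUTER_HOURS = {7, 8, 16, 17}


def is_commuter_trip(start: datetime) -> bool:
    """Return True if the trip starts on a weekday within a commuter hour."""
    return start.weekday() < 5 and start.hour in COMMUTER_HOURS


def commuter_share(starts) -> float:
    """Fraction of trips that start during commuter hours (0.0 if no trips)."""
    starts = list(starts)
    if not starts:
        return 0.0
    return sum(is_commuter_trip(s) for s in starts) / len(starts)


# Made-up sample trip start times, for illustration only.
sample_starts = [
    datetime(2018, 3, 5, 8, 15),   # Monday morning
    datetime(2018, 3, 5, 13, 0),   # Monday midday
    datetime(2018, 3, 6, 17, 30),  # Tuesday evening
    datetime(2018, 3, 10, 8, 0),   # Saturday morning
]

print(commuter_share(sample_starts))
```

The same share could be computed per station or per month to see where and when commuter demand is concentrated.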
In this project I used gradient boosting machines to predict house prices on a Kaggle dataset. I compared three techniques: linear regression (as a baseline), random forests and gradient boosting machines. The notebook with the analysis and results is available on GitHub.
Using publicly available tax revenue data at the zip code level, we explored the distribution of income tax. The analysis looks at which zip codes pay the highest and lowest income tax, which states generate the most tax revenue, and which states contribute the most tax revenue per capita.
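The state-level per capita figure can be sketched as below. This is only an illustration: the row layout (zip code, state, tax paid, population), the sample values and the function name are assumptions for this example, not the project's data or code.

```python
from collections import defaultdict

# Made-up rows for illustration: (zip_code, state, income_tax_paid, population)
rows = [
    ("00001", "AA", 1_000_000.0, 500),
    ("00002", "AA", 3_000_000.0, 1_500),
    ("00003", "BB", 2_000_000.0, 400),
]


def per_capita_by_state(rows):
    """Sum tax and population per state, then divide the two totals."""
    tax = defaultdict(float)
    pop = defaultdict(int)
    for _zip_code, state, paid, people in rows:
        tax[state] += paid
        pop[state] += people
    # Skip states with zero recorded population to avoid division by zero.
    return {state: tax[state] / pop[state] for state in tax if pop[state] > 0}


result = per_capita_by_state(rows)
top_state = max(result, key=result.get)
print(result, top_state)
```

Totals are summed per state before dividing, so that small zip codes do not weigh as much as large ones, which would happen if zip-level per capita values were simply averaged.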
Using data from Kaggle, this project looks at how to predict the type of tree in a forest from the topographic attributes of the region. We explored several classifiers, including logistic regression, multi-layer neural networks, support vector machines, random forests and gradient-boosted trees.