Galvanize Denver Data Science Capstone Projects – January 2016


Students are preparing to showcase their capstone projects to an audience of Galvanize community members, including hiring partners, members, peers, and staff. Here are brief descriptions from Galvanize students about what they’ve been hard at work on.

Analyzing medical device data to improve instrument performance: Gary Vanzin

In collaboration with a global medical device manufacturer, I’m analyzing instrument performance data from blood collection devices. From this analysis, we hope to gain insight into the patient/instrument interface and learn how to improve both the patient experience and the blood collection process. This problem involves overcoming big data ETL issues by bridging the gap between proprietary and open source tools.

Predicting Custom Shirt Sizes using Machine Learning: Kelsey Hanzlik

For my project, I teamed up with a local menswear company, Ratio Clothing, to improve their current model for predicting shirt sizes. Ratio’s business model focuses on crafting a perfectly fitting custom dress shirt without the need for a tailor. To do this, they use the customer’s survey answers and correlated measurements. Using machine learning techniques, I helped optimize Ratio’s model by reducing the error of its estimates, allowing them to make more accurate predictions.

Mitosis Detection in Breast Cancer Histology Images: Pooja Ramesh

Breast cancer grading plays an important role in predicting the aggressiveness of the disease. A key component in breast cancer grading is mitosis count (quantifying the number of cells in the process of dividing at a given time). Currently, pathologists in the lab detect and count mitoses manually.

The goal of this project is to bring the power of machine learning to the field of pathology and provide a consistent tool and diagnostic aid that relieves pathologists of this tedious task. In this project, I leverage digital pathology images and convolutional neural networks to learn features of cells undergoing mitosis and detect them. The unique feature of this approach is that it detects mitotic and non-mitotic cells that might not be distinguishable to the human eye.

Predicting and Simulating Wave Energy along the Northern California Coast: Christina Kestler

One of the major issues in integrating renewable energy into the electrical grid is uncertainty in short-term forecasts. Knowing in advance when an increase in electricity is on the way is an important part of integrating any renewable energy resource into existing electricity grids. While wind and solar have been the leading sources of renewable energy until now, waves are increasingly being recognized as a viable source of power for coastal regions. It turns out that the energy produced by waves is a lot easier to forecast than energy produced by wind or the sun.

In my project, I use data from the National Oceanic and Atmospheric Administration to predict certain wave properties that dictate the amount of usable energy generated by a specific wave energy converter, the Pelamis. I will also analyze past data to see how much energy could have been contributed to California’s electrical grid by a wave energy ‘farm’ consisting of one, two, or even forty of these machines.
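To make the link between wave properties and energy concrete, here is a minimal sketch of the standard deep-water wave energy flux formula, P = ρg²H²T / (64π), which gives the power carried per metre of wave crest. It assumes the relevant wave properties are significant wave height and wave period, which the description above does not actually specify, and the function name and constants are illustrative. It estimates the energy available in the waves, not the output of a Pelamis machine, which would depend on the device's own performance characteristics.

```python
import math

SEAWATER_DENSITY = 1025.0  # kg/m^3, a typical value for seawater (assumption)
GRAVITY = 9.81             # m/s^2


def wave_power_kw_per_m(sig_wave_height_m, energy_period_s):
    """Deep-water wave energy flux per metre of wave crest, in kW/m.

    Implements P = rho * g^2 * H^2 * T / (64 * pi), where H is the
    significant wave height and T is the wave energy period.
    """
    watts_per_m = (
        SEAWATER_DENSITY * GRAVITY ** 2 / (64 * math.pi)
        * sig_wave_height_m ** 2
        * energy_period_s
    )
    return watts_per_m / 1000.0


# Illustrative sea state only (not measured data): 2 m waves, 8 s period.
print(round(wave_power_kw_per_m(2.0, 8.0), 1), "kW per metre of wave crest")
```

Because power grows with the square of wave height, a forecast error in height matters more than the same relative error in period.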

Predicting Removal of Stocks from the S&P 500: Matt Sherrell

I am building a model that predicts when stocks will be removed from the S&P 500. The analysis compares traditional machine learning techniques, such as random forests, with traditional forecasting techniques, such as ARIMA, as candidate models. Stocks are removed from the index based on factors such as a company’s market cap and its merger and acquisition activity. These and other features will be incorporated into the models.

Congressional Bill Modeling: Samuel Sherman

Using the Sunlight Foundation Congress API, GovTrack, and other websites, I scraped congressional bill text and retrieved JSON data for votes and bills since the 103rd Congress (1993). I then applied Non-Negative Matrix Factorization to derive latent topics from the bill text and modeled the prevalence of different topics since the 103rd Congress. Using the important text features and data obtained from the JSON files, I applied Random Forest and Gradient Boosted ensemble methods to predict the percent of yes votes belonging to a specific party. I then applied similar models to predict whether a bill will reach the floor for a vote.
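For readers unfamiliar with Non-Negative Matrix Factorization, the following is a minimal sketch of the idea, not the code used in this project. It factors a small document-term count matrix V into non-negative matrices W (documents by topics) and H (topics by terms) using the classic multiplicative update rules. The vocabulary and counts are made-up toy values, and all function names are illustrative; a real analysis would more likely use a library implementation on weighted term counts.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Strictly positive random start so the updates cannot divide by zero.
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Toy data (hypothetical): rows are bills, columns are term counts.
vocab = ["health", "insurance", "tax", "budget"]
counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 3]]

doc_topics, topic_terms = nmf(counts, n_topics=2)
for k, weights in enumerate(topic_terms):
    top_term = vocab[max(range(len(vocab)), key=lambda j: weights[j])]
    print(f"topic {k}: strongest term = {top_term}")
```

Each row of W can then be read as how strongly a bill expresses each topic, which is the kind of quantity that can be tracked over time or fed into a downstream model as a feature.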

Real Time Picture Quality Rating from an Agricultural Drone Flight: Nancy Abramson

A drone flight takes snapshots of a field to assess crop quality. The quality of the pictures is influenced by the smoothness of the flight, which is reflected in the drone’s time series telemetry logs. Using these logs and blending in third-party weather data, I compared predictive models to find the best fit for predicting a flight’s quality score. I then implemented a predictive model that uses the drone telemetry data and local weather to provide an on-site picture quality estimate. Having this information on-site allows the flight to be re-run immediately. Currently, results are not available until after leaving the site, which necessitates a return trip.

Predicting User Age from Craigslist Personals Postings: Annie Gillan

On a given day, there are over 500,000 personals posted across the US. For my project, I am using Craigslist personals data to identify key features for predicting a user’s age. Processing methods applied to the text involve detecting native language, use of slang, and writing style. I am also applying clustering techniques to match users based on their posts.

Predicting Location from Google Street View Images in Colorado: Jesse Lieman-Sifry

The emerging field of feature recognition in images is revolutionizing the way computers understand the world around us. Inspired by GeoGuessr, my project uses convolutional neural networks to discern relevant features that correspond to geographic locations in Colorado. This type of modeling has applications for self-driving cars, where maintaining a keen sense of the environment is vitally important. Distinguishing canyon roads from local streets and a clear day from a rainy one will be integral to making smarter autonomous vehicles.
