The Bias-Variance Tradeoff

Foundational to any data science curriculum is the introduction of the terms bias and variance, and subsequently the tradeoff that exists between the two. As machine learning continues to grow, it is imperative that we understand these concepts, as they directly affect the predictions we make and the business value we can derive from our models. While machine learning may seem simple, one of the more difficult parts is optimizing your models: sometimes optimization can lead to over-fitting, while a model that is too simple may under-fit your data....

June 17, 2020 · 4 min
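The tradeoff can be observed empirically. The sketch below is a minimal illustration, not the method from the post: it assumes a hypothetical noisy sine-wave target, repeatedly refits polynomials of a chosen degree to fresh training samples with NumPy's `Polynomial.fit`, and estimates the squared bias and the variance of the predictions at fixed test points. The names `true_fn` and `bias_variance` and all parameter values are illustrative choices.

```python
import numpy as np


def true_fn(x):
    # Hypothetical ground-truth signal, chosen only for illustration
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=200, n_points=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit.

    Each trial draws a fresh noisy training set, fits a polynomial, and
    records its predictions on a fixed grid of test points.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0, 1, 50)
    preds = np.empty((n_trials, x_test.size))

    for t in range(n_trials):
        x = rng.uniform(0, 1, n_points)
        y = true_fn(x) + rng.normal(0, noise, n_points)
        model = np.polynomial.Polynomial.fit(x, y, degree)
        preds[t] = model(x_test)

    mean_pred = preds.mean(axis=0)
    # Bias^2: how far the average model is from the truth, averaged over x_test
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)
    # Variance: how much individual fits scatter around the average model
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


if __name__ == "__main__":
    for degree in (1, 3, 9):
        b, v = bias_variance(degree)
        print(f"degree={degree:2d}  bias^2={b:.4f}  variance={v:.4f}")
```

Low degrees should show high bias and low variance (under-fitting), while high degrees should show the reverse (over-fitting). The exact numbers depend on the seed and the assumed noise level.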

And Now for Something Completely Different

On January 21st, 2020, I enrolled in Flatiron School’s Data Science Bootcamp to develop the foundational skills and techniques needed to become a data scientist. At the time of writing, I’m about 4 months into the program, and in retrospect, enrolling was one of the best choices I’ve made. Between the opportunities that will be available to me when I finish, the passionate and intelligent peers I get to collaborate with, and the breadth of exciting, new, and challenging material I’m learning, I am glad I decided to learn data science....

May 2, 2020 · 3 min

Virtual Environments with Python

Similar to other programming languages (R, Ruby, Scala, JavaScript), Python comes with its own way of managing the third-party packages you choose to install for your projects. Since Python 3.4, pip has been included by default in all binary installations of Python, allowing users to install packages from the Python Package Index (PyPI), a public repository of open source packages. However, the way packages are managed has one major shortcoming: by default, all packages are installed into, and loaded from, the same place....

March 17, 2020 · 8 min
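The shared install location can be checked directly. Below is a minimal sketch using only the standard library's `venv` module (an assumption: the post itself may use a different tool, such as `virtualenv` or conda). It creates an environment in a temporary directory and asks both the current interpreter and the new environment's interpreter where they would install third-party packages. The helper names `create_env` and `site_packages_of` and the directory name `demo-env` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=False keeps the sketch fast and offline; pass True to
    # bootstrap pip into the environment via ensurepip.
    # symlinks mirrors the default behaviour of `python -m venv` on POSIX.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def site_packages_of(python: str) -> str:
    """Ask an interpreter where it installs third-party (pure Python) packages."""
    result = subprocess.run(
        [python, "-c", "import sysconfig; print(sysconfig.get_paths()['purelib'])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_python = create_env(Path(tmp) / "demo-env")
        print("current interpreter:", site_packages_of(sys.executable))
        print("virtual env        :", site_packages_of(str(env_python)))
```

The two printed paths should differ, which is the isolation a virtual environment provides: packages installed with the environment's interpreter land in its own directory instead of the shared one. From a shell, `python -m venv demo-env` creates a similar environment, with pip bootstrapped by default.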

High-Level Overview of Quantile-Quantile Plots

The Quantile-Quantile plot is part of any data analyst’s toolkit when working with one-dimensional data. Colloquially referred to as Q-Q plots, these visualizations are mainly used to compare two samples, or to compare a sample against a distribution. Although they’re not immediately intuitive, Q-Q plots are amazing tools, especially when assessing whether a sample fits a known distribution, such as the Gaussian distribution. Q-Q plots work by plotting the quantiles of one distribution (x-coordinate), typically a theoretical distribution, against the quantiles of another distribution (y-coordinate), typically an observed dataset....

February 1, 2020 · 3 min
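The construction described in this excerpt can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the post's own code: it compares a sample against a normal distribution (`statistics.NormalDist`), uses the (i - 0.5)/n plotting-position convention (one of several in use), and adds a hypothetical `straightness` helper that summarizes how close the points lie to a straight line using their correlation.

```python
import math
import random
from statistics import NormalDist


def qq_points(sample, dist=None):
    """Return (theoretical, observed) quantile pairs for a Q-Q plot.

    `dist` defaults to the standard normal distribution.
    """
    if dist is None:
        dist = NormalDist()
    observed = sorted(sample)  # empirical quantiles (y-coordinates)
    n = len(observed)
    # Theoretical quantiles (x-coordinates) at plotting positions (i - 0.5) / n
    theoretical = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed


def straightness(theoretical, observed):
    """Pearson correlation of the Q-Q points; values near 1 mean a near-straight line."""
    n = len(theoretical)
    mx = sum(theoretical) / n
    my = sum(observed) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, observed))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in observed)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    normal_sample = [rng.gauss(10, 2) for _ in range(500)]
    skewed_sample = [rng.expovariate(1.0) for _ in range(500)]
    for name, sample in (("normal", normal_sample), ("exponential", skewed_sample)):
        r = straightness(*qq_points(sample))
        print(f"{name:12s} r = {r:.4f}")
```

A normal sample, even with a different mean and spread, should give points close to a straight line, while the skewed exponential sample should bend away from it. To draw the plot, pass the two lists to a scatter plot (for example, matplotlib's `plt.scatter`); SciPy's `scipy.stats.probplot` produces a similar plot but uses its own plotting-position formula.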