Sign in

Articles about Data Science and Machine Learning | @carolinabento

Getting Started

Decision Tree is a Supervised Machine Learning Algorithm that uses a set of rules to make decisions, similarly to how humans make decisions.

Image by author.

This is article number one in a series dedicated to Tree Based Algorithms, a group of widely used Supervised Machine Learning Algorithms.

Stay tuned if you’d like to see Decision Trees, Random Forests and Gradient Boosting Decision Trees, explained with real-life examples and some Python code.

Decision Tree is a Supervised Machine Learning Algorithm that uses a set of rules to make decisions, similarly to how humans make decisions.

One way to think of a Machine Learning classification algorithm is that it is built to make decisions.

You usually say the model predicts the class of the new, never-seen-before input…


Hands-on Tutorials

Image by author.

Several phenomena in the real world can be represented as counts of things. For example, the number of flights departing from an airport, number customers lining up at the store register, the number of earthquakes occurring in a year at a specific region.

Counting events is a relatively simple task, but if you want to go from just counting the occurrence of events to asking questions about how likely are these events to happen in a specific unit of time, you need more powerful tools like the Poisson distribution.

The Poisson distribution models the probability that a given number of…


Image by author.

Logistic regression is a machine learning classification model with quite a confusing name!

The name makes you think about Linear Regression, but it’s not used to predict an unbounded, continuous outcome. Instead, it is a statistical classification model, it gives you the likelihood that an observation belongs to a specific class.

Logistic regression is used across many scientific fields. In Natural Language Processing (NLP), it’s used to determine the sentiment of movie reviews, while in Medicine it can be used to determine the probability of a patient developing a particular disease.

Classifying your daily productivity

Lately you’ve been interested in gauging your productivity. Not…


HANDS-ON TUTORIALS

Your dog’s nap time as a regularized linear model

Image by author

When you’re building a machine learning model you’re faced with the bias-variance tradeoff, where you have to find the balance between having a model that:

  1. Is very expressive and captures the real patterns in the data.
  2. Generates predictions that are not too far off from the actual values,

A model that is very expressive has a low bias, but it can also be too complex. While a model that generates predictions that aren’t too far off from the true value has low variance.

Overfitting

When the model is too complex and tries to encode more patterns from the training data than…


Markov defined a way to represent real-world stochastic systems and processes that encode dependencies and reach a steady-state over time.

Image by Author

Andrei Markov didn’t agree with Pavel Nebrasov, when he said independence between variables was necessary for the Weak Law of Large Numbers to be applied.

The Weak Law of Large Numbers states something like this:

When you collect independent samples, as the number of samples gets bigger, the mean of those samples converges to the true mean of the population.

But Markov believed independence was not a necessary condition for the mean to converge. So he set out to define how the average of the outcomes from a process involving dependent random variables could converge over time.

Markov chain: a random chain of dependencies

Thanks to this…


Random Forests is a Machine Learning algorithm that tackles one of the biggest problems with Decision Trees: variance.

Image by author.

This is article number two in a series dedicated to Tree Based Algorithms, a group of widely used Supervised Machine Learning Algorithms.

The first article was about Decision Trees. The next, and last article in this series, explores Gradient Boosted Decision Trees. Everything explained with real-life examples and some Python code.

Stay tuned!

Random Forests is a Machine Learning algorithm that tackles one of the biggest problems with Decision Trees: variance.

Even though Decision Trees is simple and flexible, it is greedy algorithm. It focuses on optimizing for the node split at hand, rather than taking into account how that…


Predicting your pizza’s cooking time

Image by author

Gradient Descent is one of the most popular methods to pick the model that best fits the training data. Typically, that’s the model that minimizes the loss function, for example, minimizing the Residual Sum of Squares in Linear Regression.

Stochastic Gradient Descent is a stochastic, as in probabilistic, spin on Gradient Descent. It improves on the limitations of Gradient Descent and performs much better in large-scale datasets. That’s why it is widely used as the optimization algorithm in large-scale, online machine learning methods like Deep Learning.

But to get a better understanding about Stochastic Gradient Descent, you need to start…


Image by Carolina Bento.

Ever since I can remember, I’ve been asked long-game questions without realizing it. I couldn’t identify them as long-game questions, because no one teaches you how to develop a long-game mindset. It’s not something you learn in school, discuss at the dinner table or at a barbecue with friends.

The long-game mindset is a decision-making approach that focuses on the long-term outcomes and impact of your decisions.

In many points in life, you’re faced with decisions that go beyond short-term impact. In these situations, you can choose to cut some corners and make a quick decision without thinking much about…


Image by Author

SQL is a fundamental part of a Data Scientist’s toolbox. It’s a great tool to explore and prepare your data, either for analysis or to create a machine learning model.

An effective approach to learn SQL is to focus on the questions you want to answer, rather than on specific methods or functions. Once you know what you’re looking for, what questions you want to answer with data, the functions and operands you use to get there will make more sense.

This article is organized around what questions to ask about data, and you’ll become familiar with:

  • Structure of a…


This year was challenging, stressful, messed up, overwhelming, brutal, … for everyone.

Lots of us found comfort in books, which might seem like a small thing, but it’s an immense privilege.

This year my readings gravitated around:

  • Fiction,
  • Science,
  • Entrepreneurship & history behind high-performers,
  • Personal development & curiosity.

I love talking about books, and discover lots of new books through these kinds of lists. So I hope you enjoy this article and that you can find a book that sparks your interest on something new.

Happy reading 📚

Fiction

Carolina Bento

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store