Predicting your pizza’s cooking time — Gradient Descent is one of the most popular methods to pick the model that best fits the training data. Typically, that’s the model that minimizes the loss function, for example, minimizing the Residual Sum of Squares in Linear Regression. Stochastic Gradient Descent is a stochastic, as in probabilistic, spin on…