1. The Context

If you're clicking into this tutorial, chances are you already have some familiarity with Maximum Likelihood Estimation (MLE) and its extensive application to machine learning problems. If you're not familiar with it, don't worry!

This first section provides the necessary context for the tutorial and covers some essential prerequisite knowledge. It will help you not only follow the sections that come next but also appreciate how MLE is used to solve common problems that arise in machine learning.

📕 Necessary Knowledge

1) “What is Likelihood?”

Likelihood refers to how probable an observed set of data points is under a particular choice of model parameters. The normal distribution is a useful illustration: there, maximizing the likelihood amounts to finding the mean and standard deviation of the normal distribution that would most accurately fit the given data.

It is important to note that in order to use the concept of likelihood, you must already have a model in mind and know which parameters it takes. In the normal distribution case,

Model: Normal Distribution

Parameters: Mean and standard deviation
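To make this concrete, here is a minimal sketch (the data values and the helper name `normal_likelihood` are illustrative, not from the tutorial) that evaluates the likelihood of some data under two different parameter choices. Parameters close to where the data actually sits score a higher likelihood:

```python
import math

def normal_likelihood(data, mu, sigma):
    """Likelihood of the data under a Normal(mu, sigma) model:
    the product of the density evaluated at each observed point."""
    likelihood = 1.0
    for x in data:
        density = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        likelihood *= density
    return likelihood

data = [4.8, 5.1, 5.3, 4.9, 5.0]

# A mean near the data's center scores far higher than a distant one.
print(normal_likelihood(data, mu=5.0, sigma=0.2))
print(normal_likelihood(data, mu=2.0, sigma=0.2))
```

Notice that the likelihood is computed for a *fixed* model family (the normal distribution); only the parameters vary.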

2) “What is Optimization?”

Optimization refers to the process of finding the best possible solution to a given problem. What counts as "best" depends on context, but common objectives are to maximize or minimize an objective function.

In calculus terms, finding the best possible solution often means taking the derivative of the objective function and setting it to zero. However, when solving the resulting equation analytically is not possible, we can use other methods such as algebraic workarounds or gradient descent to find the best possible solution.
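As a quick sketch of the gradient descent alternative (the function and step size here are illustrative choices, not from the tutorial), consider minimizing f(x) = (x − 3)², whose calculus solution is f′(x) = 2(x − 3) = 0, i.e. x = 3. Gradient descent reaches the same answer by repeatedly stepping downhill:

```python
# Minimize f(x) = (x - 3)^2 numerically.
# Calculus gives the exact answer: f'(x) = 2(x - 3) = 0  =>  x = 3.

def grad(x):
    """Derivative of the objective f(x) = (x - 3)^2."""
    return 2 * (x - 3)

x = 0.0    # arbitrary starting point
lr = 0.1   # learning rate (step size); an illustrative choice
for _ in range(200):
    x -= lr * grad(x)  # step in the direction of steepest descent

print(round(x, 4))  # converges to 3.0, matching the calculus solution
```

In MLE, the objective being maximized is the (log-)likelihood, and gradient-based methods are the standard tool when no closed-form solution exists.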

An example of an algebraic workaround is applying an operation to the objective function that does not change where the best solution occurs. Multiplying the function by a positive constant, or taking the log of a positive-valued function, are two such operations. These workarounds will prove especially useful in the case of MLE.
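The log transform in particular is worth seeing in action. Because the log is monotonically increasing, the parameter that maximizes the likelihood also maximizes the log-likelihood, and the log turns an underflow-prone product into a stable sum. A minimal sketch (the data, candidate means, and helper names are illustrative assumptions):

```python
import math

data = [4.8, 5.1, 5.3, 4.9, 5.0]
SIGMA = 0.2  # fixed standard deviation, for simplicity

def likelihood(mu):
    """Raw likelihood: product of normal densities."""
    result = 1.0
    for x in data:
        result *= math.exp(-((x - mu) ** 2) / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))
    return result

def log_likelihood(mu):
    """Log-likelihood: the product becomes a sum of log densities."""
    return sum(
        -((x - mu) ** 2) / (2 * SIGMA ** 2) - math.log(SIGMA * math.sqrt(2 * math.pi))
        for x in data
    )

candidates = [4.0, 4.5, 5.0, 5.5]
best_raw = max(candidates, key=likelihood)
best_log = max(candidates, key=log_likelihood)
print(best_raw, best_log)  # the log transform picks the same best parameter
```

With many data points the raw product would underflow to zero in floating point, while the sum of logs stays well-behaved; that is why MLE is almost always carried out on the log-likelihood.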