delving into machine learning...
2 views
programming
2026-05-09
With the rise of Artificial Intelligence comes the rise of the technology that powers the Artificial Intelligence that we all know and love: machine learning. But the truth is, this technology has existed for a long time, and is also used in systems that we have been utilizing for years without realizing: auto-correct, personal recommendations via social services such as Spotify and Netflix, as well as digital assistants like Siri and Alexa. However, these older systems typically have limited capability in comparison to modern Artificial Intelligence, partly because modern AI relies on more advanced models, larger amounts of data, and better ways of selecting the proper Machine Learning model. There are many Machine Learning models, and each has its own data requirements. Typically, the simplest models are often used as a baseline, in order to decide which model makes the most sense to switch to. Both baseline models support supervised learning, meaning that they take labeled data: Linear Regression and Logistic Regression.
While both Linear and Logistic Regression are often treated as baselines, they are functionally different, and output two different types of data: continuous numerical values, which is supported by Linear Regression, and discrete labels, which is supported by Logistic Regression. For many, discrete labels, such as True or False, are much more valuable than a regular number, like 99. However, Logistic Regression, which outputs discrete labels, suffers a major flaw: it only supports output of two classes. And while this is acceptable for systems that may detect “pass” or “fail,” such as how email systems may detect if an email is “spam” or “not spam,” its usage suffers due to this feature.
Fortunately, Multinomial Logistic Regression is a model which is similar to Logistic Regression, but outputs probabilities for 3 or more classes at a time, making it more practical for real-world classification problems. To understand how this is possible, it’s important to first understand how Logistic Regression works. Logistic Regression uses a sigmoid function to output a number within the range [0,1], allowing us to interpret it as a probability, and when combined with a threshold, allows us to determine if the value should be one of two labels, which can also be referred to as Binary Logistic Regression, due to its ability to perform Binary Classification, meaning it can only output one of two classes. Multinomial Logistic Regression, on the other hand, uses a softmax function instead of the sigmoid function in order to support 3 or more outputs at a time. Furthermore, instead of using Logistic Regression’s threshold approach, such as >0.5 for pass, and <=0.5 for fail, to determine a class, N probabilities are output, totaling up to 1.0, and the largest probability is selected. In practice, the output of Multinomial Logistic Regression is more beneficial than Binary Logistic Regression in many scenarios, as not only is the class with the highest probability selected, but additionally, the probabilities for all classes are also computed, with the ability to support multiple possible classes depending on the dataset.
Interestingly, the softmax function utilized in Multinomial Logistic Regression is similar to the sigmoid function utilized in Binary Logistic Regression. When utilized with two classes, the softmax function behaves similar to the sigmoid function: σ(z) = 1 / (1 + e⁻ᶻ). However, when utilized with 3 or more classes, it takes its own shape: σ(z)ᵢ = eᶻⁱ / Σⱼ₌₁ᴷ eᶻʲ. Therefore, the softmax function can be utilized for both Binary Logistic Regression, as well as Multinomial Logistic Regression. This is important, as it shows that Multinomial Logistic Regression is not a separate idea, but rather a generalization of Logistic Regression. It takes the same concept of turning model outputs into probabilities, but expands it so the model can compare multiple possible classes at a time, instead of only two.
Multinomial Logistic Regression solves a major limitation of Binary Logistic Regression. While Binary Logistic Regression is useful for simple classification problems, like yes or no, real world problems often have more than two answers. A system may need to classify an image as a dog, cat, bird, or fish, or classify a customer sentiment as positive, neutral, or negative. In these situations, a binary classification model would not be enough on its own. Multinomial Logistic Regression has the capability to handle more advanced classification problems than Logistic Regression while still retaining the idea of probability based decisions, making it an effective machine learning model.
Be the first to leave a comment!