Academic Syntax

Top 10 Most Commonly Used Elementary Math Formulas in Machine Learning

Published: May 24, 2026 Read Time: 8 mins

Mathematics is the bedrock of machine learning algorithms. Whether you are building simple linear models or deep neural networks, understanding foundational equations from algebra, calculus, and statistics is crucial.

When writing research papers, documentation, or technical blogs, LaTeX is the standard way to typeset these expressions cleanly. This guide collects 10 essential elementary formulas used in machine learning, each shown in rendered form followed by its ready-to-copy LaTeX source.


1. Simple Linear Regression Model

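Models the linear relationship between a scalar dependent variable y and a single explanatory variable x, where w_1 is the slope and w_0 is the intercept. With several explanatory variables, the model becomes multiple linear regression.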

$$y = w_1x + w_0$$
y = w_1x + w_0
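
The sketch below puts the model to work in Python, using only the standard library. The function names `predict` and `fit_least_squares` are illustrative and do not come from any library. Fitting w_1 and w_0 with closed-form ordinary least squares is one common choice; the formula itself does not prescribe a fitting method.

```python
def predict(x: float, w1: float, w0: float) -> float:
    """Simple linear regression prediction: y = w1 * x + w0."""
    return w1 * x + w0


def fit_least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form ordinary least squares estimates of the slope w1 and intercept w0."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of co-deviations of x and y divided by sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    # Intercept: the least-squares line passes through the point (x_mean, y_mean)
    w0 = y_mean - w1 * x_mean
    return w1, w0


# Usage: points lying exactly on y = 2x + 1
w1, w0 = fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(w1, w0)              # 2.0 1.0
print(predict(5, w1, w0))  # 11.0
```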

2. Mean Squared Error (MSE)

The standard loss function for regression tasks with continuous targets. It averages the squared residuals between the true values y_i and the predictions ŷ_i over n samples.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
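
The following minimal sketch translates the MSE formula directly into Python. The name `mean_squared_error` is a hypothetical helper. scikit-learn has a function with the same name, but this is not its implementation.

```python
def mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average of the squared residuals (y_i - y_hat_i)^2 over n samples."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


print(mean_squared_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.375
```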

3. Sigmoid Activation Function

Maps any real number z into the open interval (0, 1), so its output can be read as a probability. In logistic regression for binary classification, it turns a linear score into a class probability.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
\sigma(z) = \frac{1}{1 + e^{-z}}
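
Evaluating the formula literally can overflow. For a large negative z, Python's `math.exp(-z)` raises `OverflowError`. The sketch below uses a common numerically stable variant, which is assumed here rather than taken from this article. When z is negative, it switches to the algebraically equivalent form e^z / (1 + e^z).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z}), computed without overflow."""
    if z >= 0:
        # e^{-z} <= 1 here, so the textbook formula is safe
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0, multiply numerator and denominator by e^{z}:
    # 1 / (1 + e^{-z}) == e^{z} / (1 + e^{z}), and e^{z} < 1 cannot overflow
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0 (the literal formula would raise OverflowError)
```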

4. Softmax Function

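Generalizes the sigmoid to K classes. It converts a vector of raw scores z (logits) into a probability distribution whose K entries are positive and sum to 1, and it is the usual output layer of multi-class neural network classifiers.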

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
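
Here is a minimal Python sketch of softmax, assuming the scores arrive as a plain list. The code subtracts the largest score before exponentiating. This is a standard stability trick that the formula does not show. The shift cancels between numerator and denominator, so the probabilities are unchanged.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Turn K raw scores into K probabilities that sum to 1."""
    # Shifting every score by the maximum cancels in the ratio e^{z_i} / sum_j e^{z_j},
    # but keeps all exponents <= 0 so math.exp cannot overflow
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
# With two scores [z, 0], the first probability equals sigmoid(z) = 1 / (1 + e^{-z})
print(softmax([1.5, 0.0])[0], 1.0 / (1.0 + math.exp(-1.5)))
```

The two-score case in the last line shows the sense in which softmax generalizes the sigmoid.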

5. Gradient Descent Weight Update

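The core update rule for training. Each weight w takes a step of size α (the learning rate) in the direction opposite to the gradient of the loss L. For a suitably small α, repeating this step iteratively reduces the loss.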

$$w := w - \alpha \frac{\partial L}{\partial w}$$
w := w - \alpha \frac{\partial L}{\partial w}
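
Applying the update rule to a specific loss makes it concrete. The sketch below assumes the loss L is the MSE of the simple linear regression model from section 1, and it applies w := w - α ∂L/∂w to both w_1 and w_0. The learning rate α = 0.05, the step count, and the zero initialization are illustrative choices, not recommendations. The function name is hypothetical.

```python
def gradient_descent(xs: list[float], ys: list[float],
                     alpha: float = 0.05, steps: int = 2000) -> tuple[float, float]:
    """Fit y = w1*x + w0 by minimizing MSE with full-batch gradient descent."""
    w1, w0 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals y_i - y_hat_i under the current weights
        residuals = [y - (w1 * x + w0) for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual^2) w.r.t. w1 and w0
        grad_w1 = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_w0 = -2.0 / n * sum(residuals)
        # Update rule w := w - alpha * dL/dw; both gradients were computed
        # before either weight changed, so the update is simultaneous
        w1 -= alpha * grad_w1
        w0 -= alpha * grad_w0
    return w1, w0


w1, w0 = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(w1, 6), round(w0, 6))  # 2.0 1.0
```

If α is too large, the iterates diverge instead of converging. For this small dataset, values above roughly 0.12 already overshoot.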

6. Euclidean Distance (L2 Norm)

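Measures the straight-line distance between two points p and q in n-dimensional space. It equals the L2 norm of their difference, p - q, and is the standard distance metric in k-means clustering and k-nearest neighbors (KNN) classifiers.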

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
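
Below is a minimal Python sketch of the distance. It also includes a hypothetical `nearest_centroid` helper that shows how k-means assigns a point to its closest cluster center. Python 3.8 and later also provide `math.dist`, which computes the same Euclidean distance.

```python
import math


def euclidean_distance(p: list[float], q: list[float]) -> float:
    """Straight-line (L2) distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def nearest_centroid(point: list[float], centroids: list[list[float]]) -> int:
    """Hypothetical helper: index of the closest centroid (the k-means assignment step)."""
    return min(range(len(centroids)), key=lambda i: euclidean_distance(point, centroids[i]))


print(euclidean_distance([0, 0], [3, 4]))          # 5.0
print(nearest_centroid([1, 1], [[0, 0], [5, 5]]))  # 0
```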

7. Binary Cross-Entropy Loss

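The standard loss function for binary classification. It compares true labels y_i ∈ {0, 1} with predicted probabilities ŷ_i, and it penalizes confident wrong predictions far more heavily than uncertain ones.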

$$L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$
L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]
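
In code, the logarithms need protection. A predicted probability of exactly 0 or 1 would make `math.log` raise a `ValueError`. The sketch below clips probabilities into [eps, 1 - eps] before taking logs. The clipping step and the value of `eps` are implementation assumptions and are not part of the formula.

```python
import math


def binary_cross_entropy(y_true: list[int], y_prob: list[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Assumed safeguard: clip p into [eps, 1 - eps] so log() never sees 0
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # about 0.105: confident and correct
print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # about 2.303: confident and wrong
```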

8. Pearson Correlation Coefficient

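Measures the strength and direction of the linear relationship between two variables X and Y. The coefficient ranges from -1 (perfect negative) to 1 (perfect positive), and it is often used in exploratory data analysis and feature selection, for example to spot redundant, highly correlated features.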

$$\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}$$
\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}
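
The following minimal Python sketch computes the coefficient. The covariance and both standard deviations carry the same 1/n (or 1/(n-1)) factor, so those factors cancel and the code can work with plain sums. The function name is illustrative. Python 3.10 and later also offer `statistics.correlation`.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # The shared normalization factor cancels in the ratio, so raw sums suffice
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_mean) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_mean) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0 (perfect positive)
print(pearson([1, 2, 3], [3, 2, 1]))        # close to -1.0 (perfect negative)
```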

9. Gaussian (Normal) Distribution PDF

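Gives the probability density of a continuous random variable that is normally distributed with mean μ and standard deviation σ. Gaussian Naive Bayes uses it to model the likelihood of each continuous feature given a class.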

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
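
Here is a direct Python translation of the density. The defaults μ = 0 and σ = 1 (the standard normal) are an illustrative choice, and `normal_pdf` is a hypothetical name. The standard library's `statistics.NormalDist(mu, sigma).pdf(x)` (Python 3.8+) computes the same value and serves as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


print(normal_pdf(0.0))  # about 0.3989, i.e. 1 / sqrt(2*pi)
# Should agree with the standard library up to floating-point rounding
print(normal_pdf(1.3, mu=0.5, sigma=2.0), NormalDist(0.5, 2.0).pdf(1.3))
```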

10. Bayes' Theorem

Computes the posterior probability P(A|B) from the likelihood P(B|A), the prior P(A), and the evidence P(B). It is the foundation of Naive Bayes classifiers and Bayesian inference.

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
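
In practice, P(B) is often not given directly. The sketch below assumes that the likelihood of B under "not A" is known, and it computes P(B) by the law of total probability before applying the theorem. The function name is hypothetical. The diagnostic-test numbers in the usage line are also hypothetical: a condition with 1% prevalence, a test with 99% sensitivity, and a 5% false-positive rate.

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    # Evidence: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate
print(posterior(p_b_given_a=0.99, p_a=0.01, p_b_given_not_a=0.05))  # about 0.1667
```

Even with an accurate test, a positive result here implies only about a one-in-six chance of having the condition, because the prior P(A) is so small.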
