Academic Syntax

Top 10 Most Commonly Used Elementary Math Formulas in Machine Learning

Published: May 24, 2026 • Read Time: 8 mins

Mathematics is the bedrock of machine learning algorithms. Whether you are building simple linear models or deep neural networks, understanding foundational equations from algebra, calculus, and statistics is crucial.

When writing research papers, documentation, or technical blogs, these expressions are usually typeset with LaTeX. This guide collects 10 elementary formulas that appear frequently in machine learning, each shown in rendered form alongside a ready-to-copy LaTeX source block.

1. Simple Linear Regression Model

Models the linear relationship between a scalar dependent variable y and a single explanatory variable x, with slope (weight) w_1 and intercept (bias) w_0.

$$y = w_1x + w_0$$

LaTeX Source:

y = w_1x + w_0
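As a minimal sketch of how the model is evaluated, the function below computes a prediction for one input. The name `predict` and the parameter values are hypothetical and chosen only for illustration.

```python
def predict(x, w1, w0):
    """Return y = w1 * x + w0 for a single input x."""
    return w1 * x + w0


# Hypothetical slope and intercept, chosen only for illustration.
y = predict(3.0, w1=2.0, w0=1.0)
print(y)  # 7.0
```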

2. Mean Squared Error (MSE)

A standard loss function for regression tasks: the average of the squared residuals between the true values and the predictions over n samples.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

LaTeX Source:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
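The formula translates almost directly into code. The sketch below assumes the targets and predictions arrive as two equal-length Python lists; the function name and the sample values are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Residuals are 1 and -2, so the squared residuals are 1 and 4.
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])
print(mse)  # 2.5
```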

3. Sigmoid Activation Function

Maps any real number into the open interval (0, 1), so the output can be read as a probability. It is the function at the heart of binary logistic regression.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

LaTeX Source:

\sigma(z) = \frac{1}{1 + e^{-z}}
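A direct implementation of the formula can overflow for large negative z, because e^{-z} becomes too large for a float. The sketch below is one common way around that (an implementation choice, not part of the formula): it uses the algebraically equivalent form e^z / (1 + e^z) when z is negative.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written so that math.exp never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(sigmoid(0.0))  # 0.5
```

In floating point the result can round to exactly 0.0 or 1.0 for extreme inputs, even though the mathematical function never reaches those values.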

4. Softmax Function

Generalizes the logistic sigmoid to K classes: it turns a vector of K real-valued scores (logits) into a probability distribution whose entries are positive and sum to 1.

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

LaTeX Source:

\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
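The sketch below computes softmax over a plain Python list. Subtracting the maximum score before exponentiating is an added numerical-stability step, not part of the formula; it leaves the result unchanged because the common factor cancels in the ratio.

```python
import math


def softmax(z):
    """Convert a list of scores into probabilities that sum to 1."""
    m = max(z)  # shift by the max so every exponent is <= 0
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```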

5. Gradient Descent Weight Update

The basic optimization step used to iteratively minimize a loss function L during training: each parameter w moves against its gradient, scaled by the learning rate α.

$$w := w - \alpha \frac{\partial L}{\partial w}$$

LaTeX Source:

w := w - \alpha \frac{\partial L}{\partial w}
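To see the update rule in action, the sketch below applies it repeatedly to a toy loss L(w) = (w - 3)^2, whose gradient is 2(w - 3). The loss, starting point, learning rate, and step count are all assumptions made for illustration.

```python
def gradient_descent(grad, w, alpha, steps):
    """Apply w := w - alpha * dL/dw for a fixed number of steps."""
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w


def toy_grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, w=0.0, alpha=0.1, steps=100)
print(w_final)  # approaches the minimizer w = 3
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8, so the iterate converges; a learning rate that is too large would make the same loop diverge.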

6. Euclidean Distance (L2 Norm)

Computes the straight-line distance between two points p and q in n-dimensional space, that is, the L2 norm of their difference. It underlies standard K-Means clustering and is a common choice for KNN classifiers.

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

LaTeX Source:

d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
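A minimal sketch of the distance for two points given as equal-length lists of coordinates. The function name is illustrative; Python 3.8 and later also ship `math.dist`, which computes the same quantity.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Classic 3-4-5 right triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```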

7. Binary Cross-Entropy Loss

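Measures how far predicted probabilities are from binary labels (0 or 1), penalizing confident wrong predictions most heavily. It is the standard loss for binary classifiers such as logistic regression.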

$$L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$

LaTeX Source:

L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]
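The sketch below evaluates the loss for lists of labels and predicted probabilities. Clipping each probability into [eps, 1 - eps] is an added safeguard against log(0), not part of the formula, and the default `eps` is an arbitrary assumption.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 and 1
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical labels with confident-correct and confident-wrong predictions.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(good, bad)
```

The second call returns a much larger loss than the first, which reflects how the logarithm penalizes confident mistakes.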

8. Pearson Correlation Coefficient

Measures the strength and direction of the linear relationship between two variables X and Y, ranging from -1 to 1. It is commonly used in exploratory analysis and feature selection.

$$\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}$$

LaTeX Source:

\rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}
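For sample data the coefficient can be estimated as sketched below. The 1/n factors in the covariance and the standard deviations cancel, so the code works with raw sums; the function name and the data are illustrative.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# ys is an exact linear function of xs, so the correlation is 1.
r = pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(r)  # 1.0
```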

9. Gaussian (Normal) Distribution PDF

Gives the probability density of a continuous variable with mean μ and standard deviation σ. It appears throughout machine learning, for example as the class-conditional density in Gaussian Naive Bayes.

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

LaTeX Source:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
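A minimal sketch of the density as a function, with defaults corresponding to the standard normal (mean 0, standard deviation 1). The function name and the defaults are assumptions for illustration.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The standard normal peaks at x = 0 with height 1 / sqrt(2 * pi).
print(gaussian_pdf(0.0))
```

Note that the return value is a density, not a probability, so it can exceed 1 when sigma is small.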

10. Bayes' Theorem

Computes the posterior probability of an event A given observed evidence B, from the likelihood of B given A, the prior probability of A, and the overall probability of B.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

LaTeX Source:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
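A short worked example may help. The sketch below uses a hypothetical diagnostic test, where A is "has the condition" and B is "test is positive"; all three input probabilities are made up for illustration. The denominator P(B) is expanded with the law of total probability, which is an extra step beyond the formula itself.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A), and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays around 16% because the prior is so small.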
