# The Importance of Optimisation in Machine Learning and Artificial Intelligence

As I am currently studying for the Certified Financial Data Scientist designation, I need to be familiar with optimisation in machine learning. Below is an overview of where and why optimisation is important. Note that error, loss and cost function all refer to the same thing: the quantity we are trying to reduce. Almost every machine learning algorithm has an optimisation algorithm at its core that minimises a cost function.

Optimisation function: An optimisation function is needed because a brute-force search over millions of weights is not efficient. In neural networks, the aim is to bring the error down to a minimum. Backpropagation computes the gradients efficiently; combined with an optimiser such as gradient descent, it performs an efficient search for optimal weights.
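As a rough sketch (with made-up toy data, a tanh hidden layer, and illustrative sizes), here is what that gradient search looks like in plain NumPy: backpropagation applies the chain rule to get the gradient of the loss with respect to each weight matrix, and gradient descent steps against it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = 2*x1 - x2 (target function assumed for illustration)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1]

# One hidden layer with tanh activation, randomly initialised
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8,))
lr = 0.05

losses = []
for _ in range(200):
    h = np.tanh(X @ W1)            # forward pass: hidden activations
    pred = h @ W2                  # forward pass: prediction
    err = pred - y
    losses.append(np.mean(err ** 2))
    # Backpropagation: chain rule from the loss back to each weight matrix
    grad_W2 = h.T @ err * (2 / len(y))
    grad_h = np.outer(err, W2) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ grad_h * (2 / len(y))
    # Gradient-descent update
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
```

The loss decreases over the iterations, which is the whole point: the search homes in on better weights without ever enumerating them.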

Optimiser: Aims to iteratively converge on parameters that produce better loss values. Given a loss value, the optimiser updates the model parameters to produce a better one. After many iterations without relevant improvement, the optimiser has converged and terminates.
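A minimal sketch of that convergence loop (toy one-dimensional loss; function names and tolerances are illustrative):

```python
def minimise(f, grad, x, lr=0.1, tol=1e-9, max_iter=10_000):
    """Iterate until the loss stops improving meaningfully (convergence)."""
    best = f(x)
    for i in range(max_iter):
        x = x - lr * grad(x)          # update parameters against the gradient
        loss = f(x)
        if best - loss < tol:         # no relevant improvement: converged
            return x, i
        best = loss
    return x, max_iter

# Example: minimise (x - 3)^2, whose optimum is at x = 3
x_star, n_iter = minimise(lambda x: (x - 3) ** 2,
                          lambda x: 2 * (x - 3),
                          x=0.0)
```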

tanh and sigmoid: During optimisation, the chain rule produces chains of derivatives through these activation functions. Aim: converge to the optimum.
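To make those chains concrete: backpropagation multiplies together one activation derivative per layer, and for sigmoid each factor is at most 0.25, so a long chain shrinks the gradient quickly (the chain length of 10 below is an assumption for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), which peaks at 0.25 when z = 0
z = 0.0
deriv = sigmoid(z) * (1 - sigmoid(z))

# A chain of 10 such factors (one per layer) shrinks the gradient fast
chain = deriv ** 10
```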

Asset allocation: Markowitz mean-variance optimisation (MVO), the core technique of Modern Portfolio Theory (MPT), is a quantitative asset allocation method that promotes diversification to balance risk and return in the portfolio.
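A small illustration of the optimisation at the heart of MVO: with an assumed (made-up) covariance matrix for three assets, the minimum-variance portfolio has the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1).

```python
import numpy as np

# Hypothetical covariance matrix for three assets (illustrative numbers)
cov = np.array([[0.10, 0.02, 0.04],
                [0.02, 0.08, 0.01],
                [0.04, 0.01, 0.12]])

ones = np.ones(3)
inv = np.linalg.inv(cov)
# Closed-form minimum-variance weights: w = inv(cov) @ 1, normalised to sum to 1
w = inv @ ones / (ones @ inv @ ones)
```

By construction these weights sum to one, and the resulting portfolio variance is no higher than, say, an equal-weighted portfolio's: that is the diversification benefit made explicit.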

Neural Network Loss Function Aim: Learn the set of model parameters that minimises the loss over the dataset (the argmin). How: Achieve the optimisation objective by minimising a loss function. Typical strategy: Initialise the weights randomly, then start optimising from there.
The optimisation method used will influence the number of iterations needed to reach convergence.
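For example, even just the learning rate changes how many iterations a simple gradient-descent run needs before it converges (toy quadratic loss; the rates and tolerance are illustrative):

```python
def iterations_to_converge(lr, tol=1e-8):
    """Count gradient-descent steps on f(x) = x^2 until the loss stalls."""
    x, prev = 10.0, float("inf")
    for i in range(100_000):
        x -= lr * 2 * x           # gradient of f(x) = x^2 is 2x
        if prev - x * x < tol:    # improvement below tolerance: converged
            return i
        prev = x * x
    return 100_000

fast = iterations_to_converge(lr=0.4)
slow = iterations_to_converge(lr=0.01)
```

The larger (but still stable) step size reaches the tolerance in far fewer iterations.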

Model Training: Gradient computation and optimisation, e.g. Stochastic Gradient Descent (SGD). Stochastic approximation methods such as SGD are iterative methods for optimisation problems.
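A sketch of SGD on synthetic linear data (the true coefficient, batch size and learning rate are assumptions for illustration): the defining feature is that each gradient is computed on a random mini-batch rather than the full dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 3x + noise (coefficient assumed for illustration)
X = rng.normal(size=(500, 1))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=500)

w = rng.normal()            # random initialisation
lr, batch = 0.05, 32

for epoch in range(20):
    idx = rng.permutation(len(X))         # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        # Stochastic gradient: computed on a mini-batch, not the full set
        grad = 2 * np.mean((X[b, 0] * w - y[b]) * X[b, 0])
        w -= lr * grad
```

After a few epochs the noisy updates average out and `w` sits close to the true coefficient.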

First-order iterative optimisers: Gradient descent, with the gradients computed by backpropagation.

Bayesian optimisation algorithm: Picks new samples based on the performance of previous samples, choosing candidates that are expected to improve the primary metric.
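The following is a greatly simplified, illustrative sketch of the sequential idea only (sampling near the best point found so far); real Bayesian optimisation instead fits a surrogate model, such as a Gaussian process, and maximises an acquisition function over it. The objective and step size here are made up.

```python
import random

def objective(x):
    # Hypothetical metric to maximise (illustrative)
    return -(x - 2.0) ** 2

random.seed(0)
best_x = random.uniform(-10, 10)   # initial random sample
best_val = objective(best_x)

# Each new candidate is informed by the best previous sample
for step in range(200):
    cand = best_x + random.gauss(0, 1.0)
    val = objective(cand)
    if val > best_val:             # keep candidates that improve the metric
        best_x, best_val = cand, val
```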

Hard to optimise: Deeper networks, even when they have fewer parameters than wider networks.

Linear function: Learning a linear function is easier because optimising its loss function usually reduces to a convex optimisation problem.
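For instance, the squared loss of a linear model is convex in the weights, so the unique optimum can be found directly via the normal equations rather than by iterative search (toy data; the true coefficients are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])          # assumed for illustration
y = X @ true_w + 0.01 * rng.normal(size=200)

# The squared loss of a linear model is convex in w, so the unique
# minimiser solves the normal equations: (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
```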

Support Vector Machines: Try to find the optimal separating hyperplane (think: maximum-margin principle), which is a comparatively simple, convex optimisation problem. Hyperparameter optimisation also applies.
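A sketch of that optimisation problem, assuming made-up linearly separable data: sub-gradient descent on the convex hinge loss of a linear SVM, which pushes the hyperplane towards the maximum-margin solution.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two linearly separable clusters in 2D (illustrative data)
n = 100
X = np.vstack([rng.normal(loc=-2, size=(n, 2)),
               rng.normal(loc=2, size=(n, 2))])
y = np.hstack([-np.ones(n), np.ones(n)])

w, b = np.zeros(2), 0.0
lr, C = 0.01, 1.0

# Sub-gradient descent on the (convex) regularised hinge loss:
# minimise ||w||^2 / 2 + C * mean(max(0, 1 - y * (w.x + b)))
for _ in range(500):
    margins = y * (X @ w + b)
    mask = margins < 1                      # points inside the margin
    grad_w = w - C * (mask * y) @ X / len(y)
    grad_b = -C * (mask * y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean(np.sign(X @ w + b) == y)
```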

Clustering: The choice of optimisation algorithm and clustering cost function used to solve this unsupervised learning problem determines the resulting clustering.

Expectation-Maximisation (EM) Algorithm: An approach to performing maximum-likelihood estimation in the presence of latent variables.
It first estimates the values of the latent variables, then optimises the model parameters, and repeats these two steps until convergence is achieved.
This means treating the problem as an optimisation or search problem: seeking the parameters that give the best fit for the joint probability of the data sample.
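A sketch of those two alternating steps for a two-component Gaussian mixture in one dimension (the data, initial means, equal mixing weights and fixed unit variances are all simplifying assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
# Data drawn from two Gaussians; component membership is the latent variable
x = np.hstack([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])

# Initial guesses for the two component means (variances fixed at 1)
mu = np.array([1.0, 4.0])

for _ in range(50):
    # E-step: estimate the latent assignments (soft "responsibilities")
    dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-optimise the means given those soft assignments
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
```

The means converge to roughly 0 and 5, the centres the data was actually drawn from.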

Quadratic programming: The problem of optimising a quadratic objective function, one of the simplest forms of non-linear programming.
Quadratic programs are a particular class of numerical optimisation and the first step beyond linear programming in convex optimisation. They are used to optimise financial portfolios.
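A minimal example of the quadratic-programming form, minimising f(x) = 0.5 xᵀQx + cᵀx (the matrix Q and vector c are made up; without constraints, setting the gradient to zero means the optimum solves Qx = -c):

```python
import numpy as np

# Convex QP objective: f(x) = 0.5 * x.Q.x + c.x with Q positive definite
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([-1.0, 1.0])

# Unconstrained optimum: gradient Qx + c = 0, so solve Qx = -c
x_star = np.linalg.solve(Q, -c)
```

Constrained versions of exactly this form (with a budget constraint on the weights) are what portfolio optimisers solve.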

K-Means Clustering: (where labels are unknown) Interleaved optimisation, alternating between assigning points to clusters and updating the cluster centroids.
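A sketch of that interleaving on made-up two-blob data: hold the centroids fixed to update the assignments, then hold the assignments fixed to update the centroids.

```python
import numpy as np

rng = np.random.default_rng(5)
# Two well-separated blobs; the algorithm never sees the true labels
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(4, 0.5, (50, 2))])

centroids = X[rng.choice(len(X), 2, replace=False)]

for _ in range(20):
    # Step 1: assign each point to its nearest centroid (centroids fixed)
    d = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
    labels = d.argmin(axis=1)
    # Step 2: move each centroid to the mean of its points (labels fixed),
    # keeping a centroid in place if its cluster happens to be empty
    centroids = np.array([X[labels == k].mean(axis=0)
                          if np.any(labels == k) else centroids[k]
                          for k in range(2)])
```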

Non-convex optimisation: Hard, because of potentially many local minima, saddle points, flat regions and varying curvature.
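A tiny illustration of why that is hard: on the double-well function f(x) = (x^2 - 1)^2, which has minima at both +1 and -1, gradient descent lands in a different minimum depending on the starting point.

```python
def grad(x):
    # Gradient of the double-well f(x) = (x^2 - 1)^2
    return 4 * x * (x ** 2 - 1)

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# The same algorithm, two different starts, two different minima
a = descend(0.5)     # rolls into the well at +1
b = descend(-0.5)    # rolls into the well at -1
```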

PyTorch: The library helps optimise and update the network parameters.
Stochastic Gradient Descent (SGD) optimisation can be used.
Optimiser: Updates the model parameters according to the degree of classification error.
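A minimal PyTorch training loop along those lines (the toy data and the single linear layer are assumptions for illustration; `nn.CrossEntropyLoss` measures the degree of classification error, and `torch.optim.SGD` performs the parameter updates):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy classification data: 2 features, 2 classes (assumed for illustration)
X = torch.randn(64, 2)
y = (X[:, 0] > 0).long()

model = nn.Linear(2, 2)                      # a minimal classifier
loss_fn = nn.CrossEntropyLoss()              # degree of classification error
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):
    optimizer.zero_grad()                    # clear old gradients
    loss = loss_fn(model(X), y)              # forward pass + loss
    loss.backward()                          # backpropagate gradients
    optimizer.step()                         # SGD parameter update

final_loss = loss.item()
```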

## Questions or feedback?

Feel free to write me at contact@zuberbuehler-associates.ch or add a comment.