# Gaussian Processes Tutorial

This short tutorial covers three topics: Gaussian processes, multi-fidelity modeling, and Gaussian processes for differential equations. The full code for this tutorial can be found here.

**Gaussian Processes**

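A Gaussian process

$$
f(x) \sim \mathcal{GP}\left(0, k(x, x'; \theta)\right)
$$

is just a shorthand notation for saying

$$
\begin{bmatrix} f(x) \\ f(x') \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k(x, x; \theta) & k(x, x'; \theta) \\ k(x', x; \theta) & k(x', x'; \theta) \end{bmatrix} \right).
$$

A typical choice for the covariance function $k(x, x'; \theta)$ is the squared exponential kernel

$$
k(x, x'; \theta) = \gamma^2 \exp\left( -\frac{1}{2} w^2 (x - x')^2 \right),
$$

where $\theta = (\gamma, w)$ are the hyper-parameters.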
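As a minimal sketch of the squared exponential kernel, the NumPy snippet below builds the covariance matrix on a grid and draws a few functions from the zero-mean prior. The function name `se_kernel`, the parametrization $\gamma^2 \exp(-\tfrac{1}{2} w^2 (x - x')^2)$, the grid, and the jitter term are illustrative assumptions and are not taken from the tutorial's code.

```python
import numpy as np

def se_kernel(x, x_prime, gamma=1.0, w=1.0):
    """Squared exponential kernel for 1-D inputs; returns a (len(x), len(x_prime)) matrix."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)              # column vector
    x_prime = np.asarray(x_prime, dtype=float).reshape(1, -1)  # row vector
    return gamma**2 * np.exp(-0.5 * w**2 * (x - x_prime) ** 2)

x = np.linspace(0.0, 1.0, 50)
K = se_kernel(x, x, gamma=1.0, w=5.0)

# Sample from N(0, K). The small jitter (an assumption) keeps the Cholesky
# factorization numerically stable, since K is nearly singular on a fine grid.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
samples = chol @ rng.standard_normal((len(x), 3))  # each column is one prior draw
```

Larger values of `w` give more rapidly varying samples, while `gamma` sets their overall amplitude.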

**Training**

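Given the training data $\{\mathbf{x}, \mathbf{y}\}$, we make the assumption that

$$
\mathbf{y} = f(\mathbf{x}) + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 I).
$$

Consequently, we obtain

$$
\mathbf{y} \sim \mathcal{N}(0, K),
$$

where $K = k(\mathbf{x}, \mathbf{x}; \theta) + \sigma^2 I$. The hyper-parameters $\theta$ and the noise variance parameter $\sigma^2$ can be trained by minimizing the resulting *negative log marginal likelihood*

$$
\mathcal{NLML}(\theta, \sigma^2) = \frac{1}{2} \mathbf{y}^T K^{-1} \mathbf{y} + \frac{1}{2} \log |K| + \frac{n}{2} \log(2\pi),
$$

where $n$ is the number of training points.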
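The negative log marginal likelihood of a zero-mean Gaussian with covariance $K$ can be evaluated stably through a Cholesky factorization. The sketch below assumes a squared exponential kernel with hyper-parameters `gamma` and `w`, a noise standard deviation `sigma`, and a log-parametrization that keeps all three positive; these names, the parametrization, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def se_kernel(x, x_prime, gamma, w):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = np.asarray(x, dtype=float).reshape(-1, 1) - np.asarray(x_prime, dtype=float).reshape(1, -1)
    return gamma**2 * np.exp(-0.5 * w**2 * d**2)

def nlml(log_params, x, y):
    """Negative log marginal likelihood; log_params = log of (gamma, w, sigma)."""
    gamma, w, sigma = np.exp(log_params)
    n = len(x)
    K = se_kernel(x, x, gamma, w) + sigma**2 * np.eye(n)
    chol = np.linalg.cholesky(K)                               # K = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # alpha = K^{-1} y
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(chol)))                    # equals 0.5 * log|K|
            + 0.5 * n * np.log(2.0 * np.pi))

# Hypothetical synthetic data: noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(20)

value = nlml(np.log([1.0, 5.0, 0.1]), x, y)
```

A function of this form can be handed to a generic optimizer such as `scipy.optimize.minimize` to train the hyper-parameters and the noise level.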

**Prediction**

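Prediction at a new test point $x^*$ can be made by first writing the joint distribution

$$
\begin{bmatrix} f(x^*) \\ \mathbf{y} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k(x^*, x^*; \theta) & k(x^*, \mathbf{x}; \theta) \\ k(\mathbf{x}, x^*; \theta) & K \end{bmatrix} \right),
$$

with $K = k(\mathbf{x}, \mathbf{x}; \theta) + \sigma^2 I$ the covariance matrix of the training outputs. One can then use the resulting conditional distribution to make predictions

$$
f(x^*) \mid \mathbf{y} \sim \mathcal{N}\left( k(x^*, \mathbf{x}) K^{-1} \mathbf{y},\; k(x^*, x^*) - k(x^*, \mathbf{x}) K^{-1} k(\mathbf{x}, x^*) \right).
$$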
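The conditional mean and variance can be computed with a few linear solves. The sketch below assumes a squared exponential kernel with fixed, hand-picked hyper-parameters and noise-free synthetic observations of $\sin(2\pi x)$; the function names and all numerical values are illustrative assumptions.

```python
import numpy as np

def se_kernel(x, x_prime, gamma, w):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = np.asarray(x, dtype=float).reshape(-1, 1) - np.asarray(x_prime, dtype=float).reshape(1, -1)
    return gamma**2 * np.exp(-0.5 * w**2 * d**2)

def gp_predict(x_star, x, y, gamma, w, sigma):
    """Posterior mean and variance of f at the test points x_star."""
    K = se_kernel(x, x, gamma, w) + sigma**2 * np.eye(len(x))
    q = se_kernel(x_star, x, gamma, w)                         # rows hold k(x*, x)
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # K^{-1} y
    mean = q @ alpha
    v = np.linalg.solve(chol, q.T)
    var = gamma**2 - np.sum(v**2, axis=0)  # k(x*, x*) = gamma^2 for this kernel
    return mean, var

x = np.linspace(0.0, 1.0, 15)
y = np.sin(2.0 * np.pi * x)
x_star = np.array([0.25, 0.5, 10.0])  # the last point lies far from all data
mean, var = gp_predict(x_star, x, y, gamma=1.0, w=5.0, sigma=0.05)
```

Far from the training data the prediction reverts to the prior: the mean returns to zero and the variance returns to `gamma**2`.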

**Illustrative Example**

The following figure depicts a Gaussian process fit to a synthetic dataset generated by random perturbations of a simple one-dimensional function.

**Multi-fidelity Gaussian Processes**

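Let us start by making the assumption that

$$
f_1(x) \sim \mathcal{GP}\left(0, k_1(x, x'; \theta_1)\right), \qquad f_2(x) \sim \mathcal{GP}\left(0, k_2(x, x'; \theta_2)\right)
$$

are two independent Gaussian processes. We model the low-fidelity function by $f_L(x) = f_1(x)$ and the high-fidelity function by

$$
f_H(x) = \rho f_1(x) + f_2(x).
$$

This will result in the following multi-output Gaussian process

$$
\begin{bmatrix} f_L(x) \\ f_H(x) \end{bmatrix} \sim \mathcal{GP}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_{LL}(x, x') & k_{LH}(x, x') \\ k_{HL}(x, x') & k_{HH}(x, x') \end{bmatrix} \right),
$$

where

$$
k_{LL} = k_1, \qquad k_{LH} = k_{HL} = \rho k_1, \qquad k_{HH} = \rho^2 k_1 + k_2.
$$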
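To make the multi-output covariance concrete, the sketch below assembles it for the common autoregressive construction $f_L = f_1$, $f_H = \rho f_1 + f_2$ with independent $f_1$ and $f_2$, which gives the blocks $k_{LL} = k_1$, $k_{LH} = k_{HL} = \rho k_1$, and $k_{HH} = \rho^2 k_1 + k_2$. This construction, the squared exponential choice for both $k_1$ and $k_2$, and all names and values are assumptions made for illustration.

```python
import numpy as np

def se_kernel(x, x_prime, gamma, w):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = np.asarray(x, dtype=float).reshape(-1, 1) - np.asarray(x_prime, dtype=float).reshape(1, -1)
    return gamma**2 * np.exp(-0.5 * w**2 * d**2)

def multifidelity_cov(x_L, x_H, theta_1, theta_2, rho):
    """Covariance of [f_L(x_L); f_H(x_H)] for f_L = f_1 and f_H = rho * f_1 + f_2."""
    K_LL = se_kernel(x_L, x_L, *theta_1)
    K_LH = rho * se_kernel(x_L, x_H, *theta_1)
    K_HH = rho**2 * se_kernel(x_H, x_H, *theta_1) + se_kernel(x_H, x_H, *theta_2)
    return np.block([[K_LL, K_LH], [K_LH.T, K_HH]])

x_L = np.linspace(0.0, 1.0, 8)    # many cheap low-fidelity locations
x_H = np.array([0.1, 0.5, 0.9])   # few expensive high-fidelity locations
K = multifidelity_cov(x_L, x_H, theta_1=(1.0, 4.0), theta_2=(0.5, 2.0), rho=2.0)
```

The off-diagonal block is what lets the plentiful low-fidelity data inform predictions of the high-fidelity function.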

**Training**

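Given the training data $\{\mathbf{x}_L, \mathbf{y}_L\}$ and $\{\mathbf{x}_H, \mathbf{y}_H\}$, we make the assumption that

$$
\mathbf{y}_L = f_L(\mathbf{x}_L) + \boldsymbol{\epsilon}_L, \qquad \boldsymbol{\epsilon}_L \sim \mathcal{N}(0, \sigma_L^2 I),
$$

and

$$
\mathbf{y}_H = f_H(\mathbf{x}_H) + \boldsymbol{\epsilon}_H, \qquad \boldsymbol{\epsilon}_H \sim \mathcal{N}(0, \sigma_H^2 I).
$$

Consequently, we obtain

$$
\mathbf{y} \sim \mathcal{N}(0, K),
$$

where

$$
\mathbf{y} = \begin{bmatrix} \mathbf{y}_L \\ \mathbf{y}_H \end{bmatrix}
$$

and

$$
K = \begin{bmatrix} k_{LL}(\mathbf{x}_L, \mathbf{x}_L) + \sigma_L^2 I & k_{LH}(\mathbf{x}_L, \mathbf{x}_H) \\ k_{HL}(\mathbf{x}_H, \mathbf{x}_L) & k_{HH}(\mathbf{x}_H, \mathbf{x}_H) + \sigma_H^2 I \end{bmatrix},
$$

with $k_{LL}$, $k_{LH}$, $k_{HL}$, and $k_{HH}$ the blocks of the multi-output covariance function. The hyper-parameters $\{\theta_1, \theta_2, \rho\}$ and the noise variance parameters $\{\sigma_L^2, \sigma_H^2\}$ can be trained by minimizing the resulting *negative log marginal likelihood*

$$
\mathcal{NLML}(\theta_1, \theta_2, \rho, \sigma_L^2, \sigma_H^2) = \frac{1}{2} \mathbf{y}^T K^{-1} \mathbf{y} + \frac{1}{2} \log |K| + \frac{n_L + n_H}{2} \log(2\pi),
$$

where $n_L$ and $n_H$ are the numbers of low- and high-fidelity training points.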

**Prediction**

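Prediction at a new test point $x^*$ can be made by first writing the joint distribution

$$
\begin{bmatrix} f_H(x^*) \\ \mathbf{y} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_{HH}(x^*, x^*) & q^T \\ q & K \end{bmatrix} \right),
$$

where

$$
q^T = \begin{bmatrix} k_{HL}(x^*, \mathbf{x}_L) & k_{HH}(x^*, \mathbf{x}_H) \end{bmatrix}
$$

and $K$ is the covariance matrix of the stacked training outputs $\mathbf{y}$. One can then use the resulting conditional distribution to make predictions

$$
f_H(x^*) \mid \mathbf{y} \sim \mathcal{N}\left( q^T K^{-1} \mathbf{y},\; k_{HH}(x^*, x^*) - q^T K^{-1} q \right).
$$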
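The sketch below carries out this conditioning for the high-fidelity function, assuming the autoregressive construction $f_H = \rho f_1 + f_2$ with squared exponential kernels for $f_1$ and $f_2$. The hyper-parameters are fixed by hand rather than trained, and the two synthetic functions, names, and values are hypothetical.

```python
import numpy as np

def se_kernel(x, x_prime, gamma, w):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = np.asarray(x, dtype=float).reshape(-1, 1) - np.asarray(x_prime, dtype=float).reshape(1, -1)
    return gamma**2 * np.exp(-0.5 * w**2 * d**2)

def mf_predict(x_star, x_L, y_L, x_H, y_H, theta_1, theta_2, rho, sigma_L, sigma_H):
    """Posterior mean and variance of the high-fidelity function at x_star."""
    k1 = lambda a, b: se_kernel(a, b, *theta_1)
    k2 = lambda a, b: se_kernel(a, b, *theta_2)
    K_LL = k1(x_L, x_L) + sigma_L**2 * np.eye(len(x_L))
    K_LH = rho * k1(x_L, x_H)
    K_HH = rho**2 * k1(x_H, x_H) + k2(x_H, x_H) + sigma_H**2 * np.eye(len(x_H))
    K = np.block([[K_LL, K_LH], [K_LH.T, K_HH]])
    y = np.concatenate([y_L, y_H])
    # Rows of q hold [k_HL(x*, x_L), k_HH(x*, x_H)].
    q = np.hstack([rho * k1(x_star, x_L), rho**2 * k1(x_star, x_H) + k2(x_star, x_H)])
    mean = q @ np.linalg.solve(K, y)
    prior_var = rho**2 * theta_1[0]**2 + theta_2[0]**2      # k_HH(x*, x*)
    var = prior_var - np.sum(q * np.linalg.solve(K, q.T).T, axis=1)
    return mean, var

# Hypothetical data: f_L = sin(2*pi*x) and f_H = 2 * f_L + x.
x_L = np.linspace(0.0, 1.0, 20)
y_L = np.sin(2.0 * np.pi * x_L)
x_H = np.array([0.0, 0.3, 0.6, 1.0])
y_H = 2.0 * np.sin(2.0 * np.pi * x_H) + x_H
x_star = np.linspace(0.0, 1.0, 50)
mean, var = mf_predict(x_star, x_L, y_L, x_H, y_H,
                       theta_1=(1.0, 5.0), theta_2=(1.0, 1.0), rho=2.0,
                       sigma_L=0.05, sigma_H=0.05)
```

Only four high-fidelity observations are used here; the low-fidelity data supply the oscillatory structure through the shared process $f_1$.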

**Illustrative Example**

The following figure depicts a multi-fidelity Gaussian process fit to a synthetic dataset generated by random perturbations of two simple one-dimensional functions.

**Machine Learning of Linear Differential Equations using Gaussian Processes**

A grand challenge with great opportunities facing researchers is to develop a coherent framework that enables them to blend differential equations with the vast data sets available in many fields of science and engineering. In particular, here we investigate governing equations of the form

$$
\mathcal{L}^{\phi}_x u(x) = f(x),
$$

where $u(x)$ is the unknown solution to a differential equation defined by the operator $\mathcal{L}^{\phi}_x$, $f(x)$ is a black-box forcing term, and $x$ is a vector that can include space, time, or parameter coordinates. In other words, the relationship between $u(x)$ and $f(x)$ can be expressed as

$$
u(x) \xrightarrow{\;\mathcal{L}^{\phi}_x\;} f(x).
$$

**Prior**

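The proposed data-driven algorithm for learning general parametric linear equations of the form presented above employs Gaussian process priors that are tailored to the corresponding differential operators. Specifically, the algorithm starts by making the assumption that $u(x)$ is a Gaussian process with mean $0$ and covariance function $k_{uu}(x, x'; \theta)$, i.e.,

$$
u(x) \sim \mathcal{GP}\left(0, k_{uu}(x, x'; \theta)\right),
$$

where $\theta$ denotes the hyper-parameters of the kernel $k_{uu}$. The key observation here is that any linear transformation of a Gaussian process, such as differentiation and integration, is still a Gaussian process. Consequently,

$$
\mathcal{L}^{\phi}_x u(x) = f(x) \sim \mathcal{GP}\left(0, k_{ff}(x, x'; \theta, \phi)\right),
$$

with the following fundamental relationship between the kernels $k_{uu}$ and $k_{ff}$,

$$
k_{ff}(x, x'; \theta, \phi) = \mathcal{L}^{\phi}_x \mathcal{L}^{\phi}_{x'} k_{uu}(x, x'; \theta).
$$

Moreover, the covariance between $u(x)$ and $f(x')$, and similarly the one between $f(x)$ and $u(x')$, are given by $k_{uf}(x, x'; \theta, \phi) = \mathcal{L}^{\phi}_{x'} k_{uu}(x, x'; \theta)$, and $k_{fu}(x, x'; \theta, \phi) = \mathcal{L}^{\phi}_x k_{uu}(x, x'; \theta)$, respectively.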
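As a worked illustration of how the operator is pushed through the kernel, consider the hypothetical first-order operator $\mathcal{L}^{\phi}_x u = \frac{du}{dx} + \phi u$ together with a squared exponential kernel $k_{uu} = \gamma^2 \exp(-\tfrac{1}{2} w^2 (x - x')^2)$. This operator and kernel are assumptions chosen only because the derived kernels then have short closed forms; the last lines compare one of them against a finite-difference approximation.

```python
import numpy as np

def k_uu(x, xp, gamma, w):
    """Squared exponential prior kernel on u (works elementwise on arrays)."""
    return gamma**2 * np.exp(-0.5 * w**2 * (x - xp) ** 2)

def k_uf(x, xp, gamma, w, phi):
    """cov(u(x), f(x')): the operator applied in x', d k_uu / d x' + phi * k_uu."""
    return (w**2 * (x - xp) + phi) * k_uu(x, xp, gamma, w)

def k_fu(x, xp, gamma, w, phi):
    """cov(f(x), u(x')): the operator applied in x, d k_uu / d x + phi * k_uu."""
    return (-w**2 * (x - xp) + phi) * k_uu(x, xp, gamma, w)

def k_ff(x, xp, gamma, w, phi):
    """cov(f(x), f(x')): the operator applied in both x and x'."""
    r = x - xp
    return (w**2 * (1.0 - w**2 * r**2) + phi**2) * k_uu(x, xp, gamma, w)

# Finite-difference check of k_uf at one pair of points (values are arbitrary).
x, xp, gamma, w, phi, h = 0.3, 0.7, 1.3, 2.0, 0.5, 1e-5
fd_uf = ((k_uu(x, xp + h, gamma, w) - k_uu(x, xp - h, gamma, w)) / (2.0 * h)
         + phi * k_uu(x, xp, gamma, w))
error_uf = abs(fd_uf - k_uf(x, xp, gamma, w, phi))
```

Note that the operator parameter `phi` now appears inside `k_uf`, `k_fu`, and `k_ff`, which is what allows it to be inferred from data alongside the kernel hyper-parameters.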

**Training**

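The hyper-parameters $\theta$ and, more importantly, the parameters $\phi$ of the linear operator $\mathcal{L}^{\phi}_x$ can be trained by employing the quasi-Newton optimizer L-BFGS to minimize the negative log marginal likelihood

$$
\mathcal{NLML}(\theta, \phi, \sigma_{n_u}^2, \sigma_{n_f}^2) = -\log p(\mathbf{y} \mid \phi, \theta, \sigma_{n_u}^2, \sigma_{n_f}^2),
$$

where $\mathbf{y} = \begin{bmatrix} \mathbf{y}_u \\ \mathbf{y}_f \end{bmatrix}$ collects the noisy observations of $u$ at the points $X_u$ and of $f$ at the points $X_f$, $p(\mathbf{y} \mid \phi, \theta, \sigma_{n_u}^2, \sigma_{n_f}^2) = \mathcal{N}(0, K)$, and $K$ is given by

$$
K = \begin{bmatrix} k_{uu}(X_u, X_u; \theta) + \sigma_{n_u}^2 I & k_{uf}(X_u, X_f; \theta, \phi) \\ k_{fu}(X_f, X_u; \theta, \phi) & k_{ff}(X_f, X_f; \theta, \phi) + \sigma_{n_f}^2 I \end{bmatrix}.
$$

Here, $\sigma_{n_u}^2$ and $\sigma_{n_f}^2$ are included to capture noise in the data and are also inferred from the data.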
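The following sketch shows the training step on a toy problem, using SciPy's `L-BFGS-B` method as a stand-in for L-BFGS. It assumes the hypothetical operator $\mathcal{L}^{\phi}_x u = \frac{du}{dx} + \phi u$ with a squared exponential kernel, and, as a deliberate simplification, it holds the kernel hyper-parameters and noise levels fixed and optimizes only the operator parameter $\phi$. The synthetic data use $u = \sin x$ and a made-up true value $\phi = 2$.

```python
import numpy as np
from scipy.optimize import minimize

GAMMA, W = 1.0, 1.0            # kernel hyper-parameters, held fixed in this sketch
SIGMA_U, SIGMA_F = 0.01, 0.01  # noise standard deviations, also held fixed

def k_uu(x, xp):
    return GAMMA**2 * np.exp(-0.5 * W**2 * (x - xp) ** 2)

def build_K(phi, X_u, X_f):
    """Joint covariance of [u(X_u); f(X_f)] for the toy operator L u = u' + phi * u."""
    r_uf = X_u[:, None] - X_f[None, :]
    r_ff = X_f[:, None] - X_f[None, :]
    K_uu = k_uu(X_u[:, None], X_u[None, :]) + SIGMA_U**2 * np.eye(len(X_u))
    K_uf = (W**2 * r_uf + phi) * k_uu(X_u[:, None], X_f[None, :])
    K_ff = ((W**2 * (1.0 - W**2 * r_ff**2) + phi**2) * k_uu(X_f[:, None], X_f[None, :])
            + SIGMA_F**2 * np.eye(len(X_f)))
    return np.block([[K_uu, K_uf], [K_uf.T, K_ff]])

def nlml(params, X_u, y_u, X_f, y_f):
    """Negative log marginal likelihood as a function of params = [phi]."""
    K = build_K(params[0], X_u, X_f)
    y = np.concatenate([y_u, y_f])
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # K^{-1} y
    return (0.5 * y @ alpha + np.sum(np.log(np.diag(chol)))
            + 0.5 * len(y) * np.log(2.0 * np.pi))

# Synthetic data: u = sin(x), so f = u' + phi * u = cos(x) + phi * sin(x).
PHI_TRUE = 2.0
X_u = np.linspace(0.0, 2.0 * np.pi, 15)
X_f = np.linspace(0.5, 2.0 * np.pi - 0.5, 10)
y_u = np.sin(X_u)
y_f = np.cos(X_f) + PHI_TRUE * np.sin(X_f)

result = minimize(nlml, x0=np.array([0.0]), args=(X_u, y_u, X_f, y_f),
                  method="L-BFGS-B", bounds=[(-5.0, 5.0)])
phi_hat = result.x[0]  # estimate of the operator parameter
```

In a fuller treatment the kernel hyper-parameters and both noise variances would be appended to `params` and optimized jointly, typically in log form to keep them positive.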

**Prediction**

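Having trained the model, one can predict the values $u(x)$ and $f(x)$ at a new test point $x$ by writing the posterior distributions

$$
p(u(x) \mid \mathbf{y}) = \mathcal{N}\left(\overline{u}(x), s_u^2(x)\right), \qquad p(f(x) \mid \mathbf{y}) = \mathcal{N}\left(\overline{f}(x), s_f^2(x)\right),
$$

with

$$
\overline{u}(x) = q_u^T K^{-1} \mathbf{y}, \qquad s_u^2(x) = k_{uu}(x, x) - q_u^T K^{-1} q_u,
$$

$$
\overline{f}(x) = q_f^T K^{-1} \mathbf{y}, \qquad s_f^2(x) = k_{ff}(x, x) - q_f^T K^{-1} q_f,
$$

where

$$
q_u^T = \begin{bmatrix} k_{uu}(x, X_u) & k_{uf}(x, X_f) \end{bmatrix}, \qquad q_f^T = \begin{bmatrix} k_{fu}(x, X_u) & k_{ff}(x, X_f) \end{bmatrix}.
$$

Note that, for notational convenience, the dependence of kernels on hyper-parameters and other parameters is dropped. The posterior variances $s_u^2(x)$ and $s_f^2(x)$ can be used as good indicators of how confident one could be about predictions made based on the learned parameters $\phi$.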

**Example: Fractional Equation**

Consider the one-dimensional fractional equation

where is the fractional order of the operator that is defined in the Riemann-Liouville sense. Fractional operators often arise in modeling anomalous diffusion processes and other non-local interactions. Their non-local behavior poses serious computational challenges, as it involves costly convolution operations for resolving the underlying non-Markovian dynamics. However, the machine learning approach pursued in this work bypasses the need for numerical discretization, overcomes these computational bottlenecks, and can seamlessly handle all such linear cases without any modifications. The algorithm learns the parameter to have value of .

**Example: Heat Equation**

This example is chosen to highlight the ability of the proposed framework to handle time-dependent problems using only scattered space-time observations. To this end, consider the heat equation

Here, . The algorithm learns the parameter to have value of .

**Further Reading**

For more information please refer to paper1 and paper2. The code for these two papers is publicly available on GitHub.

All data and code for this tutorial are publicly available on GitHub.