A General Framework for Uncertainty Estimation in Deep Learning

Introduction and Background

This paper combines Bayesian belief networks and Monte-Carlo sampling for uncertainty estimation. The proposed framework is designed to have the following properties:

  1. Agnostic to the network architecture and task.
  2. It doesn’t require changes in the optimization process.
  3. It doesn’t require re-training of already trained architectures.

Prediction uncertainty arises from two sources:

  1. Data uncertainty - It is generated by noise in the data.
  2. Model uncertainty - It is generated by imbalances in the training data, i.e., some types of samples occurring more frequently than others.

Traditional approaches for uncertainty estimation model the network weights and activations by parametric probability distributions, similar to Bayesian Neural Networks. Such models are difficult to train and are rarely used in robotics applications. Other approaches estimate uncertainty through sampling. However, these do not take into account the relationship between data and model uncertainty: an image with higher noise should have higher model uncertainty than one with lower noise.

Drawbacks of some existing approaches:

  1. ‘Natural-parameter networks: A class of probabilistic neural networks’ models inputs, targets, parameters, and nodes by Gaussian distributions. Though successful, it increases the number of trainable parameters super-linearly and requires specific optimization techniques.
  2. ‘Lightweight Probabilistic Deep Networks’ reduces the number of trainable parameters by modeling only inputs, outputs and activations by distributions and keeping the weights deterministic. This approach gives poor uncertainty estimates because weight uncertainty is not taken into account.
  3. ‘What uncertainties do we need in Bayesian deep learning for computer vision?’ proposes adding a ‘variance’ variable to each output, trained with a maximum-likelihood loss. This successfully models both types of uncertainty but requires changes to the architecture of the neural network.

Methodology

Total Uncertainty

It is defined by the following equation:

$$S_{tot} = \mathrm{Var}_{p(y \mid x)}(y)$$

where $p(y \mid x)$ has to be approximated.

Modeling Data Uncertainty

The joint density of all activations is

$$p(z^{(0:l)}) = p(z^{(0)}) \prod_{i=1}^{l} p(z^{(i)} \mid z^{(i-1)})$$

$$p(z^{(i)} \mid z^{(i-1)}) = \delta\left[z^{(i)} - f^{(i)}(z^{(i-1)})\right]$$

ADF approximates the above distribution with

$$p(z^{(0:l)}) \approx q(z^{(0:l)}) = q(z^{(0)}) \prod_{i=1}^{l} q(z^{(i)})$$

where each factor $q(z^{(i)})$ is Gaussian:

$$q(z^{(i)}) = \mathcal{N}(z^{(i)}; m^{(i)}, v^{(i)})$$

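The i-th layer function $f^{(i)}$ takes $z^{(i-1)}$ as input and transforms the approximation $q(z^{(0:i-1)})$ into the distribution

$$p(z^{(0:i)}) = p(z^{(i)} \mid z^{(i-1)})\, q(z^{(0:i-1)})$$

ADF then approximates $p(z^{(0:i)})$ with $q(z^{(0:i)})$ by minimizing the KL divergence:

$$q(z^{(0:i)}) = \arg\min_{\tilde{q}(z^{(0:i)})} \mathrm{KL}\left(\tilde{q}(z^{(0:i)}) \,\|\, p(z^{(0:i)})\right)$$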

Under normality assumptions this is equivalent to

$$m^{(i)} = \mathbb{E}_{q(z^{(i-1)})}\left[f^{(i)}(z^{(i-1)})\right]$$

$$v^{(i)} = \mathbb{V}_{q(z^{(i-1)})}\left[f^{(i)}(z^{(i-1)})\right]$$
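For common layers, the mean and variance in the moment-matching step above have closed forms. The following is a minimal sketch in Python with NumPy, assuming independent (diagonal) Gaussian activations. The function names `adf_linear` and `adf_relu` are hypothetical and not taken from the paper's code.

```python
import math
import numpy as np

def adf_linear(m, v, W, b):
    """Propagate mean m and diagonal variance v through y = W @ z + b."""
    # For independent inputs, output variances are the input variances
    # weighted by the squared weights; the bias only shifts the mean.
    return W @ m + b, (W ** 2) @ v

def adf_relu(m, v, eps=1e-12):
    """Mean and variance of max(0, z) for z ~ N(m, v), element-wise."""
    s = np.sqrt(np.maximum(v, eps))          # standard deviation, kept positive
    a = m / s
    pdf = np.exp(-0.5 * a ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(a / math.sqrt(2.0)))
    mean = m * cdf + s * pdf                 # E[max(0, z)]
    second = (m ** 2 + s ** 2) * cdf + m * s * pdf   # E[max(0, z)^2]
    return mean, np.maximum(second - mean ** 2, 0.0)

# Illustrative values only: a noisy 2-d input passed through one layer.
m0, v0 = np.array([0.5, -1.0]), np.array([0.4, 2.0])
W, b = np.array([[1.0, 2.0]]), np.array([0.5])
m1, v1 = adf_relu(*adf_linear(m0, v0, W, b))
```

Each layer consumes a (mean, variance) pair and returns a new pair, so a whole network can be converted layer by layer without retraining its weights.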

Modeling model uncertainty

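The weight posterior $p(w \mid X, Y)$ can be approximated by $q(w; p) = \mathrm{Bern}(w; p)$, where $p$ is the dropout rate at test time. If $T$ sampled outputs $y_t$ are taken, the model uncertainty is

$$S_{model} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2$$

where $\bar{y}$ is the mean of the $T$ sampled outputs.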
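The sample variance above can be computed directly from stochastic forward passes. The following is a minimal sketch in Python with NumPy, assuming a toy two-layer network with dropout applied to the hidden units. The names `mc_dropout_forward` and `model_uncertainty` and the network itself are hypothetical, chosen only for illustration.

```python
import numpy as np

def mc_dropout_forward(x, W1, W2, p, rng):
    """One stochastic pass through a toy two-layer network with dropout rate p."""
    h = np.maximum(W1 @ x, 0.0)
    mask = rng.random(h.shape) >= p      # keep each unit with probability 1 - p
    h = h * mask / (1.0 - p)             # inverted-dropout scaling
    return W2 @ h

def model_uncertainty(samples):
    """Biased sample variance of T outputs (division by T, not T - 1)."""
    samples = np.asarray(samples, dtype=float)
    y_bar = samples.mean(axis=0)
    return ((samples - y_bar) ** 2).mean(axis=0)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(16, 4)), rng.normal(size=(1, 16))
x = np.ones(4)
outputs = [mc_dropout_forward(x, W1, W2, p=0.2, rng=rng) for _ in range(100)]
s_model = model_uncertainty(outputs)     # one variance per output dimension
```

Dropout stays active at test time here, which is what makes repeated passes on the same input differ.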

The uncertainty of an already trained network can be obtained by finding the best dropout rate $p$ for the network, i.e. by minimising $\mathrm{KL}(p(w \mid X, Y) \,\|\, q(w; p))$, which can be solved analytically under normality assumptions.

Total uncertainty

The total uncertainty is the sum of two terms: the data uncertainty v(t) averaged over the t = 1…T weight samples, and the model uncertainty calculated earlier.
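Written out with the symbols used so far, and assuming the two terms are simply added, the total uncertainty would read as follows. Here $v_t$ denotes the ADF output variance for the t-th weight sample, $y_t$ the corresponding output, and $\bar{y}$ the mean of the $y_t$.

$$S_{tot} = \frac{1}{T} \sum_{t=1}^{T} v_t + \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2$$

The first term is the averaged data uncertainty and the second is the model uncertainty.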

Overall algorithm

  1. Convert the neural network into its ADF version.
  2. Collect T samples of outputs by using MC dropout.
  3. Compute the output prediction and variance from the T samples.
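The three steps above can be sketched end to end. This is a minimal illustration in Python with NumPy, not the paper's implementation. It assumes a toy network (linear, ReLU, dropout, linear), diagonal Gaussian activations, and a known input noise variance `x_var`. All function names are hypothetical.

```python
import math
import numpy as np

def linear(m, v, W, b):
    # Mean and variance of W @ z + b for independent Gaussian inputs.
    return W @ m + b, (W ** 2) @ v

def relu(m, v):
    # Closed-form moments of max(0, z) for z ~ N(m, v).
    s = np.sqrt(np.maximum(v, 1e-12))
    a = m / s
    pdf = np.exp(-0.5 * a ** 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + np.vectorize(math.erf)(a / math.sqrt(2)))
    mean = m * cdf + s * pdf
    second = (m ** 2 + s ** 2) * cdf + m * s * pdf
    return mean, np.maximum(second - mean ** 2, 0.0)

def dropout(m, v, p, rng):
    # Same random mask for mean and variance; variance scales quadratically.
    keep = (rng.random(m.shape) >= p) / (1.0 - p)
    return m * keep, v * keep ** 2

def predict_with_uncertainty(x, x_var, params, p=0.2, T=50, seed=0):
    rng = np.random.default_rng(seed)
    W1, b1, W2, b2 = params
    means, variances = [], []
    for _ in range(T):                       # step 2: T MC-dropout passes
        m, v = linear(x, x_var, W1, b1)      # step 1: ADF version of each layer
        m, v = relu(m, v)
        m, v = dropout(m, v, p, rng)
        m, v = linear(m, v, W2, b2)
        means.append(m)
        variances.append(v)
    means, variances = np.array(means), np.array(variances)
    prediction = means.mean(axis=0)          # step 3: prediction and variances
    data_unc = variances.mean(axis=0)        # averaged ADF output variance
    model_unc = means.var(axis=0)            # spread of the sampled outputs
    return prediction, data_unc + model_unc, data_unc, model_unc
```

Calling `predict_with_uncertainty(x, x_var, (W1, b1, W2, b2))` returns the prediction, the total variance, and its two components, so the contribution of each source can be inspected separately.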

Experiments

End-to-end steering angle prediction

  1. DroNet architecture
  2. Udacity dataset
  3. RMSE for angle predictions and NLL for variances.

Object future motion prediction

  1. FlowNet2S architecture
  2. DAVIS 2016 dataset
  3. End-point error and KL for predictions and NLL for variances.
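The RMSE and NLL metrics listed for these experiments can be sketched as follows. This is a minimal example in Python with NumPy, assuming the predictive distribution is a Gaussian with the predicted mean and variance. The function names are hypothetical and the exact evaluation protocol is not specified here.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error of the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def gaussian_nll(y_true, y_pred, var, eps=1e-8):
    """Mean negative log-likelihood of y_true under N(y_pred, var)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    var = np.maximum(np.asarray(var, float), eps)   # guard against zero variance
    nll = 0.5 * np.log(2 * np.pi * var) + 0.5 * (y_true - y_pred) ** 2 / var
    return float(np.mean(nll))
```

RMSE only scores the predictions, while the NLL also penalises variances that are too small for the observed error, which is why it is used to evaluate the uncertainty estimates.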

Conclusion

The main bottleneck of the proposed framework is that several forward passes are needed to calculate the uncertainties. Information theory could be used to estimate model uncertainty in a faster way.
