# Visualisation of blame

I sometimes want to blame something or somebody else for errors.

This time I want to blame the **inputs** of a **neural network** for its **bad prediction**.

The key idea is simple:

- Imagine you have built and trained a neural network whose last layer represents the distribution \(P(y \mid x)\) for each input \(x\).
- Suppose that for input vector \(x_k\) the neural network predicts \(\hat{y}_k\) with \(\hat{y}_k \neq y_{k-gold}\), i.e. a label that is not equal to the known truth \(y_k\).
- You are not satisfied.
- I blame the neural network. Which part of the input vector does the neural network need to change so that it would predict the correct label?

Let’s start to compute the “*blame*”:

- Compute the expected distribution \(E_{x \in X}(P(y \mid x))\) over your set \(X\).
- The expected distribution is our uninformative prior before presenting input \(x_k\) but after training the network with inputs from \(X\).
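Estimating the expected distribution amounts to averaging the network’s output over the whole set. A minimal sketch, assuming a toy single-layer softmax model (`predict_proba`, `W`, and `X` below are made-up stand-ins, not the author’s actual network):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_proba(W, x):
    # stand-in for the trained network's last layer, P(y | x)
    return softmax(W @ x)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                  # 3 labels, 4 input features
X = rng.normal(size=(10, 4))                 # the set X of inputs

# E_{x in X}(P(y | x)): average the predicted distributions over X
expected_dist = np.mean([predict_proba(W, x) for x in X], axis=0)
```

Since each summand is a proper distribution, the average is one as well.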

- Compute gradients of the network with respect to its parameters from the loss function \(\nabla_{\phi}\, distance(E_{x \in X}(P(y \mid x)), P(y_{k-gold} \mid x_k))\)
  - Note: I intend to use cross-entropy as the *distance* function.
  - The difference between the expected distribution and the gold (one-hot) distribution is how surprising the gold label is.
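For a plain softmax output layer, the cross-entropy gradient has a closed form, which makes this step easy to sketch. The toy model and values below are illustrative assumptions, and I take one possible reading of the loss: cross-entropy between the gold one-hot distribution and the network’s output for \(x_k\):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))           # toy single-layer model, 3 labels
x_k = rng.normal(size=4)              # the misclassified input
y_gold = 2                            # known true label for x_k

p = softmax(W @ x_k)                  # P(y | x_k) under current parameters
one_hot = np.eye(3)[y_gold]           # gold (one-hot) distribution

# For a softmax layer, d(cross-entropy)/d(logits) = p - one_hot,
# so the gradient on the weight matrix W is an outer product:
grad_W = np.outer(p - one_hot, x_k)   # shape: (labels, input dims)
```

Because both `p` and `one_hot` sum to one, the gradient columns sum to zero, which is a handy sanity check.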

- For each coordinate \(x^i_k\), \(i \in \{1, \ldots, \mid x_k \mid\}\), of input vector \(x_k\) and the corresponding true label, sum the gradients computed in the previous step: \(\sum_{j=1}^{\mid FirstHidden \mid} \nabla_{\phi} W^{i,j}_k\)
  - Note that we are only interested in gradients on the weights \(W\) between the input and the first hidden layer.
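The per-coordinate blame score is then just a row-wise sum over the gradient matrix of the first layer. A sketch with a made-up gradient array (`grad_W1` would come from the previous step; ranking by absolute value is my assumption about what “largest” means):

```python
import numpy as np

rng = np.random.default_rng(0)
grad_W1 = rng.normal(size=(4, 5))   # gradients on W between 4 input coords
                                    # and 5 first-hidden units (made up here)

# blame_i = sum over hidden units j of the gradient on W1[i, j]
blame = grad_W1.sum(axis=1)

# coordinate with the "largest gradient" (by magnitude) is blamed the most
most_blamed = int(np.argmax(np.abs(blame)))
```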

- Find the coordinate \(i \in \{1, \ldots, \mid x_k \mid\}\) with the “largest gradient”: it is to be blamed the most!

**Why?** Suppose that our input \(x_k\) consisted of words \(w_1, w_2, w_3, \ldots, w_{\mid x_k \mid}\) and \(w_2\) has the “largest gradient”. As a result, if I could change only one word in the input in order to change the output of the network to the correct label, I would change \(w_2\). Obviously, the network might still predict the same incorrect label, or another incorrect one, after changing one input. However, if there exists an input \(x_{artificial}\) for which the network with parameters \(\phi\) predicts label \(y_{k-gold}\), I would start changing the values of \(x_k\) into \(x_{artificial}\) according to the order of the “gradients” on the input from step 3.
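The word-changing thought experiment can be sketched directly; the words and blame scores below are invented purely for illustration:

```python
words = ["the", "movie", "was", "dull"]      # hypothetical input x_k
blame = [0.10, 0.80, 0.05, 0.40]             # hypothetical per-word blame

# change words toward x_artificial in decreasing order of blame
order = sorted(range(len(words)), key=lambda i: -blame[i])
ranked = [words[i] for i in order]           # "movie" would be changed first
```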


**Please let me know what you think about the proposal!**

I want to code it up soon, so please let me know your opinions. Before that I have some todos:

**The idea is already implemented!** See *Inverting a neural net*.

- Training the inversion directly may be easier than sampling from the expected probability over training data.

*Consider your feedback.*

- Read Distributed Representations of Sentences and Documents.
- Look for implementation tricks in Extraction of Salient Sentences from Labelled Documents.

- *Read more about contrastive estimation, which, as I learned, is basically what is described above. I bet Mr Hinton has some notes on this.*
- *Explore whether someone else did something similar.*
- *How can I use samples from the input distribution given the gold label \(y_{k-gold}\), the parameters \(\phi\), and the expected distribution \(E_{x \in X}(P(y \mid x))\)?*

## Notes & Questions

#### In step 3 the sum \(\sum_{j=1}^{\mid FirstHidden \mid} \nabla_{\phi} W^{i,j}_k\) is not a real number but a symbolic expression

There are two ways to deal with the symbolic gradients and convert them to real numbers in our situation:

- Perform a single step of gradient descent, i.e. fill in the missing variables.
- Since all the weights should share the same symbolic variables, divide all the weights by those symbolic variables and just use the coefficients.
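The first option can be illustrated with a one-weight toy loss; `sympy` here stands in for whatever symbolic framework produced the gradients, and the loss and values are illustrative assumptions:

```python
import sympy as sp

w, x = sp.symbols("w x")
loss = (w * x - 1) ** 2            # toy loss with one weight and one input
grad = sp.diff(loss, w)            # symbolic gradient: 2*x*(w*x - 1)

# "fill in the missing variables": substituting concrete values turns the
# symbolic expression into a real number, as in one gradient-descent step
value = float(grad.subs({w: 0.5, x: 3.0}))
```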