# Visualisation of blame

I sometimes want to blame something or somebody else for errors.

This time I want to blame the **inputs** of a **neural network** for its **bad predictions**.

The key idea is simple:

- Imagine you have built and trained a neural network whose last layer represents a distribution over labels for each input.
- Suppose that for some input vector the neural network predicts a label which is not equal to the known truth.
- You are not satisfied.
- I blame the neural network. Which part of the input vector does the neural network need to change so that it would predict the correct label?

Let’s start to compute the *blame*:

1. Compute the expected distribution over your training set.
   - The expected distribution is our uninformative prior: what the network predicts before being presented with the input, but after having been trained on the training inputs.
2. Compare the expected distribution with the gold (one-hot) distribution; their difference measures how surprising the gold label is.
   - Note: I intend to use cross-entropy as the *distance* function.
3. Compute the gradients of the network with respect to its parameters from the loss function.
4. For each coordinate of the input vector and the corresponding true label, sum the gradients computed in the previous step.
   - Note that we are only interested in the gradients on the weights between the input and the first hidden layer.
5. Find the coordinate with the “largest gradient” and blame it the most!

**Why?** Suppose that our input consists of words and one of them has the “largest gradient”. As a result, if I could change only one word in the input in order to change the output of the network to the correct label, I would change that word. Obviously, the network might still predict the same incorrect label, or another incorrect label, after changing one input. However, if there exists an input for which the network with the same parameters predicts the correct label, I would start changing the values of the input towards it according to the order of the “gradients” on the input from step 3.

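The steps above could be sketched roughly as follows, using a toy one-hidden-layer network in NumPy. Everything here is an assumption for illustration: the sizes, the random weights, and the simplification that the cross-entropy is taken between the network’s own prediction and the gold one-hot label (rather than against a separately estimated expected distribution).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network: x -> tanh(W1 x) -> softmax(W2 h)
n_in, n_hid, n_out = 4, 5, 3
W1 = rng.normal(size=(n_hid, n_in))
W2 = rng.normal(size=(n_out, n_hid))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def blame(x, gold):
    """Per-coordinate blame: summed |gradient| on the
    input-to-hidden weights, one value per input coordinate."""
    # Forward pass
    a1 = W1 @ x
    h = np.tanh(a1)
    p = softmax(W2 @ h)                # predicted distribution
    y = np.zeros(n_out)
    y[gold] = 1                        # gold one-hot distribution

    # Backward pass for cross-entropy loss L = -sum(y * log p)
    dz2 = p - y                        # gradient at the output logits
    dh = W2.T @ dz2
    da1 = dh * (1 - h ** 2)            # back through the tanh
    dW1 = np.outer(da1, x)             # gradients on input->hidden weights

    # Step 4: for each input coordinate, sum the (absolute) gradients
    return np.abs(dW1).sum(axis=0)

x = np.array([0.5, -1.0, 2.0, 0.1])
scores = blame(x, gold=2)
order = np.argsort(scores)[::-1]       # coordinates ordered by blame
print("most blamed coordinate:", order[0])
```

Ordering all coordinates by their scores (rather than taking only the top one) gives the change order described in the “Why?” paragraph.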

**Please let me know what you think about the proposal!**

I want to code it up soon, so please let me know your opinions. Before that I have some todos:

**The idea is already implemented!** See Inverting a neural net.

- Training the inversion directly may be easier than sampling from the expected probability over training data
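One hedged sketch of what such an inversion could look like: freeze the trained weights of a toy one-hidden-layer NumPy network and run gradient descent on the input itself until the network favours the desired label. The architecture, weights, and hyperparameters are all illustrative assumptions, not the implementation referenced above.

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed, already-"trained" toy network is assumed.
n_in, n_hid, n_out = 4, 5, 3
W1 = rng.normal(size=(n_hid, n_in))
W2 = rng.normal(size=(n_out, n_hid))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def invert(target, steps=500, lr=0.1):
    """Gradient-descend on the *input* (weights stay frozen) to raise
    the probability the network assigns to the target label."""
    x = rng.normal(size=n_in)
    y = np.zeros(n_out)
    y[target] = 1
    for _ in range(steps):
        h = np.tanh(W1 @ x)
        p = softmax(W2 @ h)
        # Backprop the cross-entropy loss all the way to the input
        dh = W2.T @ (p - y)
        dx = W1.T @ (dh * (1 - h ** 2))
        x -= lr * dx
    return x

x_inv = invert(target=1)
p_inv = softmax(W2 @ np.tanh(W1 @ x_inv))
print("p(target) after inversion:", p_inv[1])
```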

*Consider your feedback.*

- Read Distributed Representations of Sentences and Documents
- Look for implementation tricks in Extraction of Salient Sentences from Labelled Documents

- *Read more about contrastive estimation, which, as I learned, is basically what is described above. I bet Mr Hinton has some notes on this.*
- *Explore whether someone else did something similar.*
- *How can I use samples from the input distribution given the golden label, the parameters, and the expected distribution?*

## Notes & Questions

#### In step 4 the sum is not a real number but a symbolic expression

There are two ways to deal with the symbolic gradients and convert them to real numbers in our situation:

- Perform a single step of gradient descent, i.e. fill in the missing variables
- Since all the weights should have the same symbolic variables, divide all the weights by those symbolic variables and just use the coefficients
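The first solution (“fill in the missing variables”) can be illustrated with SymPy on a hypothetical one-weight network with a sigmoid output and cross-entropy loss: the gradient starts out as a symbolic expression, and only substituting concrete values turns it into a real number, as one step of gradient descent would.

```python
import sympy as sp

# Hypothetical one-weight "network": y = sigmoid(w * x),
# with binary cross-entropy loss against target t.
w, x, t = sp.symbols('w x t')
y = 1 / (1 + sp.exp(-w * x))
loss = -(t * sp.log(y) + (1 - t) * sp.log(1 - y))

grad = sp.diff(loss, w)     # a symbolic expression, not a number
print(grad)

# Filling in the missing variables turns the symbolic gradient
# into a real number, i.e. evaluates it at one concrete point.
value = grad.subs({w: 0.3, x: 2.0, t: 1}).evalf()
print(value)
```

For this loss the gradient simplifies to `(y - t) * x`, so the substituted value is just that expression evaluated at the chosen point.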