Why do you need the derivative of the sigmoid of the output in neural networks?
I am reading through a beginner's neural network book, and I have gotten to the part where we use the mean squared error and derivatives of the output to calculate what the author calls "how sensitive the error is in changes to the link weights". Quite frankly, this makes no sense to me, and the next couple of pages are dedicated to complicated math that I am struggling to understand.
So my situation is that I am lost in a ton of math and not even sure what it's all for. I hope someone can explain the purpose of the derivative in simple terms so I can regain some direction in my studies. Up until now, everything made sense:
We used matrices and weights to feed forward through the network
We used the error and the transposed weight matrices to backpropagate based on how much each weight was to blame for the error
Now I have no idea why I am spending hours learning calculus to obtain the derivative of the mean-squared-something of the output
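To show where I'm at, here's a tiny sketch of what I *think* the math boils down to for a single output neuron (all the numbers, the learning rate, and the variable names are just made up by me for illustration, not from the book):

```python
import numpy as np

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical single output neuron: two inputs, two weights, one target.
inputs = np.array([0.9, 0.1])
weights = np.array([0.5, -0.3])
target = 1.0

# Feed forward: weighted sum, then sigmoid.
z = inputs @ weights     # raw input to the neuron
output = sigmoid(z)      # the neuron's activation

# Error term as it appears with mean squared error: (target - output).
error = target - output

# This is (I think) where the "derivative of the sigmoid of the output"
# shows up: sigmoid'(z) can be written as output * (1 - output),
# and it scales how strongly the error flows back into the weights.
slope = output * (1.0 - output)

# Gradient-descent style weight update (learning rate arbitrary).
learning_rate = 0.1
weights += learning_rate * error * slope * inputs
```

Is that roughly what all the calculus is computing?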
I'm guessing it has something to do with how much we want to adjust the weights by, and I think there's something about the sigmoid function where a lower slope means more confidence, and therefore less aggressive updates. But these are all scattered thoughts that I can't tie together. How come an output of 1 or 0 means the network is confident? What if 0.5 is the right answer? A coherent explanation or a link to some resources would be much appreciated. Thanks!
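In case it helps anyone answer, here's the quick sanity check I tried for the "lower slope = more confident" idea. If the sigmoid's derivative can be written in terms of its output as output * (1 - output) (which is my understanding, please correct me if not), then the slope is biggest at 0.5 and shrinks toward zero near 0 or 1:

```python
def sigmoid_slope(output):
    # derivative of the sigmoid written in terms of its own output
    return output * (1.0 - output)

print(sigmoid_slope(0.5))   # 0.25, the maximum
print(sigmoid_slope(0.99))  # 0.0099, much smaller
print(sigmoid_slope(0.01))  # 0.0099, same by symmetry
```

So an output near 0 or 1 gets a tiny slope and barely moves the weights, while an output near 0.5 gets the biggest nudge, which is why I don't get what happens if 0.5 really is the right answer.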
Submitted July 13, 2017 at 02:15PM by ClearlyCoder
via reddit http://ift.tt/2sUKfO7