Does anyone have an intuitive explanation of how the derivative is being used here to adjust the weights?
I've asked a lot of questions here about neural networks and backpropagation and people have been very patient and understanding with my cluelessness. I appreciate that. My final one (for the day at least :P) is about this blog post here: http://ift.tt/23GllSq
Basically, the author takes the derivative of the cost function w.r.t. any given weight and subtracts that derivative (times a learning rate) from the old weight to get the updated one… all with zero explanation. Try as I might, I cannot understand why this works. I would appreciate it if anyone could help here. What I think I understand so far (please correct me where I am wrong) is that:

The derivative of a cost function w.r.t. a weight holds everything else constant and tells us how changes in that weight affect the function's output.
It could also be thought of as the gradient or slope; basically it tells us, given certain values in the function, how much the weight contributed to the cost.

We want to get to the lowest value of the cost function, so we want to change the weights with that in mind

The learning rate allows us to take slow steps in case some of the data is wrong, although it isn't really that helpful since we actually just want to get slower as the slope decreases
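
To make the update rule concrete, here is a minimal sketch of "new weight = old weight - learning rate * derivative" on a single weight. The cost function C(w) = (w - 3)^2, the function names, the learning rate of 0.1, and the step count are all made up for illustration; none of them come from the blog post.

```python
def cost(w):
    # Hypothetical toy cost with its minimum at w = 3 (assumption, not from the post).
    return (w - 3.0) ** 2


def cost_derivative(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    # Repeatedly subtract (learning rate * slope) from the weight.
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * cost_derivative(w)
        history.append(w)
    return w, history


# Starting left of the minimum: the slope is negative, so subtracting it moves w up.
w_from_left, history_left = gradient_descent(0.0)

# Starting right of the minimum: the slope is positive, so subtracting it moves w down.
w_from_right, history_right = gradient_descent(6.0)

print("from the left: ", w_from_left, "cost:", cost(w_from_left))
print("from the right:", w_from_right, "cost:", cost(w_from_right))
```

In this toy case the sign of the derivative says which direction increases the cost, so subtracting it always steps toward the minimum. The steps also get smaller on their own as the slope flattens, even though the learning rate stays fixed.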
Getting into neural networks was deceptively easy: a little matrix multiplication here, some activation functions there. Sadly, the most important part (fixing the outputs!) is also turning out to be the most complex.
Submitted July 16, 2017 at 06:41PM by ClearlyCoder
via reddit http://ift.tt/2v7wyNc