+ 5

Have you implemented gradient descent including covariance?

Hello dear fellows, I'm implementing a gradient descent (in this case an ascent) to find the maximum log-likelihood of a complex function by varying its parameters when comparing it to real measured values. The implementation using only the first-order derivatives works, but poorly, because some of the parameters are strongly correlated (not independent) and sometimes the convergence gets out of control. So I'm trying to implement it using covariance as well, using the mathematical gradient operator (symbolized by an inverted triangle, nabla). The covariance matrix (i.e. gradient) is done; I'm just not getting it to work when using the X update for the iteration. Currently I'm using NewX = (Identity + alpha*gradient)*X, but I'm not sure this is right. Can anyone help?
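For concreteness, here is a minimal sketch of the two updates in question on a toy concave objective; `log_likelihood`, `grad`, and `hess` are hypothetical stand-ins, not the real model. Note that curvature information usually enters the update through an inverse applied to the gradient vector, not through a product of the form (Identity + alpha*gradient)*X:

```python
import numpy as np

# Toy concave objective with strongly correlated parameters; these
# functions are hypothetical stand-ins for the real model in the post.
A = np.array([[2.0, 0.9],
              [0.9, 1.0]])            # off-diagonal terms = correlation
b = np.array([1.0, 1.0])

def log_likelihood(x):
    return b @ x - 0.5 * x @ A @ x

def grad(x):
    return b - A @ x                  # first-order derivatives (nabla f)

def hess(x):
    return -A                         # matrix of second derivatives

x = np.zeros(2)
alpha = 0.1
for _ in range(50):
    # Plain first-order ascent (what "works, but poorly" above):
    #   x = x + alpha * grad(x)
    # Curvature-aware step: rescale the gradient by the inverse of the
    # second-derivative matrix instead of multiplying X by a matrix.
    x = x - np.linalg.solve(hess(x), grad(x))

print(x, log_likelihood(x))          # converges to the maximizer A^{-1} b
```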

8th Mar 2019, 5:59 AM
Edward
20 answers
+ 4
Dear helpers: forget gradient descent. Check out this iRprop+ algorithm: http://citeseerx.ist.psu.edu/viewdoc/download?rep=rep1&type=pdf&doi=10.1.1.102.1273 It works like a charm, is robust against noisy numerical derivatives, handles nonlinearity quite well, and is easy to implement. I was astonished how fast it converged (around 400 loops, versus 2000+ using GD) and with less computational effort. Edit: it behaved even better than the solver in Excel, so I have a new robust tool in the toolbox =:P
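For the curious, a minimal sketch of the update rule from that paper. The parameter names and the usual defaults (eta_plus = 1.2, eta_minus = 0.5) follow the Rprop literature; none of this is taken from Edward's actual code:

```python
import numpy as np

def irprop_plus(f, grad, x0, n_iter=400,
                eta_plus=1.2, eta_minus=0.5,
                step_init=0.1, step_min=1e-6, step_max=1.0):
    """iRprop+ (Igel & Huesken) for minimization. Only the *sign* of
    each partial derivative is used, which is why noisy numerical
    gradients are tolerated."""
    x = np.asarray(x0, dtype=float).copy()
    step = np.full_like(x, step_init)      # one step size per parameter
    g_prev = np.zeros_like(x)
    dx_prev = np.zeros_like(x)
    f_prev = f(x)
    for _ in range(n_iter):
        g = np.asarray(grad(x), dtype=float)
        f_cur = f(x)
        for i in range(x.size):
            if g[i] * g_prev[i] > 0:        # same sign: speed up
                step[i] = min(step[i] * eta_plus, step_max)
                dx_prev[i] = -np.sign(g[i]) * step[i]
                x[i] += dx_prev[i]
            elif g[i] * g_prev[i] < 0:      # sign flip: we overshot
                step[i] = max(step[i] * eta_minus, step_min)
                if f_cur > f_prev:          # the "+" part: undo last step
                    x[i] -= dx_prev[i]
                g[i] = 0.0                  # force the neutral branch next
            else:
                dx_prev[i] = -np.sign(g[i]) * step[i]
                x[i] += dx_prev[i]
        g_prev, f_prev = g, f_cur
    return x
```

To maximize a log-likelihood with it, minimize the negative, e.g. `irprop_plus(lambda t: -loglik(t), lambda t: -dloglik(t), x0)` (with `loglik`/`dloglik` being whatever your model provides).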
9th Mar 2019, 4:43 PM
Edward
+ 10
Kishalaya Saha and Kirk Schafer might be able to help. Maybe break down the question (I am weak in calculus 😅). I'd say share what you have implemented so far; maybe the code will help us understand better. I can study gradient descent/ascent for finding local minima/maxima of a function here: https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html But after that, I was lost at "some of the parameters are strongly correlated".
8th Mar 2019, 7:47 AM
Morpheus
+ 8
I have not dabbled with AI stuff, so I can't help much. Maybe Joakim Nyland or Kuba Siekierzyński can help.
8th Mar 2019, 6:24 AM
Haris
+ 7
Edward I once made a code with gradient descent 😅 But right now I have no idea anymore ^^ Maybe the code can help you; sorry I can't, it's too far back :P ^^ Maybe I should have commented it 🤔 The maths are at the bottom of the JS section. https://code.sololearn.com/W3L60cbJWfJG/?ref=app
10th Mar 2019, 8:01 AM
Martin Möhle
+ 6
Haris I'm literally about to fall asleep at the moment. However, I figured you might be able to round up the math gang, who might be interested in this problem. Do me a favor though: translate this into something a 5-year-old would understand. He lost me at "gradient descent including covariance." 🙃 And yes... "I am not a smart man." 😂😜😅 But I am a sleepy one. It's 1:08am here. 😉 cc: Morpheus, Madhav, PROgrammer, Flandre Scarlet
8th Mar 2019, 6:08 AM
David Carroll
+ 6
LOL... Okay... so it's not just me who is confused. 🤪
8th Mar 2019, 6:53 AM
David Carroll
+ 5
I can't understand you well. Can you give some examples? 😅
8th Mar 2019, 7:28 AM
Flandre Scarlet
+ 4
I cannot understand a single word 😵😵. Sorry, looks like I won't be able to help 😅
8th Mar 2019, 6:51 AM
Madhav
+ 3
FunnySocks my postings are not related to your suggestions. I have a good idea what PCA is (Andrew Ng's Coursera course), but I have no idea how to apply it to a function fit. Correlation is between Xn and Xk (like in Y=Xn*Xk); dependence is between Y and X. I'm really not sure if it is a Jacobian, a gradient, a covariance matrix, or the inverse Fisher information matrix. Sadly enough, I had Calculus 1, 2 and 3 more than 25 years ago and rarely used them afterwards, so they are not so fresh in my memory. Have you seen my code? Can you help me?
8th Mar 2019, 6:35 PM
Edward
+ 3
FunnySocks Thanks again for trying to help! I use maximum likelihood as a standard for my fitting problems because it is much more robust against outliers than least squares. Typically I use a normal distribution in my data-fitting problems, but I once used a lognormal when fitting survival data. The dimension of my data is one input x (angle) and one output y (height), which is why I had a question about how to use PCA. The correlated terms are the 6 parameters I am adjusting in my generating formula to best fit my experimentally measured data. If you are still keen to try to understand, take a look at the code, especially the last lines, which contain my generating function and the sum of log-likelihoods to be maximized.
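A minimal sketch of what such an objective can look like under the normal assumption; `model` below is a hypothetical stand-in for the 6-parameter generating function in the linked code, not the real formula:

```python
import numpy as np

# Hypothetical stand-in for the 6-parameter generating function;
# the real one lives in the linked SoloLearn snippet.
def model(x, theta):
    a, b, c, d, e, f = theta
    return a * np.sin(b * x + c) + d * x**2 + e * x + f

def log_likelihood(theta, x, y, sigma=1.0):
    """Sum of log-densities of the residuals under a normal assumption.
    Note: with a Gaussian this coincides with least squares up to
    constants; swapping in a different density (as with the lognormal
    survival fit mentioned above) changes the objective."""
    r = y - model(x, theta)
    return -0.5 * np.sum((r / sigma) ** 2) - x.size * np.log(sigma * np.sqrt(2 * np.pi))
```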
9th Mar 2019, 4:56 PM
Edward
+ 3
And for those willing to see iRprop+ in action, take a look at this: https://code.sololearn.com/cxFH9bpJ1ui7/?ref=app
23rd Mar 2019, 7:37 PM
Edward
+ 2
As you already know your variables are correlated, why don't you apply PCA first to obtain uncorrelated variables?
8th Mar 2019, 11:41 AM
FunnySocks
+ 2
Here is a version of the code I'm trying to implement. https://code.sololearn.com/cRdCQa5QZgbX
8th Mar 2019, 4:12 PM
Edward
+ 2
I don't see how your answers are related to my suggestion. Do you know what PCA is? Do you know what a covariance matrix is? Or what the difference between correlation and dependence is? To me it seems you are confusing terms. For example, the gradient of a function cannot be a matrix. Do you mean a Jacobian matrix instead? I suggest you choose another forum/platform to ask your math-related questions. You need to clarify the math before programming.
8th Mar 2019, 4:38 PM
FunnySocks
+ 2
I will not look at code that solves a problem I don't know. If you are not sure about the mathematical foundation, you have to fix that first. This is what you have to do, and what you can do.
8th Mar 2019, 7:25 PM
FunnySocks
+ 2
FunnySocks Ok, help me with the math. That is, in the end, what is needed here. When I update X in gradient descent I usually use Xnew = Xold - alpha*f'(X), with alpha being the step size, small enough not to overshoot the minimum. Now, this considers only the partial derivatives for each x in X. How would you express Xnew using the knowledge of the covariances between each x as well? (It would be a matrix like f''(X) = [[x0x0, x0x1, x0x2], [x1x0, x1x1, x1x2], [x2x0, x2x1, x2x2]].)
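For reference, the textbook way (not anything confirmed by this thread) to fold that matrix of second partials, the Hessian $H$ with $H_{ij} = \partial^2 f / \partial x_i \partial x_j$, into the update is the damped Newton step:

$$X_{\text{new}} = X_{\text{old}} - \alpha \, H(X_{\text{old}})^{-1} \, \nabla f(X_{\text{old}})$$

With $\alpha = 1$ this is Newton's method; in maximum-likelihood settings, replacing $H$ with the Fisher information matrix gives the closely related Fisher-scoring update.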
8th Mar 2019, 8:00 PM
Edward
+ 2
As you say your variables x are correlated, you should change your optimization by either: (1) manually choosing an "uncorrelated subset" among the variables x, e.g. changing your function to depend on 2 instead of 4 inputs, or (2) transforming your input data by PCA. You can then apply your optimization in the reduced parameter space. https://en.m.wikipedia.org/wiki/Principal_component_analysis
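A minimal sketch of option (2); `X` here is a hypothetical stand-in for the correlated inputs, samples as rows (the thread never shows the actual data):

```python
import numpy as np

# Stand-in data with the same kind of correlation the thread describes.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[2.0, 0.9], [0.9, 1.0]], size=500)

Xc = X - X.mean(axis=0)               # center the data
eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ eigvec                       # coordinates along principal axes

# Columns of Z are (empirically) uncorrelated, so an optimizer can work
# in Z-space; map a result back via Xc = Z @ eigvec.T
print(np.cov(Z, rowvar=False).round(3))   # ~diagonal covariance
```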
8th Mar 2019, 8:25 PM
FunnySocks
+ 2
Also check this: https://en.m.wikipedia.org/wiki/Maximum_likelihood_estimation And please also note: the described problem is still incomplete. You write about a likelihood without mentioning the assumed distribution, the dimension of your data, or what the function you aim to minimize looks like. I suggest you turn to a math forum.
8th Mar 2019, 8:37 PM
FunnySocks