Talk:Gauss–Newton algorithm: Difference between revisions

Content deleted Content added

Inline

Revision as of 18:44, 31 January 2008

Name?

I always thought that Newton's method, when applied to systems of equation, is still called Newton's method (or Newton-Raphson) and that the Gauss-Newton is a modified Newton's method for solving least squares problems. To wit, let f(x) = sum_i r_i(x)^2 be the least squares problem (x is a vector, and I hope you don't mind my being too lazy to properly type-set the maths). Then we need to solve 2 A(x) r(x) = 0, where A(x) is the Jacobian matrix, so A_ij = derivative of r_j w.r.t. x_i. In my understanding, Newton's method is as described in the article: f'(x_k) (x_{k+1} - x_k) = - f(x_k) with f(x) = 2 A(x) r(x). The derivative f'(x) can be calculated as f'(x) = 2 A(x) A(x)^\top + 2 sum_i r_i(x) nabla^2 r_i(x). On the other hand, the Gauss-Newton method neglect the second term, so we get the iteration A(x_k) A(x_k)^\top (x_{k+1} - x_k) = - A(x_k) r(x_k).

Could you please tell me whether I am mistaken, preferably giving a reference if I am? Thanks. -- Jitse Niesen 18:33, 18 Nov 2004 (UTC)

It is quite possible that you are right. Please feel free to improve both this article and the corresponding section in Newton's method. I do not have an appropriate reference at hands, therefore I am unable to contribute to a clarification. - By the way, thanks for the personal message on my talk page. -- Frau Holle 22:23, 20 Nov 2004 (UTC)

I moved the description of the general Newton's method in R^n back to Newton's method and wrote here a bit about the modified method for least squares problems. -- Jitse Niesen 00:45, 5 Dec 2004 (UTC)

Transform of the Jacobians in the wrong place?

Should the formula not be p(t)=p(t+1)-(J'J)^(-1)J'f?

I thought the RHS was supposed to look like an OLS regression estimate... —The preceding unsigned comment was added by 131.111.8.96 (talk • contribs) 15:48, 10 January 2006 (UTC)

Please be more precise. Do you want to change the formula

p^{k+1}=p^{k}-(J_{f}(p^{k})\,J_{f}(p^{k})^{\top })^{-1}J_{f}(p^{k})\,f(p^{k})

in the article to read

p^{k}=p^{k+1}-(J_{f}(p^{k})\,J_{f}(p^{k})^{\top })^{-1}J_{f}(p^{k})\,f(p^{k})?

If so, can you give some more justfication than "I thought the RHS was supposed to look like an OLS regression estimate", like a reference? If not, what do you want? -- Jitse Niesen (talk) 18:12, 10 January 2006 (UTC)[reply]

montazr rabiei

The term that is neglected in the Newton step to obtain the Gauss-Newton step is not the Laplacian. It also contains mixed second-order derivatives, which is not the case for the Laplacian! Georg

You're right. Many thanks for your comment. It should be fixed now. -- Jitse Niesen (talk) 03:10, 28 October 2006 (UTC)[reply]

no inversion?

What is meant by "The matrix inverse is never computed explicitly in practice"? Right after it is said that a certain linear system should be solved. How does that system gets solved if not by the inversion of that matrix?

It seems to mee also that there is a pseudoinversion going on there, and this should be made explicit in the article. -- NIC1138 18:17, 21 June 2007 (UTC)[reply]

I have added more details on how to solve that equation without inverting the full matrix. -- Tim314 19:46, 26 September 2007 (UTC)[reply]

Re: Computing the step $\delta ^{k}$

Invertible refers to a square matrix. Since $J_{f}(p^{k})$ is $m\times n$ , it would be more exact to say it is of rank $n$ rather than invertible. Note that this might invalidate the following argument (using the inverse matrix), so some comment like "for the square case ..., and proper generalization is available for the rectangular case" might be needed. --AmitAronovitch (talk) 11:42, 10 January 2008 (UTC)[reply]

Missing Assumptions - Difference between Gauss-Newton and Newton

I would like to see an explicit statement about the different assumptions behind Gauss-Newton and Newton method. The Newton-method-article is quite clear in its assumptions.

Now, in the Gauss-Newton method, why can the Hessian (implicitely) be approximated by $J_{f}(p)^{\top }J_{f}(p)$ . What is the geometric interpretation? Is the reason a special assumption, e.g. that the function is very close to zero near the true solution and therefore $\sum _{i=1}^{m}f_{i}(p)\,H(f_{i})(p)$ is neglectable ? That would mean that Gauss-Newton (and also Levenberg-Marquardt!) is not suitable for finding minima where the "residual" function value is quite large, e.g. $f_{i}(p)=p^{2}+100000000$

Or from another viewpoint: Given the funtion value and the jacobian there are infinitely many parabolas which are tangent to the function, what's the justification to guess the Hessian to be (close to) $J_{f}(p)^{\top }J_{f}(p)$ ?

Without this explanation, the update rule given in the article seems too unjustified for me, where does it come from? I do believe Gauss had good reasons for it :-)

--89.53.92.248 17:18, 6 September 2007 (UTC)[reply]

Meanwhile I do believe that G.N. is the special case where we have a sum of squared differences (ssd) between functions and "desired function values". In pure Newton method, we would compute the jacobian and hessian of the overall ssd. In Gauss-Newton we dont approximate the ssd but we linearize each of the functions (1st order taylor series), which implies that the hessian of these linearized functions is zero and the resulting "modified" ssd is quadratic in the original parameters. However, I would still like to see an explicit derivation here. --89.53.75.102 10:17, 16 September 2007 (UTC)[reply]

I have made additions to the "Derivation" section (also retitled by me), which hopefully address these issues. In answer to the above question, it's true that Gauss-Newton may have problems in the case of large residuals. Levenberg-Marquardt is more robust in those cases (although I believe Gauss-Newton is usually faster when the minimum is zero). --Tim314 19:54, 26 September 2007 (UTC)[reply]

Derivation

Much like the comment above, I am looking for more of a derivation of the Gauss-Newton algorithm. This article is useful in telling us what the algorithm is, what it uses as an update equation, but not why it works. —Preceding unsigned comment added by 138.64.2.77 (talk) 14:15, 13 September 2007 (UTC)[reply]

I've added a derivation from Newton's method to address this.

Thanks also to Jitse Niesen for his positive feedback on my talk page. --Tim314 19:57, 26 September 2007 (UTC)[reply]

Merger proposal

I propose that the article Levenberg-Marquardt algorithm be merged into this one.Petergans (talk) 08:28, 4 January 2008 (UTC)[reply]

I do not think they should be merged. I view them as separate. fnielsen (talk) 09:21, 4 January 2008 (UTC)[reply]

The basis of my proposal is that the derivation of the normal equations is the same whether or not a Marquardt parameter is used. I see the Marquart parameter as a "damping" device to overcome problems of divergence in "un-damped" non-linear least-squares calculations. The issue is discussed in detail in my book "Data Fitting in the Chemical Sciences", Wiley, 1992 Petergans (talk) 11:11, 4 January 2008 (UTC)[reply]

Comment that is maybe worth adding

Note that the Newton method can also be considered a special case of the GNA: for minimizing a general function ${\tilde {S}}$ , choose $f_{i}={\frac {\partial {\tilde {S}}}{\partial p_{i}}}$ , then $S=\sum {f_{i}}^{2}$ has as its minima (which are exactly 0) the extrema of ${\tilde {S}}$ , $J_{f}(p)$ is the Hessian of ${\tilde {S}}$ , and the step formula reduces to the Newton step. —Preceding unsigned comment added by AmitAronovitch (talk • contribs) 12:17, 10 January 2008 (UTC)[reply]

Note on a section in the article

The section

==Computing the step  $\delta ^{k}$ ==

assumes that the jacobian is a square matrix, which is seldom the case (its dimensions are m×n, where n is the size of the vector p of variables and m is the number of terms in the sum of squares). I suggest we either specify that the jacobian must be square for what is in that section to work, or even better, cut this section out altogether. Comments? Oleg Alexandrov (talk) 23:09, 18 January 2008 (UTC)[reply]

The suggestion that J can be invertible is rubbish, as J is always rectangular. This article is generally of very poor quality and totally at odds with the article linear least squares. However, I don't want to edit this article yet as I want to review the whole area of least-squares fitting. I'm currently thinking about how to go about it. Are you interested in taking part in a general clean-up? my home page Petergans (talk) 16:46, 19 January 2008 (UTC)[reply]

I cut out that section. Is there anything in this article which is wrong? It looks reasonably good to me. Oleg Alexandrov (talk) 05:50, 20 January 2008 (UTC)[reply]

I have begun the process mentioned above. If you look at my (in progress) draft re-write of linear least squares in User:Petergans/a/sandbox you will see why this article misses the point completely. In non-linear least squares the system is approximated to a linear one so that the theory of linear least squares can be applied, but further complications arise because of the approximation. Petergans (talk) 08:30, 20 January 2008 (UTC)[reply]

Mmm. I find the current linear least squares article much easier to understand than your proposed rewrite. I'd very much prefer to keep the first part of the current article the way it is, and only later go into the facts that linear least squares is about fitting experimental data, and that the least squares solution is the minimal variance solution.

In other words, the article the way it is, it requires from reader just a background in linear algebra and multivariate calculus. The way you propose to write it people would need a more serious background to understand what the article is about.

By the way, this discussion belongs at talk:linear least squares. Any objections if we copy it and continue there? Oleg Alexandrov (talk) 17:07, 20 January 2008 (UTC)[reply]

Please don't do that just at the moment - I'm consulting elsewhere by e-mail as to how to proceed. When the time is ripe a discussion will be opened there. In the meantime, thanks for your comments. It's difficult to find the right level at which to pitch an article. I usually err on the side of being too rigorous! Petergans (talk) 18:45, 20 January 2008 (UTC)[reply]

I have a question for you as a mathematician. Are there instances where a sum of squares is minimized other than the minimization of a sum of squared residuals? Your comments, and this article itself, seem to suggest that there are, but I don't know of any. Petergans (talk) 16:29, 21 January 2008 (UTC)[reply]

I guess most applications are indeed for some kind of sum of squares of differences of things, at least that's all I know too. However, the algorithm holds for a general sum of squares, and I think that's the most natural way of stating the algorithm. Oleg Alexandrov (talk) 04:31, 22 January 2008 (UTC)[reply]

I now understand your position. Looked at as an abstract concept this article is fine. I shall have to argue that the practical aspect is more important, even if it means that the concept becomes more complicated. After all, Gauss first developed the method of least squares to predict the orbit position of Ceres. This has been a useful discussion. Many thanks. Petergans (talk) 10:49, 22 January 2008 (UTC)[reply]

A major proposal

Please see talk:least squares#A major proposal for details. Petergans (talk) 17:35, 22 January 2008 (UTC)[reply]

I have prepared a new article in User:Petergans/b/sandbox. I propose that this article should replace the present one and be renamed Non-linear least squares. Petergans (talk) 09:09, 31 January 2008 (UTC)[reply]

I an engineer with a PhD in math. use Gauss-Newton in my daily work. With your rewrite, I would not be able to understand what the article is about and how to use the method.

I suggest you do not attempt to integrate things like that. Wikipedia is not a book for the specialist, it is a loose collection of quasi-independent articles written as simply as possible. I object to the rewrite in the strongest terms. Oleg Alexandrov (talk) 15:54, 31 January 2008 (UTC)[reply]

If you want, please create the new Non-linear least squares article, but leave the original articles unmodified, so that the reader can get the theory from your article, but the individual applications from the simpler separate articles. Oleg Alexandrov (talk) 15:55, 31 January 2008 (UTC)[reply]

See Wikipedia talk:WikiProject Mathematics#Least squares objectionable rewrite. Oleg Alexandrov (talk) 16:02, 31 January 2008 (UTC)[reply]

I think the new article is great -- it really brings together a lot of NLLS material -- but I'd prefer that it not overwrite this article. Why don't you move it to Nonlinear least squares (or recreate it there)? CRGreathouse (t | c) 17:41, 31 January 2008 (UTC)[reply]

I agree with CRGreathouse. The new article is great, move it to Non-linear least squares, but leave the current article too. Wikipedia is not paper, and there is plenty of room for two presentations of related material from different viewpoints. The current article is short, clear, and humble, so that humble editors can still contribute to it. Your new article is comprehensive, detailed, and quite bold, making it harder for timid editors like me to contribute. I had a year or two of numerical at the graduate level in math, and would feel comfortable editing the current article with only a single reference in hand, but I'd probably have to consult my whole library to feel sure I was providing edits that kept the perspective of the new article. For instance, I have only a few references that pay more than lip service to the statistical distribution of the errors in the stability analysis.

At any rate, good work. Your experience is invaluable, and your articles are bold. Your steps "talk page, sandbox, implement" are an ideal model. In this case, I think a new article is in order. JackSchmidt (talk) 18:40, 31 January 2008 (UTC)[reply]

Minor point, purely wiki technical, nonlinear least squares became an article last night. I don't have the admin move tool, where you don't leave behind a redir with empty history, so someone might need to assist the move, depending on which punctuation is preferred. JackSchmidt (talk) 18:44, 31 January 2008 (UTC)[reply]

@@ Line 106: / Line 106: @@
 : I think the new article is great -- it really brings together a lot of NLLS material -- but I'd prefer that it not overwrite this article.  Why don't you move it to [[Nonlinear least squares]] (or recreate it there)? [[User:CRGreathouse|CRGreathouse]]<small> ([[User talk:CRGreathouse|t]] | [[Special:Contributions/CRGreathouse|c]])</small> 17:41, 31 January 2008 (UTC)
-:I agree with CRGreathouse.  The new article is great, move it to [[Nonlinear least squares]], but leave the current article too.  Wikipedia is not paper, and there is plenty of room for two presentations of related material from different viewpoints.  The current article is short, clear, and humble, so that humble editors can still contribute to it.  Your new article is comprehensive, detailed, and quite bold, making it harder for timid editors like me to contribute.  I had a year or two of numerical at the graduate level in math, and would feel comfortable editing the current article with only a single reference in hand, but I'd probably have to consult my whole library to feel sure I was providing edits that kept the perspective of the new article.  For instance, I have only a few references that pay more than lip service to the statistical distribution of the errors in the stability analysis.
+:I agree with CRGreathouse.  The new article is great, move it to [[Non-linear least squares]], but leave the current article too.  Wikipedia is not paper, and there is plenty of room for two presentations of related material from different viewpoints.  The current article is short, clear, and humble, so that humble editors can still contribute to it.  Your new article is comprehensive, detailed, and quite bold, making it harder for timid editors like me to contribute.  I had a year or two of numerical at the graduate level in math, and would feel comfortable editing the current article with only a single reference in hand, but I'd probably have to consult my whole library to feel sure I was providing edits that kept the perspective of the new article.  For instance, I have only a few references that pay more than lip service to the statistical distribution of the errors in the stability analysis.
 :At any rate, good work.  Your experience is invaluable, and your articles are bold.  Your steps "talk page, sandbox, implement" are an ideal model.  In this case, I think a new article is in order. [[User:JackSchmidt|JackSchmidt]] ([[User talk:JackSchmidt|talk]]) 18:40, 31 January 2008 (UTC)
+::Minor point, purely wiki technical, [[nonlinear least squares]] became an article last night.  I don't have the admin move tool, where you don't leave behind a redir with empty history, so someone might need to assist the move, depending on which punctuation is preferred. [[User:JackSchmidt|JackSchmidt]] ([[User talk:JackSchmidt|talk]]) 18:44, 31 January 2008 (UTC)