Talk:Elo rating system/Archive 1
This is an archive of past discussions about Elo rating system. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Archive 1 | Archive 2
Precise statistical model
For the ELO system, the precise statistical model and the estimation of its parameters are difficult to find on the internet. Therefore I would much appreciate seeing them on this page, especially since it should be a couple of lines only.
- Done, roughly speaking. It's not clear what the precise model is, since Elo himself waffled between the normal and logistic curves. Moreover, the implementation of the model varies significantly from one organization to the next. Finally, it should be noted that it is a stretch to label these adjustments of ratings up and down as statistical estimation. Yes, there is a model, but adding and subtracting points on a game-by-game basis is a klutzy way to estimate anything, and highly unlikely to be used in any real statistical application.
- The rating systems in place today are a political compromise between mathematicians who would like to estimate hypothetical parameters accurately and players who want each game to be a fight over the rating points they win and lose. Players seem to prefer being able to say, "I beat that guy four games straight and took 45 points from him," as opposed to being able to say, "My rating is accurate to the third digit." They don't want accuracy, they want to win and lose points. That way they have something to fight for every single game, even if they are not in contention to win a given match or tournament. --Fritzlein 20:19 28 Jun 2003 (UTC)
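Roughly, the game-by-game adjustment being discussed works as in the sketch below. The logistic expectancy curve and K = 32 are illustrative assumptions here; real federations vary both:

```python
def expected_score(r_a, r_b):
    """Expected score of player A against player B under the logistic model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, score_a, k=32):
    """Player A's new rating after scoring score_a (1 = win, 0.5 = draw, 0 = loss)."""
    return r_a + k * (score_a - expected_score(r_a, r_b))

# Four straight wins against an initially equal opponent, updating both
# ratings after each game, shows the "fight over points" in action.
r_winner, r_loser = 1600.0, 1600.0
for _ in range(4):
    r_winner, r_loser = (update(r_winner, r_loser, 1),
                         update(r_loser, r_winner, 0))
print(round(r_winner - 1600))  # points taken from the loser: about 56 here
```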
- Can't they fight over fractions, or floating points (:)), instead? lysdexia 17:12, 12 Nov 2004 (UTC)
His name
Tidbit: "élő" means "living" in Hungarian language. --grin 19:45, 2004 Apr 6 (UTC)
Please, would it be possible to mention his name correctly spelled at least once, maybe in parentheses? His name is Élő Árpád (beware the accents) in Hungarian, or Árpád Élő in the English order of names. ("Élő" is his family name and "Árpád" is the given name.)
And please, kill those acronym-like, all-caps references to "Elo".
91.120.127.14 22:05, 29 September 2007 (UTC)
- His name, correctly spelled, is Arpad Elo. That's the way he chose to spell it himself. No evidence whatsoever has been provided that Elo ever spelled his name in any other way. Your complaint about the use of ELO would be a good point, except it's already explained in the second paragraph of the article, and the capitalized usage you object to appears nowhere in the article except in a single external link of dubious value. Quale 00:01, 30 September 2007 (UTC)
There's no need for evidence, it's plain common sense for every Hungarian :D He was born in Hungary, and Árpád is an ancient and common Hungarian name. Élő is a common word in Hungarian as well, while Elo means nothing at all. There's no way someone in Hungary could have spelled his name Elo Arpad, and he must have spelled it Élő Árpád until his family moved to the US, where of course the accents wouldn't be understood. The French president is called Sarkozy, not Sárközy, but his Hungarian ancestors were called that before immigrating to France. Please don't remove information out of ignorance. Thank you. http://hu.wikipedia.org/wiki/%C3%89l%C5%91_%C3%81rp%C3%A1d http://www.google.hu/search?hl=hu&q=%22%C3%A9l%C5%91+%C3%A1rp%C3%A1d%22&btnG=Keres%C3%A9s&meta=cr%3DcountryHU —Preceding unsigned comment added by Fblodilovics (talk • contribs) 17:17, 30 January 2008 (UTC)
- Please read and understand WP:V and WP:RS. There's no such thing as "plain common sense for every Hungarian" as a justification for an English wikipedia edit. "He must have" doesn't meet wikipedia policy. Also, you should take up your complaint at Arpad Elo where it belongs. Quale (talk) 17:28, 30 January 2008 (UTC)
Well, you claim authority on a Hungarian name while you don't speak the language, and you don't believe Hungarians on this matter, but I understand you'd like to make sure the information added comes from a reliable source. So I've read WP:V and WP:RS. All right. Please check that the English Arpad Elo article, and a lot of Elo rating system Wikipedia articles in other languages, clearly state his native name as Élő Árpád. If you make a Google search for Arpad Elo on the Hungarian Google site, you'll find nothing but Élő Árpád references. This reflects the fact that the general consensus of the people of the country he was born in is that he was born as Élő Árpád. Actually there is no known source which claims otherwise. In fact, who he is and what his name is are common knowledge among Hungarian people who have even a little interest in chess. For another reliable source as an example, here is a biographical encyclopaedia of famous people published by the Hungarian state's local government of the county Arpad Elo was born in: ÉLŐ Árpád Imre in Veszprém county's biographical encyclopaedia. Of course the name doesn't require a translation. Can National Geographic be called a 'third-party published source with a reputation for fact-checking and accuracy'? I guess it can. Here is an NG article about him. It's in Hungarian but the name doesn't require a translation. —Preceding unsigned comment added by Fblodilovics (talk • contribs) 21:08, 30 January 2008 (UTC)
- I've seen those references, including the National Geographic reference. Over a year ago I asked it be cited in Arpad Elo as a reference if someone who reads Hungarian could check to see if it was higher quality than the references the article had. See the end of Talk:Arpad Elo#Reality is whatever some Wikipedia editor says it is. The main concern I have with those references is that it is not clear if they explicitly claim a spelling for Elo's birth name, or if they are instead back transliterations of his name from English to Hungarian. It is not obvious that a Hungarian name transliterated to English and then transliterated from English back to Hungarian will give the original name. (Not all transliterations map one-to-one this way.) If you insist that his birth name be given in this article on a rating system he developed over 40 years after he had immigrated to the U.S. as a child, you would have been really happy with the 30 May 2007 version. Quale (talk) 05:57, 31 January 2008 (UTC)
- WP:MOS "For terms in common usage, use anglicized spellings; native spellings are an optional alternative if they use the Latin alphabet. Diacritics are optional, except where they are required for disambiguation (résumé). Where native spellings in non-Latin scripts (such as Greek and Cyrillic) are given, they appear in parentheses (except where the sense requires otherwise), and are not italicized, even where this is technically feasible. The choice between anglicized and native spellings should follow English usage "
- WP:NC "Convention: Name your pages in English and place the native transliteration on the first line of the article unless the native form is more commonly recognized by readers than the English form. The choice between anglicized and native spellings should follow English usage " Bubba73 (talk), 02:13, 31 January 2008 (UTC)
- I agree with WP:MOS and WP:NC that the name of the page and the common usage of his name throughout the article should be spelled anglicized. However, I don't see any problem with mentioning the 'born as' tidbit beside the dates of his birth and death. It's additional information and yes, I insist :) The forward/back transliterations might not be obvious sometimes, but in this case it is only not obvious to you. So I've read the discussion of the mentioned article and saw there is some confusion about the reliability of Hungarian sources. No wonder, if one doesn't speak the language. The first link mentioned is an online port of an originally print biographical encyclopaedia of famous people published by the Hungarian state's local government of the county Arpad Elo (Élő Árpád) was born in. It has nothing to do with Wikipedia; in fact the collection of data for the book ended in 1997 and some of its sources reach back to the 1920s. The website itself is a Cultural Ministry sponsored project for porting the more important books of Hungary's national library (with 8 million items in its catalogue) to the online world. It is as reliable a source as it can get. The other source is a joint project of KFKI (Central Research Institute for Physics) and the Hungarian Academy of Sciences. Also a very reliable source. Fblodilovics (talk) 13:45, 31 January 2008 (UTC)
- No, I don't think you understand. Firstly, "Elo" isn't just his name Anglicized, it was Elo's name. He chose to spell it that way, and he published in English spelling his own name that way. I haven't seen any evidence that Elo ever used diacritics in his own name at all, and certainly not after age 13. The point is, it's entirely irrelevant how Hungarians choose to spell Elo's name today, unless we know that's how his family spelled it when they were in Hungary. The Hungarian references don't seem to ever give Elo's name spelled the way he chose to spell it as an adult, and I'm not sure they explicitly state that his birth name was spelled that way. For a comparison, an article on Elo in Russian might give his name in Cyrillic, but we don't provide the Cyrillic transliteration here. The Hungarian transliteration is only of interest if we are sure that that was his birth name. Your "insistence" seems wildly out of place in this article, since the Hungarian is given at Arpad Elo where it belongs. "Additional information" in this article should be about the rating system; we already have an entire article on Arpad Elo himself. I frankly don't see what point it has in this article. Elo was fully Americanized, living and working in the U.S. for over 40 years under the name "Arpad Elo" when he developed his rating system. We've had tedious tug-of-wars with Hungarian POV pushers in this and many other articles; it would be a shame if you were to do this too. Compare this with John von Neumann, a much more famous Hungarian-American. He immigrated to the U.S. at a much later age, but the absolute insistence of giving the Hungarian spelling of his name in every article in which he is mentioned and at every single opportunity doesn't seem to be there. It's given at John von Neumann where it belongs. Quale (talk) 15:15, 31 January 2008 (UTC)
- I'm not a Hungarian POV pusher; in fact, in my opinion the Ernő Rubik article should be called Erno Rubik, as this is the English Wikipedia. The article is indeed about the rating system, but there is a paragraph specifically about his name, for clarifying that it is not an acronym. I think his pre-immigration name fits there perfectly. His family could never have given Arpad as his name, as there is no such name in Hungarian at all, while Árpád is known to be a common name from as early as the 9th century A.D. I also cited a government-published biographical encyclopaedia compiled by known academic researchers who cite their sources. You might argue about why his birth name doesn't belong there, but doubting Hungarians about Hungarian names is just plain silly. Fblodilovics (talk) 16:02, 31 January 2008 (UTC)
Depth of something ranked with ELO?
I removed the section below from the article, as I can't find any information about this concept elsewhere... can anyone provide a cite? -- The Anome 14:16, 12 Sep 2004 (UTC)
- The spread of ELO ratings also says something about the "depth" of the game. The total depth of a game is defined by the two end points of the possible range of skills, from the total beginner to theoretical best play by an infallible, omniscient player.
- Neither endpoint is easy to establish: is someone already a beginner who has just heard the rules, thereby setting the lowest standard, or does it take several games until one has absorbed the rules of a game and is able to play on one's own? At the other end of the range one simply has to take the best player at a given time. A total beginner who can nevertheless play on his or her own according to the simple rules can in Go safely be set at 30 kyu. Theoretical best play could result in the strength of an imaginable 13 dan, according to measurements of standard deviations among professional games.
- Even taking only 20 kyu and 9 dan as endpoints makes Go a very deep game. A rating difference of 2900 ELO points, from the top professional level of about 3000 ELO points (Gu Li) down to a 20 kyu with 100 ELO points, is a difference in insight into the game of 29 times the standard deviation (100 ELO points).
- Chess in comparison has a similar upper endpoint (Garry Kasparov, once rated 2851 points, see above), yet the standard deviation is set at 200 ELO points. It is more difficult to compare because of draws, but this results in a depth for chess of (only) 14 layers of standard deviation, assuming the total beginner in chess has a rating of zero ELO points (which s/he does not, AFAIK).
I remember reading something similar to this in Chess magazine (London) probably about eight or nine years ago, but I don't have a cite (I've a feeling it was in one of Fox and James' columns, but can't be sure). If I remember correctly, it reported a study which had counted the number of steps one needed to take in a number of games to get from the weakest player in the world to the strongest, where each intermediate player could score 75% against the one below. Go had the most steps by far (and so was considered the most "deep" or "difficult" game); chess was second; various other things were also considered (checkers I remember was in there, backgammon too, I think). But in any case, I'm not sure something like the above really belongs in this article: it's not about the Elo system per se; the Elo system is just being used as a tool to measure the "depth" of chess. Perhaps a mention could be made in the chess or Go articles or in some new comparison of chess and go article. --Camembert
- Sorry I didn't chip in on this topic before. Yes, the ELO system has certainly been used to measure the depth of games in the manner described by the paragraphs which were removed from the article. By this measure go is a deeper game than chess, after which checkers, bridge, and poker follow in close succession. However, there is a serious problem in comparing chess to games like bridge and poker: how many hands of the latter are equal to one game of chess? The luck involved in cards means that it may take a whole evening for the superior skill of one player to manifest itself. Also there is a question of the margin of victory, as one big pot in poker can cover lots of small losses.
- I think the appropriateness of this section for the article is marginal, because the fundamental concept is not really that of statistical estimation, but that of a "class interval" being a difference in skill such that the stronger player can win 75% of the time. For different games the statistical model may be different. I believe that for go tests have shown that the normal curve approximates performance better than the logistic curve. When two games use a different model it is a stretch to say that you are comparing the range of ELO ratings in each case. On the other hand, the notion of measuring the depth of a game by the number of class intervals is an interesting topic in its own right, and deserves to be covered somewhere in Wikipedia. Maybe it makes more sense for it to be attached to this article than to be put anywhere else?
- Oh, and the explosion of scholastic chess in the U.S. has indeed given rise to ratings of zero. It shouldn't be too surprising that a random 6-year-old with no special gift for that game can play that badly. But if you include a zero rating in chess, you have to go down to something like 35 kyu or lower in go. Furthermore the tradition that 9-dan is the highest rank doesn't allow ratings on the upper end to expand as much as they should. Therefore, if we measure chess in a way that shows 15 class intervals, then a comparable measurement in go may show 45 or more class intervals. No matter how you slice it, the class interval measurement asserts that go is vastly deeper than chess. --Fritzlein 16:18, 14 Nov 2004 (UTC)
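Since the class-interval measure discussed above is just arithmetic on the rating range, a short sketch makes the comparison concrete. The endpoint figures below are the illustrative ones from this thread, not official ratings:

```python
def depth_in_intervals(bottom, top, class_interval):
    """Number of class intervals spanning a game's range of skill."""
    return (top - bottom) / class_interval

print(depth_in_intervals(100, 3000, 100))  # go: 29 intervals of 100 points
print(depth_in_intervals(0, 2851, 200))    # chess: ~14 intervals of 200 points
```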
Glicko system?
Do we have an article about the Glicko rating system, which is gaining popularity? Apparently Glicko-2 could replace Elo one day.--Sonjaaa 02:26, Jan 31, 2005 (UTC)
- Glickman's system has real advantages over the current clunky implementations of Elo's model, but that's not enough to make it a likely replacement. Are you suggesting that the USCF might adopt it any time soon? If so, you know more about USCF politics than I do. I was under the impression that the USCF ratings committee was a fairly conservative body. Or is ICC making the switch? Last I knew (and I confess to being out of date) only FICS was using Glicko ratings. Who else is jumping on the bandwagon? --Fritzlein
- While the idea that some players have a better-determined rating than others is appealing, and may be useful in other sports, actual sports organizations penalize inactivity by taking away points over time, rather than by increasing the rating "uncertainty". The Elo system has theoretical underpinnings that make it a true statistical estimator, at least when K is set sufficiently low. But so far there has not been any indication that Glicko is actually an improvement in terms of its predictive ability. Glicko-2 is even less well motivated than Glicko: it has both a rating deviation, RD, and a rating volatility, σ. I believe that both systems can probably be manipulated by a group of conspirators fixing games against each other in such a way as to drive the ratings up for one of the participants.--Kotika
- Glickman is a statistician, so it isn't surprising that he thinks improvements in the rating system will come from doing better statistics on the same data. Unfortunately for his project, the underlying model IS NOT QUITE TRUE. Adding layers of refinement to the estimation technique is akin to finding the radius of the earth to the tenth digit: eventually you must face the fact that the earth is not truly spherical (It is wider at the equator than at the poles.), so extra digits of accuracy in the radius have no meaning.
- The most compelling evidence that the Elo model doesn't hold true comes from the on-line chess servers. The blatant counter-example to the truth of the model is computer players, but subtler proof comes from the distortions of ratings that arise from players being able to select their opponents, favoring some and avoiding others. It is no coincidence that many ICC members consider the only accurate ratings on the server to be those from which computer players are barred and the games are paired randomly by the server rather than by choice of the participants themselves.
- My opinion is that, since the underlying model is false, it is misguided to focus on more accurate estimation. Rather one should focus on the concern Kotika raises, namely rating manipulation. One's primary focus should be to minimize the opportunities for participants, either singly or in collusion, to distort their ratings, particularly opportunities to inflate their ratings. I suspect that Kotika's imputation is not quite right, i.e. I suspect the Glicko system is if anything slightly less vulnerable to manipulation than plain vanilla Elo ratings. But I do think Glicko's energy is somewhat misdirected. In practice, the biggest accuracy problems with the Elo system don't come from the klunky estimation technique, they come from the model being wrong, and from clever people exploiting the wrong model to cheat the system. --Fritzlein 16:35, 27 Mar 2005 (UTC)
- The exploits you refer to would not be possible in OTB tournaments. --Malathion 07:36, 24 Jun 2005 (UTC)
- Very true. It was self-selection of opponents on-line that first showed us the inadequacies of the USCF model. When you don't get to choose your opponents, it covers up 95% of the deficiencies of the model. If you are in an environment where players can't select their opponents, I guess it makes sense to focus on the 5% of the problem that remains, rather than focusing on the huge problem of rating manipulation that opponent-selection creates. --Fritzlein 20:26, 25 October 2005 (UTC)
- Hello. Do you have some sources for that figure (95%)? Or is it just an estimate from your experience? If there is a study somewhere on this topic, I'd be really interested in reading it. BTW, the Glicko model is different from (but similar to, I agree) the Elo one. Moreover, could you explain briefly the strategy used by cheaters to increase their rating by selecting opponents? Finally, when you say that "the underlying model" is false, I understand that you think that some aspects of real games are not well modeled. What are these aspects (at least the most important ones) to your mind? Thanks a lot. Dangauthier 16:47, 4 June 2007 (UTC)
- The 95% figure was pulled from thin air based on my experience. I basically meant that, even if you don't allow self-selection of opponents, you can probably statistically prove that Elo's model is wrong, but if you do allow self-selection of opponents, it is glaringly obvious that the model is wrong.
- As for aspects of reality that are not well-modeled, the simplest argument is a circle of dominance. Suppose I can find three players A, B, and C, any three in the world, such that A beats B more than 50% of the time, and likewise B beats C, and C beats A. If I find even one such triplet, I have proven the model false. Unfortunately, while it is intuitively obvious (at least to me) that such circles exist, it is very hard to accumulate enough evidence to prove it statistically, because during the number of games it takes to measure, the skill level of the participants will have changed!
- A better hope statistically is to prove non-transitivity in some more general sense. If A beats B 75% of the time, and B beats C 75% of the time, the system demands that A beat C exactly 90% of the time. These percentages correspond to rating gaps of 191, 191, and 382 points respectively. If you can show that for rating gaps of 382 points the favorite only wins 88% of the time, plus or minus 1%, then the model has been busted. Actually, Mark Glickman has already proven something very like this to be true, but he chose to interpret it as evidence of poor estimation of ratings, rather than taking it as evidence that the model is wrong. He has a case.
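The transitivity arithmetic above is easy to verify numerically. A quick sketch using the standard logistic convention (an illustration, not Glickman's actual test):

```python
import math

def expected(d):
    """Win expectancy of the stronger player at a rating gap of d points."""
    return 1 / (1 + 10 ** (-d / 400))

def gap(p):
    """Rating gap at which the favorite's expectancy is p (inverse of expected)."""
    return 400 * math.log10(p / (1 - p))

d75 = gap(0.75)
print(round(d75))                   # ~191 points for a 75% expectancy
print(round(expected(2 * d75), 3))  # 0.9: what the model demands at ~382 points
```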
- The best way to inflate your rating (without cheating!) is to pick a computer you know how to beat, and beat it over and over in essentially the same way. It will never catch on to your methods and stop you. Meanwhile a weaker player who can't beat that computer yet will lose to it over and over, because the computer will never blunder. The fish donates tons of points to the bot, which it then transfers to you. In the end you might end up rated 400 points above the computer, while the poor schmoe ends up 400 points below the computer, but in reality you are nowhere near 800 points better than the fish is. This was historically the first obvious violation of transitivity of rating differences.
- A secondary way to inflate your ratings is to play exclusively against humans who have inflated their ratings by the first method, and diligently avoid playing the underrated schmoes who gave all their points to computers. --Fritzlein 03:38, 5 June 2007 (UTC)
- Thanks for these explanations. I'll think about your arguments and comment on them later. E.g. are some computers officially rated? However, this page is probably not the best place for that. Dangauthier 00:22, 9 June 2007 (UTC)
Elo for Multiplayer games??
Is there a version of Elo, or a different rating system that's ideal for rating multiplayer games like Scrabble or what not?--Sonjaaa 13:01, Feb 26, 2005 (UTC)
- Scrabble is considered a two-player game by serious Scrabble players, because the multiplayer version is hugely influenced by the order of play, so much so that it seems impossible to make multiplayer Scrabble fair enough for tournament play. Nevertheless your question is valid for true multiplayer games like Diplomacy. There is a natural extension of Elo's basic formula for expected number of wins, which can be expressed on the same logarithmic scale Elo chose, i.e. 200 points for a class interval. If there are N players with ratings R1, R2, ... RN, then the expected wins for player I would be 10^(RI/400)/[10^(R1/400) + 10^(R2/400) + ... + 10^(RN/400)]. Based on this model, one can produce ratings estimates from game results in a variety of ways, including simple linear adjustments parallel to Elo's suggestion for chess.
- The validity of this method for any given multiplayer game is very much open to question, but I have never heard of anything better. At least this extension of Elo is plausibly fair to all players. --Fritzlein 04:03, 27 Feb 2005 (UTC)
- I missed something in there. The main article states that expected wins can be calculated as 1/(1 + 10^((R_a − R_b)/400)). Where does the series you note above fit into that?--Nolesce
- I apologize for not noticing your question when it was written, but I'll answer it now. Before generalizing the two-player formula to a multiplayer formula it pays to notice that 1/(1+10^((R_a - R_b)/400)) is equivalent to 10^(R_b/400)/(10^(R_a/400)+10^(R_b/400)). If you take chess ratings, divide by 400, and take the inverse logs, the expectancy formula is a simple proportion. For example, let R_a = 1102 and R_b = 1295. We calculate 10^(1102/400) = 569 and 10^(1295/400) = 1728. The odds of winning are therefore 569:1728. Player A's probability of winning is 569/(569+1728), while Player B's probability of winning is 1728/(569+1728).
- Now we can easily generalize. If Player C has rating R_c = 1427, we calculate 10^(1427/400) = 3694. When the three players contest a multi-player game, the odds will be 569:1728:3694. Player A's probability of winning is 569/(569+1728+3694), while Player B's probability of winning is 1728/(569+1728+3694), and Player C's probability of winning is 3694/(569+1728+3694). Does this make more sense now? --Fritzlein 20:06, 25 October 2005 (UTC)
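A short sketch of this generalization, reproducing the 569:1728:3694 odds worked out above (the helper name is mine, not from any rating body):

```python
def win_probabilities(ratings):
    """Multiplayer Elo expectancies: each player's weight over the total."""
    weights = [10 ** (r / 400) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]

ratings = [1102, 1295, 1427]
print([round(10 ** (r / 400)) for r in ratings])          # [569, 1728, 3694]
print([round(p, 3) for p in win_probabilities(ratings)])  # [0.095, 0.288, 0.617]
```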
- I'm not so sure about this method. The expected percentage seems to come out right, but what should the user's actual score be? With 2 players it's 0 for a loss, .5 for a draw, and 1 for a win. But what if it's four players? Should you use 1, .667, .333, 0? I tried this with four players of 1200 rating with a K of 32, and the results were 1224, 1213 (rounded), 1203 (rounded), 1192. That is a total of 40 points gained and 8 lost. Heck, last place only went down 8 points, whereas the loser of a two-player game between 1200-rated players would lose 16 points. Surely a user who loses to three other players should lose more than a user who only loses to 1 player. All this happens because the expected percentages are smaller, being distributed across 4 players: each player in the two-player game has a 50% chance of winning, but in the four-player game each has only a 25% chance. I've played around with some things, like multiplying by # players / 2, but that hasn't worked completely. —Preceding unsigned comment added by 76.212.129.202 (talk) 04:23, 31 July 2008 (UTC)
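The drift described in the previous comment comes from the actual scores summing to 2 while the expectancies sum to 1; any scoring vector that sums to 1 keeps the pool zero-sum. A sketch under that assumption (the halved scores are one illustrative choice, not an official rule):

```python
K = 32
expected = [0.25] * 4                  # four equally rated (1200) players
inflating = [1, 2/3, 1/3, 0]           # sums to 2: pumps points into the pool
zero_sum = [s / 2 for s in inflating]  # sums to 1, matching the expectancies

for scores in (inflating, zero_sum):
    deltas = [K * (s - e) for s, e in zip(scores, expected)]
    print([round(d) for d in deltas], "net:", round(sum(deltas)))
# [24, 13, 3, -8] net: 32   <- the inflation observed above
# [8, 3, -3, -8]  net: 0
```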
First of all, I LOVE DIP TOO! I was actually thinking of using it for games like Settlers or Carcassonne or Ticket to Ride in our group of friends. But anyway, what about this idea suggested by a friend: if player A wins against B and C, then the Elo is calculated as if it were 2 games: A beats B, A beats C. Is that any mathematically better or worse than the one you mention?--Sonjaaa 08:18, Feb 27, 2005 (UTC)
- Ah, your idea is also superficially reasonable, and in fact it is what Yahoo Games uses for hearts. The winner is assumed to have beaten all three opponents at individual games. However, it is not at all mathematically equivalent to what I propose, and I don't like it one bit, because your rating adjustment depends on who you lose to. This unbalances the incentives and places the players on an uneven footing in the meta-game of ratings.
- Let's say we are playing Settlers. I am rated 1200, you are rated 1600, and Jughead is rated 2000. Now it turns out that late in the game I am about to win (lucky dice), Jughead is close behind, but you have slim chances yourself. You do a quick mental calculation and see that if I win you will lose 29 rating points to me, but if Jughead wins you will lose only 3 rating points to him. Therefore you abandon your own slim chances and give all of your resource cards to Jughead for free, and otherwise try in every way to help him win instead of me.
- That shouldn't happen. When you sit down to play you should know that you win X points for winning and lose Y points for losing no matter how the other players fare, so you have no incentive to favor anyone. Buz Eddy realized this when he made his Maelstrom ratings for Diplomacy using the extension of Elo ratings I first mentioned, and I haven't seen it improved upon. --Fritzlein 17:02, 27 Feb 2005 (UTC)
The above seems reasonable for multiplayer games with one winner. What about multi-player games with multiple winners, such as Mafia?
- For Diplomacy, which may end in a draw including some of the players and excluding others, the ratings give the losers a score of zero each and split one point between the winners. For example, suppose the seven players in Diplomacy are rated 1200, 1300, 1400, 1500, 1600, 1700, 1800. Their expected scores would be 0.014, 0.025, 0.045, 0.079, 0.141, 0.251, 0.446 respectively. If the latter three share in a three-way draw, the actual scores would be 0, 0, 0, 0, 0.333, 0.333, 0.333. With a K factor of 100, the rating adjustments would be -1, -3, -4, -8, +19, +8, -11 respectively. Note that expectations on the top-rated player are so high that a three-way draw is actually a sub-par performance that costs points. --Fritzlein 19:27, 10 March 2006 (UTC)
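Those Diplomacy numbers can be reproduced directly from the multiplayer expectancy formula given earlier in this section (a sketch, assuming the losers score 0 and the three drawing players split one point):

```python
def expected_scores(ratings):
    weights = [10 ** (r / 400) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]

ratings = [1200, 1300, 1400, 1500, 1600, 1700, 1800]
actual = [0, 0, 0, 0, 1/3, 1/3, 1/3]   # three-way draw among the top three
K = 100
print([round(K * (a - e)) for a, e in zip(actual, expected_scores(ratings))])
# [-1, -3, -4, -8, 19, 8, -11], matching the adjustments above
```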
In my opinion, the discussion about Elo for multiplayer (3+ player) games should be added to the main article, or at least the generalized formula for multiplayer matches. What do you think? --Joaotorres (talk) 07:21, 2 April 2008 (UTC)
The ranked multiplayer part of Company of Heroes is using a rating system based upon the Elo system. --Fblodilovics (talk) 16:14, 21 April 2008 (UTC)
Formula for Ea, Eb
Is there a way to make the formula for calculating Ea and Eb more clear? When I read it, the denominator looks like 1+10*(Ra-Rb)/400, which didn't work mathematically. I had to research some other sites before I found that it was actually 1+10^((Ra-Rb)/400). Did anyone else have this problem? PK9 03:54, 24 October 2005 (UTC)
- Would parentheses around the exponent help? I think the formula is clear now, but of course I'm expecting the right answer, which makes it easier to see. I believe that for most readers the current layout is easier to comprehend than it was when it was in plain text, even though the plain text is unambiguous, as your paragraph above demonstrates. Please experiment with the math markup if you have any ideas. --Fritzlein 20:16, 25 October 2005 (UTC)
I also had the same problem. The ideal thing would be to superscript the exponent more. The parentheses around the exponent didn't help me but thanks for trying. Other text formulas use a caret for the exponent -- while it looks amateurish, it's actually clearer. erixoltan 11/9/2006.
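For readers tripped up by the same ambiguity, here is the formula written out unambiguously in code (a sketch; the point is that the entire rating difference, divided by 400, is the exponent of 10):

```python
def e_a(r_a, r_b):
    """Expected score for player A: the ^ applies to the whole quotient."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

print(round(e_a(1500, 1700), 3))                    # 0.24 for the underdog
print(round(e_a(1500, 1700) + e_a(1700, 1500), 3))  # the two sides sum to 1.0
```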
Jeff Sonas' site
chessmetrics.com for more info on his rating system, since it has changed a bit since 2002. 128.6.175.26 17:53, 2 February 2006 (UTC)
Elo or ELO?
I think all the instances of this word should be spelled in lower-case: "Elo".Chvsanchez 04:03, 4 April 2006 (UTC)
- I also would prefer to always spell it "Elo". Given that it is not an acronym, I don't understand the capitalization. Unfortunately, for whatever reason, "ELO" seems to be standard. --Fritzlein 17:42, 4 April 2006 (UTC)
The Hydra handle Zor_Champ
The Hydra team has always used the handle Zor_Champ in the Playchess server, this has been known for years. When you say "team," it makes it appear as if they use a commercial program or grandmaster advice along with their Hydra engine to decide on what moves to play, which is untrue, all moves are decided purely by Hydra. You can log into Playchess and ask Zor_Champ yourself. Dionyseus 21:35, 27 April 2006 (UTC)
I didn't say team; their website says team.WolfKeeper 21:51, 27 April 2006 (UTC)
- But what they mean by "team" is that they as a team created Hydra, in other words they want some credit too. Log into Playchess and ask them yourself, they regularly test their engine modifications in the Engine room. Their entire goal is to prove to the world that Hydra is the strongest chess entity, it would make no sense for them to use the aid of other engines, or human aid during games. Dionyseus 22:03, 27 April 2006 (UTC)
And even if what you say is true (and I've seen contrary claims elsewhere), that doesn't prove that Hydra has the highest Elo, or establish what that Elo is; they haven't played enough games yet, and it takes more than a couple of matches.WolfKeeper 21:54, 27 April 2006 (UTC)
I'd also like to know why you insist on putting in the article that centaurs regularly outperform Hydra. Where is your proof of this? The recent 2006 PAL/CSS Freestyle Tournament clearly shows otherwise. Dionyseus
- It lost in previous years. If you can find evidence that Hydra actually was playing alone in this 2006 competition (when the team was under no obligation to do that); add it or refer to it. Otherwise stop reverting; you're violating NPOV every single time.WolfKeeper 22:09, 27 April 2006 (UTC)
- The main reason it was unable to qualify into the finals in the 2005 PAL/CSS Freestyle tournament was because of outright and obvious human errors. The fact that it was only using 32 nodes as opposed to the 64 nodes it uses now doesn't help either. I can provide you with a link where you can download the games from that tournament if you'd like. Dionyseus 22:18, 27 April 2006 (UTC)
- Irrelevant as to your deletion. The fact that some people think centaurs or cyborgs play better than humans does not seem to be controversial; and probably should go in the article. The trick is not putting undue weight on it, or putting undue weight on the different idea that Hydra is inevitably stronger either (because zor_team won one match???). NPOV is about capturing the points of view, not trying to impose any supposedly correct view on the wikipedia.WolfKeeper 22:36, 27 April 2006 (UTC)
- I can't off-hand remember how many ELO points twice as much speed gives you. Maybe 50 points; not necessarily decisive.WolfKeeper 22:36, 27 April 2006 (UTC)
- It is obvious that centaurs perform better than humans, no one disputes that. However, there is no evidence that centaurs have outperformed Hydra, in fact the data available thus far indicates otherwise. By the way, where did you get the idea that doubling of speed equals 50 elo points? Do not dismiss the 2004 match between Hydra and Shredder 8, Hydra with just 16 nodes dominated the former computer world champion [1], made the former computer world champion look like an amateur program, sort of how it made Michael Adams, who at the time of the match in 2005 was ranked 7th in the world, appear as an amateur even though it only used 32 nodes. Now Hydra is using 64 nodes, this is 4 times the speed of the Hydra that dominated Shredder 8 in 2004, this is twice as fast as the Hydra that dominated Michael Adams. Dionyseus 23:25, 27 April 2006 (UTC)
- Arno Nickel has beaten Hydra 2 games with computer assistance. In addition, humans do better at longer time controls. Other engines are weaker than Hydra, but whether they are weaker with human assistance is much less clear. There's also the point that in Freestyle play, in principle, anyone can network enough iron together to outprocess Hydra. Hydra is inflexible; the owners have to buy nodes, rather than rent or borrow.WolfKeeper 17:08, 18 May 2006 (UTC)
I have requested mediation
I have requested mediation about the Hydra matter. I would appreciate it if you would stop reverting my edits and cooperate so that we can resolve this matter. Here's the page, http://wiki.riteme.site/wiki/Wikipedia:Mediation_Cabal/Cases/2006-04-27_Elo_rating_system Dionyseus 00:21, 28 April 2006 (UTC)
Elo rating and Computer Programme
Many computer chess programmes are available which give ratings. FIDE or http://www.fide.com should develop a computer programme easily available to the world for rating. I request the readers of this discussion to forward an email to fide.com vkvora 18:47, 23 May 2006 (UTC)
Ratings Inflation
The article needs a section on ratings inflation. Rocksong 02:54, 7 August 2006 (UTC)
- I agree. When I first wrote the article it seemed like too much detail to talk about rating inflation/deflation, but some of the sections that have been added since are arguably even less relevant, so the time is ripe to address the issue.
- Unfortunately, all the different implementations of Elo's ideas mean that each implementation suffers from different problems. For example, the USCF implemented "rating floors" to combat sandbagging and deflation (both real problems), and as a result got ridiculous inflation of ratings within the chess-playing prison population, which is both more active and more insular than the general USCF population. How much space does USCF's failed experiment deserve?
- Moreover, even if we restrict ourselves to talking about inflation of FIDE ratings, people mean two very different things by "rating inflation". Some people mean that the top ratings and average ratings are higher than they used to be. A 2600 FIDE rating used to make you a World Championship contender, and now it doesn't get you into the world top 100.
- On the other hand, an equally powerful definition of inflation is that playing at the same absolute skill level now earns a higher FIDE rating than it used to. The intuition is that a rating of, say, 2400, should not necessarily place you at the same ranking in the world list as it used to, but instead it should mean a 50% chance of winning a game if you could go back in time to play someone rated 2400 decades ago.
- By this second definition, FIDE ratings are probably not suffering inflation. Indeed, they are actually suffering deflation, in that you have to play much better chess now to be rated 2400 than they had to in the old days. You have to know more about openings, and be more accurate tactically, for example.
- Given that FIDE ratings are gradually inflating according to one definition, and gradually deflating according to an equally valid definition, extending this article to cover rating inflation is a rather tricky project. ;-) --Fritzlein 18:08, 9 August 2006 (UTC)
- Nevermind, I did it. Edit away! --Fritzlein 19:54, 9 August 2006 (UTC)
Deliberately Misleading Information
Deep Junior did not win a match or even a game against Hydra. The article claims that as of 2006, Junior is the Computer Chess Champion, proving that Hydra's 32 processors are not superior to Junior on a dual AMD processor. This is misleading. Junior won a tournament that crowned it computer champion, but Hydra was not in that tournament. This piece of misleading information was inserted by Chessbase. Chessbase is the author of Junior and did so to advertise its product. They have a history of lying and being deceitful to promote their software. For example, they refuse to acknowledge Rybka, a commercial engine vastly superior in playing strength to anything Chessbase has produced. It is well known among computer chess enthusiasts that Hydra would destroy Junior handily. This is not something that could be printed in the article because they have not had such a direct match. But what is currently in the article needs to be removed ASAP. It is misleading... and damnit I'm sick of Chessbase's lies.
- More to the point, (a) arguing over what is the best chess programs does not belong in Wikipedia, and (b) any comparisons belong in Computer chess, not here. I say delete the entire 2 paragraphs which discuss computer chess. p.s. Remember to sign your comments. Rocksong 12:29, 21 August 2006 (UTC)
- The point is, the article is about ratings, so to the extent that we know the ratings, it is reasonable to discuss players' (including computer players') ratings a little here.WolfKeeper 17:05, 21 August 2006 (UTC)
- Fair enough. But how about this: we should explain the often-used term "performance rating" (which, surprisingly, the article doesn't do yet). Then we could list the best performance ratings of computers (and people). Also - I wanted to say this but I wasn't certain - computers don't have official ratings, probably because they don't play people often enough under tournament conditions, right? Rocksong 23:47, 21 August 2006 (UTC)
- Mainly because it's just not allowed. I don't necessarily agree that we should remove the computer chess discussion, as it ties in with ratings (once, as suggested by Rocksong, performance ratings are explained). Explaining why Hydra's domination of Adams only "proved" it had a rating of 2850 or higher is a very important concept.
- I'm not happy about the paragraph about Rybka either. At least two of the 4 sources are rapid chess, and the results are all against other computers. Better, I think, to note that computers don't have official ratings, and link to some of these comparison sites; rather than single out Rybka (or any other program). Rocksong 01:56, 23 August 2006 (UTC)
- There's more than one rating list for humans though as well.WolfKeeper 02:21, 23 August 2006 (UTC)
- So? That doesn't affect my point: that a score of 2900 on these rating lists, generated solely from computer-versus-computer play, often in conditions completely different from tournament play, means (almost) nothing when compared to a FIDE rating. Don't some people have ratings over 3000 on ICC? Again, so what? Rocksong 06:13, 23 August 2006 (UTC)
- Do you have a cite for the claim that it means almost nothing?WolfKeeper 07:48, 23 August 2006 (UTC)
- I don't think Rocksong needs a cite for his point. The point is that the article compares computer ratings to human ratings as though they are equivalent. Clearly they aren't. He doesn't need to cite that.
- Do you have a cite that they correlate to FIDE ratings? Rocksong 08:06, 23 August 2006 (UTC)
- I'm not making a positive claim, you are. The idea that they have '(almost) nothing' connecting them to the FIDE ratings seems to be highly unlikely, given that there *are* games played between humans and computers and they help keep the two rating scales in step, but I'll accept a good cite. So- cite please?WolfKeeper 08:18, 23 August 2006 (UTC)
- See my comment below (dated 06:34, 23 August 2006 (UTC)). So long as there's a reasonable qualifier in the article, I don't care. The debate bores me. Rocksong 08:42, 23 August 2006 (UTC)
- I've put "Ratings of Computers" in a separate section, and added a qualifying paragraph at the front. I think the qualifier is important. Beyond that, I've no interest in debates on the relative merits of different computers. Rocksong 06:34, 23 August 2006 (UTC)
Provisional period crude averaging
This section sounds extremely biased, including these quotes: "for some reason a crude averaging system" and "Apart from the obvious flawed logic". Although I see the point, and agree with it, it sounds extremely insulting to the sites that use this method. 24.237.198.91 05:58, 24 August 2006 (UTC)
- That section is so poorly written, it doesn't even make clear what it is objecting to. I think I can guess what the author is upset about, but I don't know how anyone unfamiliar with the ratings ecosystem would be able to figure it out.
- All rating systems have difficulty giving a roughly accurate rating to a previously unrated player. Many systems have a method of calculating "provisional" ratings for new players by some means radically different from Elo's standard formula of upward/downward adjustment. One such system, which I agree is literally "crude", is to calculate the "performance" of a player as equal to the rating of the opponent in case of a draw, 400 points higher than the opponent for a victory, or 400 points lower than the opponent for a loss. So if I beat someone rated 1400, draw someone rated 1500, and lose to someone rated 1750, that gives me "performances" of 1800, 1500, and 1350. My average performance would be 1550, which can serve as a provisional rating.
- What makes this system objectionable is that a win against a low-rated player can lower my provisional rating, while a loss to high-rated player can raise my provisional rating. In the above example, suppose I lost my fourth game to a player rated 2150. That would give me a "performance" of 1750, and raise my provisional rating from 1550 to 1600. It is intuitively obviously unfair to be rewarded for any loss or punished for any victory. This provisional system effectively rewards selecting opponents who are rated as high as possible.
- If the system simply adds an exception that "a win can't hurt you and a loss can't help you", it can actually make the problem worse. As an unrated player in that "fixed" system, I need only make sure to play my first game against someone rated way above my skill level, and the rest of my provisional games against players so weak I can easily beat them. Say I play a 2350-rated player first, and get a provisional rating of 1950 for the loss. Then I win nineteen games in a row against players rated 1000 or less, and since a win can't hurt me, I get to keep my provisional rating of 1950 all the way until it becomes a regular rating.
- Based on my cursory reading of what the BCF does for provisional ratings, it goes even further than the "fixed" system. The BCF will not only ensure that you can't lose points for a win, it will actually ensure that you gain points for a win, even if you are already overrated in the provisional period. This addresses the intuitive issue of fairness in gaining/losing points on a per-game basis, but may actually result in less accurate provisional ratings. The ECF system effectively rewards selecting opponents who are rated as low as possible. I would therefore add my voice to those questioning the neutrality of the section in question. However, I think a larger issue than NPOV is that the section needs to be re-written so that people can tell what the heck it is talking about. --Fritzlein 17:19, 24 August 2006 (UTC)
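For concreteness, here is the crude averaging scheme from the worked example above in code form (a sketch of the behavior described, not any federation's published algorithm):

```python
def provisional(results):
    """Average 'performance': opponent's rating +400 for a win,
    unchanged for a draw, -400 for a loss."""
    offset = {'W': 400, 'D': 0, 'L': -400}
    performances = [opp + offset[res] for opp, res in results]
    return sum(performances) / len(performances)

games = [(1400, 'W'), (1500, 'D'), (1750, 'L')]
print(provisional(games))                  # 1550.0, as in the example
print(provisional(games + [(2150, 'L')]))  # 1600.0: a loss raised the rating
```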
- I agree it's hard to work out its point. I say delete that whole subsection. Rocksong 11:59, 25 August 2006 (UTC)
Tone
This is an informative and detailed article, so congratulations to those who have worked on it, but its tone is distinctly unencyclopaedic. In many places it has the hallmarks of text that has been reworked many times in different directions by different parties, and reviewing this talk page suggests that this is so. I have slapped the 'tone' tag on it for now, but please don't consider this an aggressive gesture. I would like to see that aspect of the article improved and would do it myself but for time constraints. Soo 23:00, 26 September 2006 (UTC)
- I agree, the tone has a lot of problems that are obvious in several sections. Night Gyr (talk/Oy) 21:51, 4 November 2006 (UTC)
Geocities?
Geocities fails WP:V and WP:RS as a self-published source. I've removed the reference. If the information is present in a reliable source it can be referenced there, if it isn't, it can't be referenced.--Crossmr 07:00, 4 January 2007 (UTC)
- And how are the other 3 refs any different? All of them appear to be self-published and unverifiable. Rocksong 22:46, 4 January 2007 (UTC)
- If they are, feel free to remove the information or put a cite tag on it. I only had time to look at the geocities citation.--Crossmr 22:41, 15 January 2007 (UTC)
- It isn't being used as the primary source, thus it should be OK as a secondary source. Mathmo Talk 10:01, 19 January 2007 (UTC)
Questions from an Uninformed Reader
For somebody who has no existing information about Elo, this page seems vague in some areas, especially regarding provisionally rated players. Can established players gain or lose rating points as the result of a match with a provisionally rated player? If so, does the increased K factor apply to the established player as well, or does she use her normal K factor?
I notice some discussion about provisional ratings on the talk page, but the information there hasn't been carried over into the article. I also agree that the formulas are confusing as formatted on the article. I was able to figure them out after seeing the ASCII versions on this discussion page. —The preceding unsigned comment was added by 70.184.146.67 (talk) 20:01, 9 February 2007 (UTC).
- The problem with discussing provisional ratings is that every institution that implements Elo ratings does something different. It isn't even clear what type of provisional ratings count as "Elo" provisional ratings. Provisional rating changes often aren't linear adjustments, so the concept of K factor may not even apply to provisional players, although typically provisional ratings change more from game to game than established ratings do.
- In general, an established player can gain or lose points from playing a provisionally rated player, although some implementations make that gain or loss less than it would be from playing an established player, in which case the established player effectively uses a lower-than-normal K factor.
- How to properly rate newcomers is a very thorny issue. Folks are usually glad if provisional ratings are even approximately correct, and then hope that lots of games between established players will even everything out eventually. --Fritzlein 04:30, 10 February 2007 (UTC)
Rating and probability of a win (Player A vs. Player B)
Another question from an uninformed/unknowing reader... Is it possible to calculate the probability of a win/loss with the Elo system? That is, if I am rated at 2000, and Players A and B are 1800 and 2500, what probability do I have of winning/losing against either of them? This would seem non-trivial given the inclusion of draws and the way points are allocated (maybe percentage of win/draw vs. loss?). But I would imagine this is a tremendously useful bit of information. As a beginner, if I'm rated at 1200, should I even waste my time against a 1400 opponent? Or should I expect to win enough of the time that the games will be both instructive and have a chance of reward greater than a slot machine? ;-) Thanks! 71.60.83.239 13:10, 22 October 2007 (UTC)
- According to the current USCF formulas, your probabilities as a 2000 of winning against an 1800 and a 2500 are 75 percent (0.5 + (2000 − 1800) / 800 = 0.75) and zero, respectively. Your chance as a 1200 of beating a 1400 is 25 percent, so you shouldn't waste your time playing him once -- you should play him four times. --Mr. A. (talk) 01:08, 17 January 2008 (UTC)
The old "classical" formula for the win expectancy is , where is the difference in rating. (See formula in the December 1999 issue of Chess Life.) If you set to 200, you get . However, the USCF have revamped their rating formulas according to a October 2006 interview with Glickman in Chess Life. One problem with this formula is that it does not take into account that players do not play at the same strength in each and every game, that the rating is merely an estimate. That inserts a random element to the win expectancy. Such random variations favor the underdog, and hence the "real" win expectancy for the lower rated player is higher than what the formula suggests. Sjakkalle (Check!) 07:39, 1 February 2008 (UTC)
- Also, a rating difference of 500 is by no means a "certain win"; the formula gives about a 5% win expectancy for the lower rated player. To illustrate that 500 points is not certain at all: in 2004 I faced a 13-year-old girl who was rated 578 points below me, and I thought that would be a fairly easy game. It wasn't, and I proceeded to lose that game (and the next game as well, this one to an 11-year-old girl). Shows you what good ratings are... Sjakkalle (Check!) 07:50, 1 February 2008 (UTC)
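For reference, the expectancy arithmetic in the replies above, as a short Python sketch (the classical logistic formula plus the USCF-style linear approximation; the clamping to [0, 1] is an assumption for illustration):

    # Win expectancy for the examples above: a 2000 facing an 1800 and a 2500.

    def expected_logistic(d):
        """Classical formula: 10-to-1 odds per 400 points of rating difference."""
        return 1.0 / (1.0 + 10 ** (-d / 400.0))

    def expected_linear(d):
        """Linear approximation, clamped to [0, 1]."""
        return min(1.0, max(0.0, 0.5 + d / 800.0))

    for d in (200, -200, -500):
        print(d, round(expected_logistic(d), 3), round(expected_linear(d), 3))
    # logistic: 0.76 / 0.24 / 0.053; linear: 0.75 / 0.25 / 0.0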
recently-added paragraph
This paragraph
In addition, one major problem is the starting rating of players; the current average "provisional" player's rating is significantly lower than that of the "provisional" player of yesteryear. While several decades ago a beginning rating of 1200 was not uncommon, now young players tend to start with 400 ratings.
was recently added. I have some problems with it:
1. I don't see that it is a "problem". A few decades ago there were very few scholastic players in the USCF, especially below the high school level. Now there are tens of thousands of very young players in the USCF.
2. About starting ratings of 400: (a) that may be an accurate reflection of the playing strength of young players; (b) I think I remember that the starting rating is 100 points times the grade level, but I couldn't find that on the USCF website. Bubba73 (talk), 01:55, 31 January 2008 (UTC)
- The entire "Ratings inflation and deflation" section, including its subsections, looks like WP:Original Research and is almost entirely unreferenced. It also appears to be an argument in one direction (against the general consensus that there is ratings inflation). Also note that the text from "A common misconception" was added in a single hit.[2] Whether or not those arguments are true, they are WP:Original Research. I believe the entire section should be deleted, or at least savagely reduced. Peter Ballard (talk) 02:12, 31 January 2008 (UTC)
- The "common misconception" sentence makes little or no sense to me. FIDE ratings have inflated since 1985 - I saw data about that yesterday. But there are clearly problems with the rest of the section. Bubba73 (talk), 02:17, 31 January 2008 (UTC)
- Perhaps I'm being a little harsh - we should say something on ratings inflation, but it should be referenced. BTW the sections "7.1 Game activity versus protecting one's rating", "7.2 Chess engines" and "7.3 Selective pairing" are also WP:OR and should be deleted. Peter Ballard (talk) 02:19, 31 January 2008 (UTC)
- This is a personal website (so it may not be a WP:RS), but it does seem to have good research and data. Bubba73 (talk), 02:23, 31 January 2008 (UTC)
- I'm a little suspicious because it is advocating an alternative ratings system, and there is nothing about his qualifications. It's better than nothing, but only just. Peter Ballard (talk) 02:51, 31 January 2008 (UTC)
- Good point. In Elo's book, page 18, he considers players over 2600 to be "world champion contenders". I'm not knocking a 2600 player, but that probably wouldn't be a contender today. Perhaps that shows inflation. Bubba73 (talk), 03:11, 31 January 2008 (UTC)
- I'm taking out that paragraph. But the article needs a lot more work too. Bubba73 (talk), 21:45, 12 February 2008 (UTC)
WikiProject Chess Importance
Upgraded to Top from High due to the high number of links to this article. ChessCreator (talk) 16:35, 17 February 2008 (UTC)
- It is linked a lot, but on the other hand, you can play plenty of chess without ever having to know about the rating system. But I'm not going to change the rating. Bubba73 (talk), 02:56, 19 February 2008 (UTC)
- Reduced to High, although not because you can play chess without it (that is true of almost every top-rated chess article), but because most of the time the link to Elo is not important to the linking article. ChessCreator (talk) 01:34, 7 March 2008 (UTC)
Practical issues section
This section has a lot of tags in it. On the Chess WikiProject page, I mentioned problems with this article. I don't think this section can be fixed; I think all or almost all of it should be removed. At best it doesn't really relate to the Elo system but to rating systems in general. At worst it is unsubstantiated POV or OR. Bubba73 (talk), 05:09, 13 March 2008 (UTC)
- Someone else has tagged it too. It needs a lot of work or needs to be deleted. Bubba73 (talk), 14:40, 17 March 2008 (UTC)
Splitting the article in two
In my opinion this article talks a lot about the use of Elo ratings in chess and not so much about the rating system itself. The article should be split in two, so that the "Elo rating system" article focuses on the workings of the rating system while another article focuses on its application to chess. There are more than 120 links to this page, and a lot of them are not related to chess at all. Even though the main use of Elo ratings is in chess, the article is pretty confusing in some areas and vague in others; splitting it could help fix this. --Joaotorres (talk) 07:47, 2 April 2008 (UTC)
- You have a point there. I'm not sure what to do. Bubba73 (talk), 17:36, 2 April 2008 (UTC)
I guess splitting would be a good solution. I'm just not sure how to do it. Guess we could outline which topics should go to each article and then create the new article, move the topics there and make the proper adjustments. What do you think? -- Joaotorres (talk) 19:36, 9 April 2008 (UTC)
I've lost the faith
I have been busy defending Elo to players on Scrabulous on Facebook. This has caused me to go back to the maths and think again - and I've lost the faith.
The probability stage of the calculation, where ratings are used to stand for actual strength, has no correction for the K factor, with the result that the "probabilities" vary wildly depending on the K factor chosen.
The probability estimate has a term 10^((A-B)/400). I wonder where the two constants 10 and 400 come from, whether they were derived from some particular K, and whether there's a value of K that makes them work properly.
Is there anyone out there who can explain?
I play chess on http://64squar.es and Scrabulous (the Scrabble knock off) on http://Facebook.com The chess site uses Elo with a K factor of 32; Scrabulous uses a K factor of 120. Both sites attract masses of complaints. The chess players complain that it takes too long to reach the "right" rating and the Scrabulous players complain their ratings fluctuate wildly from day to day.
Tesspub (talk) 01:38, 10 April 2008 (UTC)
Edit to add: I think I'm saying much the same thing as the "Rating and Probability of a Win" para a few paras up. Tesspub (talk) 08:55, 10 April 2008 (UTC)
- I don't think the constants 10 and 400 have been derived from anywhere; they have simply been taken out of the air. The formula for calculating the expectancy to win says that if players A and B have the same rating, they have the same chance to win; if player A has a 400 point higher rating than B, A has a 10 times bigger chance to win than player B; if player A has an 800 point higher rating, A has a 100 times bigger chance to win, etc. Someone must have thought that 400 was a good number for the rating difference at which one player has a 10 times bigger chance to win than the other.
- There is a similar formula for calculating sound intensity in decibels. There, the corresponding constants would be 10 and 10, instead of 10 and 400. --Kri (talk) 18:34, 27 December 2008 (UTC)
That makes sense. But it implies that the 400-point rating difference relates to a specific K factor, and therefore that it should be scaled according to K. If I have understood correctly, Elo used a K factor of 10. This implies that "400" should actually be 40K. —Preceding unsigned comment added by 79.70.212.19 (talk) 00:19, 28 March 2009 (UTC)
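To make the roles of the constants concrete, here is a sketch of the standard formulas. It shows where 10, 400 and K each enter; whether 400 "should" be tied to K is the question raised above:

    # 10 and 400 define the odds scale: a 400-point gap means 10:1 expected
    # odds, an 800-point gap 100:1, and so on. K appears only in the update
    # step and controls how fast ratings move after each game.

    def expected_score(d, base=10.0, scale=400.0):
        return 1.0 / (1.0 + base ** (-d / scale))

    def update(rating, expected, score, k=32):
        return rating + k * (score - expected)

    for gap in (0, 400, 800):
        e = expected_score(gap)
        print(gap, round(e, 4), round(e / (1 - e), 1))  # gap, expectancy, odds
    # 0 -> 0.5 (1:1), 400 -> 0.9091 (10:1), 800 -> 0.9901 (100:1)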
Switch from Normal Distribution to Logistic
In "Implementing Elo's scheme" section, there are 2 problems.
The first problem is that FIDE still uses the normal distribution. See the FIDE Handbook. Section B.02.9.1 says, "This shall be done using the rating system formula based on the percentage expectancy curve and derived from the normal distribution function of statistical and probability theory. (GA 1999)". And in section B.02.10.1, the conversion tables between percentage score and rating differences, B.02.10.1.a and B.02.10.1.b, are obviously calculated according to the normal distribution, not the logistic distribution.
The second problem is the reason given for the USCF's switch from the normal distribution to the logistic distribution. The "Implementing Elo's scheme" section says, "Subsequent statistical tests have shown that chess performance is almost certainly not normally distributed. ... Therefore, both the USCF and FIDE have switched to formulas based on the logistic distribution." But I cannot find any report which says the logistic distribution fits significantly better. Many reports show that neither the normal distribution nor the logistic distribution fits the actual performance. So there is no statistically valid reason that the logistic distribution is better.
I think the main reason is different. Glickman's report "A Comprehensive Guide to Chess Ratings" says only that the most likely reason for using the logistic distribution is that it is mathematically tractable to work with. Hammerhand (talk) 16:25, 18 July 2008 (UTC)
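To see how close the two curves actually are, here is a comparison sketch. It assumes Elo's convention of a 200-point per-player standard deviation, so the difference of two performances has standard deviation 200√2 ≈ 283; that sigma is an assumption for illustration:

    import math

    # Expected score under the two models as a function of the rating
    # difference d.

    SIGMA_DIFF = 200.0 * math.sqrt(2)  # std. dev. of a difference of two performances

    def expected_normal(d):
        return 0.5 * (1.0 + math.erf(d / (SIGMA_DIFF * math.sqrt(2))))

    def expected_logistic(d):
        return 1.0 / (1.0 + 10 ** (-d / 400.0))

    for d in (0, 100, 200, 400):
        print(d, round(expected_normal(d), 3), round(expected_logistic(d), 3))
    # The curves nearly coincide for small differences but part company in
    # the tails, where the logistic gives the underdog a somewhat better chance.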
My edits today
Just a comment explaining my edits today, because they look so extensive: all I did was add headings, rearrange the sections, and remove some redundant "See also" entries. I didn't delete or change any other text. (There's a lot that deserves to be deleted or merged, but I'll let the article "settle" for a few days.) Peter Ballard (talk) 10:39, 26 July 2008 (UTC)
Bilbao 2008 is not category 22
Please note that, despite what ChessBase and others say, the Bilbao 2008 tournament is not Category 22. The official players' ratings (FIDE July 2008 list) are: Anand 2798, Ivanchuk 2781, Topalov 2777, Carlsen 2775, Radjabov 2744, Aronian 2737. That's an average of 2768.67. It's only Category 22 (average 2776 or higher) if the unofficial live ratings are used. Curiously, ChessBase has published my feedback but is still calling it Category 22.[3] Peter Ballard (talk) 23:59, 8 September 2008 (UTC)
- I wish people wouldn't put so much stock in "live ratings". Ratings fluctuate, and they have a standard error of measurement which is larger than the differences between the top players. I think a person's rating is good to plus or minus 50 points, and right now the top five or so players are very close together in rating. At his peak, Fischer's rating was 120 points above the second highest rated player. Bubba73 (talk), 01:24, 9 September 2008 (UTC)
References 16 and 28
References 16 and 28 are the same ;) --90.185.76.189 (talk) —Preceding undated comment was added at 05:05, 21 October 2008 (UTC).
Using Elo for 2-on-2 games, such as doubles in tennis
I'm looking to use the Elo system to determine individual ratings for players in 2-on-2 games. Currently, I'm averaging the ratings of the two players on each team, applying the Elo formula to these temporary "team" ratings to determine the increase or decrease, and then adding that change to or subtracting it from each player's individual rating.
Example:
Andy (1610) and Jack (1588) play a game against Dustin (1468) and Jordan (1410). Andy and Jack's team rating is 1599, Dustin and Jordan's 1439. Andy and Jack win (I'm using a K factor of 36), so their individual ratings each go up about 10.25 points while those of Dustin and Jordan drop by the same amount.
Kind of simple. I was wondering if anyone has a better way to do this. 24.140.26.11 (talk) 04:37, 14 December 2008 (UTC)Jack
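The averaging scheme described above, as a sketch using the standard Elo formulas (with a 160-point gap between the team averages and K = 36, the winners each gain about 10.25 points):

    # Rate each team as the mean of its members, apply the usual Elo update
    # to the team ratings, and give every member the full team delta.

    def expected_score(d):
        return 1.0 / (1.0 + 10 ** (-d / 400.0))

    def team_update(team_a, team_b, score_a, k=36):
        """team_a, team_b: lists of ratings; score_a: 1, 0.5 or 0 for team A."""
        avg_a = sum(team_a) / len(team_a)
        avg_b = sum(team_b) / len(team_b)
        delta = k * (score_a - expected_score(avg_a - avg_b))
        return [r + delta for r in team_a], [r - delta for r in team_b]

    new_a, new_b = team_update([1610, 1588], [1468, 1410], 1)
    print(new_a, new_b)  # winners gain ~10.25 each, losers drop the same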
Added the section 'Exponential variant' yesterday
The section I added yesterday I derived entirely from the other information in this article, using common sense. I have played a few single-player RPGs myself and know approximately how experience works, though I have not read any articles about it. Nor have I played any MMORPGs. Maybe I'm wrong about MMORPGs using an exponential scale; if that's the case, feel free to edit the section and make all the corrections that are needed. --Kri (talk) 13:58, 27 December 2008 (UTC)
This section seems completely unnecessary and AFAICT should be removed. It's a guess at how an MMORPG might implement experience, and technically it's just a transform on top of a completely bog-standard Elo rating system. Maybe if we have any cites showing that anyone actually uses Elo like this there should be a note about it? But it looks like we don't. In any case I think the derivation is absolutely unnecessary. GreedyAlgorithm (talk) 16:39, 28 August 2009 (UTC)
- Yes, what Kri says in the first sentence makes it original research so it should be removed. Bubba73 (talk), 20:16, 28 August 2009 (UTC)
Deflation: circular argument
The article says:
Players generally believe that since most players are significantly better at the end of their careers than at the beginning, that as they tend to take more points away from the system than they brought in, the system deflates as a result. This is a fallacy and is easily shown. If a system is deflated, players will have strengths higher than their ratings. But if they take points out of the system EQUAL TO their strength when they leave the system, no inflation or deflation will result.
This sounds like a circular argument. Those arguing that inflation or deflation occurs are saying a player's points are not equal to strength because, over time, a player's points will increase or decrease relative to their actual strength. The last sentence quoted above seems to say that inflation or deflation can't occur as long as the points taken out of the system are equal to the player's actual strength.
To simplify, those who argue that inflation or deflation occurs are saying the system is not fair over time. The last sentence might as well say that if the system is fair, no inflation or deflation will result. —Preceding unsigned comment added by 65.7.247.118 (talk) 18:59, 23 January 2009 (UTC)
- I don't really understand the argument either. I find that when a player leaves the system (he dies or something), what has to be done in order to make the system stay balanced is the following:
- The number of points he took out of the system is his rating when he left minus his rating when he started, i.e. his rating gain during his career. This rating gain must correspond to an equal rating loss for other players, since the system doesn't add or remove any points when recalculating the ratings – it just moves them between players.
- To give back what he took from the system, these points (his rating gain) should be equally distributed to all players currently in the system, no matter how long they have played. In this way, the average player rating will always be at the same level, namely that of a player who has just entered the system.
- This method would have another problem though; if a player enters the system but never plays any rated games, his rating will rise anyway, since he is continually getting rating points from deceased high-rated players. This could maybe be solved by not distributing the points evenly after all; maybe they should be distributed only to players who play regularly, or according to some smooth transition between not playing at all and playing a lot.
- Another solution would be to give each player (losing or winning) a small rating gain after each played game, approximately corresponding to the increased experience both players get, and hoping the system will stay somewhat balanced. --Kri (talk) 03:39, 24 January 2009 (UTC)
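A small sketch of the bookkeeping under discussion. The update rule is the standard zero-sum exchange; the retirement step implements the redistribution proposal above, which is not a standard rule:

    # Every Elo update moves points between the two players, so the pool
    # total is constant while everyone stays active.

    def play(ratings, a, b, score_a, k=32):
        ea = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
        delta = k * (score_a - ea)
        ratings[a] += delta
        ratings[b] -= delta  # what a gains, b loses: the total is unchanged

    def retire(ratings, entry_ratings, player):
        """Redistribution proposal: hand a leaver's career gain back evenly."""
        gain = ratings.pop(player) - entry_ratings.pop(player)
        for p in ratings:
            ratings[p] += gain / len(ratings)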
Initial rating? Maximum rating?
Is there a standard or convention for what rating a player starts out with? In the online versions of Scrabble that I've played, everyone starts out with a rating of 1200, and (of course) it goes up or down from there depending on wins and losses (or ties, in the rare event that they occur). If 1200 is standard across all settings in which the Elo rating system might be used, then why? If Scrabble, chess, and other games use different initial ratings, what criteria are used to decide on those initial ratings?
Is there a maximum possible rating that someone can get, or can it theoretically increase without bound? In practice, all ratings are below 3000, according to this article. Suppose a player is so good that he wins every game he plays (and suppose he can always find someone who is willing to get beaten). Can his rating increase to infinity (given an infinite amount of time to play)? Or is there some number, say 10000, such that the score will always be below that number? Mathmoose (talk) 20:58, 23 February 2009 (UTC)
- The rating a player starts out with, that is a good question; I can't find it in the article... I don't know about that. I think it's up to everyone who wants to use the system to decide how high the rating for beginners will be. It seems that most chess engines might have that rating at 1200? Also, I think it differs from game to game (chess, go, hex, etc.).
- And no, there's no maximum rating. In practice, it's not possible for a player to get that many points, because when he plays against a much lower rated opponent, his expected score (see the section about mathematical details) will get very close to 1, which means that if he wins he will gain very few points (if any at all after rounding, if the system uses only integer ratings), but if he loses, he will lose almost twice as much rating as the loser of a game between two equally rated players. So he will have to win a lot more than he loses in order to keep increasing his rating. In fact, if he has a rating advantage of x points over his opponent, he will have to win 10^(x/400) times as many games as he loses (supposing he doesn't draw any games) to stay at the same level, so with a rating advantage of 800 points, he can only lose one game per hundred wins. --Kri (talk) 23:42, 23 February 2009 (UTC)
- The initial rating of 1200 may come from Elo's system, since that is the way the US Chess Federation used to do it. With so many scholastic players now, the USCF uses an initial rating of 50 times the player's age, with a maximum of 1300.
- I think the max possible rating is 400 points above the second highest rated player, and that is assuming that he wins every game. Bubba73 (talk), 23:48, 23 February 2009 (UTC)
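The break-even arithmetic behind the replies above, as a sketch (standard formulas, draws ignored):

    # At a rating advantage of x, the expected odds are 10**(x/400) to 1,
    # so to merely hold his rating the leader must win about that many
    # games for every loss.

    def breakeven_win_loss_ratio(x):
        return 10 ** (x / 400.0)

    for x in (200, 400, 800):
        print(x, round(breakeven_win_loss_ratio(x), 1))
    # 200 -> 3.2, 400 -> 10.0, 800 -> 100.0 (one loss per hundred wins)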
Logistic distribution
This puzzles me, but at the bottom of the section about mathematical details there is a reference to the Hubbert curve, logistic curves and the normal distribution. Also, the section Most accurate distribution model mentions that a logistic distribution is better in this case than the normal distribution. Is there any motivation for this? In which way is the logistic distribution model used? --Kri (talk) 23:59, 23 February 2009 (UTC)
- I think Elo assumed a normal distribution, but actual data fits a logistic distribution more closely. Bubba73 (talk), 21:02, 12 March 2009 (UTC)
- What exactly is it that is distributed? --Kri (talk) 21:06, 12 March 2009 (UTC)
- I think it is the likelihood that player A beats player B if there is a rating difference of x points. Elo made an assumption that is close to the actual case, but the logistic distribution is closer. The normal distribution is a special case of the logistic distribution, IIRC. Or it is close and approaches it in the limit. Bubba73 (talk), 04:07, 18 March 2009 (UTC)
- That makes sense; in that case it would be the same thing as the expected score for player A, which does follow a logistic sigmoid function. But I don't think the normal distribution is a special case of the logistic distribution? Anyway, thanks for the reference you added, though I haven't had time to read it yet. --Kri (talk) 23:22, 18 March 2009 (UTC)
- The normal distribution is most definitely not a special case of the logistic distribution, even though they have similar shapes. 74.210.118.117 (talk) 05:19, 29 April 2009 (UTC)
- OK, I was wrong about that. I thought some choice of parameters, in the limit, gave the normal distribution. Bubba73 (talk), 05:35, 29 April 2009 (UTC)
New Article(s)?
I don't play chess. I heard about the Elo rating system from playing Scrabble. This article was one of the first resources I came to, but I don't find it helpful. The article is heavily geared toward the application of the Elo rating system to chess. This makes sense, for historical reasons, since that is what it was originally developed for. On the other hand, as the article itself states, it has been applied in many other contexts since then. Thus, a lot of people who don't play chess, such as myself, come here looking for information. I'm wondering if it would make more sense to have one article that explains the Elo system in general, with chess mentioned as an example (the example?), and another which goes into more detail on how it is used in chess specifically (and other articles for other games that use it?).
I also find this article to be poorly organized, even if the information is correct. For example, the first time the term K factor is used, no explanation is given of what the K factor is. I want to know how new ratings are calculated after a game. I think I found the answer, but it is not clear, because of the emphasis on the use of the system in chess tournaments. I, too, am confused about the distributions. I understand some of it, and I can fill in the details from my own knowledge. I know what the distributions are, for example, but it's not clear why the logistic distribution might be preferred over the normal.
I would be willing to tackle some of this myself, but I am not knowledgeable in the field. Mathmoose (talk) 03:40, 18 March 2009 (UTC)
- I understand your concerns. The Elo system is general and can be used on many types of competitions. I think it depends only on two parameters that can be set. C measures how spread out the ratings are and K measures how much a rating can change due to one game or tournament. About the only other thing is whether it is applied to individual games or to a tournament. Otherwise you can pretty much imagine how it works for other things based on how it works for chess games. I think the article should be revised to address your concerns. Bubba73 (talk), 04:03, 18 March 2009 (UTC)
- If you mean that C = 400, then yes, C measures how spread out the ratings are. Otherwise, I used the constant C in the section about the Exponential variant; there it works much like the start rating for a player does in the normal Elo rating system. By the way, I find the number 400 to be a fairly arbitrarily chosen constant; to me it seems a little strange to have a hard-coded constant built into a general method like this. Maybe we should change the constant to a variable and mention that it is set to 400 when rating chess? --Kri (talk) 22:13, 29 April 2009 (UTC)
- From what I read, the 400 seems to have been chosen to make the ratings comparable to those of its predecessor, the Harkness system. There is a little about it at chess rating systems. Bubba73 (talk), 22:17, 29 April 2009 (UTC)
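Kri's suggestion amounts to making the scale a parameter; a sketch, where C is the generic spread constant (400 in the chess convention):

    # Expected score with the spread constant made explicit: C is the number
    # of rating points corresponding to 10:1 odds (400 in chess usage).

    def expected_score(d, c=400.0):
        return 1.0 / (1.0 + 10 ** (-d / c))

    for c in (400.0, 200.0):
        print(c, round(expected_score(160, c), 3))
    # 400 -> 0.715, 200 -> 0.863: halving C halves the gap needed for given odds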
inflation/deflation example is bogus?
I'm not a specialist on the subject, but the deflation example is not making a lot of sense to me. The before-and-after total of points in the system and the median value are unchanged (total 6000, median 1500). So shouldn't the conclusion be: no inflation, no deflation?
(The whole section about inflation/deflation is of fairly poor quality overall, really; I'm not sure whether it should be rewritten or just replaced by a mention that inflation/deflation problems can happen.)
TTimo (talk) 23:08, 24 March 2009 (UTC)
- The topic is briefly discussed a few sections earlier, maybe that could be to some use when improving the section? --Kri (talk) 22:19, 29 April 2009 (UTC)
Mistake in the deflation example
I think I've found a mistake in the deflation example:
"It can be shown that if these four players continue playing games, without changing their chess skills, their final ratings will be
A: 1700 B: 1433 C: 1433 D: 1433,"
I think the numbers should be 1650, 1450, 1450 and 1450 respectively, as the difference in ratings between A and B/C/D needs to be about 200 points (for a 75%:25% ratio of results), while all the ratings must sum to 4 * 1500. —Preceding unsigned comment added by 83.14.189.6 (talk) 03:26, 15 May 2009 (UTC)
- I think I corrected it. You are right in that the average should still be 1500. The 200 points for 75:25 ratio is not exact, and the formula gives 1740, 1420, 1420, 1420. Bubba73 (talk), 03:47, 15 May 2009 (UTC)
No, you misunderstood the idea of the "final ratings". The final ratings are those which the players will achieve if they keep playing against each other for a long time (i.e. many tournaments; theoretically, an infinite number of tournaments). Mathematically speaking, this is a limit. 1740, 1420, 1420, 1420 are just the numbers the players obtain after the first tournament. After an infinite number of tournaments their ratings should settle at 1650, 1450, 1450, 1450. The difference must be 200 points (or thereabouts), as this follows from the definition of Elo ratings. Refer to "Mathematical details". —Preceding unsigned comment added by 83.14.189.6 (talk) 04:21, 15 May 2009 (UTC)
- According to page 31 of his book, 75% expected score comes at 193 point difference, so in a large number of games it would seem that the difference would approach 193 points. The equation gives 1740 for the better player after only 30 games, and I don't know why it is that large, except that I think the equation is a linear approximation to the true distribution. Maybe that is the problem. Bubba73 (talk), 05:08, 15 May 2009 (UTC)
- Also, those calculations are done with K=32, which is on the high side. Another thing: the calculations in the example assume a single tournament where each player plays the others ten times. If there were ten individual tournaments where each plays the others once (a round-robin tournament), and the ratings were recalculated after each, the inflation/deflation would be less. Bubba73 (talk), 05:17, 15 May 2009 (UTC)
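For what it's worth, the limit can be checked with a quick fixed-point sketch, assuming A really scores exactly 75% per game against each of B, C and D, with K = 32. The gap converges to 400·log10(3) ≈ 191 points, i.e. roughly 1643 vs 1452 with the pool total fixed at 6000:

    # Deterministic iteration of the expected update for the example above.

    def expected_score(d):
        return 1.0 / (1.0 + 10 ** (-d / 400.0))

    a, rest, k = 1500.0, 1500.0, 32
    for _ in range(1000):
        delta = k * (0.75 - expected_score(a - rest))  # A's expected gain per game
        a += 3 * delta   # A plays three such opponents per round
        rest -= delta    # each of B, C, D loses what A gains from him
    print(round(a), round(rest))  # about 1643 and 1452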
Deflation really a "misconception"?
I have an issue with the following paragraph:
"A common misconception is that rating points enter the system every time a previously unrated player gets an initial rating and that likewise rating points leave the system every time someone retires from play. Players generally believe that since most players are significantly better at the end of their careers than at the beginning, that as they tend to take more points away from the system than they brought in, the system deflates as a result. This is a fallacy and is easily shown. If a system is deflated, players will have strengths higher than their ratings. But if they take points out of the system EQUAL TO their strength when they leave the system, no inflation or deflation will result."
First, on a mathematical level I am not sure this holds water. A player may get a rating of 1000 after the first tournament, and then gradually improve, and ten years later this player may be of expert strength (2000). The 1000 points were gained due to improved playing strength, but they came from beating stronger players. The ratings the player had over the course of the years while he was advancing were all an underestimation of his true playing skill, that is why the rating kept going up, hence he was "underrated". The 1000 points gained when the player was in the system came from somewhere, and in the simplest of Elo systems, the only source of rating points once you are in the system is from taking points from other players. The fact that this player went up 1000 points must mean that his opponents collectively lost 1000 points. If the player now quits chess, 1000 points will be gone from the pool, aka deflation.
This is not purely my speculation and original research. I can point to the rating policy of the Norwegian Chess Federation (Google translated here), where there is a separate heading titled "Deflasjonsproblemer/Deflation Issues". That policy explicitly cites the issue of players entering the system with a lower rating than what they leave with as something which will cause a deflationary effect. The Norwegian policies of larger K for juniors is in place precisely to counteract that.
So I see two ways forward:
- Declare that the current text in this Wikipedia article is wrong, and fix the article.
- Tell the Norwegian Chess Federation that all their concerns about deflation are wrong, and tell them to fix the policy.
At the moment, I think #1 is what I would prefer, because the current text in our article is unsourced and seems to be of dubious mathematical validity, but I'd appreciate further comment. Sjakkalle (Check!) 13:55, 19 August 2009 (UTC)
- One little point - players gain points by beating stronger and weaker players. But I agree with you. That paragraph may be OR. Somewhere I've read something authoritative about deflation, but I don't remember where. The USCF also has methods to counteract deflation, so they must think it is true too. (Of course, overdoing it can inflate ratings.)
- This article has more problems than any other in the chess project, see Wikipedia:WikiProject Chess/Cleanup listing. Bubba73 (talk), 14:11, 19 August 2009 (UTC)
- If the player now quits chess, 1000 points will be gone from the pool, aka deflation. This is complete nonsense; even if a player quits chess his rating remains frozen in the system. For example, if Kasparov were to start playing again, he'd start from his rating of 2810 or whatever it was. Loosmark (talk) 14:16, 19 August 2009 (UTC)
- Yes, the points are in the system if he starts playing again, but until and unless that happens, the points are no longer "in play"; nobody can get points by playing against someone who has quit. To remove the possibility of someone making a comeback, replace "quits chess" with "dies". Sjakkalle (Check!) 14:21, 19 August 2009 (UTC)
- Still, I don't understand what exactly your point is. If the player managed to take 1000 points from the other players, it only means that the other players were slightly overrated; otherwise they would not have lost the points. He did not cause deflation but simply "corrected" the ratings of the other players. Loosmark (talk) 15:06, 19 August 2009 (UTC)
- He has taken those points away from other players. When he was rising, he was underrated and the others could be rated accurately, yet lose points to him. Bubba73 (talk), 15:24, 19 August 2009 (UTC)
- Those are two assumptions: 1) he was underrated, 2) they were rated accurately. We just don't know that; in all probability his rating matched his improvements in strength. I'd also like to point out that the very concept of being underrated is wrong: the Elo rating can't calculate somebody's true strength (whatever that is), but rather is a mathematical evaluation of his past results. Loosmark (talk) 17:44, 19 August 2009 (UTC)
- You are claiming that players can be "slightly overrated" (a couple replies up) but that "the concept of being underrated is wrong" (same discussion, previous reply)? I don't understand this, as you seem to be contradicting yourself. Quale (talk) 18:01, 19 August 2009 (UTC)
- If a player is going from being a 1000 player to a 2000 player, at any particular time during that climb he will be underrated because the rating is a measure of past performance. He will be taking points away from players who are accurately rated. Bubba73 (talk), 19:46, 19 August 2009 (UTC)
- Sorry, I was actually addressing the comments that Loosmark made, but I didn't make that clear. I think I understand what you are saying. Quale (talk) 22:27, 19 August 2009 (UTC)
- No, I understood that. I was replying to Loosmark too. Bubba73 (talk), 22:29, 19 August 2009 (UTC)
- If a player is going from being a 1000 player to a 2000 player, at any particular time during that climb he will be underrated because the rating is a measure of past performance. I'm sorry, but that is just speculation; you could equally say that there are a lot of players who are overrated and whose worse results still haven't caught up with them, so to speak. There are other problems with Sjakkalle's claim; for example, there are more players entering the system than retiring, he ignores the rating floors, etc. Loosmark (talk) 12:31, 20 August 2009 (UTC)
- I ignored the rating floors because they are not in the "pure" Elo system. I agree that rating floors do contribute to keeping the average rating up because they stop a player from losing points. Once you implement them, the entire principle that someone must lose rating points when another gains rating points is weakened. Sjakkalle (Check!) 13:47, 20 August 2009 (UTC)
- The pure Elo system is a type of moving average of performance. If the thing being measured is increasing, the moving average lags behind. Therefore the player getting better has a lower rating than his performance ability, so he is underrated. Bubba73 (talk), 16:21, 21 August 2009 (UTC)
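The lag is easy to see in a simulation sketch. Assumptions: the player's true strength rises linearly from 1000 to about 2000 over 500 games, his opposition is accurately rated at 1500, K = 32, and each game is replaced by its expected result:

    # A steadily improving player's rating trails his true strength, so on
    # average he keeps winning points from correctly rated opponents.

    def expected_score(d):
        return 1.0 / (1.0 + 10 ** (-d / 400.0))

    rating, k = 1000.0, 32
    for game in range(500):
        true_strength = 1000 + 2 * game               # improves 2 points per game
        true_e = expected_score(true_strength - 1500) # real scoring rate vs a 1500
        rated_e = expected_score(rating - 1500)       # rate implied by his rating
        rating += k * (true_e - rated_e)  # expected update: positive while underrated
    print(round(rating))  # finishes well below the final true strength of ~2000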
Is there any source to back up the paragraph in question? If not, I think the entire text is original research. However, looking a bit deeper into the material, I think I have understood what the text is trying to say: that the deflation is due to the system's inability to add rating points to the pool when players improve. The example also lacks sourcing, but is at least correct in that a purely zero-sum rating system is not able to accurately reflect an improvement in the average chess strength. Yet the example does seem to contradict rather than support the text about the "misconception" of deflation due to quitting. If "player A" with his 1645 rating now quits, the average rating of the players B, C, and D will be 1452, even though they have not weakened from their proper level of 1500. Player A entered the system with 1500, and left the system with 1645. Consequently the players left behind saw their rating drop from 1500 to 1452, which is exactly the deflation effect which the text labels a "misconception". I have the following proposals:
- The example is more textbook material than encyclopedia material, and it lacks sourcing. Probably original research, and I suggest removing it.
- The text about the deflation mechanism being a "misconception" is contradicted, at the very least, by the rating policy of the Norwegian Chess Federation, and I suspect it is contradicted by much more authoritative sources than that.
- The idea that deflation is caused by the inability to track players who are improving is supported by Mark Glickman in the October 2006 interview in Chess Life, posted online here.
I think a text along these lines would be an improvement:
"In 1995, the United Chess Federation experienced that several young scholastic players were improving faster than what the rating system was able to track. As a result, established players with stable ratings started to lose rating points to the young and underrated players. As a result, several of the older players quit chess in frustration over what they considered an unfair rating decline. The current system includes a bonus point scheme which feeds rating points into the system in order to track improving players.<ref>A conversation with Mark Glickman [http://math.bu.edu/people/mg/ratings/cl-article.pdf]</ref> Other methods used to combat deflation, used in Norway for example, include using a larger K factor for young players, and even boosting the rating progress of young players who score well above their predicted performance.<ref>[http://www.sjakk.no/nsf/elosystem_index.html]</ref>"
Sjakkalle (Check!) 06:31, 20 August 2009 (UTC)
- PS. I noticed that the Mark Glickman biography was deleted as a WP:CSD "A7" candidate ("no assertion of notability"). Not sure I agree with that assessment. Sjakkalle (Check!) 06:31, 20 August 2009 (UTC)
- I don't know how notable Glickman (the person) is, but there is Glicko rating system and his system is mentioned in Chess rating systems. Bubba73 (talk), 16:24, 20 August 2009 (UTC)
- I agree the full section seems to be OR, as no sources are cited. In that respect your paragraph is much better. Maybe we should go as far as removing everything that is not sourced in this article, to get a fresh look? SyG (talk) 10:33, 20 August 2009 (UTC)
- I support that. There's been too much unsourced material in the article for too long. I'm generally not thrilled with the opposing idea that we must take heroic measures to try to justify questionable unsourced claims found in an article. This often requires contorting sources to try to fit and leads to a lot of errors. If some editors are willing to sweep the article clean of unsourced claims and rebuild it with sources in hand as they go, I think that would be great. Quale (talk) 14:35, 20 August 2009 (UTC)
- IMO, the sections on "protecting one's rating", "selective pairing", and "inflation and deflation" have major problems with wp:OR, wp:RS, and WP:V. Bubba73 (talk), 16:21, 20 August 2009 (UTC)
- OK, I have tried tackling the "inflation and deflation" section with this edit. Hope the other sections Bubba mentioned can be taken care of as well. Any review would be appreciated. Sjakkalle (Check!) 07:47, 23 August 2009 (UTC)
- Good work so far. One thing, the article says Rating floors in the USA work by guaranteeing that a player will never drop below a certain limit. This also combats inflation.... How do these artificial floors combat inflation? It seems to me that they would contribute to inflation or prevent deflation. Bubba73 (talk), 16:19, 23 August 2009 (UTC)
- That was me messing up inflation and deflation. Thanks for spotting it, I have fixed that one. Sjakkalle (Check!) 05:57, 24 August 2009 (UTC)
- I think the USCF put in those floors to combat sandbagging, where players lose points to win prizes in a lower class. But the overall effect on ratings is not clear to me. It will tend to keep the rating of declining players artificially high, but on the other hand players who have declined below their rating will have an artificially high rating, so other players will gain more points and their rating will inflate a little. Bubba73 (talk), 14:38, 24 August 2009 (UTC)
Rating floors
There are two types of rating floors and they have different effects. FIDE uses a floor - only players with ratings above that floor are listed. The recent article by Sonas (referenced in the article) talks about that. The USCF uses a different type of floor - a player with an established rating will not drop below a certain level. Bubba73 (talk), 17:24, 23 August 2009 (UTC)
Benefits
On a hypothetical basis, if someone was forced to play against someone who they knew they were going to lose to no matter what, would it be more beneficial for that person to win as many games as possible and raise their Elo as high as possible before playing the game they would inevitably lose, OR would it be better not to play, keeping their rating lower so they would lose fewer points and could recover afterwards? —Preceding unsigned comment added by 220.239.20.242 (talk) 06:04, August 25, 2007 (UTC)
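One way to explore the question with the standard formulas (a sketch; the ratings and opponents here are made up for illustration):

    # Compare two orderings: win five games against 1300-rated opponents and
    # then lose to a 2200, versus losing to the 2200 first. Standard Elo, K=32.

    def expected_score(d):
        return 1.0 / (1.0 + 10 ** (-d / 400.0))

    def update(r, opp, score, k=32):
        return r + k * (score - expected_score(r - opp))

    def run(r, games):
        for opp, score in games:
            r = update(r, opp, score)
        return r

    wins_first = run(1500.0, [(1300, 1)] * 5 + [(2200, 0)])
    loss_first = run(1500.0, [(2200, 0)] + [(1300, 1)] * 5)
    print(round(wins_first, 1), round(loss_first, 1))  # the orderings differ only slightly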
Other Gaming Media
It might be worth mentioning that Elo ratings have also been applied to video games, specifically the game Age of Empires III with the cuetech ratings based on the Elo system. These ratings are often taken with the same seriousness as chess ratings among players. —Preceding unsigned comment added by 24.218.178.198 (talk • contribs) 18:40, 18 May 2006 (UTC)
- They've also been used in Unreal Tournament's online play rating system. WolfKeeper 17:02, 18 May 2006 (UTC)
- And Guild Wars... 91.16.138.168 (talk) 16:00, 26 July 2009 (UTC)
- Should I note that Heroes of Newerth also has a similar rating system already in place? Ulaire (talk) 13:50, 6 December 2009 (UTC)