ELECTRONIC T-NOTES


CHESSBASE USA'S WEEKLY ON-LINE NEWSLETTER


FOR THE WEEK OF JULY 7, 2002


ELO LISTS IN FRITZ -- PART TWO

by Steve Lopez

In last week's ETN, we discussed the Elo rating system as the prelude to seeing how an interesting feature of Fritz (and its "sister" playing programs) works: the ability to retroactively assign ratings to unrated players in a database. This week we'll examine the "nuts and bolts" of how to use it. The Help file in Fritz does a pretty fair job of describing the Elo list feature, so we'll start by quoting from the file:

Unix inventor and computer chess pioneer, Ken Thompson, has developed an algorithm which allows one to create an Elo rating list out of an arbitrary set of games. Any database of games can be treated as a gigantic tournament. Each player gets the same initial rating (e.g., 2400). After evaluating the results of all games in the database, each player gets a new rating. Using these new values, the games of the database are rated again. This is done over and over again, until the ratings of all players stabilize and the values remain constant.

OK, so what does this mean in plain English? You feed Fritz a batch of games. Each player is assigned a starting default rating of 2400. The program then runs through the database, looking at the results of each player's games, and adjusting his or her rating up or down depending on the game results. This works just like an actual chess tournament, in which each participant's rating rises and falls depending on how well or poorly he/she does. These ratings get repeatedly cross-indexed by the program, meaning that each player's rating is going to depend on the other players' ratings (and you can refer back to last week's ETN for a link to a page I wrote describing how the rating system works). The ratings are adjusted over and over until their initial fluctuations "normalize" (that is, the program makes a pass through the database and no more adjustments are required).
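For readers who like to see the idea in code, here is a minimal Python sketch of the "rate everyone, then rate again until nothing moves" loop. The Help file doesn't spell out Thompson's exact arithmetic, so this is an assumption-laden illustration rather than what Fritz actually does: it uses the standard Elo expected-score curve, a step size of 400 points, and a re-centering step that keeps the average at 2400. The function names and the (player, player, score) game format are made up for the example.

```python
def expected_score(own, opp):
    """Standard Elo expectation for a player rated `own` facing `opp`."""
    return 1.0 / (1.0 + 10 ** ((opp - own) / 400.0))


def iterate_ratings(games, start=2400.0, tolerance=0.01, max_passes=1000):
    """Re-rate a pool of games until the ratings stop moving.

    games: list of (player_a, player_b, score_a), score_a being 1, 0.5 or 0.
    Assumption: this update rule is illustrative, not Fritz's internals.
    """
    players = sorted({p for a, b, _ in games for p in (a, b)})
    if not players:
        return {}
    ratings = {p: start for p in players}  # everybody starts at 2400

    for _ in range(max_passes):
        actual = {p: 0.0 for p in players}
        expected = {p: 0.0 for p in players}
        count = {p: 0 for p in players}

        # One pass through the "gigantic tournament".
        for a, b, score_a in games:
            e_a = expected_score(ratings[a], ratings[b])
            actual[a] += score_a
            actual[b] += 1.0 - score_a
            expected[a] += e_a
            expected[b] += 1.0 - e_a
            count[a] += 1
            count[b] += 1

        # Nudge each player toward the rating that explains his results.
        new = {
            p: ratings[p] + 400.0 * (actual[p] - expected[p]) / count[p]
            for p in players
        }

        # Keep the pool average pinned at the starting value.
        shift = start - sum(new.values()) / len(new)
        new = {p: r + shift for p, r in new.items()}

        change = max(abs(new[p] - ratings[p]) for p in players)
        ratings = new
        if change < tolerance:  # ratings have "normalized"
            break

    return ratings
```

Feed it a list such as `[("A", "B", 1), ("A", "B", 0)]` and it returns a dictionary of ratings. Note that in this sketch a player who wins every single game never settles down (his rating just keeps climbing), which is why the loop carries a `max_passes` cap.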

In theory, you could actually do this calculation yourself, by hand, but I'd hate to think how long it would take to perform on a huge database containing thousands of players. The Fritz program can do this in a matter of minutes or hours (depending on the size of the database and the processor speed -- we'll talk more about this later on). Going back to the Help file, we read:

The Elo management in our chess program was implemented mainly in order to evaluate engine tournaments. But it is very interesting to use it on human results as well. It can also be quite exciting to create Elo lists for historical game data. For the system to work properly, it is absolutely critical that the players’ names are completely unified.

This last point is a major one. You can't have a player in the database who has his name spelled two or more different ways -- this will screw up the results. In the databases sold by ChessBase, the player names are generally standardized. But if you have a database you've created by hand or put together from Internet downloads, you might need to edit the player names. You can use ChessBase to globally make these corrections to standardize the names; this was discussed many, many moons ago in the June 22, 1997 issue of ETN.
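If you'd like a rough way to spot likely duplicate spellings before you start editing, here is a small Python sketch. It is not how ChessBase standardizes names; it's just a crude heuristic that assumes names are written "Surname, Given" and groups them by surname plus first initial. The function names are invented for the example, and the heuristic will happily lump together two different people who share a surname and an initial, so treat its output as a list of suspects to check by eye.

```python
import re
from collections import defaultdict


def name_key(name):
    """Crude key: lowercase surname plus first initial, punctuation ignored.

    Assumes the "Surname, Given" convention; a name with no comma is
    treated as a bare surname.
    """
    surname, _, given = name.partition(",")
    surname = re.sub(r"[^a-z]", "", surname.lower())
    given = re.sub(r"[^a-z]", "", given.lower())
    return surname + given[:1]


def find_variants(names):
    """Return only the keys under which more than one spelling appears."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}
```

For example, `find_variants(["Kasparov, Garry", "Kasparov, G.", "Kasparov,G", "Morphy, Paul"])` reports the three Kasparov spellings under one key and leaves Morphy alone.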

When you're ready to create an Elo list for a database, open that database in Fritz (or one of the other playing programs) by hitting F12 in the main chessboard screen (to go to the game list window), then go to the File menu, select "Open", then "Database", and use the Windows file select dialogue to go to that database's folder; then select the database by double-clicking on it. If it's a database you've recently used, you can use the shortcut of selecting it from the pulldown window on the right side of the Toolbar in the game list window.

The next step is to highlight all of the games in the game list -- the keyboard shortcut for this is CTRL-A. Next go to the Tools menu, select "Rating", and then "Create Elo start list". A new dialogue appears which lets you name the file (which will end in the extension .elo) and select the folder it will be stored in. Once you've done this, the process begins. You'll see a progress bar appear on the screen; it will display how much of the work (in per cent) has been completed and the approximate amount of time required to finish the job. Take the latter with a pillar of salt -- you'll see this figure fluctuate wildly at various points in the process.

Note that if you're trying to run this feature on a very small database, Fritz will tell you that the sampling of games is too small and will "kick out" of the process.

The big question at this point is, "How long will it take Fritz to generate an Elo list?" That's going to depend on the size of the database, the speed of your processor, and how much RAM your computer has in it. I generated a few lists using a Pentium III 800 MHz machine with 512 MB RAM. An Elo list for a few hundred computer vs. computer games took a few seconds. A database of about 180,000 games required about five minutes. Creating an Elo list for Mega Database 2002 (with over 2 million games) required about five hours.

To be honest, I don't know that I'd try creating a list for a huge database on anything less than a Pentium III. I once ran this feature on an 875,000 game database using a Pentium II and stopped the process after 12 hours -- it was nowhere near finished when I killed it. By the way, there is a button that allows you to stop the process at any point. Be aware, though, that you can't later resume from the point at which you stopped: if you terminate the process and run it again later, it will have to start all over from the beginning.

When the process completes, the Elo list dialogue appears.

This will display the Elo list in the default mode of highest Elo to lowest. However, there's another step you should take to finalize the generation of the Elo list. Because each player starts with a default of 2400, the displayed rating may actually be a bit higher or lower than the ratings of players who have an established Elo. To correct this, use the "Gauge" button to raise or lower the ratings by a set amount. For an example, we once again turn to the Fritz Help file:

The Thompson algorithm calculates relative playing strengths. It assumes that the average rating of all the players in a rating list is 2400. This will normally not be the case. To get correct practical ratings, it is necessary to rescale or gauge the list by adding or subtracting a certain percentage from each player. The best way to do this is to take a very stable player with a known rating (our favourite is John Nunn, who has been 2600 for years), and add or subtract an offset to make the rating equal his real rating. The program will then adjust the ratings of all other players accordingly.

When I ran the Elo list feature on Mega Database 2002, I noticed that John Nunn had an Elo of 2520 listed next to his name. So I clicked "Gauge" and typed "80" in the dialogue to add the difference in his rating as described in the Help file. After a few moments the program had adjusted all of the ratings by the offset of 80 points.
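The arithmetic behind gauging is simple enough to sketch in a few lines of Python. This is only an illustration of the offset idea described in the Help file, not Fritz's own code; the `gauge` function is made up for the example, and "Player X" and his rating are invented purely to show the shift.

```python
def gauge(ratings, anchor, known_rating):
    """Shift every rating by the offset that puts `anchor` on `known_rating`.

    ratings: dict mapping player name to generated rating.
    Returns the offset and a new dict of shifted ratings.
    """
    offset = known_rating - ratings[anchor]
    shifted = {name: rating + offset for name, rating in ratings.items()}
    return offset, shifted


# Nunn's figures come from the example above; "Player X" is hypothetical.
generated = {"Nunn, John": 2520, "Player X": 2450}
offset, gauged = gauge(generated, "Nunn, John", 2600)
print(offset)               # 80
print(gauged["Player X"])   # 2530
```

Every player moves by the same 80 points, so the gaps between players stay exactly as the algorithm calculated them; only the scale changes.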

There are a number of interesting things you can do with this list once it's generated. If you'd like to see the list of players sorted alphabetically (rather than in order of rating), click the "Sort A-Z" button; this makes it easier to find a particular player in the list. "Sort games" displays the list in order of the number of games each player was involved in; note, though, that gauging the list (see above) eliminates this information. "Sort Elo" returns the display to the default, which sorts the players by the generated performance rating (from highest to lowest).

If you want to eliminate a player from the list, highlight his name and click "Delete". You can send the whole list to the Windows Clipboard by clicking the "Clip" button; you can then paste the whole shebang into a spreadsheet or word processor document (by hitting CTRL-V in your other application). The "Tournaments" button will show a list of the tournaments that were used to create the list, while highlighting a player's name and clicking "Career" shows a list of events that particular player participated in.

If you're more interested in current players rather than historical ones, click the "Clean Up" button. Any player who has been inactive for more than six years will be dropped from the list. The "Show all" check box will lengthen the list to display every player whose games were used in generating the list. Normally any player who didn't play enough games to get an accurate rating is not shown; "Show all" displays everyone, regardless of how few games that player participated in.

On the other hand, if you're only interested in players who were active before a certain year, you can check the "Year limit" box and type a year into the dialogue. This will drop from the list any players who were active exclusively after that point.
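To make the two filters concrete, here is a small Python sketch of how "Clean Up" and "Year limit" behave as described above. The record layout (`first_year`, `last_year`), the function names, and the exact handling of the boundary years are all assumptions for illustration; the program itself may draw the lines slightly differently.

```python
def clean_up(players, current_year, max_idle_years=6):
    """Keep players whose last game is no more than `max_idle_years` old."""
    return [
        p for p in players
        if current_year - p["last_year"] <= max_idle_years
    ]


def year_limit(players, limit_year):
    """Drop players who were active exclusively after `limit_year`."""
    return [p for p in players if p["first_year"] <= limit_year]
```

Given a list of dictionaries such as `{"name": "Player A", "first_year": 1850, "last_year": 1860}`, `clean_up(players, 2002)` would drop Player A as long inactive, while `year_limit(players, 1900)` would keep him and drop anyone whose first game came after 1900.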

There are also some additional commands that are accessed by going to the Tools menu and selecting "Rating". If you want to open a particular Elo list that you've previously created, select "View Elo list" from the submenu. If you later add games to the database and want to update an existing rating list accordingly, highlight the new games in the game list and select "Add to Elo list"; the results of the new games will be factored into the list (make sure, though, that you've first loaded the proper Elo list by using the command "View Elo list").

Two other features should be used with caution. "Set Elos in games" will take the ratings displayed in the rating list and add them to the database -- the numbers will then be displayed next to the players' names in the game list. Note, though, that this will overwrite any existing Elos in the game list, so you'll lose the actual historical Elos for players with ratings displayed after 1971. Conversely, "Erase Elos in games" will rip out all Elo information from the headers of every game in the database regardless of whether they were added from an Elo list or contained the information as a prior part of the database.

Obviously, you'll want to use these last two features with care; you can really screw up the header information in a database by injudicious use of these commands. Why are they included? We can find that answer by going back to the Help file:

The Elo management in our chess program was implemented mainly in order to evaluate engine tournaments.

As mentioned in past ETN issues, there exists an entire "subculture" of the chess community whose main interest lies in pitting chess programs against each other and evaluating the results. The Thompson algorithm was introduced primarily as a means of giving these folks a quick, easy way of creating performance ratings for databases of games from engine vs. engine matches (see ETN for June 9th, 2002) and tournaments (which we've not yet examined in ETN). A Fritz user can have his engines play out a large number of games, create an Elo list for that database, and then drop the engines' performance ratings into the game headers with just a mouse click by using "Set Elos in games". Conversely, he could also rip the ratings out of the headers by using "Erase Elos in games". But these are dangerous procedures to use on databases of historical games, because you'd lose any actual historical ratings that were previously part of the game headers. The only way to get them back would be to replace the altered database with a backup copy of the original (assuming you have one -- big "if").

Aside from generating performance ratings for computer programs, the other use of this feature is to generate performance ratings for players who were active before the birth of the Elo system. The byproduct of such a list is the generation of arguments (and, if we're very lucky, some highly entertaining fistfights). "How strong was Player So-and-so [who was active pre-1970's]?" is a question that's always good for a heated argument, as well as "Who was better -- Player A or Player B?" The Elo list feature can provide us with some extra "grist for the mill". However, before we proceed with this article, please check all weapons at the door and anyone throwing a chair will be immediately expelled from the room -- fists only, please...

We spot the immortal Greco at the top, with a performance rating of 3014. We have a problem with this -- checking the database we see that all of his opponents are listed as "N.N.", meaning that we don't know the player's name (this is a standard for chess gamescores, by the way, in use for many, many years). Unfortunately, all games in the database that are credited to "N.N." are assumed by the Thompson algorithm to be the same player, so it's safe to assume that Greco's rating has thus been artificially inflated by his wins over a "player" who was active over a 400+ year span of chess history.

Assuming that we drop Greco from contention, the top spot belongs to Garry Kasparov at 2852. The next player on the list also warrants investigation: a Canadian named H. Jung who was active throughout the 1990's and weighs in at a performance rating of 2786. Some database research also uncovers the source of this anomaly: all of his games in which he played against someone with an established FIDE rating were wins for Jung. We can assume that an established FIDE rating indicates that each of those players has some other games in the database and these get factored into the performance rating calculation for Jung -- hence the inflation. If you play a lot of chess on the Java-based chess servers and know how the Elo rating system works, you're likely already ahead of the game here in figuring this one out. I once saw a young lady playing on the Yahoo server who had a rating in five digits (something like 65000) and had never won a game! Obviously, she'd only played against players with similarly inflated ratings so that, win or lose, her performance rating began and stayed in the five-digit range. That's a similar case to our friend Jung here -- he'd done very well against other players with established high ratings in the database, so his performance rating was artificially inflated.

So we're already seeing a reason why I call the performance ratings generated by the Thompson algorithm "grist for the mill" rather than a definitive answer to the nagging question of the strongest player's identity: the algorithm works quite well, but the results depend on the raw information provided by the database. Sometimes the results can be skewed by non-standardization factors (like our example with "N.N." above -- the algorithm has no way to know that the many instances of games listing "N.N." in a database weren't all played by the same person) or by circumstance (Jung's rating inflated by his success against players with FIDE ratings).

This is why I honestly prefer to use the algorithm on a database of computer games or on smaller databases in which I can easily weed out the various anomalous factors. If you run this on a historical database, please look at the results as being just for fun. However, just so the boxing fans among us don't go away disappointed that they didn't see any fights break out, here are some "discussion points" that are certain to start the fur flying.

Back in Part One, we mentioned the oft-asked question, "How strong was Paul Morphy?" The (imperfect, due to a few anomalies) Elo list I generated lists the American at 2599, while his English rival Howard Staunton weighs in at 2479. I'll ask my countrymen to please keep the woofing to a minimum -- we'll all get together and celebrate in the hotel bar later.

Capablanca vs. Alekhine -- hmmm, that's a good one. We all know that Capa lost the world championship title to Alekhine in 1927, but quite a few folks in the Cuban's figurative camp maintain that Capa didn't take the match seriously enough and didn't prepare properly. So how do their performance ratings stack up? It's Capablanca's 2601 against Alekhine's 2569 -- too close to call, but I'll bet that Richard Reti could have beaten either one of them at arm wrestling.

Let's compare other contemporaries in the world championship debate: Vera Menchik, who was women's world champion at the same time that Alekhine had the men's crown. We've already seen that Alekhine's performance rating was 2569. Menchik, by comparison, comes in at a paltry 2303 -- but I'll bet that she could have beaten Alekhine at arm wrestling.

Aron Nimzovich gets disregarded by a lot of people -- a friend of mine recently referred to him in print as a "lesser light" (which kind of hacked me off, but the writer's a pal so I didn't make a big deal out of it). Nimzo comes in at a respectable lifetime 2524 -- hardly a "lesser light", especially from a guy who used to stand on his head when it was his opponent's turn to move.

Reuben Fine, a favorite among American chess readers: 2564. Yeah, baby! But better than Nimzovich? Hmmm.... And while we're duking it out on the subject of American chess writers, let's compare three guys who often get dissed in chess newsgroups: Eric Schiller (who recently got ripped [hilariously] by Alex Dunn in the Correspondence Chess News) 2319, Fred Reinfeld 2390, and Bruce Pandolfini (who, with all due respect to the other gentlemen, writes rings around either of them) is missing in action because he didn't have enough games in the database to show up in the list (hey, major props to Bruce, though, for writing Russian Chess -- that book made me a permanent member of his fan club).

And, just for chuckles, BYE gets a rating of 1734 (not bad for never playing a game) and two players named "Schmuck" are at master level (2237 and 2301). Joachim Patzer (could this be the "Joe Patzer" we're always hearing about? But if you pronounce it right, shouldn't that be "Wah" Patzer? Many questions, no answers) has performed at a 2144 level. This makes me a sub-Patzer and less than a Schmuck. Now that I'm thoroughly depressed, I'll stop typing now and go have a good cry.

Until next week...awwww, what's the use? I'll see you then if I haven't jumped off a bridge.

You can e-mail me with your comments, suggestions, and analysis for Electronic T-Notes.






© 2002, Steven A. Lopez. All rights reserved.