Wikipedia:Wikipedia Signpost/2008-06-23/Dispatches


Dispatches: How Wikipedia's 1.0 assessment scale has evolved

Two different grading systems: "importance" and "quality"

Most users will have seen the talk page banners that indicate what stage an article has reached in the writing process: {{A-Class}}, {{B-Class}}, {{Start-Class}}, or even {{Stub-Class}}. They may also have noticed that many articles are graded according to their importance: from {{Low-importance}} to {{Top-importance}}. These rankings may seem cryptic to new or occasional editors, and even seasoned editors may not have given much thought to the role of these templates in Wikipedia's quality control process. Moreover, there is often confusion about the relationship between this assessment scale and the processes that determine good articles (GA) and featured articles (FA).

Importance scheme

Wikipedia's importance scheme records how important each related WikiProject considers an article's topic, from topics that are "extremely important, even crucial" to those that are "not particularly notable or significant". Because the same topic may matter more to one project than to another, an article can receive more than one assessment on the importance scale. Powderfinger, for instance, has been rated "top-importance" (priority) by the Powderfinger WikiProject, "high-importance" by WikiProject Australia, and "mid-importance" by WikiProject Alternative music.

Quality assessment

The encyclopedia's quality assessment scheme is more complex, because it has to address many facets of article quality, such as completeness, layout and language. Since a June 2008 poll added a new "class", WikiProjects will begin using five levels for quality assessment:

Stub – a basic description in a paragraph or two;
Start – an article that is developing, but is quite incomplete and lacks reliable sources;
C – an article that is moderately complete, but lacks sources or contains cleanup tags;
B – an article that is mostly complete, without POV or other major cleanup issues, but which requires further work to reach Good Article standards;
A – an article that is organized well and is essentially complete, but needs style issues addressed before submission as a featured article candidate.

Critically, such "importance" and "quality" are not necessarily correlated: one article might be of "low importance" and "A Class" (see Clea Rose example); another might be a "top-importance" stub (see Judiciary of Australia example).

At press time, the new C-Class still needs to be fully enabled in the WP1.0 bot and elsewhere. This new classification has effectively raised the standards of quality required to attain B-Class. The scale also includes classes that are not WikiProject-based, such as FA-Class and GA-Class, and descriptive classes such as "Portal-Class"; for a complete list, see below.
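The two grading axes described above can be sketched as a small data model. This is only an illustration, not how the WP1.0 bot stores its data: the `Rating` record and the `importance_by_project` helper are hypothetical names, and only ratings stated in this article are filled in (quality is left unset where the article does not give it).

```python
from dataclasses import dataclass
from typing import Optional

# Project-based levels named in the article, lowest to highest.
QUALITY_LEVELS = ("Stub", "Start", "C", "B", "A")
IMPORTANCE_LEVELS = ("Low", "Mid", "High", "Top")


@dataclass(frozen=True)
class Rating:
    """One WikiProject's rating of one article (hypothetical record type)."""
    article: str
    project: str
    importance: Optional[str] = None  # None = not stated
    quality: Optional[str] = None     # None = not stated

    def __post_init__(self):
        # Reject labels that are not on either scale.
        if self.importance is not None and self.importance not in IMPORTANCE_LEVELS:
            raise ValueError(f"unknown importance level: {self.importance}")
        if self.quality is not None and self.quality not in QUALITY_LEVELS:
            raise ValueError(f"unknown quality class: {self.quality}")


def importance_by_project(ratings, article):
    """Return {project: importance} for one article."""
    return {r.project: r.importance for r in ratings if r.article == article}


ratings = [
    Rating("Powderfinger", "Powderfinger WikiProject", importance="Top"),
    Rating("Powderfinger", "WikiProject Australia", importance="High"),
    Rating("Powderfinger", "WikiProject Alternative music", importance="Mid"),
    # A top-importance stub: the two axes are independent.
    Rating("Judiciary of Australia", "WikiProject Australia",
           importance="Top", quality="Stub"),
]

print(importance_by_project(ratings, "Powderfinger"))
```

Keeping one record per (article, project) pair is what lets the same article carry several importance ratings at once.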

Developing the scale

The original purpose of the assessment processes was twofold: to facilitate the production of an offline release, and to assist WikiProjects in organizing their articles, by categorizing the quality of articles as simply, accurately and comprehensively as possible. A test CD (Version 0.5) was released by the Version 1.0 Editorial Team in 2007, and a larger DVD release (Version 0.7) is planned for the third quarter of 2008. The gargantuan task of sifting through 2.4 million articles (as of June 2008) would be impossible with just a handful of team members. To solve this problem, a standardized baseline had to be developed so that the task could be distributed among the editors who make up Wikipedia's base.

Instead of developing a brand-new scale, the Version 1.0 Editorial Team adopted existing guidelines, and modified them for greater scalability. The assessment scheme in use across the community was originally developed at the Chemicals WikiProject as a method of tracking the completeness of the articles in their Worklist (a set of around 400 articles on which the project decided to focus its effort). By late 2005, the scheme was proposed as part of the article selection process at the 1.0 project. The Work via WikiProjects sub-project was started with the aim of having projects provide subject-expert assessments, which the 1.0 team could then put together to produce a broad selection of articles from the encyclopedia.

The initial method was to request manually written lists of the top articles from each project; this did generate around 3,000 assessments and provided some suitable articles, but was very labor-intensive. In April 2006, there were about 1.1 million articles in Wikipedia, so continuing with the older method would have proved ineffective. At about this time, a new category-based, bot-assisted system was introduced; this gave projects valuable tools for their work (lists, a log and a statistics table) and provided the 1.0 group with a much more comprehensive list of articles. Tagging an article (via the talk page) is straightforward, and so the scheme rapidly grew to encompass 30,000 articles by August 2006, and to around 1.3 million articles in June 2008. The following table shows the aggregate of all the assessments by more than 1300 participating WikiProjects and task forces throughout Wikipedia:

All rated articles by quality and importance
Quality / Importance	Top	High	Mid	Low	???	Total
FA	1,582	2,515	2,424	1,972	182	8,675
FL	180	702	772	695	100	2,449
A	372	684	787	582	92	2,517
GA	3,269	7,428	14,879	19,889	1,773	47,238
B	17,176	33,272	55,091	71,179	23,768	200,486
C	17,171	54,903	137,396	318,247	93,165	620,882
Start	18,544	93,119	419,426	1,650,355	415,733	2,597,177
Stub	4,256	31,297	277,283	2,812,676	759,480	3,884,992
List	4,947	17,469	54,832	203,601	81,834	362,683
Assessed	67,497	241,389	962,890	5,079,196	1,376,127	7,727,099
Unassessed	113	407	965	16,532	392,495	410,512
Total	67,610	241,796	963,855	5,095,728	1,768,622	8,137,611

About this table

Although the assessment scheme is only approximate, it allows users to broadly gauge article quality, and WikiProjects to keep track of their articles. When combined with the importance assessment scheme (which is not universally used), projects can see which of their key articles need the most work. The Wikipedia 1.0 project is now able to integrate the information from all of the WikiProjects and make selections of articles for offline release.
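As a rough illustration of how combining the two scales points a project at its key articles needing the most work, the sketch below sorts a worklist by importance first and by quality second. The function name, the numeric ranks and the placeholder article titles are all assumptions made for this example; the article does not describe any specific ordering rule.

```python
# Numeric ranks are an assumption; the article only gives the level names.
QUALITY_RANK = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "A": 4}
IMPORTANCE_RANK = {"Low": 0, "Mid": 1, "High": 2, "Top": 3}


def worklist(articles):
    """Order (title, quality, importance) tuples.

    Most important topics come first; within one importance level,
    the least developed article comes first.
    """
    return sorted(
        articles,
        key=lambda a: (-IMPORTANCE_RANK[a[2]], QUALITY_RANK[a[1]]),
    )


# Placeholder data, not real assessments.
sample = [
    ("Article A", "B", "Top"),
    ("Article B", "Stub", "Top"),
    ("Article C", "Start", "Low"),
    ("Article D", "Stub", "High"),
]

for title, quality, importance in worklist(sample):
    print(f"{importance:>4}  {quality:<5}  {title}")
```

A top-importance stub lands at the head of the list, which is the kind of article a project would most want to improve.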

Quality classes: FA, FL, A, GA, B, C, Start, Stub, Needed
Other classes: Future, Current, List, Redirect, Disambig, Template, Category, File, Portal, NA

Note: The chart is generated from WikiProject templates, and represents the scheme used until June 2008. There are currently 6,645 featured articles, but some WikiProjects include featured lists in their featured article tally, so the number of featured articles in the chart is overstated. On the other hand, there are currently 40,710 good articles, but as some articles have no WikiProject templates or the templates are not updated to include GA, the number of good articles in the chart is understated.

Criticisms and changes

Although the scheme is generally working, there is a steady trickle of criticisms and suggestions. The scheme is designed mainly for WikiProjects to assess article content and completeness, but GA and FA levels are included as "cross-references" to Wikipedia-wide quality assessment processes. This has been a regular source of confusion, since GA and FA status are not awarded by WikiProjects.

The Version 1.0 Editorial Team recently reevaluated the number of levels for project-based quality assessments. Until now there have been four (Stub, Start, B and A), but a recent poll indicated support for expanding this to five. To be useful across the community, the system must be simple and straightforward, so that all editors in all projects can use a common system for assessing articles. A greater number of assessment levels may yield a finer analysis of quality, but this is meaningless if the assessments cannot be performed to this level of detail. A majority of those polled believe that a fifth level (C-Class) will give a more refined scheme without seriously compromising reliability. The C-Class level will be introduced in the coming weeks.

The 1.0 team is testing a bot for automatic selection of articles. This involves evaluating the importance of an article using four parameters: a manual assessment by the project, the number of page hits, the number of foreign language "interwiki" links, and the number of links into the article. These factors are weighed along with the quality assessment to produce a selection of the most important "decent" articles for release. Initial test results look promising, but require an improved balance between WikiProjects. This new method should allow the 1.0 team to easily make regular general releases, and individual WikiProjects should be able to produce their own offline releases on paper, CD or DVD.
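The four-parameter idea can be sketched as follows. This is not the 1.0 team's bot: the equal weighting, the logarithmic scaling, the quality threshold used to define "decent", and every name and number below are assumptions chosen only to make the example concrete.

```python
import math

# Point values are assumptions for illustration.
IMPORTANCE_POINTS = {"Low": 0, "Mid": 1, "High": 2, "Top": 3}
QUALITY_POINTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "A": 4}


def importance_score(manual, page_hits, interwiki_links, inbound_links):
    """Combine the four parameters named in the article.

    Counts are log-scaled so that one very large number cannot
    swamp the others (an assumed design choice).
    """
    return (
        IMPORTANCE_POINTS[manual]
        + math.log10(1 + page_hits)
        + math.log10(1 + interwiki_links)
        + math.log10(1 + inbound_links)
    )


def select(candidates, min_quality="Start", limit=2):
    """Keep 'decent' articles, then return the most important titles."""
    decent = [
        c for c in candidates
        if QUALITY_POINTS[c["quality"]] >= QUALITY_POINTS[min_quality]
    ]
    ranked = sorted(
        decent,
        key=lambda c: importance_score(
            c["manual"], c["page_hits"], c["interwiki"], c["inbound"]
        ),
        reverse=True,
    )
    return [c["title"] for c in ranked[:limit]]


# Made-up figures, not real article statistics.
candidates = [
    {"title": "Article X", "quality": "B", "manual": "Top",
     "page_hits": 100_000, "interwiki": 50, "inbound": 2_000},
    {"title": "Article Y", "quality": "Stub", "manual": "Top",
     "page_hits": 1_000_000, "interwiki": 80, "inbound": 5_000},
    {"title": "Article Z", "quality": "Start", "manual": "Low",
     "page_hits": 100, "interwiki": 1, "inbound": 10},
    {"title": "Article W", "quality": "A", "manual": "Mid",
     "page_hits": 10_000, "interwiki": 20, "inbound": 500},
]

print(select(candidates))
```

Here quality acts as a filter before ranking; weighing it into the score itself would be an equally plausible reading of the description above.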



Discuss this story


Oddity

Here's a (small) oddity. I happened to find myself at Talk:Henry_Ford and I note that all of the projects rate it B-class, but the version 1.0 team rates it A-class. I'm thinking that's a mistake... --jbmurray (talk • contribs) 21:20, 14 June 2008 (UTC)

At the time the {{WP1.0}} tag was added, WikiProject Michigan called the article an A-Class article. Most likely, the class parameter wasn't updated after it was downgraded. Titoxd (?!? - cool stuff) 07:29, 15 June 2008 (UTC)

Is someone going to finish the description of the Grading scheme? SandyGeorgia (Talk) 07:31, 15 June 2008 (UTC)

Can you explain what exactly you want? I thought that most of the article was about the grading scheme....! Walkerma (talk) 04:27, 16 June 2008 (UTC)

A one- or two-sentence summary that describes each "grade" in the scheme. "A typical stub is x, while a start class article also has y and B-class includes z." (That is, take into account that most editors reading this page will never have dealt with this scheme; I haven't engaged it much beyond what to do when an FA is defeatured, and if I have to assess anything as to stub, start, or B, I'll have to go read the whole thing. A brief summary for the uninitiated is needed.) SandyGeorgia (Talk) 04:35, 16 June 2008 (UTC)

Thanks! Now done. Is it ready now? Walkerma (talk) 16:58, 16 June 2008 (UTC)

When does the poll close? Will you add that? The Signpost always publishes several days late, so that can still be added, and ... it's a Wiki ... never done :-)) But it looks great so far; I now have a better understanding what assessment is about. SandyGeorgia (Talk) 17:24, 16 June 2008 (UTC)

In theory we will close it at 0300h UTC on June 18th. At present the vote (for the new C-Class) is running around 4:3 in favor, and from the comments I'd say the overall consensus is probably running at a similar ratio. We can extend the poll if the Signpost comes out in time, but I'd like to give advance warning. Having done the earlier publication date, I think I'd like to use the Signpost to promote the poll if possible, but if necessary we can just use it to promote the result of the poll. When will this be published? Walkerma (talk) 17:35, 16 June 2008 (UTC)

The Signpost is published ... whenever Ral315 publishes it. Sometimes on time, sometimes three days late, sometimes five days late. Just keep the page as updated as you can. SandyGeorgia (Talk) 17:47, 16 June 2008 (UTC)

Is the grading scheme really a common system?

Some people think the A, B, Start, Stub classes are free for the WikiProjects to use or not. Others think that they should be standard and have the same meaning across all projects. Based on the history of the Version 1.0 project, I think the latter interpretation is correct. But the way things are going now, the grading scheme has been co-opted for the projects' own use and the Version 1.0 project became an incidental thing. --seav (talk) 04:47, 17 June 2008 (UTC)

I'm sorry, but I can't decipher what you're asking. SandyGeorgia (Talk) 17:02, 18 June 2008 (UTC)

The 1.0 project set up the system and still maintains the bot, and we oversaw the recent C-Class poll. It remains the coordinating project for assessment. It was expected that the projects would adapt things to their own needs, though it is obviously better if "B-Class" (say) means the same to all. The 1.0 project is using the data from all the assessments to compile a DVD release for this autumn. Walkerma (talk) 18:55, 21 June 2008 (UTC)

Poll results

I just glanced for the first time; the poll results appear at a quick glance to be mixed and almost an even split, particularly after factoring in neutrals, so unless I'm missing something, I suggest we adjust this wording to reflect split opinion, and explain why it was split (summarize the pro and cons):

The poll results indicate a good deal of support for a fifth level (C-Class), with many believing it will give a more refined scheme without seriously compromising reliability.

SandyGeorgia (Talk) 17:00, 18 June 2008 (UTC)

Now ready to publish?

I updated the effects of the C-Class issue as requested (although the goalposts are moving as I type this!). Regarding the examples of Top-Stub and FA-Low, such examples are both rare and hard to find; if you find a well-known Top-importance article like Star Wars/WP:Films, it's unlikely to be a Stub, and a Low-Importance article in any project is not well-known by definition. But I think most people will understand what the Judiciary of Australia is, and that it's important for WP:Australia, and a click on the link will explain more.

Do you think we need to elaborate on closing of the poll? There is a link to the relevant section, but we can copy over some of that section into the Dispatch if you think it's needed. My only concern is that superficial coverage of a long/complex debate may invite drive-by criticisms from those who weren't involved, and at this point we are committed to the change anyway. (I spent around 12 hours studying every comment and weighing the factors before I declared the final decision.) What do you think?

Is it ready for publication now? Walkerma (talk) 19:15, 21 June 2008 (UTC)

I think it's in good shape now; this is the latest I've ever seen the Signpost, so I don't know what's up with publication. SandyGeorgia (Talk) 19:23, 21 June 2008 (UTC)
