Wikipedia:WikiProject History Merge/Reading the Reports
AarghBot produces reports of suspected cut-and-paste moves. It searches for one particular pattern: an author cuts the text from one page, pastes it into another page, then turns the original page into a redirect to the new page that they just created.
Each row represents one potential cut-and-paste move. Each row contains the following columns:
- Index is simply the order in which the bot found the cut-and-paste move. It has no significance other than to give you a handy mechanism for returning the report to its original order (should you find yourself clicking on a column to sort it).
- Source is the page from which the text was cut.
- PreID is the revision ID of the revision from which the text was cut. Clicking on it will show you that revision.
- Predate is the date/time that the source page was turned into a redirect. It is formatted as YYYY/MM/DD HH:MM:SS (in 24-hour format) so that it can be sorted easily from oldest to newest (or vice-versa).
- Destination is the page where the text was pasted.
- PostID is the revision ID of the revision where the text was pasted. Clicking on it will show you that revision.
- Postdate is the date/time that the destination page became an article. It is formatted in the same way as the predate for the same reason.
- Diff score is a percentage that represents how different the PreID and PostID revisions are. Higher numbers mean that there are more differences between the two texts; lower numbers mean that the texts are more similar. A diff score of zero means that the two revisions are identical. The bot automatically filters out results whose diff score exceeds a threshold (currently set at 15%), so the diff score in a report will never be higher than this.
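The columns above map naturally onto a small record type. The following is a minimal sketch, assuming a report row is available as pipe-delimited text; `ReportRow` and `parse_row` are hypothetical names used for illustration and are not part of AarghBot.

```python
from dataclasses import dataclass
from datetime import datetime

# Matches the YYYY/MM/DD HH:MM:SS layout used by Predate and Postdate.
DATE_FORMAT = "%Y/%m/%d %H:%M:%S"


@dataclass
class ReportRow:
    index: int
    source: str
    pre_id: int
    predate: datetime      # when the source page became a redirect
    destination: str
    post_id: int
    postdate: datetime     # when the destination page became an article
    diff_score: float      # percentage, e.g. 8.70


def parse_row(line: str) -> ReportRow:
    """Split one pipe-delimited report row into typed fields."""
    # Page titles cannot contain "|", so a plain split is safe here.
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 8:
        raise ValueError(f"expected 8 columns, got {len(cells)}")
    index, source, pre_id, predate, destination, post_id, postdate, score = cells
    return ReportRow(
        index=int(index),
        source=source,
        pre_id=int(pre_id),
        predate=datetime.strptime(predate, DATE_FORMAT),
        destination=destination,
        post_id=int(post_id),
        postdate=datetime.strptime(postdate, DATE_FORMAT),
        diff_score=float(score.rstrip("%")),
    )
```

Parsing the dates into `datetime` values makes it straightforward to compare Predate and Postdate, although the date strings themselves also sort chronologically because the largest unit comes first.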
Example 1
Index | Source | PreID | Predate | Destination | PostID | Postdate | Diff score |
---|---|---|---|---|---|---|---|
44 | Dallas Center for the Performing Arts | 314125683 | 2009/09/15 10:38:30 | AT&T Performing Arts Center | 314125916 | 2009/09/15 10:37:20 | 8.70% |
The entry above tells us that the text from Dallas Center for the Performing Arts was cut and pasted into AT&T Performing Arts Center at 10:37am on 15 September 2009. Dallas Center for the Performing Arts was then turned into a redirect (to AT&T Performing Arts Center) about one minute later. The diff score of 8.70% means that there are some differences between the text that was cut and the text that was pasted. If you click on the diff score, you can see that the differences are due to the author changing the name of the venue in the new article.
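The "about one minute" gap can be checked from the two timestamps in the row. This is a small sketch; `redirect_delay_seconds` is a hypothetical helper name, and it assumes both values use the report's YYYY/MM/DD HH:MM:SS format.

```python
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d %H:%M:%S"


def redirect_delay_seconds(predate: str, postdate: str) -> float:
    """Seconds between the paste (Postdate) and the redirect (Predate)."""
    pasted = datetime.strptime(postdate, DATE_FORMAT)
    redirected = datetime.strptime(predate, DATE_FORMAT)
    return (redirected - pasted).total_seconds()


# Values taken from the Example 1 row.
delay = redirect_delay_seconds("2009/09/15 10:38:30", "2009/09/15 10:37:20")
print(delay)  # 70.0
```

A small positive delay like this one is what a typical cut-and-paste move looks like: the destination is created first, and the source is redirected shortly afterwards.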
Example 2
Index | Source | PreID | Predate | Destination | PostID | Postdate | Diff score |
---|---|---|---|---|---|---|---|
55 | Induced hypothermia | 222548955 | 2008/06/29 17:59:57 | Therapeutic hypothermia | 222402456 | 2008/06/28 23:27:31 | 0.00% |
This entry tells us that the text from Induced hypothermia was cut and pasted into Therapeutic hypothermia at 11:27pm on 28 June 2008. The texts are identical: the author didn't make any changes to the text. The entry seems to indicate that it took another 18 1/2 hours for Induced hypothermia to be turned into a redirect to Therapeutic hypothermia. However, if you look at Induced hypothermia's edit history, it was actually turned into a redirect one minute after the new page was created; some edit warring occurred afterwards. (The bot works through the source page's history from newest to oldest, so it found the revision immediately before the page was most recently turned into a redirect and assumed that this was the revision from which the text was cut.)
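The effect of scanning newest to oldest can be illustrated with a toy history. This sketch is an assumption about how such a scan might behave, not AarghBot's actual code; the `Revision` type, the `find_pre_revision` function, and the revision IDs are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Revision(NamedTuple):
    rev_id: int
    is_redirect: bool


def find_pre_revision(history_newest_first: Sequence[Revision]) -> Optional[Revision]:
    """Walk from newest to oldest and return the first non-redirect
    revision that sits directly below a redirect revision."""
    seen_redirect = False
    for rev in history_newest_first:
        if rev.is_redirect:
            seen_redirect = True
        elif seen_redirect:
            return rev
    return None


# Hypothetical history of a source page, newest first.
history = [
    Revision(5, True),   # redirect restored after the edit war
    Revision(4, False),  # article text restored during the edit war
    Revision(3, True),   # original redirect, made just after the paste
    Revision(2, False),  # revision the text was actually cut from
    Revision(1, False),
]

picked = find_pre_revision(history)
print(picked.rev_id)  # 4
```

The scan stops at revision 4, the text restored during the edit war, rather than revision 2, the one that was actually cut. This is why the Predate in a report can be later than the real redirect time when the source page was reverted and redirected again.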