
Wikipedia:History merging

From Wikipedia, the free encyclopedia

In the early days of Wikipedia, renamings took place manually, using cut and paste, before the move page function was enabled for non-administrators in August 2002.

Cut-and-paste moves still occur today because of unfamiliarity with the move function, unawareness that attribution is necessary, or when the move function fails (e.g., because the target has history) and people don't know to use the Requested moves forum to start a move request.

When a cut-and-paste move is done, the page history of an article or talk page can be split among two or more different pages. This is highly undesirable, because we need to keep the history with the content for copyright reasons. (See Wikipedia:Copying within Wikipedia.)

In some circumstances, administrators and importers can fix this by merging page histories, using the procedure given below.

When to request a histmerge

An example of an inappropriate cut/paste pagemove

A history merge is required for attribution purposes, as attribution is lost during a cut/paste page move where there are multiple editors at the old page. In the image shown, it appears as if the user Thegreatrebellion had created the entirety of the added text at Syed Saddiq, when the reality is that there were contributions from over 200 editors at the previous page name of Syed Saddiq Syed Abdul Rahman.

While this is not an exhaustive list, any pages meeting the below criteria may be eligible for a histmerge:

  • There are several editors in the page history at the original location
  • The editor who blanked and/or redirected the page used an edit summary such as "move to <new page name>"
  • The new page was originally a redirect and was overwritten by the "new" article

When not to request a histmerge

New editors are often unaware of the ability to move pages (or cannot use it because of new-account restrictions), and will thus copy and paste a draft they have been working on into the article space. Similarly, a New Page Reviewer may move a new article to the Draft space, and the original editor will simply recreate it in the Article space. In both of these situations, if the original editor is the only one who has contributed content to the pages, a history merge is not necessary because there are no attribution issues (only one editor has written all of the content).

If trivial edits are made by other editors, such as maintenance tags or categorisation, and these edits are not transferred over by the primary content author, a history merge is not needed.
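The attribution test in the two sections above can be modelled roughly as follows. This is a hypothetical sketch, not a tool that exists on Wikipedia: the `Revision` record and its `trivial` flag are assumptions, deciding whether an edit is trivial remains a human judgement, and the sketch assumes any trivial edits by others were not carried over by the person who pasted the content.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    user: str
    summary: str
    # Hypothetical flag: True for maintenance tags, categorisation and similar
    # edits that add no authored content. Setting it is a human judgement.
    trivial: bool = False


def histmerge_needed(old_history: list[Revision], pasting_user: str) -> bool:
    """Return True if someone other than the user who did the copy/paste
    contributed non-trivial content at the old location."""
    authors = {rev.user for rev in old_history if not rev.trivial}
    return bool(authors - {pasting_user})


# A draft written by one editor, plus a maintenance tag from someone else.
draft = [
    Revision("Alice", "create draft"),
    Revision("Alice", "expand"),
    Revision("Bob", "add {{orphan}}", trivial=True),
]
print(histmerge_needed(draft, "Alice"))  # False: only Alice wrote content
```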

Instructions for tagging a page for history merging

  1. Place {{History merge|NAME OF PAGE THE ARTICLE WAS CUT FROM}} at the new location, where the pasting was done. The page will appear in the hidden category Candidates for history merging.
  2. Consider notifying the user of the issue on their talk page, perhaps using {{subst:uw-c&pmove}}.

In cases where additional edits were made to the original page after the copy-and-paste, and those edits can all be safely discarded (e.g. WP:WPAFC-related templates, edits which were reverted, etc.), place {{History merge|NAME OF PAGE THE ARTICLE WAS CUT FROM|reason=|details=}} at the new location as described above. Fill in the two parameters as needed for the particular situation (see {{history merge}} for an example).

If there are no changes since the copied-from revision in either the original page or the pasted-to page, consider tagging the pasted page for temporary deletion using {{db-copypaste}} (see WP:Speedy deletion#G6), and then do a proper page move. Special:ComparePages or a similar tool should be used to verify that no changes have been made.
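The "no changes since the copied-from revision" condition boils down to a comparison of wikitext. The sketch below is a hypothetical helper working on plain lists of revision texts that you have already copied out of each page's history (oldest first); it does not call any Wikipedia API, and it assumes the only acceptable later edit on the original page is the cut itself (a blank page or a #REDIRECT). It is not a substitute for checking with Special:ComparePages.

```python
def is_cut(text: str) -> bool:
    """Treat a blanked page or a redirect as the edit that removed the content."""
    stripped = text.strip()
    return stripped == "" or stripped.upper().startswith("#REDIRECT")


def unchanged_since_paste(source: list[str], target: list[str]) -> bool:
    """source: texts of the original page's revisions, oldest first.
    target: texts of the pasted-to page's revisions, oldest first."""
    # The pasted page must consist of the single pasting revision.
    if len(target) != 1 or not source:
        return False
    pasted = target[0]
    if source[-1] == pasted:
        return True  # the original page was never cut
    # Otherwise the only later edit allowed on the original is the cut itself.
    return len(source) >= 2 and is_cut(source[-1]) and source[-2] == pasted


src = ["stub", "full text", "#REDIRECT [[New title]]"]
print(unchanged_since_paste(src, ["full text"]))  # True
```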

In more complex cases (explained below), please leave a description of the problem at Wikipedia:Requests for history merge.

Parallel versions

The ideal situation for a history merge is when an editor copies and pastes all content from one page to a brand new page, and then the old page does not receive any more edits. In other words, where the first page's history stops, the second page's history begins, and there are no overlapping diffs.

Users sometimes send in an ill-advised history-merge request after the two pages involved have been text-merged. If the two pages have separate origins and simultaneous separate parallel histories before they were text-merged, they should not be history-merged, as that would shuffle the parallel editing histories together in one list and make a mess. There is an example in this edit of page Clemson Tigers football. There is an example with 5 incoming pages in this edit of page Wikipedia talk:WikiProject Emo. The best thing to do would be to use the {{Copied}} template and place it on the source and/or destination's talk page, in order to meet the copyright attribution requirements of Wikipedia:Copying within Wikipedia.
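Whether two histories are disjoint (the ideal case) or parallel (the case to avoid) is a question about time spans. A minimal sketch, assuming you have the revision timestamps of each page as `datetime` values; for the source page, leave out the final edit that replaced the content with a redirect, since that edit is expected to postdate the paste.

```python
from datetime import datetime


def histories_overlap(a: list[datetime], b: list[datetime]) -> bool:
    """True if the time spans of two page histories overlap, i.e. the pages
    were being edited in parallel and should not be history-merged."""
    return max(min(a), min(b)) < min(max(a), max(b))


source = [datetime(2020, 1, 1), datetime(2020, 3, 1)]
target = [datetime(2020, 4, 1), datetime(2020, 6, 1)]
print(histories_overlap(source, target))  # False: source ends before target begins
```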

Repair process (for admins)

Using the MergeHistory special page

Administrators can use a special page, Special:MergeHistory, to perform history merges. It differs from the manual methods, as follows:

  1. It automatically detects the latest version of the source page which is older than the oldest version of the target page, and won't allow the user to move later revisions. This feature is good if the source page eventually became something else, but can be bad if the target page had started out as a redirect to the source. When a redirect is blocking a full MergeHistory merge, the redirect and any older edits will need to be either deleted or merged to another redirect. Deletion and restoration of pages with lengthy edit histories is very time- and resource-intensive, and administrators are not allowed to delete pages with more than 5000 edits in their history. An easier option in these cases may be to history-merge the redirect and any earlier history to another redirect that was created later. See § Clearing away merge-blocking redirects.
  2. The user can, however, tell it to only move earlier revisions than that – it is possible to select the latest revision it should move.
  3. It doesn't mix deleted and non-deleted versions of the target page.
  4. It retains any protection the target page may have.
  5. It doesn't create a new revision of the old page.
  6. If the user moves all non-deleted revisions of the source, a hard redirect is automatically created. This can't be overridden.
  7. The logs for this action aren't in the move log - they're in a separate log.
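Item 1 above, together with the optional cap in item 2, can be modelled as a simple filter. This is a hypothetical illustration of the selection rule as described here, not MediaWiki's source code; revisions are `(rev_id, timestamp)` pairs with integer timestamps for brevity. The last example shows why an old redirect at the start of the target's history blocks the merge.

```python
def mergehistory_candidates(source, target, latest_allowed=None):
    """Return the source rev_ids (oldest first) that would be movable:
    those older than the target's oldest revision, optionally capped at
    a chosen latest timestamp."""
    oldest_target = min(ts for _, ts in target)
    eligible = [(ts, rid) for rid, ts in source if ts < oldest_target]
    if latest_allowed is not None:
        eligible = [(ts, rid) for ts, rid in eligible if ts <= latest_allowed]
    return [rid for _, rid in sorted(eligible)]


source = [(10, 100), (11, 200), (12, 300)]  # rev 12 is the cut/redirect
target = [(20, 250), (21, 400)]
print(mergehistory_candidates(source, target))  # [10, 11]

blocked = [(5, 50)] + target  # target began life as a redirect at t=50
print(mergehistory_candidates(source, blocked))  # []: nothing is old enough
```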

Clearing away merge-blocking redirects

  • To clear a blocking redirect by deleting it:
    1. Check for deleted history at the target, and take note of all deleted edits there
      • WARNING: Beware the co-mingled revisions issue. WORKAROUND:
        1. Move the page to draft: namespace before deleting it, with the rationale: "history-merging process".
        2. Restore the oldest (redirect) revision(s) – these would have been deleted by a regular move when it "moved over the redirect"
        3. Move the page with the oldest (redirect) revision(s) back to mainspace, then delete it (adding to the already existing deleted revisions)
        4. Restore the remaining history and move it back to mainspace
    2. Delete the target page with the rationale "setting up for a history merge"
    3. Restore all but the previously deleted edits and the oldest (redirect) revision(s) – these would have been deleted by a regular move when it "moved over the redirect"
      • Deletion and restoration operations often time out with errors on pages with long edit histories. Simply try the delete or restore again; it virtually always succeeds on the second try
      • Now MergeHistory can do the merge; this technique avoids making a new edit that needs to be reverted, which happens when the source is moved to the target
  • To clear a blocking redirect by history-merging it:
    1. Find other redirects to the target page using Special:WhatLinksHere, hiding links and transclusions so that only redirects are listed (example: pages that redirect to "Yasser Arafat International Airport")
    2. Find an appropriate redirect, whose oldest revision (creation date) is newer than the most recent revision of the redirect history that needs to be cleared
    3. Merge the blocking redirect history to that redirect. Example: Special:PageHistory/Gaza Airport. The April and July 2005 revisions were merged here from Yasser Arafat International Airport (log)
      • Now MergeHistory can do the merge; this technique avoids making a new edit that needs to be reverted, which happens when the source is moved to the target
  • Instead of finding redirects, you can make a temporary page (for example, in draftspace), merge the blocking redirect history there, and then delete the temporary page when you're done.
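The condition in step 2 of the second method (the receiving redirect must have been created after the newest revision being cleared away) can be sketched as a filter over candidate redirects. The titles and integer timestamps below are hypothetical; in practice the data would come from each redirect's page history.

```python
def redirects_that_can_absorb(blocking_times, redirects):
    """blocking_times: timestamps of the redirect history to clear away.
    redirects: dict mapping a redirect title to its revision timestamps.
    Returns titles whose oldest revision is newer than every blocking revision."""
    newest_blocking = max(blocking_times)
    return sorted(
        title for title, times in redirects.items()
        if min(times) > newest_blocking
    )


blocking = [100, 180]
redirects = {"Old alias": [50, 300], "Later alias": [200, 210], "Newest alias": [500]}
print(redirects_that_can_absorb(blocking, redirects))  # ['Later alias', 'Newest alias']
```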

Manual process

Warning: this procedure may only be undone by spending quite silly amounts of time. To undo a merge, see below. Do not do this if you're not sure what you're doing.

An easy case

Steps for a simple case

The following procedure merges the page histories in the case of a hypothetical example:

Suppose Alabama/History (old title) was the only article on that subject, and that the article developed in the course of a number of edits, until a decision that History of Alabama (new title) was a better style of name for the article. Suppose further that for whatever reason, the contents of the old article were

  • cut from the old article,
  • replaced in it with a redirect to the new title, and
  • pasted into a newly created article bearing the new title.

(That is, the move tool was not available or not used to simultaneously transfer the Wiki text and the history of edits to the new title.) And suppose this replacement (new-title) article develops further and reflects the new history of these further edits. Our goal is to graft the (old) edit history from Alabama/History (article with old title) onto the new history in History of Alabama (article with new title) where those partial histories can complement each other. The process is as follows:

  1. Move Alabama/History to History of Alabama, using the move tool. The admin approves deletion of History of Alabama to allow the move. (Now the old versions are the whole of the new title's history.)
  2. Undelete the History of Alabama article, by
    1. Viewing the Page history,
    2. Linking via "View or restore ... deleted edits?", and
    3. Clicking on "Restore". (Now the new title's history has both the old and new versions, including an extra copy of the most recent version of Alabama/History, created by the move tool.)
  3. At this stage, History of Alabama will show whatever the most recent version of Alabama/History was; if that was a redirect, the page will only show the text "#redirect History of Alabama". The last step is to revert to the last version of History of Alabama from before the move, by
    1. Clicking "Page history" on History of Alabama.
    2. Making a hard reload (Shift+Control+R in Mozilla or Opera, Ctrl+F5 in Internet Explorer, and Ctrl+Shift+R in Firefox) to see an up-to-date history reflecting the undeletion.[1]
    3. Reverting to the last pre-move version.
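To see why the final revert is needed, the easy case can be simulated with toy revisions. This is a simplified, hypothetical model (integer timestamps, a `Rev` record invented here, and a `move_time` assumed later than every existing revision); it only tracks the order of revisions and which text ends up current, not everything MediaWiki does.

```python
from dataclasses import dataclass


@dataclass
class Rev:
    time: int  # simplified timestamp
    text: str
    summary: str = ""


def simulate_easy_merge(old_hist, new_hist, move_time):
    """old_hist: Alabama/History; new_hist: History of Alabama (oldest first).
    Returns the final history of History of Alabama."""
    # Step 1: moving over History of Alabama deletes it; its revisions are archived.
    archived = list(new_hist)
    # The move tool adds a revision copying the old title's latest text.
    moved = list(old_hist) + [Rev(move_time, old_hist[-1].text, "page moved")]
    # Step 2: undelete. The history is listed by timestamp, and the
    # move-created copy stays current because it is the newest revision.
    merged = sorted(moved + archived, key=lambda r: r.time)
    # Step 3: revert to the last pre-move version of History of Alabama.
    last_pre_move = max(archived, key=lambda r: r.time)
    merged.append(Rev(move_time + 1, last_pre_move.text, "revert to last pre-move version"))
    return merged


old = [Rev(1, "draft"), Rev(2, "full article"), Rev(3, "#REDIRECT [[History of Alabama]]")]
new = [Rev(4, "full article"), Rev(5, "full article, expanded")]
for rev in simulate_easy_merge(old, new, move_time=10):
    print(rev.time, rev.text)
```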

Merging page histories of pages with many revisions

Suppose that the page History of Alabama has too many revisions to be deleted, or that deleting it may cause other disruption. The following procedure can be used to merge page histories in this situation:

  1. Move History of Alabama to Alabama/History with a move summary like "history merge, will be back at correct title soon". Answer yes when asked to delete the Alabama/History page.
  2. Undelete the revisions of Alabama/History containing the page history.
  3. Move Alabama/History back to History of Alabama.
  4. If needed, undelete the remaining revisions at Alabama/History.

A more complex case

Sometimes, after a cut-and-paste move is performed, the article at the old title is then edited for some other purpose (e.g., turning it into a disambiguation page). That causes the article now at NewTitle to have part of its history there, and part at OldTitle, but the history at OldTitle also contains the history of NewMeaning. Use of the selective deletion function allows these to be repaired as well.

Steps for a complex case

To select more than one revision for undeletion, click on the tick box of the first revision to be undeleted, then shift-click on the last revision to be undeleted. Every intermediate revision will then be selected.

An example of this was Military of Japan; the original was moved to Japan Self-Defense Forces with a cut-and-paste move, and the article Military of Japan was then turned into a disambiguation page. This was repaired with the following procedure:

  1. Military of Japan is deleted.
  2. Selective undelete is used to undelete only those versions of Military of Japan which belonged to "Japan Self-Defense Forces".
  3. The versions of "Japan Self-Defense Forces" at Military of Japan are moved to Japan Self-Defense Forces, using the normal page-move function. For this to happen, Japan Self-Defense Forces must be deleted, although this can be done as part of the move.
  4. Undeletion of Japan Self-Defense Forces restores the rest of the versions of that article to its history.[1]
  5. However, the most recent version in the history of Japan Self-Defense Forces is now the most recent version of the old history from Military of Japan (it's a copy of that version, created by the page-move function). So, go into the history of Japan Self-Defense Forces, select the next-most-recent version, click on it, and when it appears, click on "Edit this page", ignore the "WARNING: You are editing an out-of-date revision" message, type something suitable (e.g., "restoring most recent version after merging histories") in the edit summary, and hit "Save page". That article is now restored to its condition prior to this procedure, and now also has its complete history.
  6. Step 3 above (the move) will have left a history containing just a redirect at Military of Japan. Delete the redirect.
  7. Undeletion of all the other versions of Military of Japan restores the more recent history of that article; no additional steps are needed, as the most recent version should now be the current version.[1]
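The effect of the seven steps on the two histories can be modelled with `(timestamp, text)` pairs. This is a hypothetical simulation with invented revision texts and integer timestamps; `belongs_to_jsdf` stands in for the human decision about which revisions to tick during selective undeletion, and `move_time` is assumed to be later than every existing revision.

```python
def repair_complex_case(moj, jsdf, belongs_to_jsdf, move_time):
    """moj: history of Military of Japan; jsdf: history of Japan Self-Defense
    Forces; both oldest-first lists of (timestamp, text).
    Returns (final_jsdf, final_moj)."""
    # Steps 1-2: delete Military of Japan, undelete only the JSDF-era revisions.
    old_part = [r for r in moj if belongs_to_jsdf(r)]
    rest = [r for r in moj if not belongs_to_jsdf(r)]
    # Step 3: the move adds a copy of the latest moved revision at the target.
    moved_copy = (move_time, old_part[-1][1])
    # Step 4: undeleting JSDF interleaves its own revisions by timestamp.
    final_jsdf = sorted(old_part + jsdf + [moved_copy])
    # Step 5: re-save the newest pre-move JSDF version so it is current again.
    final_jsdf.append((move_time + 1, max(jsdf)[1]))
    # Steps 6-7: the redirect left by the move stays deleted, and the remaining
    # Military of Japan revisions (the disambiguation page) are undeleted.
    return final_jsdf, rest


moj = [(1, "JSDF v1"), (2, "JSDF v2"), (5, "Disambiguation v1"), (6, "Disambiguation v2")]
jsdf = [(3, "JSDF v2"), (4, "JSDF v3")]
final_jsdf, final_moj = repair_complex_case(moj, jsdf, lambda r: r[0] <= 2, move_time=10)
print(final_jsdf)
print(final_moj)
```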

A troublesome case

However, the examples just described only work well if the two pieces of the history of one 'article' are disjoint; i.e. one ends before the other begins. These procedures are inadequate if this condition does not apply, e.g., if the copy of the article at the old title has been edited after the pasting of its contents into the new title. For example, it is not uncommon for:

  1. an article at (old) page A to be cut and pasted into (new) page B, and
  2. page A later to be reverted to an article on the same topic, with a sequence of edits there as well.

In this case, the time periods of the two series of edits will overlap.

If someone then page-history merges pages A and B using the method described above, the result will sequence the versions of A and B strictly by time, with the result that various versions of A will be interleaved between versions in the page history of page B (and/or vice-versa). Inspecting this merged history without means of distinguishing between the two overlapping progressions (since nothing in this history indicates which version belongs to which sequence) invites severe confusion.
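The interleaving can be made concrete by tagging each revision with the page it came from and sorting by time, which is roughly what a merged history listing does (minus the tags, which the real history does not show). In this illustrative sketch with made-up timestamps, every change of origin between adjacent entries is a "diff" that compares two unrelated versions; in a clean, disjoint merge there is exactly one such junction, the paste itself.

```python
def merge_by_time(a_times, b_times):
    """Return the page of origin ("A" or "B") of each revision, in the
    order a history merge would list them (sorted by timestamp)."""
    tagged = [(t, "A") for t in a_times] + [(t, "B") for t in b_times]
    return [page for _, page in sorted(tagged)]


def misleading_diffs(order):
    """Count adjacent pairs whose revisions come from different pages."""
    return sum(1 for x, y in zip(order, order[1:]) if x != y)


disjoint = merge_by_time([1, 2, 3], [4, 5, 6])
parallel = merge_by_time([1, 3, 6], [2, 4, 5])
print(parallel, misleading_diffs(parallel))  # ['A', 'B', 'A', 'B', 'B', 'A'] 4
```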

An appropriate procedure for such a case is to forgo the history merge, and instead handle the situation much like a normal merge; put a note pointing to the other version of the page on the article's talk page. If it is inappropriate to leave the second copy in the main article space, you can archive the duplicate page to Talk: space (i.e. by moving it to some suitable title, such as Talk:RandomArticle/OldVersion).

The MediaWiki software does not allow page history to be publicly archived at a page title that does not host a live page or redirect. Therefore, if two pages with parallel histories are merged but it is undesirable to keep a redirect from the deprecated page title to the destination page title, the old page history needs to move. This is sometimes done by moving the page history to a subpage of the talk page of the destination page. An example can be found at Talk:Compilation of Final Fantasy VII#Old page history. Use the {{Parallel version}} template for tagging parallel versions found on talk pages.

Also, if page A is to be history-merged into page B, first make sure that there are no deleted edits in page B; otherwise, deleting B will shuffle the deleted and non-deleted edits attached to the page together. The deleted history should first be rescued from under B by some process such as this:

  1. Move B to some other name, say B_zxcvbnm (without leaving a redirect).
  2. Undelete B.
  3. Move B to some other name, say B/old_version.
  4. If necessary, re-delete B/old_version.
  5. Move B_zxcvbnm back to B (without leaving a redirect).

Likewise, if a page must be deleted and then partly undeleted for a history-split, first check in case it is sitting over a deleted parallel history.

History splitting

Over time, articles may change from one underlying topic to a completely separate topic. Normally this should be accomplished through moves and disambiguation pages. However, if a user is unfamiliar with those processes they may simply change the topic of an article (overwriting the old) and continue editing. If this is not caught immediately it is very easy for the new topic to build up a substantial edit history of its own. Admins can use the following steps to fix this problem and maintain separate histories for the separate topics:

  1. Delete the article (original article name)
  2. Restore previous revisions up to (but not including) the point where the topic was changed.
  3. Move [without redirect] the restored versions (old topic) to a new name (see also disambiguation)
    • If there is already an article under the new name and you wish to histmerge into it:
      1. select the "delete the existing article" option while moving;
      2. restore the deleted revisions of the new name.
  4. Restore new revisions of new topic (still at original article name)
  5. Revert to latest good versions as needed
  6. Establish a disambiguation page for the different topics
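In terms of revision lists, steps 1 to 4 split one history at the first revision on the new topic. A hypothetical sketch with invented texts and integer timestamps; locating `first_new_topic_index` is the part that needs human review, and the optional `existing_at_new_name` argument models the sub-steps for merging into an article that already exists under the new name.

```python
def split_history(revisions, first_new_topic_index, existing_at_new_name=()):
    """revisions: oldest-first (timestamp, text) pairs at the original title.
    Returns (old_topic, new_topic): old_topic ends up at the new name,
    new_topic stays at the original title."""
    old_topic = list(revisions[:first_new_topic_index])
    new_topic = list(revisions[first_new_topic_index:])
    if existing_at_new_name:
        # Moving over the existing page and restoring its deleted revisions
        # leaves both sets in one history, ordered by timestamp.
        old_topic = sorted(old_topic + list(existing_at_new_name))
    return old_topic, new_topic


revs = [(1, "old topic v1"), (2, "old topic v2"), (3, "new topic v1"), (4, "new topic v2")]
old_topic, new_topic = split_history(revs, 2)
print(old_topic)
print(new_topic)
```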

How to handle the left-over redirect

In most cases, you will be moving all non-redirect versions of one page into the history of another and leaving a redirect. Please keep the following situations in mind when deciding what to do with the redirect:

  • Is the resulting redirect eligible for speedy deletion (see WP:SPEEDY#General and WP:SPEEDY#Redirects)? As with regular page moves, consider waiting a few days before deleting the leftover redirect even if it is eligible for speedy deletion.
  • Are all incoming links to the leftover redirect fixed? If not, don't delete the redirect until they are.
  • Is it likely the most recent editors of the moved revisions are watchlisting the page? Consider notifying them of the change.
  • Is the leftover redirect in User: or User_talk: space? If you do delete it, notify the affected user unless there is a good reason not to. Consider leaving the redirect unless doing so would cause problems, such as in the case of:
    • A redirect from the "main" user page or "main" user talk page to somewhere other than another page in that user's userspace.
    • A redirect to another user's pages or non-user space in a way that may cause confusion or is otherwise inappropriate.

History-merging a transcluded page

If page X is transcluded in page Y, and page X is marked to be the recipient in a history-merge, then page X and page Y will both appear in Category:Candidates for history merging, and both pages will display the request to perform a history-merge. An admin should not try to perform a history-merge on page Y, but only on page X. This is most likely to happen if page X is a template, but it may happen to any page that is transcluded. To avoid this, {{history merge}} should be placed in <noinclude> tags on page X.

How to undo a history merge

If a history merge should not have been performed, it may be undone. Note, however, that this can be quite tedious, especially if the article has a very long history. The procedure is as follows:

  1. Suppose A has been history merged into B.
  2. We want to get A's former history back into A.
  3. Delete B.
  4. Selectively undelete the revisions of B that made up the history of A before the history merge.
  5. Move B to A.
  6. Undelete the rest of the revisions of B.
  7. If A and/or B is now a redirect to itself or the other article, then revert or change the redirect target, as deemed appropriate.
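Step 4 is the crux: you need to know which revisions now at B originally belonged to A. The sketch below assumes you have already reconstructed that set of revision IDs yourself (the revision IDs and integer timestamps here are hypothetical) and shows only the partition that selective undeletion performs.

```python
def undo_history_merge(b_history, a_rev_ids):
    """b_history: (rev_id, timestamp) pairs now at B.
    a_rev_ids: IDs known to have come from A (reconstructed beforehand).
    Returns (restored_a, remaining_b), each in timestamp order."""
    by_time = sorted(b_history, key=lambda r: r[1])
    restored_a = [r for r in by_time if r[0] in a_rev_ids]
    remaining_b = [r for r in by_time if r[0] not in a_rev_ids]
    return restored_a, remaining_b


b = [(101, 10), (205, 30), (102, 20), (206, 40)]
a_part, b_part = undo_history_merge(b, {101, 102})
print(a_part, b_part)  # [(101, 10), (102, 20)] [(205, 30), (206, 40)]
```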

An example of a successful history merge and undo is available at User:King of Hearts/Sandbox/6 (the A article) and User:King of Hearts/Sandbox/7 (the B article).

Bugs and problems

Revisions with same timestamp

When a page is moved, two edits are made, with consecutively numbered revision IDs and identical timestamps and edit summaries. In edit histories, the timestamps are usually shown to the minute (17:47, 21 January 2008), unless the ISO 8601 date format preference is set; in the database, however, they are recorded to the second, e.g.:

revid | timestamp | edit summary | title | bytes | difference in bytes and page content
185912120 | 2008-01-21 17:47:32 | moved Élie, duc Decazes to Élie Decazes, Duc Decazes | Élie Decazes, Duc Decazes | 8,304 | 0 (an edit is made on the target documenting the move in the edit summary, with no difference in page content)
185912121 | 2008-01-21 17:47:32 | moved Élie, duc Decazes to Élie Decazes, Duc Decazes | Élie, duc Decazes | 40 | -8,264 (the source page's text was replaced with #REDIRECT Élie Decazes, Duc Decazes)

Live edits are uniquely identified by their revision ID numbers, but deleted edits are referenced by their timestamps. As long as these two revisions are located on different titles, this isn't a problem. However, if the two edits are inadvertently histmerged to the same page and then temporarily deleted, it is impossible to restore one of them without restoring the other, because the timestamp that identifies which edit to restore is the same for both. Care should therefore be taken not to move or histmerge a #REDIRECT edit generated by a page move off the page it was made on. Such #REDIRECTs should stay on the page on which they were created, either as live or deleted edits.
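The problem can be pictured by grouping deleted revisions by the key used to select them for restoration. In this illustrative model the two revision IDs come from the table above; the third entry is hypothetical. Any group with more than one ID can only be restored as a unit.

```python
from collections import defaultdict


def restorable_units(deleted_revisions):
    """Group deleted (rev_id, timestamp) pairs by timestamp. Revisions that
    share a timestamp cannot be selected for restoration separately."""
    groups = defaultdict(list)
    for rev_id, timestamp in deleted_revisions:
        groups[timestamp].append(rev_id)
    return {ts: sorted(ids) for ts, ids in groups.items()}


deleted = [
    (185912120, "2008-01-21T17:47:32"),  # move edit on the target
    (185912121, "2008-01-21T17:47:32"),  # redirect left at the source
    (1000, "2008-01-20T09:00:00"),       # hypothetical unrelated revision
]
for ts, ids in restorable_units(deleted).items():
    print(ts, ids)
```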

These can, however, theoretically be separated using Special:MergeHistory, but in practice this is a particularly inelegant and tedious method for anything other than what it was designed for (i.e. history-merging). Here's a sketch of how it would need to be done:

  1. Suppose we want to move revisions of A into B. We first delete A.
  2. After undeleting the revisions of A, we notice that extra edits with the same timestamp that need to remain on A have also been undeleted.
  3. Create a temporary empty userpage to hold the revisions while performing the split.
  4. Using Special:MergeHistory, if the first undeleted edits on A need to remain on A, merge those edits to the temporary userpage. If the first undeleted edits should be merged to B, then merge those edits to B.
  5. Repeat step 4 until Special:MergeHistory cannot be used anymore.
  6. If the remaining revisions on A still need to be merged to B, delete B, move those revisions to B (the revisions should not be mixed anymore after step 5), then undelete B.
  7. Delete the temporary empty userpage, and undelete all revisions to be moved back to A (so as to exclude your own revision that created the userpage)
  8. Delete A if there are remaining undeleted revisions on A. Move the temporary empty userpage to A. Undelete the other revisions on A and revert the latest move edit back to the previous latest revision on A.
  9. You should now end up with the correct edits merged into B.

Wikidata

Page moves and deletions are generally reflected on Wikidata as soon as they happen. After performing a history merge, it is a good idea to check your Wikidata contributions and restore pages to their previous state if necessary.

Page curation

Similarly, when a page is deleted and then undeleted, it is added to the new pages feed and marked as unreviewed; it is also a good idea to check for this and manually review the page if necessary (example log).

See also