User:SMcCandlish/Replacement of Template:Rp

From Wikipedia, the free encyclopedia

{{Rp}}, a klugey form of partial parenthetical citation that was created in 2007 to handle cases in which an article needed to cite the same source many times but at different page numbers, is now obsolete, and has been for several years.

What this template does is use superscripted notes, inline in the content, to indicate page numbers or other in-source locations after the footnote indicators. E.g.: "A claim with sources.[1]:12–14 [2]:321, footnote 27 [3]:ix, 9[4]:26 This is more article text." While instances like [2]:321, footnote 27 are the extreme end, simple cases like [4]:26 are also problematic. Any use of this crusty format poses several usability concerns, from reader confusion about what these text strings even mean, to unnecessary splitting up of citation details, to cluttering the prose with metadata; it may also pose an accessibility issue for readers with poor eyesight.

We have much "cleaner" ways of handling such cases now, that keep all of the citation information inside the citation footnotes at the bottom of the page where they belong, leaving only the simple superscript indicators – A claim with sources.[1][2][3] – that link to the citations. The community consensus established in 2020 is to deprecate and replace inline parenthetical citations – those that inject citation details into the prose of the article – and this necessarily includes parenthetical partial citations like those created by {{rp}}.

Replacing them is not instantaneous and requires some judgment and some work. Below are tips for converting {{rp}} citations to the most common method of shortened footnotes, the {{sfnp}}/{{harvp}} template set, which replace or work directly with <ref>...</ref> citation markup. There are some alternative citation methods, and this documentation may later be updated to include conversion instructions for them, if there is sufficient demand, but most of these alternatives are themselves obsolescent and disused.

Converting templated (CS1/2) citations that use {{rp}}

The instructions in this section are for citations that have been using <ref>...</ref> along with CS1 citation templates like {{cite web}}, {{cite journal}}, and {{cite book}}. This will also generally apply to the disused CS2 template, {{citation}}, which now uses the same set of parameters as CS1 (and has matching output).

The instructions below specifically recommend and illustrate the use of {{harvp}} and {{sfnp}} (rather than different-formatting variants of them), because they are intentionally consistent with the output of the CS1/2 citation templates. WP:Citing sources#Citation style (WP:CITESTYLE) instructs us to use a single, consistent citation style within an article.

Summary of the process

  1. Identify sources using {{rp}} but which are being only cited at a single page number (or range of page numbers). These are the lowest-hanging fruit, as the {{rp}} page number(s) can simply be moved into |page= or |pages= in the citation template, and {{rp}} removed.
  2. Identify the sources that are being cited multiple times at different page numbers, and copy the full citation ({{cite book|...}}, etc.), without page numbers, to the ===Sources=== section below the ==References== section so they are in a central location; create such a subsection if needed.
    • The exact names of these sections vary by article; you might find ==Notes== and ===Bibliography===, or ==Citations== and ===Works cited===, or whatever, and maybe both will be level-2 headings, or the second heading might simply be absent. The point is that these re-used citations are at the bottom of the article, below <references /> or {{reflist}}, usually in a bullet list, and sometimes between a {{refbegin}} and a {{refend}} template. Rarely, they may be embedded with <ref> tags inside {{reflist}} or <references /> (known as list-defined references). Any such format is fine; just match your new entries to whatever is used in the article already (if anything). Note: articles on writers often have a content section named "Bibliography" that has nothing to do with a section for reference citations; don't get it confused with one.
  3. For the first source cited at multiple page numbers/ranges, replace things like <ref name="Smith 2023">{{cite book |last=Smith ... |date=2023 ...}}</ref>{{rp|9}} and <ref name="Smith 2023" />{{rp|9}} with {{sfnp|Smith|2023|p=9}} (without a <ref>...</ref> tag around it) in both cases.
    • Repeat this sub-job for each of the different pages (or ranges) cited in that source.
    • Due to the "smarts" of {{sfnp}}, it just does not matter if the page is being cited more than once and which is the first instance; the template will create merged short citations, not duplicates.
  4. Repeat steps 2–3 for the next source being used multiple times at different page numbers/ranges. Repeat until there are no {{rp}} instances left.
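
As a rough illustration of step 3, here is a minimal Python sketch (not part of any existing Wikipedia tool). It assumes the full citation has already been copied to the bibliography as in step 2, because the inline full citation is discarded. The function name `rp_to_sfnp` and its parameters are hypothetical, and the pattern only handles ref tags whose sole attribute is a double-quoted `name`.

```python
import re

def rp_to_sfnp(wikitext: str, ref_name: str, surname: str, year: str) -> str:
    """Replace <ref name="...">...</ref>{{rp|N}} and <ref name="..." />{{rp|N}}
    with {{sfnp|surname|year|p=N}} (hypothetical helper, for illustration)."""
    pattern = re.compile(
        r'<ref name="' + re.escape(ref_name) + r'"\s*'
        # Either a self-closing tag, or a full tag whose body stops at the
        # first </ref> (the lookahead prevents running into a later ref).
        r'(?:/>|>(?:(?!</ref>).)*</ref>)'
        r'\{\{[Rr]p\|([^{}|]+)\}\}',
        re.DOTALL,
    )

    def repl(match: re.Match) -> str:
        pages = match.group(1).strip()
        # pp= for lists or ranges of pages, p= for a single page.
        key = "pp" if re.search(r"[,–-]", pages) else "p"
        return "{{sfnp|%s|%s|%s=%s}}" % (surname, year, key, pages)

    return pattern.sub(repl, wikitext)


sample = (
    'A.<ref name="Smith 2023">{{cite book |last=Smith |date=2023}}</ref>{{rp|9}} '
    'B.<ref name="Smith 2023" />{{rp|12–14}} '
    'C.<ref name="Smith 2023" />{{Rp|9}}'
)
converted = rp_to_sfnp(sample, "Smith 2023", "Smith", "2023")
print(converted)
```

Instances of the same ref name that have no {{rp}} after them are left untouched, so they still need a manual decision.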

Concurrent cleanup

In the course of doing this, you may run into various citation inconsistencies, and this is a good time to fix them. Probably the easiest and most common is a source that is only being cited one time at all yet has a ref name like <ref name="Something">; this can be simplified to just <ref>.

Also look out for old CS2 {{citation}} templates mixed in with CS1's more specific types ({{cite book}}, etc.); completely untemplated citations; half-templated citations; bare URLs wrapped around content instead of being inside citations at all; duplicate citations (that are not going to be replaced with the {{sfnp}} operation described above); citations with confusing names in the ref tag; citations misusing various parameters (especially multiple authors put into the same parameter, unless it is |vauthors= with correct Vancouver formatting, and even then that should be replaced unless the entire article is done in that citation style); citations missing key information; redundant entries in "Further reading" or "External links" that have already been fully cited as citations above; the wrong citation template for a source type (e.g. {{cite book}} used for a journal article); multiple pieces of information in one parameter like |page=74, 3rd edition instead of |page=74|edition=3rd, non-citation bibliographic information like |edition=hardback 237pp, vertical citations inline in the text (that format is good for list-defined references at page bottom, but visually disruptive in mid-article code); and other messiness.

An especially helpful bit of cleanup is changing instances of <ref name=Something> and <ref name=Something/> to <ref name="Something"> and <ref name="Something" />, respectively. See below for a regex search–replace operation to do this easily. [Update: That code is in process of an overhaul, should be finished in Jan. 2024.] This cleanup is beneficial for at least three reasons:

  1. It normalizes the formatting to consistent patterns, which aids with the search–replace operations covered below.
  2. The quoting "future-proofs" the citation names, since the quotes are required around any value containing a space or punctuation; if a citation presently is named something like DSmith2023 there's a reasonable chance someone will later change it to D. Smith 2023 or whatever, but possibly forget the now-necessary quotation marks. (And if it's already D_Smith_2023, D-Smith-2023, etc., then it already actually requires the quotation marks even if an earlier editor did not understand that.) The quotes are also required if it includes any non-ASCII Unicode characters at all (which includes a lot of accented Latin-alphabet characters, and anything in another character set like Cyrillic, Greek, or Japanese). There are just too many ways to break this to not quote it, so quote it.
  3. The space before the /> is understood by more parsers than the unspaced version, so is better for reuse of WP content.

See also the mw:Help:Cite documentation for <ref>: Note that identifiers used in the name attribute require alphabetic characters; solely relying on numerals will generate an error message. Quotation marks [specifically, straight double ones] are always preferred for names, and are mandatory when the name includes a space, punctuation or other mark. It is recommended that names be kept simple and restricted to the ASCII character set. [Emphasis added.]

The process in more detail, for specific kinds of cases

This section is in process of major revision to stop recommending <ref ...>{{harvp|...}}</ref> when {{sfnp}} will suffice.

  • Copy the citation's details as a list item down into the Sources/Bibliography/Cited works/whatever subsection (hereafter "the bibliography") under the auto-generated citations, but without any page numbers: * {{cite something |last=Smith ... |date=2023 ...}}
    • (If the work is only being cited one time in the article, and is likely to remain that way, it is preferred in the citation style of most articles to leave that specific source citation entirely inline instead of moved down to the bibliography; this can vary by article.)
  • For a source used only once, with {{rp}}:
    • Replace <ref>{{cite something |last=Smith ... |date=2023 ...}}</ref>{{rp|37}} with <ref>{{cite something |last=Smith ... |date=2023 ...|page=37}}</ref>
    • Or, if this citation's details have been moved to page-bottom because the source is likely to be reused later at another page number: Replace <ref>{{cite something |last=Smith ... |date=2023 ...}}</ref>{{rp|37}} with {{sfnp|Smith|2023|p=37}}, or if you prefer the longer syntax, <ref>{{harvp|Smith|2023|p=37}}.</ref>
    • If it has <ref name="Something"> reduce this to <ref>
  • For a source used only once, without {{rp}}, with or without a page number (shown here with one):
    • Leave <ref>{{cite something |last=Smith ... |date=2023 ... |page=37 ...}}</ref> as-is.
    • Or, if this citation's details have been moved to page-bottom because the source is likely to be reused later at another page number: Replace <ref>{{cite something |last=Smith ... |date=2023 ... |page=37 ...}}</ref> with {{sfnp|Smith|2023|p=37}}, or if you prefer the longer syntax, <ref>{{harvp|Smith|2023|p=37}}.</ref>
    • If it has <ref name="Something"> reduce this to <ref>
  • For a source reused, without page numbering at all, named like <ref name="Smith 2023">...</ref>:
    • Leave this as-is, if it is unlikely that additional citations to it will be created with specific page numbers, and leave it inline instead of moving citation details to the bibliography section.
    • If later citations to specific page numbers are likely, replace complete-citation instance thus: <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ...}}</ref> becomes <ref name="Smith 2023">{{harvp|Smith|2023}}.</ref>
    • In either case, referential shortened instances of <ref name="Smith 2023" /> remain the same.
  • For a source reused, at only one specific page number or range, named and with {{rp}}: <ref name="Smith 2023">...</ref>{{rp|...}}:
    • If additional citations to it at other page numbers are unlikely, leave it inline instead of in the bibliography, and replace complete-citation instance thus: <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ...}}</ref>{{rp|37}} becomes <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ... |page=37}}</ref>
    • If later citations to other page numbers are likely, replace complete-citation instance thus: <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ...}}</ref>{{rp|37}} becomes <ref name="Smith 2023 p37">{{harvp|Smith|2023|p=37}}.</ref>
    • Replace referential shortened instances of <ref name="Smith 2023" />{{rp|37}} with <ref name="Smith 2023 p37" /> (note: not all cases of <ref name="Smith 2023" />!)
    • You can optionally just leave name="Smith 2023" as-is in these steps if this source is not likely to be cited for anything else later at another page number.
  • For a source reused, at only one specific page number or range, named but without {{rp}}: <ref name="Smith 2023">...</ref>:
    • If additional citations to it at other page numbers are unlikely, leave it inline instead of in bibliography, and replace complete-citation instance thus: <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ... |page=37 ...}}</ref> becomes <ref name="Smith 2023 p37">{{cite something |last=Smith ... |date=2023 ... |page=37 ...}}</ref> (i.e., just add page number to ref name).
    • If later citations to other page numbers are likely, replace complete-citation instance thus: <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ... |page=37 ...}}</ref> becomes <ref name="Smith 2023 p37">{{harvp|Smith|2023|p=37}}.</ref>
    • Replace referential shortened instances of <ref name="Smith 2023" /> with <ref name="Smith 2023 p37" />
    • You can optionally just leave name="Smith 2023" as-is in these steps if this source is not likely to be cited for anything else later at another page number.
  • For a source reused, at different page numbers, with {{rp}}: <ref name="Smith 2023">...</ref>{{rp|...}}
    • Replace complete-citation instance of particular page reference thus: <ref name="Smith 2023">{{cite something |last=Smith ... |date=2023 ...}}</ref>{{rp|37}} becomes <ref name="Smith 2023 p37">{{harvp|Smith|2023|p=37}}.</ref>
    • Replace referential shortened instances of <ref name="Smith 2023" />{{rp|37}} (not all cases of <ref name="Smith 2023" />!) with <ref name="Smith 2023 p37" />
    • Find other instances of <ref name="Smith 2023" /> with or without {{rp}} and decide how each has to be handled.
  • For a source reused, at different page numbers, without {{rp}}, something like: <ref name="Smith 2023 p37">...</ref>
    • Replace complete-citation instance of particular page reference thus: <ref name="Smith 2023 p37">{{cite something |last=Smith ... |date=2023 ... |page=37 ...}}</ref> becomes <ref name="Smith 2023 p37">{{harvp|Smith|2023|p=37}}.</ref>
    • Referential shortened instances of <ref name="Smith 2023 p37" /> can be left as-is (though if it is not formatted consistently with other citation ref-names, now is a good time to normalize them along with the "master" instance that has the {{harvp}} in it).
    • Find other instances of <ref name="Smith 2023 ..."> and decide how each has to be handled.
  • Especially with {{rp}}, be on the lookout for redundant instances of specific-page citations that can be merged.
  • The above examples are examples, not rules. If you prefer ref names in a form like <ref name="Smith2023p27"> or <ref name="Smith-2023-p27"> or even <ref name="Smith (2023), p. 27"> to mimic the output, that's fine. If the "Smith (2023), p. 27" output of {{harvp}} and {{sfnp}} does not match the rest of the page's citation style, you can get "Smith 2023, p. 27" output (which is permissible but less clear) with {{harv}} and {{sfn}}, and there are several other formatting alternatives; just be consistent within the article. However, because WP:CITESTYLE does ask that we impose a consistent style across the citations within a single article, these alternatives should not be used in most cases, since the output of CS1/2 citation templates uses "(2023)" and "p. 27" formatting. The {{harvp}} and {{sfnp}} templates are specifically recommended throughout this tutorial because they are designed to be consistent with CS1/2.
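
The simplest case above (a source used only once, unnamed, with {{rp}}) can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the citation is a CS1 {{cite ...}} template with no nested templates right before its closing braces, and `SINGLE_USE` and `fold_rp_into_cite` are hypothetical names. A citation that already has |page= or |pages= is left for manual review rather than given a second page parameter.

```python
import re

# <ref>{{cite ...}}</ref>{{rp|PAGES}} with an unnamed ref tag.
SINGLE_USE = re.compile(
    r'(<ref>\{\{cite (?:(?!</ref>).)*?)\s*\}\}</ref>\{\{[Rr]p\|([^{}|]+)\}\}',
    re.DOTALL,
)

def fold_rp_into_cite(wikitext: str) -> str:
    """Move the {{rp}} page value into |page= or |pages= of the citation."""
    def repl(match: re.Match) -> str:
        citation, pages = match.group(1), match.group(2).strip()
        if re.search(r'\|\s*pages?\s*=', citation):
            return match.group(0)  # already has a page parameter; leave as-is
        param = "pages" if re.search(r"[,–-]", pages) else "page"
        return "%s |%s=%s}}</ref>" % (citation, param, pages)
    return SINGLE_USE.sub(repl, wikitext)


print(fold_rp_into_cite('<ref>{{cite book |last=Smith |date=2023}}</ref>{{rp|37}}'))
```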

Common issues

Read the basics of the template documentation at {{harvp}} (also covers {{sfnp}}, etc.). Crash course in common issues:

  • To cite multiple authors: {{harvp|Smith|Chen|Ocampo|2023|p=27}} or {{sfnp|Smith|Chen|Ocampo|2023|p=27}}. This may require cleanup of bad citations, such as those with no citation templating, or misuse of |author= or |authors= to dump multiple author names into one parameter. Switch to |last1=|first1=|last2=|first2=, or if (and only if) the article is consistently using Vancouver-style citations, switch to |vauthors= (which has to be formatted a specific way, e.g. |vauthors=Smith JB, Chen BC Jr, Ocampo P).
  • To cite multiple pages: {{harvp|Smith|2023|pp=27, 32–33, 170}} or {{sfnp|Smith|2023|pp=27, 32–33, 170}}. (While technically it's preferred to use |pp=, in the {{harvp}} or {{sfnp}}, for multiple pages, it actually works fine to just use |p= – it's more important to get rid of {{rp}} than to be ultra-precise with sfnp/harvp niceties. The template documentation claims that such a mismatch can cause breakage, but in pretty extensive testing so far, this has proven false. And there will be a regex search–replace detailed below to fix this anyway. [forthcoming].)
  • If there's no author name to use, you need to use some other meaningful string instead, e.g. publisher's name or acronym, or a key word from the title. This is done with |ref={{harvid}} inside the full-citation template, e.g.: {{cite web |title=Championship Rules |date=2015 |url= .... |publisher=United Underwater Basketweaving Federation |ref={{harvid|UUBF|2015}} }} and then cite this with {{harvp|UUBF|2015}} or {{sfnp|UUBF|2015}}. Do not abuse |author= to repeat the publisher name; use |ref= for its actual purpose.
    • The same technique can be used when a date is unknown but the author is named: |ref={{harvid|Smith|n.d.}}
    • It's also useful for shortening the name of a long organizational author; if you have something like {{cite web |last1=Lujan |first1=J. B. |author2=Ad-Hoc International Committee of Weasel-Shaving Standards |title=Mustelid Grooming Procedures |date=1998 |publisher=Animal Grooming Association ... }} you can add |ref={{harvid|Lujan|AHICWSS|1998}} and cite it with that short name.
  • If the same author has more than one publication in the same year, the conventional thing is to refer to them as, e.g., 2023a and 2023b. The simplest and clearest way to do this is again with |ref=, like so: |ref={{harvid|Smith|2023a}} (there is a different "legacy" way to do this, that operator-overloads the |year= parameter while simultaneously using |date= in the same citation, but this is confusing, obsolescent, and very likely to be broken by later editors). The |ref= parameter exists for good reasons, so please use it when it is called for.
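
To see why the names and year in a short citation have to match the full citation (or its |ref={{harvid|...}}), it may help to sketch the linking scheme. As far as I understand it, the short-citation templates link to an anchor made of "CITEREF" followed by the surnames and year run together; the Python below is a simplified model of that idea, not the templates' actual implementation, and `SHORT_CITE`, `short_cite_anchor` and `unmatched_short_cites` are hypothetical names.

```python
import re

# Matches {{sfnp|...}} and {{harvp|...}} calls that contain no nested templates.
SHORT_CITE = re.compile(r"\{\{(?:sfnp|harvp)\|([^{}]*)\}\}")

def short_cite_anchor(args: str) -> str:
    """Model the anchor a short citation points at: CITEREF + names + year."""
    # Positional arguments (surnames, year) have no "="; p=, pp= etc. do.
    positional = [a.strip() for a in args.split("|") if "=" not in a]
    return "CITEREF" + "".join(positional)

def unmatched_short_cites(wikitext: str, defined_anchors: set) -> list:
    """List anchors used by short citations that no full citation defines."""
    missing = []
    for match in SHORT_CITE.finditer(wikitext):
        anchor = short_cite_anchor(match.group(1))
        if anchor not in defined_anchors and anchor not in missing:
            missing.append(anchor)
    return missing


article = ("A.{{sfnp|Smith|Chen|Ocampo|2023|p=27}} "
           "B.{{harvp|UUBF|2015}} C.{{sfnp|Jones|1999|pp=3–4}}")
defined = {"CITEREFSmithChenOcampo2023", "CITEREFUUBF2015"}
print(unmatched_short_cites(article, defined))
```

In this model, a misspelled surname or a 2023 versus 2023a mismatch shows up as an unmatched anchor, which corresponds to a short citation that links nowhere.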

Dealing with list-defined references embedded inside a {{reflist}} or <references />

[forthcoming]

Converting annotated or un-templated citations that use {{rp}}

[forthcoming]

Using scripts and regular expressions to speed the conversion

There is a powerful tool at your disposal: If you have Wikipedia's built-in editor enabled, click the > Advanced item in the top menu, and on the far right will appear an hour-glass icon for advanced in-page search and replace. This supports regular expressions (regex). They are complicated to learn, but you don't have to learn them in detail, just adapt the ones provided here. Select the "Treat search string as a regular expression" checkbox and turn off the other two options when using these. If you have replaced the built-in editor with wikEd (an advanced editor you can install via "Gadgets" in the Wikipedia "Preferences" menu), it also has a regex search feature. So does any good external text editor.

Before doing any regex search–replaces, make the entire process much smoother by normalizing the citation spacing so that your search–replace operations work reliably and don't miss instances. There's a one-click, regex-based tool to do this all for you: TidyCitations![a]

  • Put the line:
    {{subst:Load user script|User:SMcCandlish/TidyCitations.js|User:SMcCandlish/TidyCitations}}
    in either your common.js or the skin.js of your current skin, save the page, and bypass your browser cache.
  • This gives you a script named "{{Tidy}}" in the "Tools" menu on the left while editing a page (might be somewhere else, depending on your skin). Edit the article you're cleaning up, and click that script. This will fix inconsistent spacing in citations.
    • See the short documentation at the top of User:SMcCandlish/TidyCitations for what to do if the article uses vertically formatted list-defined references (LDR) at the bottom of the article.
    • If you are using wikEd, you'll need to temporarily turn off wikEd (it's incompatible with many scripts like this) by pressing the button, making the changes with TidyCitations, then re-enable wikEd.

The first regex action to apply (after the above script) is this fancy one, to normalize all the <ref name=...> and <ref name=.../> instances in the page (with or without quotes, with or without space before />, with or without unnecessary spacing like <ref name = " foo ">, with or without other attributes in the <ref> tag, etc.) all to a consistent and robust format of <ref name="..."> and <ref name="..." />, so that later searches are guaranteed to find all cases and not miss ones because they don't have quotation marks or exactly the same spacing.

  • Ignore this for now. The regex work below has been surpassed by an in-development version that handles more cases, by using a series of regexes like this one to handle <ref> tags with multiple attributes like group= in any order.
  • Use this regex in the "Search for" field: <ref\s+((?:group|follow|extends)\s*=(?:(?!name\s*=)[\s\S])*)?name\s*=\s*(?:"\s*([^"](?:(?!\s*\/>|\s*"\s*>|\s+(?:group|follow|extends)).)*?)\s*"|'\s*([^'"](?:(?!\s*\/>|\s*'>|\s+(?:group|follow|extends)).)*?)\s*'|([^"](?:(?!\s*\/>|\s*>|\s+(?:group|follow|extends)).)*))(\s+(?:group|follow|extends)\s*=(?:(?!\s*\/>|"\s*>|'\s*>)[\s\S])*)*\s*(?:(\/)|)>
  • Use this string in the "Replace with" field: <ref $1name="$2$3$4"$5$6>
  • Immediately after doing that, do a non-regex search–replace changing "/> to " />
    • This will all work on virtually any ref name, even something as ridiculous as <ref name="Te>st/ ing"/>. The few known limitations (all pertaining to invalid nested quotation marks in ref names), are detailed in this footnote:[b]
    • This regex will also clean up extraneous whitespace inside <ref>, e.g. <ref name = "foo">Citation here.</ref><ref name = bar /> will be normalized to <ref name="foo">Citation here.</ref><ref name="bar"/>, including when any of those unnecessary spaces are line breaks. It even works for extraneous leading/trailing whitespace inside the quotation marks, as in <ref name = " bar " />.
    • Caution: This does not presently handle <ref group="..." name="..."> and similar constructions (where more than one attribute is present and name is not the first one). One step at a time here .... This should not be particularly problematic, because such citations are rare, they are usually sparse in a page when found (easily manually addressed), almost never have {{rp}}, and will not be broken by our regex operations, because a ref tag like <ref group="fn" name="fn1">Footnote 1.</ref> or <ref name="fn1" group="fn">Footnote 1.</ref> cannot be later referred to as <ref name="fn1" />, only as <ref group="fn" name="fn1" /> or <ref name="fn1" group="fn" />, and none of those will match our search specifics. The eventual multi-regex script will handle them all carefully anyway.

Rp regex example to find nearly all instances of a particular source's secondary (<ref name="..." />) citations with {{rp}} numbers, and replace them with something clearer and non-rp:

  • Ignore this for now: It needs to be updated to stop doing <ref ...>{{harvp|...}}</ref> citations when {{sfnp|...}} ones will suffice.
  • Do a regex search in the "Search for" field on <ref name="TIEOB" \/>{{rp\|([0-9A-Za-z,\.–\- ;§¶\(\)]*)}}, replacing "TIEOB" with whatever (often very unclear) name you are searching for, such as "about the buyout" or ":2" or "May'19".
    • If you are wondering about the technical details: the first \ "escapes" the / symbol, which otherwise has a special regex meaning; same with the second \, which escapes the | symbol; the (round brackets) create a substitution group that we can call (as $1) in the replacement text; the [square brackets] create a character-clustering group that * can operate on; 0-9A-Za-z means any numeral or basic alphabetic letter, and this is followed by characters that also might appear in page number citations, including a comma, a dot (also escaped with \ because it has a special regex meaning of "any single character"), an en dash, mistaken use of a hyphen for a dash (also escaped with \ because of its special regex meaning as a range indicator when inside square brackets), a space, a semicolon in case someone silly did that instead of a comma, the section symbol, the paragraph symbol, and round-brackets (escaped because in regex they are grouping markup); and the * means "any of the characters, however many times they may appear, that are specified in the square-bracket group".
    • This is not the most foolproof possible regex, as it will not include any "page number" matches that contain things like accented letters, CJK/Cyrillic/Greek/Indic characters, or extraneous punctuation, but such messes are not very common in {{rp}} and are easily manually fixed. It will find things like vii–ix, fig. 2, etc.
    • A potential issue is the name="..." value containing characters that need to be backslash (\) escaped – any of these characters: .^$*+?()[]\| (other chars that sometimes need to be escaped in certain regex constructions do not need to be here). It is probably best to replace the name="..." value (using a non-regex search–replace) with something that doesn't have one of these characters, so you (and others later) don't have to remember to escape it with a backslash. E.g., replace name="D. M. Smith | McNabb (2022)" with name="DM Smith & McNabb 2022"
  • In the "Replace with" field, put something like <ref name="Shamos 1993 p$1" />, where "Shamos" is the author surname and "1993" the publication year.
    • The $1 means "swap in the text that was captured in the round-bracketed substitution group by the regex search". You can do multiple such groups, and they are processed and numbered left-to-right. E.g., if you wanted to keep the original ref name and just move the {{rp}} page numbers into the ref name after the original name text, you could search for <ref name="(TIEOB)" />{{rp\|([0-9A-Za-z,\.–\- ;§¶]*)}} – notice the new (TIEOB) round brackets – and replace with <ref name="$1 p$2" />.
  • Clicking "Replace all" with the above search and replace strings will convert something like <ref name="TIEOB" />{{Rp|15}} to <ref name="Shamos 1993 p15" />, and <ref name="TIEOB" />{{Rp|15, 21–22, 37}} to <ref name="Shamos 1993 p15, 21–22, 37" />
  • Search for the original ref name again, e.g. <ref name="TIEOB" to find any that need manual treatment, e.g. something like <ref name="TIEOB" />{{rp|鸡屁股)}}, or other citations to the same source with different names like <ref name="TIEOB on snooker">{{cite book ...}}</ref>
  • Lastly, because that search–replace was a "blunt instrument" that just created secondary (<ref name="..." />) citations, a proper citation will be needed for each distinct page number. Search now for <ref name="Shamos 1993, and just make a list of all the page numbers being cited. For each of them, convert the first instance on the page from something like <ref name="Shamos 1993 p15" /> to <ref name="Shamos 1993 p15">{{harvp|Shamos|1993|p=15}}.</ref> if there is more than one instance of this page citation, or to just the short format {{sfnp|Shamos|1993|p=15}} with no ref tags, if it is used only once. After one primary, complete citation for each separate page-reference to it is done, all the other secondary ones like <ref name="Shamos 1993 p15" /> will just work automatically.
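
For anyone who would rather run this outside the browser, the search–replace above and the follow-up "list the page numbers" step can be approximated in Python. This is a sketch, not a drop-in equivalent: Python's `re` is a different regex flavor (a replacement function is used here instead of `$1`), `[Rr]p` stands in for a case-insensitive search, and `rename_rp_refs` and `cited_pages` are hypothetical helper names. `re.escape` takes care of the ref-name escaping issue described above.

```python
import re

# Same character class as the search string above: digits, basic letters,
# and punctuation that commonly appears in page references.
PAGE_CHARS = r"[0-9A-Za-z,\.–\- ;§¶\(\)]*"

def rename_rp_refs(wikitext: str, old_name: str, surname: str, year: str) -> str:
    """Turn <ref name="OLD" />{{rp|N}} into <ref name="Surname Year pN" />."""
    pattern = re.compile(
        r'<ref name="' + re.escape(old_name) + r'" />\{\{[Rr]p\|('
        + PAGE_CHARS + r')\}\}'
    )
    return pattern.sub(
        lambda m: '<ref name="%s %s p%s" />' % (surname, year, m.group(1)),
        wikitext,
    )

def cited_pages(wikitext: str, surname: str, year: str) -> list:
    """List each distinct page value now embedded in the new ref names."""
    prefix = re.escape('<ref name="%s %s p' % (surname, year))
    found = re.findall(prefix + r'([^"]*)"', wikitext)
    return list(dict.fromkeys(found))  # drop duplicates, keep first-seen order


src = ('<ref name="TIEOB" />{{Rp|15}} x <ref name="TIEOB" />{{Rp|15, 21–22, 37}} '
       'y <ref name="TIEOB" />{{rp|15}} z <ref name="TIEOB" />{{rp|鸡屁股)}}')
out = rename_rp_refs(src, "TIEOB", "Shamos", "1993")
print(out)
print(cited_pages(out, "Shamos", "1993"))
```

The malformed last instance is deliberately left unchanged, so it still turns up in the manual search for the original ref name.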

Let me know what other kinds of regex examples would be helpful.

Important notes

  • Warning 1: Search-and-replace operations performed with the built-in editor's search–replace function cannot be undone with Ctrl-Z (Cmd-Z on a Mac). Thus it is very important to copy the full text of the article code after performing a successful regex operation, and paste it into a text editor, so that if you do a second operation and it doesn't work right, you can just paste the last-good results back over the results that failed, instead of having to start over.
  • Warning 2: Remember to turn on/off the "Treat search string as a regular expression" checkbox depending on what kind of search–replace you are doing. If you have this set wrong, the results (if the search matched something) will get boogered pretty badly.
  • External tools note: If you are using some off-site editor to do these regexes, you may need to wrap the entire regex with forward slashes: /paste the regex here/; see its documentation for how it wants regexes formatted. You might need /paste the regex here/g for global (don't stop at first match), or /paste the regex here/gm for global and multi-line (don't stop at a line break). It varies by application. You might also end up needing to backslash (\) escape more characters (typically { and }, depending on the editor). There are many "flavors" of regex, and any given editor might be using a rather particular one.
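
As one concrete example of both the "flavors" point and Warning 1, here is a hedged Python sketch of running a series of search–replace steps while keeping a snapshot after each one, so that a bad step can be rolled back. Python writes group references as `\g<1>` rather than `$1`, and `re.sub` replaces every match by default (comparable to the /g flag). The helper names `js_to_python_replacement` and `apply_steps` are hypothetical.

```python
import re

def js_to_python_replacement(replacement: str) -> str:
    """Convert $1-style group references to Python's \\g<1> form."""
    # Double any literal backslashes first so Python keeps them literal.
    escaped = replacement.replace("\\", "\\\\")
    return re.sub(r"\$(\d+)", r"\\g<\1>", escaped)

def apply_steps(text: str, steps: list) -> tuple:
    """Apply (pattern, replacement) pairs in order, saving each result.

    Returns the final text and the list of snapshots; snapshots[0] is the
    untouched original and snapshots[i] is the text after step i.
    """
    snapshots = [text]
    for pattern, replacement in steps:
        text = re.sub(pattern, js_to_python_replacement(replacement), text)
        snapshots.append(text)
    return text, snapshots


steps = [
    (r'<ref name="(TIEOB)" />\{\{rp\|([0-9]+)\}\}', '<ref name="$1 p$2" />'),
]
final, history = apply_steps('<ref name="TIEOB" />{{rp|15}}', steps)
print(final)
```

If a later step goes wrong, the text from before that step is still available in `history`, which is the scripted counterpart of pasting the last-good copy back from a text editor.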

Major cleanup example

For a case study of an {{rp}} cleanup (among some other tweaks) of one of Wikipedia's longest and most complex glossary articles, see this combined diff. Ironically, this removed {{rp}} from the very article that this now-obsolete template was originally created for. The entire process, a combination of manual setting-up of needed initial citations for particular works and pages, and multiple regex search–replace operations, took about two hours (much of it spent working out the regex syntax), plus a cleanup tweak or two a bit later.

Footnotes

  1. ^ Credits for this script's development, from most recent to earliest: SMcCandlish, Sam Sailor, Zyxw, Meteor sandwich yum and Waldir.
  2. ^ The only known issues with that regex are all to do with ref names that have multiple sets of quotation marks, though even most of those are workably handled:
    • If someone has wrongly done <ref name='..."..."...'> with single quotes, the regex search–replace output will be <ref name="..."..."...">, which is even worse. This is something that is going to have to be addressed in the eventual cleanup script with a pre-pass that detects nested quotation marks.

      NB: The single-quotes are replaced as wrong because <ref> is not XML but "an XML-like syntax" that is documented as requiring double-quotes not single-quotes for this purpose; single-quotes in this construction are, canonically speaking, just content characters that, like all punctuation and other name content that is not ASCII letters and numerals, are required to be inside double quotes. Right this moment, the MW parser appears to attempt to treat single quotes as if delimiters anyway, but this functionality is not the design intention of mw:Extension:Cite, and it does not actually work properly. It will (at present) produce a valid citation if the format is something like <ref name='foo "bar" baz'> but the citation breaks if there is a space inside the interior quotation marks, as in <ref name='foo "bar baz" quux'>.

    • The regex will handle any other case of misuse of single quotes, even something ridiculous and "extra-invalid" like <ref name='foo 'bar' baz'>.
    • It cannot handle especially bad markup of <ref name="foo "bar" baz">. As with the first case mentioned above, MW tries pretty hard to parse this and in a very simple case like that one it will actually work (at least for now), but it fails on <ref name="foo "bar baz" quux"> with a space in the interior quoted part. This, too, is something that will have to be repaired with a pre-pass filter before the main regexes go to work.