Module talk:Citation/CS1/Archive 10

This is an archive of past discussions about Module:Citation. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Checking for invalid `|lccn=`

This one won't come up very often, but an invalid |lccn= can make it so that the automatic link to lccn.loc.gov does not work. I came across one in a citation today. I'm guessing that there are approximately a few dozen invalid LCCN parameters in all of WP, but they should be easy to detect.

Here is a straightforward explanation (scroll to "identifier-syntax") of valid LCCN syntax that will work with the LCCN web site. – Jonesey95 (talk) 05:16, 17 March 2014 (UTC)

In Module:Citation/CS1/sandbox:

length=7

fail. LCCN 1234567. {{cite book}}: Check |lccn= value (help)

length=8

pass. LCCN 12345678.
fail. LCCN 1234567A. {{cite book}}: Check |lccn= value (help)

length=9

pass. LCCN a12345678.
fail. LCCN 012345678. {{cite book}}: Check |lccn= value (help)

length=10

pass. LCCN aa12345678.
pass. LCCN 9912345678.
fail. LCCN a912345678. {{cite book}}: Check |lccn= value (help)
fail. LCCN 9a12345678. {{cite book}}: Check |lccn= value (help)

length=11

pass. LCCN aaa12345678.
pass. LCCN a9912345678.
fail. LCCN 99912345678. {{cite book}}: Check |lccn= value (help)
fail. LCCN 9aa12345678. {{cite book}}: Check |lccn= value (help)
fail. LCCN aa912345678. {{cite book}}: Check |lccn= value (help)
fail. LCCN a9a12345678. {{cite book}}: Check |lccn= value (help)

length=12

pass. LCCN aa9912345678.
fail. LCCN 0a9912345678. {{cite book}}: Check |lccn= value (help)
fail. LCCN a09912345678. {{cite book}}: Check |lccn= value (help)

length=13

fail. LCCN aa99123456789. {{cite book}}: Check |lccn= value (help)

If retained, error category will be Category:CS1 errors: LCCN.

—Trappist the monk (talk) 11:31, 20 March 2014 (UTC)

length=12

fail. LCCN 779912345678. {{cite book}}: Check |lccn= value (help)

Looks good to me. – Jonesey95 (talk) 03:55, 21 March 2014 (UTC)
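For reference, the pass/fail table above can be summarized as one pattern per total length. The following is a minimal Python sketch of that table, not the module's Lua code. The function name `lccn_looks_valid` is hypothetical, and restricting prefix letters to lowercase ASCII is an assumption based on the test values shown.

```python
import re

# One pattern per total length, inferred from the pass/fail table above.
# Assumption: prefix letters are lowercase ASCII (the table only shows lowercase).
_LCCN_PATTERNS = {
    8: r"[0-9]{8}",                            # 12345678
    9: r"[a-z][0-9]{8}",                       # a12345678
    10: r"(?:[a-z]{2}|[0-9]{2})[0-9]{8}",      # aa12345678 or 9912345678
    11: r"(?:[a-z]{3}|[a-z][0-9]{2})[0-9]{8}", # aaa12345678 or a9912345678
    12: r"[a-z]{2}[0-9]{10}",                  # aa9912345678
}


def lccn_looks_valid(lccn: str) -> bool:
    """Return True if an already-normalized LCCN matches the table above."""
    pattern = _LCCN_PATTERNS.get(len(lccn))
    if pattern is None:
        # Lengths below 8 or above 12 always fail.
        return False
    return re.fullmatch(pattern, lccn) is not None
```

With this sketch, `lccn_looks_valid("a9912345678")` passes while `lccn_looks_valid("779912345678")` fails, matching the table.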

Well, not quite right. The check needs to be improved so that lccns with hyphens are normalized before they are checked.

—Trappist the monk (talk) 14:20, 30 March 2014 (UTC)

Ok, I think that I've fixed the issue. New function normalize_lccn() normalizes the lccn according to the Normalization of LCCNs procedure. These test citations all work correctly. normalize_lccn() is able to normalize them all; the two fails occur because spaces in the lccn cause improper display of the lccn link:

pass. LCCN n78-890351.
pass. LCCN n78-89035.
fail (white space). LCCN 78890351 n 78890351. {{cite book}}: Check |lccn= value (help) [http://lccn.loc.gov/n 78890351 n 78890351]
pass. LCCN 85000002.
pass. LCCN 85-2.
pass. LCCN 2001-000002.
pass. LCCN 75-425165//r75.
fail (white space). LCCN /AC/r932 79139101 /AC/r932. {{cite book}}: Check |lccn= value (help) [http://lccn.loc.gov/79139101 /AC/r932 79139101 /AC/r932]

—Trappist the monk (talk) 17:41, 30 March 2014 (UTC)
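The normalization step can be sketched as follows. This is a Python illustration of the Library of Congress normalization procedure as I understand it (remove blanks; drop the first slash and everything after it; remove the hyphen and left-pad the digits after it to six places). It is not the module's Lua normalize_lccn(), and the name `normalize_lccn_sketch` is hypothetical. Unlike the module's test results above, this sketch silently strips spaces rather than flagging them.

```python
def normalize_lccn_sketch(raw: str) -> str:
    """Normalize an LCCN string; raise ValueError if the hyphen suffix is malformed."""
    # Step 1: remove all blanks.
    s = raw.replace(" ", "")
    # Step 2: drop the first forward slash and everything to its right.
    s = s.split("/", 1)[0]
    # Step 3: remove the hyphen and left-fill the serial number to six digits.
    if "-" in s:
        prefix, serial = s.split("-", 1)
        if not (serial.isascii() and serial.isdigit() and len(serial) <= 6):
            raise ValueError(f"cannot normalize LCCN: {raw!r}")
        s = prefix + serial.zfill(6)
    return s


print(normalize_lccn_sketch("85-2"))            # 85000002
print(normalize_lccn_sketch("75-425165//r75"))  # 75425165
```

The normalized result would then be passed to whatever syntax check follows; keeping the two steps separate makes each easier to test.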

Should the red error message be set to "hidden=true" in the live module until this bug fix is rolled out? I recommend doing so in order to avoid false positives. – Jonesey95 (talk) 00:03, 31 March 2014 (UTC)

Possible small bug in new year range code

I think I have found a small bug in the new year range code (which is great, by the way!). Here's what I have so far:

Author (1901–02). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (1901–04). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (1909–10). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (1911–12). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (1918–20). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)

Author (1921–22). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)

Author (1931–36). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)

Author (1984–86). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)

Author (2001–02). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (2001–04). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (2009–10). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author (2011–12). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help); Check date values in: |year= (help)

Author - future year (2018–20). "Foo Title". Journal Name. 23: 4. {{cite journal}}: |author= has generic name (help)

The year ranges are changed and are in |year=. The citations are otherwise identical except for the future year. – Jonesey95 (talk) 00:17, 31 March 2014 (UTC)

Not a bug. All of those errors occur where the two-digit year is less than 13 which makes for a possibly ambiguous date: YYYY-MM? or YYYY-YY?

—Trappist the monk (talk) 00:31, 31 March 2014 (UTC)

The warning violates WP:MONTH which reserves the ~~YYYY-YY~~ YYYY–YY format for years with this statement: "Do not use YYYY-MM format (e.g. 2001-03 for March 2001, which may be confused with the year range 2001–2003)." Jc3s5h (talk) 01:27, 31 March 2014 (UTC) Corrected format; year ranges should use an n-dash, not a hyphen. 14:55, 31 March 2014 UT.

What is the proposed way to remove the date error for "date=1901–02"?

The RFC on the YYYY-MM date format was just closed in favor of YYYY-MM remaining as a proscribed format, so 1901–02 can only be a year range.

If I recall correctly, the date-checking code is picky about hyphens (required for YYYY-MM-DD, marked as an error in other formats) versus endashes (required for all ranges, marked as an error in other formats). The examples above all use endashes, so they should be valid, whereas the YYYY-MM format, if it were acceptable, would use a hyphen. Since the above examples (a) are all in |year= and (b) all use endashes, I propose that they should be acceptable to the date-checking code. – Jonesey95 (talk) 02:44, 31 March 2014 (UTC)

Yes, Module:Citation/CS1 is picky about hyphens and endashes. Hyphens are only allowed in YYYY-MM-DD dates; ranges require endashes.

I'm not sure that we can rely on endashes, or on particular date-holding parameter names, or on editors adhering to the strictures of WP:MONTH, to divine intended meaning when we find AAAA-BB format in a date-holding parameter. You've been sorting through the gibberish that editors dump into CS1 template parameters long enough to know that editors aren't all that careful. I, for one, would like to see |year= go the way of |day= and |month=.

I don't think that readers (who aren't versed in the intricacies of WP:MOS) can easily determine by inspection if Journal Name 23, 1901–02, is the February issue or covers the period 1901–1902. Interpretation of such date ranges occurring in article text at least has some possibility of context to aid the reader; context in an isolated citation is much more limited and may not exist.

So, the fix for |date=1901–02 is |date=1901–1902.

I'll add this to the CS1/WP:DATESNO compliance table.

—Trappist the monk (talk) 12:11, 31 March 2014 (UTC)

"So, the fix for |date=1901–02 is |date=1901–1902." This is wrong on a few levels. First, this thread is about a bug in checking code. The checking code doesn't fix anything, an editor does. Next, since MOSNUM allows 1901–02 (that's an n-dash) this style of range should be acceptable by the checking code. I think the code already would disallow a hyphen in any date expression except YYYY-MM-DD, so even after the code is fixed to accept 1901–02 the expressions 1901-02 and 1901-99 (with hyphens) would still be flagged as errors. Jc3s5h (talk) 15:11, 31 March 2014 (UTC)

As I wrote before, not a bug, the code was intentionally written to exclude AAAA–BB dates where BB is less than 13. AAAA-BB dates are invalid because of the hyphen, regardless of the value in BB.
Yep, the code doesn't fix anything; never has, never will, and I never said that it would. Please stop putting words in my mouth that I have never written nor spoken.
I have noted before that CS1 is compliant with a subset of WP:DATESNO, itself a subset of WP:MOSNUM; CS1 will never be fully compliant with either.
YYYY-MM-DD is the only date format where hyphens are allowed and in fact required by CS1.

—Trappist the monk (talk) 16:11, 31 March 2014 (UTC)
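The behaviour described in this thread for AAAA–BB values can be sketched as below. This is a minimal Python illustration of the two stated rules only (a hyphen is always an error in this form; an en dash is flagged when BB is less than 13). The function name and the returned labels are hypothetical, and the module's actual Lua code may apply further checks that are not shown here.

```python
import re

EN_DASH = "\u2013"

# Four digits, a hyphen or an en dash, then two digits.
_SHORT_RANGE = re.compile("([0-9]{4})([-" + EN_DASH + "])([0-9]{2})")


def check_short_year_range(value: str) -> str:
    """Classify an AAAA-BB / AAAA–BB value according to the rules discussed above."""
    match = _SHORT_RANGE.fullmatch(value)
    if match is None:
        return "not a short range"
    _, separator, tail = match.groups()
    if separator == "-":
        # Hyphens are only allowed in full YYYY-MM-DD dates.
        return "error: hyphen"
    if int(tail) < 13:
        # BB could be read as a month (YYYY-MM) or as a year (YYYY–YY).
        return "error: ambiguous"
    return "ok"
```

Under these rules `1901–02` and `2011–12` are flagged as ambiguous, `1918–20` is accepted, and `1901-99` is rejected because of the hyphen.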

We seem to have a breakdown in terminology. There is a Citation Style 1, as described in Help:Citation Style 1, which is the way citations should be entered into templates and be displayed. Then there is the implementation of Citation Style 1 by the various bits of template code. Sometimes the template implementation is deficient. For example, if a source were written in the year 46, it would have to be so described in the citation, but the implementation doesn't support it. So the editor most likely would write a hand-coded citation that resembles Citation Style 1 as much as possible.

Help:Citation Style 1#CS1 compliance with Wikipedia's Manual of Style describes certain aspects of dates that, at present, are not feasible to implement, or are still on the to-do list for implementation. It should not be a description of free choices to create differences between CS1 and MOSNUM, because no such free choices were agreed to by the community. It would be inappropriate for a template coder to choose to implement templates in defiance of WP:MOSNUM if it is reasonably feasible to follow WP:MOSNUM. Jc3s5h (talk) 17:47, 31 March 2014 (UTC)

Perhaps it should be a lower class of warning, but we are used to ignoring certain date errors, already. I see how this range can be ambiguous, even though it is compliant. Warning that it could be a problem is not actively trying to fix something that isn't broken. I'd support adding the word possibly to this detection, or a different differentiation, if the code became that sophisticated. (It is already quite complicated.) —PC-XT+ 00:47, 1 April 2014 (UTC)

PC-XT, I think this is the wrong place to argue that YYYY–YY is too ambiguous to use in a citation, because a citation has less context than other parts of an article. If that's what you believe, you should bring it up at Help talk:Citation Style 1, and argue that page should declare that Citation Style 1 permanently rejects YYYY–YY, as an exception to the general acceptance of WP:MOSNUM date guidance, and it should be spelled out the rejection is on the basis of ambiguity (which is permanent) rather than it being unfeasible to check (which may be temporary). Jc3s5h (talk) 01:13, 1 April 2014 (UTC)

I don't mean to argue. I simply think there is room for compromise, at least for the moment. Some people find it ambiguous. I'm not sure I understand the arguments of whether it should be ambiguous, as I haven't read most of the other discussions about it. I may propose that the help page mention that some find it ambiguous, but I don't plan to try to outlaw it. I also have no opinion on whether this check remains the way it is due to ambiguity or turns into something else, other than that I prefer that templates, modules, userscripts, etc. follow consensus. I don't think this change is as bad as it could be. We have already had date errors against the MOS for a while, so I expect editors are used to them by now, and will generally use proper judgement. It's not perfect, but it's generally improving, and there is time. —PC-XT+ 01:56, 1 April 2014 (UTC)

CS1 does not do repairs. It can't. Besides, we have editors and robots to do the menial labor.

I'm sure that it's possible to have different levels of errors but what I think that amounts to is hidden error messages and perhaps different categorization. These particular error messages are currently hidden so there isn't much to be gained there. Where would the word 'possibly' go?

—Trappist the monk (talk) 01:09, 1 April 2014 (UTC)

I meant that the current implementation doesn't do anything wrong. It simply alerts editors to possible ambiguity, without changing anything in the display. I don't think it really needs different levels, but since it may be controversial, it could possibly be separated into a different category or use some kind of lesser error orange color, though I don't really see the point. As far as the word possible, it could go into the documentation at Help:CS1 errors#bad date. I prefer templates to follow documentation, but this case is not going to be so straightforward. —PC-XT+ 01:56, 1 April 2014 (UTC)

At present the code not only issues a warning message, but also throws a page into Category:CS1 errors: dates. One goal of more precise date checking is to reduce the number of false alarms in this category. So the code would have to have a level of warning which would issue a warning, but not put the page in the category. Also, eventually this is supposed to be fully turned on so all readers see it. I don't think we want to be showing warnings for things that might be wrong, rather than a clear violation of Citation Style 1. Jc3s5h (talk) 02:07, 1 April 2014 (UTC)

I've added a section at Help talk:CS1 errors with more details. I agree that reducing false alarms is a goal worth pursuing. I also agree that when the messages are turned on for everyone, each message should ideally link to help text with MOS links supporting its statements. It seems to be too soon to try for that, at the moment. I might support removing the category from this error, but leaving the hidden red text for now. That way, pages with less contentious errors can be given priority over those with this error only. —PC-XT+ 02:26, 1 April 2014 (UTC)

Regrettably, the development of CS1 was not a neatly planned, well organized engineering project. It was/is a mess. In the best of all possible worlds, the developers of CS1 would have started with a specifications document. That document would have directed the implementation; it would have served as the basis for the user documentation. Alas, twas not to be. This is Wikipedia. Multiple authors created multiple templates that evolved into multiple templates using the common {{citation/core}}. Continuing evolution is bringing them all into Module:Citation/CS1; new features are added, old features are pared away, and somehow, somehow, it is beginning to coalesce into a single entity. There is no plan for this, it just happens because Wikipedia happens. Documentation in this kind of environment lags behind the implementation; it will always lag behind.

Help:Citation Style 1 is not a specifications document, nor a design guide, nor is it even a style guide; it is not a description of free choices to create differences between CS1 and MOSNUM, though it does reflect those choices; it isn't even good user documentation. It is a mess. Help:Citation Style 1 is merely a collection of writings that attempts to describe how CS1 works and how to use it. Expecting more from it than that will lead to despair.

I choose to think that this particular coding choice was made not in defiance of WP:MOSNUM, but rather, to the benefit of CS1.

—Trappist the monk (talk) 01:09, 1 April 2014 (UTC)

It is a mess, and documentation isn't going to be perfect. Nevertheless, when it is clear what the documentation says, and it is reasonably feasible to write checking code that follows the documentation, that should be done. It is inappropriate to use a position as a code writer to ignore the consensus process and just implement whatever the coder prefers. If you think 1901–02 is ambiguous in the context of a citation, get consensus to modify Help:Citation Style 1 accordingly, instead of throwing it on a list of stuff that is infeasible or in the queue. Jc3s5h (talk) 01:21, 1 April 2014 (UTC)

origyear --> origdate?

Should we deprecate origyear and make it a synonym of a new parameter called origdate so that the naming format is consistent? Jason Quinn (talk) 03:43, 3 April 2014 (UTC)

No. |origyear= is mainly used for books, where a second or subsequent edition is common, but the exact publication date is unimportant; for a non-first edition of a book, the actual and original year of publication are both useful. The exact publication date is mainly of use for periodicals, where although there is almost always more than one issue, there is rarely more than one edition. Some newspapers do have one or more pages re-set during a print run, in order to cover breaking news; but the cover date doesn't change, and so the "original date" is pretty much a non-existent concept. --Redrose64 (talk) 10:02, 3 April 2014 (UTC)

No consensus on whether YYYY-MM is acceptable or unacceptable

An RFC that sought to determine whether YYYY-MM was an acceptable date format was recently closed.

The YYYY-MM format is currently in the Unacceptable column of the table at WP:BADDATEFORMAT, but I expect that to change soon. It was added there because the initial RFC closure said that there was "no consensus to change anything", implying that the state of the table at the opening of the RFC (YYYY-MM was in the Unacceptable column at that point) was how it should remain. The closure was subsequently revised to read: "There is no consensus that YYYY-MM is an acceptable format, nor any consensus that it is an unacceptable format. I would recommend against any mass changes being made purely on the basis of this RfC."

Based on this reasonably-attended RFC, despite the lack of consensus, it appears that the CS1 module's date-checking code should stop flagging YYYY-MM as an invalid date format. Thoughts? – Jonesey95 (talk) 00:48, 3 April 2014 (UTC)

The RfC pretty much starts off with: The recent (29 Nov 2013) banning of the yyyy-mm format ... which apparently arises from this conversation and this change to the table at WP:BADDATEFORMAT.

The table at WP:BADDATEFORMAT was then subjected to quite a few edits to change its format but YYYY-MM remained in the table until this edit (3 Feb 2014) when it was hidden pending the outcome of the RfC.

Another version of the ban was added 31 Mar 2014. That same day, the ban was modified a bit and then hidden only to be almost immediately unhidden following the closure of the RfC.

It doesn't appear to me that the ban on YYYY-MM was added to WP:BADDATEFORMAT because of the closure of the RfC but rather, was restored following the closure.

—Trappist the monk (talk) 11:27, 3 April 2014 (UTC)

The ban was always there in the "Month" section, it was just repeated in the unacceptable date format table for convenience. I think it was a mistake to comment it out from the unacceptable date format table while leaving the "Month" section alone. I think we have always taken the position that when we mention the YYYY-MM-DD format, we mean exactly that, and do not allow any of the related forms mentioned in ISO 8601 such as 2014-04, 20140403, 2014-04-03T13:12, etc.

I think this puts the English Wikipedia in an analogous position to the UK House of Lords[1]; a bill was introduced to clarify whether the legal time in the UK, which is called in the law "Greenwich Mean Time", was UTC or UT1. The lords had a debate but left the question unanswered. Wikipedia editors had a debate but couldn't come to a conclusion about whether YYYY-MM is acceptable. Jc3s5h (talk) 13:18, 3 April 2014 (UTC)
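The "exactly YYYY-MM-DD" position above can be illustrated with a strict check that rejects the related ISO 8601 forms. This is a Python sketch under stated assumptions: the function name `is_strict_ymd` is hypothetical, and the calendar check (rejecting dates such as 2014-02-30) is an added assumption rather than something stated in this thread.

```python
import re
from datetime import date

# Exactly four digits, hyphen, two digits, hyphen, two digits; nothing else.
_YMD = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_strict_ymd(value: str) -> bool:
    """Accept only a full YYYY-MM-DD date that exists on the calendar."""
    match = _YMD.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True
```

With this check `2014-04-03` is accepted, while `2014-04`, `20140403`, and `2014-04-03T13:12` are all rejected.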

Sorry if I wasn't clear above. Here's the timeline:

1. The ban was added to the table (by me, after a discussion with clear consensus against YYYY-MM on the MOS/Dates Talk page) in November.
2. After the RFC started, someone commented out the ban from the table.
3. When the RFC was initially closed with the statement that there was "no consensus to change anything", the ban was restored (since "no consensus to change" implied that the ban should stay, i.e. no change from the state of the table when the RFC started).
4. After the ban was reinstated, the resolution of the RFC was edited to its current state. After reading the comments above and rereading the Talk page discussion, it appears that the proper path may be to leave the YYYY-MM prohibition in the CS1 module code, remove YYYY-MM from the Unacceptable list, and leave the recommendation against YYYY-MM in the "Month" section of MOS/Dates. That would restore everything to its pre-RFC state, I believe. – Jonesey95 (talk) 17:20, 3 April 2014 (UTC)

I think that the current state of things is just as it was prior to the RfC. So, I agree with all of item 4 except: remove YYYY-MM from the Unacceptable list, which would leave us in a state different from that which existed at the initiation of the RfC. If you are looking to undo this change to the table at WP:BADDATEFORMAT, then I think that Module talk:Citation/CS1 is the wrong forum.

—Trappist the monk (talk) 17:48, 3 April 2014 (UTC)

I think leaving the "Month" section alone, removing YYYY-MM from the unacceptable format table, and leaving it in the CS1 module would be sweeping the problem under the rug. Advice in MOSNUM should be easy to find; it is fundamentally sneaky to keep controversial advice in there and hope nobody notices it. As for CS1 warnings, if the community can't decide what they want, they don't deserve help from automated tools, so remove the warning. Jc3s5h (talk) 18:04, 3 April 2014 (UTC)

Re item 4 above: I'm not saying that I am going to modify the date format table. There's too much kerfuffle on that page and its Talk page for my taste. I tried to be helpful once, after a clear consensus, and it got me nowhere. I'll stick to what I'm good at. Personally, I think the error message should stay, because the YYYY-MM guidance has been in the "month" section and because YYYY-MM will either be "unacceptable" or not displayed in the date format table.

And because YYYY-MM is fundamentally ambiguous and less clear than it should be, but I recognize that this amounts to the same thing as me just not liking it. – Jonesey95 (talk) 18:10, 3 April 2014 (UTC)

Validating `|mr=`

This is pretty esoteric, so it's OK if it goes onto the Feature Requests list, but I think we can validate |mr=. I haven't found a spec for the number, but it appears to be seven numeric digits, optionally preceded by "MR".

We might want to consult with Wikipedia_talk:WikiProject_Mathematics about the preferred formatting for this link in the citation templates. We could show it as "MR1234567" or "MR MR1234567" or "MR 1234567". We show one of the latter two now, depending on whether someone puts "MR" in the value of the parameter. The first "MR" is linked to Mathematical Reviews. The number is linked to the cited source at mathscinet.org.

The article Uniform module contains links to a number of MR citations (some of them recently fixed so that they link to the right cited source). I haven't played around with case sensitivity, spaces, or other formatting to see how good the mathscinet.org processor is at handling what you throw at it. – Jonesey95 (talk) 23:49, 8 April 2014 (UTC)

Seems immune to leading zeros and the MR if included. Seems to start at 1 and the current end seems to be 3117748; so, monotonically increasing list of numbers. How pragmatic. Error checking would seem to be pretty simple: nothing but digits and the value of the number must be greater than zero and less than say, 4000000.

—Trappist the monk (talk) 00:44, 9 April 2014 (UTC)
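The check proposed above (digits only, value greater than zero and below a generous ceiling) might look like this. It is a Python sketch, not module code. The function name `mr_looks_valid` is hypothetical, and the ceiling of 4000000 is the "say" figure from the post above, not a documented limit.

```python
def mr_looks_valid(value: str, upper_bound: int = 4_000_000) -> bool:
    """Return True if value is all ASCII digits and 0 < number < upper_bound."""
    if not (value.isascii() and value.isdigit()):
        # Rejects spaces, hyphens, periods, slashes, and an "MR" prefix.
        return False
    # int() ignores leading zeros, so "0103206" is read as 103206.
    return 0 < int(value) < upper_bound
```

This sketch is stricter than the mathscinet site appears to be: a leading "MR" is rejected here, so whether to strip such a prefix before checking remains an open design choice.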

Some test cases:

Goldie, A. W. (1958), "The structure of prime rings under ascending chain conditions", Proc. London Math. Soc. (3), 8: 589–608, ISSN 0024-6115, MR 010 3206, (space in MR number works fine) {{citation}}: Check |mr= value (help)

Goldie, A. W. (1960), "Semi-prime rings with maximum condition", Proc. London Math. Soc. (3), 10: 201–220, ISSN 0024-6115, MR MR011-1766, (leading MR and hyphen in MR number, both work fine) {{citation}}: Check |mr= value (help)

Miyashita, Y. (1966), "Quasi-projective modules, perfect modules, and a theorem for modular lattices", J. Fac. Sci. Hokkaido Ser. I (contd. as Hokkaido Journal of Mathematics), 19: 86–110, MR 0213390/, (slash at end of MR number leads to wrong article) {{citation}}: Check |mr= value (help)

Takeuchi, T. (1976), "On cofinite-dimensional modules.", Hokkaido Journal of Mathematics, 5 (1): 1–43, ISSN 0385-4035, MR 0213.390, (period in MR number works fine) {{citation}}: Check |mr= value (help)

Varadarajan, K. (1979), "Dual Goldie dimension", Comm. Algebra, 7 (6): 565–610, doi:10.1080/00927877908822364, ISSN 0092-7872, MR %MR524269, (percent symbol before MR fails) {{citation}}: Check |mr= value (help)

Goldie, A. W. (1960), "Semi-prime rings with maximum condition", Proc. London Math. Soc. (3), 10: 201–220, doi:10.1080/00927877908822364, ISSN 0024-6115, MR MR111766 (22 #2627), (full bolded text copied from citation page fails) {{citation}}: Check |mr= value (help)

The web site looks pretty tolerant, but not infinitely so. – Jonesey95 (talk) 03:24, 9 April 2014 (UTC)

Cite book with period for title makes everything bold

OK, it's an edge case, but it looks like a tiny little bug that may be manifesting itself in other situations.

Cite book with only period in |title= makes everything after it bold and adds a single quote mark before the title. Something about the wikimarkup difference between using single quotes for bold and using single quotes for italics, perhaps.

Cite book comparison
Wikitext	`{{cite book\|author=Author\|date=2001\|page=3\|title=.}}`
Live	Author (2001). p. 3. `{{cite book}}`: `\|author=` has generic name (help)
Sandbox	Author (2001). p. 3. `{{cite book}}`: `\|author=` has generic name (help)

Using |url= makes the problem go away.

Cite book comparison
Wikitext	`{{cite book\|author=Author\|date=2001\|page=3\|title=.\|url=http://www.example.com}}`
Live	Author (2001). p. 3 http://www.example.com. `{{cite book}}`: `\|author=` has generic name (help); `\|url=` missing title (help)
Sandbox	Author (2001). p. 3 http://www.example.com. `{{cite book}}`: `\|author=` has generic name (help); `\|url=` missing title (help)

When would you need to set |title=.? --Redrose64 (talk) 16:13, 11 April 2014 (UTC)

I think that the real flaw is that a raw title containing only a terminating character that matches the separator character is not flagged as a missing title error. If a terminating character matches the separator character, it is removed in safejoin(). Change to |title=; and |separator=; and the same thing occurs:

Author (2001). ;. p. 3. {{cite book}}: |author= has generic name (help); Unknown parameter |separator= ignored (help)CS1 maint: extra punctuation (link)

The |url= fix 'works' because the last character in the title string is not the separator but is the closing ] of the assembled external link: [http://www.example.com ''.'']. You can see in your second example that there is a linked period followed by an unlinked period.

Is it worth the effort needed to fix it? Unless there are untold thousands of these peculiar citations out there, probably not.

—Trappist the monk (talk) 16:20, 11 April 2014 (UTC)

I have never actually seen this format in the wild, and I've seen a lot of crazy, crazy stuff in my travels through Category:Articles with incorrect citation syntax. I created a citation with this format by accident and noticed the weird formatting when I previewed it.

I just figured I'd report it here to see if anyone else had noticed it or could think of why it might happen. I like the explanation. Probably not worth fixing, but if it comes up again, we'll have this discussion in the archives. – Jonesey95 (talk) 18:49, 11 April 2014 (UTC)

New alias, "lang"?

Would it be possible to make |lang= an alias for |language=? It Is Me Here ^{t / c} 11:33, 13 April 2014 (UTC)

Reason? -- Gadget850^talk 13:08, 13 April 2014 (UTC)

Because IMO it's easy to think that this already exists and so to put it in (as I did earlier), and I can't think of any other uses someone might have for typing "lang=", so it won't be misleading. It Is Me Here ^{t / c} 13:25, 13 April 2014 (UTC)

|language= is clear, straightforward, and unambiguous. |lang= is an abbreviation that is not as clear. I believe that we generally avoid abbreviations for parameter names or aliases, except where the full name of the parameter would be absurdly long, e.g. |internationalstandardbooknumber=. – Jonesey95 (talk) 04:25, 14 April 2014 (UTC)

Well, re. "lang" specifically, it's already used as a parameter on e.g. {{Link-interwiki}}, {{Sec link}}, {{Braille cell}}, and {{Broken ref}}. Plus, there are over 500 templates that have "lang" in their name. This is why I had thought this would be fairly uncontroversial, to be honest. It Is Me Here ^{t / c} 22:50, 15 April 2014 (UTC)

PDFlink

I just noticed that {{PDFlink}} is being merged into CS1 as of May 2013. I don't recall the discussion. -- Gadget850^talk 19:57, 14 April 2014 (UTC)

There was a discussion here that resulted in a consensus decision to eliminate the template and add parameters to CS1 citations. A CS1 feature request was submitted here, but it didn't go anywhere. It's not clear to me that |formatsize= is required in order to eliminate the template, nor is it clear to me that there is a CITEVAR-friendly path from instances of PDFlink to CS1 citations in all or even most cases. The mechanics of how to make the transition, showing how existing instances of the template would be converted, were not explored thoroughly. – Jonesey95 (talk) 20:33, 14 April 2014 (UTC)

remove /sandbox

hi what about removing /sandbox from

--local cfg = mw.loadData( 'Module:Citation/CS1/Configuration/sandbox' );

and

--local whitelist = mw.loadData( 'Module:Citation/CS1/Whitelist/sandbox' );

please

86.173.55.186 (talk) 14:53, 21 April 2014 (UTC)

Greetings, User:Google6666. --MF-W 15:12, 21 April 2014 (UTC)

Update to the live CS1 module week of 2014-03-23

In about a week's time I intend to update these files from their respective sandboxes:

Module:Citation/CS1 (diff);

Module:Citation/CS1/Configuration (diff);

Module:Citation/CS1/Whitelist (diff)

The update makes these changes to Module:Citation/CS1:

1. Add PMC error checking; (discussion)
2. Fixed a circa year date validation bug; (discussion)
3. Add url in |authorlink parameter error checking; (discussion and discussion)
4. Expand DOI error checking; (discussion)
5. Fix longstanding bug that broke citation terminal punctuation if the value assigned to |postscript= is multicharacter (like html entities); Moved citation template's default assignments for |separator=, |postscript=, and |ref=harv from the invoking template into the module; Added support for |postscript=none; (discussion)
6. Limit acceptable years in dates to current year+1; (discussion)
7. Expand date validation; all allowable date formats should now be supported; (discussion)
8. Migrate cite interview; (discussion)
9. Move date validation code into a separate page Module:Citation/CS1/Date validation;
10. Extract page numbers from external wikilinks in any of the |page=, |pages=, or |at= parameters for use in COinS; (discussion)
11. Add lccn error detection; (discussion)
12. Migrate cite AV media notes; (discussion)
13. Migrate cite DVD notes; (discussion)

to Module:Citation/CS1/Configuration:

PMC error checking;
url in |authorlink parameter error checking;
Move |postscript= and |separator= default initialization into Module:Citation/CS1/sandbox;
Add subject and subject link for cite interview migration;
Add artist, albumlink, albumtype, notestitle, publisherid for cite AV media notes migration;
Add lccn error detection;
Delete albumtype; merge deprecated parameters albumlink, artist, director, notestitle, publisherid, titleyear as aliases of other parameters; remove these parameters after 1 October 2014;

to Module:Citation/CS1/Whitelist:

Add subject and subjectlink for cite interview migration;
Add artist, albumlink, albumtype, notestitle, publisherid for cite AV media notes;
Invalidate albumtype; deprecate artist, albumlink, director, notestitle, publisherid, titleyear; these last to be invalidated after 1 October 2014;

—Trappist the monk (talk) 11:54, 25 March 2014 (UTC)

Corrected item 5 for Module:Citation/CS1 to read: Added support for |postscript=none;

—Trappist the monk (talk) 12:53, 25 March 2014 (UTC)

Done.

—Trappist the monk (talk) 12:29, 30 March 2014 (UTC)

Thanks for fixing the year-range issue. Kanguole 12:54, 30 March 2014 (UTC)

Discussion

I object to the allowable date checking as it exists in the sandbox. There is no clear consensus [fixed link] for prohibiting "Feb." or "Sept", and "Feb." is given as an acceptable example in WP:MOS. Documentation and function should proceed in lockstep; if the community won't let you change the documentation, you shouldn't change the code. Jc3s5h (talk) 14:31, 25 March 2014 (UTC) Fix wikilink for the abbreviation "Feb." 14:52 UT. Another link fix 15:59 UT.

Per MOS:MONTH:

Months are expressed as capitalized whole words (e.g. March).
Abbreviations such as Mar. or Mar are used only where space is extremely limited, such as in tables and infoboxes.

-- Gadget850^talk 14:43, 25 March 2014 (UTC)

Sorry, I had the wrong wikilink for "Feb." Jc3s5h (talk) 14:52, 25 March 2014 (UTC)

Reviewing Module_talk:Citation/CS1/sandbox#Invalid_year_doesn.27t_generate_error, I see no discussion about months. -- Gadget850^talk 15:26, 25 March 2014 (UTC)

(edit conflict)

Now, wait a minute. You yourself have written: I do not agree that WP:MOS or WP:MOSNUM control date formats in citations (although Wikipedia talk:Manual of Style/Archive 128#Which guideline for citation style? shows there is no consensus about this). But here you are invoking WP:MOS#Months to support your argument that Sept. and Feb. should be allowed in CS1 citations.

It should be noted that short month names longer than three characters have not been acceptable to CS1 since the first iteration of the date validation code was released 9 November 2013. Except for implementation details, the functionality of that code hasn't changed and isn't changed with this update.

Your There is no clear consensus link above, points to Module_talk:Citation/CS1/sandbox#Invalid_year_doesn.27t_generate_error. Was that what you intended?

Please don't put words in my mouth that I have not spoken. I have not asked for nor attempted change to the MOS with regard to date formatting.

—Trappist the monk (talk) 15:42, 25 March 2014 (UTC)

Sorry, my link to the consensus discussion should be Module talk:Citation/CS1/Archive 9#Legitimate date range examples to add to the date checking part of the CS1 module

As for the timeliness of this objection, it isn't clear to me if this change will make the error messages visible to everyone; if not, resolution of this could wait until the changes will be visible to everyone. Jc3s5h (talk) 16:00, 25 March 2014 (UTC)

Date errors are hidden by default and will likely remain hidden until the number of pages with these errors has been significantly reduced.

—Trappist the monk (talk) 19:01, 25 March 2014 (UTC)

As for which guideline controls citations, in the general case, WP:CITE says any consistent style is allowed. The CS1 style (but not other styles), has chosen to adopt the date formats in WP:MOSNUM (which contains "Mar."). Also, the RFC mentioned above shows consensus that WP:MOS and WP:MOSNUM should agree with each other, and WP:MOS contains "Feb." Jc3s5h (talk) 16:06, 25 March 2014 (UTC) Fixed abbreviation 22:40 UT.

CS1 does not comply with a lot of WP:DATESNO. Here is a table that indicates CS1 compliance with WP:DATESNO that I will probably copy over to Help:Citation Style 1#Dates so that CS1's compliance is documented for all to see.

CS1 compliance with Wikipedia:Manual of Style/Dates and numbers
section	compliant	comment
Acceptable date formats table	yes	Exceptions: linked dates not supported; sortable dates not supported (`{{dts}}` etc); proper name dates not supported;
Unacceptable date formats table	yes
Consistency	no	article level restriction beyond the scope of CS1
Strong national ties to a topic	no
Retaining existing format	no
Era style	no	dates earlier than 100 not supported;
Julian and Gregorian calendars	limited	Module:Citation/CS1 cannot know if a date is Julian or Gregorian; assumes Gregorian
Ranges	yes	Exceptions: does not support the use of `&ndash;` or `&nbsp;`; does not support dates prior to 100; does not support solidus separator (/); does not support " to " as a date separator;
Uncertain, incomplete, or approximate dates	yes	Exceptions: does not support `{{circa}}` or `{{floruit}}`; does not support dates prior to 100;
Days of the week	no
Months	yes	Exceptions: shortened month names longer than three characters or with terminating periods are not supported in keeping with the Acceptable date formats table;
Seasons	no	seasons are treated as if they were months so must be capitalized;
Decades	no
Centuries and millennia	no
Abbreviations for long periods of time	no

—Trappist the monk (talk) 19:01, 25 March 2014 (UTC)
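The Months and Seasons rows of the table can be illustrated with a small check. This is only a sketch, written in Python rather than the module's Lua; `is_supported_month` is a hypothetical name, and the list of seasons is an assumption. It encodes just what the table states: whole month names and three-letter abbreviations without a terminating period are accepted, and seasons are treated like months, so they must be capitalized.

```python
# Sketch only: mirrors the "Months" and "Seasons" rows of the table above.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
SEASONS = ["Winter", "Spring", "Summer", "Fall", "Autumn"]  # assumed list

# Whole names, three-letter forms without a period, and capitalized seasons.
ACCEPTED = set(MONTHS) | {name[:3] for name in MONTHS} | set(SEASONS)


def is_supported_month(token: str) -> bool:
    """Return True if the month or season token would pass this sketch."""
    return token in ACCEPTED


for token in ["September", "Sep", "Sept", "Feb.", "summer", "Summer"]:
    print(token, is_supported_month(token))
```

Under these assumptions "Sept" and "Feb." are rejected, which is the behaviour being disputed in this thread.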

As for Trappist the monk changing date formats in MOS, I did not mean to imply Trappist had done so, or tried to. I am saying that MOS and MOSNUM apply to CS1 because Help:Citation Style 1 says MOSNUM applies, and there is consensus MOS and MOSNUM should agree with each other. Therefore, the code should allow what MOS and MOSNUM allow, and if the coders don't like it, they should change MOS and MOSNUM before making error messages visible to everyone. Jc3s5h (talk) 16:11, 25 March 2014 (UTC)

This editor has no interest in doing battle over the discrepancies among Wikipedia:MOS#Months and Wikipedia:DATESNO#Months and the Acceptable date formats table. When those discrepancies have been resolved, I am quite content to adapt Module:Citation/CS1 so that it complies where it is possible to comply.

—Trappist the monk (talk) 19:01, 25 March 2014 (UTC)

It seems to me that since both MOS and MOSNUM contain "Feb." and "Mar." respectively, the status quo is that periods after dates are currently allowed. A prior version of Acceptable date formats table spelled out the dates in such detail that it implied that abbreviated dates with periods and "Sept" were not acceptable, but the current version carries no implied prohibition of these formats. Jc3s5h (talk) 19:23, 25 March 2014 (UTC) Fixed abbreviation 22:40 UT.

@Jc3s5h: Which statement in the MOS is at odds here? -- Gadget850^talk 17:52, 25 March 2014 (UTC)

The MOS states "Abbreviations for months, such as Feb. in the United States or Feb in most other countries, are used only where space is extremely limited." But the date syntax check in the sandbox version of the CS1 Lua-based templates flags month abbreviations followed by a period as errors. Jc3s5h (talk) 18:31, 25 March 2014 (UTC)

This was the subject of a RFC which is still pending closure by an uninvolved admin. Suggest that we leave as is until this is closed. Keith D (talk) 19:15, 25 March 2014 (UTC)

WP:MOS#Months (which goes to Wikipedia:Manual of Style#Months on the general MOS page) does say "Abbreviations for months, such as Feb. in the United States or Feb in most other countries, are used only where space is extremely limited." But before that it has "Further information: MOS:MONTH". This goes to Wikipedia:Manual of Style/Dates and numbers#Months; and as I understand it, the general MOS page summarises the more specific MOS subpages - it can't include all of the details, otherwise there would be no point to having subpages. MOS:MONTH does give more information: "Months are expressed as capitalized whole words (e.g. March). Abbreviations such as Mar. or Mar are used only where space is extremely limited, such as in tables and infoboxes." The bolding is mine: it shows which words are only in the specific page, not in the general page. The last phrase, "such as in tables and infoboxes", does not necessarily include references. It could include references, if the article has a very large number of refs, and those refs are high in information. But if space for refs is at a premium, abbreviating months will save a maximum of eighteen letters per ref (by using Sep for September in the |date=, |accessdate=, and |archivedate=), whereas a lot more space can be saved by other means: using initials instead of authors' first names; by the use of |displayauthors=; by judicious use of |location= and |publisher=; by the non-use of |quote= - there are several other ways of reducing the length of a ref, which can easily achieve a saving of more than 18 characters. --Redrose64 (talk) 19:49, 25 March 2014 (UTC)
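The "maximum of eighteen letters" figure can be reproduced with a few lines of arithmetic. The sketch below is Python and assumes only what the comment states: three date-holding parameters per reference, each abbreviated to a three-letter month.

```python
# Worked arithmetic for the "eighteen letters per ref" estimate above.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DATE_PARAMS = ["date", "accessdate", "archivedate"]


def letters_saved(month: str) -> int:
    """Letters saved by shortening a month name to three letters."""
    return len(month) - 3


worst_case = max(MONTHS, key=letters_saved)              # longest month name
max_saving = letters_saved(worst_case) * len(DATE_PARAMS)
print(worst_case, max_saving)
```

The longest name is September (nine letters), so the saving is six letters in each of three parameters.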

The YYYY-MM-DD format is also for places where space is limited, and that format is widespread in CS1 citations. I think you'd have a hard time arguing that we should forbid Jan 1, 2014 but allow 2014-01-01. Personally, I'd be happy to get rid of both Jan 1, 2014 and 2014-01-01, but I don't think you'll convince the community of that. Jc3s5h (talk) 20:21, 25 March 2014 (UTC)

That is a holdover from date linking. At one point dates were linked by the templates so they would show per the user's preferences. It was eventually realized that the majority of readers had no preference set, thus they saw a variety of date styles in an article. After two years of discussion, date linking was removed from the templates in 2008, but the dates were never systematically cleaned up. There have been a number of bike shed discussions since. The existence of YYYY-MM-DD dates in citations doesn't mean they are correct. -- Gadget850^talk 20:37, 25 March 2014 (UTC)

I am very much in favour of |date=31 December 1999 (or |date=December 31, 1999 if you really have to) for publication dates, and |archivedate=1999-12-31 and |accessdate=1999-12-31 for archive and access dates. This visually separates the publication date from other dates which are relevant only within Wikipedia. -- 79.67.241.76 (talk) 14:55, 28 March 2014 (UTC)

HTML entity: &ndash; does not seem to be supported in date ranges. Example:

Jc3s5h (talk) 18:43, 25 March 2014 (UTC)

It is not supported because for the time being html entities in certain date-holding parameters corrupt COinS metadata. Use {{ndash}}. Also, {{cite journal/sandbox}} invokes the live module, not the sandbox version as you might expect. I don't know what the IP editor who made that change had in mind. Use {{cite journal/new}}:

Smith, Joseph III (1879–1910). "Last Testimony of Sister Emma". The Saints' Herald: 289.

—Trappist the monk (talk) 19:01, 25 March 2014 (UTC)

So far, I haven't seen any objections to the changes listed in the original list above, only objections to the current operation of the module code. It might be better to split the above discussion into sections with appropriate titles. I can try to do that in an NPOV manner unless there are objections. If there are objections, I will leave the discussion as is and will not be offended.

The only note I see above that may be read as an objection to the list is in reference to the date checking. Trappist the monk may have been overly concise in item 7 on the first list, which might be clearer if it read something like "Expand date validation; all acceptable date formats in the table at WP:DATESNO should now be supported, along with most ranges listed at WP:DATERANGE (see exceptions)" – Jonesey95 (talk) 20:55, 25 March 2014 (UTC)

I don't mind if certain items are placed in different sections. As for "Expand date validation; all acceptable date formats in the table at WP:DATESNO should now be supported", I don't think that is a correct reading of the table (although it would have been a reasonable reading of an earlier version of the table). The current table is silent about whether a period may follow a month abbreviation, or whether "Sept." is allowed. Both MOS and MOSNUM contain abbreviations followed by a period ("Feb." and "Mar." respectively.) Jc3s5h (talk) 22:41, 25 March 2014 (UTC)

@Trappist the Monk: THANK YOU THANK YOU THANK YOU for expanding the date validation! When all allowable date formats will shortly be supported, I hope the number of articles in Category:CS1 errors: dates will drop off dramatically over the next few weeks. GoingBatty (talk) 01:16, 26 March 2014 (UTC)

Invalid parameter not detected

I found this attempt at |issn= in the wild, and it did not cause a citation error:

{{cite journal | author=Moraes KCM, Quaresma AJ, Kobarg, J |title= Identification and characterization of proteins that selectively interact with isoforms of the mRNA binding protein AUF1 (hnRNP D) |journal=BIOLOGICAL CHEMISTRY |volume=384 |issue= 1 |pages= 25–37 |year= 2003 |pmid= 12674497 | ISSN: 1431-6730 pmc= |doi=10.1515/BC.2003.004}}

Moraes KCM, Quaresma AJ, Kobarg, J (2003). "Identification and characterization of proteins that selectively interact with isoforms of the mRNA binding protein AUF1 (hnRNP D)". BIOLOGICAL CHEMISTRY. 384 (1): 25–37. doi:10.1515/BC.2003.004. PMID 12674497. {{cite journal}}: Cite has empty unknown parameter: |ISSN: 1431-6730 pmc= (help)CS1 maint: multiple names: authors list (link)

Look specifically at the attempted ISSN parameter. Is there no error because there is nothing following the "=", and parameters with blank values are ignored?

I fixed this one, but I thought I'd drop this example here to offer some food for thought. There's a lot of craziness out there. – Jonesey95 (talk) 23:40, 21 April 2014 (UTC)

Yes, that would be why it is not detected. There are probably a reasonable number of these out in the wild. However, the last non-whitespace character has to be "=" and it must be the only "=". Such text can also be part of an incorrectly encoded URL which contains a "|" followed by some text ending in "=".

Another error which can not be detected by the module is duplicate parameter names. I have encountered several of those while running through pages fixing the identified "unknown parameter" errors. I was not looking for such and only detected some in specific situations. There are probably a reasonable quantity of both types of issues out there. Finding them would require a database scan. — Makyen (talk) 00:16, 22 April 2014 (UTC)
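A scan of raw wikitext for both kinds of problem might look roughly like the following. This is a sketch in Python, not part of the module; `scan_parameters` and the abbreviated `KNOWN` whitelist are hypothetical, and the naive split on "|" assumes the template contains no nested templates or piped wikilinks.

```python
from collections import Counter

# Hypothetical, heavily abbreviated whitelist; the real one is much longer.
KNOWN = {"author", "title", "journal", "volume", "issue", "pages",
         "year", "pmid", "pmc", "doi", "issn"}


def scan_parameters(wikitext: str):
    """Return (unknown parameters with empty values, duplicated names)."""
    body = wikitext.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    pieces = body.split("|")[1:]          # drop the template name
    names, empty_unknown = [], []
    for piece in pieces:
        if "=" not in piece:
            continue                      # positional parameter
        name, _, value = piece.partition("=")
        name = name.strip()
        names.append(name)
        # An empty value means "=" was the last non-whitespace character
        # and the only "=" in the piece.
        if name.lower() not in KNOWN and value.strip() == "":
            empty_unknown.append(name)
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    return empty_unknown, duplicates


sample = ("{{cite journal | author=Moraes KCM |pmid= 12674497 "
          "| ISSN: 1431-6730 pmc= |doi=10.1515/BC.2003.004}}")
print(scan_parameters(sample))
```

Run against a shortened form of the example above, the sketch reports "ISSN: 1431-6730 pmc" as an unknown parameter with an empty value.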

A date range error I have been unable to fix

This date range is marked as invalid. It is from 1999 in archaeology.

Baker, Dorie (December 13, 1999 – January 17, 2000). "Finding sheds new light on the alphabet's origins". Yale Bulletin and Calendar. 28 (16). Retrieved 2012-03-16.

Can anyone help turn it into a valid date range? It looks to me like it matches MOSDATE. I checked the source, and the date range matches that of the source. – Jonesey95 (talk) 04:17, 24 April 2014 (UTC)

That format isn't currently supported. I'll fix that shortly.

—Trappist the monk (talk) 13:13, 24 April 2014 (UTC)

You're the best. – Jonesey95 (talk) 14:31, 24 April 2014 (UTC)

In the sandbox:

Title. 13 December 1999 – 17 January 2000a. {{cite book}}: Invalid |ref=harv (help)
Pass. 31 December 2014 – 1 January 2015.
Fail – same year. 30 December 2014 – 31 December 2014. {{cite book}}: Check date values in: |date= (help)

Title. December 13, 1999 – January 17, 2000a. {{cite book}}: Invalid |ref=harv (help)
Pass. December 31, 2014 – January 1, 2015.
Fail – same year. December 30, 2014 – December 31, 2014. {{cite book}}: Check date values in: |date= (help)

Fail – mixed format. December 31, 2014 – 1 January 2015. {{cite book}}: Check date values in: |date= (help)
Fail – sequence order. January 1, 2015 – December 31, 2014. {{cite book}}: Check date values in: |date= (help)

—Trappist the monk (talk) 15:22, 24 April 2014 (UTC)
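The pass and fail cases listed above can be mirrored in a short sketch. It is Python rather than the module's Lua, `check_full_date_range` is a hypothetical name, and the rules are inferred only from the test output shown: both dates must share a format, must be in sequence, and must fall in different years. Month-name parsing assumes the default English locale.

```python
from datetime import datetime

# The two full-date formats exercised by the sandbox tests above.
FORMATS = {"dmy": "%d %B %Y", "mdy": "%B %d, %Y"}


def parse(text: str):
    """Return (format name, datetime), or None if neither format matches."""
    for name, fmt in FORMATS.items():
        try:
            return name, datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def check_full_date_range(value: str) -> str:
    parts = value.split(" \u2013 ")       # spaced en dash separator
    if len(parts) != 2:
        return "fail: not a spaced en dash range"
    start, end = parse(parts[0]), parse(parts[1])
    if start is None or end is None:
        return "fail: unrecognized date"
    if start[0] != end[0]:
        return "fail: mixed format"
    if start[1] >= end[1]:
        return "fail: sequence order"
    if start[1].year == end[1].year:
        return "fail: same year"
    return "pass"


print(check_full_date_range("December 13, 1999 \u2013 January 17, 2000"))
```

The same-year case fails in the sandbox output; the sketch simply reproduces that result without claiming to know which shorter form the module expects instead.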

Open library parameter syntax, checking, and linking

We have a parameter, |ol= which displays in the citation and links to the Open Library. Currently the link prefixes an "OL" to the value of the parameter and an "OL" is displayed to identify that this is an OLID:
{{cite book |last=Last |first=First |title=Title |ol = 1135607M }}
Last, First. Title. OL 1135607M.

However, in their listing pages the Open Library lists their identifiers already including the "OL" prefix (e.g. "OL1135607M"). It would be normal for an editor to expect to be able to copy and paste the identifier from the Open Library page into the citation template:
{{cite book |last=Last |first=First |title=Title |ol = OL1135607M }}
Last, First. Title. OL 1135607M.

Unfortunately, this does not currently work. The link is non-functional. In addition, no indication is given to the editor that there is a problem. In order to determine that there is an issue, the editor has to examine or test the link.

The "OL" in the ID appears to actually be a part of the ID. While I have not found an actual spec for the ID, looking at their API implies that the OL is part of the ID. We have been enforcing, by not linking properly with the "OL" present, not entering the "OL" in the |ol=. That means that we can not suddenly switch to requiring it. We need to accept |ol= both with and without "OL" as the first two characters.

At a minimum, we should change the module such that it does not add an additional "OL" to the link if one already exists in the provided OLID.

As to the visual aspect: While I do not find it visually appealing, having it appear as "OL OL1135607M" is consistent with the format which we have adopted of having a descriptor prior to each ID and displays the complete OLID.

The result of this is that |ol= parameters should be processed prior to both linking and display to add an OL to the |ol= value if an "OL" does not already exist as the first two characters of the parameter value. — Makyen (talk) 06:11, 16 May 2014 (UTC)
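The proposed normalization is small enough to sketch. The following is Python, not the module's Lua, and `normalize_olid` and `olid_label` are hypothetical names; it only adds "OL" when those are not already the first two characters, then builds the display text described above.

```python
def normalize_olid(value: str) -> str:
    """Add the "OL" prefix unless the value already starts with it."""
    value = value.strip()
    if value.startswith("OL"):
        return value
    return "OL" + value


def olid_label(value: str) -> str:
    """Display text: the "OL" descriptor followed by the complete OLID."""
    return "OL " + normalize_olid(value)


print(normalize_olid("1135607M"))    # value entered without the prefix
print(normalize_olid("OL1135607M"))  # value pasted from Open Library
print(olid_label("1135607M"))
```

Both input forms then yield the same identifier for linking, so existing citations without the prefix keep working.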

Subscription required message

I was wondering if it could be possible to make the subscription required message a little bit prettier. Right now it has a double parentheses: "(subscription required (help))", and the "help" shows a tooltip. Couldn't the whole "(help)" just be removed and the tooltip applied to the whole message instead? --Atethnekos (Discussion, Contributions) 17:04, 20 May 2014 (UTC)

ISBN =

If I'm understanding this correctly: if the parameter is |id=ISBN, the module will not check the ISBN for errors. Should the 4,964 articles that contain |id=ISBN be converted via a bot to |isbn=? Article count obtained from April's dump. Bgwhite (talk) 07:16, 27 April 2014 (UTC)

If there is an ISBN in |id= it is not completely checked for format, but is linked if it does not fail some coarse format checking. If it is 13 digits and starts with 978 or 979 it is linked (e.g. ISBN 9781234567890), but is not linked if it does not start with those digits (e.g. ISBN 9801234567890). If it is 10 digits (with X as a possible 10th character) it is linked (e.g. ISBN 123456789X). If it is not 10 or 13 digits it is not linked (e.g. ISBN 12345678901). [NOTE: I have not looked at the code for this which is part of MediaWiki, not the citation templates.]

At a minimum, there should be some additional logic to moving ISBNs out of |id= into |isbn=. In many cases |id= was used because |isbn= is already occupied and would generate an error if it contained more than one ISBN. In addition, if the editor desired to have additional text prior to, or after, the ISBN then it may have been placed in |id= for that reason. The |isbn= parameter accepts nothing other than a strictly formatted ISBN with no other text permitted. If the |isbn= is already occupied, then obviously an additional ISBN should not be moved out of |id= into |isbn=. If there is additional text in |id= then it is a contextual edit where human editorial judgement should be applied and should not be performed by bot.

If the edit is strictly that |isbn= does not exist and an ISBN is in |id= without additional text – other than "ISBN" – then yes it should be moved into |isbn=. The contents of |id= are not included in the COinS data, but |isbn= is – NOTE: This is contrary to the documentation stating that "any of the identifiers" are included in the COinS data. However, |isbn= is included in the COinS without any format corrections, which, I assume, is why it has been programmed to generate an error if the value is not strictly compliant as an ISBN (i.e. no other characters are tolerated).

In my opinion, it would be better for us to somewhat relax the formatting required in the |isbn= parameter. We could easily strip out all non-numeric characters prior to performing the ISBN format/check-digit verifications and passing that stripped version in the COinS. This would result in fewer errors, both for our editors and in the COinS data at the cost of a single regular expression substitution. In effect we would be permitting additional non-numeric text in the |isbn= value. If desired, the regular expression could also strip a preceding "1[03]:" as that sequence is somewhat commonly used by editors, for some reason, to indicate that it is a 10, or 13 digit ISBN. — Makyen (talk) 08:42, 27 April 2014 (UTC)
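The relaxed handling proposed above could be sketched as follows. This is Python, not the module's Lua, and `clean_isbn` and `isbn_is_valid` are hypothetical names. One deviation from the proposal is an assumption of the sketch: "X" is kept as well as digits, because an ISBN-10 check character may be X; a stray "x" in surrounding text would therefore also survive.

```python
import re


def clean_isbn(raw: str) -> str:
    """Strip a leading "10:"/"13:" marker, then everything but digits and X."""
    text = re.sub(r"^\s*(?:ISBN[\s-]*)?1[03]:", "", raw, flags=re.IGNORECASE)
    return re.sub(r"[^0-9Xx]", "", text).upper()


def isbn_is_valid(isbn: str) -> bool:
    """Check-digit verification for a cleaned ISBN-10 or ISBN-13."""
    if re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(isbn))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", isbn):
        total = sum(int(ch) * (3 if i % 2 else 1)
                    for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


for raw in ["978-0-306-47708-9 (Print)", "13: 978-0-306-47916-8",
            "9781234567890"]:
    cleaned = clean_isbn(raw)
    print(raw, "->", cleaned, isbn_is_valid(cleaned))
```

The cleaned value would be used only for verification and COinS; the displayed text would stay as the editor wrote it.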

Why do we need additional text? Do you have an example where this is needed? And multiple ISBNs or other identifiers are always suspect. I have only seen multiple ISBNs where someone is trying to identify multiple versions of a source, not the particular source they are using.

It is not a question about when I think additional text is needed. My personal opinion is that it is a very rare occasion when it is actually needed. The one occurrence which I recall was on an author's Wikipedia page. The {{Cite book}} templates were used to format a list of the author's works. As part of the list, the ISBNs were supplied for all of the different versions of each book. A brief piece of text was supplied inline to describe the version of the book for each ISBN. I'm not sure I would make the same editorial choice, but I respect the fact that they had made that choice on that page.

The additional text issue is a question of when a significant number of editors consider it appropriate to include such text and how we should handle the fact that it happens a significant amount of the time. Our checking for strict formatting on the ISBN appears to be due to using it in COinS, not just based on verifying that the provided ISBN text would enable a human to find the book, or that linking the ISBN to Special:BookSources will function. Special:BookSources appears to strip all non-numeric characters from what is passed to it. Humans can handle a much wider variety than the strict requirements we are currently applying to this field. We are imposing much stricter requirements that do not need to exist in order to accomplish the primary task of enabling someone to find the reference. The strict format requirement makes the template less user friendly when being a bit more user friendly (tolerant of a somewhat larger range of formats) costs very little and actually improves the quality of the data we are passing via COinS (i.e. we strip any extraneous text instead of only flagging an error).

In going through Category:Pages with ISBN errors the most common additional text that actually has some meaning is to append a short descriptor about which version of the book the ISBN is for. For example: "{paperback}", "(pbk)", "(hardback)", "(hdb)", etc. Are these strictly necessary for identifying the book – assuming the ISBN is actually correct: no. As a human looking to acquire the exact book is it helpful information to know: yes.

There are also a significant number of citations where effectively useless information is provided. For example prefixing the value with "10:", "13:" "ISBN", etc.

I question why we consider the additional text as "errors" when they are in fact not an actual error, merely a deviation from strict formatting of this specific parameter. This is when the strict formatting is not needed for it to be functional in the way that the information is primarily used (link to Special:BookSources) and most deviations from the strict formatting are trivially handled in the module to provide good data, in most cases, via COinS. The processing necessary to provide good data via COinS is a regular expression replacement. This is something we at least come close to doing already. Even for a properly formatted ISBN we have to strip out the "-" or " " characters in order to calculate the checksum.

To cover a specific issue: I am not suggesting that we change what we display in the citation (except no error when it is now not an error). We currently display all text supplied in the |isbn= value. We should continue to do so.

As to multiple ISBNs in the same citation. Yes, of course, it is suspect. However, please note that what I said about multiple ISBNs was that the proposed move-the-ISBN-from-id-to-isbn bot should not create an error where none currently exists by either creating a duplicate |isbn= or by moving a second ISBN into the |isbn= where it will be an error when an editor has already placed the second ISBN in |id= where it does not create an error. I made no comment about the editorial choice to have multiple ISBNs in the citation, only that the bot should be programed to not create errors in the citation when it comes across some situations that are known to exist. — Makyen (talk) 13:57, 27 April 2014 (UTC)

|type= is the proper parameter for your examples "{paperback}", "(pbk)", "(hardback)", "(hdb)" – without the brackets.

—Trappist the monk (talk) 16:07, 27 April 2014 (UTC)

@Trappist the monk: I both agree and disagree with |type= being most appropriate. When making these changes I have no history on the page, and no knowledge of any possible agreement about format. In my opinion, changes to correct citation errors should remain as close to the original editor's intent as possible. Thus, for many cases I feel that it is more important to retain the intent of the original editor rather than use the "correct" parameter |type=.

Here is an example which I encountered today:

As originally in the page:

Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9 (Print) 978-0-306-47916-8 (Online). {{cite book}}: Check |isbn= value: invalid character (help); Unknown parameter |editors= ignored (|editor= suggested) (help)

Using |type= and |id= (location of "Print" disassociates it from the ISBN):

Fifty Years of the Shell Model — The Quest for the Effective Interaction (Print). Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9. {{cite book}}: More than one of |ISBN= and |isbn= specified (help); Unknown parameter |editors= ignored (|editor= suggested) (help)

Using |id=:

Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. 2003. doi:10.1007/b100519. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online). {{cite book}}: Unknown parameter |editors= ignored (|editor= suggested) (help)

In my opinion, the version which does not use |type= is closer to what the original editor intended.

Note that this citation has other problems and would likely be better as (retaining the 2 ISBN numbers):

Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. (eds.). Advances in Nuclear Physics, Volume 27. Advances in the Physics of Particles and Nuclei (APPN). Vol. 27. Springer-Verlag. pp. 1–275. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online) ISSN 0065-2970. {{cite book}}: External link in |series= (help)

— Makyen (talk) 23:58, 27 April 2014 (UTC)

Yeah, as you show it, |type= doesn't work so well in your example, not because |type= is wrong but because the original editor is wrong. The CS1 templates are designed to provide information about a single source. Here, the editor is trying to cite two versions of the same source in a single template. We should be glad that he didn't want to include the softcover version as well (ISBN 978-1-4757-8801-3). Perhaps the better solution to the multiple isbn problem is to choose one to use in the template and include the other(s) parenthetically outside the template. This at least avoids the error, includes an isbn in the COinS metadata, and still keeps the rest available:

Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. (eds.). Advances in Nuclear Physics (hardback). Advances in the Physics of Particles and Nuclei (APPN). Vol. 27. Springer-Verlag. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. ISSN 0065-2970. (alternate: ISBN 978-0-306-47916-8 (Online); 978-1-4757-8801-3 (softcover))

I took out |url=, |chapter-url=, |pages=, and removed the external link from |series=. |doi= gets the reader to the same place as |chapter-url= where all you get is a sample of the table of contents and part of the introduction teaser as part of the publisher's effort to sell you a copy of the book; |url= and the external link in |series= is more selling. There is no point in listing a chapter and all of the pages that make up the chapter; that does nothing to help a reader find the cited information.

—Trappist the monk (talk) 10:50, 28 April 2014 (UTC)

id = ISBN should not be changed wholesale to ISBN =, for the reasons noted above. I think it would be reasonable for an editor using AWB to convert instances of id = ISBN that contain plain ISBNs with no extraneous text, in citations where an ISBN is not present.

Some data: I have fixed about 3,000 of the 8,000 articles in Category:Pages with ISBN errors using an AutoEd script in the past couple of months. I have about 2,500 more articles to examine. The script has been able to fix about 60% of the articles I have examined. The most common fixable error, by far, is two ISBNs separated by a comma. These two ISBNs are usually the 10-digit ISBN followed by the 13-digit ISBN.

As for extra text, the examples given above are often present. Sometimes a "printing" or "edition" is present, though it is almost always redundant with |year=. Sometimes multiple volumes, each with its own ISBN, are specified; I don't touch those.

When I am done going through the category, I expect there to be about 2,500 articles left. The large majority of those errors will be legitimate errors: ISBNs with too few or too many numbers. There will be somewhere under 1,000 "low-hanging fruit" still left, primarily ASINs, multiple ISBNs that were too strange or ambiguous for my scripting skills to handle, ISSNs, publisher names, and other easy fixes. After those are fixed, I expect we'll have under 2,000 actual ISBN problems to track down.

Anyone who would like to contribute to clearing out this category is welcome to do so. I recommend starting at the end of the alphabet, since the remaining articles that my script hasn't touched are in the A–N portion of the alphabet (I've been working my way from Z to A). – Jonesey95 (talk) 17:39, 27 April 2014 (UTC)

@Jonesey95: I have been working on them from "A" forward. I was splitting multiple ISBNs in |isbn= into |isbn= and |id= until Redrose64 commented that a large number of them were just both the 10 and 13 digit ISBN for the same book and expressed a belief that the 10 digit one should be removed. I don't agree that there is consensus for us to wholesale override the choice of editors to put both a 10 and 13 digit ISBN into the citation. I fully agree that it is not needed, and would not do so myself. I just don't think that there is a wide enough consensus for us to remove them from thousands of articles. I have not been splitting them wholesale since that point. My intent was to go back through once it was clearer as to how to handle them. I also have not translated the code I wrote for a different purpose which decodes/formats/checks ISBNs from JavaScript to what is needed for AWB (which is the tool I use). Something which actually compares the two and verifies that they are 10/13 duplicates would be needed.
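A 10/13 duplicate check of the kind described above could be sketched roughly as follows. This is an illustrative Python sketch, not the JavaScript or AWB code mentioned in the comment; the function names `isbn10_to_isbn13` and `is_10_13_duplicate` are hypothetical, and the inputs are assumed to contain only digits, hyphens, spaces, and an optional X check digit.

```python
import re

def _clean(isbn):
    """Drop hyphens and whitespace; upper-case a possible X check digit."""
    return re.sub(r"[\s-]", "", isbn).upper()

def isbn13_check_digit(first12):
    """EAN-13 check digit: weights alternate 1, 3 over the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_isbn13(isbn10):
    """Prefix 978, drop the old check digit, and compute a new one."""
    first12 = "978" + _clean(isbn10)[:9]
    return first12 + isbn13_check_digit(first12)

def is_10_13_duplicate(isbn_a, isbn_b):
    """True only if one value is an ISBN-10 whose 978 form equals the other."""
    a, b = _clean(isbn_a), _clean(isbn_b)
    if len(a) == 13 and len(b) == 10:
        a, b = b, a  # make `a` the 10-digit form
    if not re.fullmatch(r"\d{9}[\dX]", a) or not re.fullmatch(r"\d{13}", b):
        return False
    return isbn10_to_isbn13(a) == b
```

With a check like this, a script would remove the 10-digit value only when the comparison succeeds. A 979-prefixed ISBN-13 has no 10-digit equivalent, so the sketch reports it as a non-duplicate.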

Looking at your script: Your script appears to delete the first ISBN unless it starts with 97[89] without any checks to see that this occurrence is actually a 10/13 duplicate. I consider this to be inappropriate. You may be deleting a non-duplicate. In addition, even in the case where it is a 10/13 duplicate, the editor has made the choice to include both. While I don't agree with that choice, I have not seen something that indicates a wide consensus for removing 10/13 duplicates from thousands of articles.

I disagree with your choice to comment out any ISBN starting with 977. I have seen a good number of ISBNs which have had "97[89]" mistyped as "977". In these cases, changing the 977 to 97[89] was sufficient for the ISBN to be valid and find the correct book.

I am not familiar with scripts for AutoEd. However, the replacements you are performing appear to be applied to the complete text of the article, not limited to citations. For the |isbn= parameter this might be sufficiently specific. On the other hand, it might not. You might want to consider changing your regular expressions so that they match only within citation templates. I use the following (or a variation on it):

({{\s*[cC]it[ea](?:[^}{]*(?:\{\{[^}{]*}}[^}{]*)*)\|\s*)isbn(\s*=\s*)

It also prevents matches with any parameters within one level of sub-template within the citation template. It could be more specific and prevent low-probability matches within wiki-links (within citation templates), but a wiki-link whose displayed portion has the format of a parameter, "|\s*isbn\s*=\s*", is unlikely, and these replacements are not intended for unattended operation. Note that if there is more than one |isbn= in the citation, the greedy matching means this will match the one furthest from the {{\s*[Cc]it[ae].
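The behaviour described above can be checked with a small Python sketch. The pattern is the one quoted, with every literal brace escaped for Python's `re` module (AWB itself uses a different regex engine, so this is only an approximation of how it behaves there); the helper name `text_after_isbn_param` is hypothetical.

```python
import re

# The quoted pattern, with literal braces escaped for Python's re module.
ISBN_PARAM = re.compile(
    r"(\{\{\s*[cC]it[ea](?:[^}{]*(?:\{\{[^}{]*\}\}[^}{]*)*)\|\s*)isbn(\s*=\s*)"
)

def text_after_isbn_param(text):
    """Return the text following the matched '|isbn=' or None if no match."""
    m = ISBN_PARAM.search(text)
    return None if m is None else text[m.end():]

# "isbn =" outside a citation template is ignored.
outside = "isbn = 111 and {{cite web |url=http://example.org}}"

# With two |isbn= parameters, the greedy match lands on the last one.
repeated = "{{cite book |isbn=111 |title=T |isbn=222}}"

# An isbn= inside a nested template is skipped in favour of the outer one.
nested = "{{cite book |title={{lang|fr|isbn=999}} |isbn=333}}"

# An isbn= that exists only inside a nested template is not matched at all.
nested_only = "{{cite book |title={{lang|fr|isbn=999}}}}"
```

Note that the pattern as quoted matches a lowercase `isbn` only; an upper-case `|ISBN=` would need a variation.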

As to ASINs: you change any that are explicitly called out as ASINs. I would suggest adding additional cases to that. My experience so far is that a sequence matching B0[0-9A-Za-z]{8} can safely be considered an ASIN even when not explicitly stated as an "ASIN". However, I have been actually clicking on the links created to verify that it is an ASIN and is valid. I have not found a formal specification for ASINs, but aside from those which are also ISBNs, that format has fit the ones I have seen. — Makyen (talk) 23:58, 27 April 2014 (UTC)

Looks like I spoke a bit too soon about using B0[0-9A-Za-z]{8} as indicating an ASIN. I just encountered four on a page. Three of them were invalid as ASINs. That said, I had not previously encountered ones which turned out to be invalid when changed to |asin= based on that criterion. — Makyen (talk) 00:07, 28 April 2014 (UTC)
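Given that the pattern produces false positives, it is probably best used only to flag candidates for manual checking. A minimal Python sketch, assuming the pattern above and a hypothetical helper name `find_asin_candidates`; the sample identifier is a placeholder that merely fits the format and has not been verified as a real ASIN:

```python
import re

# Candidate ASINs: "B0" followed by eight letters or digits, as a whole word.
ASIN_CANDIDATE = re.compile(r"\bB0[0-9A-Za-z]{8}\b")

def find_asin_candidates(text):
    """Return strings that fit the format; each still needs a manual check."""
    return ASIN_CANDIDATE.findall(text)

sample = "id = B000FC1PJI, isbn = 0306406152"
candidates = find_asin_candidates(sample)
```

A match only says that the string has the right shape; whether it resolves to a valid product still has to be confirmed by following the link.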

Thanks for the tips. I will see if I can incorporate some of them into my editing.

My answer to most of your concerns is that I visually inspect each article's ISBN errors before running my script, and then I visually inspect each of the script's proposed edits before saving. There are plenty of articles that I skip because I can see in advance or after running the script (but before saving) that the script will produce undesirable results.

I believe that I am commenting out only 13-digit "977" numbers, which are typically UPC bar codes; I don't see many of these. I look at the citation to confirm that it does not appear to be a book before doing so, but I comment it out instead of deleting it because I can't be sure. There is a particular editor who has inserted many "977" numbers, allegedly for Billboard Brasil, as ISSNs and ISBNs. I did a ton of research to try to find a valid ISSN for these, and failed, so I resorted to commenting them out.

ASINs: There are a couple hundred apparent ASINs in the category. I didn't feel comfortable changing them without checking each one manually, so I have saved them for a second pass.

As for removing a 10-digit ISBN when a 13-digit ISBN is also present, my understanding is that they contain identical information and lead the reader to the same book (at worldcat.org, for example) when clicked. The CS1 error help text explicitly says to "Use the 13-digit ISBN when it is available" and that "Only one ISBN is allowed in this field" because it breaks the metadata and breaks the link to Special:BookSources. – Jonesey95 (talk) 01:08, 28 April 2014 (UTC)

Including multiple ISBNs, such as for print and online, is an issue, since we cannot definitively determine which version was consulted. Fixing these has the same problem: we cannot determine the definitive source. -- Gadget850^talk 01:11, 28 April 2014 (UTC)

Multiple ISBNs may be useful outside of references, in a list of works. For example, the subject of an article might be the editor of a multi-volume encyclopedia where each volume has its own ISBN. In that case, putting all of the ISBNs into |isbn= is not appropriate, but neither is removing all but one ISBN. Using |id= or putting the ISBNs outside of the citation template might work; I haven't given it enough thought yet, since I've been working on the easy fixes. – Jonesey95 (talk) 01:17, 28 April 2014 (UTC)

WP:SAYWHEREYOUGOTIT pertains. If we can't tell which was seen due to multiple ISBNs, we imply they are equivalent (down to pagination). In that case it might be cleaner to cite OCLC 70752232 or OL 9534802M. LeadSongDog come howl! 13:49, 28 April 2014 (UTC)

13-digit numbers beginning 977 are the EAN-13 representation of an ISSN, but they are not ISSNs: a true ISSN has eight digits. It is not always easy to convert an EAN-13 to an ISSN: for example, The Railway Magazine is ISSN 0033-8923 and the barcode is 977-0033-89229-3 - clearly seven digits correspond, but I don't know about the rest. --Redrose64 (talk) 17:48, 28 April 2014 (UTC)
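The relationship between the barcode and the ISSN can be sketched in Python. As I understand the layout, a 977 EAN-13 consists of the prefix 977, the first seven digits of the ISSN (without its check digit), a two-digit variant code, and the EAN check digit; the ISSN's own check digit is then recomputed modulo 11. The function names below are hypothetical, and the sketch rests on that assumed layout.

```python
import re

def issn_check_digit(first7):
    """ISSN check digit: weights 8 down to 2, modulo 11, with 10 written as X."""
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)

def ean13_to_issn(ean):
    """Recover the eight-character ISSN from a 977-prefixed EAN-13."""
    digits = re.sub(r"\D", "", ean)
    if len(digits) != 13 or not digits.startswith("977"):
        raise ValueError("not a 977-prefixed EAN-13")
    first7 = digits[3:10]  # the seven ISSN digits that follow the 977 prefix
    return f"{first7[:4]}-{first7[4:]}{issn_check_digit(first7)}"

issn = ean13_to_issn("977-0033-89229-3")
```

For the barcode quoted above this gives 0033-8923, matching the ISSN cited for The Railway Magazine. The two digits after the seven ISSN digits (29 here) are the variant code and are simply discarded.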

If a multi-volume work has an ISBN for each volume, then I recommend listing each volume individually with the appropriate ISBN. Otherwise, there is no connection between the volume and the ISBN. -- Gadget850^talk 13:03, 29 April 2014 (UTC)

Multiple ISBNs

Would it be feasible to have multiple instances of {{{isbn}}}, each associated with a {{{type}}}? For example, the above example could be converted to {{cite book |chapter=Fifty Years of the Shell Model — The Quest for the Effective Interaction |date=2003 |publisher=[[Springer-Verlag]] |doi=10.1007/0-306-47916-8_1 |title=Advances in Nuclear Physics |volume=27 |first=Igal |last=Talmi |editor1-first=J. W. |editor1-last=Negele |editor2-first=E. W. |editor2-last=Vogt |isbn1 = 978-0-306-47708-9 |type1=hardback |issn=0065-2970 |series = Advances in the Physics of Particles and Nuclei (APPN)|isbn2 = 978-0-306-47916-8 | type2 = Online | isbn3 = 978-1-4757-8801-3 | type3 = softcover}} We would default to {{{isbn1}}} or simply {{{isbn}}} for generating COinS metadata, just like at present. HTH HAND —Phil | Talk 17:40, 15 May 2014 (UTC)

No. Where would it stop? Some books have many more than one ISBN - paperback/hardback; audio; USA/UK/Australia/etc. publisher; separate volumes or all-in-one; special coffee-table binding. How many do you need? The answer to that is: give the ISBN of the edition that you actually consulted, and no other. --Redrose64 (talk) 17:52, 15 May 2014 (UTC)

There should only be one - the one the page numbers were taken from. Keith D (talk) 18:40, 15 May 2014 (UTC)

We should not be encouraging storing a significant list of different ISBNs. The one which should be selected is the one, without modification, which is printed in the book actually being referenced. If there is more than one printed, use the one that matches the version of the book in hand. If there is both a 10-digit and a 13-digit version printed in the book, the 13-digit version is preferred. Do not convert from a 10-digit version to a 13-digit version by just adding the 978-; it will be wrong. Do not convert a 13-digit version to a 10-digit version by removing the 978-; it will also be wrong. Use the version as printed in the book.
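The reason the naive conversions fail is that the two formats use different check-digit schemes, so the last digit generally changes. A short Python sketch of the two standard checks illustrates this; the function names are hypothetical.

```python
import re

def isbn10_is_valid(isbn):
    """ISBN-10: weights 10 down to 1; the sum must be divisible by 11."""
    s = isbn.replace("-", "").upper()
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    values = [10 if c == "X" else int(c) for c in s]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0

def isbn13_is_valid(isbn):
    """ISBN-13: weights alternate 1, 3; the sum must be divisible by 10."""
    s = isbn.replace("-", "")
    if not re.fullmatch(r"\d{13}", s):
        return False
    return sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(s)) % 10 == 0

printed_10 = "0-306-40615-2"        # valid ISBN-10
printed_13 = "978-0-306-40615-7"    # the matching valid ISBN-13
naive_13 = "978-" + printed_10      # just adding 978-: check digit is wrong
naive_10 = printed_13[len("978-"):]  # just removing 978-: also wrong
```

Here `naive_13` and `naive_10` both fail their checks, while the two printed forms pass.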

There are ways to have more than one ISBN if the |id= is used, but that should be an exception, not a rule. If we were going to start listing all of the different identifiers for every edition/version of a book, as Redrose64 said "where would it stop?" As an example: a reference on which I was attempting to fix the ISBN earlier today was citing Magic and Mystery in Tibet. Should we be listing identifiers for all of the 60 versions listed in WorldCat?

If the citing editor has actually checked multiple versions to find that the page numbers and text are exactly the same, then it is reasonable for them to list more than one identifier. The |id= parameter can be used for this purpose and as long as the text "ISBN" precedes a valid format ISBN it will be linked to Special:BookSources by the MediaWiki software. (see Help:Magic links)

On the other hand, we should not generate badly formed COinS data if there are extraneous non-numeric characters in the |isbn= parameter. Removing everything other than digits is trivial.

I also believe that we should not generate an error if there is extraneous non-numeric text in the ISBN parameter. All non-numeric text can be removed prior to processing with a single regular expression substitution. We are already performing one regular expression substitution to remove the "-" marks. Given the ease with which all extraneous non-numeric text can be removed – particularly given we are already removing some such text (hyphens) – it feels like we are going out of our way to make the requirements for this parameter more stringent than is needed in order to meet the goals of an accurate link to Special:BookSources and valid COinS data. In fact, we appear to choose to provide bad COinS data when providing good COinS data in a larger percentage of cases is trivial. Just removing such extraneous text prior to checksum verification and forwarding to COinS is slightly easier, from a processing point of view, than what is currently done and results in both that parameter being much more user friendly and our providing good COinS data in a higher percentage of citations. — Makyen (talk) 02:35, 16 May 2014 (UTC)
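The single substitution proposed above might look like the following Python sketch. It is not the module's actual code (the module is written in Lua), and the function name `strip_non_digits` is hypothetical. As written it also removes an X check digit, which a real implementation of this idea would need to preserve for ISBN-10s.

```python
import re

def strip_non_digits(raw):
    """One substitution that removes every character other than 0-9."""
    return re.sub(r"[^0-9]", "", raw)

# Extraneous text without digits is removed cleanly.
with_label = strip_non_digits("978-0-306-40615-7 (hardback)")
with_prefix = strip_non_digits("ISBN 0 306 40615 2 pbk.")

# Extraneous text that itself contains digits leaves a value of the wrong
# length, which a later length or checksum test would still have to reject.
with_edition = strip_non_digits("0-306-40615-2 (2nd ed.)")
```

The last case shows the limit of the approach: stripping helps when the extra text is purely non-numeric, but it cannot repair a value that carries extra digits, so validation after stripping is still needed.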