Help talk:Citation Style 1/Archive 80

This is an archive of past discussions about Help:Citation Style 1. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Module:TwitterSnowflake problem

A recent edit at Dave Grohl has produced "Lua error in Module:TwitterSnowflake at line 16: attempt to perform arithmetic on local 'c' (a string value)." That is seen by previewing the following.

{{cite tweet
|author=Foo Fighters
|title=Example
|user=foofighters
|number=1026546600946982912/video/1
|access-date=August 10, 2018
}}

The problem is due to "/video/1" in the number parameter and is easily fixed, but perhaps the module could show that number is invalid. Johnuniq (talk) 02:43, 25 October 2021 (UTC)

Not a cs1|2 module. Editor Elli is the author of Module:TwitterSnowflake so perhaps Editor Elli is the correct person to fix this bug.

—Trappist the monk (talk) 02:58, 25 October 2021 (UTC)

Been meaning to get to that. Thanks for the reminder. Elli (talk | contribs) 03:06, 25 October 2021 (UTC)

Done I will note that while I was the original creator and maintainer of Module:TwitterSnowflake, the error was actually caused due to code in Module:Cite tweet which I did not write. Regardless, I have fixed it. Elli (talk | contribs) 03:30, 25 October 2021 (UTC)
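For illustration only, the kind of check suggested above (reporting an invalid number instead of failing on arithmetic) might look like the sketch below. It is written in Python rather than Lua and is not the code of Module:TwitterSnowflake or Module:Cite tweet. The function name is hypothetical; the epoch constant and the 22-bit shift are the commonly documented Twitter snowflake layout.

```python
from datetime import datetime, timezone

# Twitter's snowflake epoch (4 November 2010), in milliseconds.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_datetime(number: str):
    """Return the UTC time encoded in a tweet ID, or None if the ID is malformed.

    Hypothetical helper for illustration; not part of any Wikipedia module.
    """
    number = number.strip()
    # Reject anything that is not purely ASCII digits,
    # e.g. "1026546600946982912/video/1".
    if not (number.isascii() and number.isdigit()):
        return None
    # The bits above the low 22 hold milliseconds since the Twitter epoch.
    ms = (int(number) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

With a check like this, the value containing "/video/1" yields None, which a caller could turn into a visible "invalid number" message rather than a Lua arithmetic error.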

Usurped titles

Besides usurped URLs, we also have usurped titles. See here: 38 domains have been usurped by a gambling site, and ReFill or Citation bot then adds a missing |title= pulled from the gambling site - d'oh. My bot can deal with the usurped URLs, but what about the titles: delete the title, or replace it with a placeholder? If it is deleted, nothing stops other tools from re-adding the usurped title. We could notify tool makers, but there is no guarantee they will implement a fix, new tools will keep being created, and other wikis globally have the same problem. IMO a placeholder title (e.g. |title=Usurped title) that enters a tracking category with a help message would lock the title field against usurpation until someone can manually add a working title (if one can be found). -- GreenC 19:18, 20 October 2021 (UTC)

As you may remember there have been several discussions over usurped URLs. In the case of online-only citations such as in {{cite web}}, the usurped URL disqualifies the citation, which should be removed as unverifiable. Whether the title has been also usurped or not, it makes no difference. Isn't this a simpler solution? 98.0.246.242 (talk) 20:57, 20 October 2021 (UTC)

I should add, unless there are archived urls with the correct info, assuming the content is static. If the archive becomes the live version, and the title was usurped but then corrected, maybe the status of archive-url (as live-url) could act as a flag to lock the title. 98.0.246.242 (talk) 21:05, 20 October 2021 (UTC)

Most cite webs have archives available, and those that don't will be up to someone else to delete, my bot does not delete citations. My bot adds archive URLs and toggles |url-status=usurped - but it does not determine a working title. It can't just leave a spam usurped title, it needs to do something. The question is: What? -- GreenC 21:41, 20 October 2021 (UTC)

I'm confused. I thought I understood your first post as saying that your bot could not deal with |title= (My bot can deal with the usurped URLs but what about the titles). But here you are saying that your bot can't just leave a spam usurped title, it needs to do something. But if your bot does not determine a working title how can it know that the title in |title= is a spam usurped title? Too many 'buts'.

I'm all in favor of identifying bogus titles, emitting an error message, and putting the article in an error category. We should have done that with |title=Archived copy; only a handful of gnomes who have enabled maintenance messaging see the maintenance messages associated with Category:CS1 maint: archived copy as title‎ (52,089).

—Trappist the monk (talk) 22:06, 20 October 2021 (UTC)

A distinction should be made between generic titles such as "Archived copy" and titles edited to reflect a usurped URL. In the latter case, the {{cite web}} citation itself is bogus and should be deleted, unless an archive exists that verifies the wikitext just like the now non-existent (for verification purposes) original. Non-visible editor flags+cats to mark usurped citations as "delete" or "replace" may be more useful, and may cause fewer complaints. 68.174.121.16 (talk) 23:59, 20 October 2021 (UTC)

Wonderful. In this particular case my bot can identify the bogus titles through keywords known ahead of time (see the "See here" link above) so it can replace them with |title=Usurped title. The keywords in the titles are actually how the 38+ domains were first identified as being usurped. -- GreenC 01:49, 21 October 2021 (UTC)

Explicitly showing a citation with usurped URL and title as unreliable (per the proposed solutions above) is counterproductive and unnecessary. Keeping it in any form shows that the project is unreliable. The objective should be to remedy the situation. If there is an archive, the bot should make the archive the live link. If there is no archive, the bot should flag the citation for deletion/replacement and at a minimum, it should be immediately hidden/nonprinted. One would think this area of Wikipedia should be concerned with lessening its unreliability, not merely trumpeting it. The proposals fall short of the mark. 71.247.146.98 (talk) 12:51, 21 October 2021 (UTC)

I have added Usurped title (case insensitive) to the list of generic/bogus titles in the sandbox:

{{cite book/new |title=usurped title}}

usurped title. {{cite book}}: Cite uses generic title (help)

—Trappist the monk (talk) 13:21, 21 October 2021 (UTC)
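The two halves of this idea (a bot swapping a spam title for a placeholder, and the template flagging that placeholder) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the bot's or the module's code: the spam keywords are invented for the example, and the generic-title list contains only the two phrases named in this thread.

```python
# Hypothetical keywords; the real list used by the bot is not given in this thread.
SPAM_KEYWORDS = ("casino", "slot")
PLACEHOLDER = "Usurped title"
# Only the phrases named in this discussion; the real cs1|2 list may differ.
GENERIC_TITLES = ("usurped title", "archived copy")


def bot_fix_title(title: str) -> str:
    """Bot side: replace a title containing a known spam keyword with the placeholder."""
    lowered = title.lower()
    if any(word in lowered for word in SPAM_KEYWORDS):
        return PLACEHOLDER
    return title


def is_generic_title(title: str) -> bool:
    """Template side: case-insensitive check against the generic-title list."""
    lowered = title.lower()
    return any(phrase in lowered for phrase in GENERIC_TITLES)
```

A title rewritten by `bot_fix_title` is then caught by `is_generic_title`, which is the point at which an error message and category would be emitted.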

And? This is not a case of a badly formatted date or wrong format something-or-other. A usurped title/usurped url combination disqualifies a {{cite web}} citation. It cannot be "helped". 71.247.146.98 (talk) 13:42, 21 October 2021 (UTC)

You clearly have 0 experience repairing cite web templates, or any other such templates as discussed above, and should honestly stop talking about their legitimacy. Nothing you have said on the point is true or valid. Izno (talk) 14:08, 21 October 2021 (UTC)

And why would you know what kind of experience someone has? It has nothing to do with your opinion. Also this is not about "repairing" a template, but about adding a misguided "error" message. There is no error. A citation with the usurped combination described above cites nothing and should be removed. There is no fixing it. Unless, as pointed out several times above, there is an archive, in which case a different action was proposed. Your comment above seems to fit you better, no? 74.66.14.68 (talk) 16:06, 21 October 2021 (UTC)

Yeah, I'm done responding to you. Izno (talk) 16:11, 21 October 2021 (UTC)

(edit conflict)

Clearly you have ignored what Editor GreenC wrote: My bot adds archive URLs and toggles |url-status=usurped - but it does not determine a working title. Setting |url-status=usurped masks the original (usurped) |url= in the citation's rendering when |archive-url= has a value:

{{cite web |title=Title |url=//usurped-examples.com |archive-url=//archive.org |archive-date=2021-10-21 |url-status=usurped}}

"Title". Archived from the original on 2021-10-21.

In the rendered citation, readers do not have access to the usurped url but do have access to the archive url. cs1|2 maintains a very short list of generic/bogus titles. If Editor GreenC's bot cannot supply a |title= value but can recognize a usurped title and replace that title with a value that will cause cs1|2 to emit an error message and category, a human or a 'title-finding' bot can make a repair.

—Trappist the monk (talk) 14:21, 21 October 2021 (UTC)
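The masking behaviour described above can be modelled in a few lines. This is a simplified Python sketch, not the cs1|2 Lua code: the function name and return shape are made up for illustration, and the default branch (archive present, no usurped status) is an assumption about typical rendering.

```python
def render_links(url: str, archive_url: str = "", url_status: str = "") -> dict:
    """Pick the links a simplified citation would expose to readers.

    "title" is the target of the linked title; "original" is the target of
    "the original" in "Archived from the original", or None if not linked.
    """
    if not archive_url:
        # No archive: url-status=usurped changes nothing, so |url= stays visible.
        return {"title": url, "original": None}
    if url_status == "usurped":
        # Archive present: the title links to the archive, the usurped URL is hidden.
        return {"title": archive_url, "original": None}
    # Assumed default: title links to the archive, original remains reachable.
    return {"title": archive_url, "original": url}
```

For the example template above, `render_links("//usurped-examples.com", "//archive.org", "usurped")` exposes only the archive link.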

I very much understand what GreenC wrote. And am in agreement with his bot action. The whole point is that this is not an "error" to be signaled the same way a wrong date etc. is. Leaving in place the proposed wording devalues the citation and therefore the article. Why should a citation with "Usurped title" be announced or shown? It will be seen by some editors but perhaps thousands and thousands of readers. Another approach is needed. 74.66.14.68 (talk) 16:16, 21 October 2021 (UTC)

Thank you. We apparently have a developing problem involving 100s (1000s?) of domains that expired and were re-registered by a bad actor who then sold/leased them to a gambling site (and probably others) to create instant SEO and spam on Wikipedia without making a single edit. Fiendishly clever, and a hard problem to track and deal with since domains expire all the time across 900+ wiki sites. Do you think CS1|2 could have a role, such as detecting the domains from a list and automatically treating as usurped and tracked until someone can hard set |url-status=usurped? I can hard set on enwiki fairly soon, on other wikis it could be a long time if ever. -- GreenC 16:34, 21 October 2021 (UTC)

You are suggesting something like a blacklist of known bad |url= values? Isn't that what edit filters are supposed to be? Of course, if these urls get added to an edit filter, all of a sudden editors won't be able to publish other needed article changes until they do something about the blacklisted url. I don't have a lot of experience with edit filters, but I recall being stymied when I couldn't discover from the message just which one of dozens or more urls was the one that prevented publishing the article. Does anyone know if that has improved and the can't-publish-this-page-because-it-has-a-banned-url error message contains a clue about which url(s) triggered the edit filter? If there has been that improvement, then cs1|2 should, I think, stay out of it and let the edit filters do their jobs.

—Trappist the monk (talk) 17:03, 21 October 2021 (UTC)

It has not improved. That said, if there is a bot taking care of it globally (for citations and otherwise), I don't see a reason to add to CS1. Let the bot do its work and then add the URL to the blacklist after that. Izno (talk) 20:08, 21 October 2021 (UTC)

No global bot exists for this. Trappist the monk, when a domain becomes usurped there are two things to be done: 1) stop any new URLs from being added, i.e. an edit filter; 2) convert existing URLs to |url-status=usurped. The problem with 2) globally is that no bot exists, and none will for the indefinite future. It's actually quite difficult to process a usurped domain: add archive URLs, flip the url-status, undo {{webarchive}} and convert to straight archive, remove entirely citations that have no archive, convert bare/square links to archive. To do it globally is a major undertaking due to the language and template differences. It would help, though be incomplete, if CS1|2 could detect from a list of domains and display as-if |url-status=usurped. -- GreenC 05:15, 22 October 2021 (UTC)

I'm not sure that cs1|2 should be in the business of doing the work that edit filters are supposed to be doing. It is technically possible for cs1|2 to maintain a url blacklist and it can internally set |url-status=usurped when such a url is detected. Setting |url-status=usurped does nothing when |archive-url= is empty or missing so special code would need to be written to suppress the url when the internal |url-status=usurped is set. I fear that once added to a blacklist, urls will never be deleted so the list will grow until sometime down the road the system collapses because the size of the blacklist will push some articles over the lua memory-use or execution-time limits.

If this is a 'global' problem, then a 'global' solution should be applied. Perhaps a suggestion for a global edit filter should be put forward at phabricator.

—Trappist the monk (talk) 14:17, 22 October 2021 (UTC)

OK, I understand the concern about scalability: the network speed to pull a list from somewhere central is not good with a long list, on top of everything else the template does. Edit filters create new problems: users can't make changes such as setting |url-status=usurped or adding an archive URL - the edit filter blocks the corrective edit. They are left to delete the cite or URL. Since every domain dies eventually, and some percentage of those will be hijacked, long-term it is a problem, so I am trying to explore other solutions. Bots are the best answer, also the hardest. -- GreenC 05:16, 23 October 2021 (UTC)

Pseudocode proposal (usurped routine):

Is url-status=usurped?
  Y
    Is archive-url=[non-empty]?
      Y
        url=archive-url
        via=archive service name
        Is title usurped?
          Y
            title=[original title from archive]
        exit (usurped routine)
      N (archive-url=empty)
        Is cite web?
          N
            Is title usurped?
              N
                hide span url url-access access-date format via
                exit (usurped routine)
          Y (cite web)
            flag delete/replace
            hide cite span
            exit

104.247.55.106 (talk) 01:55, 23 October 2021 (UTC)

Edited for non-{{cite web}} cases. 66.108.237.246 (talk) 13:17, 24 October 2021 (UTC)

Edited for non-{{cite web}} cases w. usurped titles. 65.88.88.46 (talk) 15:42, 25 October 2021 (UTC)
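One possible reading of the pseudocode above, as a runnable Python sketch. The names are hypothetical: `template` and `title-usurped` are illustrative keys, not real cs1|2 parameters, and the returned strings just echo the proposal's own action labels. The proposal does not say what happens to a non-{{cite web}} citation with a usurped title and no archive, so that branch is marked "unspecified" rather than guessed.

```python
def usurped_routine(cite: dict) -> list:
    """Return the actions the proposed routine would take for one citation."""
    if cite.get("url-status") != "usurped":
        return []  # routine does not apply

    if cite.get("archive-url"):
        actions = ["url=archive-url", "via=archive service name"]
        if cite.get("title-usurped"):
            actions.append("title=original title from archive")
        return actions

    # From here on, archive-url is empty.
    if cite.get("template") == "cite web":
        return ["flag delete/replace", "hide cite span"]

    if not cite.get("title-usurped"):
        return ["hide span url url-access access-date format via"]

    # Not spelled out in the proposal above.
    return ["unspecified"]
```

For example, a {{cite web}} citation with |url-status=usurped and no archive returns the flag-and-hide actions, while the same citation with an archive returns the url/via substitutions.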

Zotero plugin?

Is anyone aware of a Zotero plugin for Firefox?

I'm aware that Zotero allows for exporting of references into CS1, but it does so by producing a text file with the code. I was hoping there might be some plugin that let me insert it in browser, much like you can insert Zotero references in MS Word.

Cheers. Kylesenior (talk) 05:25, 24 October 2021 (UTC)

Pigsonthewing may know. Izno (talk) 16:31, 25 October 2021 (UTC)

I'm not clear what you mean, but see Wikipedia:Zotero. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:44, 25 October 2021 (UTC)

New cite case?

{{cite case}}
{{User:MJL/sandbox3}}

Case citations work pretty much like {{cite book}}. I've linked an example of what I'm proposing above which was done in my sandbox. You can see the result right here. Obviously, things can be expanded later as far as features are concerned, but for now I would like to get thoughts on maybe adding this to the primary CS1-suite of templates. –MJL ‐Talk‐^☖ 01:04, 20 October 2021 (UTC)

Why not update {{cite court}}, which is used in 5,000 articles, instead of reinventing the wheel? See this 2015 discussion for a previous attempt at making that template behave more like CS1 templates. – Jonesey95 (talk) 01:52, 20 October 2021 (UTC)

@Jonesey95: It has parameters that so radically diverge from CS1-templates. I'd rather we start with splitting off from {{cite book}} and adding features than go from the non-Lua-based {{cite court}}. –MJL ‐Talk‐^☖ 02:09, 20 October 2021 (UTC)

If you think that there will be sufficient need for a new {{cite case}} template, create a wrapper template around {{cite book}}. Using Module:template wrapper makes all {{cite book}} parameters available and allows you to preset |type=Court case and any other parameters that should hold default or calculated values.

—Trappist the monk (talk) 10:42, 20 October 2021 (UTC)

The existing template is closer to CS2. There is some divergence that is easily remedied. To apply CS1/CS2 style there is no need to use the respective modules. The style consists of field terminators, list separators, capitalization rules, a few punctuation rules (including abbreviations and use of parentheses) and the relative positioning of displayed output. It seems many of the style elements are already in compliance in {{cite court}}, and the editor documentation is more topical to court cases, and better than what the module-related doc offers. I agree with Jonesey's suggestion above, to work with the existing template. 172.254.222.178 (talk) 11:53, 20 October 2021 (UTC)

I really don't agree with that. CS1/2 citations are supposed to use lua templates which allow for more flexibility in citations. –MJL ‐Talk‐^☖ 19:31, 20 October 2021 (UTC)

Not at all. It is actually the other way around. Templates, whether based on Lua or not, apply styles, in this case CS1/CS2. Templates do not make citations flexible. They do the opposite: they standardize them, and therefore limit them to certain classes of cases, because by definition they apply the underlying styles rigidly. There were citations, and citation styles, way before the first citation templates were designed. This is not rocket science. 98.0.246.242 (talk) 20:45, 20 October 2021 (UTC)

(Gonna guess you're the same IP user as before since both are Spectrum Business IPs in New York)

What I mean to say is that they can handle flexible use cases and output the right standard citations which comply with our style. –MJL ‐Talk‐^☖ 21:15, 20 October 2021 (UTC)

I don't think there is disagreement there. It is not correct to limit application of a style to any template, set of templates or any other similar formatting helpers. They do not matter as long as the reader is offered the same consistent citation presentation (style). In this case, it may be easier to edit the {{cite court}} source in order to conform to CS1 or CS2 for readers, rather than adding a meta-template. For editors, the proper parameter aliases can be added to the existing template to bring it more in line with the module whitelist. A few of the generic CS1/CS2 details can be added to the more specialized existing documentation after it is amended to accommodate the source code edits. As a bonus, if this is handled carefully, it will port the existing instances to the chosen style. 68.174.121.16 (talk) 23:37, 20 October 2021 (UTC)

The issue for me is that {{cite court}} has 4712 transclusions which would all need to be updated unless things like |vol= get held over as an alias. This would make {{cite court}} the only CS1/CS2-style citation template that has that alias. That is not even getting into the fact that |lang=, |via=, |url-status=, |archive-url=, |archive-date=, and |pinpoint= are all inconsistent with all other citation templates. –MJL ‐Talk‐^☖ 15:13, 23 October 2021 (UTC)

There is a bias to look at everything as an editor (or worse, a developer). The consumers of citations are readers. Presentation (style) is also targeted exclusively to readers. So first, see it the way a reader would. If you follow a citation style you apply style elements: punctuation, capitalization, delimiters, positioning, emphasis etc. etc. The citation data is a different animal and is not related to style, so do not mix the two like apparently everyone else here does. The data is basically two things: 1. the source 2. information on locating the source. The type of that information is not arbitrary. Sources are classified by aggregators (trade databases, library union databases, research databases, government databases etc.) in a certain fashion, out of which indices are built. The great majority of these indices traditionally index one and/or two bits from the following: title, author, identifier. Then, they sub-index from another limited number of bits: pub. date, publisher, location, editor etc. If you want to create efficient citations, for say, court cases, first you find out how such information is classified by the primary providers and their aggregators. Then you build the citation data around it. Which means, you label your parameters accordingly. If court cases are generally classified with labels such as "plaintiff" or "litigant" that is what you use, and that is what editors of such citations should expect. Eventually you arrive at the far less important (for readers+editors) stuff that developers do, which is to standardize. Here there are certain core arguments in the main module: "work" (the source) being the most important. Every source label (book/journal/website/sign/podcast/speech etc. etc.) can be aliased to this. In this case, reporter=work. You need to keep "vol"? make it an alias of the core argument for volumes. a "litigant" is "people" in the current setup. And so on. 
It is commendable that you want to work on this drudgery, but let's not forget what citations are here for. 65.88.88.71 (talk) 16:27, 23 October 2021 (UTC)

I think MJL may be misunderstanding how wrapper templates work. Take a look at {{Cite scar}}, for example, which is a CS1 wrapper template that takes its own parameters but displays them using {{cite encyclopedia}}. {{Cite court}} could keep |vol= and all of its current parameters, add support for parameters like |archive-url= very easily, and add CS1's error-checking and other features. – Jonesey95 (talk) 19:36, 23 October 2021 (UTC)

@Jonesey95 and 65.88.88.71: I understand that as a concept for how template wrappers work. I just feel that there isn't a justifiable reason for |vol= to be supported as an alias when the underlying templates don't. |vol= isn't like |litigants= which is specific to this use case, but it is just a general way one could refer to volume.
If I may explain my intentions here a bit better, I originally saw a court case being cited using {{citation}}/{{cite web}} which is not currently well-equipped for this use case. I would like to have an immediate and obvious alternative to using those in something {{cite case}}. If someone wants to further work on {{cite court}} to make it more like CS1/2 in output and such, then I would support that. I, however, do not have such capabilities. What I can do is make a somewhat decent wrapper for {{cite book}} in User:MJL/sandbox3. –MJL ‐Talk‐^☖ 20:01, 23 October 2021 (UTC)

The name "cite court" may be better, since what is cited is an in-source location in a specific court's published case reports (which is the "work" in this case). If it is named "cite case" it may be understood by editors to imply that the particular case is published, and can be therefore found, as a standalone item, which I don't believe is correct here. 65.88.88.46 (talk) 15:36, 25 October 2021 (UTC)

I mean in a way the case is published as it could be understood? I mean we're definitely citing the case here at least. –MJL ‐Talk‐^☖ 18:37, 28 October 2021 (UTC)
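The wrapper idea discussed in this thread (the wrapper keeps its own parameter names, presets some values, and passes everything else through to the underlying template) can be sketched as below. This is a Python illustration, not Module:template wrapper. The alias table is hypothetical and only uses pairings mentioned in the discussion (|vol= as an alias for the volume parameter, reporter=work, and the |type=Court case preset).

```python
# Hypothetical alias table: wrapper-specific name -> underlying template name.
ALIASES = {"vol": "volume", "reporter": "work"}
# Values preset by the wrapper.
PRESETS = {"type": "Court case"}


def wrap(params: dict) -> dict:
    """Translate wrapper parameters into parameters for the underlying template."""
    out = dict(PRESETS)
    for name, value in params.items():
        # Unknown names pass through unchanged, so |archive-url= etc. keep working.
        out[ALIASES.get(name, name)] = value
    return out


def to_wikitext(template: str, params: dict) -> str:
    """Render a template call as wikitext."""
    return "{{" + template + "".join(f" |{k}={v}" for k, v in params.items()) + "}}"
```

Calling `to_wikitext("cite book", wrap({"vol": "12", "reporter": "Example Reports"}))` produces a {{cite book}} call with |type=Court case, |volume=12 and |work=Example Reports, so existing transclusions that use |vol= would not need to change.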

False positive match on generic title=Wayback Machine

In Wayback_Machine#cite_ref-DigitalJournal_31-0 .. a generic title error on the phrase 'Wayback machine' is actually a legit part of the title. -- GreenC 00:58, 29 October 2021 (UTC)

Accept-this-as-written markup (plus: original url is dead; the article has an author and a publication date; name of the source, not its address, goes in |website=)

{{cite web |first=Alexander |last=Baron |title=((The new Internet Archive Wayback Machine now online)) |url=http://www.digitaljournal.com/article/360776 |website=Digital Journal |date=October 23, 2013 |access-date=November 19, 2020 |archive-date=November 19, 2020 |archive-url=https://web.archive.org/web/20201119071411/http://www.digitaljournal.com/article/360776}}

Baron, Alexander (October 23, 2013). "The new Internet Archive Wayback Machine now online". Digital Journal. Archived from the original on November 19, 2020. Retrieved November 19, 2020.

—Trappist the monk (talk) 01:12, 29 October 2021 (UTC)

Updated. -- GreenC 01:31, 29 October 2021 (UTC)
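How the accept-this-as-written markup sidesteps the generic-title test can be sketched as follows. This is a simplified Python model, not the cs1|2 implementation. The generic-title list holds only phrases mentioned on this page, and the function name is hypothetical.

```python
import re

# Only phrases mentioned on this page; the real cs1|2 list may differ.
GENERIC_TITLES = ("wayback machine", "archived copy", "usurped title")


def check_title(raw: str):
    """Return (display_title, flagged_as_generic) for a |title= value."""
    # A value wrapped in ((...)) is accepted as written and skips the check.
    m = re.fullmatch(r"\(\((.*)\)\)", raw, re.DOTALL)
    if m:
        return m.group(1), False
    lowered = raw.lower()
    flagged = any(phrase in lowered for phrase in GENERIC_TITLES)
    return raw, flagged
```

With the title from the example above, the bare value is flagged because it contains "Wayback Machine", while the same value inside double parentheses is displayed unchanged and not flagged.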

QID

Are there any plans to link the citation templates with Wikidata? I was thinking of 2-way connections:

The citation templates would accept a |qid= parameter pointing to a Wikidata entry for the source book, magazine, website, whatever. This would pull in values from Wikidata for attributes that were not given in the template. In some cases only the QID and page number would have to be supplied to get a complete citation
A bot would periodically migrate attribute values from Wikipedia to Wikidata. The articles would now get the attribute values from Wikidata, which can be maintained centrally.

To confirm practicality, I made a very crude template at User:Aymatth2/citeQ to pull values from Wikidata. There is no error checking, but it seems to work:

Code	Renders
{{User:Aymatth2/citeQ |Q25169 |page=123}}	Douglas Adams, Eoin Colfer (1979), The Hitchhiker's Guide to the Galaxy, p. 123
{{User:Aymatth2/citeQ |Q4386569 |page=34}}	Beatrix Potter (October 1903), The Tailor of Gloucester, Frederick Warne & Co., p. 34
{{User:Aymatth2/citeQ |Q313030 |page=456}}	Edward Gibbon (1776), The History of the Decline and Fall of the Roman Empire, p. 456

The advantage would be complete and consistent source descriptions rendered from a single vetted Wikidata entry. The citations would be the same across all articles that use the source apart from page number. Error messages or hidden categories could be generated when the Wikipedia values did not match the Wikidata values, so they could be tracked down and corrected. I am sure there are all sorts of complexities: Books have different editions, journals get new publishers, articles are spread over multiple magazine editions, etc.. But is there any reason why we would not work towards implementing something like this? Or is it in the works already? Aymatth2 (talk) 14:03, 3 October 2021 (UTC)
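The fallback-and-compare behaviour proposed here (locally supplied values win, missing ones come from the central record, and disagreements are reported) can be sketched as below. This is a Python illustration with a hand-written dictionary standing in for a Wikidata entry; it makes no Wikidata API calls, and the function and key names are hypothetical.

```python
def merge_citation(local: dict, central: dict):
    """Merge template parameters with a central record.

    Returns (merged, mismatches): local non-empty values take precedence,
    and mismatches lists keys where both sides have differing values.
    """
    supplied = {k: v for k, v in local.items() if v}
    merged = dict(central)
    merged.update(supplied)
    mismatches = sorted(k for k, v in supplied.items()
                        if k in central and central[k] != v)
    return merged, mismatches


# Stand-in for the central record, using the values rendered in the table above.
tailor = {
    "author": "Beatrix Potter",
    "date": "October 1903",
    "title": "The Tailor of Gloucester",
    "publisher": "Frederick Warne & Co.",
}
```

For instance, `merge_citation({"page": "34", "date": "1903"}, tailor)` keeps the local date, fills in author, title and publisher from the record, and reports `["date"]` as a mismatch that a tracking category could surface.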

Umm, {{cite Q}}?

There are no plans to link cs1|2 templates to Wikidata.

—Trappist the monk (talk) 14:09, 3 October 2021 (UTC)

More or less, there's a lot of WP:BEANS/Vandalism-related reasons for why using Wikidata for citations is undesirable, as well as several style reasons for why we don't want to do that either. We killed {{cite doi}}/{{cite pmid}} etc... because of it, and {{cite Q}} is likewise not widespread for the same reason, and should be removed from articles whenever found. Headbomb {t · c · p · b} 14:19, 3 October 2021 (UTC)

Aymatth2: See the talk page for {{Cite Q}} for the reasons why that template is not widespread. It causes CITEVAR problems, primarily because of author name formatting, and needs to be used carefully, if at all. – Jonesey95 (talk) 17:32, 3 October 2021 (UTC)

Looks like that was not such an original idea. I had no idea {{cite Q}} existed, but used an almost identical name! I can't see any style problems, since it would just wrap {{citation}}. Vandalism seems no more likely if the information is held in Wikidata – which could be semi-protected to minimize risk. The current Template talk:Cite Q looks like steady progress is being made on resolving issues. One problem that I can see is that I prefer first and last names separated which does not seem to be Wikidata standard. This is mainly to keep {{sfn}} entries short. The benefits generally seem to outweigh the drawbacks. Aymatth2 (talk) 17:45, 3 October 2021 (UTC)

{{Cite Q}} does take mode= parameters for the most basic citation variation issues (CS1 vs CS2) but for any variation more subtle than that, like how to abbreviate author names or journal titles, or how to order given names vs surnames, you're forced to enter things manually, obviating much of the point of Cite Q. And as you say, separating author names into first and last is essential for styles that use sfn or other Harvard-style links, so if cite Q doesn't do that, then it cannot be used in many situations. —David Eppstein (talk) 07:00, 31 October 2021 (UTC)

Changes to Cite news/doc

48Pills: in this edit, alias of 'Lay summary' is not correct. Please change it back to the actual parameter name. – Jonesey95 (talk) 05:24, 3 October 2021 (UTC)

We should just deprecate and remove |lay-date=, |lay-format=, |lay-source=, and |lay-url=. I have marked these parameters as deprecated in the ~Whitelist/sandbox and will change our documentation to reflect that state.

Of course, now that I've done that, I expect that somebody's knickers will get in a twist and it'll all end up at some drama board. Those parameters are not amenable to replacement by bot because some human must decide if they are important to the en.wiki article and then create a separate cs1|2 template for those sources. Creating a maintenance category is possible but very, very few of us even know that maintenance categories exist so it will be years before the last |lay-<param>= is removed (if ever).

—Trappist the monk (talk) 12:16, 3 October 2021 (UTC)

I've reverted them, for the second time now. 48Pills, you must gain consensus for these changes, otherwise they will be reverted again. Changing the standard parameter presentation order without good reason is not acceptable. Headbomb {t · c · p · b} 13:59, 3 October 2021 (UTC)

If you had bothered to read every part of the edit you would have seen it involved a change far more important than a re-ordering of the presentation, but that's how editing works on here isn't it. Destroy hours of work rather than let the least important things go. 48Pills (talk) 16:59, 26 October 2021 (UTC)

Please see WP:BRD. We are at the "discuss" part now. Please discuss. Introducing incorrect information into the documentation of one of Wikipedia's most-used templates is not desirable. – Jonesey95 (talk) 17:11, 26 October 2021 (UTC)

I do agree with Trappist the Monk, but the documentation must be kept current and accurate at all times! Rlink2 (talk) 00:54, 1 November 2021 (UTC)

ISSN in portal.issn.org not in WorldCat

When I use the issn= parameter a link to worldcat is automatically generated. Some ISSN values are valid and in portal.issn.org but not WorldCat. Example: https://portal.issn.org/resource/ISSN/2531-4661 vs. https://www.worldcat.org/issn/2531-4661 . Is there any way to control the automatically-generated link or just disable it? Thanks Jamplevia (talk) 21:52, 30 October 2021 (UTC)

Except that worldcat can identify libraries that hold at least some issues of a periodical identified by an ISSN, as I understand it from what others have written here, |issn= is a low value parameter. There are other opinions expressed at WP:ISSN. Apparently, portal.issn.org does not aid a reader of an en.wiki article in locating a copy of the periodical so, from that perspective, is of little use to our readers.

You can always write something like this after the cs1|2 template's closing }}:

[https://portal.issn.org/resource/ISSN/2531-4661 ISSN 2531-4661 at issn.org] → ISSN 2531-4661 at issn.org

If there are a lot, a lot, of ISSNs like the one in your example, then perhaps we can consider finding some mechanism to link to portal.issn.org.

—Trappist the monk (talk) 22:40, 30 October 2021 (UTC)

Or use |id=. Izno (talk) 23:23, 30 October 2021 (UTC)

Thank you both. I was able to combine the advice to use |id=[https://portal.issn.org/resource/ISSN/2531-4661 ISSN 2531-4661 at issn.org] Jamplevia (talk) 15:39, 31 October 2021 (UTC)

A search for the issn at Ulrichsweb found nothing, which is a bit unusual (they currently index 380000+ serials in 200 languages). The journal may be either brand new or in very restricted circulation? 68.173.76.118 (talk) 01:23, 31 October 2021 (UTC)

It seems this would fall under "limited circulation". It is a local Italian newspaper published in the city of Taranto. It may or may not appear at Worldcat, most likely depending on whether participating Italian libraries carry/classify the item. 104.247.55.106 (talk) 14:25, 31 October 2021 (UTC)
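The |id= workaround above can be generated mechanically once the ISSN is known to be well formed. The sketch below, in Python, checks the standard ISSN check digit (weights 8 down to 2, modulus 11) and builds the external-link wikitext used in this thread. The function names are hypothetical; the URL pattern is the one quoted in the question.

```python
import re


def issn_is_valid(issn: str) -> bool:
    """Validate the NNNN-NNNC form and the mod-11 check character of an ISSN."""
    m = re.fullmatch(r"(\d{4})-(\d{3})([\dXx])", issn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weights 8, 7, ..., 2 applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return m.group(3).upper() == expected


def issn_portal_link(issn: str) -> str:
    """Build wikitext for an external link to the ISSN portal record."""
    if not issn_is_valid(issn):
        raise ValueError(f"invalid ISSN: {issn}")
    return f"[https://portal.issn.org/resource/ISSN/{issn} ISSN {issn} at issn.org]"
```

The result of `issn_portal_link("2531-4661")` is the same string that was placed in |id= above.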

HTML markup

Use of Templates, HTML, and HTML entities within citation templates

For the record, this is the original thread started on User talk:Beland referenced below. I moved it here to clear out my list of unread messages, and because it has pertinent technical details. -- Beland (talk) 20:42, 8 February 2022 (UTC)

You recently edited Langbeinites a couple of times to replace UTF numeral sub/superscript characters with either {{chem2}} or HTML <sub>...</sub> or <sup>...</sup> in the |title= field in {{cite}} templates. In both cases, this is not recommended because many fields of the various {{cite}} templates generate COinS metadata, which is used for citation cross-compatibility on the Internet, beyond just Wikipedia. See Template:Citation Style documentation/coins for {{cite}} fields that are COinS-producing. — sbb (talk) 12:58, 15 June 2021 (UTC)

@Sbb: Is there a standards document which defines what is and isn't allowed in COinS strings? How would, say, italics normally be represented? Thanks! -- Beland (talk) 01:49, 15 July 2021 (UTC)

@Beland: (I outdented my reply because some of the formatting I used doesn't like to be part of the wikitext : indentation). Well, since COinS strings are emitted entirely as the value of the title= attribute in empty HTML <span> tags, the only thing allowed in COinS strings is what can be in HTML attribute values. That's pretty much plain ASCII and URL-escaped entities. As an example, I created 3 references to a fake {{cite book}} reference titled H₂O and r², using 3 different ways to mark up the super- and subscripts (note also that the r is italicized with wiki markup):

HTML tags: |title= H<sub>2</sub>O and ''r''<sup>2</sup>^[1]
HTML entities: |title= H&#8322;O and ''r''&sup2;^[2]
Unicode characters: |title= H₂O and ''r''²^[3]

References

^ sbb (2021a). H₂O and r².
^ sbb (2021b). H₂O and r².
^ sbb (2021b). H₂O and r².

Generated COinS data

ref 1:

<span title="ctx_ver=Z39.88-2004&
rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&
rft.genre=book&
rft.btitle=H%3Csub%3E2%3C%2Fsub%3EO+and+r%3Csup%3E2%3C%2Fsup%3E&
rft.date=2021&
rft.au=sbb&
rfr_id=info%3Asid%2Fwiki.riteme.site%3AUser%3ASbb%2Fsandbox" class="Z3988">
</span>

ref 2

<span title="ctx_ver=Z39.88-2004&
rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&
rft.genre=book&
rft.btitle=H%26%238322%3BO+and+r%26sup2%3B&
rft.date=2021&
rft.au=sbb&
rfr_id=info%3Asid%2Fwiki.riteme.site%3AUser%3ASbb%2Fsandbox" class="Z3988">
</span>

ref 3

<span title="ctx_ver=Z39.88-2004&
rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&
rft.genre=book&
rft.btitle=H%E2%82%82O+and+r%C2%B2&
rft.date=2021&
rft.au=sbb&
rfr_id=info%3Asid%2Fwiki.riteme.site%3AUser%3ASbb%2Fsandbox" class="Z3988">
</span>

Note that in ref1, the plain HTML <sub>2</sub> and <sup>2</sup> are URL-escaped, telling anybody who consumes/uses that COinS string that the book's title is "H<sub>2</sub>O ...". It puts the constraint on the resource consumer to correctly parse HTML. Same situation with ref2, only instead of having to parse HTML <sub> and <sup> tags, they have to parse HTML entities. Still requires HTML parsing.

Only the last one, ref3, doesn't require HTML parsing, because the URL-escaped Unicode characters will be correctly interpreted.

Having said all that, note that wikitext is stripped from the data during Wikipedia's COinS generation. So no italicization, bolding, etc., get emitted into the COinS strings. This means that a title like "Book about ''USS Iowa''" will get interpreted as Book about USS Iowa.

— sbb (talk) 19:49, 15 July 2021 (UTC)
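For anyone who wants to check the three cases above, here is a minimal Python sketch that percent-decodes the rft.btitle values exactly as quoted in refs 1 to 3. It assumes nothing beyond the standard library; the dictionary labels are only for illustration.

```python
from urllib.parse import unquote_plus

# rft.btitle values copied from the three COinS strings quoted above
btitles = {
    "ref 1 (HTML tags)": "H%3Csub%3E2%3C%2Fsub%3EO+and+r%3Csup%3E2%3C%2Fsup%3E",
    "ref 2 (HTML entities)": "H%26%238322%3BO+and+r%26sup2%3B",
    "ref 3 (Unicode)": "H%E2%82%82O+and+r%C2%B2",
}

# unquote_plus turns '+' into a space and decodes %XX sequences as UTF-8
decoded = {label: unquote_plus(value) for label, value in btitles.items()}

for label, text in decoded.items():
    print(f"{label}: {text}")
```

After decoding, ref 1 still contains literal <sub> and <sup> tags and ref 2 still contains literal entity references, so a consumer needs a further HTML-aware step for those two. Only ref 3 is finished after percent-decoding.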

@Sbb: Hmm, so it looks like both HTML and Unicode subscripts go through the system intact, it's just a matter of which format the downstream consumers want? How do we know that? I couldn't find any documentation as to the convention there, but perhaps there are databases we could look in? -- Beland (talk) 03:07, 11 September 2021 (UTC)

it looks like both HTML and Unicode subscripts go through the system intact [...] I wouldn't think of it that way. Per the OpenURL spec^[1], "Recognizing the international environments in which ContextObjects will be used, the Committee selected Unicode as the abstract character repertoire for ContextObjects." The data is represented by Unicode, and encoded as UTF-8. An OpenURL parser is required to understand Unicode, so a Unicode subscript character's representation is consistent. But parsers aren't required to then interpret the received Unicode string as partial HTML markup. So an HTML substring is just that: some characters in the ASCII-range that may or may not be HTML, and aren't required to be parsed as such.

I think it's safe to assume the downstream consumers want plaintext (where plaintext is Unicode text encoded as UTF-8). Also, I point again to Template:Citation Style documentation/coins, which states,

Use of templates within the citation template is discouraged because many of these templates will add extraneous HTML or CSS that will be included raw in the metadata. Also, HTML entities, for example &nbsp;, &ndash;, etc., should not be used in parameters that contribute to the metadata.

I think that also strongly implies not to manually embed HTML in |title=, etc. fields. — sbb (talk) 21:04, 11 September 2021 (UTC)

I see you edited the cite style doc to allow for Unicode super/subscripts, and were quickly reverted by Trappist. I think this needs more consensus before making that change. Note that this is more than a discussion at just the template doc page; it's also potentially a change to MOS:SUPERSCRIPT (Do not use the Unicode subscripts and superscripts ² and ³, or XML/HTML character entity references (&sup2; etc.).). I started that discussion several months ago, and it didn't gain much traction: Wikipedia talk:Manual of Style/Superscripts and subscripts § Add exception to allow Unicode super/subscripts in COinS fields in cite xxx templates? — sbb (talk) 22:52, 11 September 2021 (UTC)

Main discussion

This is the original thread started on this page. -- Beland (talk) 20:42, 8 February 2022 (UTC)

@Trappist the monk: Greetings! To answer your question raised in this revert, Sbb started a thread at User talk:Beland#Use of Templates, HTML, and HTML entities within citation templates. I think that happened because I was going around changing articles (including citations) to conform with MOS:FRAC and Wikipedia:Manual of Style/Superscripts and subscripts, and the current guidelines result in HTML markup instead of Unicode precomposed fractions, superscripts, and subscripts. I couldn't find an authoritative COinS specification that explains how to handle superscripts, fractions (including those not available as precomposed characters), italics, and other markup in fields. I thought Sbb was advocating without opposition that Unicode characters be used instead of markup, and I was starting to change the guidelines to reflect that when we got your attention. Sbb also pointed out there has been opposition at Wikipedia talk:Manual of Style/Superscripts and subscripts. So, it would be good to discuss so I can get some clarification on what the consensus is here so I can update my spellcheck code and guideline pages if necessary. There are several possibilities for what to do:

Use Unicode characters whenever possible (but markup is difficult to avoid in 100% of cases)
Use HTML when necessary to follow MOS guidelines, but avoid templates because they tend to spew unwanted HTML markup, and expect downstream consumers to parse <sub>...</sub> etc.
As Headbomb suggested, follow Wikipedia guidelines for display purposes, but write some code so that citation templates give downstream COinS consumers output translated into no-markup Unicode, or whatever is needed in any particular case.

Thoughts? -- Beland (talk) 02:59, 12 September 2021 (UTC)
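To make the third option concrete, here is a rough Python sketch of what "translate to no-markup Unicode for the metadata only" could look like. This is not how the cs1|2 modules work (they are written in Lua); the function name to_plain_unicode and the TeX-like fallback for characters without a precomposed sub/superscript are assumptions made purely for illustration.

```python
import re

# Characters that have precomposed Unicode subscript/superscript forms
PLAIN = "0123456789+-=()"
SUB = str.maketrans(PLAIN, "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")
SUP = str.maketrans(PLAIN, "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾")
SCRIPT = re.compile(r"<(sub|sup)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def to_plain_unicode(title: str) -> str:
    """Return a markup-free copy of a title, intended for metadata only."""
    # Remove wiki italic/bold markup (runs of two or more apostrophes)
    text = re.sub(r"'{2,}", "", title)

    def convert(match: re.Match) -> str:
        kind, inner = match.group(1).lower(), match.group(2)
        table = SUB if kind == "sub" else SUP
        if all(ord(ch) in table for ch in inner):
            return inner.translate(table)
        # Assumption: no precomposed character exists, so fall back to
        # TeX-like notation instead of silently dropping the markup
        return ("_{" if kind == "sub" else "^{") + inner + "}"

    return SCRIPT.sub(convert, text)


print(to_plain_unicode("H<sub>2</sub>O and ''r''<sup>2</sup>"))
```

The displayed citation would keep whatever markup the MOS calls for; only the copy passed to COinS would go through a conversion like this. The fallback branch shows the limit of option 1: anything beyond digits and a few operators has no Unicode equivalent.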

I'm not Trappist, but I strongly disagree with your suggestion to use Unicode substitutes in references, and its implication that these can be adequate substitutes for mathematics formatting in reference titles. They are not adequate substitutes. In particular, they are very limited in their application and frequently incompatible in appearance with the proper mathematics formatting that is required when their limits are reached. —David Eppstein (talk) 04:33, 12 September 2021 (UTC)

The COinS metadata is carried in the title="..." attribute of an empty HTML <span>...</span> element that also has the attribute class="Z3988". HTML attributes cannot contain markup of any kind, so if it can't be sanitised to remove the markup, it must be omitted in the first place. --Redrose64 🌹 (talk) 07:22, 12 September 2021 (UTC)

@Redrose64: Markup can appear there, but it does need to be encoded. The examples Sbb left on my talk page use percent-encoding, so for example <sub> would be encoded as %3Csub%3E. It looks like other fields (like the URL of the page) also use percent-encoding, so downstream consumers would be expected to percent-decode as a matter of course? The result of that decoding could be HTML or no-markup Unicode or MathML or whatever. -- Beland (talk) 16:49, 12 September 2021 (UTC)

My conclusion would be, rather, that if COINS cannot represent accurate references, then we should drop COINS instead of using it as an excuse to force our references to be inaccurate. The tail is wagging the dog. We must be able to cite papers like, say, Pintér, Ákos; de Weger, Benjamin M. M. (1997). "$210=14\times 15=5\times 6\times 7=\binom{21}{2}=\binom{10}{4}$". Publicationes Mathematicae Debrecen. 51 (1–2): 175–189. MR 1468225. If our COINS conversion produces garbage like "rft.atitle=MATH+RENDER+ERROR" because COINS is incapable of representing such titles, prevents us from including the nowrap preventing the horrible line break between double quote and start of title, or worse, prevents us from even being allowed to specify such titles, then the problem is COINS. Find some other way of getting your metadata-scraping fix. —David Eppstein (talk) 07:24, 12 September 2021 (UTC)

This "MATH+RENDER+ERROR" thing is a placeholder inserted by us for math objects for which we cannot generate sensible metadata. As was correctly pointed out already COinS basically wants plaintext, but since all data gets encoded we are not limited to what the HTML title= attribute would allow for and could also pass down almost any kind of other stuff. The problem is that "other stuff" doesn't make sense at the receiver. I think, for as long as the title occasionally contains simple markup like <sub>...</sub> this is easy enough to be parsed correctly even by humans, but most math stuff is more complicated.

What do other COinS producers do in such cases?

Was/is there some standard notation how to transliterate math into ASCII for example in old newsgroup posts? If so, we could try to translate math blocks into this and make it part of the metadata.

In some cases stripping off all markup and leaving only plain text and digits might also create a string which could still be good enough for humans to recognize a title or to work as a search pattern, but it would hardly be ideal.

Yet another solution could be to provide a so called descriptive title |descriptive-title= in addition to the proper title |title= and if the proper title is too complicated to use for metadata, pass down the descriptive title instead. --Matthiaspaul (talk) 09:34, 12 September 2021 (UTC)

Regarding your remark on nowrap preventing the horrible line break between double quote and start of title, I haven't seen this yet. Can you provide an example? --Matthiaspaul (talk) 09:43, 12 September 2021 (UTC)

I now saw that you added an example using nowrap above, so this is no longer necessary. To illustrate your remark here is the example without the nowrap:

Pintér, Ákos; de Weger, Benjamin M. M. (1997). "$210=14\times 15=5\times 6\times 7=\binom{21}{2}=\binom{10}{4}$". Publicationes Mathematicae Debrecen. 51 (1–2): 175–189. MR 1468225.

--Matthiaspaul (talk) 11:53, 12 September 2021 (UTC)

The main thing to keep in mind is that citations are information-discovery helpers, and the data they carry must be in the format easiest to be found, which is, exactly as the source information presents it, "funny" characters and all. By "source information" I mean the index entry for the work in the various classification databases, which is what a reader will be presented with when discovering the source. Whatever WP:MOS says is secondary. And, the suggestion of dropping COinS if it cannot appropriately represent this data is apt. 65.88.88.57 (talk) 12:12, 12 September 2021 (UTC)

COinS does not really "represent" anything by itself, it is a method to transfer data using OpenURL in a structured way. We would not have any problems to pass the math blob in David's example to the receiver, the problem is that the receiver will most likely not be able to make any sense of it, and rather than just dropping onto them what for them is just a blob of strange binary data, we insert a placeholder hoping that at least the remainder of the title is useful enough for the receiver to make sense of it.

I'm not aware of another metadata standard which would have a reliable solution to this problem. Are you?

Regarding the index entry you mentioned, how would a work such as in David's example be represented in your classification databases? The question is in regard to the visual appearance as well as how it is encoded there. Is this something that can be derived from the proper title, or is it a descriptive title?

--Matthiaspaul (talk) 12:53, 12 September 2021 (UTC)

As was remarked on above by David regarding COinS or any metadata, this is looking at the problem from the wrong end. Metadata representation of any kind (let's forget the underlying problems with OpenURL for the moment) is secondary. This is about presenting data (to the human reader). Reference databases, including specialized databases of mathematics works, build their indices by using data entered as they appear on the work itself, unless the reference/classification database has weird data-entry quirks. Citations should pass the index data "as is", because that is the easiest way to find the underlying source, with very, very few exceptions. It is that simple. If Unicode, COinS or whatever else cannot handle that, then out they should go. 98.0.246.242 (talk) 13:52, 12 September 2021 (UTC)

Matthiaspaul: The list at https://publi.math.unideb.hu/searchb.php shows this paper as "210 = 14 × 15 = 5 × 6 × 7 = {21 2} = {10 4}". That clickable link leads to https://publi.math.unideb.hu/load_jpg.php?p=391 which displays a JPEG. -- Michael Bednarek (talk) 14:00, 12 September 2021 (UTC)

Thanks, Michael, this was helpful. But we need more such examples, also more complicated ones to derive patterns from it. --Matthiaspaul (talk) 20:18, 12 September 2021 (UTC)

Other sites' failure to display reference titles properly should not be an excuse for us to fail at the same thing. The pdf link from that site to the actual paper shows how it should be formatted. —David Eppstein (talk) 21:48, 12 September 2021 (UTC)

It looks like I was misunderstood by the IP and by David. I never said this, quite the opposite - I am not searching for excuses but for solutions.

But turning off COinS, as was suggested, is not a good idea, as it still transmits useful info. In the worst case only the offending data should be muted - and basically that's what we do right now with our "MATH+RENDER+ERROR" placeholder (although we should try to do better).

A PDF, as suggested, won't help either, it's just a binary and not much different from passing over a photo of the printed book title or graphical image of our local rendering. For one, we cannot assume that a picture or a PDF can be viewed on the receiver's end, but it also can't be used for searches. What we need is some machine- and human-readable formula notation encoded as text so that a search pattern can be derived from it. That's why I was asking what other COinS producers are transmitting in such cases and how such work titles are stored in the (text) title entry of external databases.

--Matthiaspaul (talk) 16:37, 15 September 2021 (UTC)

Two things. 1) No unicode characters. Those are a blight, and should be purged on sight. 2) Readers and accurate rendering of information are the priority. If COinS can't handle something, screw COinS. If magic codefu can be done to convert something non-COinS compliant to something COinS compliant behind the scenes (e.g. ''H''<sub>x</sub>20<sup>6</sup> → H_{x}20^{6} or whatever the COinS standard is), great, but it should not require editors to sacrifice accurate rendering. Headbomb {t · c · p · b} 14:51, 12 September 2021 (UTC)

A few considerations and a question:

I assume because of general application of MOS:CONFORM to citations (not just quotations), we generally already change punctuation and other special characters to fit Wikipedia style rather than leave it exactly as in the source material. Mostly this is about using straight quote marks. A more exotic case is when someone tweets in all blackboard bold - that gets rendered as all capital letters in the Wikipedia citation.
When I'm cleaning up special characters in citations, I often find a corrupted title in the document we're citing, whether due to mojibake or some other mishandling. I always correct that so that a human reading Wikipedia with their eyes can see the true title, but if they are searching certain databases they might not find that title verbatim. So while I like the idea of "copy exactly so people can find the original" in theory, in practice we are aggregating from lots of different databases, which may have incompatible and in some cases broken representations. The alternative is "use a consistent representation on Wikipedia" and if our representation is sensible, hope that other databases will use the same or at least be able to normalize our representation to theirs when searching (not to mention web sites that search Wikipedia). Worst case, it should be possible to find the full text of a journal article without the title as a search parameter by using journal, author, and date, though this is clearly not ideal.
What Wikipedia outputs for COinS may in fact impact the standard accepted format (and whether or not COinS becomes popular), since there seems to be no formal standard for markup issues and we're a major web site and not many sites use it. We could also look at the other sites listed on COinS and see how they handle special characters.
For science and math articles, MOS:FRAC says we can either use {{sfrac}} or ASCII fractions like "1/2". For general articles we're supposed to use only {{frac}}. So if we're taking the MOS:CONFORM approach, is the desired outcome to change MOS:FRAC to advise using only ASCII fractions in citations, for all types of articles? (Unless part of a more complicated math formula, of course.) Or should we use e.g. <sup>1</sup>/<sub>2</sub> to approximate {{frac}} at the expense of polluting COinS output with HTML markup?

-- Beland (talk) 17:29, 12 September 2021 (UTC)

We should cite formulas in references the way the reference formatted it, even when our style guidelines would tell us to use a different style for the same-meaning formula in our own text. So, for instance, it would be correct to cite: Bandukwala, J.; Shay, D. (February 1974). "Theory of free, spin-½ tachyons". Physical Review D. 9 (4): 889–895. doi:10.1103/physrevd.9.889. I've used the Unicode ½ here, but I think it would be better to use {{frac|1|2}} 1⁄2 and that our template's failure to allow that is a bug: Bandukwala, J.; Shay, D. (February 1974). "Theory of free, spin-1⁄2 tachyons". Physical Review D. 9 (4): 889–895. doi:10.1103/physrevd.9.889. {{cite journal}}: templatestyles stripmarker in |title= at position 22 (help) —David Eppstein (talk) 18:57, 12 September 2021 (UTC)

The COinS data for this title includes a stripmarker (which doesn't make sense to pass on, hence the warning):

&rft.atitle=Theory+of+free%2C+spin-%7F%27%22%60UNIQ--templatestyles-0000001B-QINU%60%22%27%7F%3Cspan+class%3D%22frac%22+role%3D%22math%22%3E%3Cspan+class%3D%22num%22%3E1%3C%2Fspan%3E%26frasl%3B%3Cspan+class%3D%22den%22%3E2%3C%2Fspan%3E%3C%2Fspan%3E+tachyons

We could strip off the stripmarker and pass on the remaining HTML. Decoded, this would result in the following string:

Theory of free, spin-<span class="frac" role="math"><span class="num">1</span>⁄<span class="den">2</span></span> tachyons

which renders (almost) nicely as:

Theory of free, spin-1⁄2 tachyons

but only if we can assume a HTML rendering engine at the receiver's end (which we cannot, unfortunately).

Automatically stripping out the attributes would give this much cleaner looking HTML:

Theory of free, spin-<span><span>1</span>⁄<span>2</span></span> tachyons

or in this easy case even:

Theory of free, spin-1⁄2 tachyons

for:

Theory of free, spin-1⁄2 tachyons

It is certainly better to pass this through than to mute the metadata completely, but this obviously isn't a good solution working in all cases either.

Presumably, the code already extracts useful text for COinS metadata from some specific math stripmarkers (the alt= attribute with PNGs, plain text with TeX, or the contents of <annotation> elements with MathML), but this obviously doesn't cover all cases. It might be worth trying to further improve this, but we probably also need a |descriptive-title= to allow editors to specify themselves what should be passed on as metadata.

--Matthiaspaul (talk) 20:18, 12 September 2021 (UTC)
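The "strip off the stripmarker and simplify the remaining HTML" idea above can be sketched in a few lines of Python. The input is the rft.atitle value quoted in this comment; the function name plain_title and the decision to drop every tag (not just the attributes) are assumptions for illustration, and this is not what Module:Citation/CS1 does.

```python
import re
from html import unescape
from urllib.parse import unquote_plus

# rft.atitle value as quoted above (without the leading "&rft.atitle=")
raw = (
    "Theory+of+free%2C+spin-%7F%27%22%60UNIQ--templatestyles-0000001B-QINU"
    "%60%22%27%7F%3Cspan+class%3D%22frac%22+role%3D%22math%22%3E%3Cspan+"
    "class%3D%22num%22%3E1%3C%2Fspan%3E%26frasl%3B%3Cspan+class%3D%22den"
    "%22%3E2%3C%2Fspan%3E%3C%2Fspan%3E+tachyons"
)

# A stripmarker is delimited by DEL (0x7F) plus quote characters
STRIPMARKER = re.compile("\x7f'\"`UNIQ--[^`]*-QINU`\"'\x7f")
TAG = re.compile(r"<[^>]+>")


def plain_title(encoded: str) -> str:
    """Percent-decode a COinS title, then drop stripmarkers, tags and entities."""
    text = unquote_plus(encoded)
    text = STRIPMARKER.sub("", text)  # meaningless outside MediaWiki
    text = TAG.sub("", text)          # keep only the text content
    return unescape(text)             # &frasl; becomes U+2044 FRACTION SLASH


print(plain_title(raw))
```

This works for the easy {{frac}} case because the text content of the spans already reads correctly in sequence. It would not help with markup whose meaning depends on layout, which is the limitation noted above.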

If you think "Theory of free, spin-1⁄2 tachyons" renders "almost nicely" you must be using a very different browser setup than mine, where the 2 is huge and overwritten by the /. Also, we should not be rewriting references to make them fit into COINS; the only thing that should get rewritten is what appears in the COINS metadata. As for "plain text with TeX": no. What you get with the current implementation from TeX is "MATH+RENDER+ERROR" in the COINS metadata. —David Eppstein (talk) 21:44, 12 September 2021 (UTC)

Well, that's why I wrote "almost". ;-) Basically, we are in agreement here. What's important here is that it carries over the semantic information, not that it looks pretty. Our CSS definitions are not available at the receiver's end (and shouldn't), that's why my example looks a bit distorted. But there will always be differences in the output of different rendering engines. It would matter if this would be used for our local display of citations (where we should not compromise on the quality of its appearance), but it does not really matter for metadata purposes. What we need is not a particularly nice visual representation of the metadata, but an accurate semantic description of the math.

--Matthiaspaul (talk) 16:37, 15 September 2021 (UTC)

Previous discussion: Help talk:Citation Style 1/Archive 19 § math ml rendering changes and metadata

Tracked in Phabricator
Task T138229

It used to be that we could extract the content of a math stripmarker and from that content extract a more-or-less human-readable copy of an equation that we could put into the metadata. What was in the math stripmarker depended on the math preferences setting of the editor who last saved the article. Here is what we used to get for this equation example:

<math display=inline>210=14\times15=5\times6\times7=\binom{21}{2}=\binom{10}{4}</math>

for the various math preferences settings:

MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools)

<span class="mwe-math-element"><span class="mwe-math-mathml-inline mwe-math-mathml-a11y" style="display: none;"><math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\textstyle 210=14\times 15=5\times 6\times 7={\binom {21}{2}}={\binom {10}{4}}}">
  <semantics>
    <mrow class="MJX-TeXAtom-ORD">
      <mstyle displaystyle="false" scriptlevel="0">
        <mn>210</mn>
        <mo>=</mo>
        <mn>14</mn>
        <mo>&#x00D7;<!-- × --></mo>
        <mn>15</mn>
        <mo>=</mo>
        <mn>5</mn>
        <mo>&#x00D7;<!-- × --></mo>
        <mn>6</mn>
        <mo>&#x00D7;<!-- × --></mo>
        <mn>7</mn>
        <mo>=</mo>
        <mrow class="MJX-TeXAtom-ORD">
          <mrow>
            <mrow class="MJX-TeXAtom-OPEN">
              <mo maxsize="1.2em" minsize="1.2em">(</mo>
            </mrow>
            <mfrac linethickness="0">
              <mn>21</mn>
              <mn>2</mn>
            </mfrac>
            <mrow class="MJX-TeXAtom-CLOSE">
              <mo maxsize="1.2em" minsize="1.2em">)</mo>
            </mrow>
          </mrow>
        </mrow>
        <mo>=</mo>
        <mrow class="MJX-TeXAtom-ORD">
          <mrow>
            <mrow class="MJX-TeXAtom-OPEN">
              <mo maxsize="1.2em" minsize="1.2em">(</mo>
            </mrow>
            <mfrac linethickness="0">
              <mn>10</mn>
              <mn>4</mn>
            </mfrac>
            <mrow class="MJX-TeXAtom-CLOSE">
              <mo maxsize="1.2em" minsize="1.2em">)</mo>
            </mrow>
          </mrow>
        </mrow>
      </mstyle>
    </mrow>
    <annotation encoding="application/x-tex">{\textstyle 210=14\times 15=5\times 6\times 7={\binom {21}{2}}={\binom {10}{4}}}</annotation>
  </semantics>
</math></span><img src="https://wikimedia.riteme.site/api/rest_v1/media/math/render/svg/4012a8a0261dae95c0a7443dbf67dcb58800df0c" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -1.005ex; width:40.087ex; height:3.343ex;" alt="{\textstyle 210=14\times 15=5\times 6\times 7={\binom {21}{2}}={\binom {10}{4}}}"/>

LaTeX source (for text browsers):

<span class="mwe-math-fallback-source-inline tex" dir="ltr">$ {\textstyle 210=14\times 15=5\times 6\times 7={\binom {21}{2}}={\binom {10}{4}}} $</span>

PNG images:

<img src="https://wikimedia.riteme.site/api/rest_v1/media/math/render/png/4012a8a0261dae95c0a7443dbf67dcb58800df0c" class="mwe-math-fallback-image-inline" aria-hidden="true" style="vertical-align: -1.005ex; width:40.087ex; height:3.343ex;" alt="{\textstyle 210=14\times 15=5\times 6\times 7={\binom {21}{2}}={\binom {10}{4}}}" />

For PNG we took the content of the alt= attribute; for LaTeX we took everything between the paired $...$ ; for MathML we took the content of the <annotation>...</annotation> tag.
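The three extraction rules just described can be illustrated with a short Python sketch. The real code was Lua inside the cs1|2 module and is not reproduced here; extract_tex and the shortened sample payloads are hypothetical, and the patterns are only as precise as the examples quoted above require.

```python
import re
from html import unescape


def extract_tex(payload: str):
    """Return the TeX source found in a rendered math payload, or None."""
    # MathML: content of the <annotation>...</annotation> tag
    m = re.search(r"<annotation\b[^>]*>(.*?)</annotation>", payload, re.DOTALL)
    if m:
        return unescape(m.group(1)).strip()
    # LaTeX source: everything between the paired $...$
    m = re.search(r"\$(.*?)\$", payload, re.DOTALL)
    if m:
        return unescape(m.group(1)).strip()
    # PNG: content of the alt= attribute (does not match MathML's alttext=)
    m = re.search(r'\balt="([^"]*)"', payload)
    if m:
        return unescape(m.group(1)).strip()
    return None


# Shortened stand-ins for the three payloads shown above
mathml = (r'<math alttext="{\textstyle 210=14\times 15}"><semantics>'
          r'<annotation encoding="application/x-tex">{\textstyle 210=14\times 15}'
          r'</annotation></semantics></math>')
latex = (r'<span class="mwe-math-fallback-source-inline tex" dir="ltr">'
         r'$ {\textstyle 210=14\times 15} $</span>')
png = r'<img src="x.png" alt="{\textstyle 210=14\times 15}" />'

for sample in (mathml, latex, png):
    print(extract_tex(sample))
```

The annotation check runs first because the MathML payload also carries a fallback image with an alt= attribute.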

And then, suddenly, that ability was taken away from us; see the Phabricator link. Because a math stripmarker is wholly and completely meaningless to anyone consuming a cs1|2 citation via the metadata, I replaced the stripmarker with the text: MATH+RENDER+ERROR. Except for that, all of the rest of the metadata are correct:

<span ...>...
&rft.genre=article
&rft.jtitle=Publicationes+Mathematicae+Debrecen
&rft.atitle=MATH+RENDER+ERROR
&rft.volume=51
&rft.issue=1%E2%80%932
&rft.pages=175-189
&rft.date=1997
&rft_id=%2F%2Fwww.ams.org%2Fmathscinet-getitem%3Fmr%3D1468225%23id-name%3DMR
&rft.aulast=Pint%C3%A9r
&rft.aufirst=%C3%81kos
&rft.au=de+Weger%2C+Benjamin+M.+M.
</span>

so readers consuming the citation via the metadata are likely to be able to locate the source (especially if the title has more to it than an equation). Despite this 'fix', what actually ends up in &rft.atitle= is dependent on the preference settings of the editor who last saved the article:

PNG:

&rft.atitle=%3Cspan+class%3D%22nowrap%22%3E%7B%5Cdisplaystyle+210%3D14%5Ctimes+15%3D5%5Ctimes+6%5Ctimes+7%3D%7B%5Cbinom+%7B21%7D%7B2%7D%7D%3D%7B%5Cbinom+%7B10%7D%7B4%7D%7D%7D%3C%2Fspan%3E

LaTeX:

&rft.atitle=%3Cspan+class%3D%22nowrap%22%3E210%3D14%5Ctimes+15%3D5%5Ctimes+6%5Ctimes+7%3D%7B%5Cbinom+%7B21%7D%7B2%7D%7D%3D%7B%5Cbinom+%7B10%7D%7B4%7D%7D%3C%2Fspan%3E

MathML

&rft.atitle=%3Cspan+class%3D%22nowrap%22%3EMATH+RENDER+ERROR%3C%2Fspan%3E

I think that this behavior is new since the last time that I looked at this issue because I seem to recall that all three cases put MATH+RENDER+ERROR in the metadata. Alas, we cannot force editors to use PNG or LaTeX rendering, nor can we force MediaWiki to give us back the ability to extract content from math stripmarkers.

The only way that I can think of to include math markup in |title= is to have an alternate |math-title= or some such that requires some sort of special-secret-markup that is not <math>...</math> tags to wrap whatever would normally be in <math>...</math> tags so, for example:

|math-title=A title with some text and $210=14\times15=5\times6\times7=\binom{21}{2}=\binom{10}{4}$ and yet more text

The module would then make a copy of the value assigned to |math-title= and then remove the special-secret-markup and put the result into the metadata. Then, the module would replace the special-secret-markup with actual opening and closing <math>...</math> tags, and then preprocess a special template that renders the math title. That rendering then goes into |title=. Yeah, pretty ugly, and I have no idea if it would work.

This search finds about 650 articles that contain |title= with a <math> tag (not all |title= parameters are associated with cs1|2).

—Trappist the monk (talk) 23:19, 12 September 2021 (UTC)

Very simple proof of concept. I have hacked a sandbox module. It takes a string of text as single parameter |math-title= that may, or may not, have $ delimited math text. If it finds a matched pair of $ delimiters, it replaces the delimiters with <math display=inline> and </math> and then preprocesses that string to get a math rendering that can be used in the citation's title:

{{#invoke:Sandbox/trappist_the_monk/math|math-title|math-title=$210=14\times15=5\times6\times7=\binom{21}{2}=\binom{10}{4}$}}

math title:

{\textstyle 210=14\times 15=5\times 6\times 7={\binom {21}{2}}={\binom {10}{4}}}

; metadata: $210=14\times15=5\times6\times7=\binom{21}{2}=\binom{10}{4}$

The raw value from |math-title= might be used in the metadata as-is because the $ delimiters are 'native' to LaTeX / TeX.

To be done: support for escaped \$ (literal '$' appearing in math text), support for '$' appearing in plain text that is not math text – for |math-title=, requiring editors to escape '$' when it appears in text that is not math text seems a reasonable restriction for this parameter. No doubt there is other stuff to do with this hack before we consider implementing it in the cs1|2 module suite.

—Trappist the monk (talk) 16:24, 14 September 2021 (UTC)
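For readers who do not want to open the sandbox module, the delimiter handling described above can be sketched in Python. This is not the Lua code from Module:Sandbox/trappist_the_monk/math; split_math_title is a hypothetical name, the lookbehind for an escaped \$ is one possible way to handle the "to be done" item, and the MediaWiki preprocessing step that actually renders the math is left out.

```python
import re

# A '$' that is not preceded by a backslash delimits math text
MATH = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$", re.DOTALL)


def split_math_title(value: str):
    """Return (wikitext for display, raw text for metadata)."""
    # A function is used as the replacement so that backslashes in the
    # TeX source are copied literally rather than treated as escapes
    display = MATH.sub(
        lambda m: "<math display=inline>" + m.group(1) + "</math>", value
    )
    # The metadata copy is the untouched parameter value, $ delimiters included
    return display, value


display, metadata = split_math_title(
    r"A title with some text and $210=14\times15$ and yet more text"
)
print(display)
print(metadata)
```

A lone '$' (for example a price in a plain-text title) is left untouched because there is no matching closing delimiter. Two unescaped '$' characters in plain text would still be misread as math, which is why requiring editors to escape them seems necessary.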

I like your approach to grab the data before MediaWiki gets it by playing man in the middle. This is brilliant. I didn't know that such a pre-processing of the formula would be possible. We should definitely try to further build on this.

But: Even if we are able to pass on a perfectly workable string as metadata, we still cannot be sure that it can be correctly interpreted at the receiver's end, so it is good to have this, but we still need a more general solution to cover other cases as well.

Since publishers of works containing formulas or other "visually complex stuff" in their titles will have to pass some textual representation of such titles to customers it is quite likely that some publisher-provided text-only alternative titles are already stored and established as standard alternative titles in external databases. In some cases, they might even be known by our contributing editors, so it would be useful if they could be entered as alternative text titles into our citations without having to give up on the nice "presentation" titles in |title= for our local purposes.

David's |text-title= and my |descriptive-title= are basically the same idea, except that in his, the contents of |text-title= would completely replace the contents of |title= for metadata purposes (similar to how your |math-title= would replace |title= for both our local rendering and the metadata), whereas my |descriptive-title= could be used instead of a normal |title= (if not given), but could also be combined with |title= (when both are given). The contents of the descriptive title should be displayed without text decoration when rendered (not sure if in front or following the normal title if both exist), and should be put into [square-brackets] in metadata to indicate that this is not the original title (probably prefixing the normal title if both exist). The different representation styles would make it possible to tell them apart when both are displayed or combined into the single &rft.atitle= or &rft.btitle= COinS key.

I think, our ideas are similar enough to be combined so that |descriptive-title= would effectively become your |math-title= when it contains some $TeX$. (And for the rare case, where the $TeX$ stuff should not be interpreted in your suggested way, we have our ((accept-this-as-written)) syntax to indicate this.) This way, the editor would have the flexibility to provide either the |title= or the |descriptive-title= (including its special handling for math), or both.

This would also cover all cases for which we have discussed a need for something like a |descriptive-title= in the past, non existing titles, dynamic titles, visual or acoustical only titles, functional titles, alias titles, unrepresentable titles, because too long, in unsupported scripts, or misleading in our context...

In past discussions we have also established two more special cases: The case where no actual title exists and we want to indicate this by a standardized descriptive title (keyword "none" to display the localized "no title"), and the case where a title does exist, but should not be displayed for some reason (keyword "off", for example in an article listing many revisions of a work), but where we would still want to issue the complete metadata for it. Last year, I started to implement this by introducing these keywords to |title=none/off, but realized we would still need something more like a |descriptive-title= parameter to specify the title for metadata.

We now have the chance to combine this and cover all these use cases with one new parameter.

--Matthiaspaul (talk) 16:37, 15 September 2021 (UTC)

Adding support for a |descriptive-title= is scope creep. What we are trying to solve is the display of math in titles. So we should limit it to such with a name that makes it obvious the purpose of that parameter. Izno (talk) 18:02, 15 September 2021 (UTC)

Perhaps it is scope creep in this particular thread, but not in general. Our whole discussion including Trappist's proposal is tangential to the original question raised by the OP, nevertheless the discussion was quite productive so far.

However, thinking about how to possibly combine open requirements when they are related is a good design approach. Many of the inconsistencies of the existing template design were caused by former ad hoc solutions to fix isolated problems. In the past years we were able to correct some of these bad design decisions to improve the interface but there are still weak spots and we also have a long list of still needed features which haven't been implemented yet because of lack of time or because it was felt that something was still missing to round out the design of a feature. Descriptive titles are among them - I had hoped that it would be possible to implement them without a need for a new parameter at all, but then we would have to introduce some special syntax to the normal |title=. It's still a possibility, but when we are now tinkering with the idea of introducing a dedicated |math-title=, it is important to also think about more general descriptive titles. After all, a title for a textual math representation is some kind of descriptive title. Otherwise, we easily end up with a whole new bunch of special title parameters, something, I think, we both want to avoid. Therefore, it is a valid question how to possibly combine this at least in the design, even if not all parts of the actual solution would be implemented at the same time.

--Matthiaspaul (talk) 22:17, 15 September 2021 (UTC)

Hmm, some sources, when ASCIIfying article titles, appear to use TeX-like markup inside special markers like "##" or "$". Examples: [1] [2]. And here's an example of <em>...</em> where we'd probably want to use ''...'', but I think the em tag gets emitted in the final HTML: [3]. -- Beland (talk) 23:49, 13 September 2021 (UTC)

I'm not sure that the following is a good idea, myself, but one possible way through this would be to add a |text-title= parameter to the template to use as the text version of the title, and simultaneously to allow templates like {{frac}} or {{nowrap}} or whatever in titles when a text-title is present. That wouldn't address the inability to extract meaningful text from <math> formulas, but I'm sure Citation bot could be persuaded to add text-titles for those. One reason it's a bad idea is that the parameter would only produce invisible markup and therefore there wouldn't be much incentive for editors to make it accurate. —David Eppstein (talk) 00:34, 14 September 2021 (UTC)

Templates inside a cs1|2 template are expanded before the cs1|2 template is expanded. So, what cs1|2 sees when an editor writes:

|title=A {{frac|1|2}} Title

is this:

A '"`UNIQ--templatestyles-00000069-QINU`"'<span class="frac"><span class="num">1</span>&frasl;<span class="den">2</span></span> Title

The templatestyles stripmarker refers to Template:Fraction/styles.css which is where class="frac", class="num", and class="den" are defined. None of that styling is available to readers who consume the citation through the metadata. Module:Citation/CS1 might remove the stripmarker, all class= attributes, and any <span>...</span> tags without attributes:

A 1⁄2 Title

We might also strip other attributes, style= might be one, and remove other html tags.

The same mechanism applies when the editor writes this:

|title={{nowrap|don't wrap this text}}

which cs1|2 gets as this:

<span class="nowrap">don't wrap this text</span>

where the nowrap class is defined in MediaWiki:Common.css. cs1|2 would include this in the metadata:

don't wrap this text

If this is a workable solution and if the code that implements it doesn't take too much of the limited processor time, we will need a list of commonly used attributes and html tags that can be stripped from every parameter value that is made part of the metadata.

—Trappist the monk (talk) 11:46, 14 September 2021 (UTC)

I suspect that the <em>...</em> is inappropriate use where ... would have been a better choice. Apparently the $ is a standard part of LaTeX and TeX used to delimit the beginning and end of math text; using a standardized delimiter is always better than making up our own delimiters. I've changed my example above to use the $ delimiters.

—Trappist the monk (talk) 11:46, 14 September 2021 (UTC)

To clarify, $ delimits in-line math text, which TeX renders in a smaller font size. --Shmuel (Seymour J.) Metz Username:Chatul (talk) 14:48, 14 September 2021 (UTC)

Math text in a cs1|2 parameter that contributes to the citation's metadata is always inline math text so the $ delimiters are appropriate, right?

—Trappist the monk (talk) 15:07, 14 September 2021 (UTC)

The math in reference titles should only be inline, yes. $ ... $ for inline math and $$ ... $$ for display math is old-school TeX markup. The modern alternative (better for being less ambiguous wrt actual dollar signs, and also with some technical advantages in actual TeX for making it easier to hang hooks in the code) is \( ... \) for inline math and \[ ... \] for display math. The Wikimedia developers have vetoed allowing these to be shortcuts for math markup in the Wikimedia codebase, but I suppose that doesn't prevent them from being used in templates that intercept them and convert them to <math display=inline> ... </math> and <math display=block> ... </math> respectively. Would this actually work? Can math tags in template output still be expanded, or is math tag expansion only done before the templates are expanded? If this could be done in the existing |title= parameter, I think that would be better than introducing a new multiplicity of confusing variations of title parameters. —David Eppstein (talk) 18:32, 15 September 2021 (UTC)

For the same reasons it was vetoed there, use in the general parameter I think is a bad idea. Izno (talk) 20:47, 15 September 2021 (UTC)

Perhaps you could articulate those reasons, and convince me that it isn't one of (1) we are deliberately sabotaging LaTeX-based math because we still want people to use MathML instead, or (2) we don't care about whether mathematics works on Wikipedia because it isn't an important enough subset of our use and we don't want to pay the ongoing development costs of keeping it working? Because neither of those reasons applies to mathematics formulas in citation titles. —David Eppstein (talk) 21:14, 15 September 2021 (UTC)

Because you don't know what might be citation titles that look like your preferred <math> tag replacements. Izno (talk) 21:24, 15 September 2021 (UTC)

That would be a valid reason to avoid $ ... $ . It's not a valid reason to avoid \( ... \) because very few references use that syntax (very likely, zero references) and because if they do we can fall back to the format-as-typed escape codes already used elsewhere in the citation templates. —David Eppstein (talk) 22:09, 15 September 2021 (UTC)

(edit-conflict) Actually, if it would be possible to include the functionality of the proposed |math-title= (or whatever) into the normal |title=, as David suggests, this would be better from the user's perspective than to introduce a dedicated parameter for this. The question, however, is how often such $TeX$ stuff would conflict with normal titles. If collisions would be rather rare, we still have our ((accept-this-as-written)) syntax to force the template to take the title verbatim (which is already supported by |title= to override the removal of terminal punctuation).

If it can't be combined into the existing |title= parameter then the question is how at least text titles for math (which fall under the category of descriptive titles) can be combined with more general descriptive titles interfacewise, so that we eventually need only one new parameter rather than two for semantically close purposes.

Not thinking about this now at least conceptually is exactly what leads to incoherent interface design, something we should try to avoid.

--Matthiaspaul (talk) 22:22, 15 September 2021 (UTC)

ok, so here's a version of my test hack that uses the \( ... \) delimiters:

{{#invoke:Sandbox/trappist_the_monk/math|math_test2|math-title=Entropy-Based Uncertainty Measures for \(L^2(\mathbb{R}^n),\ell^2(\mathbb{Z})\), and \(\ell^2(\mathbb{Z}/N\mathbb{Z})\) With a Hirschman Optimal Transform for \(\ell^2(\mathbb{Z}/N\mathbb{Z})\)}}

math title: Entropy-Based Uncertainty Measures for

{\textstyle L^{2}(\mathbb {R} ^{n}),\ell ^{2}(\mathbb {Z} )}

, and

{\textstyle \ell ^{2}(\mathbb {Z} /N\mathbb {Z} )}

With a Hirschman Optimal Transform for

{\textstyle \ell ^{2}(\mathbb {Z} /N\mathbb {Z} )}

; metadata:

Entropy-Based Uncertainty Measures for \(L^2(\mathbb{R}^n),\ell^2(\mathbb{Z})\), and \(\ell^2(\mathbb{Z}/N\mathbb{Z})\) With a Hirschman Optimal Transform for \(\ell^2(\mathbb{Z}/N\mathbb{Z})\)

That's a real title I found somewhere (except that in its current guise it uses <math>...</math> tags).

<math>...</math> tags in parameter values are expanded into math stripmarkers before cs1|2 gets parameter values. After cs1|2 has rendered the citation, MediaWiki replaces each math stripmarker with its associated expansion. Using $...$ or \( ... \) instead of <math>...</math> tags allows us to apply <math>...</math> tags and then expand them into math stripmarkers (to be replaced by MediaWiki after cs1|2 final rendering) at the time of our choosing.

The only reasons that I can think of to not support this directly in |title= are that we have to inspect every |title= value for the \( ... \) delimiters and that it is possible that some title somewhere legitimately uses the TeX delimiters. Inspecting every |title= value is relatively inexpensive because all we have to look for is the opening \( delimiter, so: if Title:find ('\\%(') then ... end – attempt to convert delimiters to <math>...</math> tags only when a \( delimiter is present. I found only two instances of the opening \( delimiter; one is vandalism and the other a malformed title. It would not be so simple with the $...$ delimiters so if we proceed with this solution and choose to use $...$ delimiters, implementing |math-title= along-side |title= is the better choice.

—Trappist the monk (talk) 21:54, 15 September 2021 (UTC)
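
The cheap pre-check described in the comment above can be sketched outside the module. This is only an illustration in Python, not the sandbox code: the function name convert_tex_delimiters is hypothetical, and the handling of ((accept-this-as-written)) markup (strip the wrapper and skip conversion) is an assumption based on how that markup is described in this thread.

```python
import re

# Matches \( ... \) non-greedily; a bare ")" inside the math does not end the span.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)")


def convert_tex_delimiters(title: str) -> str:
    # Hypothetical helper: wrap each \( ... \) span in inline <math> tags.
    # Assumption: accept-as-written markup disables the conversion entirely.
    if title.startswith("((") and title.endswith("))"):
        return title[2:-2]
    # Cheap pre-check: only run the regex when an opening \( is present.
    if "\\(" not in title:
        return title
    # A function replacement keeps the backslashes in the TeX source literal.
    return INLINE_MATH.sub(
        lambda m: '<math display="inline">' + m.group(1) + "</math>", title
    )
```

Titles without an opening \( pass through untouched, so the cost for the vast majority of citations is a single substring search.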

I would be fine with \( ... \) to mark TeX blocks, for as long as our (( ... )) wrapping syntax would disable the feature.

--Matthiaspaul (talk) 22:46, 15 September 2021 (UTC)

\( ... \) TeX delimiters experiment removed

\( ... \) TeX delimiters experiment moved to Module:Citation/CS1/sandbox. I have disabled the <math>...</math> 'allowance' so parameters with <math>...</math> tags will emit the stripmarker error:

{{Cite book/new |title=<math display="inline">3987^{12} + 4365^{12} = 4472^{12}</math>}}

${\textstyle 3987^{12}+4365^{12}=4472^{12}}$ .

math in a book title:

{{Cite book/new |title=\(3987^{12} + 4365^{12} = 4472^{12}\)}}

\(3987^{12} + 4365^{12} = 4472^{12}\).

math in a book chapter:

{{Cite book/new |chapter=\(3987^{12} + 4365^{12} = 4472^{12}\) |title=Title}}

"\(3987^{12} + 4365^{12} = 4472^{12}\)". Title.

math in a journal title:

{{Cite journal/new |title=\(3987^{12} + 4365^{12} = 4472^{12}\) |journal=Journal}}

"\(3987^{12} + 4365^{12} = 4472^{12}\)". Journal.

math in a book title with accept-as-written markup:

{{cite book/new |title=((\(3987^{12} + 4365^{12} = 4472^{12}\)))}}

\(3987^{12} + 4365^{12} = 4472^{12}\).

math in a book title metadata:

'"`UNIQ--templatestyles-0000007F-QINU`"'<cite class="citation book cs1">''\(3987^{12} + 4365^{12} = 4472^{12}\)''.</cite><span title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=%5C%283987%5E%7B12%7D+%2B+4365%5E%7B12%7D+%3D+4472%5E%7B12%7D%5C%29&rfr_id=info%3Asid%2Fwiki.riteme.site%3AHelp+talk%3ACitation+Style+1%2FArchive+80" class="Z3988"></span>

—Trappist the monk (talk) ~~23:22, 16 September 2021 (UTC)~~ experiment removed; 14:56, 1 November 2021 (UTC)

Disabling the default <math> formatting in references is (1) going to cause a huge number of errors with existing citations, and (2) going to cause enormous confusion with editors who don't understand why the mathematics delimiters should be different in this context than everywhere else. I think emitting an error message is the wrong way to go. Better would be to continue to allow the math stripmarkers, but to put them into a tracking category so that a bot or AWB can run around behind the scenes converting the delimiters. I think such conversion is going to continue to be needed on a slow but ongoing basis, rather than being a one-time thing that can be enforced as an error once a change is made. —David Eppstein (talk) 00:54, 17 September 2021 (UTC)

Of course, the error message help would explain the problem, and the article is also put into category Category:CS1_errors:_invisible_characters. The COinS metadata for the citation with the MATH+RENDER+ERROR is still issued just like before (after all, it's still possible that MediaWiki will be fixed at some point in the future). I think that's exactly how it should be, but an alternative would be to change the error message into a warning.

--Matthiaspaul (talk) 01:17, 17 September 2021 (UTC)

We don't have any warning messages. If this change is accepted, I expect to remove the parts of Module:Citation/CS1/COinS that decoded the math stripmarker content – it won't be needed.

—Trappist the monk (talk) 01:24, 17 September 2021 (UTC)

For some unknown reason I often call our maintenance messages "warnings" - probably have worked with systems which could issue warnings and errors rather than maintenance and error messages... ;-)

Removing the code that decoded the math stripmarker contents would not affect the code for SVG and LaTeX math extraction, only for MathML, right?

This brings up the question of whether we should normalize the slightly different output from the SVG, LaTeX and now MathML via \( ... \) extraction methods, because, from a COinS consumer's perspective, that's our internal business and should always give identical output, shouldn't it?

--Matthiaspaul (talk) 19:10, 17 September 2021 (UTC)

If we keep this proposed solution, all of the code that decoded the math stripmarker contents should go away because we won't need to extract anything from SVG, LaTeX, or MathML, whichever of those the last publishing editor had selected for math rendering – everything we need is right there in the parameter value.

—Trappist the monk (talk) 20:00, 17 September 2021 (UTC)

But this holds true only for new edits; what about the old ones? And what if some editors continue to use the SVG or LaTeX methods? Should we prompt them with an error message and force them to rewrite the citation using \( ... \) although their edits did not actually cause us problems?

We should definitely keep the new solution, it's great. I'm just not sure the old handling should be removed (at least not until the new markup is fully established)...

--Matthiaspaul (talk) 22:03, 17 September 2021 (UTC)

I get the impression that you do not understand how the old mechanism works. When I edit an article that just happens to have a cs1|2 template with <math>...</math> markup in a |title= parameter, and then publish that article, the live cs1|2 module will create the metadata string for that citation (coins_replace_math_stripmarker()) using the math settings in my preferences because MediaWiki renders that math image into a stripmarker before cs1|2 gets the content of |title=. Since the stripmarker was created using my settings, the metadata will be derived from my settings. The resulting metadata are then cached for everyone until some other editor saves the article and their math preference setting is different from mine.

The cached metadata will remain as is until something causes MediaWiki to refresh the article. If/when the proposed \( ... \) TeX delimiters are introduced, as noted elsewhere in this discussion, an awb or some such script will be required to replace the <math>...</math> markup which will cause an article refresh and so new metadata using the \( ... \) TeX delimited wikitext straight from the appropriate parameter. Because we feed the metadata directly from the \( ... \) TeX delimited wikitext, there is no need (and no ability) to decode a math stripmarker so the code that decoded the math stripmarker content (even if it still worked) will no longer be needed and so should be removed. If we ever need it, we can always get it back from a previous version of the module.

—Trappist the monk (talk) 01:13, 18 September 2021 (UTC)

Thanks, Trappist. I think I understood the mechanism well, but it's always good to get reconfirmation by reading your explanation. What I did not understand was that you really meant a refresh run to be a mandatory part of the introduction process; instead I thought we'd leave that as optional (if some volunteer cares enough to do it) and otherwise the articles would remain with their old cached data until someone happens to edit them (which could take weeks, months, years).

I guess, there will still be editors who continue to use the <math>...</math> markup at least for a while and it would have been convenient for them if they could continue to use it for entries which either do not contribute to the metadata, or to entries contributing to metadata, if they have selected SVG or LaTeX, not MathML. However, this would put the burden to switch to \( ... \) on the next editor with MathML settings and would also leave the citation source code in a mix of markups, which might not be desirable for other parties which read our wikitext rather than metadata, so yes, I agree, a hard switch is probably the better approach here.

--Matthiaspaul (talk) 09:48, 20 September 2021 (UTC)

I don't know that a huge number of errors with existing citations is an accurate description. If these search results are to be believed, there are:

~650 articles that have <math>...</math> tags in |title=

~25 articles that have <math>...</math> tags in |chapter=

I think that editors will learn quickly enough about the change in markup if they can see an error message that has a link to an explanation; maintenance category messaging is hidden from all editors who have not chosen to show those messages.

If we are to keep this change it isn't difficult to write an awb script that will change the <math>...</math> tags to \( ... \) TeX delimiters. That script can be run on the same day that the change goes live (if it goes live) and be done after a couple of hours.

—Trappist the monk (talk) 01:24, 17 September 2021 (UTC)

This should also work for the title- and chapter-related |script-= parameters which become part of the metadata. Are there other parameters ending up in metadata where math-like constructs could show up occasionally? What about the journal name, work, author etc. names and publisher entries?

And what shall we do with other parameters which definitely might contain math, but are not part of metadata, like the corresponding |trans-= parameters, and |quote=, |script-quote= and |trans-quote=? Should we support the \(...\) syntax there as well as an alternative for syntax compatibility/consistency, or should we insist on <math> there?

--Matthiaspaul (talk) 19:10, 17 September 2021 (UTC)

If we keep this proposed solution, certainly all of the 'title-holding' parameters and the quotation parameters should support \( ... \) TeX delimiters for math markup. |journal=? |work=? |publisher=? |author=? I don't think so; at least not until a need has been sufficiently demonstrated. Quick searches for <math>...</math> tags in those parameters either timed out with no results or returned no results. We should only support one form of math markup.

—Trappist the monk (talk) 20:00, 17 September 2021 (UTC)

This conversation having died without consensus for the change, I have removed the \(...\) TeX delimiters experiment.

—Trappist the monk (talk) 14:56, 1 November 2021 (UTC)

Guidance for science etc.

Given the new proposed solution for <math>...</math> markup and the above comments, I'm wondering where we've come down on how to handle simple markup. I see contradictions between editors like "No unicode characters. Those are a blight, and should be purged on sight." vs. "exactly as the source information presents it, 'funny' characters and all". David Eppstein said titles should be formatted "the way the reference formatted it, even when our style guidelines would tell us to use a different style", but then used {{frac}} instead of ½. Our style guide says to use {{sfrac}} for science articles, so that seems to satisfy neither the goal of looking consistent with the style of body text nor the goal of being exactly the same as the original document for ease of search.

What are we proposing as the solution for simple markup, like a chemical formula? If we're following the sources exactly, we might use no-markup Unicode vs. <sub>...</sub> depending on what the original document does, though if it's on paper or PDF it will be impossible to tell. If we're avoiding Unicode compatibility characters, then we still have at least three choices:

Something about H₂O₂. {{cite book}}: templatestyles stripmarker in |title= at position 17 (help) - {{chem2|H2O2}} - Copy-paste: Something about H 2O 2
Something about \(H_2 O_2\). - \(H_2 O_2\) - Copy-paste: Something about H 2 O 2 {\textstyle H_{2}O_{2}}
Something about H₂O₂. - H<sub>2</sub>O<sub>2</sub> - Copy-paste: Something about H2O2
Something about H₂O₂. - Unicode subscripts - Copy-paste: Something about H₂O₂

Though it's unclear to me how well any database or web search engine is going to handle the difference between say, "H2O2" as a search parameter and an internally stored "H2O2". -- Beland (talk) 17:19, 18 September 2021 (UTC)

Certainly not {{chem2}}; your example renders this mishmash:

'"`UNIQ--templatestyles-0000008F-QINU`"'<span class="chemf nowrap">H<sub class="template-chem2-sub">2</sub>O<sub class="template-chem2-sub">2</sub></span>

The copypasta is not as you have shown it but actually like this because the markup includes <br /> tags:

Something about H
2O
2.

Most cs1|2 templates appear in reflists which reduce text size to 90% so a font size of 70% (already smaller than allowed for accessibility; see MOS:SMALL) is just harder to read.

Also, certainly not Unicode subscripts because the already-hard-to-read Unicode subscript characters are harder-to-read when made smaller in reflists.

—Trappist the monk (talk) 21:50, 18 September 2021 (UTC)

OK, that leaves the TeX-like syntax and HTML sup/sub tags. The TeX-like solution isn't ready yet, and presumably the COinS citation code could process simple <sub>...</sub> tags and friends cleanly? A rule like "use HTML sup/sub tags instead of Unicode subscripts and superscripts" would be easy to follow and easy to enforce, so I'm thinking maybe do that for now?

What about fractions like movie reviews of Naked Gun 33⅓: The Final Insult? Is it OK to use {{frac}} and {{sfrac}}? -- Beland (talk) 03:29, 20 September 2021 (UTC)

I discussed {{frac}} above at my 11:46, 14 September 2021 post.

Templates inside cs1|2 parameter values are expanded before cs1|2 gets the value so when an editor writes:

|title=Naked Gun 33{{sfrac|1|3}}

cs1|2 gets:

|title=Naked Gun 33'"`UNIQ--templatestyles-00000094-QINU`"'<span class="sfrac">&NoBreak;<span class="tion"><span class="num">1</span><span class="sr-only">/</span><span class="den">3</span></span>&NoBreak;</span>

The templatestyles stripmarker refers to Template:Sfrac/styles.css which defines the classes: sfrac, tion, num, den, and sr-only. As with {{frac}}, none of that styling is available to readers who consume the citation through the metadata so for them, the markup is just meaningless clutter.

—Trappist the monk (talk) 13:33, 20 September 2021 (UTC)

Not sure where the above discussion landed, exactly. It seems we could either put in code to strip that stuff out, or we could make a clean template. How about a {{citefrac}} that just does ¹⁄₂ (1&frasl;2)? If we decide later that COinS conventions have shifted and Unicode fractions are preferred, we can simply change that template rather than all the articles that use it. Actually, if we do universally adopt some sort of TeX-like syntax, having {{citefrac}} would also make it easy to switch over to that, too. -- Beland (talk) 01:00, 23 September 2021 (UTC)

Trappist and I suggested that we could attempt to filter/clean up/simplify the HTML before we create the title metadata, removing anything CSS-related, unnecessary attributes, empty elements, etc. This would not be a solution for complicated HTML, but would at least allow simple HTML markup in the title.

In addition to "simplified HTML" and the \( ... \) solution for math, I continue to maintain that we need something like an optional |descriptive-title= (or |text-title= per David) as a fallback, so that editors can use fancy stuff in |title= for pretty local display purposes (without compromising for COinS), while still being able to exactly match a title, if known, as it may be used in an external database (regardless of what representation or transliteration may be used there) so that it can be used as search pattern there as well.

--Matthiaspaul (talk) 08:42, 23 September 2021 (UTC)

A filter/cleanup/simplify solution may not be sufficient. In my sandbox I've hacked some code that removes:

stripmarkers
<br /> tags (used in {{chem2}})
class= attributes from <span> tags
style= attributes from <span> tags
title= attributes from <span> tags
extraneous whitespace
<span> without attributes and its matching </span>

For the simple cases:

Naked Gun 33{{code|{{sfrac|1|3}}}}

Naked Gun 33'"`UNIQ--syntaxhighlight-0000009B-QINU`"'

Naked Gun 33⁠13⁠

we get:

{{#invoke:Sandbox/trappist_the_monk/math|span_test|1=Naked Gun 33{{sfrac|1|3}}}}

Naked Gun 33&NoBreak;1/3&NoBreak;

Naked Gun 33⁠1/3⁠

but for complex cases:

{{chem2|[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2}}

'"`UNIQ--templatestyles-000000A3-QINU`"'<span class="chemf nowrap">&#91;{(η<sup>5</sup>-C<sub class="template-chem2-sub">5</sub>Me<sub class="template-chem2-sub">4</sub>)SiMe<sub class="template-chem2-sub">2</sub>(η<sup>1</sup>-NCMe<sub class="template-chem2-sub">3</sub>)}(PMe<sub class="template-chem2-sub">3</sub>)Sc(μ<sub>2</sub>-H)]<sub class="template-chem2-sub">2</sub></span>

[{(η⁵-C₅Me₄)SiMe₂(η¹-NCMe₃)}(PMe₃)Sc(μ₂-H)]₂

we get:

{{#invoke:Sandbox/trappist_the_monk/math|span_test|1={{chem2|[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2}}}}

&#91;{(η<sup>5</sup>-C<sub class="template-chem2-sub">5</sub>Me<sub class="template-chem2-sub">4</sub>)SiMe<sub class="template-chem2-sub">2</sub>(η<sup>1</sup>-NCMe<sub class="template-chem2-sub">3</sub>)}(PMe<sub class="template-chem2-sub">3</sub>)Sc(μ<sub>2</sub>-H)]<sub class="template-chem2-sub">2</sub>

[{(η⁵-C₅Me₄)SiMe₂(η¹-NCMe₃)}(PMe₃)Sc(μ₂-H)]₂

Maybe that's good enough, I don't know, but is this good enough?

{{chem2|C2H3O2(-)}}

'"`UNIQ--templatestyles-000000AB-QINU`"'<span class="chemf nowrap">C<sub class="template-chem2-sub">2</sub>H<sub class="template-chem2-sub">3</sub>O<span class="template-chem2-su"><span>−</span><span>2</span></span></span>

C₂H₃O−2

and stripped of markup:

{{#invoke:Sandbox/trappist_the_monk/math|span_test|1={{chem2|C2H3O2(-)}}}}

C2H3O−2

C₂H₃O−2

—Trappist the monk (talk) 14:05, 23 September 2021 (UTC)
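
For readers who want to experiment with the clean-up rules listed in the comment above, here is a rough Python sketch. It is not the sandbox code: the name strip_markup and the regular expressions are assumptions, and it handles only span tags, br tags, and the three attributes named in the list.

```python
import re

# Stripmarker as it appears in the examples on this page (DEL bytes optional).
STRIPMARKER = re.compile("\x7f?'\"`UNIQ--[\\w-]+?-QINU`\"'\x7f?")
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
SPAN_TAG = re.compile(r"<(/?)span\b([^>]*)>")
DROP_ATTR = re.compile(r'\s*(?<![\w-])(?:class|style|title)="[^"]*"')


def strip_markup(value: str) -> str:
    # Hypothetical filter: drop stripmarkers and <br /> tags first.
    value = STRIPMARKER.sub("", value)
    value = BR_TAG.sub("", value)
    out, kept, pos = [], [], 0
    for tag in SPAN_TAG.finditer(value):
        out.append(value[pos:tag.start()])
        pos = tag.end()
        if tag.group(1):
            # Closing tag: keep it only if its opening tag was kept.
            if not kept or kept.pop():
                out.append("</span>")
        else:
            # Opening tag: remove class=, style=, title=; drop the tag if bare.
            attrs = DROP_ATTR.sub("", tag.group(2)).strip()
            kept.append(bool(attrs))
            if attrs:
                out.append("<span " + attrs + ">")
    out.append(value[pos:])
    # Collapse extraneous whitespace.
    return re.sub(r"\s+", " ", "".join(out)).strip()
```

Fed the {{sfrac}} expansion quoted earlier in this thread, the sketch yields Naked Gun 33&NoBreak;1/3&NoBreak;, and it leaves sub and sup tags alone, as in the {{chem2}} results shown above. A stack is used so that a span that keeps some other attribute also keeps its matching close tag.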

Not sure if this would be reasonably safe for the general case, but in the examples above the result could be further improved if we would insert a   when a  gets eliminated, and when the text before and after a x y to be eliminated would be framed in y and x (perhaps only if inside a class="chemf"?).

--Matthiaspaul (talk) 20:13, 23 September 2021 (UTC)

I've been thinking about your {{chem2|C2H3O2(-)}} example a bit, and ideally, the stripped markup in that example should look just like the input. The "2" subscript should bind closer to the "O" than the "-" charge. — sbb (talk) 01:22, 24 September 2021 (UTC)

I'm sure this could be further improved, but I've hacked a "CS1/CS2-compatible" version of {{chem2/sandbox}} [4], which creates hidden metadata reproducing the input, see modified chem2 (this isn't the best possible metadata that could be created, but it was trivial to implement for our illustration purposes here). This is similar to how metadata is embedded into SVG and LaTeX math (which the current implementation of CS1/CS2 can extract already). CS1/CS2 templates could be easily made aware of this extension, so they could extract this data as well and use it for their metadata. This way, templates like {{chem2}}, which produce really difficult output for the HTML simplifier, could actively assist CS1/CS2 in their metadata creation. Over time, more templates could be enhanced this way and thereby be made "CS1/CS2-compatible".

--Matthiaspaul (talk) 10:01, 25 September 2021 (UTC)

If this is to be adopted and used by cs1|2, it would probably be best to not anchor-encode the {{chem2/sandbox}} input. Why anchor encoding? Before cs1|2 parameter values are added to the metadata, they are percent-encoded so, in its present form, what the metadata will get is:

[Cl4Re\qReCl4](2−) ← {{chem2/sandbox|[Cl4Re\qReCl4](2−)}} – anchor encoding

%26%2391%3BCl4ReqReCl4%26%2393%3B%282%E2%88%92%29 – percent encoding of the anchor-encoded input

when the metadata should get:

%5BCl4ReqReCl4%5D%282%E2%88%92%29 ← [Cl4Re\qReCl4](2−) – percent encoding

But, that's not wholly correct because the \q is treated as an escaped q so the result is missing the \ (%5C). This particular {{chem2}} input needs to be tweaked to escape the back slash:

%5BCl4Re%5CqReCl4%5D%282%E2%88%92%29 ← [Cl4Re\\qReCl4](2−) – percent encoding

And, I gotta wonder, are the input symbols that {{chem2}} accepts standardized so that consumers of the metadata will know what they mean when the metadata are decoded? If not then those symbols need to be replaced with the actual thing that they represent, don't they?

—Trappist the monk (talk) 12:09, 25 September 2021 (UTC)
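
The double-encoding effect described in the comment above can be reproduced with Python's standard library. This is only a stand-in for whatever encoder the module uses; the anchor-encoded input string is inferred from the percent-encoded result quoted above.

```python
from urllib.parse import quote


def percent_encode(value: str) -> str:
    # safe="" so that every reserved character, including "/", is encoded.
    return quote(value, safe="")


# Input after anchor encoding (brackets as numeric character references).
anchor_encoded = "&#91;Cl4ReqReCl4&#93;(2\u2212)"
# Input as it should reach the encoder, without and with the escaped backslash.
plain = "[Cl4ReqReCl4](2\u2212)"
escaped = "[Cl4Re\\qReCl4](2\u2212)"

for label, text in (("anchor", anchor_encoded), ("plain", plain), ("escaped", escaped)):
    print(label, percent_encode(text))
```

The three printed strings correspond to the three percent-encoded values quoted in the comment: the ampersand, hash, and semicolon of each character reference are themselves encoded in the first case, which is why the extractor would have to decode before re-encoding.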

Yeah, this is still a demo for illustration purposes. Anchor-encoding isn't optimal here (but was handy for the quick demo). We might switch to a more suitable encoding. In either case, the extractor would have to decode it back before further processing, otherwise we would get the overencoded results as illustrated by you above. The encoding would have to ensure that no spaces remain in the string. " would be a forbidden character as well. IIRC, the allowed charset for class names is (or was) limited in some HTML versions (would have to look this up), so we would have to make sure to not use other reserved characters as well.

(Also, I think, the scheme could be further improved if we would not only embed the desired metadata output (i.e. MeTaDaTa-OuTpUt:), but optionally also the (parameter) input (i.e. MeTaDaTa-InPuT:). This might allow the extractor not only to replace the complete output of a template by its metadata (as in the current {{chem2}} example) but allow metadata fragments to be inherited from internally called templates instead of having to handle everything monolithically on the level of the outer template (example: {{chem}}, which internally uses {{su}} - still thinking about the details...))

are the input symbols that {{chem2}} accepts standardized so that consumers of the metadata will know what they mean when the metadata are decoded? I guess, this very much depends on the template, so even if this would be a standard notation in this particular case (I don't know if it is), it probably won't be in the general case. However, this is still a demo with the main purpose to illustrate how easy it would be to enhance templates in general. In a proper implementation, {{chem2}} would probably not just forward its own input as metadata, but actually generate the metadata by processing the input (like it does for its normal output, but) in a form which would be text-only or use only very simple markup. What can be considered to be the best metadata very much depends on the purpose/function of the template. The advantage of this approach would be that the developers or users of the template probably know best what is the optimal text-only metadata that can be generated from the input (developers would program the template to generate the optimal metadata for the context the template is used in, and users would always be able to override it using the |metadata= parameter), whereas the generic HTML simplifier in CS1/CS2 has no knowledge on the context and semantics and can only simplify based on universal structural rules.

--Matthiaspaul (talk) 13:53, 25 September 2021 (UTC) (updated 09:33, 26 September 2021 (UTC))

I've meanwhile simplified the magic and changed the encoding from anchor-encoding (which was shorter and more human-readable, but also combined various different space characters into one type and therefore was not fully reversible) to a combination of text-decoding (to level the playing-field also for input containing HTML entities) and percent-encoding (for transparent transportation of the metadata, in particular to encode the invalid space and quote characters). HTML 4 seems to have had more restrictions on the character set used in class names (including the percent-sign which is used by percent-encoding), but they have gone with HTML5 (they are still valid for CSS, but we don't use this "MeTaDaTa:" dummy-class for CSS purposes, so it doesn't affect us).

(BTW. mw.text.decode() has a bug as it properly processes   but ignores  . I have added a workaround at least to the wrapper: Module:DecodeEncode.decode.)

--Matthiaspaul (talk) 12:10, 26 September 2021 (UTC)
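The encoding step described above (decode HTML entities first, then percent-encode so that no space or quote character survives in the class name) can be illustrated with a minimal Python sketch. This is not the Lua code used on-wiki; the function names `encode_metadata` and `decode_metadata` are hypothetical and only the "MeTaDaTa::" marker is taken from the demo.

```python
from html import unescape
from urllib.parse import quote, unquote

MAGIC = "MeTaDaTa::"  # marker string taken from the demo; everything else is assumed


def encode_metadata(text: str) -> str:
    """Return a class-name token carrying `text` as percent-encoded metadata."""
    decoded = unescape(text)  # level the playing field: resolve HTML entities first
    # safe="" also encodes "/", so only unreserved characters stay literal
    return MAGIC + quote(decoded, safe="")


def decode_metadata(token: str) -> str:
    """Reverse encode_metadata(); the token must start with the marker."""
    if not token.startswith(MAGIC):
        raise ValueError("not a metadata token")
    return unquote(token[len(MAGIC):])


token = encode_metadata("&thinsp;1&nbsp;2/3")
```

Because the thin space and the no-break space are encoded as distinct byte sequences, the round trip is reversible, which was the stated reason for moving away from anchor-encoding.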

Also not a general solution out of the box, but one that could be used to make template output metadata-safe:

We could add code to critical templates like {{chem}} or {{chem2}} so that they issue their input in a HTML title= attribute. We could then use this instead of the actual HTML for metadata purposes (similar to what we do with math SVG and LaTeX extraction). Given that the HTML title= might be used by the template for other purposes already, and that it is also shown to users as tooltip (which might not be desirable if it contains stuff like "[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2"), I am using the title= attribute only for illustration purposes here and we might find another HTML attribute or establish a special "steganographic" notation where/how we could transparently hide those entries for possible extraction by CS1/CS2. Templates might even have a standardized optional parameter like |metadata= to override what the template would otherwise use for this. Templates enhanced this way could get a sticker like "CS1/CS2-compatible" or such. Sure, this would work only for those templates which have been enhanced this way, but all we would have to do now is to specify a standard for this and implement a generic extraction mechanism which would take over whenever CS1/CS2 finds this special HTML attribute/notation in a citation's title. Over the years more and more templates could be adapted accordingly.

--Matthiaspaul (talk) 12:19, 24 September 2021 (UTC)

To further illustrate this, this could be a span framing the normal template output of a template like {{chem2}} (following a similar idea as COinS, but for our internal purposes only):

normal_template_output

If our template metadata extractor would run into something like this (triggering on the "MeTaDaTa" magic), it would replace the whole span including normal_template_output (and, if present, also the corresponding stripmarker) by what follows the :: following the MeTaDaTa (which probably needs to be encoded in an actual implementation). For a template call like {{chem2|[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2}} this would result in [{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2. Would the template be called like

{{chem2|[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2|metadata=This is a text-only transcription of the chemical formula}}

instead, it would result in This is a text-only transcription of the chemical formula. |metadata=off/none would disable the metadata (nothing would be following the "::" then). If the extractor does not find the triggering magic, or if the extracted data would be an empty string, it would proceed with the HTML simplification demoed above...

--Matthiaspaul (talk) 12:50, 24 September 2021 (UTC) (updated 09:33, 26 September 2021 (UTC), updated 18:40, 6 October 2021 (UTC))
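The extractor behaviour described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the CS1/CS2 implementation: `replace_with_metadata` is a hypothetical name, the payload is assumed to be percent-encoded, only the first carrier span is handled, and stripmarker removal is left out.

```python
import re
from urllib.parse import unquote

# Any opening or closing span tag, used to find the end of the carrier span.
SPAN_TAG = re.compile(r"<span\b[^>]*>|</span>")
# An opening span whose class attribute contains a "MeTaDaTa::payload" token.
MAGIC_SPAN = re.compile(
    r'<span\b[^>]*\bclass="[^"]*\bMeTaDaTa::([^\s"]*)[^"]*"[^>]*>'
)


def replace_with_metadata(html: str) -> str:
    """Replace the first metadata-carrying span, content included, by its payload."""
    m = MAGIC_SPAN.search(html)
    if m is None:
        return html  # no magic found: caller proceeds with HTML simplification
    payload = unquote(m.group(1))
    if not payload:
        return html  # metadata switched off: also fall back
    depth = 0
    for tag in SPAN_TAG.finditer(html, m.start()):
        depth += -1 if tag.group().startswith("</") else 1
        if depth == 0:  # matching close of the carrier span
            return html[:m.start()] + payload + html[tag.end():]
    return html  # unbalanced markup: leave untouched
```

Returning the input unchanged in the "no magic" and "empty payload" cases mirrors the fallback described above, where the generic HTML simplification takes over.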

This is what a CS1/CS2-compatibly modified {{frac}} template [5] could look like:

{{frac/sandbox|1|2|3}}

'"`UNIQ--templatestyles-000000BD-QINU`"'<span class="frac MeTaDaTa::%E2%80%891%C2%A02%2F3" role="math">1<span class="sr-only">+</span><span class="num">2</span>&frasl;<span class="den">3</span></span>

Visual rendering: 12⁄3

Extractable metadata: " 1 2/3"

{{frac/sandbox|1|2|3|metadata=Custom-Metadata}}

'"`UNIQ--templatestyles-000000C2-QINU`"'<span class="frac MeTaDaTa::Custom-Metadata" role="math">1<span class="sr-only">+</span><span class="num">2</span>&frasl;<span class="den">3</span></span>

Visual rendering: 12⁄3

Extractable metadata: "Custom-Metadata"

{{frac/sandbox|1|2|3|metadata=off}}

'"`UNIQ--templatestyles-000000C6-QINU`"'<span class="frac MeTaDaTa::" role="math">1<span class="sr-only">+</span><span class="num">2</span>&frasl;<span class="den">3</span></span>

Visual rendering: 12⁄3

Extractable metadata: "" so it will be ignored

Same for {{sfrac}} [6]:

{{sfrac/sandbox|1|2|3}}

'"`UNIQ--templatestyles-000000CA-QINU`"'<span class="sfrac">&NoBreak;1<span class="sr-only">+</span><span class="tion"><span class="num">2</span><span class="sr-only">/</span><span class="den">3</span></span>&NoBreak;</span>

Visual rendering: ⁠123⁠

Extractable metadata: " 1 2/3"

{{sfrac/sandbox|1|2|3|metadata=Custom-Metadata}}

'"`UNIQ--templatestyles-000000CE-QINU`"'<span class="sfrac">&NoBreak;1<span class="sr-only">+</span><span class="tion"><span class="num">2</span><span class="sr-only">/</span><span class="den">3</span></span>&NoBreak;</span>

Visual rendering: ⁠123⁠

Extractable metadata: "Custom-Metadata"

{{sfrac/sandbox|1|2|3|metadata=off}}

'"`UNIQ--templatestyles-000000D2-QINU`"'<span class="sfrac">&NoBreak;1<span class="sr-only">+</span><span class="tion"><span class="num">2</span><span class="sr-only">/</span><span class="den">3</span></span>&NoBreak;</span>

Visual rendering: ⁠123⁠

Extractable metadata: "" so it will be ignored

Similar for {{chem2}} [7]:

{{chem2/sandbox|[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2}}

'"`UNIQ--templatestyles-000000D6-QINU`"'<span class="chemf nowrap">&#91;{(η<sup>5</sup>-C<sub class="template-chem2-sub">5</sub>Me<sub class="template-chem2-sub">4</sub>)SiMe<sub class="template-chem2-sub">2</sub>(η<sup>1</sup>-NCMe<sub class="template-chem2-sub">3</sub>)}(PMe<sub class="template-chem2-sub">3</sub>)Sc(μ<sub>2</sub>-H)]<sub class="template-chem2-sub">2</sub></span>

Visual rendering: [{(η⁵-C₅Me₄)SiMe₂(η¹-NCMe₃)}(PMe₃)Sc(μ₂-H)]₂

Extractable metadata: "[{(\h{5}C5Me4)SiMe2(\h{1}NCMe3)}(PMe3)Sc(\m{2}H)]2" (could be further improved by improving the metadata generated in the {{chem2}} template)

{{chem2/sandbox|C2H3O2(-)}}

'"`UNIQ--templatestyles-000000DA-QINU`"'<span class="chemf nowrap">C<sub class="template-chem2-sub">2</sub>H<sub class="template-chem2-sub">3</sub>O<span class="template-chem2-su"><span>−</span><span>2</span></span></span>

Visual rendering: C₂H₃O−2

Extractable metadata: "C2H3O2(-)" (could be further improved by improving the metadata generated in the {{chem2}} template)

And for {{nowrap}} [8].

--Matthiaspaul (talk) 22:40, 24 September 2021 (UTC) (updated 11:45, 26 September 2021 (UTC), updated 18:40, 6 October 2021 (UTC))

An example of a variant of this rudimentary "inter-template communication model" using special tokens instead of or in addition to providing alternative metadata can be found illustrated for a modified sic template.

--Matthiaspaul (talk) 19:48, 6 October 2021 (UTC)

I don't know that there is sufficient support to proceed. Editor David Eppstein objects to the $...$ markup so that may go away. Removing certain html markup may or may not be adequate; I don't know, I'm not a chemist so I don't know if the resulting output to the metadata would be at all useful.

—Trappist the monk (talk) 14:05, 23 September 2021 (UTC)

This appears to be the thread of misunderstandings... I didn't understand David as objecting to the $...$ at all, quite the contrary. He was concerned that issuing a visible error message would upset people, but that was before we discussed alternatives.

Regarding HTML cleanup, I'm no chemist either, but even if it might not be good enough to allow for really nasty things such as {{chem2}}, it would certainly be an improvement for many of the simple cases like the example:

Theory of free, spin-<span class="frac" role="math"><span class="num">1</span>⁄<span class="den">2</span></span> tachyons

→ Theory of free, spin-1⁄2 tachyons

--Matthiaspaul (talk) 16:11, 23 September 2021 (UTC)

Perhaps we should just change {{frac}} to use <sup> and <sub> instead of <span>s? The sup/sub can be styled for on-Wiki use, and stripped of class attributes that are meaningless to COinS consumers. — sbb (talk) 01:26, 24 September 2021 (UTC)

Frac deliberately uses spans because sub and sup do not match the intent of their content. Do not abuse HTML please. Izno Public (talk) 14:40, 26 September 2021 (UTC)

Format citations differently than body text

I don't see a contradiction between rendering a title as it appears in the work, and avoiding Unicode. The two are unrelated. Exact rendering in citations does not mean exactitude in presentation items like typefaces. It does include items with semantic significance. A formula will be recognized as such regardless of the typeface used, including the easy-to-read sans-serif typefaces traditionally used in citations. There is an issue with items such as subscript/superscript readability, but I believe it is minor, and there are ways around it.

The main thing is this: I would not use WP:MOS or any other Wikipedia wikitext formatting guideline as yardsticks. Citations are not wikitext, and should be formatted according to their own requirements. Whether the cited material is a science item or not makes no difference. 64.18.9.208 (talk) 00:04, 19 September 2021 (UTC)

Though the above examples do render in different typefaces, that's beside the point. Behind the scenes they use different representations. Though "₂" and "<sub>2</sub>" might look like the same number in different fonts, in fact they are different representations: the first is a single Unicode character, and correctly interpreting the second one requires parsing HTML.

It's fine if there are special cases like coping with COinS, but suspending all MOS rules when it comes to citations would make the encyclopedia look quite unprofessional, increase skepticism, and make it a little harder to read. It would have far-reaching implications, and I don't think that's within the scope of what's being proposed. -- Beland (talk) 03:29, 20 September 2021 (UTC)

Wikipedia is un-professional, in both meanings of the term. First, as a general-purpose encyclopedia for non-expert readers. Secondly, as a project whose majority of content is unverified and therefore, drivel. As was remarked in another discussion here, the so-called "good articles" are a minuscule minority. So this is a project that unflinchingly publishes information that fails to pass its own mark. Which brings up the question of whether any "good article" criteria such a project decides upon can be trusted.

With such basis, it is imperative imo that citations stand apart, as the only way of turning the garbage heap into reliably usable information. They should not at all resemble wikitext body. They should stand out as its proof. Keeping in mind the target audience, they should be clear, unadorned, and easy to read, so that they can be easily found. It doesn't matter to a reader exactly how a subscript is rendered, only that it is. Use any easy-to-understand rendering, even if it is literal ("subscript 2"). Experts may cringe, but Wikipedia's audience will probably thank you.

Users (editors) I assume have similar expectations of the developers, but that would be a separate comment. 66.108.237.246 (talk) 13:05, 20 September 2021 (UTC)

OK, I guess we just have different goals, then. I'm working toward a professional-looking, credible, well-referenced, and verified encyclopedia. Writing something like "subscript 2" not only looks horrible, it's harder to understand (especially for students) than simply having a ₂. Doing a search in a journal article database on "subscript 2" will almost certainly give bad results. -- Beland (talk) 00:47, 23 September 2021 (UTC)

The use of the literal "subscript 2" was offered only as an example of a non-expert reader's pov, and not as a concrete suggestion. Obviously the proper way would be the title verbatim, semantically, and this is a discussion that should end with the technical minutiae, not start with them. To again point out the obvious, no student should be using Wikipedia for anything related to their work. It is normally (in the statistical sense) unreliable, improperly or not at all referenced, badly edited, and of dubious neutrality. And this is just the reader-facing pages. Any work outside of fixing these is just dressing up garbage in a pretty dress. May I suggest a beginning? Remove the "good articles" category and project. Assuming for the moment that the present "good article" criteria are valid, "good articles" are such a dismal minority, it is embarrassing. Instead add a warning to all other articles. Because the normal, obvious expectation of an encyclopedia is that it publishes good articles. Making clear (on a prominent and continuing basis) the fact that this is currently an unprofessional project is the greatest possible service to readers. And it is going to put everyone interested in fixing it in the right frame of mind. 64.18.9.196 (talk) 02:27, 23 September 2021 (UTC)

Moving toward a resolution

Are there any objections or comments on doing the following to get the ball rolling:

Use <sub>...</sub> and <sup>...</sup> instead of {{chem}} and {{chem2}} for chemistry formulas in citations.
Use {{citefrac}} instead of {{frac}} or {{sfrac}} for vulgar fractions in citations.

?

I don't often see more complicated math in citations, but would we want to make a {{citemath}} that uses <math>...</math> for now and can be switched over to $...$ when that change is ready to be deployed? (And then easily changed again later if handling of TeX-like math formulas needs to change.) -- Beland (talk) 01:19, 23 September 2021 (UTC)

Follow WP:MOS, if COinS chokes, COinS chokes. Headbomb {t · c · p · b} 02:32, 23 September 2021 (UTC)

Well, the question is what the MOS should recommend. It seems like these techniques would allow us to keep a consistent style with body text without changing the MOS recommendation for that, and without causing COinS to choke. -- Beland (talk) 00:36, 24 September 2021 (UTC)

I might be missing the point, but if we want to make as few compromises regarding COinS-compatibility as possible, I don't see how something like {{citefrac}} would actually improve the situation in general (besides that it is nice to know if a template is CS1/CS2-safe or not). As far as I understood you, this is meant to be a "lightweight" version for vulgar fractions to be used in citations. But while being lightweight it may produce more COinS-compatible output (which is good), it will also produce less nice-looking titles in citations (which is not so good)...

IMO it is (more) desirable (because more flexible and universal) to try and clean up the HTML automatically, as demonstrated above. This won't work in all possible cases, but the results shown by Trappist above are already quite good IMO, and with a few more tweaks could become quite useable for our purposes. This and the $ ... $ trick for math are IMO a significant improvement over the current state of affairs. For those cases where the processing would not be desirable and we would want to pass the title to the metadata unchanged, we have our ((accept-this-as-it-is)) markup. For those cases, where we have both, a nice-looking title for local display and a (not so nice looking) alternative title used in external databases we want to match exactly to improve searches, we would have |descriptive-title= (which would also be useful for many other purposes, as mentioned further above). With this in place, we may need only a few general recommendations how to provide titles instead of having to address this explicitly in the MOS.

However, given that some significant developer efforts have been trashed by the "mob" recently, and thereby precious developer resources burnt, it would be great if those who agree with these proposals could actually indicate this instead of just remaining silent (as happens to be the case here so often). This would help to convince the developers that it is worth devoting their volunteer time to these and other things.

--Matthiaspaul (talk) 11:41, 24 September 2021 (UTC)

@Matthiaspaul: Well, if some day the code is implemented to make regular {{frac}} COinS-friendly, we could always just drop the exception from the MOS and redirect {{citefrac}} to it or bot-substitute. If someone wants to say, "hey I'll have that code done in the next couple weeks", I'm happy to wait. If not, then while we're waiting for "some day" it would be good to make progress cleaning up on all the existing Unicode superscripts and subscripts in citations that no one seems to want. You did mention this solution produces "less nice-looking titles". I'm trying to make a lightweight solution that looks exactly the same as {{frac}}. Could you explain what differences you see, maybe with an example? Maybe there's a lightweight fix. -- Beland (talk) 18:20, 24 September 2021 (UTC)

Let's have a look at what Trappist's demo further above can do already:

Example:

{{frac|1|2|3}}

produces this non-sensible code in the citation's |title=, prompting the current implementation to throw a stripmarker error message:

'"`UNIQ--templatestyles-000000DF-QINU`"'<span class="frac">1<span class="sr-only">+</span><span class="num">2</span>&frasl;<span class="den">3</span></span>

which would be rendered by a browser as part of a citation in Wikipedia as:

12⁄3

The proposed generic "HTML simplifier" would derive the following code for metadata purposes, which could be passed on as COinS-metadata:

1+2&frasl;3

or with a bit more tweaking:

1+2⁄3

or even:

1+2⁄3

This is not perfect for human consumption but much better than the original code already. A user would be able to make sense out of it (although it may not necessarily match the work's title used in external databases, for which we would need |descriptive-title=). Assuming the COinS consuming entity would be able to process HTML, a HTML engine at their end would make this out of the simplified HTML:

1+2⁄3

Let's assume the {{frac}} template would have been made "CS1/CS2-compatible" following my proposed "template internal metadata" demo above, the metadata extractor could, for example, get this even more text-only result:

1 2/3

for which no HTML engine would be needed at the receiver's end.

Regarding existing Unicode superscripts and subscripts, while I agree that the HTML sub- and superscripts look nicer if used in formulas and are generally to be preferred, in non-scientific articles an occasionally interspersed Unicode super- or subscript character in citation titles might not be a bad idea at all. At least they are COinS-safe out of the box and neither require a HTML engine at the receiver's end nor a TeX-savvy human to be decoded. I would not use them in technical articles, but also would not want to ban them in non-technical articles. So, it all depends on the context IMO. What does this mean in regard to MOS or more-citation related guidelines? We could offer some generally recommended best practices there, but we should not rule out any of the possible formats in general. And what does that mean in regard to CS1/CS2? We will have to cope with whatever editors throw at us, therefore we probably need all, special $ ... $ markup, HTML simplifier, template internal metadata, and |descriptive-title= to cope with all possible cases optimally.

Regarding "hey I'll have that code done in the next couple weeks", the next CS1/CS2 update isn't scheduled yet but I guess it could be in mid-October.

--Matthiaspaul (talk) 22:04, 24 September 2021 (UTC)
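As a rough illustration of what a generic "HTML simplifier" does with the {{frac}} output quoted above, here is a minimal Python sketch using only the standard library. It is not Trappist's demo code: the names `simplify_html` and `STRIPMARKER` are hypothetical, and this version drops all tags, whereas the real simplifier might keep some simple markup.

```python
import re
from html.parser import HTMLParser

# Strip markers look like '"`UNIQ--templatestyles-000000DF-QINU`"',
# optionally wrapped in DEL (0x7F) characters.
STRIPMARKER = re.compile("\x7f?'\"`UNIQ--[^`]*-QINU`\"'\x7f?")


class _TextOnly(HTMLParser):
    """Collect character data only; every tag is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &frasl; becomes U+2044
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def simplify_html(title: str) -> str:
    """Return `title` without strip markers and without HTML tags."""
    parser = _TextOnly()
    parser.feed(STRIPMARKER.sub("", title))
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Applied to the {{frac|1|2|3}} markup, this keeps the screen-reader "+" and yields the digits joined by a fraction slash, which needs no HTML engine at the receiving end but still differs from the hand-tuned text that template-internal metadata could supply.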

To reiterate what was stated a couple of times above, this is looked at from the wrong end. The only resolution to this is the one that maximizes the utility of the citation to the average reader. Experts, practitioners and students in every academic field have vast resources available to them, resources that are properly vetted. The average person would mostly or only have Wikipedia, but here's the tendency to make this one resource too some sort of experts' preserve (albeit an unvetted one). Why not start from what the reader sees? Make that as clear and faithful to the original as possible. Then decide on the tools/guidance that editors should have in order to implement the reader requirements. Keep the guidance straightforward and tight. This is only a set of special cases of a single parameter in a sprawling module collection. Largesse in editor choices cannot be afforded in everything. Editors will have to learn to throw what the guidance states. Once the tools and guidance for editors are decided, the module could theoretically be developed in a hopefully carefully designed, rational, bug-free way. But all development is theoretical. Because there are unresolved issues regarding any CS1/CS2 development. 68.173.76.118 (talk) 00:07, 25 September 2021 (UTC)

Everywhere else I can think of, instead of making formatting appear similar to where we're quoting from or citing, we make the formatting consistent on the Wikipedia side. That's what professional publications generally do unless they're showing an actual picture of the original. If we weren't doing that, we wouldn't have MOS guidance on fractions at all, and pages would look somewhat messier. -- Beland (talk) 07:39, 15 October 2021 (UTC)

@Matthiaspaul: It's mid-October. Any update on {{frac}} being made COinS-friendly? I'm not sure what you said above explained why {{citefrac}} looks not as nice as {{frac}}, but perhaps I'm missing something? -- Beland (talk) 07:39, 15 October 2021 (UTC)

Did you see my proposal to let known-to-be-problematic templates like {{frac}}, {{sfrac}}, {{chem}}, {{chem2}} etc. actively assist CS1/CS2 in its metadata creation (because the local developers of these templates know best how to translate whatever these templates are designed for in plaintext or simple HTML)? This would not replace the "general HTML simplifier" for those templates which have not been enhanced with this "template internal metadata" feature, but at least those templates which were enhanced accordingly would then produce perfectly nice output for display purposes and perfectly simplified but semantically correct plaintext (or simple HTML) as metadata. I proposed a general structure for this and also illustrated how this would be future-compatible and flexible enough to be further enhanced in other, semantically more abstract ways in the future.

IMO this, combined with the other proposed bits (the general HTML simplifier, the $...$ markup and the |descriptive-title=), would allow us to address all aspects of the problem in the best-possible way without putting restrictions on users which templates or math markup they can use in citations, so that they can use what is best (based on their editorial capabilities) to produce the desired nice-looking output in rendered citations, but still would produce (or at least allow to produce) perfectly simplified and semantically correct metadata at the same time.

The $...$ markup and general HTML simplifier have been implemented by Trappist already, although both could be further improved (as discussed). I have shown a "mockup" of the hidden "MeTaDaTa" feature. It would be ready for actual implementation, but I don't want to spend time on it if I get reverted by one of those ninja fighters who either don't participate in the discussions seeking solutions or only complain about inadequacies without proposing better solutions to the problems. My limited time is too precious for this. Waiting for positive feedback...

--Matthiaspaul (talk) 12:56, 15 October 2021 (UTC)

@Matthiaspaul: If everyone else is happy with that solution, I have no objection. I will start using those templates in citations. -- Beland (talk) 00:54, 26 October 2021 (UTC)

extraneous punctuation

I tweaked this citation today and introduced an extraneous = character in |newspaper==Duluth News Tribune. But, I did not see it so the article got published with my error. I expect that I'm not the only one to have done that. So, I've tweaked the extraneous punctuation test:

Cite news comparison
Wikitext	`{{cite news|archive-date=2013-01-21|archive-url=http://archive.today/20130121173250/http://www.duluthnewstribune.com/event/obituary/id/163733/|department=Obituaries|location=Duluth, MN|newspaper==Duluth News Tribune|title=Lynn Diane (Swapinski) Jurek|url=http://www.duluthnewstribune.com/event/obituary/id/163733/}}`
Live	"Lynn Diane (Swapinski) Jurek". Obituaries. =Duluth News Tribune. Duluth, MN. Archived from the original on 2013-01-21.`{{cite news}}`: CS1 maint: extra punctuation (link)
Sandbox	"Lynn Diane (Swapinski) Jurek". Obituaries. =Duluth News Tribune. Duluth, MN. Archived from the original on 2013-01-21.`{{cite news}}`: CS1 maint: extra punctuation (link)

Extraneous punctuation is not considered an error so the article ends up in Category:CS1 maint: extra punctuation and cs1|2 displays the green maintenance message for those few who have enabled maintenance messaging.

—Trappist the monk (talk) 18:58, 1 November 2021 (UTC)

RFC 9134

I've just formatted a reference to RFC 9134 and have been advised to report the "check |rfc= value" error here. Range checking apparently needs to be updated. ~Kvng (talk) 17:33, 2 November 2021 (UTC)

I don't see any error message when I edit or preview that section. [eta: It looks like a gnome took care of it.] – Jonesey95 (talk) 00:47, 3 November 2021 (UTC)

SEEKING PEDANTS -- writing a script to automatically generate citations and want to get it right

I've made a piece of software called PressPass (a longer explanation of its features and functions is here, along with the code). Essentially, what it does is automatically generate filled-out {{cite news}} invocations from Newspapers.com clippings and search pages. Currently, I am revising some parts of the generation functions, and making a configuration menu (for stuff like, e.g., whether to include access-date). However, I would like to ensure that the templates it generates are properly formatted.

Here is what it looks like, for this clipping:

<ref name="Charle18031112">{{Cite newspaper|url=https://www.newspapers.com/clip/87466966/public-auction/|date=1803-11-12|page=4|title=Public Auction|newspaper=The Charleston Daily Courier|location=Charleston, South Carolina}}</ref>

So far, in the configuration menu, I'm writing features to allow multi-line cite templates, as well as different options for the date output (1969-12-31, 31-12-1969, 1969 Dec 31, 1969 December 31, December 31, 1969), and the ability to specify whether access-date, via, or location are included.

This is, more or less, all the information exposed to my script from the clipping page. The headline has to be typed in manually by the user since Newspapers.com doesn't have this all scraped. That said, I understand that there's a lot of "best practices" with regard to the ordering of parameters, et cetera (and I can have the date output as whatever). Since I expect this to be used a lot (and have been using it a lot myself, for example in improving Bradford Island to FA), is there anything I should be doing? Is there anything I'm missing? jp×g 20:43, 21 October 2021 (UTC)

Regarding date formats, note as per Wikipedia:Manual of Style/Dates and numbers § Dates, months, and years, dd-mm-yyyy and yyyy month dd shouldn't be used. isaacl (talk) 20:53, 21 October 2021 (UTC)

(edit conflict) Sounds interesting. This citation template looks valid to me. From a technical standpoint, the order of the parameters does not matter. Some editors get worked up over WP:CITEVAR differences within an article; you'll probably hear from at least one of them as the tool is adopted and used more widely. With respect to manual title entry, {{cite newspaper}} requires |title=, so please ensure that your tool requires it. If you are adding |access-date=, remember that it requires |url=. Some of those date formats are invalid on Wikipedia and in CS1 templates. See MOS:DATE for valid formats. – Jonesey95 (talk) 20:58, 21 October 2021 (UTC)

Yeah, it doesn't work without a title (if you don't enter a title for the clipping, or if you generate them from the search page, it defaults to "Page 5" or "Page B4" or whatever). The date formats were something I was concerned about compatibility for. Previously, it only ever formatted them as yyyy-mm-dd (which has worked fine and never thrown an error or anything). To be honest, I'd be fine with the chauvinism of "yyyy-mm-dd is correct, use it or pound sand", but I figured I'd add some other formats since I am not the king of the world (yet 😈). jp×g 21:42, 21 October 2021 (UTC)

In your documentation for the tool, you can explain that it outputs dates in YYYY-MM-DD format, and that editors can add {{use dmy dates}} or {{use mdy dates}}, as appropriate, to the top of an article to have all CS1 templates display dates in that format. – Jonesey95 (talk) 22:35, 21 October 2021 (UTC)
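The constraints mentioned in the replies above (a title is required, |access-date= only makes sense together with |url=, dates emitted as yyyy-mm-dd) could be enforced in a generator along these lines. This is a Python sketch with a hypothetical function name and parameter order, not PressPass's actual code.

```python
from datetime import date


def build_cite_news(title, newspaper, published, url=None, page=None,
                    location=None, via=None, access_date=None):
    """Return a one-line {{cite news}} invocation; hypothetical helper."""
    if not title:
        raise ValueError("|title= is required")
    if access_date is not None and not url:
        raise ValueError("|access-date= requires |url=")
    params = [
        ("url", url),
        ("date", published.isoformat()),  # yyyy-mm-dd
        ("page", page),
        ("title", title),
        ("newspaper", newspaper),
        ("location", location),
        ("via", via),
        ("access-date", access_date.isoformat() if access_date else None),
    ]
    # Empty or missing values are simply left out of the template call.
    body = "".join(f"|{k}={v}" for k, v in params if v not in (None, ""))
    return "{{cite news" + body + "}}"
```

Parameter order has no technical significance, as noted above; the order here merely follows the example citation. A title containing a literal pipe character would need escaping, which this sketch does not attempt.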

Adding, since I did not notice it until others commented on it below: Whatever the commented "Sat" is in your example, it is not good (yet?). First, it's outside the ref tags, when it should probably be inside. Second, if it represents the day of the week, it should not be abbreviated, and we don't need it. If someone wants to know what day of the week a date happened on, they can look it up. None of our citation formats display the day of the week, since it is extraneous. – Jonesey95 (talk) 01:50, 22 October 2021 (UTC)

Those aren't part of the citation, so they're not in the ref tags -- and I will probably disable them by default. The reason I put them there was to serve as a helpful note, because I was going insane trying to write articles from newspaper refs and put dates to what happened (since every newspaper article will say "Next Monday" or "Last Thursday" or whatever instead of just the date). jp×g 07:36, 22 October 2021 (UTC)

Umm, Bradford Island has never been FA so far as I can tell...

Thoughts:

Spell out 'Charleston' in the <ref> tag name= attribute and hyphenate the date; consider including the page number and provision for a disambiguator for the cases where multiple articles sharing the same page are cited (<ref name="Charleston 1803-11-12 p4a">).
I think that the default state for any citation linking to a newspaper at Newspapers.com should be |via=Newspapers.com because that is the recommended form at WP:Newspapers.com and because the clipping is not delivered by the publisher.
|location= should not be displayed except when it is needed to disambiguate the newspaper named in |newspaper=.
Because the newspaper is dated, |access-date= is generally not required (the newspaper is not an ephemeral source).
Pagination, I think, requires special attention. I have a niggling memory of pagination listed in the scraped information provided for a clipping where the page number did not match the section page number printed on the original page – the pagination provided was more like a sequence number where 1 was the first page of the first section. It could be that I'm confusing newspapers.com with newspaper facsimiles at google or Trove.
support |pages=[https://www.newspapers.com/clip/56035464/the-los-angeles-times/ 1], [https://www.newspapers.com/clip/56035546/the-los-angeles-times/ 10] for as many pages as are necessary; allow for page ranges
I don't think that the day-of-week annotation is necessary but if it is, don't abbreviate it.

—Trappist the monk (talk) 22:31, 21 October 2021 (UTC)
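The ref-name suggestion in the first item of the list above (spelled-out name, hyphenated date, page number, optional disambiguator) is simple to generate. A minimal Python sketch follows; the function name and the way the short newspaper word is supplied are assumptions.

```python
from datetime import date


def make_ref_name(paper_word: str, published: date, page: str, suffix: str = "") -> str:
    """Build a name like 'Charleston 1803-11-12 p4a' for a <ref name="..."> attribute."""
    return f"{paper_word} {published.isoformat()} p{page}{suffix}"


name = make_ref_name("Charleston", date(1803, 11, 12), "4", "a")
```

The suffix would only be needed when two clippings from the same page and date are cited in one article.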

Oh, I mean, I'm in the process (the nomination will come after the peer review, and I still have some things to take care of). At any rate, the expanded ref name is a really good idea (I am in the habit of using shorter ref names to keep the source of an article from getting unwieldy, but this is by no means a universal preference). The access-date thing is smart... the pagination thing sounds like an interminable issue. I think the most I could do is add an option to omit page numbers from the generated cites, and allow the user to fill them in (since actual printed page numbers aren't reliably OCRed, and this information isn't actually available from anywhere besides user input). I can definitely add a warning about it in the documentation or interface, though. The thing about supporting multiple pages, I'll have to look into, as it sounds neat (I had no idea you could even do that in a citation!) The location thing is kind of hard to wrap my head around a solution for -- I could try a basic comparison for whether the city name was contained in the paper name (i.e. "The Detroit News" contains "Detroit"), but this will be imperfect; "The New York Times" doesn't contain "New York City", for example, and the state of New York isn't mentioned anywhere on the clip page -- so maybe I would need to build a huge database of every city and what state/province they were in -- and then I'd need to deal with all the Athens, Ohios and Paris, Texases and Detroit, Alabamas... zoinks! I'll see what I can do, though, and I appreciate the thoughts. jp×g 08:06, 22 October 2021 (UTC)
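The basic city-in-name comparison described above could be sketched as follows. This is only a rough Python illustration, not the script's code: the function name location_needed is hypothetical, and it assumes the location string has the form "City, State". It has exactly the weakness already noted for "The New York Times".

```python
def location_needed(newspaper: str, location: str) -> bool:
    """Return True when |location= would add information not in the paper's name.

    Only the city part (the text before the first comma) is compared,
    case-insensitively, against the newspaper name.
    """
    city = location.split(",")[0].strip().casefold()
    return bool(city) and city not in newspaper.casefold()


print(location_needed("The Detroit News", "Detroit, Michigan"))
print(location_needed("The New York Times", "New York City, New York"))
print(location_needed("The Pennsylvania Gazette", "Philadelphia, PA"))
```

The first call reports that the location is redundant; the second reports that it is needed even though a human reader would disagree, which is the imperfection mentioned above.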

One of the first things that will happen with your particular example above is that Citation Bot will come along and change your {{Cite newspaper}} to {{Cite news}} or maybe {{Cite web}}. Some discussions here and here. I also don't see the need to note the day of the week, nor do I understand why it's an HTML comment. And regarding date formats: extremely cool, albeit not to be expected, would be if PressPass tried to automatically use the format specified in the page's {{Use xxx dates}} template, if present. Otherwise, what Jonesey said (22:35). — JohnFromPinckney (talk / edits) 00:21, 22 October 2021 (UTC)

That's a good catch -- my thinking was that, even if {{cite newspaper}} was aliased to {{cite news}}, there might be some use to it (i.e. making it easy to find citations to newspapers as opposed to news sites, blogs, etc). But if there's a bot coming through afterwards to fix them, it's just an unnecessary pain in the ass... regarding using the same date format as the article, this isn't really possible the way the software's set up. It runs in your browser when you're on Newspapers.com, and doesn't actually interface with the articles at all (so it has no way to know what you're using them for). I have, however, finished writing most of the code that allows you to save settings for different formats -- this should at least make it possible to conform to article conventions if someone wants to. Of course, it's a good idea, and it would own if it were possible. jp×g 07:50, 22 October 2021 (UTC)

I would like to ask the OP if s/he finds the documentation of {{cite news}}, and this page's parent unclear, and what if anything, s/he believes could be done with the doc to better facilitate similar template development. Not that there is anything wrong with coming here and asking for guidance, it is a good idea. I'm just curious over whether the template configuration questions were partly or wholly prompted by what s/he perceives as doc issues here. 64.18.9.196 (talk) 04:02, 22 October 2021 (UTC)

The documentation here (as for most templates) is refreshingly complete, especially compared to what you'd expect from a volunteer project consisting mostly of non-technical writers. I've got no complaint with that -- I'm just here to make sure that there's nothing I missed with regards to house style, or general convention, or whatever you want to call it. Since this software is going to be used by a lot of people to generate a lot of citations, I think it's worth putting a lot of thought into avoiding issues or creating work later on (because, boy howdy, will it). For example, if 100 people use this, and each of them writes 25 articles containing 40 citations, and something is stupid about them, we now have 100,000 problems to deal with. Most of the stuff brought up here (adding an option for longer reference names, for example) isn't really part of what documentation covers per se... but I'm glad someone thought of it. jp×g 07:36, 22 October 2021 (UTC)

—Trappist the monk (talk) says: "|location= should not be displayed except when it is needed to disambiguate the newspaper named in |newspaper=".

I would much rather put it the other way round: "|location= is ESSENTIAL unless the name of the city of publication is part of the name of the newspaper". -- Alarics (talk) 09:33, 22 October 2021 (UTC)

We do not permit the use of the YYYY-MM-DD format for dates in the Julian calendar. The newspapers.com site provides coverage for some newspapers that were published before 1752, the year in which the British colonies in North America changed from the Gregorian calendar to the Julian calendar. An example of a newspaper available from newspapers.com for this situation is The Pennsylvania Gazette for the year range 1728 to 1752. The metadata emitted by Citation Style 1 is false for Julian calendar dates. In this situation, I suggest you emit a plain text citation with no template. For example,

Ticket details. The Pennsylvania Gazette. Philadelphia, PA. May 16, 1751. p. 3.

[This example illustrates another failing of Citation Style 1. It lacks the ability to use a description as a title, for cases when the author or publisher have not given a story a title.]

Jc3s5h (talk) 14:51, 22 October 2021 (UTC)

This is not a "failing of Citation Style 1". If a work has no title, and a title is required, a placeholder that is immediately understood as such, can be inserted. But we cannot make up descriptions, and I know of no citation or filing system that allows you to discover a citation's source by arbitrary description. Such sources may be classified with a generally accepted semi-official description in a special field (very rarely in the title field) in which case the relevant metadata would be accessible. But then you would have to know what the description is.64.18.9.201 (talk) 18:17, 22 October 2021 (UTC)

@Jc3s5h: This is an interesting point, and one I wouldn't have thought of in a million years. How about that! I guess I will account for that (and probably add a warning for users who would otherwise be confused at why it's not working right). One minor question I have is, did you type it backwards in the second sentence? In 1752 the colonies switched from the Julian to the Gregorian calendar. jp×g 09:58, 23 October 2021 (UTC)
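A check for such dates might look like the Python sketch below. It is not the script's code, and it rests on one assumption: the newspaper was printed in a British colony, where the first Gregorian day was 14 September 1752 (other jurisdictions switched on other dates). The function names are hypothetical. Dates are compared as plain (year, month, day) tuples, so a date that is valid only in the Julian calendar does not raise an error.

```python
import calendar

# Assumption: first day of the Gregorian calendar in the British colonies.
FIRST_GREGORIAN_DAY = (1752, 9, 14)


def is_old_style(year: int, month: int, day: int) -> bool:
    """True if the printed date precedes the British changeover."""
    return (year, month, day) < FIRST_GREGORIAN_DAY


def format_cite_date(year: int, month: int, day: int) -> str:
    """Avoid YYYY-MM-DD for Julian dates; spell the month out instead."""
    if is_old_style(year, month, day):
        return f"{calendar.month_name[month]} {day}, {year}"
    return f"{year:04d}-{month:02d}-{day:02d}"


print(format_cite_date(1751, 5, 16))
print(format_cite_date(1803, 11, 12))
```

A caller could use is_old_style to decide whether to show a warning or to fall back to a plain-text citation like the example given above.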

As an aside (Bradford Island), several {{cite web}} citations on that page may be more properly entered as {{cite report}} or {{cite news}}. In the latter case, this is recommended even if the news source is online-only. 68.174.121.16 (talk) 13:05, 23 October 2021 (UTC)

FWIW, saving your clipping into Zotero, then dragging it here from there, using Zotero's Wikipedia exporter function, produces:

{{Cite news| pages = 4| title = Public Auction| work = The Charleston Daily Courier| location = Charleston, South Carolina| accessdate = 2021-10-25| date = 1803-11-12| url = https://www.newspapers.com/clip/87466966/public-auction/}}

While any volunteer is, of course, free within reason to work on anything they like, perhaps effort would be better deployed in tweaking that exporter, rather than (if I may) reinventing the wheel? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:53, 25 October 2021 (UTC)

Well, if you are already using Newspapers.com with the script installed, you do not do all that stuff; you just create a clipping and it automatically generates a citation. I've been using this script to source articles, and I can come up with somewhere around ten references in five minutes (accounting, of course, for the manual process of determining which articles are relevant and then creating the clippings). I can't imagine there being an external software that does so more quickly (although I've got no objection to someone using Zotero as an alternative). jp×g 02:25, 7 November 2021 (UTC)

New version released

I've made a lot of improvements to the script, and a new version is here; thanks to everyone who helped me out in this discussion! The documentation page covers all of its behavior: let me know if there is anything I missed. jp×g 21:46, 7 November 2021 (UTC)

There are 2 fields to consider: |author= & |agency=. These are not so much discovery parameters, although they may help in finding the correct source faster, but they have a reliability component. An otherwise unreliable source (such as a newspaper that is an official or semi-official organ of a political party, or is state-controlled) may carry a press report from a press agency with proven prior reliability. Or may take through syndication, the column of an author with a history of past objectivity. Such treatment may provide the citation with elevated reliability. In any case, |author= should probably be included even when there is no byline. This is recommended: |author=. 172.254.222.178 (talk) 23:17, 7 November 2021 (UTC)

For newspapers specifically, the main classification indices are usually the title and date fields (also issue number). That is why it was stated above that "author", which would be a subindex field in such classifications, is not a primary discovery parameter. Unlike in other citation types. 172.254.222.178 (talk) 23:24, 7 November 2021 (UTC)

I am considering adding something like that; my main concern is that including the "staff writer, no byline" note could be misleading if it's being included by default on citations that do have a writer listed. I think that maybe it would be best to just have the parameter included blank: either |first=|last= or |author=. jp×g 23:51, 7 November 2021 (UTC)

Well, the hidden comment should be offered only when there is no author listed. It is for the benefit of other wiki editors, so they do not waste time looking to add a named author when there is none. Also this may prevent bots from "fixing" things that shouldn't be fixed. 50.74.109.2 (talk) 00:41, 8 November 2021 (UTC)

The current recommendation is |author=; see Help:Citation Style 1 § Authors.

—Trappist the monk (talk) 00:47, 8 November 2021 (UTC)

The problem here is that, like headlines, there's no metadata provided for authors (even when there is one listed) -- so it can't be determined whether that editnote applies without user input. If bots go through and remove blank author fields eventually, I suppose that's not a huge problem (since putting them there would solely be as a convenience to the editor). I guess I will probably give three options (author fields with html note, author fields blank, and no author fields). jp×g 00:54, 8 November 2021 (UTC)

I am constantly finding press citations that don't include the author(s) even though the article does have a byline. Many editors just don't seem to care about getting references right. Much more serious, to my mind, is the widespread failure to disambiguate newspapers by city of publication ("location") even when the city isn't part of the newspaper's title; and worst of all giving the access date but not the publication date, even though the publication date is a far more significant piece of information in the case of newspaper reports. -- Alarics (talk) 09:49, 8 November 2021 (UTC)

@Alarics: Well, this will help on two of three counts; per the recommendations upthread, I made it so that location is always included (and publication date, of course, was from the get-go). Maybe this will clean the place up a little bit... jp×g 10:18, 8 November 2021 (UTC)

@JPxG: Thanks. -- Alarics (talk) 13:28, 9 November 2021 (UTC)

The proposed options are good for no-author. As for listed authors (or agencies), the reliability issue associated with them when citing press items is important regardless of the existence of metadata. Perhaps script users should be urged to include this info if the data exists. 65.88.88.57 (talk) 22:16, 8 November 2021 (UTC)

Example of ref=none

I wanted to add an example of the use of ref=none. This is the usual form, so I felt an example of its use was appropriate. Unfortunately, there is no talk page for discussion of this, as it links to here Hawkeye7 (discuss) 21:11, 9 November 2021 (UTC)

Hawkeye7 To which page are you looking to add that as an example? Izno (talk) 21:48, 9 November 2021 (UTC)

It was Template:Cite encyclopedia/doc. At this edit Editor Hawkeye7 added |ref=none with the edit summary Example text. I reverted that because live documentation is not for testing. Editor Hawkeye7 then added new {{markup2}} template with the edit summary NOT a test edit - ref=none needs to be used all the time - documentation needs to show example of this. I reverted again noting that the initial edit was labeled as a 'test' and suggested discussion here.

|ref=none is noted in the template's documentation which links to a longer discussion at {{citation}}. I'm not sure that yet-another-example at {{cite encyclopedia}} is all that beneficial.

—Trappist the monk (talk) 22:44, 9 November 2021 (UTC)

[Template:Cite book] How do I add a foreword author?

Template:Cite book/doc states:

contribution: A separately-authored part of author's book. May be wikilinked or may use contribution-url, but not both. Values of Afterword, Foreword, Introduction, or Preface will display unquoted; any other value will display in quotation marks. The author of the contribution is given in contributor.

However, I cannot manage to get any of Afterword, Foreword, Introduction, or Preface to work. Could someone help? Veverve (talk) 13:12, 13 November 2021 (UTC)

{{cite book |contributor=Foreword author |contribution=Foreword |author=Book author |title=Title of the book}}

Foreword author. Foreword. Title of the book. By Book author. {{cite book}}: |author= has generic name (help)

—Trappist the monk (talk) 13:30, 13 November 2021 (UTC)

Video citation

Can we create an equivalent of Template:Page numbers for video timestamps? I would like to be able to cite minute #, second # of a video, as a reference for a statement. I believe this feature and a "Video timestamps needed" tag would improve the WP:V of videos used as references, of which there are many. LondonIP (talk) 23:44, 13 November 2021 (UTC)

Template:Cite AV media has a parameter for timestamp. Schazjmd (talk) 23:58, 13 November 2021 (UTC)

Please can you show me an example of it citing a video with a timestamp? LondonIP (talk) 01:39, 14 November 2021 (UTC)

@LondonIP: From Muhammad Ali, reference #173:

{{cite AV media|url=https://www.youtube.com/watch?v=ctoF5Ctc0ZM|title=Day at Night: Muhammad Ali, legendary boxing champion|time=21:50}}

Day at Night: Muhammad Ali, legendary boxing champion. Event occurs at 21:50.

Happy editing! GoingBatty (talk) 02:28, 14 November 2021 (UTC)

Authorlink to Wikidata?

The Egyptologist Dorothea Arnold doesn't have an article here, but does in three other Wikipedias. Unsurprisingly,

{{Cite book|last=Arnold|first=Dorothea|authorlink={{illm|WD=Q1246153|Dorothea Arnold}}|url=https://books.google.com/books?id=sGLFwVkljQMC&lpg=PP1&pg=PA135|title=The Royal Women of Amarna: Images of Beauty from Ancient Egypt|author2=Metropolitan Museum of Art|last3=Green|first3=L.|last4=Allen|first4=James P.|date=1996|publisher=Metropolitan Museum of Art|isbn=978-0-87099-816-4|language=en}}

doesn't work. Is there a workaround (or am I missing something obvious)? -- Hoary (talk) 01:23, 14 November 2021 (UTC)

Perhaps an interwiki link to an existing page is more appropriate, and would avoid any Wikidata-related issues? 68.173.76.118 (talk) 01:43, 14 November 2021 (UTC)

If you must link to wikidata, you can write:

{{Cite book |last=Arnold |first=Dorothea |author-link=:d:Special:EntityPage/Q1246153#sitelinks-wikipedia |last2=Allen |first2=James P. |last3=Green |first3=L. |url=https://books.google.com/books?id=sGLFwVkljQMC&pg=PA135 |title=The Royal Women of Amarna: Images of Beauty from Ancient Egypt |date=1996 |publisher=Metropolitan Museum of Art |isbn=978-0-87099-816-4 |language=en}}

Arnold, Dorothea [at Wikidata]; Allen, James P.; Green, L. (1996). The Royal Women of Amarna: Images of Beauty from Ancient Egypt. Metropolitan Museum of Art. ISBN 978-0-87099-816-4.

—Trappist the monk (talk) 01:47, 14 November 2021 (UTC)

This is problematic, on several counts. First, the reader must navigate a separate UI when they follow the link. Secondly, readers should be notified prior to clicking that the targets are in several other languages. A consequence is that the target's reliability (to say nothing of the content) cannot be judged by non-speakers of the languages involved. It is perhaps better for the editor to pick the most reliable of the Wikidata links and add a language note. Personally, I would not link this author in this case. 68.173.76.118 (talk) 02:03, 14 November 2021 (UTC)

Thank you, Trappist the monk! Reasonable comments, 68.173.76.118, even if I demur. (1) They are indeed faced with a different and probably unfamiliar interface, and this isn't so desirable. But they're free to hit the back button. (2) If someone can't speak, say, Russian, then they'd be unlikely to click on the link to the Russian-language article. And even within English-language Wikipedia, we (rather famously) don't promise reliability. (And nor should we, given the number of falsehoods that one comes across in its articles.) Now, let's suppose that I am lucky enough to be able to read and evaluate the three languages (arz, de, ru), that I decide that the German one is the best, and that I link directly to it. Its quality may thereafter sink, or a greatly superior French, Italian or other article might emerge. With all that in mind, if I link, I'd prefer to do so to Wikidata. However, I'd agree that providing a link from the (main) author's name doesn't seem so important. -- Hoary (talk) 05:03, 14 November 2021 (UTC)

Yes, Wikipedia is generally unreliable, and services such as Wikidata redistribute and therefore entrench the unreliability. There is no need for an author link. But if the editor decides to add it, it must be because 1. it helps locate the source faster/better 2. enhances or at least does not hinder the citation's reliability. Nothing should be added in a citation unless the editor has verified every aspect of it and is comfortable with the element's reliability. A foreign-language author page may have reliable information that the author has been engaged in fabrications. This would make any such sources suspect, but the English-language reader will have no way of knowing this. The idea that readers are free to hit the back button is disingenuous. This is about verifying wikitext, it is not about adding links to anything or needlessly complicating the process. If readers have to hit the back button regarding anything in a citation, this is not a good citation, and therefore not a good article, and therefore not a good encyclopedia. Which is obvious anyway, but also fixable to an extent. The first thing is to make sure that this is not about you (not "you" personally) or about what you know. You write a citation for a reader who has never seen one. That imo should be the baseline, because it is likely true. 68.174.121.16 (talk) 14:42, 14 November 2021 (UTC)

Template:Cite web/Danish

Template:Cite web/Danish is listed as auto subst but has 165 transclusions. Any idea why it isn't being subst? Gonnym (talk) 20:10, 15 November 2021 (UTC)

Stuff inside <ref> cannot be substed, and AnomieBot probably knows that, so it doesn't (try). Probably could use an upgrade on that point to e.g. hit the REST API for the to-be-substed content of the ref and then doing a replace of that content? Not sure if that would work. Anomie? Izno (talk) 20:49, 15 November 2021 (UTC)

(edit conflict)

Because those articles use {{Kilde www}} a redirect to {{cite web/Danish}}? Does User:AnomieBOT/source/tasks/TemplateSubster.pm understand redirects? I have no experience with Perl so it isn't clear to me from that code. Perhaps Editor Anomie can provide an answer.

—Trappist the monk (talk) 20:53, 15 November 2021 (UTC)

I think AnomieBot has magic that allows it to subst inside refs. It is just cautious about substing templates with more than 100 transclusions. See this notice that the bot posted on its own talk page. – Jonesey95 (talk) 21:01, 15 November 2021 (UTC)

AnomieBOT does manually subst so it can subst inside of <ref>s (but it won't subst inside <nowiki>, which makes {{row numbers}} annoying). You're right that having over 100 transclusions will prevent substing, as a safety measure against some vandal finding a template with thousands of transclusions and having the bot subst them all. 100 seems a decent cutoff for someone to manually fix should a bad substing run need to be reverted. Anomie ⚔ 22:02, 15 November 2021 (UTC)

This substitution has generated some errors in |date= parameters, if anyone is interested in troubleshooting. The code is too nested and strange for me to parse and does not catch invalid dates. – Jonesey95 (talk) 23:30, 15 November 2021 (UTC)

I have been picking away at a lua module that will do a better job of translation than the existing variety of templates. I've got translators more-or-less done for Dutch, French, Finnish, Portuguese, Polish, and Swedish book/journal/web templates. Danish, German, Italian, Russian, Spanish, Turkish book/journal/web templates in the works. While I'm not in any hurry to finish this project, were I you, I wouldn't spend any time fixing the wikitext translation templates because your work will be overwritten by a simple invoke.

—Trappist the monk (talk) 23:55, 15 November 2021 (UTC)

Nothing like nested if statements inside nested if statements to give you a nice spaghetti-code headache. Not to mention the expensive string replacements. But the template author should be applauded, for doing a decent job with very primitive tools at her/his disposal. 71.247.146.98 (talk) 00:05, 16 November 2021 (UTC)

Book volumes

I was following this discussion with interest, but I see now it's been archived with nothing being done? Given the unanimous consensus there to make the proposed changes to the "volume" output, and since no-one objected to Kanguole's sandbox edits, is there any reason this can't be implemented right away? Dan from A.P. (talk) 10:04, 21 November 2021 (UTC)

We sync on a usually-quarterly basis. We missed the last quarter in early October, I know not why, with the next being January. Izno (talk) 05:37, 22 November 2021 (UTC)

The last official update was in early April, which was followed by the unhyphenated parameter name kerfuffle. I recommended another update in late April, since there were a lot of pending changes, but that update never went through. There were a few changes made after the parameter naming RFC was closed and then reclosed. I think the next step is for someone to compile a list of changes to the modules since the most recent update, with links to discussions about those changes, as is typically done before an update. You may be able to work from my late April list as a first draft, but don't trust it without checking each item. – Jonesey95 (talk) 05:58, 22 November 2021 (UTC)

Gunshy. The drama that followed the last update demonstrates, to me at least, that the community have lost faith in us, or maybe they have just lost faith in me. That being the case, I'm inclined to start an all-or-nothing rfc at one of the drama boards, probably WP:VPR. Let the drama board decide and then if the result is all, fine, update and get on with life; if nothing, discard all changes in the sandboxen and get on with life.

—Trappist the monk (talk) 14:31, 22 November 2021 (UTC)

I would not exactly phrase it like that. One volunteers to develop a certain piece of software. It is not a matter of "trust", and if some need to trust the developer, they should re-examine their needs. One can either do it (technically) or not, and I would say that on average, the project is no worse than other similar Wikipedia projects. This may be a poor endorsement, but anyway. The issue for me is the micro-management exhibited in the RfC over the tracking categories. That tipped it for me. If one is so intent to micromanage development, it is best that they develop it themselves. Personally, if I was the developer here I would quit at the mere mention of such an RfC. This is not about winning someone's trust, it is about a small minority obstructing development and making the developers' work untenable. And it is obvious that the RfC process is broken. Thank the so-called "administrators" who invented a consensus where none existed and who dismissed any further discussion when that was pointed out. So, good luck!! 65.88.88.57 (talk) 15:54, 22 November 2021 (UTC)

more bogus names

I have added 'author', 'collaborator', 'contributor', 'editor', 'translator' as bogus names. Yeah, this will break the convenient |author=Author |editor=Editor etc demo uses but these exist in article-space templates where they do not belong. We will just have to be a little more creative in our demos.

—Trappist the monk (talk) 16:22, 22 November 2021 (UTC)

In a better world, maybe {{cite\demo xxx}} could exist. 65.88.88.57 (talk) 16:28, 22 November 2021 (UTC)

Why? Some authors are not people, and I do not wish to see |author=XYZ, Inc. replaced by |last=XYZ|first=Inc. Then there are authors who are individual people, but whose names do not follow the Western "forename surname" convention, so I might use |author=Ban Ki-moon for which both |last=Ban|first=Ki-moon and |first=Ban|last=Ki-moon are just plain wrong. This has been discussed before. --Redrose64 🌹 (talk) 22:40, 22 November 2021 (UTC)

Umm, your examples do not use bogus names so will not be detected as errors:

{{cite book/new |author=XYZ, Inc. |title=Title}}

XYZ, Inc. Title.

{{cite book/new |author=Ban Ki-moon |title=Title}}

Ban Ki-moon. Title.

But, these will:

{{cite book/new |author=Author |title=Title}}

Author. Title. {{cite book}}: |author= has generic name (help)

{{cite book/new |editor=collaborator |title=Title}}

collaborator (ed.). Title. {{cite book}}: |editor= has generic name (help)

—Trappist the monk (talk) 23:01, 22 November 2021 (UTC)
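To illustrate why the examples above behave differently, here is a small Python sketch of a generic-name test. It is not the module's code (the module is written in Lua). The matching rule, a case-insensitive comparison of the whole parameter value, is an assumption chosen only because it is consistent with the four rendered examples.

```python
# The five names listed at the top of this section.
GENERIC_NAMES = {"author", "collaborator", "contributor", "editor", "translator"}


def has_generic_name(value: str) -> bool:
    """Assumed rule: the whole value, ignoring case and outer whitespace, is a listed name."""
    return value.strip().casefold() in GENERIC_NAMES


for value in ["XYZ, Inc.", "Ban Ki-moon", "Author", "collaborator"]:
    print(value, has_generic_name(value))
```

Under this rule corporate authors and non-Western name orders are untouched, because only the placeholder words themselves are flagged.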

Thesis vs Dissertation (displayed in citations)

I am not sure if here is the correct location for my question so please forgive my trespass. I read myself around in circles a while to see if this had been discussed before and the only thing I found that was close is THIS which was about display of titles in italics vs quotes. It is why I felt this best posted here since it is of a similar type subject.

A thesis is produced by a student in the culmination of one's higher education into a chosen field to demonstrate expertise and a thorough understanding, thus earning a degree and graduating. When {{cite thesis}} is used, (thesis) is displayed inline to the reader in the reference section.

A dissertation is the product of a graduate applying that earned degree and expertise, most often producing original research into that chosen field of expertise. When {{cite dissertation}} is used, it also displays (thesis) to the reader. Is this an accurate and intended function?

These are very different documents in scope and purpose and it feels erroneous that the difference isn't displayed accurately in the citations. Has there already been a discussion/consensus on this issue that I missed, or is it possibly a simple oversight and easy fix?

With best regards.
--- Darryl.P.Pike (talk) 20:40, 23 November 2021 (UTC)

{{Cite dissertation}} is a redirect to {{Cite thesis}}, which is why they render the same. The distinction between a thesis and a dissertation is a muddy one, as you can read about at thesis and other places on the web; there is no way that we could or should try to determine which word to display automatically. You can use |type= to display something other than the default "Thesis". – Jonesey95 (talk) 21:01, 23 November 2021 (UTC)

Will do this from now on for sure. I did not realize the parameter would work like that. Thank you.
---> Darryl.P.Pike (talk) 01:37, 24 November 2021 (UTC)

Further muddying the waters, and despite the supposed distinction described above, I think "habilitation thesis" is much more common than "habilitation dissertation" as a translation of Habilitationsschrift. —David Eppstein (talk) 04:40, 24 November 2021 (UTC)

Possible false positive error at Template:Cite thesis/doc

{{Cite thesis/doc}}, in the Template Data section, shows an error, "|degree= is not a valid parameter". I think that error message, which appears to be caused by Module:Cs1 documentation support, is incorrect. |degree= appears to be a documented and working alias of |type=. It looks like something needs to be tweaked. – Jonesey95 (talk) 21:06, 23 November 2021 (UTC)

Fixed that but ...

at ~/Whitelist line 507, |degree= is defined as a parameter that is unique to {{cite thesis}}

at ~/Configuration line 254, |degree= does not have any aliases

The TemplateData at Template:Cite thesis/doc § TemplateData identifies |type= as an alias of |degree=; it is not, |degree= modifies the template-specific default TitleType metaparameter unless overridden by |type=

at ~/Configuration line 324, |type= does not have |degree= as an alias

Module:cs1 documentation support doesn't catch that error but it should. I'll think about how to fix that.

—Trappist the monk (talk) 23:37, 23 November 2021 (UTC)
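The precedence described above (|degree= modifies the default title type unless |type= overrides it) can be sketched in Python as follows. This is an illustration, not the module's Lua code. The function name title_type is hypothetical, and the exact form of the modified text ("PhD thesis") is an assumption about the rendering.

```python
from typing import Optional


def title_type(type_: Optional[str] = None, degree: Optional[str] = None,
               default: str = "Thesis") -> str:
    """Pick the text shown as the citation's type annotation."""
    if type_:
        # |type= wins outright; |degree= is then ignored.
        return type_
    if degree:
        # Assumed rendering: the degree is prefixed to the lower-cased default.
        return f"{degree} {default.lower()}"
    return default


print(title_type())
print(title_type(degree="PhD"))
print(title_type(type_="Dissertation", degree="PhD"))
```

Seen this way, the two parameters are not aliases: an alias would make them interchangeable, whereas here one only adjusts the default that the other replaces.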

Hence my waffle language above. I couldn't quite figure it out. It may be that the actual template documentation at {{Citation Style documentation/type}} needs a tweak if degree is not an alias of type but instead there are some override conditions involved. – Jonesey95 (talk) 00:14, 24 November 2021 (UTC)

Journal and publisher

I find it's distracting and a bit confusing about the place of publisher in journal article citation. For example:

Vo, Alex-Thai D. (2015). "Nguyễn Thị Năm and the Land Reform in North Vietnam, 1953". Journal of Vietnamese Studies. University of California Press. 10 (1): 1–62.

I think journal volume and issue should go immediately right after journal name, instead of publisher inserted between them. Like, the above example should read:

... Journal of Vietnamese Studies. 10 (1): 1–62. University of California Press.

This would make it more logical and smooth to read. Can someone please explain why we put the publisher in the place like we're doing? Any specific citation rule that I'm not aware of? Thanks a lot. Sorry for my bad English. 2604:3D08:4E7F:F7E0:952C:82DC:1A4F:9CD1 (talk) 16:51, 28 November 2021 (UTC)

Omit |publisher= in {{cite journal}}. Yeah, the documentation sucks. But at Help:Citation Style 1 § Work and publisher under the Publisher bullet is this:

WP:Citing sources, and most off-Wikipedia citation guides, suggest that it should be used for books (even famous ones), but not necessarily other works.

—Trappist the monk (talk) 17:10, 28 November 2021 (UTC)

Thank you. So from now on I can simply omit it in journal citation. 2604:3D08:4E7F:F7E0:952C:82DC:1A4F:9CD1 (talk) 20:35, 28 November 2021 (UTC)

If you wikilink Journal of Vietnamese Studies, readers can click on it to find out who publishes the journal. GoingBatty (talk) 17:39, 28 November 2021 (UTC)

rfc

Wikipedia:Village pump (proposals)#rfc: shall we update cs1/2?

—Trappist the monk (talk) 23:13, 28 November 2021 (UTC)

Utter waste of time. 68.173.76.118 (talk) 00:11, 29 November 2021 (UTC)

Suggested values for url-status

Does someone know how to edit the TemplateData for this? It should be something like the following. Thanks.

"suggestedvalues": [
		"live",
		"dead",
		"unfit"
	]

—Michael Z. 01:07, 29 November 2021 (UTC)

Where? {{cite book}} appears to have something like what you are suggesting; see Template:Cite book/TemplateData

—Trappist the monk (talk) 01:15, 29 November 2021 (UTC)

I was using {{cite web}} in the visual editor, and the field has a dropdown, but it’s not populated with values (if I enter a valid one, then only it appears). Clicking on the template's documentation brought me here. —Michael Z. 01:38, 29 November 2021 (UTC)

I don't use ve so cannot comment on the dropdown. {{cite web}} and {{cite book}} appear to have more-or-less the same "suggestedvalues" values (order is different); I doubt that order makes a difference. There are other obvious differences where one template has something that the other template does not: {{cite book}} has "default"; {{cite web}} has "autovalue" and "suggested". I suppose that these might make a difference but I don't know. Have you tried an experiment using the {{cite book}} template instead of {{cite web}}?

—Trappist the monk (talk) 02:09, 29 November 2021 (UTC)

Double quotation marks within title of minor work

Is there a reason why double quotation marks in such cases are not automatically displayed as single quotation marks? E.g.

{{cite web |title=Title with "quotation marks" in it |url=http://www.example.com/}}

"Title with "quotation marks" in it".

Cheers – Finnusertop (talk ⋅ contribs) 08:29, 29 November 2021 (UTC)

I don't know that this topic has ever been raised before. Has it? If so, where was that discussion?

—Trappist the monk (talk) 13:05, 29 November 2021 (UTC)

The reason being that this was probably never brought up. It has been up to editors to follow WP MOS on this. It is a rare, minor case and there are probably other, more material issues that could be addressed. At some point it would make sense to implement this. 71.247.146.98 (talk) 13:32, 29 November 2021 (UTC)

I am not aware of any previous discussion. It seems clear to me from MOS that double quotation marks should not enclose double quotation marks. It also struck me as something that could be mended by the CS1 templates. I think it is not that rare actually. I keep seeing these and sometimes fix them. I'm sure somebody could run a check of some kind to see just how many there are. – Finnusertop (talk ⋅ contribs) 15:19, 29 November 2021 (UTC)

I make fixes like that frequently. If it could be taken care of at a software level, that would save a bunch of time and generally make Wikipedia more consistent and credible. SchreiberBike | ⌨ 15:43, 29 November 2021 (UTC)
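For illustration only, the kind of software-level fix being discussed could look roughly like the sketch below. It is not taken from Module:Citation/CS1; the function name `quote_minor_title` is hypothetical, and the sketch assumes the title is about to be wrapped in straight double quotes, as happens for minor works.

```python
def quote_minor_title(title: str) -> str:
    """Wrap a title in double quotes, demoting inner double quotes to single.

    Hypothetical helper, not part of any cs1|2 module.
    """
    inner = (
        title.replace('"', "'")            # straight double -> straight single
        .replace("\u201c", "\u2018")       # left curly double -> left curly single
        .replace("\u201d", "\u2019")       # right curly double -> right curly single
    )
    return f'"{inner}"'


print(quote_minor_title('Title with "quotation marks" in it'))
```

A real implementation would presumably also have to decide what to do with titles that already contain single quotation marks or apostrophes next to the double ones, and with templates such as {{cite book}} whose |title= is italicized rather than quoted.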

According to this search, 64kish articles more-or-less. That search only finds cs1|2 templates where |title= includes at least one double quote mark so the search includes templates like {{cite book}} that don't wrap |title= in quote marks and ignores |chapter= (and aliases) parameters that do.

—Trappist the monk (talk) 15:49, 29 November 2021 (UTC)

I was referring to MOS:QWQ, above. Any programmatic solution should be complete, i.e. include the rendering of works (which as Trappist said are slanted), not just of in-work locations such as chapters which are quoted. This will lead to unavoidably inconsistent presentation of quote marks. Therefore it is imperative that the rationale for the inconsistency is explained clearly in the doc. 104.247.55.106 (talk) 16:06, 29 November 2021 (UTC)

Collaboration / et al behaviour

There's always been a weird behaviour when |collaboration= is set. E.g.

{{Cite book |last1 = Van Dijk |first1 = Peter Paul |last2 = Iverson |first2 = John |last3 = Shaffer |first3 = H. Bradley |last4 = Bour |first4 = Roger |last5 = Rhodin |first5 = Anders |collaboration=Turtle Taxonomy Working Group |year = 2012 |chapter = Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status |title = Conservation Biology of Freshwater Turtles and Tortoises |doi = 10.3854/crm.5.000.checklist.v5.2012 |isbn = 978-0965354097}}

gives

Van Dijk, Peter Paul; Iverson, John; Shaffer, H. Bradley; Bour, Roger; Rhodin, Anders; et al. (Turtle Taxonomy Working Group) (2012). "Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status". Conservation Biology of Freshwater Turtles and Tortoises. doi:10.3854/crm.5.000.checklist.v5.2012. ISBN 978-0965354097.

but it should instead give

Van Dijk, Peter Paul; Iverson, John; Shaffer, H. Bradley; Bour, Roger; Rhodin, Anders~~; et al.~~ (Turtle Taxonomy Working Group) (2012). "Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status". Conservation Biology of Freshwater Turtles and Tortoises. doi:10.3854/crm.5.000.checklist.v5.2012. ISBN 978-0965354097.

Et al should not be applied automatically when collaborations are set. Headbomb {t · c · p · b} 06:09, 4 December 2021 (UTC)

You accepted in the third of 3 originating discussions that most collaborations are likely to be cited as et al (The majority of cases would have the et al. though. Headbomb {talk / contribs / physics / books} 02:16, 26 December 2015 (UTC)), and we're now some 6 years down the road. Has your assessment given then changed for some reason? Do you want to track down and add it to all the citations that rely on the current behavior? Izno (talk) 07:38, 4 December 2021 (UTC)

You'll note there that I also was against it back then. Yes the majority will have et al., that doesn't mean it's something that should be automatically added. Headbomb {t · c · p · b} 07:43, 4 December 2021 (UTC)

So, do you plan to sort out the uses of |collaboration= which rely on an automatic et al? Izno (talk) 08:14, 4 December 2021 (UTC)

Do you plan to sort out the uses of |collaboration= which inappropriately add et al? The vast majority of uses requiring an et al. to be displayed already have a manually set |display-authors=. The vast majority of uses which don't have |display-authors= shouldn't have the et al. There are very, very few citations with a collaboration parameter set that need an automatic et al to be added. Headbomb {t · c · p · b} 15:57, 4 December 2021 (UTC)

If something is wrong, the fact that it will take a lot of work to fix is not a good reason for keeping it. I'm a gnome type; I'd be happy to work on the project. SchreiberBike | ⌨ 16:08, 4 December 2021 (UTC)

As stated below this seems to be an inappropriate (imo, because undocumented) default, applying when there are more than four authors. In Headbomb's example, |display-authors=4 works properly, but |display-authors=5 returns an error. This should be handled at the source. I suppose when/if development on the module collection resumes, this could be tasked, following discussion. 65.88.88.71 (talk) 16:47, 4 December 2021 (UTC)

To clarify, either document that |display-authors= cannot be set manually for authors>4, or remove default-value rendering. 65.88.88.71 (talk) 16:59, 4 December 2021 (UTC)

Where do you get the four authors default? |collaboration= does not count author names. The documentation does say that 'et al.' will be appended to the author-name list when |collaboration= is used. If you believe that the documentation can be improved, please do so.

—Trappist the monk (talk) 17:33, 4 December 2021 (UTC)

I do not document what I do not code. When the documentation is correctly split into reader, editor, and developer doc pages, I will be happy to help with the reader doc. I remember that in the past anything above four authors would be truncated with the et al. suffix. So I assumed this was still the case, because in the present case:

{{Cite book |last1 = Van Dijk |first1 = Peter Paul |last2 = Iverson |first2 = John |last3 = Shaffer |first3 = H. Bradley |last4 = Bour |first4 = Roger |last5 = Rhodin |first5 = Anders |collaboration=Turtle Taxonomy Working Group |year = 2012 |chapter = Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status |title = Conservation Biology of Freshwater Turtles and Tortoises |doi = 10.3854/crm.5.000.checklist.v5.2012 |isbn = 978-0965354097|display-authors=4}}

renders

Van Dijk, Peter Paul; Iverson, John; Shaffer, H. Bradley; Bour, Roger; et al. (Turtle Taxonomy Working Group) (2012). "Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status". Conservation Biology of Freshwater Turtles and Tortoises. doi:10.3854/crm.5.000.checklist.v5.2012. ISBN 978-0965354097.

but

{{Cite book |last1 = Van Dijk |first1 = Peter Paul |last2 = Iverson |first2 = John |last3 = Shaffer |first3 = H. Bradley |last4 = Bour |first4 = Roger |last5 = Rhodin |first5 = Anders |collaboration=Turtle Taxonomy Working Group |year = 2012 |chapter = Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status |title = Conservation Biology of Freshwater Turtles and Tortoises |doi = 10.3854/crm.5.000.checklist.v5.2012 |isbn = 978-0965354097|display-authors=5}}

returns

Van Dijk, Peter Paul; Iverson, John; Shaffer, H. Bradley; Bour, Roger; Rhodin, Anders; et al. (Turtle Taxonomy Working Group) (2012). "Turtles of the World, 2012 Update: Annotated Checklist of Taxonomy, Synonymy, Distribution, and Conservation Status". Conservation Biology of Freshwater Turtles and Tortoises. doi:10.3854/crm.5.000.checklist.v5.2012. ISBN 978-0965354097. {{cite book}}: Invalid |display-authors=5 (help)

So something changes when authors>4. Will have to look at the code regarding |display-authors= and |collaboration=. 65.88.88.69 (talk) 20:28, 4 December 2021 (UTC)

Again notice that when the display option is set to 4, the rendering is correct (4 authors+et al). 65.88.88.69 (talk) 20:31, 4 December 2021 (UTC)

It is invalid to have X names and have display-names >= X, by design. Izno (talk) 20:44, 4 December 2021 (UTC)
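As a rough model of the behaviour reported in this thread (not the module's actual code), the rule could be sketched as follows. The names `AuthorDisplay` and `resolve_display_authors` are hypothetical, and the sketch assumes only what was observed above: |collaboration= implies "et al.", a numeric |display-authors= must be smaller than the number of names, and an invalid value leaves the full list in place with an error message. Edge cases such as 0 are not modelled faithfully.

```python
from typing import NamedTuple, Optional


class AuthorDisplay(NamedTuple):
    shown: int             # how many names are rendered
    etal: bool             # whether "et al." is appended
    error: Optional[str]   # error message, if any


def resolve_display_authors(
    name_count: int,
    display_authors: Optional[str] = None,
    collaboration: bool = False,
) -> AuthorDisplay:
    """Model the thread's observations; hypothetical, not Module:Citation/CS1."""
    etal = collaboration  # per the thread, |collaboration= appends et al. by default
    if display_authors is None:
        return AuthorDisplay(name_count, etal, None)
    if display_authors == "etal":
        return AuthorDisplay(name_count, True, None)
    try:
        n = int(display_authors)
    except ValueError:
        n = -1  # treat non-numeric values as invalid
    if n < 0 or n >= name_count:
        # X names with display-authors >= X is invalid "by design"
        return AuthorDisplay(
            name_count, etal, f"Invalid |display-authors={display_authors}"
        )
    return AuthorDisplay(n, True, None)


print(resolve_display_authors(5, "4", collaboration=True))
print(resolve_display_authors(5, "5", collaboration=True))
```

With five names and a collaboration, the value 4 gives four names plus "et al.", while the value 5 keeps all five names, still appends "et al." because of the collaboration, and reports the error, which matches the two renderings shown above.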

I'm sure I skipped over this when I was reading the doc. Bad puns aside, I semi-remember this when originally discussed. I believe it was mentioned in an older iteration of the doc, but I am not certain. In any case, |display-authors=etal is a de facto default when |collaboration=some value. 65.88.88.69 (talk) 21:37, 4 December 2021 (UTC)

Back in the days when the CS1 templates were wrappers for {{citation/core}}, there was provision for no more than 9 authors, but the ninth (if supplied) was never displayed. By default, the first 8 were always displayed, and if you supplied either |last9= or |author9=, the first 8 would be followed by "et al". This cut-off could be adjusted by means of the |display-authors= parameter, which accepted an integer in the range 1-8, so you could show fewer than 8 (but not more) before the "et al". Regardless of the number actually displayed, all 9 would be put into the COinS. All this changed in 2013 when Module:Citation/CS1 was introduced. --Redrose64 🌹 (talk) 23:27, 4 December 2021 (UTC)
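The legacy rule described above can be restated as a short sketch. This is only a reading of that description, not the {{citation/core}} wikitext itself; the function name `legacy_author_list` is hypothetical, and it assumes that "et al." appeared whenever more authors were supplied than the cut-off allowed.

```python
from typing import List, Tuple


def legacy_author_list(
    authors: List[str], display_authors: int = 8
) -> Tuple[List[str], bool, List[str]]:
    """Return (displayed names, et al. flag, names sent to COinS).

    Hypothetical restatement of the pre-2013 behaviour described above.
    """
    if not 1 <= display_authors <= 8:
        raise ValueError("display_authors must be an integer in the range 1-8")
    supplied = authors[:9]                    # provision for no more than 9 authors
    shown = supplied[:display_authors]        # the ninth is never displayed
    etal = len(supplied) > display_authors    # extra authors trigger "et al."
    coins = supplied                          # all supplied names go into COinS
    return shown, etal, coins


shown, etal, coins = legacy_author_list([f"Author{i}" for i in range(1, 10)])
print(len(shown), etal, len(coins))
```

Under this reading, nine supplied authors with the default cut-off give eight displayed names followed by "et al.", with all nine in the metadata.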

I don't know if anything is actually 'wrong' but here are some crude data:

~1180 articles with cs1|2 templates using |collaboration=
~75 articles with cs1|2 templates using |collaboration= followed by |display-authors=
~225 articles with cs1|2 templates using |display-authors= followed by |collaboration=

As I understand it, a named collaboration is shorthand for a large number of individual authors. If an editor chooses to include all of the names associated with a named collaboration in the citation, is there any need for, or is it even proper to include, |collaboration= in that citation? My gut reaction is that the collaboration's name should be included and the list of individual names truncated to one or a few primary names because I suspect (without any evidence to support this) that it will be easier for a reader to locate the source by primary authors and the name of the collaboration than by the names of all n author names (when n is a relatively large number).

One thing that does seem 'wrong' to me is that |collaboration= without any |author=, |last=, or |vauthors= is silently ignored:

{{cite book |collaboration=The Writers Group |title=Title}} → Title.

|collaboration= requires at least one author name so templates without that name should declare a missing-name error.

—Trappist the monk (talk) 17:33, 4 December 2021 (UTC)

Another fix I have previously suggested re missing authors in that context is an |org-author=, which is what that is semantically. Izno (talk) 20:47, 4 December 2021 (UTC)

Headbomb, it's a simple yes or no answer. Do you plan to fix it? If not, say so; if so, say so. A commitment to fixing uses after a change can help ensure the change gets made in the first place. Izno (talk) 18:50, 4 December 2021 (UTC)

Izno, likewise, do you plan on fixing the cases where et al. isn't needed if the status quo remains? If so, how, because no mechanism exists within CS1/2 to do so. Neither of us have any way of detecting problem cases. Like the IP below said, |display-authors=etal is the default. It should not be so, per POLA. Headbomb {t · c · p · b} 19:04, 4 December 2021 (UTC)

@Headbomb: You don't get to deflect instead of answering the question when you were asked a question first. Please answer it. I have no personal issue with either answer, but there may be others who do. Izno (talk) 20:22, 4 December 2021 (UTC)

And you don't get to badger me into being responsible for thousands of articles. Headbomb {t · c · p · b} 20:28, 4 December 2021 (UTC)

A commitment to fixing uses after a change can help ensure the change gets made in the first place.

"I want a change but I'm not willing to work on the problem it causes" isn't a good look. Basically, your answer is "no, I will not work on the issue". Thanks for your answer. Izno (talk) 20:32, 4 December 2021 (UTC)

No, that is not my answer. I'll work on the issue as my time allows it, but I will not be responsible for anything overlooked, undetected, or which remains "unfixed". Headbomb {t · c · p · b} 20:34, 4 December 2021 (UTC)

Thanks for the clarification. That also would have sufficed a half-dozen responses ago. Be honest next time. Izno (talk) 20:35, 4 December 2021 (UTC)

And I assume Principle of least astonishment is the preferred link. Izno (talk) 20:26, 4 December 2021 (UTC)

It seems that |display-authors=etal is the default. It should not be so. 65.88.88.71 (talk) 15:18, 4 December 2021 (UTC)

"Neither of us have any way of detecting problem cases." is true, but that also means we have no data to support any suggested implementation. I do agree that the current default, "et al", probably reflects more citations to collaborations than the "we named all named authors so we don't need the et al" case.

There is another possible implementation which exists for the problem: permitting |display-authors= to be set to some reasonable value/key to indicate that no et al should be displayed from the current default.

Still another implementation that Ttm almost mentions above is to always display just a single author in the case of a citation to a collaboration. I support this even if I am unsure about the other two. Izno (talk) 21:06, 4 December 2021 (UTC)

The simplest solution is to have no default for the parameter in all cases. It is the easiest to implement and easiest to document. When the editor uses the parameter (for authors<lastn) et al. should be appended. 63.117.211.42 (talk) 03:12, 5 December 2021 (UTC)

Yes, it is the easiest solution to implement in the modules. It is not the "simplest solution" when you consider that this will add (probably more) wikitext into articles (than all three alternatives, do-nothing plus the other two presented). Izno (talk) 03:26, 5 December 2021 (UTC)

I would not call citations wikitext. They are (published as) endmatter that verifies wikitext, and are not part of the article per se. (Some verification material is very rarely included in footnotes in print media, and there are e-media examples of that as well, but this is considered non-standard). Is there anything stopping an editor from eschewing display-authors and adding |authorn=et al. as the last author in an arbitrarily shortened author list? Or is "et al." a bogus name? Such citation may be convenient but it is not correct. The correct form is to give all authors as they appear on the source (subject to module limits) and then optionally truncate the list by using |display-authors=. This way both verifiability and attribution are satisfied. Any other concern (additional text, more cumbersome editing, etc) comes in a distant last. 65.204.10.232 (talk) 15:11, 5 December 2021 (UTC)

Some citations are present as sources of details not included in the article. --Shmuel (Seymour J.) Metz Username:Chatul (talk) 16:01, 5 December 2021 (UTC)

Citations are just pointers, they cite verification material and nothing else. They include the source and ways to find/identify the source, plus factual/technical notes regarding the source (such as dead links, or embargo info etc), and perhaps verbatim quotes (short). Anything outside of that would require its own citation. This can be convoluted. Notes expanding on a citation should ideally be listed separately, and should very closely match the cited material, or nest further citations proving the note text. But that is a different issue (verification), not strictly speaking a citation-format issue. 65.88.88.201 (talk) 17:48, 5 December 2021 (UTC)

I did not call citations wikitext. I said that there will be additional wikitext as a result of a change which requires the choice to be explicit in every use of |collaboration= where all authors in the collaboration are also provided in the citation of interest.

As for your definition of what qualifies as wikitext, that is not the definition (and I won't spend time arguing why not; I don't think anyone will side with you on that one). You are welcome to make a reasonable (according to WP:LAYOUT) argument that they are part of the end matter or not part of the article-proper, but whether some content is part of the article proper and the serialization that content has are orthogonal and/or completely unrelated.

"anything stopping an editor from eschewing display-authors and adding" Yes, an error displays: Last1; et al. Title. {{cite book}}: Explicit use of et al. in: |last2= (help) CS1 maint: numeric names: authors list (link), subsequently cleaned up by gnomes.

I am not particularly interested in continuing this discussion; I was simply pointing out that there is a cost to that direction, and it is a cost that users of the templates have been sensitive to before. Izno (talk) 01:44, 6 December 2021 (UTC)

allow c. (circa) in cite book's publication-date

Note, for example, Template:Cite Blomefield, which throws a "Check date values in: |publication-date=" error, but the c. works fine for the |year= parameter. Could the cite book template be modified so "c." is valid for |publication-date=? = paul2520 💬 19:13, 5 December 2021 (UTC)

This has been raised before, for the "native" cs1 templates. You can join the queue, but prepare to wait. 50.74.109.2 (talk) 20:18, 5 December 2021 (UTC)

Yeah, I meant to clarify that cite book is what should be updated! Thanks. = paul2520 💬 20:27, 5 December 2021 (UTC)

In {{Cite Blomefield}} consider removing the 'when written' date (currently in |year=) and replacing |publication-date= with |date=. The 'when written' date doesn't really aid a reader in locating a copy of the source and may, in fact, cause some confusion because the 'when written' date appears first in the rendered citation. There has been some discussion here about changing Module:Citation/CS1 so that |publication-date= becomes a complete alias of |date=.

—Trappist the monk (talk) 20:55, 5 December 2021 (UTC)
