Wikipedia:Bots/Requests for approval/CinemaBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Request Expired.
Operator: Peppage (talk · contribs)
Time filed: 15:40, Thursday August 4, 2011 (UTC)
Automatic or Manual: Automatic
Programming language(s): python
Source code available: https://bitbucket.org/peppage/filmbot
Function overview: WikiProject Films has tens of thousands of articles about film, and updating them usually involves repetitive and mundane tasks. The bot cleans up film articles and makes them more compliant with the MOS.
Links to relevant discussions (where appropriate): Discussion started here and then a bot request page was created
Edit period(s): One run through all film articles, then weekly for maintenance.
Estimated number of pages affected: Articles using {{infobox film}}. It will not touch any templates or user pages, so there is no need to worry about the {{bots}} template.
Exclusion compliant (Y/N): N
Already has a bot flag (Y/N): N
Function details:
- Clean up infobox by removing fields that are not in the current infobox template
- Change the date to a film date template (infobox release date format, Template_talk:Infobox_film/Archive_17#Film_date)
- Add in comments if the field is blank for future editors to fill in correctly
- Add missing fields for future editors
- Add simple info from IMDB.
- Updates the infobox's markup formatting as seen in "Scenario A" below
- Also fixes some common spelling/typing mistakes.
Discussion
- If it's not going to be exclusion-compliant, I take it that it will edit any given page only once? Even if it is reverted? - Jarry1250 [Weasel? Discuss.] 17:24, 4 August 2011 (UTC)
- That's a good point. It was going to run through once but for maintenance it could hit the page again so now I think it should have it. --Peppagetlk 17:58, 5 August 2011 (UTC)[reply]
- What are "unused fields"? Deprecated fields no longer used in the infobox? Or just empty fields?
- Is there consensus that WYSIWYG dates in film infoboxes should be changed to a template? Also see Wikipedia:Requests_for_comment/Microformats.
- Is there consensus to add missing fields and comments to fields? How does the bot know they were not deleted on purpose?
- What are the common spelling mistakes and can you guarantee no false positives (WP:SPELLBOT)?
I see the discussion linked goes into more detail, but you need to be specific in the BRFA's function details. — HELLKNOWZ ▎TALK 08:23, 7 August 2011 (UTC)[reply]
- I added links in the function detail. I'm basing everything I change about the infobox on the current template. The date template is there, the comments are there, and any fields not on the template should not be there. The bot actually grabs the template data and uses that to fill in the infobox on the page. Since the spelling/typo mistakes are going to be film related they will be very concise. I am using regular expressions to find the film related words. Do these kinds of changes require a manual bot?
- As far as spelling changes go - We would need a broad test, such as in userspace to see what kind of changes it would make, to make sure that it gets it right every time.
- As Hellknowz asked, Is there a way for the bot to know that the fields in the infobox were not intentionally removed? For instance - something goes wrong at IMDB, or with the bot, the template, or something else, and a user reverts it... Will it re-populate that field? What if another user reverts it? SQLQuery me! 23:11, 11 August 2011 (UTC)[reply]
- The spelling is fixing the headers and capitalization. Things like "The Plot" need to be just "Plot" so there are not very many of them. I just match the exact phrase inside a header for that example. What is the best way to handle the second point? On the trip back around if the field is blank it will try and fill it in. A message about the bot's error could be left on its talk page? --Peppagetlk 15:06, 17 August 2011 (UTC)[reply]
I think this is a good bot to have around. Out of curiosity, will the bot take "current infobox parameters" and completely rebuild the infobox from that, or will it use the existing infobox and add/remove stuff from it? The former would standardize infobox appearance in the edit window (which I personally feel is desirable, but creates larger diffs in some cases); the latter does not standardize the infobox appearance (but would affect fewer articles and have smaller diffs). What I mean by that is that something like
{{Infobox film|<!--Template:Infobox film--> name = Foobar: The Movie | image = Test.svg | image size = 150px| |director =James Wettfield| caption = This is a caption | producer =Oracle McNutter | writer =Soothsayer Bob and Moe Jones| }}
would become, under Scenario A,
{{Infobox film | name = Foobar: The Movie | image = Test.svg | image size = 150px | border = | alt = | caption = This is a caption | director = James Wettfield | producer = Oracle McNutter | writer = Soothsayer Bob and Moe Jones }}
and, under Scenario B,
{{Infobox film|<!--Template:Infobox film--> name = Foobar: The Movie | image = Test.svg | image size = 150px| |director =James Wettfield| caption = This is a caption | producer =Oracle McNutter | writer =Soothsayer Bob and Moe Jones| | border = | alt = }}
Headbomb {talk / contribs / physics / books} 15:05, 14 August 2011 (UTC)[reply]
- It takes the fields from the old infobox and puts them into a new infobox that is properly formatted so it is Scenario A. I wanted it to look nice so when editors want to add a specific field it will make it easier to find and everything will be standardized. --Peppagetlk 15:06, 17 August 2011 (UTC)[reply]
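The Scenario A rebuild described here can be sketched roughly as follows. This is an illustration only, not the bot's actual code: the names `CANONICAL_ORDER`, `split_top_level` and `rebuild_infobox` are hypothetical, the parameter list is just the subset shown in the example, and HTML comments are dropped for simplicity. Parameters outside the list are discarded in this sketch, which is exactly the behaviour questioned later in the discussion.

```python
import re

# Parameter order copied from the Scenario A example; the real
# {{Infobox film}} template has more parameters than this subset.
CANONICAL_ORDER = [
    "name", "image", "image size", "border", "alt",
    "caption", "director", "producer", "writer",
]


def split_top_level(body: str) -> list[str]:
    """Split on '|' characters that are not inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def rebuild_infobox(wikitext: str, order=CANONICAL_ORDER) -> str:
    """Re-emit an infobox with one parameter per line in a fixed order."""
    inner = wikitext.strip()[2:-2]                         # drop outer {{ and }}
    inner = re.sub(r"<!--.*?-->", "", inner, flags=re.S)   # drop HTML comments
    params = {}
    for piece in split_top_level(inner)[1:]:               # piece 0 is the template name
        if "=" in piece:                                   # skip empties from stray pipes
            key, value = piece.split("=", 1)
            params[key.strip()] = value.strip()
    lines = ["{{Infobox film"]
    for key in order:
        lines.append(f"| {key} = {params.get(key, '')}".rstrip())
    lines.append("}}")
    return "\n".join(lines)
```

Feeding it the messy infobox from the example yields one `| key = value` line per parameter, with empty `| border =` and `| alt =` lines added.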
- So the bot removes fields it does not recognise? What about typos like
|dierctor=James Wettfield
? - What is the simple info from IMDB that gets placed in what I presume is
|imdb=
field? - I still don't see how adding all "missing" fields is useful, if they have been deliberately removed. I'm OK if the project decides which fields (even if empty) exactly should always be listed in markup.
- The bot should be exclusion compliant, the {{bots}} does not apply just to the userspace, but to any page.
- "I am using regular expressions to find the film related words." Where does the bot check for these? Is it only "the headers and capitalization"? Is there a list you keep with these and what they apply to?
- As this is an automated bot for thousands of articles, I would expect you to list out all the specific details in this BRFA. Don't take this the wrong way, I realize you are here to code a bot and not deal with bureaucracy. But even small tasks must have their BRFAs scrutinized. And this is a "huge" one as it seems, and you have placed all fixes into a single BRFA (which I should note, will make it hard for BAG to review post-trial). All I can suggest is that you list out all the fixes made in bullet-points in the "function details" and add clarifications as feedback is given here (I took the liberty of updating it). Note that each of these tasks requires consensus, as per WP:BOTPOL. — HELLKNOWZ ▎TALK 16:09, 17 August 2011 (UTC)[reply]
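For reference, the exclusion-compliance check asked for above can be sketched along these lines. This is a simplified illustration that handles only {{nobots}} and the `allow=`/`deny=` forms of {{bots}}; the function name `allowed_to_edit` is hypothetical, and a real bot framework may already provide its own check.

```python
import re


def allowed_to_edit(wikitext: str, bot_name: str) -> bool:
    """Return False if the page opts out of edits by this bot (simplified)."""
    # {{nobots}} excludes every bot.
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.I):
        return False
    # {{bots|allow=...}} or {{bots|deny=...}} with a comma-separated list.
    m = re.search(r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}]*)\}\}",
                  wikitext, re.I)
    if not m:
        return True  # no exclusion template on the page
    mode = m.group(1).lower()
    names = {n.strip().lower() for n in m.group(2).split(",")}
    listed = "all" in names or bot_name.lower() in names
    # deny: blocked when listed; allow: permitted only when listed.
    return not listed if mode == "deny" else listed
```

The bot would call this on the page text before saving and skip the page when it returns `False`.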
- It's ok, I didn't think this was going to be easy. Would it be easier if I just reviewed all the edits? It would still be faster than doing it manually anyway. I understand that wiki needs to be concerned about out of control bots that could potentially make a mess of things. To answer your questions though,
- Yeah the field would get removed, it's not showing up in the infobox anyway. Fixing spelling mistakes of that magnitude would take a lot of coding (there is a whole bot for it).
- I found a python addon to access imdb. If the imdb number is on the page then I can grab any field that is filled in on imdb and place it in the infobox.
- Some editors decided that if they didn't know the director to remove that field. If someone else comes along to enter that data they could misspell director or not know the exact field in the infobox. If they want to fill in the cast but they added "cast" instead of "starring"; if the field was already there it would make the editor's life easier. I'm a programmer so anytime something can be standardized and all the pages look the same it seems like a good idea to me. I can look at every page and the infobox looks the same, makes it easier. The only fields that are removed if unused are image size and narrator and if they are blank at the end of the edits they are removed by the bot.
- ok yeah the bot should be exclusion compliant.
- These are searched through the entire page. I could list them:
- (t|T)he (P|p)lot in a header -> Plot
- External Links in header -> External links
- (A|a)wards in header -> Accolades
- (DVD|dvd) (R|r)elease in header -> Home media
- ((I|i)nfobox Film|(I|i)nfobox (M|m)ovie) -> Infobox film
- {{start date.*?}} -> film date (all dates should be in film date, start date is old template)
- (min(\.)|mins\.|mins|min)(?!utes) -> minutes
- <BR> or <br> to <br />
- {{(Rottentomatoes|Rotten Tomatoes|Rotten tomatoes) to the correct template {{Rotten-tomatoes
- {{(IMDB title|IMDBtitle|IMDb Title|Imdb movie|Imdb title|Imdb-title|Imdbtitle|imdb title) to {{IMDb title
- {{(Amg title|Amg movie|Allmovie)\| to {{Allmovie title|
- I also remove wikilinks on "film", "united states", the country field, and the language field. If there are empty fields add comments so the editor knows how to use the template.
- For the date <!-- {{Film date|Year|Month|Day|Location}} -->
- For based on <!-- {{based on|title of the original work|writer of the original work}} -->
- alt <!-- see WP:ALT -->
- I know some of these changes aren't allowed to be done alone. I can make it so it will only save the page if there is an edit done that is worthy of a save. I would like this to be used but I'm pretty new to all this red tape so I appreciate all the help. --Peppagetlk 14:33, 19 August 2011 (UTC)[reply]
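The "exact phrase inside a header" matching mentioned in this reply might look something like the sketch below. It is an assumption about the approach, not the bot's code: `HEADER_FIXES` and `fix_headers` are hypothetical names, and only the two capitalization fixes are included, since the Awards and DVD release renames are disputed elsewhere in this discussion. The existing whitespace around the header text is left untouched.

```python
import re

# Only the uncontroversial fixes from the list above (illustrative subset).
HEADER_FIXES = {
    "the plot": "Plot",
    "external links": "External links",
}

# A wikitext header line: ==Title==, == Title ==, ===Title===, ...
HEADER = re.compile(r"^(=+)([ \t]*)(.*?)([ \t]*)\1[ \t]*$", re.M)


def fix_headers(wikitext: str) -> str:
    """Rename whole-header matches only; body text is never touched."""
    def repl(m: re.Match) -> str:
        fixed = HEADER_FIXES.get(m.group(3).lower())
        if fixed is None:
            return m.group(0)
        line, offset = m.group(0), m.start()
        # Swap only the title, keeping the original '=' and spacing.
        return line[:m.start(3) - offset] + fixed + line[m.end(3) - offset:]
    return HEADER.sub(repl, wikitext)
```

Because the whole header must equal the phrase, a sentence such as "Text about the plot." in the article body is not affected.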
- You will need consensus to add missing fields with default comments. These get removed regularly when they are not needed, and readding them all by bot will be contentious.
- "the field would get removed, it's not showing up in the infobox anyway." -- that's not a valid edit; a human mistake like that should be corrected, not removed. Obviously the bot can only catch a very low rate of these, but a human would see the problem immediately. The misused fields often still contain valid information.
- Where and what imdb info is placed and in what form?
- "(A|a)wards in header -> Accolades"; "(DVD|dvd) (R|r)elease in header -> Home media" -- is there consensus for this? What if these are intentional or are subsections?
- Template name redirect bypass (infobox, Rotten Tomatoes, allmovie, imdbtitle) - not to be done as the only edit
- "<BR> or <br> to <br />" - style preference (and not even correct, this is MediaWiki markup, not HTML), not to be done
- "(min(\.)|mins\.|mins|min)(?!utes) -> minutes" -- so "miniature" -> "minutesiature"?
- Does the bot skip comments in find/replace? — HELLKNOWZ ▎TALK 13:13, 24 August 2011 (UTC)
- The comments can be added or left out. Either way is fine; I was going off the original requirements.
- If it is required any field that isn't placed in the new infobox could be put at the bottom of the template?
- The header changes are in the MOS:FILM so consensus must be there?
- Template redirects will only be updated if something else on the page is updated. That's fine right?
- in the requirements it was specified for the linebreaks to change, I'll take out the line. What is correct though?
- It would not match miniature. It only matches "min", "min.", "mins." if it is NOT followed by "utes"
- The bot copies over any and all comments/refs and will skip them in find/replace. There is a pywikibot function for that.
- The bot isn't written in stone yet so it's fine if these things need to change. --Peppagetlk 17:04, 26 August 2011 (UTC)[reply]
- I'm sorry to be rude, but it is very hard to deal with this request, since it is some 10 tasks mushed into one. If these were separate BRFAs, half of them would have already been in trial/approved. I can tell from experience it will take pretty long to close this. "(min(\.)|mins\.|mins|min)(?!utes)" matches anything that has "min" not followed by "utes". So "miniature", "minute", "smint", "mind376|", "$@$2min%@#$&", etc.; see [1]. By "comments" I meant text inside html-style comments. — HELLKNOWZ ▎TALK 17:56, 26 August 2011 (UTC)
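The over-matching described in this comment can be reproduced with a few lines of Python. The pattern is the one quoted in the discussion; applying it with `re.sub` and the helper name `apply_original` are assumptions about how the bot uses it.

```python
import re

# Pattern exactly as quoted in the discussion.
ORIGINAL = re.compile(r"(min(\.)|mins\.|mins|min)(?!utes)")


def apply_original(text: str) -> str:
    """Replace every match of the quoted pattern with 'minutes'."""
    return ORIGINAL.sub("minutes", text)


# Intended cases work:
#   apply_original("120 min")   -> "120 minutes"
#   apply_original("120 mins.") -> "120 minutes"
# but any other "min" not followed by "utes" is rewritten too:
#   apply_original("miniature") -> "minutesiature"
#   apply_original("smint")     -> "sminutest"
```

The negative lookahead only protects the exact word "minutes"; it does nothing to require that "min" stands alone or follows a number.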
- Hello, I'm one of the coordinators at WikiProject Film. I am hoping to get the bot underway. To respond to some of your concerns, not all editors include a full infobox in film articles. For example, they may have copied the infobox from an article that uses one of the oldest set of parameters. So one request is to add parameters that should be used and remove unused parameters (ones that are empty). As for IMDb, the infobox used to have a parameter with the ID of the film at the website. It was considered redundant to the {{IMDb title}} template in the "External links" section. However, not all film articles link to IMDb in that section, so the plan is to copy the ID from the nonworking IMDb parameter and put it in the template in the EL section. As for line breaks, it is fine that we don't do that; I thought that the one with the slash was the preferred approach. Let me know if you have any other questions. Erik (talk | contribs) 16:54, 24 August 2011 (UTC)[reply]
- Your explanation to add parameters that should be used and remove empty parameters is different to previously explained add all missing parameters and remove all unrecognised fields. So what is the exact list of parameters that should always be added? To Peppage: is this the altered specification you are going for? — HELLKNOWZ ▎TALK 17:00, 24 August 2011 (UTC)[reply]
- Wikipedia:WikiProject Film/Bot requests is where a lot of the initial requests were listed. (It may need to be updated based on what's acceptable or not.) Typically unused parameters include image_size, followed_by, narrator, and preceded_by (the _by fields are deactivated). Parameters that should ideally be in the infobox are name, alt, image, caption, director, producer, starring, released, and language. Not sure if a field like "alt" (for alternative text) is permissible; it was a proposal to encourage accessibility. Erik (talk | contribs) 17:05, 24 August 2011 (UTC)[reply]
Trial
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Alright, plenty of time to talk about it; let's see the bot in action so we can see what works and what doesn't. Link to the BRFA (or to another specific discussion page, such as WT:FILM or the bot's talk page) in the edit summary, so people seeing the changes can comment on them. Headbomb {talk / contribs / physics / books} 19:34, 1 September 2011 (UTC)
- Bot bug?
Hi guys. For some reason, the bot removed an image on the page Agnes and His Brothers. [2] Not sure if this is intended or not. Tim1357 talk 18:07, 3 September 2011 (UTC)[reply]
- The bot is converting "<br />" to "<br>". I did point out above this is a stylistic preference and should not be done by bot.
- Bot inserts whitespace when it renames section headers. This is a similar stylistic preference.
- Here the bot unlinks [[2004 in film|2004]] in the lead section, is this correct and how does the bot know the link isn't intentional?
- The edit summary often omits the changes it makes, but includes others. This can be misleading when checking bot's edits or seeing it on watchlist.
- Here link from United States was removed in lead.
- Here film was unlinked, not sure if this is intentional? I am guessing common word link?
- Can we see the new version of "min. -> minutes" regex, please? — HELLKNOWZ ▎TALK 18:32, 3 September 2011 (UTC)[reply]
- Ok, commented out the line break business
- Apparently the headers shouldn't be changed anyway, commented out that part too.
- The unlinking of 2004, US, and film came from the bot request page; they are very common links.
- I will fix the edit summaries, is there a limit on length? --Peppagetlk 14:29, 6 September 2011 (UTC)[reply]
- In addition to H3llkn0wz's comments,
- Changes Awards to Accolades, this should probably not be done by a bot. Some articles might have an Awards section, and also a Nominations section and this fix wouldn't work on those cases. Best left to a semi-automated task.
- Changes DVD release to Home media, this also probably should not be done by a bot. Some articles might have VHS release, HDDVD release and/or Blu-Ray release sections in addition to the DVD release section. Best left to a semi-automated task.
- here (and many other places) the bot missed a | to remove (| name = Adavi Ramudu | rather than | name = Adavi Ramudu)
- What should the bot do with "| director of photography = Dante Spinotti" in this case? If it's putting them below the rest, they should probably be preceded by something like
<!-- Unsupported parameters -->
| director of photography = [[Dante Spinotti]]
}}
- Headbomb {talk / contribs / physics / books} 19:08, 3 September 2011 (UTC)[reply]
- The last point is very good; the bot shouldn't remove unsupported fields, but somehow mark them. I think moving to the bottom and adding a comment like that is the best course of action. — HELLKNOWZ ▎TALK 19:14, 3 September 2011 (UTC)[reply]
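A minimal sketch of the "move to the bottom and mark" approach suggested here, assuming the infobox has already been parsed into a dictionary. The names `SUPPORTED` and `render_params` are hypothetical, and the supported list is a tiny illustrative subset rather than the template's real parameter set.

```python
# Illustrative subset of supported parameters, in display order.
SUPPORTED = ("name", "director", "producer")


def render_params(params: dict[str, str], supported=SUPPORTED) -> list[str]:
    """Emit supported parameters first, then keep everything else below a marker."""
    lines = [f"| {key} = {params.get(key, '')}".rstrip() for key in supported]
    leftovers = [(k, v) for k, v in params.items() if k not in supported]
    if leftovers:
        # Nothing is deleted: unknown or misspelled fields stay visible to editors.
        lines.append("<!-- Unsupported parameters -->")
        lines.extend(f"| {k} = {v}".rstrip() for k, v in leftovers)
    return lines
```

With this approach a typo such as `dierctor` or an obsolete field like `director of photography` keeps its value and ends up under the comment, where a human can correct or remove it.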
- Then I guess we'll also need to know how the "minutes" variable is handled. Really, to make sure it's only modifying the infobox, it should be matching something like \|\s*runtime\s*=\s*(\d*)(min|mins)(\.|utes)? Headbomb {talk / contribs / physics / books} 22:58, 3 September 2011 (UTC)[reply]
- It does only do this there
elif(field.split("=")[0].strip().lower() == "runtime") :
— HELLKNOWZ ▎TALK 08:07, 4 September 2011 (UTC)[reply]
- Then, as I said before twice, this will also match other cases, such as "miniature", "minute", "smint", "mind376|", "$@$2min%@#$&", etc. [4]. Also note that field names are case sensitive, so the .lower() will match incorrectly capitalized fields. — HELLKNOWZ ▎TALK 08:07, 4 September 2011 (UTC)
- If it only changes it in the runtime field then I don't see the problem. If a user incorrectly capitalizes a field why shouldn't the bot match it with the correct lowercase field? --Peppagetlk 14:29, 6 September 2011 (UTC)
- If you correct the field first, then you can treat it as such. Otherwise, you are editing a different field, even if the difference is only in letter capitalization. Also, was "If it only changes it in the runtime field then I don't see the problem." a reply to regex issue or capitalization issue? — HELLKNOWZ ▎TALK 14:42, 6 September 2011 (UTC)[reply]
- The old infobox fields + data are read in one at a time and I only lowercase the field name (not the data) to check if it exists in a standard infobox. I don't know if I understand what you mean. The runtime field is the only field that is checked for the minutes regex --Peppagetlk 17:46, 6 September 2011 (UTC)[reply]
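One way to reconcile the two positions in this exchange is to correct the field name first and only then apply a stricter, number-anchored minutes pattern to the runtime value. The sketch below is an assumption, not the bot's code: `CANONICAL_FIELDS`, `_MINUTES` and `normalise_field` are hypothetical names, and the pattern is a swapped-in alternative to the one quoted earlier.

```python
import re

# Illustrative subset of recognised parameter names (all lowercase).
CANONICAL_FIELDS = {"runtime", "director", "producer"}

# Requires a number before the unit and a word boundary after it,
# so "miniature" or "smint" can never match.
_MINUTES = re.compile(r"\b(\d+)\s*(?:minutes|minute|mins|min)\b\.?")


def normalise_field(line: str) -> str:
    """Normalise one '| name = value' infobox line (simplified)."""
    name, sep, value = line.lstrip("| ").partition("=")
    if not sep:
        return line                      # not a name/value pair
    canonical = name.strip().lower()
    if canonical not in CANONICAL_FIELDS:
        return line                      # unknown field: leave untouched
    value = value.strip()
    if canonical == "runtime":
        value = _MINUTES.sub(r"\1 minutes", value)
    # Write the corrected (lowercase) name back, so the edit fixes the
    # capitalization instead of silently treating it as the real field.
    return f"| {canonical} = {value}".rstrip()
```

Note that the pattern also consumes a trailing full stop after the unit, which is acceptable inside a runtime value but would not be in running prose.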
Feedback
Hello, I wanted to provide feedback on CinemaBot's trial edits. We do not need to add the "screenplay" and "story" fields because that distinction is not universal. In a lot of cases, the "writer" field is sufficient. Perhaps add these three fields to an infobox only if none of them are being used? I would also not add the "based on" field as a default field. I also noticed here that we could add a line break between the producers' names. Is that a feasible enhancement, to replace commas with line breaks? Erik (talk | contribs) 14:03, 27 September 2011 (UTC)
- I am also wondering, is it possible to add the WikiProject Film banner on talk pages of articles that use the film infobox? I think there tends to be a set of low-key film articles where there is nothing (especially the banner) on the talk page. It would be helpful to identify them. Not sure how to find film articles that don't use an infobox. Erik (talk | contribs) 14:53, 27 September 2011 (UTC)[reply]
- All of the field adjustments are no problem, along with replacing commas with line breaks. I can do the talk page thing too but it is adding even more to this project. --Peppagetlk 14:16, 13 October 2011 (UTC)[reply]
Status report? And please respond to the feedback above. --Chris 11:48, 13 October 2011 (UTC)[reply]
- The status is that the consensus seems to be that the bot does too many things. I don't know what to do: go back and rewrite it so it only does one thing, get that passed, and build on that? Not sure where to go next. --Peppagetlk 14:16, 13 October 2011 (UTC)
- The bot can do as many things as you want, as long as they have consensus, are documented, and are free of false positives, which are the issues raised above. You have to understand that if we approve vague details, it will be BAG getting the stick if the bot does an edit that's controversial. All I can suggest (as this is a single BRFA) is that all details need to be specified; so you could either make several separate BRFAs or hope BAG moves this one along. — HELLKNOWZ ▎TALK 18:36, 15 October 2011 (UTC)
Request Expired. This is not going anywhere. I would suggest refiling as separate BRFAs (or one BRFA, with clear sections to allow easy discussion of each part of the task), with clearer details on exactly what the bot will be doing. --Chris 09:32, 2 November 2011 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.