
User talk:Mike Christie/GACbot

From Wikipedia, the free encyclopedia

Looks good

Looks good to me, can we get this kind of thing? IvoShandor 06:25, 24 April 2007 (UTC)[reply]

scattered thoughts

Hey this conversation is scattered across a few pages. I was thinking the bot could write to a sortable wikitable on its own subpage of GAC. Columns could include:

  1. date GA nom
  2. article name (linked)
  3. Article size in KB, if you wanna go read each article. If not, then just a field for LONG=YES/NO and let the users manually enter that info on the GAC page for your bot to read; would need to standardize the location of the word "LONG" of course...
  4. nom category & subcategory, such as "Arts" & "Music and musicians"; see WP:GAC for details
  5. Active or on Hold (and if on hold, how many days has had that status)
  6. any other shtuff others have suggested....

Thanks Ling.Nut 07:05, 24 April 2007 (UTC)[reply]

In terms of length, it would be very inefficient to parse every single page, and, in any event, the bot would likely not be approved due to being a server hog unnecessarily (think about it: pulling one page and writing one page, or pulling one page + one page per article + writing one page). Anything that is going to be reported on should be contained within the main GAC page. —Daniel Vandersluis(talk) 12:30, 24 April 2007 (UTC)[reply]

Most important goal

I'd suggest the most important goal is identifying the oldest candidates. If we could have the ten oldest written to a subpage, we could transclude them in a backlog template. The other stats listed would all be valuable, but identifying old candidates is top priority, I think. Mike Christie (talk) 17:27, 24 April 2007 (UTC)[reply]

Something else to think about; other than allowing for sorting, the above is really just a rehash of the already existing GAC page. If that's what's wanted, a redesign is better suited than a bot and subpage. —Daniel Vandersluis(talk) 17:34, 24 April 2007 (UTC)[reply]
yeaaah buuut then you rely on end-users to have the smarts to populate a sortable wikitable..... forgive me if I'm snide.... Ling.Nut 17:54, 24 April 2007 (UTC)[reply]
I'm not an expert on the technical aspects of WP, such as sortable tables and template syntax, but I don't see offhand how we could achieve this with a process that GA nominators would be easily able to follow. The FA process uses transcluded nominations, so the FAC page itself only sees the transclusion template. Those could not be sorted. I think anything sortable is subject to a lot of difficulties in user administration, as Ling says.
In addition, there are a few things that we would like that are not achievable by this method. For example, one of the original concerns was about the general growth in GA; it would be nice to be able to track historically and see whether the reviewers are starting to get swamped by the nominators. We would also like to see which categories are getting less attention, so that we can request attention go to the right places -- e.g. perhaps Games get reviewed very quickly, but Philosophy articles hang around for ever. A couple of the other things suggested are also not achievable with the table.
Daniel, what do you think? Is this something you're likely to have time for? As I mentioned, if we can just get a quick version that parses noms for date and spits out the oldest ten, that would be terrific for now. More would be good, but that would get us a long way. I will add a section on the User:Mike Christie/GACbot page to specify what that could look like. Mike Christie (talk) 18:12, 24 April 2007 (UTC)[reply]
(undent) Sorry, let me try this again, I don't think I came across as I intended. This bot should be for statistical purposes (oldest noms, etc.) What I was suggesting against was to have a bot that just recreates the entire GAC page in a different format (let's call it a different "view" of the data), and if this is desired, it would probably be better done via a page redesign, templates, etc. Note that I do not believe that the original proposal was calling for a new view, but statistical reporting.
Any sort of statistical tracking should not be a problem. I will start with the quick version as suggested above, but let me stress again that I cannot give you a time as to when it would be done. I don't have a ton of free time, but I'll try to code up the bot as quickly as possible. Then it will have to be approved at WP:B/RFA, and of course I cannot vouch for how long that will take (during that process, trials will begin).
Something I'll need to know for the bot approval group: what sort of editing period are you thinking of? Most statistics bots (including my own) run 1x/day (at a theoretical "off-peak" time). There might be use for maybe 2 edits a day, but any more than that will not likely be fruitful. —Daniel Vandersluis(talk) 18:39, 24 April 2007 (UTC)[reply]
Once per day would be fine. And of course I understand that you can't give a delivery time; any time you can devote to this is very much appreciated. I think I understand your comments about statistics vs. page recreation in a new format, and I agree with you; the statistics is what we're looking for right now (including oldest noms, of course). So I think we're in synch. Thanks -- Mike Christie (talk) 18:53, 24 April 2007 (UTC)[reply]
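As a rough illustration of the "quick version" discussed in this section (parse the nominations for their dates and output the oldest ten), here is a minimal Python sketch. It is not StatisticianBot's actual code. It assumes, purely for illustration, that each nomination is a numbered list item (`#`) whose first wikilink is the article and that it carries a signature timestamp in the style seen on this page; the function names are hypothetical, and excluding items that are on hold or under review is not handled here.

```python
import re
from datetime import datetime

# Signature timestamp in the style used on this page,
# e.g. "17:27, 24 April 2007 (UTC)".
TIMESTAMP = re.compile(r"(\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4}) \(UTC\)")
# Assumed nomination shape: a "#" list item whose first wikilink is the article.
NOMINATION = re.compile(r"^#\s*\[\[([^\]|]+)")


def parse_nominations(wikitext):
    """Return (article, nominated_at) pairs for lines that look like nominations."""
    found = []
    for line in wikitext.splitlines():
        link = NOMINATION.match(line)
        stamp = TIMESTAMP.search(line)
        if link and stamp:
            when = datetime.strptime(stamp.group(1), "%H:%M, %d %B %Y")
            found.append((link.group(1).strip(), when))
    return found


def oldest(nominations, limit=10):
    """Return the `limit` nominations with the earliest timestamps, oldest first."""
    return sorted(nominations, key=lambda pair: pair[1])[:limit]
```

Because everything is read from the single GAC page, a run of this kind costs one page download and one page write, which matches the load argument made earlier in the discussion.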

Task summary

If you could, I would appreciate it if you added a numbered list, above the detailed descriptions, briefly outlining each task that the bot is expected to do. The purpose of this is so that I can take this page to the B/RFA and point out "the bot currently does tasks x, y and z, with w and v forthcoming." While it should not be a problem to add future tasks after getting the bot up and running, it would be good to have a list to show them (and then get approval for future, non-listed tasks as they arise). The reason I do not believe there will be a problem with adding tasks to this particular bot is that the load on Wikipedia servers should be the same no matter what the task is: download the GAC page, [parse the page -- no effect on servers], write the result page. —Daniel Vandersluis(talk) 18:50, 24 April 2007 (UTC)[reply]

OK, I did a quick version -- is that what you're looking for? Mike Christie (talk) 18:57, 24 April 2007 (UTC)[reply]
Yes, that's perfect, thanks. —Daniel Vandersluis(talk) 18:58, 24 April 2007 (UTC)[reply]

B/RFA request made

Just wanted to let you know that I have filed my WP:B/RFA request. The code is not done yet, of course, but I wanted to file the request first, on the off-chance that something is denied (I seriously do not see this happening). See Wikipedia:Bots/Requests for approval/StatisticianBot. (By the way, I hope you don't care that the bot is not named GACbot; I wanted to create an umbrella bot for all stats-related bot tasks I oversee, this being one of them.) —Daniel Vandersluis(talk) 18:48, 26 April 2007 (UTC)[reply]

Thanks for the heads up. No, I don't care about the name of the bot. If it's all the same to you, though, I won't rename this page -- aside from confusing anyone who's bookmarked it, this page is only going to track functionality of the pieces of StatisticianBot that deal with GAC. On the other hand, if you want to create a page to define StatBot's functionality, and move any discussion of functionality changes over there, please just say so.
On a related note, you might want to glance at Wikipedia:Good article candidates/backlog/items, which gets transcluded into the backlog template at the top of the GAC page. It also gets transcluded into Template:WikiProjectGATasks, which looks like this:
WikiProject Good Articles: Open Tasks
This project identifies, organizes and improves good articles on Wikipedia.
I hadn't realized this existed, but a glance at individual versions in the history of the template makes it clear that the GAProject folks have been updating it by hand. So this is a natural use of the oldest nominations list that is already in place, just waiting for automation. If StatBot could write to the items subpage in the format that's there now, that would be perfect.
I will also post a note to Wikipedia talk:WikiProject Good articles to make sure they know about this bot; I posted a note on the GAC talk page but maybe it would be wise to mention it at the project too. Mike Christie (talk) 20:33, 26 April 2007 (UTC)[reply]
There are actually a number of other pages that in some way track GA statistics that I have found while trying to understand the above:
  1. Template:WikiProjectGATasks could be updated by bot (for the last line stats)
  2. Wikipedia:Good article statistics
  3. Wikipedia:Good article candidates/backlog/items as previously mentioned
The one issue is that as per the bot proposal, and previous discussion, I had thought that all that would be done by the bot would be creating a one-page report, which means that editing other pages is not under the current scope of the bot specification. I can change this, of course, and it shouldn't be a problem. Let's start with having the bot automatically update the items page (if anything; see below), and deal with the other pages later on, when we've got something working.
My issue with having the bot update the items page is that the format seems to be a bit volatile at the moment. I know that you are involved with this, but it has recently gone from a category list to an article list. If it is going to be unstable, then the bot will not be able to keep up with what is desired by humans.
Daniel Vandersluis(talk) 20:51, 26 April 2007 (UTC)[reply]
Fair comment. I think the volatility is because that items list is new; if I understand the history, it goes like this: (1) GATask box is created, and maintained manually with a list of articles; (2) GA backlog template is created, probably by someone who didn't know about that GATask box, and is maintained manually with a list of categories; (3) I changed it to articles linking to categories as being the most useful compromise; (4) someone else notices that and thinks "Oh, I can just plug that in to the task box". So we have convergence, and I see no reason for further changes.
Having said that, if you'd like to simply create the one page report, and wait a while to see if the requests truly are stable, that's fine too. That report would still be very valuable. Mike Christie (talk) 21:00, 26 April 2007 (UTC)[reply]
I've updated the B/RFA proposal to allow for updates to the items page. Now that I think about it, though, we can make the results page into a pseudo-template with an <includeonly> block containing useful variables that can be transcluded elsewhere, such as {{WP:GAC/R|num_candidates}} or {{WP:GAC/R|oldest5}} (assuming WP:GAC/R is the Good article candidates/report page that the bot creates). That way, no other pages have to be edited by the bot; they would just be updated to transclude the specific variable they use. Note that I am not sure if there are any issues with doing this, such as extraneous server load (loading a full page rather than writing a static piece of text). —Daniel Vandersluis(talk) 21:10, 26 April 2007 (UTC)[reply]
I have posed this question on the bot request page, we'll see what the bot approval group thinks of it. —Daniel Vandersluis(talk) 21:25, 26 April 2007 (UTC)[reply]
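To make the pseudo-template idea above concrete, here is a hedged Python sketch of how a bot might assemble such a page: the human-readable report sits in a `<noinclude>` block, and an `<includeonly>` block uses the ParserFunctions `#switch` to return one value when the page is transcluded with a key such as `num_candidates`. The function name, the layout, and the use of `#switch` are assumptions made for illustration; nothing in this discussion confirms how the bot actually does it.

```python
def render_report_page(report_body, stats):
    """Build wikitext showing the full report when viewed directly and a single
    value when transcluded with a key, e.g. {{WP:GAC/R|num_candidates}}.

    Assumption: values contain no "|" or "=" characters, which would need
    escaping inside a #switch.
    """
    cases = "".join(f" | {key} = {value}\n" for key, value in stats.items())
    return (
        "<noinclude>\n" + report_body + "\n</noinclude>"
        + "<includeonly>{{#switch: {{{1|}}}\n"
        + cases
        + " | #default =\n}}</includeonly>"
    )
```

With this layout the bot still writes only one page; the pages that want a statistic transclude it rather than being edited by the bot.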
  • I wonder, could you post a list here, just for the sake of easy understanding, of what this bot will do and what the GA people can expect, just so it's all here in one place and can be ascertained at a glance. That would be most helpful. IvoShandor 06:14, 27 April 2007 (UTC)[reply]
The answer to what it will do is probably covered by User:Mike Christie/GACbot. The request Daniel made is at Wikipedia:Bots/Requests for approval/StatisticianBot, so that's the official version.
What GA people can expect -- I would guess we'll have to work with the output of StatBot for a bit to figure out the most useful thing to do with it. The update to the GATask box and the backlog template will be automatic fairly quickly; beyond that I'd guess we'll probably end up adding a section to Wikipedia:Good article statistics to record StatBot's output. Does that answer your question? Mike Christie (talk) 10:10, 27 April 2007 (UTC)[reply]
For the most part, yeah, it just seemed like others had ideas as well and was concerned that maybe something important got lost in the mix. IvoShandor 10:53, 27 April 2007 (UTC)[reply]
If we're missing something that you think should be done, post here or let me know, and I'll add it to the specification, if I think it's something that can be done (and is within the scope of what a bot should do). In any case, we've got to start somewhere... Daniel Vandersluis(talk) 16:48, 27 April 2007 (UTC)[reply]

Progress update

Just wanted to let you know that some progress has been made on the bot. While the request for bot approval seems to have stalled (no response from BAG yet), I have been coding the bot. For the most part it's done (it collects the categories and nominations, gets the relevant information, and organizes it in a way that I can use it), though I'm sure there are still a lot of things needing to be done before a "version 1.0". On the other hand, I have the first two tasks completed, and ready to take a look at. Note that the bot is not yet operational, this is just a test to ensure that everything is working. The results of tasks 1 and 2, taken with live (actual) data, can be seen at User:StatisticianBot/Sandbox. —Daniel Vandersluis(talk) 21:49, 30 April 2007 (UTC)[reply]

This looks great, and would be very useful as is. The spec for 1 does say "Anything on hold or under review should not be included", though; is that possible? The other thing that has changed is the creation of Wikipedia:Good_article_candidates/backlog/items which has the format you see there, for use in a couple of transclusions. Could we change to that format? If so I'll change the specification to suit. Then on approval you could simply make that page the target output of the bot. Currently we only have five articles on the page; I'd see no problem with ten, but if it overflows the task box I guess we can cut it back to five easily enough.
The stats count is great. I haven't really thought about the best way to store it; maybe a StatBot subpage which keeps appending a line a day? It could be periodically archived by hand.
I suspect that once we have those numbers available, we'll include those counts on the GAC header -- the total outstanding nominations count, at least -- and so we'll have to make that a transcludable number eventually. But that can wait.
So overall it looks very good. On the BAG lack of response, I was wondering if they think you're going to respond to that note on the approval discussion before they comment? Mike Christie (talk) 22:12, 30 April 2007 (UTC)[reply]
Yeah, it's not done yet, obviously. I did misunderstand the oldest 10 report and forgot to exclude on hold/under review items, and will fix that. In terms of the backlog/items page, when the bot's operational, it will edit that page directly using the same format; the existing list will be part of the report as a whole.
The stats count ("backlog report") can/will certainly be updated each day, and it can be archived too (I can probably do this automatically).
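A daily backlog log of the kind described here could be kept as one appended line per day. The sketch below is only an illustration under stated assumptions: the line format, the function name, and the three counts (unreviewed, on hold, under review) are hypothetical choices, not the bot's real output. It refuses to add a second line for the same date, so a rerun on the same day leaves the log unchanged.

```python
from datetime import date


def append_daily_line(log_text, day, unreviewed, on_hold, under_review):
    """Append one summary line for `day` unless that day is already logged."""
    prefix = f"* {day.isoformat()}:"
    if any(line.startswith(prefix) for line in log_text.splitlines()):
        return log_text  # already recorded today; keep the log unchanged
    total = unreviewed + on_hold + under_review
    line = (
        f"{prefix} {total} nominations ({unreviewed} unreviewed, "
        f"{on_hold} on hold, {under_review} under review)\n"
    )
    if not log_text.strip():
        return line
    return log_text.rstrip("\n") + "\n" + line
```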
I will also create a page for transcluding the various stats to other templates/pages (such as the GAC header) to make it as simple as possible; I have commented on this in the approval discussion (thereby replying to that note); so hopefully if that was causing a delay it will now progress.
Something I was thinking about, in terms of malformed nominations. While I can detect missing dates and nominators, and malformed uses of GAOnHold and GAReview templates, there are some things that would be nice that I can't do: broken numbering (no way to detect what number (in the list) an entry is, as all list items are the same in HTML); discerning text beneath a nomination as something (unless it uses one of the templates, I have no way of knowing if it's a comment, or a badly-formed on hold notice, or something else); or detecting other stylistic malformations. Just something to keep in mind... —Daniel Vandersluis(talk) 16:11, 1 May 2007 (UTC)[reply]
Yes, I noticed you'd fixed a couple of messed up items. A couple of possibilities occurred to me:
  • Maybe we can create a {{GAComment}} template, for use in any comments. Anything you see in the list other than a hold, comment or review template is then an exception to be reported.
  • I think several problems are caused by people leaving blank lines. These can probably be detected (and one day, perhaps, automatically fixed).
  • If we don't use {{GAComment}}, it still is presumably the case that all text between two consecutive noms, or between a nom and the next section title, is intended as either on hold, review, or other comment. We might be able to heuristically determine which, for some of these. I find this less attractive for a lot of reasons. Maybe a compromise would be to ask people to start their comments with "Under Review" or "On Hold" or "Comment"; anything else would be reported as an exception.
I'm travelling to Europe for two weeks, starting in an hour or so, but will have email access from most places, so will try to stay on top of this. Thanks. Mike Christie (talk) 18:10, 1 May 2007 (UTC)[reply]
GAComment might work, but it would seem to me that there's an entrenched procedure and getting it changed would be a process. Same for starting comments with something bolded. Heuristics wouldn't work well, either, because there'd be no way to associate a word found (say "On hold") with a date or a person reliably. I can easily attach all text between nominations to the correct nomination (already do), but there can be zero, one, two, eight, etc. dates in that block. I can't even use newlines as a guide, because, unfortunately, there is nothing to guarantee that a nomination won't take up multiple lines (already happens).
I don't know what the solution is. The way it stands, an article that is on hold can be marked as unreviewed by the bot if it can't detect the template. This is wrong, but better than ignoring the article altogether, or marking it as malformed because it has a comment but no detected template (which isn't malformed, necessarily).
For what it's worth, these are the conditions of malformedness that I currently detect:
  • Nominator not found.
  • Nomination date not found.
  • GAOnHold parameter does not match article title.
  • GAOnHold found but malformed.
  • GAReview found but malformed.
I noticed from your talk page earlier that you are going on a trip. Have a good one! —Daniel Vandersluis(talk) 20:08, 1 May 2007 (UTC)[reply]
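The five malformedness conditions listed above can be sketched in Python as follows. This is a hypothetical reconstruction, not the bot's code. It assumes that a nominator shows up as a `[[User:...]]` link, that a date is a signature timestamp in the style used on this page, that GAOnHold takes the article title as its first parameter, and that "malformed" means the template name appears without a well-formed `{{...}}` call. The text between one nomination and the next is passed in as `trailing`.

```python
import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")
USER_LINK = re.compile(r"\[\[User:[^\]|]+")
# Assumed well-formed shapes; the real templates may accept more than this.
WELL_FORMED = {
    "GAOnHold": re.compile(r"\{\{GAOnHold\|([^{}|]+)[^{}]*\}\}"),
    "GAReview": re.compile(r"\{\{GAReview[^{}]*\}\}"),
}


def find_exceptions(article, nomination, trailing):
    """Return the list of problems detected for one nomination."""
    problems = []
    if not USER_LINK.search(nomination):
        problems.append("Nominator not found")
    if not TIMESTAMP.search(nomination):
        problems.append("Nomination date not found")
    for name, pattern in WELL_FORMED.items():
        if "{{" + name not in trailing:
            continue  # template not used at all, so nothing to check
        match = pattern.search(trailing)
        if match is None:
            problems.append(f"{name} found but malformed")
        elif name == "GAOnHold" and match.group(1).strip() != article:
            problems.append("GAOnHold parameter does not match article title")
    return problems
```

As noted in the discussion, free-text comments that use no template are invisible to checks like these, so an article that is informally on hold would still be counted as unreviewed.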

Another update. I'm done with the Exception report, and wanted to give you an idea of what it would look like: see [1]. You will note it is a bit different than the specified one; I chose to not display <list of noms> as you prescribed, because it would take up too much space and the data is all available with one click (each nom is linked to its category). I think this is a good solution, let me know if you disagree. Also, I added a count in brackets to each section of the exceptions report. I think there should be a way to quickly evaluate how many articles are in each category, but I'm not sure yet if I like it this way. Thoughts? The old nominations list has also been corrected to not display anything on hold or under review. —Daniel Vandersluis(talk) 17:54, 7 May 2007 (UTC)[reply]

I've been trying to play around with it to make the format a bit nicer, but haven't come up with anything yet (so nothing was saved). I also tried playing around with m:EasyTimeline to see if we could get something going for the backlog, but unfortunately, it does not appear to be able to cater to the type of graph we'd need for this purpose (a daily look at unreviewed/on hold/under review nominations).
Feel free to edit the sandbox page to a format that you like, and I can probably encompass that into the bot's output. —Daniel Vandersluis(talk) 21:29, 7 May 2007 (UTC)[reply]
Sorry about the delay replying; I'm still in Europe. I've just looked at the most recent output; it's looking pretty good -- you're evidently still working on the "age" part of the summary section, which is showing 13000+ days right now. Whenever you're happy with it (and the bot group OK it) this looks good to fire up. Once it's going, we may get more input from some of the GAC regulars. Mike Christie (talk) 16:06, 9 May 2007 (UTC)[reply]
It's pretty much at the point right now that I'm ready to take it back to the B/RFA for trials. The only thing missing is an archive of the backlog, but that can be added later without having to go back to RFA. The edit you saw earlier was a bug, since fixed -- I did a whole bunch of rapid fire edits while bug fixing so I could see what it looks like.
Trials will start later today, and will edit Wikipedia:Good article candidates/Report, Template:GACstats and Wikipedia:Good article candidates/backlog/items. —Daniel Vandersluis(talk) 17:34, 9 May 2007 (UTC)[reply]
Update: First day's trial has been posted (see the B/RFA page). —Daniel Vandersluis(talk) 17:51, 9 May 2007 (UTC)[reply]
Outstanding. I'll post a note on Wikipedia talk:Good article candidates. Mike Christie (talk) 17:57, 9 May 2007 (UTC)[reply]

Overdue OnHolds and Reviews

Mike,

To begin with, this bot and the report it generates are the best thing since sliced bread and I mean it! Kudos to you and all the other people involved in creating it and setting it up to create the GAC/R!

Now, I was wondering whether the bot's functionality could be expanded by one more task. Namely, the bot identifies articles that have been OnHold or Reviewed for over a week. The obvious thing to do in such circumstances is to try to get hold of the original reviewers and notify them that they should close their reviews now. This can be done by hand, of course, but I guess it would be fairly easy to automate and would save reviewers some time to focus on reviews. One obvious concern is that the bot would normally leave a message on somebody's talk page every day the review is overdue, which is less than appropriate. A solution would be for the bot to leave an easily identifiable piece of code it could check for before posting another notice (it would have to contain article identification, though, as it is possible for one person to have a few consecutive overdue reviews).

Just wanted to ask whether you think this is a good idea and would be feasible to do! Thanks in advance for your reply, PrinceGloria 09:51, 3 June 2007 (UTC)[reply]
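The "easily identifiable piece of code" suggested above could be a hidden HTML comment that names the article, which the bot looks for before posting again. The following Python sketch shows the idea only; the marker text, the notice wording, and the function names are all hypothetical, and reading or saving the talk page is left out.

```python
def marker_for(article):
    """Hidden comment identifying one article's overdue notice (hypothetical format)."""
    return f"<!-- GAC overdue notice: {article} -->"


def add_notice_once(talk_text, article):
    """Append an overdue notice for `article` unless its marker is already present."""
    marker = marker_for(article)
    if marker in talk_text:
        return talk_text  # reviewer was already notified about this article
    notice = (
        f"\n== Overdue GA review of [[{article}]] ==\n"
        f"The review of [[{article}]] has been open for over a week. ~~~~\n"
        f"{marker}\n"
    )
    return talk_text + notice
```

Because the marker includes the article name, a reviewer with several overdue reviews gets one notice per article rather than one per day.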

Sounds good to me. You might want to drop a note on Daniel Vandersluis's talk page, since he's the bot coder; I think he watches this page so he will probably see this though.
Another possibility that came up was automatically fixing the GAOnHold notes where the editor has left "Article" in there instead of changing it to the article name. There's also the idea of scanning through the GAC and cross-checking the list of GAs against what's on the GA page, to see if there are discrepancies. I mentioned it on his talk page, and he thought it would be possible, but he hasn't indicated yet if he wants to do it or not. Mike Christie (talk) 10:53, 3 June 2007 (UTC)[reply]
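The GAOnHold fix mentioned above (an editor leaving the literal word "Article" as the parameter) could be sketched like this in Python. It assumes the article title is the first parameter of the template and that the bot already knows which nomination the template belongs to; the function name is hypothetical, and this is an illustration rather than anything the bot is confirmed to do.

```python
import re

# Matches "{{GAOnHold|Article" only when "Article" is the whole first parameter,
# i.e. it is followed by "|" or "}" and not by more title text.
PLACEHOLDER = re.compile(r"\{\{GAOnHold\|\s*Article\s*(?=[|}])")


def fix_placeholder(trailing, article):
    """Replace a leftover "Article" placeholder with the real article title."""
    # A function replacement avoids backslash/group escapes in the title.
    return PLACEHOLDER.sub(lambda _match: "{{GAOnHold|" + article, trailing)
```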