Wikipedia:Bots/Requests for approval/StatsBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.
Owner | User:PoliticalJunkie |
---|---|
Function | Updating baseball players' stats. |
Language | English |
Program | Pywikipedia framework |
Mode | Starts manually, edits automatically, checked frequently |
Frequency | Every three days |
Thanks. - PoliticalJunkie 23:57, 29 August 2006 (UTC)
- Where is it getting information from? Betacommand 00:12, 30 August 2006 (UTC)
ESPN.com - PoliticalJunkie 14:30, 30 August 2006 (UTC)
- How is it getting this information and how is it updating? Betacommand 15:36, 31 August 2006 (UTC)
It's getting the information by extracting it from ESPN using regex and a preloaded set of player URLs. For example, it takes the player URL as input, goes to that page, and then uses regex to get the necessary statistics. For updating, it uses a search-and-replace-like function: it obtains the table on the page (which is placed between two tags; see here for a sample edit) and replaces it with the new, updated statistics. - PoliticalJunkie 17:29, 31 August 2006 (UTC)
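The two steps described above (regex extraction, then replacing whatever sits between two marker tags) could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names, the sample row pattern, and the assumed HTML shape are hypothetical, and the marker text is the one quoted later in this discussion.

```python
import re

# Marker text as quoted later in this discussion; the bot's real marker may differ.
MARKER = "<!--Please leave this tag here.-->"

# Assumed shape of a stats row. ESPN's real HTML is not shown in this discussion.
STAT_ROW = re.compile(r"<td>(?P<name>[A-Z0-9]+)</td>\s*<td>(?P<value>[\d.]+)</td>")


def extract_stats(html):
    """Return a dict mapping stat abbreviation to value string found in the HTML."""
    return {m.group("name"): m.group("value") for m in STAT_ROW.finditer(html)}


def replace_between_markers(wikitext, new_table):
    """Replace the text between the first pair of markers with new_table."""
    pattern = re.compile(re.escape(MARKER) + r".*?" + re.escape(MARKER), re.DOTALL)
    if not pattern.search(wikitext):
        # Refuse to edit rather than guess where the table belongs.
        raise ValueError("marker pair not found; refusing to edit")
    replacement = MARKER + "\n" + new_table + "\n" + MARKER
    # A function is used as the replacement so backslashes in new_table stay literal.
    return pattern.sub(lambda _m: replacement, wikitext, count=1)
```

Raising an error when the markers are missing keeps the bot from writing a table into a page whose markers were removed or vandalized.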
- Thanks for explaining. Looks good. You might want to expand the column names in the table (e.g. R = Runs) so that someone who doesn't know baseball slang can understand them. Betacommand 18:05, 31 August 2006 (UTC)
Okay. - PoliticalJunkie 14:57, 1 September 2006 (UTC)
Just looking at Mark Teixeira, I have a couple of questions. Can this information be cited? Have you considered putting the information in the form of a template with the data as parameters? It would make updating it much easier, and you wouldn't need the <!--Please leave this tag here.--> markers. That would also make the bot less likely to make mistakes, since I don't think that tag is generic enough and it's easy to vandalize. If it were a template, you could also add an optional field for a reference. It shouldn't be hard to do. BTW, I think this is a great idea. — Ram-Man (comment) (talk) 23:12, 1 September 2006 (UTC)
I did consider putting it into a template, but couldn't because I don't know anything about template syntax. I was looking at Wikipedia:Template namespace to see how to accept parameters, but I don't know how to output them in table form. If you could design one using the fields on Mark Teixeira's page, that would be extremely helpful. As for citation, do you mean adding a link to the player's ESPN url in the table? Thanks for the advice. - PoliticalJunkie 01:31, 2 September 2006 (UTC)
- I'd be happy to help. I'll work on it and post the results when I'm finished. — Ram-Man (comment) (talk) 14:47, 2 September 2006 (UTC)
- I've made you a template at Template:Baseball stats. Go there and view the examples. We could update the citations to make them more clear, but I'd need to know what format the URL is. I just assumed one of them, but you might be looking it up differently, so if you could post an example URL, that would help us streamline the process. — Ram-Man (comment) (talk) 15:25, 2 September 2006 (UTC)
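With a template, the bot only has to emit a template call with named parameters instead of a whole table. A minimal sketch of that step, assuming hypothetical parameter names (`HR`, `AVG`, `ref`); the real parameters of Template:Baseball stats are not listed in this discussion, and the URL is the example posted later in the thread:

```python
def render_template(name, params):
    """Render a wikitext template call with one named parameter per line."""
    lines = ["{{" + name]
    for key, value in params.items():
        # Named parameters may contain "=" in the value, but not a bare "|".
        lines.append("| {} = {}".format(key, value))
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical parameter names, used for illustration only.
example = render_template(
    "Baseball stats",
    {
        "HR": "33",
        "AVG": ".282",
        "ref": "http://sports.espn.go.com/mlb/players/profile?playerId=3392",
    },
)
```

Passing the source URL as an optional parameter is one way the citation could travel with the numbers, as suggested above.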
Just a note. Be careful when harvesting the data from ESPN. If you do it too rapidly to too many entries, they may flag you as a bot and block you. — Ram-Man (comment) (talk) 15:30, 2 September 2006 (UTC)
- Indeed. Additionally, you may want to compile a dump of the pages now, if you haven't already. Otherwise, the site may change format enough to break your parsers either before or, worse, while your bot starts its heavy runs. Voice-of-All 02:19, 3 September 2006 (UTC)
Here is an example url: http://sports.espn.go.com/mlb/players/profile?playerId=3392. Thanks for designing the template. If I compiled a dump, would I have to recompile it every day to update the statistics? And, how do I compile a dump? Thanks. - PoliticalJunkie 19:13, 4 September 2006 (UTC)
- I'll make some tweaks to the template to make it easier. In the meantime, I think he meant by "compile" that you should make a local copy of the web pages before you start the bot. That way you are protected if ESPN changes their web page format, causing your bot to incorrectly parse the HTML. -- RM 23:43, 4 September 2006 (UTC)
Oh, thanks for the clarification. - PoliticalJunkie 17:53, 5 September 2006 (UTC)
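The two pieces of advice above (harvest slowly, and keep a local copy of the pages) can be combined in one step. The following is a rough sketch under stated assumptions: the function names, the five-second default delay, and the file naming are all hypothetical, and nothing here reflects how the bot was actually written.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url):
    """Download a page and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def snapshot_pages(urls, out_dir, delay_seconds=5.0, fetch=fetch_url, sleep=time.sleep):
    """Save each page under out_dir, pausing between requests.

    urls maps a local file stem (for example a player id) to the page URL.
    Pages already on disk are skipped, so a rerun makes no repeat requests.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for stem, url in urls.items():
        target = out_path / (stem + ".html")
        if target.exists():
            continue
        if saved:
            sleep(delay_seconds)  # wait before every request after the first
        target.write_text(fetch(url), encoding="utf-8")
        saved.append(target)
    return saved
```

The `fetch` and `sleep` arguments exist only so the function can be exercised without touching the network; the saved files then serve as reference copies for testing the parser.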
- I was having some technical troubles, so if you can use the template as is, that'll be great. I have no further objections. Hopefully someone can approve your bot then. -- RM 23:59, 5 September 2006 (UTC)
- Small trial approved, if at all possible include citations (possibly using <ref> tags). Keep the trial to 50 pages, up to 10 days. — xaosflux Talk 03:31, 7 September 2006 (UTC)
The trial, consisting of 19 pages, was successful. See the edits here. - PoliticalJunkie 15:37, 17 September 2006 (UTC)
- How long do you expect to run this? Will it do a one-time update to the pages or will it regularly update them? Voice-of-All 23:22, 22 September 2006 (UTC)
I expect to update baseball players' stats every three days during the season. The baseball season is almost over, so the bot won't operate until next April, when the next baseball season starts. - PoliticalJunkie 17:51, 23 September 2006 (UTC)
- Constant updating greatly increases the chances of parser errors due to site changes there. Perhaps you could make the bot code very specific in checking the page XML and avoid running automatically if it detects changes. Voice-of-All 23:56, 23 September 2006 (UTC)
I just finished programming that into the bot. - PoliticalJunkie 20:11, 29 September 2006 (UTC)
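The discussion does not say how this check was implemented. One possible approach, sketched here purely as an assumption, is to fingerprint the sequence of HTML tags on a known-good copy of a page and refuse to run when the live page no longer matches. The names `structure_fingerprint` and `safe_to_run` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Record the sequence of opening tag names, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def structure_fingerprint(html):
    """Hash the order of opening tags so that changed numbers do not matter."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256("/".join(collector.tags).encode("utf-8")).hexdigest()


def safe_to_run(reference_html, current_html):
    """Return True only when both pages share the same tag structure."""
    return structure_fingerprint(reference_html) == structure_fingerprint(current_html)
```

A whole-page fingerprint is deliberately strict: it would also trip when a legitimate row is added, so a practical version might fingerprint only the header of the stats table.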
- Has the bot been approved? - PoliticalJunkie 20:15, 10 October 2006 (UTC)
- Could you please cite your references, as described above? With changes made on such a massive scale, how is anyone to verify that your statistics are correct? In any case, not citing your references is not enough reason to block. So:
- Approved. The changes that you've made look great. Run this once or twice a week and never on the high activity days (such as Monday). Please cite your sources using the template, if possible. -- RM 17:53, 12 October 2006 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.