Wikipedia:Bots/Requests for approval/HiTeCBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.
Operator: Vigsterkr
Automatic or Manually Assisted: Automated
Programming Language(s): Python Wikipediabot Framework, C++
Function Summary: Automated categorization of the articles.
Edit period(s) (e.g. Continuous, daily, one time run): edit is not required
Edit rate requested: -
Already has a bot flag (Y/N): N
Function Details: As part of ongoing research at my university, we would like to apply our hierarchical text categorizer (HiTeC, see: http://categorizer.tmit.bme.hu/) to Wikipedia. This would require retrieving the whole category structure of Wikipedia (currently just the English version) and storing it in our own format, and retrieving a given number of articles to use as a training dataset for HiTeC. As a result, we could provide automated categorization for new and currently uncategorized articles. We could probably also give more relevant results for a simple search query than an index-based search engine; this is to be verified after applying HiTeC to Wikipedia (see the requirements above).
Discussion
Do you know about database dumps? They will give you access to all of Wikipedia without clogging up the server by retrieving all the information you want. :: maelgwn - talk 01:28, 18 October 2007 (UTC)
- If it's not editing, and therefore doesn't need to get data at runtime... this BRFA isn't needed... and may as well be denied..? Reedy Boy
- I would say so... unfortunately I didn't know that database dumps existed before I made the request... sorry Vigsterkr
- No problem. =) Denied. Reedy Boy 09:21, 19 October 2007 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.