If the data is available, it would be cool to graph the number of editors and/or the number of hits on the same time axis. This might show us any emerging lead-lag relationships. user:mirwin
It's a nice graph, but there's an awful lot of white space on it. Can somebody who knows how to do these things trim it? --Camembert
Additionally, the JPEG compression makes it a bit muddy. Would it be possible to resave the original as a PNG file? --Brion
New and improved graphs now in place. The Anome 09:35 Sep 21, 2002 (UTC)
Time to update the graphs? --Lightning 19:42 Oct 19, 2002 (UTC)
Anyone notice something funky about the following:
I made a new graph. Uhm... I'll try to keep it updated. I'm sorry if it doesn't look great, but I'm just pumping it out with a spreadsheet program. --Lightning 05:38 Oct 23, 2002 (UTC)
Are you going to change the graph below (rate of increase) as well? -- WillSmith (Malaysia)
I want to wait till Ram Man is done, because the bot massively inflates this number. Once the bot is done running, I'll take a week's worth of samples and do it. --Lightning 19:49 Oct 24, 2002 (UTC)
Fire away! I've for the most part finished it up (at least the large scale automation anyway!) -- Ram-Man
How about a graph showing the amount of data hosted by Wikipedia and the average size per page? Lir 05:56 Oct 23, 2002 (UTC)
No access to the db, so I can't make SQL queries to get these numbers. --Lightning 19:49 Oct 24, 2002 (UTC) It would be interesting, though, to get the mean number of content bytes per article, wait a year or so, and take it again for comparison purposes. --Lightning 19:49 Oct 24, 2002 (UTC)
The new graph has an x-axis which is not evenly spread in time. The slope is now dependent on the number of samples in any given period. This is a bit confusing in my opinion. Erik Zachte
The best thing to do is to use an x/y scatter-plot setting for the graph tool. This will allow for the non-uniform sampling, which will otherwise distort the graph.
I'll look into it --Lightning 19:49 Oct 24, 2002 (UTC)
Is it possible to give the growth of Wikipedia without including the Ram-Man bot additions? The bot is adding around 1,000 articles a day (and seems to have around 30,000 in total to add), and it would be interesting to see the rate of growth without this distortion.
Note: the article count feature is currently disabled, with the article counter stuck at 90679. -- 15 November 2002
Note: The article counter is incrementing again. -- 18 November 2002
Is the article counter fixed now? If not, there is very little point in continuing to update this data by hand. If much of the past mpacIII data is questionable, perhaps someone would be so kind as to regenerate the mpacIII data from the database dumps? The Anome
The count is still calculated stupidly (comma count?!!) but it's now fixed, yes. I see absolutely zero purpose in regenerating older counts, since A) the number is pure hype with limited value, B) we only have a limited number of dumps kept on hand at ~1 month intervals (keeping the old ones around at a higher rate would waste A LOT of disk space), and C) the margin of error from the drift is probably smaller than the margin of error of our crappy count system (comma count?!!), except for that one >100000 entry. --Brion 20:33 Dec 17, 2002 (UTC)
Speaking of which, what happened to the idea to redefine the count? I still think we shouldn't count anything below 500 bytes as an article. That along with the dreaded comma count, IMO, would give a more accurate measure of our true progress (~80,000 articles). My only concern for this plan though is what it might do to the morale of the non-English Wikis. Maybe we could have a {{HEADLINEARTICLECOUNT}} that would display the more conservative article count (we could even up the ante by excluding anything below 1 kilobyte). --mav
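To make the two counting rules being debated concrete, here is a purely illustrative Python sketch; the function names and the exact rules are my assumptions, since the software's real logic isn't spelled out in this discussion:

 # Hypothetical sketch of the two article-count heuristics discussed above.
 def is_article_comma_count(page_text):
     """The dreaded 'comma count': a page counts as an article if it contains a comma."""
     return "," in page_text

 def is_article_size_threshold(page_text, min_bytes=500):
     """mav's proposal: only count pages of at least min_bytes bytes."""
     return len(page_text.encode("utf-8")) >= min_bytes

 # A short stub passes the comma count but fails the 500-byte threshold:
 stub = "London, the capital of England."
 print(is_article_comma_count(stub))     # True
 print(is_article_size_threshold(stub))  # False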
If we are going to make graphs, and then analyze them, shouldn't we take RamBot's contributions into account? --Uncle Ed
The new article count system is now active on the English Wikipedia. (And the counter is no longer stuck. ;) If desired, I can go back through my backup dumps and run counts of the new algorithm on older databases for comparison purposes. --Brion 06:02 25 May 2003 (UTC)
Is it also up for the Dutch wiki? I find differences between the count on our main page (6901) and the count obtained by:
SELECT count(*)
FROM cur
WHERE cur_namespace=0
AND cur_is_redirect=0
AND cur_text LIKE '%[[%'
I have replaced Fonzy's analysis of growth with a new treatment, which produces a new growth model that tries to eliminate the effects of outliers, data dumps, recalibration, and slow-downs. It's a remarkably good (coincidental?) fit for the past, but who knows about the future? -- The Anome 16:58 11 Jun 2003 (UTC)
An HTML idiot writes: is there any way either that this page can be made a sensible width, or that I can view it (IE6) as a screen-width page? jimfbleak 17:30 11 Jun 2003 (UTC)
This page hasn't been edited since April except to correct a spelling error, and the page linked to (here) hasn't been updated since May!!!!
I have made a few graphs, and scripts to update them. I am not sure how accurate they are (I didn't do the database query myself), but it looks reasonable. The details are on my user page. Perhaps this can be used here? Amaurea 14:48, 23 April 2006 (UTC)
Could someone get and add information about how much space the text actually takes up? Or perhaps an estimate of how many printed pages all the text would take? There isn't anything here that really gives me a good idea of how BIG Wikipedia is when compared to other information compendiums, which is all I wanted when I came to this page. 24.128.152.12 08:06, 19 December 2006 (UTC) greg
Not including the sizes of pictures in gigabytes doesn't make any sense to me. The images of the encyclopedia are just as important as the text information. Also, to compare this to a "book" you would have to look at how much space the average wiki page has in images and include that as well. —Preceding unsigned comment added by 68.154.41.177 (talk) 05:04, August 29, 2007 (UTC)
The original question was how much space the text takes up. Whether you want to include media files or not in this number depends completely on what you want to do with the knowledge. —Kri (talk) 14:06, 23 February 2014 (UTC)
I'm sure the entire point of asking this question is to wonder: "If I wanted to download ALL of Wikipedia's latest content, as it appears, for my own personal offline viewing pleasure, how much of my hard drive would it take?" That is the question in my mind when I came to this page. I don't care about page update history or discussion pages, just the meaningful article content, including all pictures, media, LaTeX/SVG imagery, sounds, etc. -CogitoErgoCogitoSum 75.172.58.58 (talk) 03:03, 10 December 2012 (UTC)
Why would you want to do that, just for your own personal offline viewing pleasure? It sounds more reasonable to me that someone would make a program of some sort that would use the contents of Wikipedia to iterate on, maybe for machine learning purposes. And in that case it is not obvious that you want to include media files. —Kri (talk) 14:13, 23 February 2014 (UTC)
You would do that in order to guarantee access to important articles concerning history, math, physics, chemistry, and so on. A great deal of the information on subjects of academic interest can be explained through the media content in Wikipedia, and in some cases can only be explained through some form of media different from text. Specifically, I want to download Wikipedia to my Raspberry Pi to have a highly portable, nearly universal reference guide to knowledge. Yes, it will have errors, and yes, there are better solutions, but it's still something I want to do, and it would aid me greatly if I knew the total size in GB before I started. — Preceding unsigned comment added by 50.182.238.147 (talk) 12:22, 5 November 2014 (UTC)
We should use ISO 8601 dates (e.g. 2007-05-30) for all Wikipedia stuff, including graphs and charts. In this age of international commerce and communication, it seems foolish to use ambiguous dates, especially since the English Wikipedia is edited and read by a large minority of English speakers outside the US. Anthony717 19:10, 30 May 2007 (UTC)
I agree that the date format is lacking. If this were in the article space I would have just fixed it by wiki-linking the dates and letting the servers format them on the fly; and maybe that should be done here. ISO 8601 would be better than what is here now, but ISO 8601 is for the benefit of computers, is it not? I think most people would find "30 May 2007" more humanistic. --Charles Gaudette 09:18, 31 May 2007 (UTC)
I'll change the date format in the "Wikipedia growth" plots during the next update. The plot in "Comparisons with other Wikipedias" was grabbed from Commons, so I'm not sure where the source data for it is. Maybe I can get the data from http://stats.wikimedia.org/EN/Sitemap.htm and generate a new plot (later, when I have more free time). --Seattle Skier (talk) 09:01, 2 June 2007 (UTC)
This page doesn't answer some of the obvious questions (as noted above), like how many gigabytes it is. Another question that comes to mind: how many gigabytes are the images used in articles (hosted here or on Commons), since they are definitely part of Wikipedia as well? Also, how many servers are there currently, how many watts of electricity do they use, how much total RAM? All these are interesting questions. -- fourdee ᛇᚹᛟ 11:09, 7 August 2007 (UTC)
I am an MBA graduate student from Taiwan. My advisor, Professor Chu, and I are very interested in the diffusion phenomenon of the famous Wikipedia website, and we have some questions about the diffusion data from the URL below. We hope to apply a formal diffusion model from management science to explain the success story of Wikipedia.
At the bottom of that page there is a data set describing the shape of Wikipedia's growth for the English domain. It raises two questions for me. First, looking at this data set, I can hardly distinguish how much of the size comes from the auto-posting robot, Rambot, and how much from real people. Could you help me obtain data that separates those two processes (edits made by program versus edits made by human beings)?
Second, what confuses me is that the spacing of the dates is irregular. I was wondering why the data set appears in that pattern. Is there anything happening behind those irregular intervals? Could you provide further background or ideas that might help me figure it out?
Thank you in advance for your response. I hope I can get acquainted with the statistics of Wikipedia, which can help us explore the nature of Wikipedia's diffusion.
People just post updates to the article count when they feel like it; it is not a robot doing it. Keep us updated on the models you are going to use... and also use Google; I have seen a couple of good articles studying how Wikipedia grows. Diego Torquemada (talk) 23:47, 10 December 2007 (UTC)
Hey, I saw your question and tried to come up with a better answer. I think the only way to distinguish human and automated editing is to check each editor's entries against the bot category. You can find comments on unusual growth in some of the Category:Wikipedia statistics articles. Some spikes are also Slashdot or similar effects. Good luck. --Ben T/C 14:57, 18 December 2007 (UTC)
This page says there was exponential growth at some point in time. But the percent increase keeps changing from year to year. "Exponential growth" is a precisely defined mathematical concept that means the percent increase over any time period is THE SAME as the percent increase over any other period of the same length of time.
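For reference, the textbook definition being invoked: exponential growth means

 N(t) = N_0 e^{kt}, \qquad \frac{N(t + \Delta t)}{N(t)} = e^{k \Delta t}

so the percentage increase over any interval of fixed length \Delta t is the same constant, regardless of the starting time t. A year-over-year percentage increase that keeps changing is therefore incompatible with a single exponential.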
Why not replace the "Wikipedia growth" graphs and associated "Notes" with the more up-to-date graphs and text at Wikipedia:Modelling Wikipedia's growth, which show logistic growth? The stuff being replaced could be preserved by being copied to the other article. (Also, the second paragraph of the lead section is not useful; it could be removed and the new Wikipedia statistics summary link moved to the See also section.) JonH (talk) 09:51, 11 January 2009 (UTC)
In the first years Wikipedia grew faster than linearly. At that time it was thought to be exponential growth. I noticed the percent increase kept going down from year to year, and proposed logistic growth on 28 February 2007. Others had hinted at logistic growth before, but nobody modelled it. Only recently has it become more or less accepted that growth is logistic (or linear, certainly not exponential). Feel free to change the pages accordingly. HenkvD (talk) 20:09, 12 January 2009 (UTC)
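For comparison, the standard logistic form (a general textbook definition; the parameter values actually fitted to Wikipedia's data are not reproduced in this thread):

 N(t) = \frac{K}{1 + e^{-r(t - t_0)}}

where K is the carrying capacity (the eventual article count), r the intrinsic growth rate, and t_0 the inflection point at which growth peaks and begins to slow.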
I'm extremely sceptical about the claim of logistic growth. Just because the growth is now sub-exponential does not mean it's logistic. If the project was in an exponential regime, and is now in a more or less linear regime, that does not mean its growth is going to fall off to zero. Perhaps the exponential regime was as Wikipedia was being discovered, and now we're in a linear regime, where everyone knows about it and pretty much anyone who wants to edit already knows about it and is editing? In any case, the logistic growth claims are an overzealous projection from limited data. -Oreo Priest talk 03:07, 7 July 2009 (UTC)
Feel free to propose a sub-exponential model. The logistic model was proposed 2 years ago, when the growth was still getting bigger each month. We now see the growth has peaked and is even getting smaller; maybe not exactly like the bell curve, maybe more like the Extended-growth model. As far as I can see it is not a linear regime either. Will the growth fall to zero? Nobody knows for sure. It is even possible, I hope not, that the growth will be less than zero if more and more stubs are deleted in a final stable version. HenkvD (talk) 11:46, 8 July 2009 (UTC)
My concern is that a fancy toolbox of curve-fitting tools is being used to extrapolate an entire regime there's no evidence of. Real-world factors are also being ignored; the only way we could have zero growth is if editors lose interest or if we run out of things to write about; I find both of those quite implausible. User:Piotrus estimated the maximum size of WP to be around 400 million articles (he likely has at least the order of magnitude right), so the latter is out, and tell me, do you see the WP community just giving up in a year or so? I certainly don't. Perhaps the decrease in growth rate is due to having picked the low hanging fruit in terms of article creation, but that doesn't mean there's none left.
In any case, I don't think logistic growth should be presented as fact on this page. The extended growth model seems much more believable and conservative, in terms of not extrapolating trends that haven't been seen yet. -Oreo Priest talk 14:56, 8 July 2009 (UTC)
Two years ago the logistic curve seemed the simplest model that could explain the non-exponential growth. I feel it still is a reasonable model. I like the idea of the Extended-growth model, but as a model it still assumes growth will fall to zero. English Wikipedia could grow much more if Wikipedians would translate from other languages, as user Piotrus calculates, but the fact is that this is not done. If that were true then all languages could grow to the size of the English Wikipedia. This is not happening because 1) the language skills are missing, 2) the interest in foreign cities, people, history etc. is not as big as for your local cities etc., and 3) translating is not as satisfying as writing a new article. My personal feeling is that a small steady growth will keep Wikipedia up-to-date. HenkvD (talk) 21:28, 11 July 2009 (UTC)
Back in January 2009, I thought that the old graph with a logarithmic scale (File:EnglishWikipediaArticleCountGraphs.png) no longer showed clearly how the number of articles was growing. So I replaced it with two graphs by HenkvD that I found at WP:GROWTH. These included the logistic curves, so I had to include an explanation of what they mean.
I think this page should mainly show the growth up to the present, rather than future predictions. But it is helpful to describe the logistic model, as it provides a rational explanation for the observed decline in the rate of growth since 2006. The page does say that the growth only "approximately follows a logistic growth model".
I have today changed the captions so that they no longer refer to "extrapolations". In my view the logistic curves are just provided for comparison. (The caption for the first graph also said the thick line is "smoothed to match thin model lines", but that seems unlikely to me, so I removed the comment. Perhaps HenkvD can confirm this.)
I suggest keeping the logistic curves for comparison, at least while the growth rate remains between the curves for 3 and 4 million articles.
The smoothing relates to the Rambot activity of 2002; the growth in that month was enormous, so I smoothed it, especially for the related growth charts. A version of the number-of-articles graph without the logistic comparison and without the 2002 Rambot smoothing is available as well. HenkvD (talk) 11:41, 12 July 2009 (UTC)
The assumption that "more content also leads to less potential content" is ridiculous. There's no reason to assume that potential content will ever run out. There will always be more obscure events and biographies to add, no matter what. News, new releases of products, entertainment, etc. will all continue as usual in the future. The number of article-worthy events per year is also going to remain constant. Even the graph does not visually follow the logistic fit placed on it. Growth of Wikipedia was polynomial at first due to its newness and the number of eager new contributors, but the base of contributors is going to remain roughly constant, as will their rate of contribution; from these facts and from the graph itself it becomes apparent that growth will remain linear. Barring some massive policy changes, by 2013 there will be roughly 5 million English-language Wikipedia articles and my comment will be here having predicted it. Perhaps a section detailing linear or other growth predictions should be added. —Preceding unsigned comment added by 69.253.221.174 (talk) 12:55, 23 October 2009 (UTC)
When I read various talk pages I get the impression that there was much more activity in 2006 and 2007 than in 2009. Are there any statistics about that? If so, I think that there is more and more consensus about the article content. Åkebråke (talk) 14:32, 29 December 2009 (UTC)
The total size of Wikipedia and most of its sub-wikis, based on the disk space it occupies when using XOWA (a local software application on a local computer), is approximately 150,761,865,216 bytes, or roughly 140 GB as reported by the Windows 7 operating system. This numeric value includes all downloaded and decompressed data dumps for the following: Wikimedia Commons, Wikibooks, Wikinews, Wikipedia, Wikiquote, Wikiversity, Wiktionary, Wikispecies, MediaWiki and Wikidata. This is not a complete listing, as there is obviously site overhead that must be considered as well, and other wiki data pages not included in the local version of Wikipedia. This is simply a general size analysis of Wikipedia and its various parts and domains.
The original dump files gathered by XOWA and subsequently decompressed were deleted after the various wikis' setups were completed, leaving the value of 140 GB being reported by the operating system. Again, this is the size of a local copy of most of Wikipedia, but not all of it. The actual size of the entire Wikipedia database is a subject best described by the system administrators of the Wikipedia site. Not included in this value are Meta, Incubator, Wikisource, Wikivoyage and Wikimedia Foundation, which in total equal approximately 1691.1 MB of additional compressed data.
Current dump sizes as reported by the XOWA software are as follows. The values below are for compressed database files, not to be confused with the final output value derived from the decompressed archives that make up the local copy stored on a local computer.
AS OF 1-29-2015
- Commons 3.7GB
- Wikidata 3.9GB
- MediaWiki 60.2MB
- Wikispecies 84.1MB
- Meta 164.7MB (Not listed in above size value)
- Incubator 52.1MB (Not listed in above size value)
- Wikimedia Foundation 6.6MB (Not listed in above size value)
- Wikipedia 10.7GB
- Wiktionary 432.8MB
- Wikisource 1.4GB (Not listed in above size value)
- Wikibooks 122.6MB
- Wikiversity 54.9MB
- Wikiquote 78.5MB
- Wikinews 36.30MB
- Wikivoyage 67.7MB (Not listed in above size value)
So in conclusion, the full decompressed size of Wikipedia would be close to 150-160 GB of physical storage space (English only); this is an approximation and the actual value will vary. This compilation of data values is for the English Wikipedia only and does not include any of the dumps for other languages. A 100% complete and decompressed copy of Wikipedia including all languages, images, and framework would have to be somewhere in the 200 GB range, probably more. This however is just an educated guess; Wikipedia is far too complicated to derive a final finite value for the physical disk space it occupies. Not to mention that the database is constantly growing as users add data to it daily. (Contributed by Britton Burton)
The Size of Wikipedia in Volumes leaves out the images completely. I have tried to calculate how large an area all the images would cover if they were to be printed on a 600 dpi printer. (I guess 600 dpi is what you'd print your family snaps at, right?)
The database stores, in a table, the height and width in pixels of every image. If we multiply these for each image and add them all together we get the total number of pixels in all the images. So I got the latest SQL dump of the images table for the English Wikipedia (the Commons one is over 3 GB compressed) and ran this query: select sum(img_width*img_height) from image WHERE img_media_type = "BITMAP";. The result is 676 071 025 703. (That is 676 gigapixels just for en.wikipedia.) Now to find out what area that would cover if printed.
The square root of 676071025703 is 822235.383392736. Divide by 600 to get 1370.392305655 inches per side of a printed square. Multiply by 0.0254 to get meters, giving 34.807964564.
If all the images on the English Wikipedia were printed at 600 dpi on a square, you would need a square sheet of paper 34.8 meters on each side, or around 1211.59 m².
If someone can double-check my calculation to make sure it's correct, I'll try to get the same SQL query run on Commons (and all other wikis) on the toolserver. Then I'd try to create some fancy graphics to illustrate this (e.g. how much of Belgium would all the Commons pictures cover?)
--Inkwina (talk · contribs) 14:02, 4 February 2011 (UTC)
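A quick Python sketch to double-check the arithmetic above; the total-pixel figure is taken straight from the SQL result in the post, and only the unit conversions are recomputed:

 import math

 total_pixels = 676_071_025_703  # sum(img_width*img_height) from the query above
 dpi = 600

 side_pixels = math.sqrt(total_pixels)  # pixels per side of a square
 side_inches = side_pixels / dpi        # inches per side at 600 dpi
 side_meters = side_inches * 0.0254     # meters per side
 area_m2 = side_meters ** 2

 print(round(side_meters, 1), "m per side,", round(area_m2, 2), "m^2")
 # -> 34.8 m per side, 1211.59 m^2, matching the figures above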
In the process of investigating the recent disparity between the Gompertz curve prediction and the recent data, I've just re-fitted the Gompertz curve using up-to-date data, using only data between 2004 and the present day, on the basis that Rambot activity unduly distorted the activity before that. (Which you can see quite clearly in the graph at Image:EnwikipediapercgrowthGom.PNG.)
Using the same units as in the main article, the new fitted curve has the parameters
a = 4471486
b = -15.344927
c = -0.379785
which gives the graph below:
However, even though this fits better than the previous Gompertz fit, there is still a clearly discernible and growing trend away from the Gompertz curve in favour of continued article creation, starting in roughly mid-2011. -- The Anome (talk) 20:16, 10 June 2012 (UTC)
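For anyone wanting to reproduce this kind of fit, here is a minimal SciPy sketch. The functional form N(t) = a·exp(b·exp(c·t)) matches the three parameters quoted above, but the data below are synthetic stand-ins generated from those parameters (the real monthly counts and their time units are not reproduced here):

 import numpy as np
 from scipy.optimize import curve_fit

 def gompertz(t, a, b, c):
     # a is the asymptote; b and c (both negative) set the shape and rate.
     return a * np.exp(b * np.exp(c * t))

 # Synthetic stand-in data built from the quoted parameters, plus 2% noise.
 rng = np.random.default_rng(0)
 t = np.linspace(3.0, 11.0, 17)
 n = gompertz(t, 4471486, -15.344927, -0.379785)
 n = n * (1 + 0.02 * rng.standard_normal(t.size))

 p0 = (4.5e6, -15.0, -0.4)  # rough initial guess near the quoted values
 (a, b, c), _ = curve_fit(gompertz, t, n, p0=p0, maxfev=20000)
 print(a, b, c)  # should roughly recover the quoted parameters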
I think it would be best to ignore everything before that large jump at ~November 2002 when doing the fit. By the way, what is the maximum in this new fitted curve? Nevermind... Brightgalrs (/braɪtˈɡæl.ərˌɛs/)[1] 00:32, 1 July 2012 (UTC)
Wow! Thanks for that. For all of my Wikipedia:Pools predictions I've been using the old Gompertz model to calculate the approximate date - so with your new model 20 million articles can (theoretically) be reached by 2048 instead of ~2120, aka within my lifetime. Pretty exciting stuff. Brightgalrs (/braɪtˈɡæl.ərˌɛs/)[1] 10:25, 1 July 2012 (UTC)
Does this include article references or just the main body of the article itself? If not, is there a way to determine how many citations are used on Wikipedia? Coinmanj (talk) 05:31, 31 May 2013 (UTC)
"Comparisons with other Wikipedias" unlabeled graph[edit]
The graph at the top of the section Comparisons with other Wikipedias has no title and the y-axis is unlabeled. Is it the number of articles on each Wikipedia? The total number of pages? The memory in kB?
OK, I assume it's the number of articles, but this fact is never mentioned. Numbers of articles are finally explicitly mentioned in the last two sentences of the section, but even then it's not clear that this is what was on the graph. Eebster the Great (talk) 07:58, 25 November 2015 (UTC)
On my user page, I use the following wikimarkup...
<div><font size="2">As of {{CURRENTDAYNAME}}, {{CURRENTDAY2}} {{CURRENTMONTHNAME}} {{CURRENTYEAR}}, {{CURRENTTIME}} (UTC), The English Wikipedia has {{NUMBEROF|USERS|en|N}} registered users, {{NUMBEROF|ACTIVEUSERS|en|N}} active editors, and {{NUMBEROF|ADMINS|en|N}} administrators. Together we have made {{NUMBEROF|EDITS|en|N}} edits, created {{NUMBEROF|PAGES|en|N}} pages of all kinds and created {{NUMBEROF|ARTICLES|en|N}} articles.</font></div>
...to generate this:
As of Sunday, 14 July 2024, 08:54 (UTC), The English Wikipedia has 47,672,782 registered users, 113,822 active editors, and 855 administrators. Together we have made 1,229,677,543 edits, created 61,018,682 pages of all kinds and created 6,851,601 articles.
For some reason Wikipedia doesn't use an analytics solution. Instead, it seems the data is obtained from the server logs. However, I see no effort by Wikipedia's tech people to actually consolidate their server log reports and run analytics against them. So yes, this is cumbersome. (It wouldn't be expensive to run; it's just extra work to organise it, and the tech people are clearly busy with other more important stuff.) A Guy into Books (talk) 12:51, 18 August 2017 (UTC)
For the "Comparisons with other Wikipedias" section, there should be a sub-section dedicated to the Cebuano Wikipedia and other primarily bot-generated Wikipedias such as the Waray Wikipedia (and the Swedish Wikipedia to some extent). All of the Wikipedia comparison charts need to display the Cebuano Wikipedia.
The Cebuano Wikipedia has reached 5 million articles and is well on its way to exceeding the English Wikipedia before the end of this year.
Size of Content vs. Meta-Discussion on Wikipedia?
I'm having trouble finding data regarding the size and growth (preferably in word count) of the actual content pages of English Wikipedia, in comparison to other sorts of pages, especially guideline and talk pages. Is that data available somewhere, and if so, should it be included on this page? Aquaticonions (talk) 18:27, 30 November 2020 (UTC)
What's the size of Wikipedia in 2018, after all?
Hi people! How are you? Thanks for this article and its important contribution to knowledge discovery in datasets. I just couldn't understand what the 2018 size of Wikipedia is in gigabytes. Could anybody help me, please? Thank you very much, Lu Brito (talk) 22:14, 2 September 2018 (UTC)
The spreadsheet will no longer be updated, as the content is original research and dumps.wikimedia.org/enwiki and its pages archived by archive.org have the database sizes anyway, albeit rounded to the nearest tenth rather than the nearest hundredth. There is no need to be very precise. Johnny Au (talk/contributions) 04:24, 5 December 2020 (UTC)
This article says the English Wikipedia contains over 3.9 billion words; I'd like to know which parts of an article count toward this number (do references, lists or tables count?). Tools like User:Caorongjin/wordcount and Wikipedia:Prosesize don't count the words inside certain tables, so the contents of the episode list section in One Piece (season 1) are not counted (plenty of prose gets skipped because it's inside a table), while I assume they would be counted here. How is the counting done and which words get counted? — Preceding unsigned comment added by Jasper Norbert (talk • contribs) 23:50, 31 May 2021 (UTC)
This page talks about the size of Wikipedia, over 6 million pages, but it doesn't mention quality.
It might be worthwhile to include a section that gives some space to also discuss 'quality'.
It could paraphrase and point to Wikipedia:Featured_articles, which says "There are 6,092 featured articles out of 6,498,196 articles on the English Wikipedia" (and also Wikipedia:Good articles).
I had expected maybe there would be a link in the See also section, and since I didn't see any I decided to make this small suggestion. -- 109.76.199.51 (talk) 04:58, 14 May 2022 (UTC)[reply]
To be honest, this page needs more information about how many articles are created on Wikipedia. This new section should include a total comparison, since Wikipedia is the largest encyclopedia, with over 6.5 million articles. Another edit would be adding new pictures as well. More pictures should be added because I want to improve this article, and I tend to agree with other editors on any contribution. What do you guys and others think? --76.20.110.116 (talk) 19:15, 29 May 2022 (UTC)
The filesizes in the article are all without media like images, but it should really also contain filesizes that include the media and images. 80.189.100.37 (talk) 22:31, 10 January 2023 (UTC)
I've been asked to add this data that I collected for a separate purpose, but I haven't yet determined how best to do it. The table of this data is as follows; it excludes redirects, as well as almost all dab and list pages.
The 71% growth in word count from 2010 to 2018 is averaged to 8.9% in the table, but this is misleading. Taking compounding into account, the yearly growth rate is about 6.9% (1.71^(1/8) ≈ 1.069). Any advice on how to make the change so the explanation is brief and easy to understand? MrFennicus (talk) 18:07, 15 June 2023 (UTC)
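A Python one-liner to verify the compounding arithmetic in the post above:

 total_growth = 1.71            # 71% growth over the 8 years 2010-2018
 annual = total_growth ** (1 / 8) - 1
 print(round(annual * 100, 1))  # -> 6.9 (percent per year, compounded)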