
Wikipedia:Dates in Wikipedia

From Wikipedia, the free encyclopedia

As of February 2017, by our last count, the English Wikipedia contains 38 million dates in 26 million paragraphs. These counts cover only dates found in the running text (paragraphs) of articles. Dates in infoboxes and lists are not included.

Analysis:


Taking a bird’s-eye view of the 38 million dates in our sentences database, we make the following observations. When all the dates found are graphed by year, the result is what we had expected: Wikipedia’s collective memory is heavily biased toward the present. There are spikes for the First and Second World Wars, followed by an explosion of dates from the 2000s to the present.

[Figure: AllYears, dates found per year across all years]
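A per-year tally like the one graphed here could be produced from the sentences table with a few lines of Python. This is a minimal sketch, not the procedure actually used for these graphs. It assumes that the unzipped sentences file has a header row with the field names listed further down, that each row records one date, and that every Startd value begins with a year (negative for BC), as in 1452-04-15. The date format, the file name sentences.csv, and the function names are all assumptions.

```python
import csv
import re
from collections import Counter
from typing import Iterable, Mapping, Optional

# ASSUMPTION: Startd begins with a year, optionally negative for BC dates,
# e.g. "1452-04-15" or "-44-03-15". The real export format may differ.
_LEADING_YEAR = re.compile(r"^(-?\d{1,4})")


def year_of(startd: str) -> Optional[int]:
    """Return the leading year of a Startd value, or None if there is none."""
    match = _LEADING_YEAR.match(startd.strip())
    return int(match.group(1)) if match else None


def dates_per_year(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count rows by the year of their Startd value (one date per row assumed)."""
    counts: Counter = Counter()
    for row in rows:
        year = year_of(row.get("Startd") or "")
        if year is not None:
            counts[year] += 1
    return counts


def dates_per_year_from_file(path: str) -> Counter:
    """Stream the unzipped sentences CSV (header row assumed) and tally by year."""
    with open(path, newline="", encoding="utf-8") as f:
        return dates_per_year(csv.DictReader(f))
```

Sorting the items of the returned Counter gives (year, count) pairs that any charting tool can plot. Because the rows are streamed, the full file never has to fit in memory.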

The pattern is easier to see when the same data is narrowed to the years from 1900 to the present:

[Figure: 1900topresent, dates found per year from 1900 to the present]

One example of what might interest historians is the effect of the printing press with movable type, first used in the Western world around 1440. In the 100 years from 1440 to 1540, the number of dates in Wikipedia’s collective memory doubles. Whether the printing press is responsible for this is open to debate.

[Figure: GutenbergEffect, dates found per year around the introduction of movable type]

The tables containing all of the dates found can be downloaded from the links below (zipped CSV files):

The titles/articles table (articles.zip): 4,477,089 titles in the English Wikipedia; 75 megabytes.


https://drive.google.com/file/d/0BwW3GI4uVWLjSDdQR2p3LUlPLW8/view?usp=sharing

Fields:

article = Article ID

title = the title of the article

countfound = number of times the article was linked to from other articles

datefound = date the article was scanned for dates

dates = number of dates in the article
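Because the other two tables refer to articles only by ID, a title-to-ID index is a natural first step for lookups like the Leonardo da Vinci example further down. The sketch below also ranks articles by their dates field. It is a minimal sketch that assumes the unzipped file has a header row using exactly the field names above; the file name articles.csv and the helper names are hypothetical.

```python
import csv
import heapq
from typing import Dict, Iterable, List, Mapping


def index_by_title(rows: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each article title (e.g. "Leonardo_da_Vinci") to its article ID."""
    return {row["title"]: row["article"] for row in rows}


def most_dated(rows: Iterable[Mapping[str, str]], n: int = 10) -> List[Mapping[str, str]]:
    """Return the n rows with the largest `dates` count, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: int(row["dates"]))


def load_articles(path: str) -> List[Dict[str, str]]:
    # ASSUMPTION: header row with the field names article, title,
    # countfound, datefound, dates (hypothetical file name: articles.csv).
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Loading all 4.4 million rows into a list is convenient but memory-hungry. For a single pass, a csv.DictReader can be handed straight to index_by_title or most_dated instead.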

The paragraphs table (paragraphs.zip): 25,778,610 paragraphs of the English Wikipedia; 186 megabytes.


https://drive.google.com/file/d/0BwW3GI4uVWLjeXhVakJ3NnBPTlU/view?usp=sharing

Fields:

Article = The article ID

Para = unique paragraph ID

Order = The paragraph number

Added = date added to the table

Dates = The number of dates in the paragraph
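Summing the Dates field is one way to cross-check the paragraph and date totals quoted at the top of this page, and the per-article sums might be compared with the dates field of the articles table. A minimal sketch, assuming a header row with the field names above and an integer in every Dates cell; the file name paragraphs.csv and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def summarize_paragraphs(rows: Iterable[Mapping[str, str]]) -> Tuple[int, int, Dict[str, int]]:
    """Return (number of paragraphs, total dates, dates per article ID)."""
    paragraph_count = 0
    total_dates = 0
    per_article: Dict[str, int] = defaultdict(int)
    for row in rows:
        dates = int(row["Dates"])
        paragraph_count += 1
        total_dates += dates
        per_article[row["Article"]] += dates
    return paragraph_count, total_dates, dict(per_article)


def summarize_paragraph_file(path: str) -> Tuple[int, int, Dict[str, int]]:
    # ASSUMPTION: header row with the field names Article, Para, Order, Added, Dates.
    with open(path, newline="", encoding="utf-8") as f:
        return summarize_paragraphs(csv.DictReader(f))
```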

The sentences table (sentences.zip): about 38.4 million sentences of the English Wikipedia; 447 megabytes.


https://drive.google.com/file/d/0BwW3GI4uVWLjeVA5R2cwQmNzUFk/view?usp=sharing

Fields:

Article = The article ID

Para = The paragraph this sentence was found in

Numdates = The number of dates in this sentence

Start = The position at which this sentence begins in its paragraph

End = The length of this sentence

Startd = The date found

Endd = The end date, if the date found was a date range

Database Method


For example, the article “Leonardo_da_Vinci” has the ID CH27V0XTD in the articles table.

SELECT * FROM paragraphs WHERE article = 'CH27V0XTD';

This query selects all paragraphs in the “Leonardo_da_Vinci” article that contain dates.

SELECT * FROM sentences WHERE article = 'CH27V0XTD';

This query selects all sentences in the “Leonardo_da_Vinci” article that contain dates.
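Both lookups can be tried end to end by loading the CSV files into SQLite with Python’s built-in sqlite3 module. This is a sketch under stated assumptions: the column types, the presence of header rows, and the column order matching the field lists above are all assumptions, and the function names are hypothetical. Order and End are SQL keywords, so they are double-quoted, and the filter uses WHERE because HAVING is intended for filtering groups produced by GROUP BY.

```python
import csv
import sqlite3
from typing import List, Tuple

# Column names follow the field lists above; the column types are assumptions.
# ORDER and END are SQL keywords, so those columns are double-quoted
# ("Start" is quoted as well, for consistency).
SCHEMA = """
CREATE TABLE paragraphs (Article TEXT, Para TEXT, "Order" INTEGER,
                         Added TEXT, Dates INTEGER);
CREATE TABLE sentences (Article TEXT, Para TEXT, Numdates INTEGER,
                        "Start" INTEGER, "End" INTEGER, Startd TEXT, Endd TEXT);
CREATE INDEX paragraphs_article ON paragraphs (Article);
CREATE INDEX sentences_article ON sentences (Article);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_csv(conn: sqlite3.Connection, table: str, csv_path: str) -> None:
    """Bulk-insert a CSV whose columns match `table` (header row assumed)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        placeholders = ", ".join("?" for _ in header)
        # `table` comes from our own code, never from user input.
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()


def paragraphs_for(conn: sqlite3.Connection, article_id: str) -> List[Tuple]:
    # WHERE filters individual rows; HAVING is meant for groups after GROUP BY.
    return conn.execute(
        "SELECT * FROM paragraphs WHERE Article = ?", (article_id,)
    ).fetchall()


def sentences_for(conn: sqlite3.Connection, article_id: str) -> List[Tuple]:
    return conn.execute(
        'SELECT * FROM sentences WHERE Article = ? ORDER BY Para, "Start"',
        (article_id,),
    ).fetchall()
```

For example, opening a file-backed database, loading sentences.csv into the sentences table, and calling sentences_for with CH27V0XTD mirrors the second query above (the file names are hypothetical). Binding the ID as a parameter also avoids the quoting mistakes that bracketed literals invite.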

Each returned row also gives the sentence’s starting position within its paragraph (Start) and the sentence’s length (End).
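Since End holds a length rather than an end position, a sentence occupies the characters from Start to Start + End of its paragraph. The paragraph text itself is not part of these downloads, so it would have to be obtained separately. A minimal sketch, assuming Start is a 0-based character offset (if the export counts from 1, subtract 1 first); the function name and the sample paragraph are illustrative only.

```python
def sentence_text(paragraph: str, start: int, length: int) -> str:
    """Return the sentence that begins at `start` and is `length` characters long.

    ASSUMPTION: `start` is a 0-based character offset (the Start field) and
    `length` is the End field, which holds a length rather than an end offset.
    """
    if start < 0 or length < 0 or start + length > len(paragraph):
        raise ValueError("sentence span falls outside the paragraph")
    return paragraph[start:start + length]


paragraph = "Leonardo was born in 1452. He died in 1519."
print(sentence_text(paragraph, 27, 16))  # He died in 1519.
```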