User:Periglio/Persondata

From Wikipedia, the free encyclopedia

For my own personal use, I used the Persondata template to create my own database of celebrity birth and death days. I have carried on developing my software in order to validate birth and death data in Wikipedia's biographical articles. To be honest, this is mainly for my own benefit, to maintain my own accurate database. However, as an ex-Wiki editor, I feel I should give something back to the community, so I am actively updating articles where anomalies are found.

I have not put the software into the public domain, but if anyone shows any interest I could. I have also thought about making the error lists available hoping to get some help in fixing articles. Again, I am waiting for feedback.

Below I am listing the error messages, to give some idea of what I am searching for, and the criteria I am using. To be honest again, this is mainly for my own benefit, but if anyone shows an interest, I am willing to collaborate.

As of 22 February 2014, there are 1,116,575 entries in my database which should account for all articles that contain a Persondata template. This does vary with articles being created, deleted and edited.

Validated messages

Complete (living)

This indicates an article containing a complete birth date, no validation errors and the subject is still alive.

22 February 2014 - 289,389 records

Complete (non-living)

This indicates an article containing complete birth and death dates, plus no validation errors.

22 February 2014 - 131,337 records

Validated

This indicates an article where a birth or death date is incomplete but the record otherwise validates. A relevant category will be present to confirm that the date is genuinely missing and is not, for example, just a typing error.

22 February 2014 - 21,659 records

Error Messages

W errors are general Wikipedia errors; the explanations assume the Wikipedia class has been used by the Persondata class. P errors are specific to the Persondata template.

W001-Cannot contact Wikipedia

This error occurs when the Internet connection is lost, but it can also occur if Wikipedia returns an error page such as "server busy". If the error is generated multiple times, the software will terminate.
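
The terminate-after-repeated-failures behaviour could look roughly like the sketch below. It is only an illustration: the `fetch` callable, the `max_failures` limit of 3, and the use of `OSError` to signal both connection loss and error pages are all assumptions, not details of the actual software.

```python
def fetch_with_retries(fetch, title, max_failures=3):
    """Call fetch(title) until it succeeds or has failed max_failures times."""
    failures = 0
    while True:
        try:
            return fetch(title)
        except OSError as exc:
            # Assumed: connection loss and "server busy" pages both surface as OSError.
            failures += 1
            if failures >= max_failures:
                # Give up and stop the run, reporting the W001 message.
                raise SystemExit("W001-Cannot contact Wikipedia") from exc
```

Passing the fetch function in as a parameter keeps the retry logic independent of whichever HTTP library is used.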

W002-%1 template not found

This error will result in the article being removed from the database

The requested page does not contain the Persondata template.

W003

There is a broken wikilink within the template, i.e. an extraneous ]].

W004-Cannot convert year of %1

The date supplied is a single number, assumed to be a year, but it cannot be converted into a 1 January date. The software checks for this error condition, although there is no feasible way the conversion would fail.

W005-Cannot extract year of %1

The supplied date field appears to be a small piece of text. This is often someone typing NA into the death date of a living person.

W006-Invalid %1 date - no year

The date field appears to be complete (i.e. it contains 3 fields) and successfully converts into a date. However, the resulting year does not appear in the original date. This happens when the conversion is fooled into using the wrong year, for example when the original contains a 2-digit year.

W007-Invalid %1 date

The date field appears to be complete, but fails to convert. The entry can be invalid for various reasons, normally a misspelt month or extra text. Note that the software does not yet handle circa, about, between, etc. Watch out for out-of-range dates such as 31 April, or 29 February in non-leap years.

W008-There are not 3 fields in %1 date

Before date conversion, the software checks there are 3 fields - day, month and year. This error indicates additional text, or a missing field.
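
As an illustration of how the date checks W004 to W008 fit together, here is a minimal sketch. It is not the actual software: the function name `check_date_field`, the "day month year" input format, and the lenient 2-digit-year fallback (included only to show how W006 can arise) are assumptions. Month names are parsed with `strptime`, which assumes an English locale.

```python
from datetime import date, datetime

def check_date_field(raw):
    """Return (date, error_code); error_code is None when the field is usable."""
    fields = raw.split()
    if len(fields) == 1:
        if not fields[0].isdigit():
            return None, "W005"            # short text such as "NA"
        try:
            return date(int(fields[0]), 1, 1), None   # year only: use 1 January
        except ValueError:
            return None, "W004"            # year outside the range date() accepts
    if len(fields) != 3:
        return None, "W008"                # not exactly day, month and year
    text = " ".join(fields)
    # Try a 4-digit year first, then an (assumed) lenient 2-digit fallback.
    for fmt in ("%d %B %Y", "%d %B %y"):
        try:
            parsed = datetime.strptime(text, fmt).date()
            break
        except ValueError:
            continue
    else:
        return None, "W007"                # misspelt month, extra text, 31 April
    if str(parsed.year) not in raw:
        return None, "W006"                # converter used a year not in the text
    return parsed, None
```

For example, `check_date_field("3 April 40")` converts through the 2-digit fallback to a year that never appears in the input, so it is reported as W006. Note that `%Y` in `strptime` expects four digits, so years before 1000 would need separate handling.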

W009-Unmatched category brackets

This error indicates a broken category within the article e.g. [[Category:2011 deaths

W010-Unmatched template brackets

This indicates a broken template within the article. Note that this applies to every template within the article.

W011-Cannot handle %1 date modifier

Temporary kludge to flag acceptable date modifiers such as circa, about, etc. These should not need fixing; the message exists solely to prevent false errors in the date conversions.

W012-Unbalanced HTML comment

Somewhere in the article there is a rogue HTML comment start or end.

W013-Unbalanced template brackets on page

The software was unable to extract the template because it could not find the closing brackets, i.e. the template was found but is broken: there is a rogue {{ after {{Persondata.
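
The bracket and comment checks (W009, W010, W012) can be approximated by comparing how often each opening and closing delimiter occurs. The sketch below is a simplification under stated assumptions: the name `unbalanced` is hypothetical, W009 is applied to all `[[ ]]` links rather than categories only, and simple counting does not detect delimiters that are balanced but in the wrong order.

```python
DELIMITERS = {
    "W009": ("[[", "]]"),      # wikilinks, including category links (assumed scope)
    "W010": ("{{", "}}"),      # templates
    "W012": ("<!--", "-->"),   # HTML comments
}

def unbalanced(wikitext):
    """Return the error codes whose opening and closing counts differ."""
    return [code for code, (opener, closer) in DELIMITERS.items()
            if wikitext.count(opener) != wikitext.count(closer)]
```

For example, `unbalanced("[[Category:2011 deaths")` returns `["W009"]`.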

P001-Persondata template contains a template

As per WP:PERSON, do not use templates as these can interfere with data extraction. Normally these are date templates, but disambiguation and country flag templates also appear. Occasionally the error may be triggered if there are rogue brackets in the text.

22 February 2014 - 1573 articles found

P002-No year of birth and no explanation category

These articles lack any birth information and are not in a category that would explain the lack of information. The normal fix would be to add Category:Year of birth missing (living people).

P003-No name in Persondata

This error message occurs when the NAME parameter is left blank. It can also occur if the NAME parameter appears twice, even if one of the two has an entry.

23 February - 3 records (cleared)

P004-Unrecognised Persondata parameter

This is where someone has added their own parameters to Persondata, such as eye colour, spouse, etc. It can also indicate a rogue | character, left behind when delinking.
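
A rough sketch of the parameter checks P001, P003, P004 and P022 follows. It assumes the text between `{{Persondata` and its closing `}}` has already been extracted into `body`; the function name `check_parameters` and the naive split on `|` are illustrative choices, not the actual implementation. The seven parameter names are the ones listed in the error messages on this page.

```python
KNOWN_PARAMETERS = {
    "NAME", "ALTERNATIVE NAMES", "SHORT DESCRIPTION",
    "DATE OF BIRTH", "PLACE OF BIRTH", "DATE OF DEATH", "PLACE OF DEATH",
}

def check_parameters(body):
    """Return a sorted list of error codes for one Persondata template body."""
    errors = set()
    if "{{" in body:
        errors.add("P001")                 # a template inside the template
    values = {}
    for chunk in body.split("|")[1:]:      # text before the first | is ignored
        key, _, value = chunk.partition("=")
        key = key.strip()
        if key in KNOWN_PARAMETERS:
            values.setdefault(key, []).append(value.strip())
        else:
            errors.add("P004")             # unknown parameter or rogue | character
    names = values.get("NAME")
    if names is None:
        errors.add("P022")                 # NAME parameter missing altogether
    elif len(names) > 1 or not names[0]:
        errors.add("P003")                 # NAME duplicated or left blank
    return sorted(errors)
```

Because the split is naive, a stray `|` produces a chunk with no recognised name and is reported as P004, which matches the delinking case described above. A legitimate piped wikilink would be flagged in the same way.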

P005-Death category does not exist

These are entries where a full death date exists, but there is no (year) deaths category. Note that a different error is triggered if a deaths category for a different year exists. This error means there is no (year) deaths category at all.

P006-Death category does not match DOD

This is where a death category exists, e.g. 2013 deaths, but the death date in Persondata gives a different year. On the assumption that the article dates are visible for review, it is normal to make Persondata and/or the category match the dates contained within the actual article.

8 March 2014 - 2639 records

P007-Birth category does not exist

This is where at least a birth year is known, but there is no nnnn births category. Sometimes this is because a more generic version is used, such as the decade category 1950s births, but in the main the category is simply missing.

9 March 2014 - 6290 records

P008-Birth category does not match DOB

This occurs when there is a complete date of birth, but the nnnn births category indicates another year. Normally we assume that the article contains the correct information as it is visible to everyone.

8 March 2014 - 11205 records
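
The four category checks P005 to P008 share one pattern: look for an "nnnn births" or "nnnn deaths" category and compare it with the year taken from Persondata. The following is a sketch of that pattern only; the function name and signature are hypothetical, and BC categories are ignored for simplicity.

```python
import re

def check_year_category(categories, year, kind):
    """kind is 'births' or 'deaths'; year is the year taken from Persondata."""
    pattern = re.compile(r"(\d{1,4}) " + kind)
    years = []
    for category in categories:
        match = pattern.fullmatch(category)   # "1950s births" does not match
        if match:
            years.append(int(match.group(1)))
    missing, mismatch = {"births": ("P007", "P008"),
                         "deaths": ("P005", "P006")}[kind]
    if not years:
        return missing       # no year category at all
    if year not in years:
        return mismatch      # a year category exists but for another year
    return None
```

A decade category such as 1950s births is not counted as a year category here, so it is reported as P007, in line with the description above.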

P009-No comma found in name

The format for the name field is surname, forename. This error indicates where this convention has not been followed. However, there will be many false positives, because there are many articles where the surname, forename format does not apply. Removing the false positives is an ongoing project.

8 March 2014 - 118091 records

P010-No short description

This occurs if the template's short description field is left blank.

8 March 2014 - 47602 records

P011-Birth date is in the future

This error occurs when the full date of birth is later than today's date. The error has a reject status.

P012-Accurate Date of Birth - category says no

This error indicates that the software was able to extract an accurate date of birth, but there is a category that indicates that an accurate date is not available.

8 March - 1258 records

P013-Year only birth and no explanation category

P014-Year of birth - category says no

P015-Birth year is in the future

P016-No year of death and no explanation category

P017-Death date is in the future

The date of death is later than today's date. This is often caused by vandalism.

22 February 2014 - 1 record (cleared)

P018-Accurate Date of Death - category says not

P019-Year only death and no explanation category

P020-Year of death - category says not

P021-Death year is in the future

This error is normally associated with vandalism. A year (not a complete date) has been found in the death date and it is greater than the current year. Other errors will normally be generated as well, since a strange figure will cause other validations to fail.

22 February 2014 - 10 records (cleared)

P022-NAME parameter missing

The Persondata template has no NAME parameter, often a sign that the template is broken.

22 February 2014 - 1 record (cleared)

P023-SHORT DESCRIPTION parameter missing

P024-ALTERNATIVE NAMES parameter missing

P025-DATE OF BIRTH parameter missing

P026-DATE OF DEATH parameter missing

P027-PLACE OF BIRTH parameter missing

P028-PLACE OF DEATH parameter missing

P029-Date of death is before date of birth

This error will invalidate the database record.

This error occurs when the subject appears to have died before being born.

P030-Lived to over 100 and not in centenarian category

P031-Lived to over 110 and not in supercentenarian category

P032-Longevity too great

These are biographies where the subject appears to be over 120 years old. This can be caused by a date of birth before 1800 and no death information. If there is death information, it is likely that either the birth date or death date is incorrect.

22 February 2014 - 8 records (cleared)

P033-Currently less than 10 years old

P034-Currently greater than 100 years old not in centenarian category

P035-Currently greater than 110 years old not in supercentenarian category

P036-Life span too great

This error occurs if the person appears to have lived for over 120 years. There are two main causes: an incorrect birth/death date, or the article is missing death information.

22 February 2014 - 176 records
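
The age-related checks P029 to P036 can be sketched as one function that works out the age at death, or the current age when there is no death date, and compares it with the 10, 100, 110 and 120 year thresholds named in the messages. Several things here are assumptions: the function names, matching categories by the substring "centenarian", and assigning the 120-year check to P032 for subjects with a death date and to P036 for those without (inferred only from the order of the list).

```python
from datetime import date

def years_between(start, end):
    """Whole years elapsed from start to end."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))

def check_lifespan(birth, death, categories, today):
    """Return a list with at most one age-related error code."""
    if death is not None and death < birth:
        return ["P029"]                            # died before being born
    living = death is None
    age = years_between(birth, today if living else death)
    lowered = [c.lower() for c in categories]
    has_super = any("supercentenarian" in c for c in lowered)
    has_cent = any("centenarian" in c for c in lowered)   # also true for supercentenarians
    if age > 120:
        return ["P036" if living else "P032"]      # assumed mapping, see above
    if age > 110 and not has_super:
        return ["P035" if living else "P031"]
    if age > 100 and not has_cent:
        return ["P034" if living else "P030"]
    if living and age < 10:
        return ["P033"]
    return []
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible when a run is repeated later.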

P037-No death data or Living people category

P038-Persondata contains terminators

P039-Value in death data and still in Living People category

These errors are mainly due to someone typing "living" or "NA" into the death date parameter. Occasionally they are due to vandalism. The error is also triggered when a death date is correctly added and the Living people category is left behind.

23 February 2014 - 2234 records

P040-Place of birth missing and no cat

P041-Place of death missing and no cat

P042-Missing template not using (living people) version

There are various "missing" information categories such as "Year of birth missing". If the subject is still alive, these category titles also include the additional text (living people), i.e. "Year of birth missing (living people)". This error flags when the (living people) suffix has been incorrectly omitted.

22 February 2014 - 1864 records
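
The two Living people checks, P039 and P042, might be expressed as in the sketch below. The function name is hypothetical, a subject is treated as living whenever the Living people category is present, and only the "Year of birth missing" category named above is covered; other "missing" categories could be added to `MISSING_PREFIXES`.

```python
MISSING_PREFIXES = ("Year of birth missing",)   # extend with other "missing" categories
LIVING_SUFFIX = "(living people)"

def check_living_categories(death_field, categories):
    """Return error codes for conflicts involving the Living people category."""
    errors = []
    in_living = "Living people" in categories
    if death_field.strip() and in_living:
        errors.append("P039")          # e.g. "NA" typed into the death date
    if in_living:
        for category in categories:
            if (category.startswith(MISSING_PREFIXES)
                    and not category.endswith(LIVING_SUFFIX)):
                errors.append("P042")  # suffix omitted for a living subject
    return errors
```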

P043-Missing template using (living people) for non-living

P044-Description more than 100 characters

P045-Not used

P046-Template contains a %

P047-Template contains HTML tag

P048-Template not in main namespace

Date definitions as per German Wikipedia

| Wrong | Correct | Meaning and notes |
|---|---|---|
| [[3 April]] [[1940]] | 3 April 1940 | Dates should not be linked. An article should not be edited just to correct such a link. However, correction is desired if the article is to be edited for other reasons. |
| 4. 1. 1234 | 4 January 1234 | unified format, to simplify automatic data extraction |
| 04 January 1234 | 4 January 1234 | |
| 123 Before the Common Era | 123 BC | ditto |
| 123 BCE | 123 BC | |
| 123 Before the Christian Era | 123 BC | |
| 123 Before the Current Era | 123 BC | |
| AD 123 | 123 | ditto |
| 123 AD | 123 | |
| 123 Common Era | 123 | |
| 123 CE | 123 | |
| Early 43 BC | 43 BC | coarsen, to simplify automatic data extraction |
| 2nd half of 43 BC | 43 BC | |
| late summer 43 BC | 43 BC | |
| lived in late 8th and early 9th century | 8th century | for DATE OF BIRTH |
| lived in late 8th and early 9th century | 9th century | for DATE OF DEATH |
| | before 837 | "before" is always followed by the given date |
| before the 18th July 837 | before 18 July 837 | |
| documented 1108 | before 1108 | for DATE OF BIRTH |
| begat 1108 | before 1108 | |
| recorded in 1108 | before 1108 | |
| mentioned in 1108 | before 1108 | |
| provable since 1108 | before 1108 | |
| | after 837 | "after" is always followed by the given date |
| after the 18th July 837 | after 18 July 837 | |
| later than 1245 | after 1245 | for DATE OF DEATH |
| earliest 1245 | after 1245 | |
| presumed dead 1245 | after 1245 | |
| missing 1245 | after 1245 | |
| 1245 (or later) | after 1245 | |
| not before 1245 | after 1245 | |
| 837-843 | between 837 and 843 | "between" is always followed by the given dates separated by "and" |
| after 3 May 93 and before 5 May 103 | between 3 May 93 and 5 May 103 | |
| second half of 9th Century | between 850 and 900 | |
| end of the 9th Century | between 850 and 900 | |
| | 6 May before 987 | the day is known, the year is uncertain |
| | 6 May after 987 | |
| | 6 May between 987 and 993 | |
| | 6 May around 987 | |
| 6 May 19xx | 6 May 20th century | |
| 940; other sources 945 | 940 or 945 | "or" should appear only between two stand-alone dates, not between days or month names; more than two alternatives are permitted |
| 940 (or 945) | 940 or 945 | |
| 3 or 4 April 940 | 3 April 940 or 4 April 940 | |
| 3/4 April 940 | 3 April 940 or 4 April 940 | |
| 3 April or May 940 | 3 April 940 or 3 May 940 | |
| 3 April/May 940 | 3 April 940 or 3 May 940 | |
| 3 April 940 or 941 | 3 April 940 or 3 April 941 | |
| 3 April 940/941 | 3 April 940 or 3 April 941 | |
| 3rd/4th century | 3rd century or 4th century | |
| approximately 837 | about 837 | "about": (small) interval around the given date |
| around 837 | about 837 | |
| circa 837 | about 837 | |
| ca. 837 | about 837 | |
| c. 837 | about 837 | |
| ~837 | about 837 | |
| born May 1705 and baptised 17 May 1705 | baptised 17 May 1705 | "baptised" or "buried" refers to the complete date and can only appear at the start (possibly after "uncertain:") |
| 17 May 1705 (baptism) | baptised 17 May 1705 | |
| funeral 14 June 1705 | buried 14 June 1705 | |
| 14 June 1705 (funeral) | buried 14 June 1705 | |
| 14 June 1705 (burial) | buried 14 June 1705 | |
| probably 1460 | uncertain: 1460 | "uncertain" refers to the complete date and can only appear at the start |
| likely 1460 | uncertain: 1460 | |
| possibly 1460 | uncertain: 1460 | |
| 1460(?) | uncertain: 1460 | |
| 3(?) March 1460 | uncertain: 3 March 1460 | |
| 3 March(?) 1460 | uncertain: 3 March 1460 | |
| 3 March 1460(?) | uncertain: 3 March 1460 | |
| possibly 3 March 1460 | uncertain: 3 March 1460 | |
| | uncertain: about 1111 | |
| | uncertain: 1 May 999 or 1 June 999 | |
| | uncertain: baptised 17 May 1705 | |
| | uncertain: buried 14 June 1705 | |
| not known | | Check whether a rough entry is possible, such as "3rd century" or "3rd century or 4th century". If not, leave the field empty. |
| unknown | | |
| ? | | |

The following forms should be tolerated until a definitive decision is made:

| Wrong | Correct | Meaning and notes |
|---|---|---|
| 333/32 BC | 333/332 BC | a known year of another calendar (Greek/Islamic/Iranian/etc.) which has been recorded as two consecutive Julian or Gregorian years, separated by /; this does not mean "333 BC or 332 BC"; there must be no spaces next to the / |
| 1332/33 | 1332/1333 | |
| | about 335/325 BC | (small) interval around the given date span; always place it after "about", in other cases "between" is correct; there must be no spaces next to the / |
| about 1870/80 | about 1870/1880 | |
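
A few of the simpler rules in the table (unlinking dates, unifying the "about" synonyms, and unifying era suffixes) can be expressed as a small normalisation step. This is a sketch under stated assumptions: `normalise_date` is a hypothetical helper, it covers only the rules named here, and it is not part of the software described on this page.

```python
import re

ABOUT_SYNONYMS = ("approximately", "around", "circa", "ca.", "c.", "~")

def normalise_date(text):
    """Rewrite a few 'wrong' date forms from the table into their 'correct' form."""
    # [[3 April]] [[1940]] -> 3 April 1940
    text = text.replace("[[", "").replace("]]", "").strip()
    # approximately / around / circa / ca. / c. / ~  ->  about
    for word in ABOUT_SYNONYMS:
        if text.lower().startswith(word):
            text = "about " + text[len(word):].strip()
            break
    # BCE and the spelled-out "Before the ... Era" forms -> BC (must run before the CE rule)
    text = re.sub(r"\s*\b(?:BCE|Before the (?:Common|Christian|Current) Era)$", " BC", text)
    # AD 123 / 123 AD / 123 CE / 123 Common Era -> 123
    text = re.sub(r"^AD\s+|\s+(?:AD|CE|Common Era)$", "", text)
    return text
```

For example, `normalise_date("c. 837")` gives `"about 837"` and `normalise_date("123 BCE")` gives `"123 BC"`. Forms that need judgement, such as "late summer 43 BC", are deliberately left untouched.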