User:Monkbot/task 14: repair improper use of publisher params in cs1 templates
The next version of the Module:Citation/CS1 suite will emit error messages when 'periodical' templates do not have a 'periodical' parameter. The next version will also emit error messages when italic markup is found in |publisher=:
{{cite news/new|url=http://www.independent.ie/opinion/analysis/a-lonely-life-in-reverse-1793455.html|title=A lonely life in reverse|last=Pavia|first=Will|date=June 27, 2009|publisher=''[[The Independent]]''|accessdate=October 9, 2009}}
- Pavia, Will (June 27, 2009). "A lonely life in reverse". The Independent. Retrieved October 9, 2009. {{cite news}}: Italic or bold markup not allowed in: |publisher= (help)
- Pavia, Will (June 27, 2009). "A lonely life in reverse". The Independent. Retrieved October 9, 2009.
Similarly, error messages will be emitted when italic markup is found in a periodical parameter:
{{cite web/new|last1=Mifflin|first1=Lawrie|title=Jury Rules That PBS Must Pay Video Distributor $47 Million|url=https://www.nytimes.com/1999/02/03/business/jury-rules-that-pbs-must-pay-video-distributor-47-million.html|website=''[[The New York Times]]''|accessdate=May 30, 2016}}
- Mifflin, Lawrie. "Jury Rules That PBS Must Pay Video Distributor $47 Million". The New York Times. Retrieved May 30, 2016. {{cite web}}: Italic or bold markup not allowed in: |website= (help)
- Mifflin, Lawrie. "Jury Rules That PBS Must Pay Video Distributor $47 Million". The New York Times. Retrieved May 30, 2016.
The purpose of task 14 is to preemptively repair the most easily repairable of these types of cs1 templates before the module-suite change goes live.
description
Wiki markup is not allowed in cs1|2 parameters that are made part of a citation's metadata; the {{cite journal}} documentation is typical (an exception is made for |title=, where italic markup is allowed for proper title rendering, for example species names in a journal article title). Further, information in |publisher= is not included in the metadata created for cs1 periodical templates; this is a limitation of the underlying metadata standard and not of Module:Citation/CS1. For readers who consume cs1 citations through the template's metadata, whatever information (corrupted or not) is held in |publisher= is not available, even though that information is visibly rendered.
Task 14 has several sub-tasks, defined below, that attempt to correct malformed cs1 (and in some cases cs2) templates that have wiki markup in |publisher= and |<periodical alias>= parameter values.
Task 14 skips pages that include {{bots|deny=Monkbot14}}.
definitions
For the purposes of this task, these definitions apply:
- periodical template
- any one of the following templates and redirects:
{{cite article}}
{{cite blog}}
{{cite dictionary}}
{{cite document}}
{{cite encyclopaedia}}
{{cite encyclopedia}}
{{cite journal}}
{{cite magazine}}
{{cite newspaper}}
{{cite news}}
{{cite paper}}
{{cite podcast}}
{{cite web}}
- periodical alias
- any one of the following parameters: |dictionary=, |encyclopedia=, |journal=, |magazine=, |newspaper=, |website=
periodical list
Task 14 maintains a list of periodical names that it consults when making repairs. The list of periodical names is manually assembled according to these loose criteria:
- must be an online and / or print periodical
- must have an en.wiki article or be identified in an en.wiki article as a synonym, redirect, etc
- the en.wiki article must identify the periodical as a newspaper, magazine, etc
- periodicals whose name is shared between periodical types (The Courier can be either a newspaper or a magazine) are excluded
- should not have the form of a domain name (there are some exceptions: 'dictionary.com' for example, and see below for {{cite web}} templates with domain names)
- television sources are explicitly excluded
- corporate names are excluded, but the periodicals they publish are not: Bloomberg is excluded but Bloomberg Businessweek is accepted; likewise BBC versus BBC News and ESPN versus ESPNscrum (the outcome of this rfc may change this criterion)
sub-task 1: periodical templates with italicized publisher
This sub-task operates only on the periodical templates.
Task 14 extracts <periodical name> from |publisher=''<periodical name>'' and then consults the list of known periodicals for a match. When there is an exact match, and when there is no already-existing periodical alias in the template, task 14 replaces the |publisher= parameter with an appropriate periodical alias and renames the template to match.
cs2 ({{citation}}) is excluded here because it isn't always possible to know if the citation refers to a periodical or to a book. The error message emitted by Module:Citation/CS1 will notify editors that these cs2 citations need repair.
This sub-task makes a fix only when the entire value assigned to |publisher= is inside proper italic wiki markup (leading '' must be balanced with trailing '').
When fixes are made, task 14 reports the number of fixes applied, the number of 'periodical' names that it does not recognize, and / or the number of templates that already have a periodical alias (conflict).
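The balanced-markup rule for sub-task 1 can be illustrated with a small C# sketch. This is not the bot's actual code: the pattern and the helper name `TryExtractItalicName` are assumptions made for illustration. It accepts a value only when the whole of it sits inside one balanced pair of `''` markers, and unwraps a plain wikilink around the name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern: the entire value must be wrapped in exactly one balanced
// pair of '' markers. A plain wikilink around the name is unwrapped; piped
// wikilinks, bold markup, and trailing text are deliberately not matched.
var balancedItalic = new Regex(
    @"^\s*''(?:\[\[((?:(?!'')[^\[\]|])+)\]\]|((?:(?!'')[^\[\]|])+))''\s*$");

// Returns true and the bare periodical name when the value is fully italicized.
bool TryExtractItalicName(string value, out string name)
{
    Match m = balancedItalic.Match(value);
    name = "";
    if (!m.Success) return false;
    name = (m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value).Trim();
    return true;
}

Console.WriteLine(
    TryExtractItalicName("''[[The Independent]]''", out string shown) ? shown : "(skipped)");
```

A value such as `''The New York Times'', May 28, 1994` fails the match because of the trailing text, which corresponds to the 'extraneous text' case that task 14 leaves unrepaired.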
sub-task 2: periodical templates with unbalanced italicized publisher
This sub-task operates only on the periodical templates.
Essentially the same as sub-task 1 except that this sub-task catches the relatively rare case where |publisher= has the form |publisher=''<periodical name> (no trailing or closing '' markup).
When fixes are made, task 14 reports the number of fixes applied, the number of 'periodical' names that it does not recognize, and / or the number of templates that already have a periodical alias (conflict), all of these using the same counters as sub-task 1. During development, this sub-task maintains an unbalanced counter that may or may not remain in the code.
sub-task 3: cite web with italicized domain name in publisher
This sub-task operates only on {{cite web}}.
For {{cite web}} only, task 14 will repair |publisher=''<domain name>'' where <domain name> is any combination of lowercase letters, digits, 'dot', and hyphens followed by a 'dot' and a two- or three-letter (lowercase) top-level domain (validity of <domain name> is not assessed). When this form of {{cite web}} is encountered, and when there is no already-existing |website= alias, task 14 removes the wiki markup and replaces |publisher= with |website=.
When {{cite web}} / |website= fixes are made, task 14 reports the number of fixes applied and / or the number of templates that already have a |website= alias parameter (conflict).
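The domain-name rule above translates fairly directly into a regular expression. The following C# sketch is an assumption-laden illustration, not the bot's code: the helper name `RepairCiteWebDomain` and both patterns are hypothetical, and the conflict check looks only for a |website= parameter with a non-empty value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern: |publisher=''<domain name>'' where the domain is lowercase
// letters, digits, dots, and hyphens, then a dot and a 2- or 3-letter lowercase TLD.
var publisherDomain = new Regex(
    @"\|\s*publisher\s*=\s*''([a-z0-9.\-]+\.[a-z]{2,3})''(?=\s*(?:\||\}\}))");

// Hypothetical conflict check: an existing |website= with a non-empty value.
var hasWebsite = new Regex(@"\|\s*website\s*=\s*[^\s|}]");

// Returns the repaired template text, or the input unchanged when nothing
// matches or when a |website= value already exists (conflict = true).
string RepairCiteWebDomain(string template, out bool conflict)
{
    conflict = false;
    if (!publisherDomain.IsMatch(template)) return template;
    if (hasWebsite.IsMatch(template)) { conflict = true; return template; }
    return publisherDomain.Replace(template, "|website=$1", 1);
}

Console.WriteLine(RepairCiteWebDomain(
    "{{cite web |title=T |publisher=''example.com'' |url=http://example.com}}", out bool c));
```

As in the description, the validity of the domain name is not assessed; the pattern only checks its shape.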
sub-task 4: periodical templates with upright publisher
This sub-task operates only on the periodical templates.
Task 14 extracts <periodical name> from |publisher=<periodical name> and then consults the list of known periodicals for a match. When there is an exact match, and when there are no already-existing periodical aliases in the template, task 14 replaces the |publisher= parameter with an appropriate periodical alias and renames the template to match.
Task 14 makes a fix only when the entire value assigned to |publisher= is found in the list of known periodicals.
When fixes are made, task 14 reports the number of fixes applied and / or the number of templates that already have a periodical alias (conflict).
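The exact-match lookup can be sketched in C# as follows. The three entries are a tiny excerpt from the script's lists; the helper name `TryRewrite` and the rule that the new template name is "cite " plus the periodical type are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Tiny illustrative excerpt of the periodical list: name -> periodical type.
// The default string comparer is ordinal, so matching is exact and case-sensitive.
var periodicalMap = new Dictionary<string, string>
{
    { "Merriam-Webster", "dictionary" },
    { "Nature", "journal" },
    { "Forbes", "magazine" },
};

// On an exact match, yields the alias parameter name and a matching template name
// (assumed here to be "cite <type>"); otherwise returns false so the caller can
// leave the template untouched.
bool TryRewrite(string publisherValue, out string templateName, out string paramName)
{
    templateName = "";
    paramName = "";
    if (!periodicalMap.TryGetValue(publisherValue.Trim(), out var kind)) return false;
    paramName = kind;
    templateName = "cite " + kind;
    return true;
}

if (TryRewrite("Nature", out string tmpl, out string param))
    Console.WriteLine("{{" + tmpl + " |" + param + "=Nature}}");
```

Because the lookup is exact, a value such as `forbes` or `Forbes, Inc.` is not recognized and the template is skipped rather than guessed at.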
sub-task 5: periodical templates with italicized 'work'
This sub-task operates only on the periodical templates.
Task 14 extracts <periodical name> from |<periodical alias>=''<periodical name>'' and then consults the list of known periodicals for a match. When there is an exact match, task 14 removes the italic wiki markup. Task 14 may, if appropriate, rename either or both of the template and the |<periodical alias>= parameter.
Task 14 makes a fix only when the entire value assigned to |<periodical alias>= is inside proper italic wiki markup (leading '' must be balanced with trailing ''). Repair of |<periodical alias>= with unbalanced wiki markup (|<periodical alias>=''<periodical name>) is not currently implemented; it may be added in a subsequent sub-task or a revised version of this task.
When fixes are made, task 14 reports the number of fixes applied.
sub-task 6: all cs1|2 templates
This sub-task operates on all cs1|2 templates.
As the last of these sub-tasks, task 14 will remove italic and bold markup from |<periodical alias>=''<name>'' parameters in all cs1|2 templates; this sub-task does not query the periodical list, nor does it replace template or parameter names. In early runs of task 14 as a bot, this sub-task will be disabled so that the operator has the opportunity to refine the periodical list.
When these fixes are made, task 14 reports the number of fixes applied.
edit summaries
Task 14 writes several short edit summary messages depending on the work it accomplished. Here's a brief description of what those messages mean:
- cs1 template fixes: misused |publisher= (n×/n×);
- always part of the edit summary, this message indicates the number of cs1 periodical templates that were modified; the first number is the count for |publisher=''<periodical name>'', the second number is the count for |publisher=<periodical name>; numbers may be zero when none were modified [sub-task 1] [sub-task 4]
- unbalanced (n×);
- this message reports the number of |publisher= parameters with unbalanced italic markup (opening '' without closing '' to match) [sub-task 2]
- skipped:
- this text is inserted when one or both of the following two messages are added to the edit summary [sub-task 1]
- unrecognized periodical(s) (n×);
- identifies the number of cs1 templates where <periodical name> in |publisher=''<periodical name>'' is not recognized (not listed in the task's list of periodicals); <periodical name> may not be recognized because it hasn't been added to the list (the list is manually curated), is in the list but is not an exact match, is not a periodical, or is not unique to a single journal, magazine, newspaper, website, etc. [sub-task 1] Nota bene: the number stated in this summary includes cs1 templates later fixed by sub-task 3, so it may be perceived as misleading.
- conflicting periodical(s) (n×);
- task 14 does not attempt to repair cs1 templates that have both |publisher=''<periodical name>'' and a periodical alias with an assigned value; a conflict exists because cs1|2 templates are not allowed to have more than one periodical alias [sub-task 1]
- fixed web sites (n×);
- indicates the number of {{cite web}} templates that were modified from |publisher=''<domain name>'' to |website=<domain name> [sub-task 3]
- skipped conflicting website(s) (n×);
- task 14 does not attempt to repair {{cite web}} templates that have both |publisher=''<domain name>'' and a periodical alias with an assigned value; a conflict exists because {{cite web}} is not allowed to have more than one periodical alias [sub-task 3]
- fixed work aliases (n×);
- indicates the number of cs1 periodical templates that were modified from |<periodical alias>=''<periodical name>'' to |<periodical alias>=<periodical name> when <periodical name> is known to task 14 [sub-task 5]
- removed markup from cs1|2 work aliases (n×);
- indicates the number of cs1|2 templates that were modified from |<periodical alias>=''<periodical name>'' to |<periodical alias>=<periodical name> [sub-task 6]
- book/cs2 skip (n×);
- this message reports the number of {{cite book}} and {{citation}} templates with italic markup in |publisher= that task 14 has not repaired
- ext text skip (n×);
- this message reports the number of |publisher= parameters with 'extraneous text' that task 14 has not repaired, for example:
- |publisher=''The New York Times'', May 28, 1994 (dates, volume and issue numbers, other descriptive text; this is the most common form)
- |publisher=[[Top Gear (magazine)|''Top Gear'' magazine]] (italic markup inside the label section of a wikilink)
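How the first few summary fragments might be assembled from counters can be sketched in C#. This is only an illustration: the helper name `BuildSummary` is hypothetical, and the sketch assumes that optional fragments are appended only when their counter is non-zero, which the descriptions above suggest but do not state outright. The remaining messages would follow the same pattern.

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the leading part of the edit summary from counters.
string BuildSummary(int fixedItalic, int fixedUpright, int unbalanced,
                    int unrecognized, int conflicts)
{
    var sb = new StringBuilder();
    // Always present; either count may be zero.
    sb.Append($"cs1 template fixes: misused |publisher= ({fixedItalic}×/{fixedUpright}×);");
    if (unbalanced > 0)
        sb.Append($" unbalanced ({unbalanced}×);");
    // "skipped:" introduces one or both of the following messages.
    if (unrecognized > 0 || conflicts > 0)
    {
        sb.Append(" skipped:");
        if (unrecognized > 0)
            sb.Append($" unrecognized periodical(s) ({unrecognized}×);");
        if (conflicts > 0)
            sb.Append($" conflicting periodical(s) ({conflicts}×);");
    }
    return sb.ToString();
}

Console.WriteLine(BuildSummary(2, 1, 0, 3, 0));
```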
ancillary tasks
Task 14 deletes all empty parameters from templates that it repairs.
This task does not do awb general fixes.
script
// this is line 18
// replaces |publisher=''<periodical name>'' with |<periodical param>=<periodical name>
// use Wikisearch: insource:/publisher *= *''/
//
// TODO: remove bold/italic markup from |publisher=''<publisher>'' in non-periodical templates like cite book, cite press release, etc
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
Skip = true; // in operation, presume that we shall skip; set false when fixes are made
// Skip = false; // for development, never skip; for the bot set this true then when fixes are made, set it false
int periodical_param_conflict_count = 0;
int unrecognized_periodical_count = 0;
int fixed_count_ital = 0; // for publisher with italic markup
int fixed_count = 0; // for publisher without italic markup
int unbalanced_count = 0; // TODO: keep this as a reported number?
int web_param_conflict_count = 0; // cite web |publisher=''domain.name''
int web_fixed_count = 0;
int unrecognized_work1_count = 0; // keep this? just a tally of periodical templates where |<periodical>= value not recognized
int work1_fixed_count = 0; // periodical template rewrite when |<periodical>= has recognized name and italic markup
int work_fixed_count = 0; // for other cs1|2 templates with |work= (and aliases) with italic and bold markup
int publisher_fixed_count = 0; // for non-periodical cs1 templates with |publisher= with italic and bold markup
int cs2_skip_count = 0; // {{citation}} and {{cite book}} skip counter
int ext_text_count = 0; // extraneous text skip counter
string IS_CS1_PERIODICAL = @"[Cc]ite[_ ]*(?=article|blog|dictionary|document|encyclopa?edia|journal|magazine|newspaper|(?:news(?!group))|paper|podcast|web)";
string IS_CS1_NON_PERIODICAL = @"(?:[Cc]ite[_ ]*(?=(?:AV media(?: notes)?)|[Aa][Vv] media|[Aa][Vv] media notes|book|conference|encyclopa?edia|episode|interview|mailing ?list|newsgroup|podcast|press release|report|serial|sign|speech|techreport|thesis))";
string IS_CS1 = @"(?:[Cc]ite\s*(?=(?:AV media(?: notes)?)|[Aa][Vv] media|[Aa][Vv] media notes|article|ar[Xx]iv|biorxiv|book|conference|document|encyclopa?edia|episode|interview|journal|magazine|mailing ?list|manual|newspaper|(?:news(?!group|paper))|paper|podcast|press release|report|serial|sign|speech|techreport|thesis|video|web)|[Cc]itation|[Cc]ite(?=\s*\|))";
string IS_PERIODICAL_PARAM = @"(?:journal|newspaper|magazine|work|website|periodical|encyclopa?edia|dictionary|mailinglist)";
string pattern;
Dictionary<string, string> periodical_map = new Dictionary<string, string>(); // the periodicals listed here are the article name if wikilinked; |publisher=''[[Star Tribune]]'' -> Star Tribune
bool gSkip = Skip;
//----------< D I C T I O N A R I E S >----------
string[] dictionaries = {
"Australian Dictionary of Biography",
"Biographical Directory of the United States Congress",
"Black's Law Dictionary",
"Collins English Dictionary",
"Contemporary Authors",
"Dansk Biografisk Leksikon",
"dictionary.com",
"Dictionary.com",
"Dictionary of Canadian Biography", // canonical
"Dictionary of Canadian Biography Online",
"Dictionary of Welsh Biography",
"Merriam Webster",
"Merriam-Webster", // canonical
"Merriam-Webster Dictionary",
"Merriam-Webster Online",
"Neue Deutsche Biographie",
"Online Etymological Dictionary",
"Online Etymology Dictionary", // canonical
"Oxford English Dictionary",
"Oxford Music Online", // same as The New Grove Dictionary of Music and Musicians
"Random House Webster's Unabridged Dictionary",
"Stedman's Medical Dictionary",
"The American Heritage Dictionary",
"The American Heritage Dictionary of the English Language", // canonical
"The Free Dictionary",
"The New Grove Dictionary of Music and Musicians", // canonical
"Webster's Dictionary",
};
foreach (string dictionary in dictionaries)
periodical_map.Add (dictionary, "dictionary");
//----------< E N C Y C L O P E D I A S >----------
string[] encyclopedias = {
"American National Biography",
"Britannica", // same as Encyclopædia Britannica
"Britannica.com", // same as Encyclopædia Britannica Online
"Columbia Encyclopedia",
"Dansk kvindebiografisk leksikon",
"Den Store Danske",
"Den Store Danske Encyklopaedi",
"Den Store Danske Encyklopædi",
"Encarta", // canonical
"Encarta Encyclopedia",
"Enciclopedia Italiana", // same as Treccani
"Enciclopedia Treccani", // same as Treccani
"Encyclopedia.com",
"Encyclopaedia Britannica",
"Encyclopaedia Metallum", // canonical
"Encyclopedia Britannica",
"Encyclopedic Dictionary of Vietnam", // canonical
"Encyclopedia of Alabama", // canonical
"Encyclopedia Of Alabama",
"Encyclopedia of Arkansas",
"Encyclopedia of Arkansas History & Culture", // canonical
"Encyclopedia of Chicago",
"Encyclopædia Britannica", // canonical
"Encyclopædia Britannica Online", // canonical
"Encyclopædia Iranica",
"Great Norwegian Encyclopedia", // canonical
"Handbook of Texas", // canonical
"Historical Dictionary of Switzerland",
"HistoryWorld",
"International Encyclopedia of the Social Sciences",
"Internet Encyclopedia of Philosophy",
"Jewish Encyclopedia",
"Jewish Virtual Library",
"Metal Archives", // same as Encyclopaedia Metallum
"Nationalencyklopedin",
"New Georgia Encyclopedia",
"Nordisk familjebok",
"Norsk Biografisk Leksikon",
"Norsk biografisk leksikon", // canonical
"Stanford Encyclopedia of Philosophy",
"Store norske leksikon", // same as Great Norwegian Encyclopedia
"Store Norske Leksikon",
"Te Ara – The Encyclopedia of New Zealand",
"Te Ara – the Encyclopedia of New Zealand",
"Te Ara: The Encyclopedia of New Zealand", // canonical
"Tennessee Encyclopedia",
"Tennessee Encyclopedia of History and Culture", // canonical
"The Canadian Encyclopedia", // canonical
"The Canadian Encyclopaedia",
"The Encyclopedia of Science Fiction",
"The Handbook of Texas Online", // same as Handbook of Texas
"The Jewish Encyclopedia",
"The Oregon Encyclopedia",
"Treccani", // canonical
"Từ điển Bách khoa toàn thư Việt Nam", // same as Encyclopedic Dictionary of Vietnam
"Women in World History: A Biographical Encyclopedia",
};
foreach (string encyclopedia in encyclopedias)
periodical_map.Add (encyclopedia, "encyclopedia");
//----------< J O U R N A L S >----------
string[] journals = {
"Acta Classica",
"Acta Orientalia Academiae Scientiarum Hungaricae",
"American Historical Review",
"American Journal of Botany",
"American Journal of International Law",
"American Journal of Sociology",
"American Neptune",
"Annals of the Association of American Geographers",
"Appetite (journal)",
"Brazilian Archives of Biology and Technology",
"British Journal of Sports Medicine",
"Bryn Mawr Classical Review",
"Bulletin of the Atomic Scientists",
"City (journal)", // not to be confused with the magazine City Journal
"Contemporary Women's Writing",
"Cornell HR Review",
"Critical Review (scholarly journal)",
"CrossCurrents",
"Economic and Political Weekly",
"Entomologisk tidskrift",
"Entomologisk Tidskrift", // canonical
"Ethics (journal)",
"Film Quarterly",
"Film & History",
"Futures (journal)",
"Historical Journal of Film, Radio and Television",
"History of Education Quarterly",
"International Journal for the Psychology of Religion",
"International Journal of Pediatric Otorhinolaryngology",
"International Review for the Sociology of Sport",
"Internet Archaeology",
"Jane's Intelligence Review",
"Journal of Analytical Psychology",
"Journal of Church and State",
"Journal of Economic Literature",
"Journal of Financial Economics",
"Journal of Futures Studies",
"Journal of Interdisciplinary History",
"Journal of Physics G", // canonical
"Journal of Physics G: Nuclear and Particle Physics",
"Journal of Social History",
"Journal of the Lepidopterists' Society",
"Journal of the Norwegian Medical Association",
"Journal of the Society of Architectural Historians",
"Journal of Southern History",
"Journal of the Southwest",
"Leonardo (journal)",
"McGill Law Journal",
"Middle East Quarterly",
"Military Affairs", // same as The Journal of Military History
"Modern Language Notes",
"Music Times",
"Nature",
"Nature (journal)",
"New Left Review",
"North American Journal of Medical Sciences",
"Oregon Historical Quarterly",
"PAJ (journal)", // canonical
"Peace & Change",
"Pediatrics (journal)",
"Performing Arts Journal", // same as PAJ (journal)
"Physical Review",
"Quaternary Research",
"Science (journal)", // canonical
"Science (magazine)",
"Science Fiction Studies",
"Scientific Reports",
"Social Forces",
"Southwestern Historical Quarterly",
"Studies in Intelligence",
"Sydney Law Review",
"Teachers College Record",
"The American Journal of Pathology",
"The BMJ",
"The Burlington Magazine",
"The Coleopterist",
"The Good Society",
"The Historian (journal)",
"The Independent Review",
"The Journal of Business",
"The Journal of Military History",
"The Journal of Physical Chemistry B",
"The Review of English Studies",
"The Review of Financial Studies",
"Third World Quarterly",
"Volume!",
"Yale Law Journal",
"Zoological Journal of the Linnean Society",
"Zootaxa",
};
foreach (string journal in journals)
periodical_map.Add (journal, "journal");
//----------< M A G A Z I N E S >----------
string[] magazines = {
"1843", // same as Intelligent Life
"1843 Magazine",
"1843 (magazine)", // canonical
"360 Magazine", // same as 360 (magazine)
"360 (magazine)", // canonical
"5280",
"AARP Magazine",
"AARP The Magazine", // canonical
"ABA Journal", // canonical
"ABC Soaps In Depth",
"Accountancy Age",
"Accounting Today",
"Acoustic Guitar (magazine)",
"Ad Age", // canonical
"Adult Video News", // same as AVN (magazine)
"Adventist Review",
"Adventure Cyclist",
"Adventure Cyclist magazine", // canonical
"Advertising Age", // same as Ad Age
"Adweek", // canonical
"AdWeek",
"Air Force Magazine",
"All About Jazz",
"All Out Cricket",
"Allure (magazine)",
"Alternative Press",
"Alternative Press (magazine)", // canonical
"Alternative Press (music magazine)",
"AltPress", // same as Alternative Press (magazine)
"America Magazine",
"America (magazine)", // canonical
"American Bar Association Journal", // same as ABA Journal
"American Heritage (magazine)",
"American Journalism Review",
"American Rifleman",
"American School and University",
"American School & University", // canonical
"American Songwriter",
"Amusement Today",
"Analog Science Fiction and Fact",
"Animage",
"Animation Magazine",
"Anime Insider",
"anime*magazine",
"Apu (magazine)",
"Arabian Business",
"ARC Magazine",
"Architects' Journal",
"Architectural Digest",
"Architectural Forum",
"Architectural Record",
"Architectural Review",
"Architecture Week",
"ArchitectureWeek", // canonical
"Arch+",
"ARCH+", // canonical
"Art in America",
"Artforum",
"Arts and Antiques",
"Asia Week",
"Asian Scientist",
"Asiaweek", // canonical
"Aspen Peak",
"Astounding Science Fiction", // same as Analog Science Fiction and Fact
"Astronomy (magazine)", // canonical
"Astronomy Magazine",
"Atari ST User",
"Athletics Weekly",
"Atlanta Magazine",
"Atlanta Review",
"Atlanta (magazine)", // canonical
"Atlas Obscura",
"Attitude (magazine)",
"Atwood Magazine",
"Australian GamePro",
"Australian Musician (magazine)",
"Auto Express",
"Autocar (magazine)",
"Automobile Magazine",
"Automobile (magazine)", // canonical
"Autosport",
"Autoweek", // canonical
"AutoWeek",
"Aviation Week",
"Aviation Week & Space Technology", // canonical
"AVN",
"AVN (magazine)", // canonical
"Backstage (magazine)", // canonical
"backstage.com", // same as Backstage (magazine)
"Ballot Access News",
"BAM (magazine)",
"Baseball America",
"Bass Player (magazine)",
"BauNetz",
"BBC History",
"Bicycle Quarterly",
"Bicycling (magazine)",
"Big Brother (magazine)",
"Big Cheese (magazine)",
"Billboard",
"Billboard magazine",
"Billboard Magazine",
"Billboard Philippines",
"Billboard Radio Monitor",
"Billboard (magazine)",
"Billboard.com",
"Bitch (magazine)",
"Bizarre (magazine)",
"Black Issues Book Review",
"Black Belt (magazine)",
"Blackwood's Magazine",
"Blender (magazine)",
"Blistering",
"Blogcritics",
"Bloomberg Business", // same as Bloomberg Businessweek
"Bloomberg Businessweek", // canonical
"Bloomberg BusinessWeek",
"Blues & Soul",
"Bluff Magazine",
"Bluff (magazine)", // canonical
"Blurt (magazine)",
"Bomb (magazine)",
"Bon Appétit",
"Bookforum",
"Boston Magazine",
"Boston (magazine)", // canonical
"Boston Review",
"Bowling This Month",
"Bowlers Journal",
"Boxoffice (magazine)",
"Boys' Life",
"Brandweek",
"BraveWords", // same as Brave Words & Bloody Knuckles
"Brave Words & Bloody Knuckles", // canonical
"Bravo (magazine)",
"Bright Lights Film Journal",
"British Vogue", // canonical
"Broadcast Now", // same as Broadcast (magazine)
"Broadcast (magazine)", // canonical
"Broadcasting", // same as Broadcasting & Cable
"Broadcastnow", // same as Broadcast (magazine)
"Broadcasting and Cable",
"Broadcasting & Cable", // canonical
"Broadcasting (magazine)", // same as Broadcasting & Cable
"Brooklyn Magazine",
"BRW (magazine)", // canonical
"Bucketfull of Brains",
"Buffalo Rising",
"Business Review Weekly", // same as BRW (magazine)
"Business Today (business magazine)", // same as Business Today (India)
"Business Today (India)", // canonical
"Business Week", // same as Bloomberg Businessweek
"Businessweek", // same as Bloomberg Businessweek
"BusinessWeek", // same as Bloomberg Businessweek
"Businessweek.com", // same as Bloomberg Businessweek
"Bustle",
"Bustle (magazine)", // canonical
"Cabinet Magazine", // canonical
"Cabinet (magazine)",
"Café Magazine",
"Café (magazine)", // canonical
"California Lawyer", // canonical
"California Lawyer Magazine",
"Canadian Business",
"Canadian Geographic",
"Canadian Parliamentary Review",
"Capital (German magazine)",
"Car and Driver",
"CAR Magazine",
"Car (magazine)", // canonical
"Card Player", // canonical
"CardPlayer",
"Cashbox (magazine)",
"Cat Fancy", // canonical
"Catster", // same as Cat Fancy
"CBS Soaps In Depth", // same as Soaps In Depth
"CCM Magazine",
"Celtic Family Magazine",
"CERN Courier",
"CFO (magazine)",
"Charisma (magazine)",
"Chart (magazine)",
"Chemical & Engineering News", // canonical
"Chicago Magazine",
"Chicago (magazine)", // canonical
"Chief Executive (magazine)",
"Children & Young People Now",
"China Today",
"Choice: Current Reviews for Academic Libraries",
"Christian Century",
"Christianity Today",
"Chronicles of Chaos (webzine)",
"Cigar Aficionado",
"Cincinnati Magazine", // same as Cincinnati (magazine)
"Cincinnati (magazine)", // canonical
"Cineaste (magazine)",
"CIO magazine", // canonical
"CIO Magazine",
"City Journal", // canonical; not to be confused with City (journal)
"City Journal (New York)",
"City Weekend",
"City & State",
"Clarkesworld Magazine",
"Clash Magazine",
"Clash (magazine)", // canonical
"Classic Rock (magazine)", // canonical
"Climbing (magazine)",
"CMJ New Music Monthly",
"Cobblestone Magazine",
"Cobblestone (magazine)", // canonical
"Coin World",
"College Football News",
"Colorado Music Buzz",
"Columbia Journalism Review",
"Comics Buyer's Guide",
"Comics International",
"Commentary",
"Commentary (magazine)", // canonical
"Complex",
"Complex magazine",
"Complex (magazine)", // canonical
"Compliance Week",
"Computer and Video Games", // canonical
"Computer and Video Games (magazine)",
"Computer Games Magazine",
"Computer World", // same as Computerworld
"ComputerWeekly",
"Computerworld",
"Compute!",
"Condé Nast Traveller", // not the same as ~Traveler (one l)
"Condé Nast Traveler", // not the same as ~Traveller (two ls)
"Consequence of Sound",
"Contactmusic",
"Contactmusic.com", // canonical
"Cook's Illustrated",
"Cornucopia (magazine)",
"Cosmopolitan",
"Cosmopolitan (magazine)",
"Cosmos Magazine", // same as Cosmos (Australian magazine)
"Cosmos (Australian magazine)", // canonical
"Cottage Life",
"CounterPunch",
"Country Standard Time",
"Country Weekly", // same as Nash Country Weekly
"Crack Magazine",
"Crain's Chicago Business",
"Crash (magazine)", // canonical
"CRASH (magazine)",
"Creem", // canonical
"Creem Magazine", // same as Creem
"Creative Computing",
"Creative Computing (magazine)", // canonical
"Crikey",
"Cycle News",
"Cycle World",
"Cycling Weekly",
"C4ISRNET",
"C&E News", // same as
"D Magazine",
"Dance Magazine",
"Datamation",
"Dazed", // canonical
"Dazed & Confused (magazine)",
"Dazed (magazine)",
"DE magazine Deutschland", // canonical
"Deadline",
"Deadline Hollywood", // canonical
"Deadline.com", // same as Deadline Hollywood
"Decanter",
"Decibel (magazine)", // canonical
"Decibel Magazine",
"Dengeki PlayStation",
"Der Spiegel",
"Design Week",
"Deutschland (magazine)", // same as DE magazine Deutschland
"Develop",
"Diez Minutos",
"Digiday",
"Discover Magazine",
"Discover (magazine)", // canonical
"Dissent (American magazine)",
"DIY (magazine)",
"DJ Mag", // canonical
"DJ Magazine",
"Doctor Who Magazine",
"Don Balón",
"Down Beat",
"DownBeat", // canonical
"Dragon (magazine)",
"Dreamwatch",
"Drowned in Sound",
"Drum!",
"Dr. Dobb's Journal",
"Dwell (magazine)",
"EatingWell",
"Ebony",
"EBONY MAGAZINE",
"Ebony (magazine)",
"Ebony.com",
"Eclipse Magazine",
"Edge magazine",
"Edge (magazine)", // canonical
"Editor & Publisher",
"EE Times", // canonical
"El Cultural",
"El Grafico",
"El Gráfico", // canonical
"Electronic Engineering Times", // same as EE Times
"Electronic Gaming Monthly",
"Electronic Musician",
"Elle",
"Elle (magazine)",
"Emel (magazine)",
"Empire",
"Empire Online",
"Empire (film magazine)", // canonical
"Empire magazine",
"Empire (magazine)",
"Ensign (LDS magazine)",
"Entertainment Weekly",
"Entrepreneur",
"Entrepreneur (magazine)",
"ESPN the Magazine",
"ESPN The Magazine", // canonical
"Esquire",
"Esquire Magazine",
"Esquire (magazine)",
"Essence (magazine)", // canonical
"Essence.com",
"Evo Magazine",
"Evo (magazine)", // canonical
"eWeek",
"EW.com", // same as Entertainment Weekly
"Exclaim!",
"Exame",
"Executive Intelligence Review",
"Fact Magazine (United Kingdom)", // same as Fact (UK magazine)
"Fact (UK magazine)", // canonical
"Fader (magazine)", // same as The Fader
"Failure Magazine",
"Fangoria", // canonical
"Fangoria (magazine)",
"Fashion (magazine)",
"Fast Company", // canonical
"Fast Company (magazine)",
"Federal Times",
"Femme Actuelle",
"Fest Magazine",
"FHM", // canonical
"FHM Philippines",
"Field and Stream",
"Film Business Asia",
"Film Comment",
"Film Journal International",
"Film Score Monthly",
"Filmfare",
"Filmmaker Magazine",
"Filmmaker (magazine)", // canonical
"Filmink",
"Filter (magazine)",
"Fine Woodworking",
"Fire Engineering (magazine)",
"Fitness",
"Fitness (magazine)",
"Flak Magazine",
"Flare (magazine)",
"Flavorwire",
"Flux Magazine",
"Flux (magazine)", // canonical
"Flying",
"Flying Magazine",
"Flying (magazine)", // canonical
"FMQB",
"Focus (German magazine)", // canonical
"Food & Wine",
"Footwear News",
"Forbes", // canonical
"Forbes India",
"Forbes Magazine",
"Forbes (magazine)",
"Forbes (website)",
"forbes.com",
"Forbes.com",
"Foreign Affairs",
"Foreign Policy", // canonical
"Foreign Policy Magazine",
"Foreign Policy (magazine)",
"Fortean Times",
"Forth magazine",
"Fortune",
"Fortune Magazine",
"Fortune (magazine)", // canonical
"Fortune.com",
"FourFourTwo", // canonical
"FourFourTwo (Australia)",
"France Magazine",
"Frank Leslie's Illustrated Newspaper",
"Frieze Magazine", // same as frieze (magazine)
"frieze (magazine)", // canonical
"Frontline (magazine)",
"FT Magazine", // supplement to Financial Times newspaper
"Fusion (Kent State University)", // canonical
"Fusion Magazine (Kent State University)",
"Gaffa (magazine)",
"Gala (magazine)",
"Game Developer (magazine)",
"Game Informer",
"Game & Fish",
"GameFan",
"GamePro",
"GamesMaster (magazine)",
"GameStar",
"GamesTM",
"Gay City News",
"Gay Times",
"Gazeta Lwowska",
"Glamour",
"Glamour (magazine)", // canonical
"Global Politician",
"GlobeAsia Magazine",
"Goldmine (magazine)",
"Golf Digest",
"Golfweek",
"Good Housekeeping",
"Governing Magazine",
"Governing (magazine)", // canonical
"GQ", // canonical
"GQ Magazine",
"GQ (magazine)",
"Graham's Magazine",
"Guardian Weekly",
"Guitar Player",
"Guitar World",
"Guitarist (magazine)",
"Harp Magazine",
"Harp (magazine)", // canonical
"HARP (magazine)",
"Harpers Bazaar",
"Harper's Bazaar",
"Harper's Magazine",
"Haute Living",
"Harvard Business Review",
"Harvard Magazine",
"Heat (magazine)",
"Hello Magazine",
"Hello (magazine)",
"Hello! (magazine)", // canonical
"Helsingborgs Dagblad",
"Hemmings Motor News",
"Heti Világgazdaság",
"Heti Válasz",
"High Times",
"Hip Hop DX",
"HipHopDX", // canonical
"HipHopDX.com",
"History Today",
"Hit Parader",
"Hits Daily Double", // same as Hits (magazine)
"Hits (magazine)", // canonical
"HM (magazine)",
"Hogan Stand",
"HollywoodReporter.com", // same as The Hollywood Reporter
"Honolulu Magazine",
"Honolulu (magazine)", // canonical
"Hot Press", // canonical
"Hot Press (magazine)",
"Hotdog Magazine",
"Hotdog (magazine)", // canonical
"Houstonia (magazine)",
"Hudson Valley Magazine",
"Hudson Valley (magazine)", // canonical
"HUMO",
"Iceland Review",
"ICV2",
"IEEE Control Systems Magazine",
"IEEE Spectrum",
"If Magazine", // NOT same as If (magazine)
"If (magazine)", // NOT same as If Magazine
"IHS Jane's Defence Weekly", // same as Jane's Defence Weekly
"Impose Magazine",
"Impose (magazine)", // canonical
"In These Times",
"In Touch Weekly",
"Inc",
"Inc.",
"Inc. Magazine",
"Inc. (magazine)", // canonical
"Inc.com",
"India Today",
"IndustryWeek",
"Information Week", // same as InformationWeek
"InformationWeek", // canonical
"Infoworld", // same as InfoWorld
"InfoWorld", // canonical
"Inked (magazine)",
"Inside GNSS",
"Insight on the News",
"Instinct (magazine)",
"Institutional Investor",
"Institutional Investor (magazine)",
"InStyle",
"Intelligent Life",
"International Gymnast",
"International Gymnast Magazine", // canonical
"International Railway Journal",
"Interview",
"Interview Magazine",
"Interview (magazine)", // canonical
"Investment Week",
"Iron Man (magazine)", // canonical
"Ironman Magazine",
"ISTOÉ",
"Istoé", // canonical
"IZM",
"i-D",
"Jackie (magazine)",
"Jackson Free Press",
"Jacobin (magazine)",
"Jadaliyya",
"Jam!",
"Jane's Defence Weekly", // canonical
"Japanzine",
"Jazz Times",
"JazzTimes", // canonical
"Jeune Afrique",
"Jet",
"Jet (magazine)",
"Jewcy",
"Jewish Currents",
"Jewish World Review",
"Kerrang!",
"Keyboard Magazine",
"Keyboard (magazine)", // canonical
"Kicker (sports magazine)", // canonical
"kicker.de", // same as Kicker (sports magazine)
"Kill Screen",
"Kiplinger's Personal Finance",
"Kirkus Reviews",
"Knack (magazine)",
"Korrespondent",
"Laboratory News",
"Lapham's Quarterly",
"Latina (magazine)",
"Lavender (magazine)",
"Le nouvel observateur", // same as L'Obs
"Le Nouvel Observateur", // same as L'Obs
"Le Point",
"Le Vif/L'Express",
"Legends Magazine",
"Les Inrockuptibles",
"Library Journal",
"Life Magazine",
"Life (magazine)", // canonical
"Linux Journal",
"Linux Magazine",
"LiP magazine",
"Little White Lies (magazine)",
"Live Design",
"Living Blues",
"Local Government Chronicle",
"Locus Online",
"Locus (magazine)",
"London Review of Books",
"Look Japan",
"Los Angeles Review of Books",
"Los Angeles Magazine",
"Los Angeles (magazine)",
"Loud and Quiet",
"Loudwire",
"L'actualité",
"L'espresso", // canonical
"L'Espresso",
"L'Express",
"L'Obs", // canonical
"M Music & Musicians", // canonical
"Macleans",
"Macleans.ca",
"Macleans.CA",
"Maclean's",
"MacLean's",
"MacLife",
"MacWEEK",
"Macworld",
"Macworld Magazine",
"MacWorld Magazine",
"Madame Noire",
"MadameNoire", // canonical
"Make Magazine",
"Make (magazine)", // canonical
"Marie Claire",
"Marketing Week",
"Marketing Magazine", // same as Strategy (magazine)
"Marketing (magazine)", // same as Strategy (magazine)
"Maxim",
"MAXIM",
"Maxim UK",
"Maxim (magazine)", // canonical
"Maximum PC",
"Maximum Rock 'n Roll", // same as Maximumrocknroll
"Maximumrocknroll", // canonical
"MCV (magazine)",
"Mean Machines",
"Media Life", // canonical
"Media Life Magazine",
"Media Week", // same as Mediaweek (Australia)
"Mediaweek (Australia)",
"Melodic (magazine)",
"Melody Maker",
"Mental floss",
"Mental Floss", // canonical
"Men's Health",
"Men's Journal",
"Metal Forces",
"Metal Hammer", // canonical
"Metal Hammer!",
"Metal Storm (webzine)",
"Metro Magazine",
"Metro Weekly",
"Metropolis (free magazine)", // canonical
"Metropolis (Japanese magazine)",
"Military History Matters", // canonical
"MIT Technology Review",
"Mix Online", // same as Mix (magazine)
"Mix (magazine)", // canonical
"Mixmag", // not the same as Mix (magazine)
"Mobile Magazine",
"Modern Drummer",
"Modern Farmer (magazine)",
"Mojo (magazine)",
"Money",
"Money Magazine",
"Money (magazine)",
"MoneyWeek",
"Monthly Review",
"Mosaic (magazine)",
"Mother Jones",
"Mother Jones (magazine)", // canonical
"Motion Picture News",
"Motography",
"Motor Boats Monthly",
"Motor Trend",
"Motorcyclist magazine",
"Motorcyclist (magazine)", // canonical
"Motor Sport (magazine)",
"Moustique",
"Multichannel News",
"Muscle & Fitness",
"Muscular Development",
"Music Connection",
"Music Feeds",
"Music Week",
"Music & Media",
"Music & Musicians", // same as M Music & Musicians
"musicOMH", // canonical
"MusicOMH",
"MusicRow",
"MyM",
"Mystery Scene",
"Nacional (weekly)",
"Nash Country Weekly", // canonical
"National Geographic", // canonical
"National Geographic Magazine",
"National Geographic (magazine)",
"National Journal",
"National Review", // canonical
"National Review Online",
"Nation's Restaurant News",
"Natural History (magazine)",
"Nautilus (science magazine)",
"Neo (magazine)",
"Neo Music Community", // redirect to IZM
"New England Review",
"New English Review",
"New Humanist",
"New Media Rockstars",
"New Musical Express", // same as NME
"New Scientist",
"New Statesman",
"New York",
"New York magazine",
"New York Magazine",
"New York (magazine)", // canonical
"New Yorker Magazine", // same as The New Yorker
"New Yorker (magazine)",
"NewMediaRockstars", // canonical
"Newsfield",
"Newsletter for Birdwatchers",
"Newsweek", // canonical
"NewsWeek",
"Newsweek Pakistan",
"NewsWeek.com",
"newyorker.com", // same as The New Yorker
"Nieuwe Revu",
"Nintendo Gamer",
"NME", // canonical
"NME Magazine",
"NME.COM",
"No Depression (magazine)",
"Nonprofit Quarterly",
"NOW (UK magazine)",
"Now (1996–2019 magazine)", // canonical
"Nursery World",
"Nursing Times",
"Nylon (magazine)",
"n+1",
"Official Nintendo Magazine",
"Official U.S. PlayStation Magazine",
"Official Xbox Magazine",
"Official Xbox Magazine UK",
"Official Xbox Magazine (UK)",
"OK!",
"OK! Magazine",
"OnMilwaukee",
"Opera News",
"Opera (magazine)",
"Orion Magazine",
"Orion (magazine)", // canonical
"Out (magazine)",
"Outdoor Life",
"Outlook India",
"Outlook (Indian magazine)", // canonical
"Outlookindia", // same as Outlook (Indian magazine)
"Outside (magazine)", // canonical
"Outside Magazine",
"OutSmart", // canonical
"OutSmart magazine",
"Oyster Magazine",
"Oyster (magazine)", // canonical
"Ozone Magazine",
"Ozone (magazine)",
"O, The Oprah Magazine",
"Pacific Magazine",
"Pacific Standard",
"Panorama (Italian magazine)", // same as Panorama (magazine)
"Panorama (magazine)", // canonical
"Paper (magazine)",
"Parade (magazine)",
"Parents (magazine)",
"Paste",
"Paste magazine",
"Paste Magazine",
"Paste (magazine)", // canonical
"PC Gamer", // canonical
"PC Gamer UK",
"PCGamer",
"PCGamesN",
"PC Magazine", // canonical
"PC World", // canonical
"PC World (magazine)",
"PC Zone",
"Pcmag",
"PCMag.com", // same as PC Magazine
"PCWorld",
"PCWorld (magazine)",
"Penthouse Magazine",
"Penthouse (magazine)",
"People",
"People en Español",
"People Magazine",
"People (magazine)", // canonical
"People (American magazine)",
"People.com",
"Perfect Sound Forever (magazine)",
"Perlentaucher",
"Philadelphia Magazine",
"Philadelphia (magazine)", // canonical
"Philosophy Now",
"Play (UK magazine)",
"Play (US magazine)",
"Playback (magazine)",
"Playbill", // canonical
"Playbill Arts", // same as Playbill
"Playbill Vault", // same as Playbill
"Playboy",
"Playboy Magazine",
"Playgirl",
"PlayStation Official Magazine - UK",
"PlayStation Official Magazine – UK", // canonical (endash)
"Playthings Magazine",
"Playthings (magazine)", // canonical
"Pointe Magazine",
"Pointe (magazine)", // canonical
"Poker Player",
"Political Affairs Magazine",
"Political Affairs (magazine)", // canonical
"Politico Magazine", // specific publication
"Polityka",
"Pollstar",
"Pop Matters",
"Popmatters",
"PopMatters", // canonical
"Popular Communications",
"Popular Mechanics",
"Popular Science",
"Portland Monthly",
"Power Magazine", // canonical
"POWER Magazine",
"POZ (magazine)",
"Practical Fishkeeping",
"Premier Guitar",
"Premiere Magazine",
"Premiere (magazine)", // canonical
"Press Gazette",
"Princeton Alumni Weekly",
"Printers' Ink",
"Pro Football Weekly",
"Pro Wrestling Illustrated",
"PRWeek",
"PSM3",
"Psychiatric Times",
"Psychology Today",
"Publishers Weekly", // canonical
"Publisher's Weekly",
"Pure Nintendo Magazine",
"Q Magazine",
"Q (magazine)", // canonical
"Quadrant (magazine)",
"Queerty",
"Quill & Quire",
"QST",
"R&R (magazine)",
"Racer (magazine)",
"Radio Ink",
"Radio Times",
"Radio World",
"Radio & Records",
"Rail Magazine",
"Rail (magazine)", // canonical
"Railway Age",
"Railway Gazette",
"Railway Gazette International", // canonical
"Railway track and Structures",
"Railway Track and Structures",
"Railway Track & Structures", // canonical
"Rap-Up",
"Readers Digest",
"Reader's Digest", // canonical
"Reason magazine",
"Reason Magazine",
"Reason (magazine)", // canonical
"Record Collector",
"Record World",
"Reform Judaism (magazine)",
"Regulation (magazine)",
"Relix",
"Remix Magazine",
"Remix (magazine)", // canonical
"Renowned for Sound",
"Reporter Magazine (RIT)",
"Resident Advisor",
"Resumé",
"Resumé (magazine)", // canonical
"Retro Gamer",
"Revolver (magazine)",
"Rewind Magazine",
"Road & Track",
"Robb Report",
"Rock Hard (magazine)",
"Rock Sound", // canonical
"Rockin' On",
"Rockin' On Japan",
"Rockin'on",
"Rockin'on Japan",
"Rockin'On Japan", // canonical
"RockSound", // same as Rock Sound
"Rolling stone",
"Rolling Stone", // canonical
"Rolling Stone Australia",
"Rolling Stone magazine",
"Rolling Stone Magazine",
"Rolling Stone (magazine)",
"RollingStone",
"Rollingstone.com",
"RollingStone.com",
"Romantic Times",
"RPM",
"RPM (magazine)",
"Rugby World",
"Runner's World",
"Ryerson Review of Journalism",
"R.M. Williams Outback",
"Sabotage Times",
"Sacramento Magazine",
"Sacramento (magazine)", // canonical
"Sai Kung & Clearwater Bay Magazine",
"San Diego Magazine",
"Saudi Aramco World",
"Sault Star",
"Sault This Week",
"Saveur",
"Scarlet Street (magazine)",
"School Library Journal",
"Schweizer Illustrierte",
"Science News",
"Science & Diplomacy",
"Scientific American",
"Screen Daily", // same as Screen International
"Screen India", // same as Screen (magazine)
"Screen International",
"ScreenDaily", // same as Screen International
"Screen (magazine)",
"See Magazine", // canonical
"Seed (magazine)",
"SEEMagazine.com",
"Select (magazine)",
"Semana",
"Senses of Cinema",
"Sentimentalist Magazine",
"Seventeen (American magazine)",
"SFX (magazine)", // canonical
"SFX Magazine",
"She Kicks",
"SHOOT online",
"Shoot (advertising magazine)", // canonical
"ShortList",
"Sight and Sound",
"Sight & Sound", // canonical
"Sinclair User",
"Sing Out!",
"Sixth Tone",
"Skeptical Inquirer",
"Slam (magazine)", // canonical
"Slant magazine",
"Slant Magazine", // canonical
"Slant (magazine)",
"Slate",
"Slate magazine",
"Slate Magazine",
"Slate (magazine)", // canonical
"SLAM Magazine",
"Smart Computing",
"SmartComputing", // canonical
"Smash Hits",
"Smithsonian",
"Smithsonian Magazine",
"Smithsonian (magazine)", // canonical
"Soap Opera Digest",
"Soap Opera Update",
"Soap Opera Weekly",
"Soaps in Depth",
"Soaps In Depth", // canonical
"Soccer America",
"Socialist Standard",
"Softalk",
"Songlines (magazine)",
"Sonic Seducer",
"Soul Shine",
"Soul Shine Magazine", // canonical
"Sound on Sound",
"Sound & Vision (magazine)",
"Sounds (magazine)",
"Southeast Asia Building (magazine)",
"Southern Living",
"SpaceNews",
"Spacing (magazine)",
"Spectrum Culture",
"Spectrum Magazine",
"Spectrum (magazine)", // canonical
"Speed Sport",
"Spin",
"Spin (magazine)", // canonical
"Spin Magazine",
"SPIN Magazine",
"Sport Magazine", // same as Sport (UK magazine)
"Sport (UK magazine)", // canonical
"Sporting News",
"Sports Illustrated",
"Sports Illustrated Kids",
"Sports Weekly", // same as USA Today Sports Weekly
"Sportstar",
"SQL Server Magazine",
"Star Magazine", // same as Star (magazine)
"Star (magazine)",
"Stardust (magazine)",
"Starlog",
"State Magazine",
"State (magazine)",
"Stereophile",
"Stern (magazine)",
"Strategy (magazine)", // canonical
"Stylus Magazine",
"St. Nicholas Magazine", // canonical
"St. Nicholas (magazine)",
"Substream Magazine",
"Suburban Voice",
"Super Luchas", // same as Súper Luchas
"Super Play",
"Superbike (magazine)",
"SuperBike (magazine)", // canonical
"SuperLuchas", // same as Súper Luchas
"Supply Management (magazine)",
"Swimming World", // canonical
"Swimming World Magazine",
"Súper Luchas", // canonical
"Tablet Magazine",
"Tablet (magazine)", // canonical
"Taki's Magazine",
"TeamRock.com", // same as Classic Rock (magazine)
"Technology Review", // same as MIT Technology Review
"TechRepublic", // canonical
"TechRepublic.com",
"Teen Vogue",
"Tehelka",
"Teknisk Ukeblad",
"Telquel",
"TelQuel", // canonical
"Terrorizer (magazine)",
"TES (magazine)", // canonical
"Texas Monthly",
"Télérama",
"Thalia (magazine)",
"that's Beijing",
"That's Beijing", // canonical
"The Advocate (LGBT magazine)",
"The American",
"The American Conservative",
"The American Interest",
"The American Mercury",
"The American Prospect",
"The American Spectator",
"The Atlantic", // canonical
"The Atlantic Monthly",
"The Australian Women's Weekly",
"The Baffler",
"The Banker",
"The Believer (magazine)",
"The Big Takeover",
"The Blood-Horse", // canonical
"The Blood-Horse magazine",
"The Blue and White",
"The Bookseller",
"The Brooklyn Rail",
"The Bulletin (Australian periodical)",
"The Bulletin (Brussels weekly)",
"The Caterer",
"The Chicago Reporter",
"The Chronicle of Philanthropy",
"The Chronicle of the Horse",
"The Comet",
"The Comics Journal",
"The Contemporary Review",
"The Crisis",
"The Deli", // canonical
"The Deli Magazine",
"The Diplomat",
"The Dissolve",
"The DO",
"The Escapist (magazine)",
"The Fader", // canonical
"The FADER",
"The First Post",
"The Fly (magazine)",
"The Hockey News",
"The Hollywood Reporter", // canonical
"The Horn Book",
"The Horn Book Magazine", // canonical
"The Illustrated Weekly of India",
"The Improper Bostonian",
"The Irrawaddy",
"The Jazz Review",
"The Journal of Commerce",
"The Lawyer",
"The Line of Best Fit",
"The List",
"The List (magazine)", // canonical
"The Magazine of Fantasy & Science Fiction",
"The Middle East in London", // canonical
"The Middle East (magazine)", // same as The Middle East in London
"The Militant",
"The Monthly",
"The Moving Arts Film Journal",
"The Moving Picture World",
"The Music",
"The Music (magazine)", // canonical
"The Nation",
"The National Interest",
"The National Law Journal",
"The National Law Review",
"The New American",
"The New Leader",
"The New Republic", // canonical
"The New York Review of Books", // the 'magazine'
"The New York Times Book Review", // the newspaper 'magazine supplement'
"The New York Times Magazine",
"The New Yorker", // canonical
"The Paris Review",
"The Phoenix (magazine)",
"The Point Magazine",
"The Point (magazine)", // canonical
"The Quietus",
"The Real Deal (magazine)",
"The Ring (magazine)",
"The Root (magazine)",
"The Rotarian",
"The Saturday Evening Post",
"The Skinny (magazine)",
"The Source", // canonical
"The Source (magazine)",
"THE SOURCE MAGAZINE",
"The Spectator",
"The Strad",
"The Texas Observer",
"The Times Higher Educational Supplement", // same as Times Higher Education
"The Times Literary Supplement",
"The Tyee",
"The Walrus", // canonical
"The Walrus (magazine)",
"The Washington Spectator",
"The Washingtonian", // same as Washingtonian (magazine) (via dab)
"The Week", // canonical
"The Week (magazine)",
"The Weekly Standard",
"The Wire magazine",
"The Wire (magazine)", // canonical
"The Writer",
"Theatre Pasta",
"Theatre Record",
"This Week In Palestine",
"Tiger Beat",
"Time",
"TIME",
"Time Asia",
"TIME europe",
"TIME Europe",
"Time for Kids",
"Time magazine",
"Time Magazine",
"TIME magazine",
"TIME Magazine",
"Time Out",
"Time Out Chicago", // same as Time Out (magazine)
"Time Out New York", // same as Time Out (magazine)
"Time Out Singapore", // same as Time Out (magazine)
"Time Out (magazine)", // canonical
"Time (magazine)", // canonical
"Time (Magazine)",
"TIME (magazine)",
"Times Educational Supplement", // same as TES (magazine)
"Times Higher Education", // canonical
"TIME.com", // same as Time (magazine)
"Time.com",
"Top Gear (magazine)",
"Toronto Life",
"Total Film",
"Total Guitar",
"Touchstone Magazine",
"Touchstone (magazine)", // canonical
"Track and Field News",
"Track & Field News", // canonical
"Trains Magazine",
"Trains (magazine)", // canonical
"Transworld Skateboarding",
"Travel + Leisure",
"Tribune Magazine",
"Tribute (magazine)", // canonical
"Trouser Press",
"True Crime Zine",
"True West Magazine",
"TV Guide", // canonical
"TV Guide Canada",
"TV Guide (Canada)", // canonical
"TV Guide (magazine)",
"TV Technology",
"TV Times", // same as TVTimes
"TV Week", // not the same as TVWeek
"TVTimes", // canonical
"TVWeek", // not the same as TV Week
"T: The New York Times Style Magazine",
"UN Chronicle",
"Uncut",
"Uncut Magazine",
"Uncut (magazine)", // canonical
"Under the Radar (magazine)",
"URB (magazine)",
"US Magazine", // same as Us Weekly
"Us Magazine", // same as Us Weekly
"Us Weekly", // canonical
"US Weekly",
"US News",
"US News and World Report",
"Usmagazine.com", // same as Us Weekly
"U.S. News and World Report",
"US News & World Report",
"USA Today Sports Weekly", // canonical
"U.S. News & World Report", // canonical
"V (American magazine)",
"Vancouver Magazine",
"Vanity Fair",
"Vanity Fair (magazine)", // canonical
"Variety",
"Variety Magazine",
"Variety (magazine)", // canonical
"Variety (Magazine)",
"Variety (publication)",
"variety.com",
"Variety.com",
"Vegetarian Times",
"VegNews",
"Veja (magazine)",
"VeloNews",
"Vibe (magazine)",
"Vice",
"vice (magazine)",
"Vice (magazine)", // canonical
"Vice.com",
"Virginia Quarterly Review", // canonical
"Virginia Quarterly Review: A National Journal of Literature & Discussion",
"Visible Language",
"Visão",
"Vogue",
"Vogue Magazine",
"Vogue UK", // same as British Vogue
"Vogue (British magazine)", // same as British Vogue
"Vogue (magazine)", // canonical
"Voici",
"Vox (magazine)",
"Vrij Nederland",
"W Magazine",
"W (magazine)", // canonical
"Washington Examiner", // canonical
"Washington Monthly",
"Washingtonian (magazine)", // canonical
"Wave Magazine",
"Wave (magazine)", // canonical
"Wax Poetics",
"Westchester Magazine",
"Western Standard",
"Westword",
"What Car?",
"What Hi-Fi?",
"Wild River Review",
"Wired",
"Wired UK",
"Wired magazine",
"Wired Magazine",
"Wired News",
"Wired (magazine)", // canonical
"Wired (website)",
"Wizard (magazine)",
"Woman's Day",
"Women's Wear Daily",
"Worcester Magazine",
"World Press Review",
"Worship Leader",
"Worship Leader (magazine)", // canonical
"Wrestling Observer",
"Wrestling Observer Newsletter", // canonical
"Writer magazine", // same as The Writer
"XXL",
"XXL Magazine",
"XXL (magazine)", // canonical
"X-One",
"Yachting Monthly",
"Yankee Magazine",
"Yankee (magazine)", // canonical
"Época (Brazilian magazine)",
":de:Laut.de", // at de.wiki
":en:U.S. News & World Report", // same as U.S. News & World Report
"+972 Magazine",
};
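// map every magazine title above (canonical names and their variants) to the periodical type "magazine"
foreach (string magazine in magazines)
periodical_map.Add (magazine, "magazine");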
//----------< N E W S P A P E R S >----------
string[] newspapers = {
"20 Minuten",
"20 minutes (France)", // canonical
"20 minutos", // canonical
"20 Minutos",
"20minutes.fr", // same as 20 minutes (France)
"24 Chasa",
"24 heures (Switzerland)",
"A Bola",
"A Noite",
"A Semana",
"Abante",
"ABC (newspaper)",
"Abilene Reporter News",
"Abilene Reporter-News", // canonical
"AbqJournal.com", // same as Albuquerque Journal
"Accra Daily Mail",
"Accrington Observer",
"Addison County Independent",
"Adelaide Now", // same as The Advertiser (Adelaide)
"AdelaideNow",
"Adevărul",
"Ad-Diyar",
"Air Force Times",
"AJC.com", // same as The Atlanta Journal-Constitution
"ajc.com",
"Aftenposten",
"Aftonbladet",
"Akron Beacon Journal", // canonical
"Akron Beacon-Journal",
"Akşam",
"Al Akhbar (Lebanon)",
"Al Anba",
"Al Yaum (newspaper)",
"Alaska Dispatch News",
"Alaska Journal of Commerce",
"Albany Business Review",
"Albany Democrat-Herald",
"Albany Times Union", // same as Times Union (Albany)
"Albuquerque Journal", // canonical
"Alexandria Times",
"Algemeen Dagblad", // canonical
"Algemeiner Journal",
"Alton Evening Telegraph", // same as The Telegraph (Alton, Illinois)
"Altoona Tribune",
"Al-Ahram",
"Al-Ahram Weekly", // not the same as Al-Ahram
"Al-Binaa",
"Al-Masdar News",
"Al-Masry Al-Youm",
"Al-Mustaqbal",
"Al-Mustaqbal (newspaper)", // canonical
"AM New York",
"Amandala",
"Amar Ujala",
"Amarillo Globe-News",
"Ambergris Today",
"American Free Press",
"American Statesman",
"American-Statesman", // same as Austin American-Statesman
"Americus Times-Recorder",
"Ames Daily Tribune",
"Ames Tribune", // canonical
"Amigoe",
"An Phoblacht",
"Anchorage Daily News",
"Anchorage Press",
"Anderson Herald", // same as The Herald Bulletin
"Andersonstown News",
"Antilliaans Dagblad",
"Appeal-Democrat",
"Apple Daily",
"Apple Daily (Taiwan)", // canonical
"Arab News", // canonical
"Arab Times", // canonical
"Arab Times Online",
"ArabNews", // same as Arab News
"arabnews.com", // same as Arab News
"Argumenty i Fakty", // canonical
"Argus-Press",
"Arizona Daily Star",
"Arizona Daily Sun",
"Arizona Daily Wildcat",
"Arkansas Catholic",
"Arkansas Democrat-Gazette", // canonical
"Arkansas Gazette",
"Arkansas Online", // same as Arkansas Democrat-Gazette
"Arkansas Times",
"Armenian Mirror-Spectator",
"Armenian Weekly",
"Army Times",
"Asahi Shimbun",
"Asbarez",
"Asbury Park Press",
"Asbury-Park Press",
"Asharq Al-Awsat",
"Ashland Daily Tidings",
"Asian Tribune",
"Asian Voice",
"AsianWeek",
"Athens Banner Herald",
"Athens Banner-Herald", // canonical
"Athens Daily Review", // canonical
"Athens Review", // same as Athens Daily Review
"Atlanta Business Chronicle",
"Atlanta Jewish Times",
"Atlanta Journal Constitution", // same as The Atlanta Journal-Constitution
"Aujourd'hui Le Maroc",
"Austin American-Statesman", // canonical
"Austin American Statesman",
"Austin Business Journal",
"Automotive News",
"Aydınlık",
"azcentral.com", // same as The Arizona Republic
"Aztag (daily)",
"Århus Stiftstidende",
"Ballymoney and Moyle Times",
"Baltimore Afro-American",
"Baltimore City Paper",
"Baltimore Chronicle",
"Bangalore Mirror",
"Bangkok Post",
"Bangla Mirror",
"Bangladesh Pratidin",
"Bangor Daily News",
"Barren County Progress",
"Barrie Examiner",
"Barron's (newspaper)", // canonical
"Barron's",
"Basellandschaftliche Zeitung",
"Basingstoke Gazette",
"Bath Chronicle",
"Baton Rouge Morning Advocate", // same as The Advocate (Louisiana)
"Bay Area Reporter",
"Bay of Plenty Times",
"Bedfordshire on Sunday",
"Beeld",
"Beijing Daily",
"Beijing Evening News",
"Belfast Telegraph",
"Bellaire Examiner",
"Belleville News Democrat",
"Belleville News-Democrat", // canonical
"Bend Bulletin", // same as The Bulletin (Bend)
"Bendigo Advertiser",
"Bennington Banner",
"Berita Harian",
"Berkeley Daily Planet",
"Berliner Morgenpost",
"Berliner Zeitung",
"Berlingske", // canonical
"Berlingske Tidende",
"Bermuda Sun",
"Berner Zeitung",
"Bild", // canonical
"Bild-Zeitung", // same as Bild
"Birmingham Business Journal",
"Birmingham Mail",
"Birmingham Post", // canonical
"Bisbee Observer",
"Black Country Bugle",
"Blackpool Gazette",
"Blic",
"Blick",
"BN DeStem",
"Boca Raton News",
"Bon Dia", // same as Bondia (newspaper)
"Bondia (newspaper)", // canonical
"Bossier Press-Tribune",
"Boston Business Journal",
"Boston Evening Transcript",
"Boston Herald",
"Boston Phoenix", // sort of same as The Phoenix (newspaper)
"Boston Standard",
"Bota Sot",
"Botswana Guardian",
"Boulder Daily Camera", // same as Daily Camera
"Bournemouth Daily Echo",
"Brabants Dagblad", // canonical
"Brampton Guardian",
"Brattleboro Reformer",
"Brazosport Facts",
"Brenham Banner-Press", // canonical
"Bridgeport Post", // same as Connecticut Post
"Brisbane Times",
"Bristol Evening Post", // same as Bristol Post
"Bristol Herald Courier",
"Bristol Post", // canonical
"Bronx Times",
"Bronx Times-Reporter", // canonical
"Brooklyn Eagle", // canonical
"Bryan-College Station Eagle",
"BT (tabloid)", // same as B.T. (tabloid)
"Bucks County Courier Times",
"Bucks Free Press",
"Budstikka",
"Buenos Aires Herald",
"Buffalo Business First",
"Burnaby Now",
"Burnley Express",
"Business Day (Nigeria)",
"Business First of Louisville",
"Business Line", // canonical; same as The Hindu Business Line
"Business Recorder",
"Business Standard", // canonical
"Business Times (Singapore)",
"BusinessMirror",
"BusinessWorld",
"business-standard.com", // same as Business Standard
"Buxton Advertiser",
"BVI Beacon",
"B.T. (tabloid)", // canonical
"Calgary Herald",
"Cambridge Evening News",
"Cambridge News", // canonical
"Camden New Journal",
"Canada Gazette",
"Canadian Jewish News",
"Canarias7",
"Canton Repository", // same as The Repository
"Cape Argus",
"Cape Breton Post",
"Cape Cod Times",
"Cape May County Herald",
"Cape Times",
"Capital Press",
"Capital (newspaper)",
"Carlow Nationalist",
"Casa Grande Dispatch",
"Casper Star-Tribune",
"Catholic New York",
"Catholic Sentinel",
"Catholic Standard",
"Cedar Rapids Gazette", // same as The Gazette (Cedar Rapids)
"Central Michigan Life",
"Centre Daily Times",
"Centretown News",
"Champaign-Urbana Courier", // canonical
"Chapel Hill News",
"Charleston City Paper",
"Charleston Daily Mail",
"Charlotte Business Journal",
"CharlotteObserver.com", // same as The Charlotte Observer
"Charlottetown Guardian", // same as The Guardian (Charlottetown)
"Chatham This Week",
"Chattanooga Times Free Press", // canonical
"Cherwell (newspaper)",
"Chester Chronicle",
"Chicago Business Journal",
"Chicago Inter Ocean", // canonical
"Chicago Reader",
"Chicago Sun Times",
"Chicago Sun-Times", // canonical
"Chicago Tribune", // canonical
"chicagotribune.com", // same as Chicago Tribune
"Chillicothe Gazette",
"Chilton Times-Journal",
"China Daily",
"Chosun.com", // same as The Chosun Ilbo
"Chronicle Live", // same as Evening Chronicle
"Chunichi Shimbun",
"CiN Weekly",
"Cincinnati Business Courier",
"Cincinnati CityBeat",
"cincinnati.com",
"Cincinnati.com", // same as The Cincinnati Enquirer
"City Pages",
"Clarín", // redirect to dab Clarin
"Clarín (Argentine newspaper)",
"Clarksdale Press Register",
"Cleveland Jewish News",
"Cleveland Scene",
"Click!",
"Clinton Herald",
"Clovis News Journal",
"Colorado Daily",
"Colorado Springs Independent",
"Columbia Daily Spectator", // canonical
"Columbia Missourian",
"Columbia Spectator", // same as Columbia Daily Spectator
"Columbus Business First",
"Columbus Ledger-Enquirer", // same as Ledger-Enquirer
"Commercial-News",
"Concord Monitor",
"Connacht Tribune",
"Connecticut Jewish Ledger", // same as Jewish Ledger
"Connecticut Post", // canonical; same as CT Post
"Copenhagen Post",
"Cork Independent (newspaper)", // canonical
"Cornell Chronicle",
"Corpus Christi Caller-Times",
"Correio da manha", // same as Correio da Manhã
"Correio da Manhã", // canonical
"Correio da Manhã (Brazil)", // not same as Correio da Manhã
"Correio Popular",
"Corriere della Sera",
"Corvallis Gazette-Times",
"Cotidianul",
"Cottage Grove Sentinel",
"courant.com", // same as Hartford Courant
"Courier News (New Jersey)",
"Courier Times",
"Courier-Post",
"couriermail.com.au", // same as The Courier-Mail
"Coventry Evening Telegraph", // same as Coventry Telegraph
"Coventry Telegraph", // canonical
"Craven Herald & Pioneer",
"Creston News Advertiser",
"Crewe Chronicle",
"CT Post",
"Culpeper Star-Exponent",
"Cumberland Times-News",
"Cyprus Mail",
"D N A - Daily News And Analysis", // same as Daily News and Analysis
"Dagbladet",
"Dagbladet Arbejderen",
"Dagens Industri",
"Dagens Nyheter",
"Dagsavisen",
"Daily American",
"Daily Beacon",
"Daily Bhaskar", // same as Dainik Bhaskar
"Daily Breeze",
"Daily Bruin",
"Daily Camera", // canonical
"Daily Collegian",
"Daily Commercial",
"Daily Democrat",
"Daily Dispatch",
"Daily Echo",
"Daily Emerald", // canonical
"Daily Express", // canonical (UK)
"Daily Express (Malaysia)", // canonical
"Daily Express (Sabah)", // same as Daily Express (Malaysia)
"Daily Freeman", // canonical
"Daily Gazette",
"Daily Hampshire Gazette",
"Daily Hankook", // same as Hankook Ilbo
"Daily Herald (Arlington Heights)",
"Daily Herald (Arlington Heights, Illinois)", // canonical
"Daily Herald (Arlington Heights, Illinois newspaper)",
"Daily Herald (Utah)",
"Daily Hive",
"Daily Independent", // same as The Daily Independent (Lagos newspaper)
"Daily Intelligencer",
"Daily Journal", // dab
"Daily Local News",
"Daily Mail", // canonical
"Daily Mail Australia", // same as MailOnline
"Daily Mail UK",
"Daily Manab Zamin", // same as Manab Zamin
"Daily Maverick",
"Daily Mercury",
"Daily Mirror", // canonical
"Daily Monitor",
"Daily Nation",
"Daily News", // many ...
"Daily News and Analysis", // canonical
"Daily News Egypt",
"Daily News of Los Angeles", // same as Los Angeles Daily News
"Daily News (Harare)",
"Daily News (New York)",
"Daily News (Sri Lanka)",
"Daily News (Tanzania)",
"Daily News & Analysis", // same as Daily News and Analysis
"Daily Nexus",
"Daily NK",
"Daily Non Pareil", // same as The Daily Nonpareil
"Daily Nonpareil",
"Daily Pakistan",
"Daily Pilot",
"Daily Post (London newspaper)",
"Daily Post (Nigeria)",
"Daily Post (Vanuatu)", // same as Vanuatu Daily Post
"Daily Press (Virginia)",
"Daily Racing Form",
"Daily Record", // sia (set index article)
"Daily Record (Morristown)",
"Daily Record (Scotland)",
"Daily Record (Washington)", // canonical
"Daily Republic",
"Daily Sabah", // not same as Sabah
"Daily Southtown",
"Daily Star (British newspaper)", // same as Daily Star (United Kingdom)
"Daily Star (United Kingdom)", // canonical
"Daily Sun",
"Daily Times", // dab
"Daily Times (Pakistan)", // canonical
"Daily Tribune (Philippines)", // canonical
"Daily Trojan",
"Daily Vanguard",
"DailyMail",
"dailymail.co.uk", // same as Daily Mail
"Dainik Bhaskar", // canonical
"Dainik Jagran",
"Dallas Business Journal",
"Dallas News", // same as The Dallas Morning News
"Dallas Observer",
"Dallas Voice",
"Danas (newspaper)",
"Darlington & Stockton Times",
"Dawn (newspaper)",
"dawn.com",
"Dayton Business Journal",
"Dayton Daily News",
"Daytona Beach Morning Journal", // same as The Daytona Beach News-Journal
"De Gelderlander",
"De Standaard",
"De Stentor",
"De Telegraaf", // canonical
"De Tijd (Netherlands)",
"de Volkskrant",
"De Volkskrant",
"Deccan Chronicle",
"Deccan Herald",
"Del Rio News-Herald",
"Delaware News-Journal", // same as The News Journal
"Democrat and Chronicle", // canonical; same as Rochester Democrat & Chronicle
"Democrat & Chronicle",
"Denton Record-Chronicle",
"Denver Business Journal",
"Der Bund",
"Der Funke",
"Der Landbote",
"Der Standard",
"Der Tagesspiegel",
"Derby Evening Telegraph", // same as Derby Telegraph
"Derby Telegraph", // canonical
"Derry Journal",
"Deseret Morning News", // same as Deseret News
"Deseret News", // canonical
"deseretnews.com",
"Desi Xpress",
"Detroit Free Press", // canonical
"Detroit Metro Times", // same as Metro Times
"Dhaka Tribune",
"Diario AS",
"Diario Co Latino",
"Diario de Cadiz",
"Diario de Cádiz", // canonical
"Diario de Centro América",
"Diario Marca", // same as Marca
"Diario Popular",
"Diario (Aruba)",
"Diário de Notícias",
"die Tageszeitung",
"Die Tageszeitung", // canonical
"Die Welt",
"Die Zeit", // canonical
"Digby Courier",
"Dinamalar",
"DNA India", // same as Daily News and Analysis
"DNA (newspaper)", // same as Daily News and Analysis
"dnaindia.com", // same as Daily News and Analysis
"DNAinfo", // canonical
"DNAinfo.com",
"Dneven Trud", // same as Trud (Bulgarian newspaper)
"Dodge City Daily Globe",
"Dominican Today",
"Doncaster Free Press",
"Donegal Democrat",
"Donegal News",
"Dorset Echo",
"Dover Post",
"Drammens Tidende",
"Dudley News",
"Duluth News Tribune",
"Dundalk Democrat",
"Dunfermline Press",
"Durham Herald-Sun", // same as The Herald-Sun (Durham, North Carolina)
"D'Lëtzebuerger Land",
"Eagle Tribune", // same as The Eagle-Tribune
"East Bay Business Times",
"East Bay Express",
"East Bay Times",
"East Valley Tribune",
"Eastern Daily Press",
"Eau Claire Leader-Telegram", // canonical
"Ebela",
"economist.com", // same as The Economist
"Edinburgh Evening News", // canonical
"Edinburgh News", // same as Edinburgh Evening News
"Edmonton Journal",
"Edmonton Sun",
"EDP24", // same as Eastern Daily Press
"Education Week",
"Edwardsville Intelligencer",
"Effingham Daily News",
"Ei Samay", // same as Ei Samay Sangbadpatra
"Ei Samay Sangbadpatra", // canonical
"Ekstra Bladet",
"El Comercio", // dab
"El Comercio Perú",
"El Comercio (Peru)", // canonical
"El Confidencial",
"El Día (Chile)",
"El Día (La Plata)",
"El Diario de Hoy",
"El Economista",
"El Español",
"El Espectador",
"El Heraldo", // dab
"El Informador (Mexico)",
"El Intransigente",
"El Mercurio",
"El Moudjahid",
"El Mundo", // there are several of these
"El Mundo Deportivo", // same as Mundo Deportivo
"El Mundo (Spain)",
"El Norte de Castilla",
"El Nuevo Diario",
"El Nuevo Día",
"El Observador (Uruguay)",
"El Pais", // same as El País
"El País", // canonical
"El País (Cali)",
"El País (Uruguay)",
"El Paso Times",
"El Periódico", // dab
"El Peruano",
"El Porvenir",
"El Porvenir (newspaper)", // canonical
"El Siglo de Torreón",
"El Tiempo", // dab page
"El Tiempo (Colombia)",
"El Universal", // several
"El Universal (Caracas)",
"El Universal (Mexico)", // same as El Universal (Mexico City)
"El Universal (Mexico City)", // canonical
"El Universo",
"El Vocero",
"El Watan",
"eldiario.es",
"Eleftherotypia",
"Ellensburg Daily Record", // same as Daily Record (Washington)
"Emirates Business 24/7",
"Emporia Gazette",
"Erie Reader",
"Erie Times-News",
"Essex Chronicle",
"Estadão", // same as O Estado de S. Paulo
"Eugene Register-Guard", // same as The Register-Guard
"Eugene Weekly",
"Evangelicals Now",
"Evenimentul Zilei",
"Evening Chronicle", // canonical
"Evening Courier", // same as Champaign-Urbana Courier
"Evening Echo", // same as The Echo (Cork newspaper)
"Evening Gazette (Teesside)", // same as Teesside Gazette
"Evening Herald", // same as The Herald (Ireland)
"Evening Independent",
"Evening Press",
"Evening Standard",
"Evening Star (Ipswich)", // same as Ipswich Star
"Evening Sun", // dab
"Evening Telegraph", // dab
"Evening Telegraph (Dundee)",
"Evening Times",
"Evansville Courier & Press",
"Evrensel",
"Excélsior",
"Expansión (Spain)",
"Expansión (Spanish newspaper)", // canonical
"Express and Star", // same as Express & Star
"Express Buzz", // same as The New Indian Express
"Express India", // same as The Indian Express
"Express & Star", // canonical
"Expressen",
"Expresso (newspaper)", // canonical
"Expresso (Portuguese newspaper)", // same as Expresso (newspaper)
"Eye Weekly",
"Fairbanks Daily News-Miner",
"Faro de Vigo",
"Fast Forward Weekly", // canonical
"FFWD Weekly", // same as Fast Forward Weekly
"Fiji Times",
"Financial Express", // dab
"Financial News & Daily Record", // same as Jacksonville Daily Record
"Financial Post",
"Financial Times",
"Financial Tribune",
"Florida Catholic",
"Florida Keys Keynoter",
"Florida Times-Union",
"Florida Today",
"Focus Online", // same as Focus (German magazine)
"Folha de S. Paulo",
"Folha de S.Paulo", // canonical
"Fort Collins Coloradoan",
"Fort Lauderdale Sun Sentinel", // same as Sun-Sentinel
"Fort Saskatchewan Record",
"Fort Wayne Journal Gazette", // same as The Journal Gazette
"Fort Worth Star-Telegram", // canonical
"Fort Worth Weekly",
"Foster's Daily Democrat",
"France-Guyane",
"Frankfurter Allgemeine",
"Frankfurter Allgemeine Zeitung", // canonical
"Frankfurter Rundschau",
"Fraser Coast Chronicle",
"Frederick News-Post",
"Free Press Houston",
"Friesch Dagblad",
"Fyens Stiftstidende", // canonical
"fyens.dk", // same as Fyens Stiftstidende
"Gainesville Times", // dab
"Galveston County Daily News", // same as The Daily News (Texas)
"Galveston Daily News", // same as The Daily News (Texas)
"Galway Advertiser",
"Galway Independent", // same as Cork Independent (newspaper)
"Ganashakti",
"Gazet van Antwerpen",
"Gazeta Pomorska",
"Gazeta Stoleczna", // same as Gazeta Wyborcza
"Gazeta Stołeczna", // same as Gazeta Wyborcza
"Gazeta Wyborcza",
"Gazette and Herald",
"Gazette Times", // same as Corvallis Gazette-Times
"Gazette & Observer", // canonical
"Gazette-Times",
"Gazzetta di Parma",
"Gândul",
"Geelong Advertiser",
"Georgia Today",
"Ghanaian Times",
"Giornale di Brescia",
"Glasgow Daily Times",
"Glasgow Herald", // same as The Herald (Glasgow)
"global times",
"Global Times", // canonical
"Globe Gazette",
"Globes",
"Gloucester Citizen",
"Gloucester County Times",
"Gloucester Daily Times", // canonical
"Gloucester Times", // same as Gloucester Daily Times
"Gloucestershire Echo",
"Gold Coast Bulletin",
"Goulburn Evening Penny Post", // canonical
"Goulburn Post", // same as Goulburn Evening Penny Post
"Go-Set",
"Grand Forks Herald",
"Green Bay Press-Gazette",
"Green Left Weekly",
"Grimsby Telegraph",
"Guangzhou Daily",
"Guardian Australia", // canonical
"Guardian (newspaper)", // same as The Guardian
"Guelph Mercury",
"Gulf Daily News",
"Gulf News",
"Gulf Times",
"Gulf Today",
"Gulfnews", // same as Gulf News
"gulfnews.com",
"Gulfnews.com",
"Guyana Chronicle",
"Gwinnett Daily Post",
"Göteborgs-Posten",
"Haaretz", // canonical
"Habertürk",
"Hainan Daily", // canonical
"Halifax Chronicle Herald", // same as The Chronicle Herald
"Hamar Arbeiderblad",
"Hamburger Abendblatt",
"Hammond Times", // same as The Times of Northwest Indiana
"Handelsblatt",
"Hankook Ilbo", // canonical
"Harrisburg Patriot-News", // same as The Patriot-News
"Hartford Courant", // canonical
"Hartlepool Mail",
"Harvard Law Record",
"Hattiesburg American",
"Haveeru Daily",
"Hawaii Tribune-Herald",
"Hays Daily News",
"Ha'aretz", // same as Haaretz
"Helsingin Sanomat",
"Helsinki Times",
"Henderson Gleaner",
"Herald Dispatch", // same as The Herald-Dispatch
"Herald of Randolph", // canonical
"Herald Scotland", // same as The Herald (Glasgow)
"Herald Sun", // canonical
"Herald & Review",
"Heraldo de Aragón",
"Herald-Citizen",
"Hereford Times",
"Het Belang van Limburg",
"Het Laatste Nieuws",
"Het Nieuwsblad",
"Het Parool",
"Hidrocálido",
"Hindustan Times",
"HiNews", // same as Hainan Daily
"hln.be", // same as Het Laatste Nieuws
"Hofstra Chronicle",
"Holland Evening Sentinel", // same as The Holland Sentinel
"Home News Tribune",
"Homer News",
"Honolulu Star-Advertiser",
"Honolulu Star-Bulletin",
"Honolulu Weekly",
"Hospodářské noviny",
"Houma Today", // same as The Houma Courier
"Hour Community", // canonical
"Hour (magazine)", // same as Hour Community
"Houston Business Journal",
"Houston Chronicle",
"Houston Defender",
"Houston Press",
"Huddersfield Daily Examiner",
"Human Events",
"Hurriyet Daily News", // same as Hürriyet Daily News
"Hurriyet Daily News and Economic Review", // same as Hürriyet Daily News
"Hürriyet", // canonical; not same as Hürriyet Daily News
"Hürriyet Daily News", // canonical
"Hyde Park Herald",
"IB Times", // same as International Business Times
"IBTimes",
"IceNews",
"Idaho Statesman",
"Il Gazzettino",
"Il Giornale",
"Il Messaggero",
"Il Sole 24 Ore",
"Il Tempo",
"Il Tirreno",
"Ilgan Sports",
"Ilkley Gazette", // same as Gazette & Observer
"Illawarra Mercury",
"Iltalehti",
"Ilta-Sanomat",
"Independent Online", // same as The Independent
"Independent Record",
"Independent Weekly", // same as
"independent.co.uk", // same as The Independent
"India Abroad",
"India Currents",
"Indiana Daily Student",
"Indianapolis Business Journal",
"Indianapolis Recorder",
"India Tribune",
"Indiana Gazette",
"Indy Week", // canonical
"Indystar.com", // same as The Indianapolis Star
"Information Times",
"Ingeniøren",
"Inland Valley Daily Bulletin",
"Inquirer Bandera",
"Inquirer.net", // same as Philippine Daily Inquirer
"Interaksyon", // same as The Philippine Star
"InterAksyon", // same as The Philippine Star
"International Business Times", // canonical
"Iowa City Press Citizen",
"Iowa City Press-Citizen", // canonical
"Iowa State Daily",
"Ipswich Star", // canonical
"Iran Daily", // same as Iran (newspaper)
"Iran (newspaper)", // canonical
"Irish Daily Mail",
"Irish Examiner",
"Irish Independent",
"Irish Sun", // same as The Sun (United Kingdom)
"Ironton Tribune",
"Ironwood Daily Globe",
"Island Sun",
"Isthmus (newspaper)",
"Izvestia", // canonical
"Izvestiya",
"Jackson Citizen Patriot",
"Jacksonville Business Journal",
"Jacksonville Daily Journal", // same as Journal-Courier
"Jacksonville Daily Record", // canonical
"Jakarta Globe",
"Jamaican Observer", // same as The Jamaica Observer
"JANJAN",
"Japan Today",
"Jersey Evening Post",
"Jewish Business News",
"Jewish Ledger", // canonical
"Jewish News of Greater Phoenix",
"Jewish Post of New York", // same as
"Jewish Standard",
"Jewish Telegraph",
"Jewish Voice",
"JoongAng Daily", // same as Korea JoongAng Daily
"JoongAng Ilbo", // canonical
"Joplin Globe",
"Jornal de Notícias",
"Journal des débats",
"Journal Star (Peoria)", // canonical
"JournalNews", // same as Journal-News
"Journal-Courier", // canonical
"Journal-News", // canonical
"Journal-News Pulse", // canonical
"jpost.com", // same as The Jerusalem Post
"Juneau Empire",
"Jurnalul", // same as Jurnalul Național
"Jurnalul Național", // canonical
"Jutarnji list",
"Jyllands-Posten",
"j.",
"J.",
"J. The Jewish News of Northern California", // canonical
"J Weekly",
"Ka Leo O Hawaii",
"Kalamazoo Gazette",
"Kamloops Daily News",
"Kansas City Business Journal",
"Kansas City Journal-Post",
"Kansas.com", // same as The Wichita Eagle
"Kashmir Times",
"Kathimerini",
"Kearny Hub", // misspelling
"Kearney Hub", // canonical
"Keighley News",
"Kentucky New Era",
"Kerala Kaumudi",
"Keskisuomalainen",
"Khaleej Times",
"Khao Sod",
"Kilburn Times",
"Kilkenny People",
"Kingston Daily Freeman", // same as Daily Freeman
"Kingston Whig-Standard",
"Kirksville Daily Express",
"Kitsap Sun",
"Kjarninn",
"Klassa",
"Knoxville News Sentinel",
"Kobe Shimbun",
"Koha Ditore",
"Koha Jone",
"Koha Jonë", // canonical
"Kokomo Tribune",
"Komanda (newspaper)",
"Kommersant", // canonical
"Kompas",
"Komsomolskaja Pravda", // same as Komsomolskaya Pravda
"Komsomolskaya Pravda", // canonical
"Korea Daily", // same as JoongAng Ilbo
"Korea Economic Daily",
"Korea JoongAng Daily", // canonical
"Korea Times",
"Korzár",
"Kosmo!",
"Krasnaya Zvezda",
"Kristeligt Dagblad",
"Kristianstadsbladet",
"Kurier",
"Kyiv Post",
"LA Times",
"LA Weekly", // canonical
"La Capital",
"La Capitale",
"La Crosse Tribune",
"La Crónica de Hoy",
"La Cuarta",
"La Dernière Heure",
"La Dépêche du Midi",
"La Gazzetta del Mezzogiorno",
"La Gazzetta dello Sport",
"La Grande Observer", // same as The Observer (La Grande)
"La Jornada",
"La Libre", // same as La Libre Belgique
"La Libre Belgique", // canonical
"La Nacion", // same as La Nación
"La Nación", // canonical
"La Nación (Buenos Aires)", // same as La Nación
"La Nación (Chile)",
"La Nación (San José)",
"La Nazione",
"La Nueva España",
"La Nouvelle République du Centre-Ouest",
"La Nuova Sardegna",
"La Opinión",
"La Prensa", // dab
"La Prensa (Managua)",
"La Prensa (Panama City)",
"La Prensa Gráfica",
"La Presse", // dab
"La Presse (Canadian newspaper)",
"La Razón", // dab
"La Razón (Madrid)",
"la Repubblica",
"La Repubblica", // canonical
"La Republica",
"La República", // canonical
"La Segunda",
"La Stampa", // canonical
"La Tercera",
"La Tribuna",
"La Tribune",
"La Vanguardia",
"La Vie éco",
"La Vie Éco", // canonical
"La Voix du Nord (daily)",
"La Voz de Galicia",
"La Voz de Michoacán",
"La Voz del Interior",
"Lafayette Daily Advertiser", // same as The Daily Advertiser (Lafayette, Louisiana)
"LaGrange Daily News",
"Lakeland Ledger",
"Lancashire Evening Post",
"Lancashire Evening Telegraph", // same as Lancashire Telegraph
"Lancashire Telegraph", // canonical
"Laredo Morning Times", // canonical
"Las Provincias",
"Las Vegas Mercury",
"Las Vegas Review Journal",
"Las Vegas Review-Journal", // canonical
"Las Vegas Sun",
"Las Vegas Weekly",
"LATIMES", // same as Los Angeles Times
"Latimes.com",
"Lawrence Journal-World",
"Le Courrier du Vietnam",
"Le Devoir",
"Le Figaro",
"Le Jeudi",
"Le Journal de Montréal",
"Le Journal de Québec",
"Le Matin", // dab
"Le Matin du Sahara et du Maghreb",
"Le Monde",
"Le Parisien",
"Le Progres",
"Le Progrès", // canonical
"Le Républicain Lorrain",
"Le Soir",
"Le Soir d'Algérie",
"Le Soleil", // dab
"Le Soleil (Quebec)",
"Le Temps",
"Le Télégramme",
"Leadership (newspaper)", // canonical
"Leadership (Nigeria)", // same as Leadership (newspaper)
"Leader-Post", // same as Regina Leader-Post
"Leader-Telegram", // same as Eau Claire Leader-Telegram
"Leamington Courier",
"Leavenworth Times",
"Lebanon Express",
"Ledger-Enquirer", // canonical
"Leeuwarder Courant",
"Legislative Gazette",
"Leicester Mercury",
"Leidsch Dagblad",
"Les Affaires",
"Les Echos", // dab
"Lethbridge Herald",
"Lewiston Journal", // same as Sun Journal (Lewiston, Maine)
"Lewiston Morning Tribune",
"Lexington Herald Leader",
"Lexington Herald-Leader", // canonical
"Lianhe Zaobao",
"Liberian Daily Observer", // same as Liberian Observer
"Liberian Observer",
"Libertatea",
"Liberté (Algeria)",
"Libya Herald",
"Libération",
"Lidové noviny",
"Lietuvos žinios",
"Lincoln Journal Star", // canonical
"Lincoln Journal-Star",
"Lincolnshire Echo",
"Live Mint", // same as Mint (newspaper)
"LiveMint",
"livemint.com",
"Liverpool Daily Post",
"Liverpool Echo", // canonical
"liverpool Echo",
"Llanelli Star",
"Lloyd's Weekly Newspaper",
"Lockport Union-Sun & Journal",
"Lodi News-Sentinel",
"Lompoc Record",
"London Evening Standard",
"London Free Press",
"London Informer",
"Londonderry Sentinel",
"Long Beach Press Telegram", // same as Press-Telegram
"Long Beach Press-Telegram",
"Long Island Business News",
"Long Island Newsday", // same as Newsday
"Long Island Press",
"Longview News-Journal",
"Los Andes",
"Los Andes (Argentine newspaper)", // canonical
"Los Angeles Business Journal",
"Los Angeles Daily News",
"Los Angeles Examiner",
"Los Angeles Herald Examiner", // canonical
"Los Angeles Herald-Examiner",
"Los Angeles Times", // canonical
"Los Angeles Weekly", // same as LA Weekly
"Los Tiempos",
"Louisville Business First",
"Lubbock Avalanche Journal",
"Lubbock Avalanche-Journal", // canonical
"Luxemburger Wort",
"Luzerner Zeitung",
"Lytham St Annes Express",
"L. A. Times",
"L.A. Times",
"L'Humanité",
"L'Est Républicain",
"L'Express (Mauritius)",
"L'Osservatore Romano",
"L'Unione Sarda",
"L'Unità",
"L'Équipe",
"Macau Daily Times",
"Macclesfield Express",
"Madhyamam",
"Madhyamam Daily", // canonical
"Madras Musings",
"Maeil Business Newspaper",
"Maharashtra Times",
"Maidenhead Advertiser",
"Mail and Guardian",
"Mail Online", // same as Daily Mail
"Mail Tribune",
"Mail & Guardian", // canonical
"Mail & Guardian",
"MailOnline", // same as Daily Mail; canonical for Daily Mail Australia
"Mainichi Shimbun", // canonical
"Makedonia (newspaper)",
"Malay Mail", // canonical
"Malaya Business Insight",
"Malayala Manorama", // canonical
"Malaysia Nanban",
"Malaysia Sun",
"Malta Today",
"Manab Zamin", // canonical
"Manager Daily",
"Manchester Evening News",
"Mangalore Today",
"Manila Bulletin",
"Manila Standard", // canonical
"Manila Standard Today",
"Manitou Messenger",
"Manly Daily",
"Manorama Online", // same as Malayala Manorama
"Mansfield News Journal",
"Marca",
"Marca (newspaper)", // canonical
"Marianas Variety News & Views", // same as Marianas Variety
"Marianas Variety", // canonical
"Marin Independent Journal",
"Marine Corps Times",
"Maryland Gazette",
"Mason City Globe-Gazette", // same as
"Mathrubhumi",
"Mat-Su Valley Frontiersman",
"McAlester News-Capital",
"Memphis Business Journal",
"Memphis Commercial Appeal", // same as The Commercial Appeal
"Merced Sun Star",
"Merced Sun-Star", // canonical
"Mercury (newspaper)", // dab
"Meriden Record", // same as Record-Journal
"merkur.de", // same as Münchner Merkur
"Mesabi Daily News",
"Methodist Recorder",
"Metro Pulse",
"Metro Silicon Valley",
"Metro Times",
"Metro (Associated Metro Limited)", // same as Metro (British newspaper)
"Metro (British newspaper)", // canonical
"Metropolitan News-Enterprise",
"Miami Herald", // canonical
"Miami New Times",
"Miami News Record", // canonical
"MiamiHerald.com", // same as Miami Herald
"Mid Day", // canonical
"MiD DAY",
"Mid Sussex Times",
"Midi Libre",
"Midi olympique",
"Midi Olympique", // canonical
"Midland Reporter-Telegram",
"Mid-Day",
"Milenio", // canonical
"Milenio Diario",
"Mill Valley Herald",
"Milliyet",
"Milton Keynes Citizen",
"Milwaukee Journal Sentinel", // canonical
"Milwaukee Journal-Sentinel", // same as Milwaukee Journal Sentinel
"Milwaukee Sentinel",
"MindaNews",
"Minden Press-Herald",
"Minneapolis Star Tribune", // same as Star Tribune
"Minneapolis Star-Tribune", // same as Star Tribune
"Minneapolis / St. Paul Business Journal",
"Minnesota Daily",
"MinnPost",
"Mint (newspaper)", // canonical
"Mississippi Business Journal",
"Missoula Independent",
"Missoulian",
"Moberly Monitor-Index",
"Mobile Press-Register", // same as Press-Register
"Molalla Pioneer",
"Monadnock Ledger-Transcript",
"Money Marketing",
"Monroe News Star", // same as The News-Star
"Monterey County Weekly",
"Montgomery Advertiser",
"Monterey Herald", // same as The Monterey County Herald
"Montreal Gazette", // canonical
"Montreal Mirror",
"montrealgazette.com", // same as Montreal Gazette
"Morgunblaðið",
"Morning Star", // dab
"Morning Star (British newspaper)",
"Morocco Times",
"Moscow-Pullman Daily News",
"Motion Picture Herald",
"Motor Cycle News", // canonical
"Motorcycle News", // same as Motor Cycle News
"Mound City News",
"Mount Pleasant Tribune",
"Mountain View Voice",
"Mumbai Mirror", // canonical
"Mundo Deportivo", // canonical
"Muscat Daily",
"Muscatine Journal",
"Muskegon Chronicle",
"Muskogee Phoenix",
"Myanmar Times",
"Märkische Allgemeine",
"Münchner Merkur", // canonical
"Nagaland Post",
"Namibian Sun",
"Nanaimo Daily News",
"Nanfang Daily",
"Napa Valley Register",
"Naples Daily News",
"Narodne novine",
"Nashua Telegraph", // same as The Telegraph (Nashua)
"Nashville City Paper", // same as The City Paper
"NASHVILLE SCENE",
"Nashville Scene", // canonical
"Nasz Dziennik",
"Nation News",
"National Business Review",
"National Catholic Register",
"National Catholic Reporter",
"National Network (newspaper)", // canonical
"National Post", // canonical
"Nationalnetworkonline.com", // same as National Network (newspaper)
"Navajo Times",
"Navbharat Times",
"Navshakti",
"Navy Times",
"Nederlands Dagblad",
"Nemzeti Sport",
"Neue Zürcher Zeitung",
"Nevada State Journal", // same as Reno Gazette-Journal
"New Age (Bangladesh)",
"New Braunfels Herald-Zeitung",
"New Europe (newspaper)",
"New Jersey Jewish News", // canonical
"New Light of Myanmar",
"New Hampshire Business Review",
"New Hampshire Union Leader",
"New Haven Register",
"New Jersey Herald",
"New Mexico Business Weekly",
"New Orleans Times Picayune",
"New Orleans Times-Picayune", // same as The Times-Picayune/The New Orleans Advocate
"New Straits Times",
"New Vision",
"New Vision (newspaper)", // canonical
"New York Age",
"New York Daily News", // canonical
"NEW YORK DAILY NEWS",
"New York Law Journal",
"New York Post",
"New York Press",
"New York World",
"Newcastle Evening Chronicle", // same as Evening Chronicle
"Newark Star-Ledger", // same as The Star-Ledger
"News and Star",
"News Democrat & Leader", // same as News-Democrat & Leader
"News of the World",
"News & Record",
"News-Democrat & Leader", // canonical
"News-Herald", // dab
"News-Register (McMinnville)",
"Newsday", // canonical
"NewsDay (Zimbabwean newspaper)",
"Newsday.com", // same as Newsday
"News-Tribune", // same as The News Tribune
"New-York Tribune",
"Nezavisimaya Gazeta",
"Nhan Dan", // same as Nhân Dân
"Nhân Dân", // canonical
"Niagara Falls Review",
"Niagara Gazette",
"Nice-Matin",
"Nichi Bei Times",
"Nieuwe Tilburgsche Courant", // same as Brabants Dagblad
"Nigerian Tribune",
"Nihon Keizai Shimbun", // same as The Nikkei
"Nikkan Sports",
"Nikkei Asian Review", // same as The Nikkei
"Nine O'Clock",
"Niva (newspaper)",
"No Cut News",
"Nogales International",
"Norrköpings Tidningar",
"Norrländska Socialdemokraten",
"North County News", // same as The Baltimore Sun
"North County Times",
"North Texas Daily",
"North Wales Daily Post",
"Northampton Chronicle & Echo",
"Northampton Mercury",
"Northeast Times",
"Northern Life (newspaper)",
"Northern News",
"Northern Star (newspaper of the Society of United Irishmen)",
"Northern Territory News",
"Northumberland Gazette",
"Northwest Arkansas Democrat-Gazette", // same as Arkansas Democrat-Gazette
"Northwest Herald",
"Norwalk Reflector",
"Norwich Bulletin", // same as The Bulletin (Norwich)
"Noticel",
"NotiCel", // canonical
"Nottingham Evening Post", // same as Nottingham Post
"Nottingham Post", // canonical
"Nouse",
"Now Toronto", // same as Now (newspaper)
"Now (newspaper)", // canonical
"NOLA.com", // same as The Times-Picayune/The New Orleans Advocate
"NRC Handelsblad",
"Nugget Newspaper",
"Nunatsiaq News",
"Nuneaton News",
"NY Daily News", // same as New York Daily News
"NY Times",
"NYDailyNews", // same as New York Daily News
"NYDailyNews.com",
"nydailynews.com",
"nytimes.com",
"NYTimes.com",
"NZ Herald News", // same as The New Zealand Herald
"O Clarim",
"O Estado de S. Paulo", // canonical
"O Globo",
"O Jogo",
"Oakland Tribune",
"Oberhessische Presse",
"Observer.com", // same as The New York Observer
"Observer-Dispatch",
"Observer-Reporter",
"Observer–Reporter", // canonical (ndash)
"OC Register", // same as Orange County Register
"OC Weekly",
"Ocala Star-Banner", // same as Star-Banner
"Odessa American",
"Official Gazette of the Republic of the Philippines",
"Official Gazette (Philippines)", // canonical
"Ogden Standard-Examiner", // same as Standard-Examiner
"Olean Times Herald",
"Omaha World-Herald",
"Oman Daily Observer",
"Orange County Business Journal",
"Orange County Register", // canonical
"Oregon Daily Emerald", // same as Daily Emerald
"Orlando Business Journal",
"Orlando Sentinel",
"Orlando Weekly",
"Oshkosh Northwestern",
"Otago Daily Times",
"Otago Witness",
"Ottawa Citizen", // canonical
"Ottawa Journal",
"Ottawa Sun",
"Ottawa XPress",
"Ottumwa Courier",
"Ouachita Citizen",
"Ouest-France",
"OutInPerth",
"Oxford Mail",
"Oxnard Press-Courier",
"Õhtuleht",
"Pacific Appeal",
"Pacific Business News",
"Pacific Daily News",
"Paisley Daily Express",
"Pakistan Daily Times", // same as Daily Times (Pakistan)
"Pakistan Observer",
"Pakistan Times",
"Pakistan Today",
"Palisadian Post",
"Palisadian-Post", // canonical
"Palm Beach Daily News",
"Palo Alto Daily Post",
"Palo Alto Weekly",
"Park Record",
"Pasadena Star-News",
"Penarth Times",
"Pennsylvania Gazette",
"Pensacola News Journal",
"People's Court Daily",
"People's Daily",
"People's Journal", // canonical
"People's Journal (newspaper)",
"People's Liberation Army Daily", // canonical
"Peoria Journal Star", // same as Journal Star (Peoria)
"Peoria Journal-Star",
"Perth Now", // same as The Sunday Times (Western Australia)
"PerthNow",
"Peterborough Examiner",
"Peterborough Telegraph", // canonical
"Peterborough Today", // same as Peterborough Telegraph
"Philadelphia Bulletin",
"Philadelphia Business Journal",
"Philadelphia City Paper",
"Philadelphia Daily News",
"Philadelphia Tribune",
"Philadelphia Weekly",
"Philippine Inquirer", // same as Philippine Daily Inquirer
"Philippine Daily Inquirer", // canonical
"Philstar", // same as The Philippine Star
"PhilStar",
"Phoenix Business Journal",
"Picayune Item",
"Pilipino Star Ngayon",
"Pine Rivers Press",
"Pink News",
"PinkNews", // canonical
"Pittsburgh Business Times",
"Pittsburgh City Paper",
"Pittsburg Morning Sun", // same as The Morning Sun (Pittsburg)
"Pittsburgh Post Gazette",
"Pittsburgh Post-Gazette", // canonical
"Pittsburgh Press",
"Pittsburgh Tribune-Review",
"PLA Daily", // same as People's Liberation Army Daily
"Point Pleasant Register",
"Politiken",
"Por Esto!",
"Portland Business Journal",
"Portland Mercury",
"Portland Observer (Oregon)", // canonical
"Portland Press Herald",
"Portland Tribune",
"Portsmouth Daily Times",
"Portsmouth News", // same as The News (Portsmouth)
"Postimees",
"Post-Bulletin",
"Post Gazette", // same as Pittsburgh Post-Gazette
"Post-Gazette", // same as Pittsburgh Post-Gazette
"Poughkeepsie Journal",
"Prachatai",
"Pravda Severa",
"Pravda (Slovakia)",
"Premium Times",
"Prensa Libre",
"Prescott Evening Courier", // same as The Daily Courier (Arizona)
"Press and Journal", // dab
"Press Citizen", // same as Iowa City Press-Citizen
"Press Enterprise (Pennsylvania)", // not the same as The Press-Enterprise
"Press & Sun-Bulletin",
"Press-Register", // canonical
"Press-Telegram", // canonical
"Primera Hora", // dab
"Primera Hora (Guaynabo)", // same as Primera Hora (Puerto Rico)
"Primera Hora (Mexico)",
"Primera Hora (Puerto Rico)", // canonical
"Prince Albert Daily Herald",
"Prothom Alo",
"Proto Thema",
"Providence Journal",
"Public Advertiser",
"Puget Sound Business Journal",
"Pulaski News",
"Pune Mirror", // same as Mumbai Mirror
"Página/12",
"Público", // dab
"Público (Portugal)",
"Qatar Tribune",
"Quad City Times",
"Quad-City Times", // canonical
"Queens Tribune",
"Q-Notes",
"Raleigh News and Observer", // same as The News & Observer
"Raleigh News & Observer", // same as The News & Observer
"Randolph Herald", // same as Herald of Randolph
"Rapid City Journal",
"Reading Eagle",
"Record Mirror",
"Record (newspaper)",
"Record-Courier (Ohio)",
"Record-Journal",
"RedEye",
"Reformatorisch Dagblad",
"Regina Leader-Post",
"Reno Gazette-Journal", // canonical
"Reporter-Herald",
"Republican American",
"Republican-American", // canonical
"Review Journal", // same as Las Vegas Review-Journal
"reviewjournal.com", // same as Las Vegas Review-Journal
"Richmond and Twickenham Times",
"Richmond Times-Dispatch",
"River Cities' Reader",
"Riverdale Press",
"Riverfront Times",
"Riverside Press-Enterprise", // same as The Press-Enterprise
"Rochester Democrat & Chronicle", // same as Democrat & Chronicle
"Rocky Mountain News",
"Roll Call (newspaper)",
"Romanian Times",
"Rome News-Tribune",
"România Liberă",
"Roscommon Herald",
"Rossiiskaya Gazeta", // same as Rossiyskaya Gazeta
"Rossiyskaya Gazeta", // canonical
"Rotterdams Nieuwsblad", // same as Algemeen Dagblad
"Russia Beyond", // canonical
"Russia Beyond the Headlines",
"Ruston Daily Leader",
"Rutland and Stamford Mercury",
"Rutland Herald",
"Rzeczpospolita (newspaper)",
"Sabah (newspaper)", // not same as Daily Sabah
"Sacramento News & Review",
"Saint Paul Pioneer Press", // same as St. Paul Pioneer Press
"Saint Petersburg Times", // same as Tampa Bay Times
"Saipan Tribune",
"Sakaal Times",
"Sakal",
"Salisbury Journal",
"Salisbury Post",
"Salt Lake City Weekly",
"Samakal",
"Samoa Observer",
"San Angelo Standard Times",
"San Angelo Standard-Times", // canonical
"San Antonio Business Journal",
"San Antonio Current",
"San Antonio Express News",
"San Antonio Express-news",
"San Antonio Express-News", // canonical
"San Diego Business Journal",
"San Diego Daily Transcript", // canonical
"San Diego Union Tribune",
"San Francisco Business Times",
"San Francisco Chronicle", // canonical
"San Francisco Gate",
"San Gabriel Valley Tribune",
"San Jose Mercury News", // same as The Mercury News
"San Mateo County Times",
"San Mateo Daily Journal",
"Sandusky Register",
"Santa Barbara Independent",
"Santa Cruz Sentinel",
"Santa Fe Reporter",
"Santa Rosa Press Democrat", // same as The Press Democrat
"Sarasota Herald-Tribune",
"Sarnia Observer",
"Saskatoon Star-Phoenix", // same as The StarPhoenix
"Saudi Gazette",
"Savannah Morning News",
"Savon Sanomat",
"Schenectady Gazette", // same as The Daily Gazette
"Scotland on Sunday",
"Scunthorpe Telegraph",
"Seattle Post Intelligencer",
"Seattle Post-Intelligencer", // canonical
"Seattle Weekly",
"Seguin Gazette",
"Seven Days (newspaper)",
"SF Gate", // same as San Francisco Chronicle
"SF Weekly",
"SFGate", // same as San Francisco Chronicle
"sfgate.com",
"SFGate.com",
"Shanghai Daily",
"Sheffield Star",
"Shenzhen Daily",
"Shreveport Journal",
"Shropshire Star",
"Sidmouth Herald",
"Sierra Maestra (newspaper)",
"Silicon Valley Business Journal",
"Silicon Valley/San Jose Business Journal",
"Sioux City Journal",
"Sirp",
"Skånska Dagbladet",
"Slobodna Dalmacija",
"SME (newspaper)",
"smh.com.au", // same as The Sydney Morning Herald
"Smh.com.au",
"Smålandsposten",
"Socialist Worker",
"South China Morning Post",
"South Florida Business Journal",
"South Florida Sun Sentinel", // same as Sun-Sentinel
"South Florida Sun-Sentinel", // same as Sun-Sentinel
"South London Press",
"South Wales Argus",
"South Wales Evening Post", // canonical
"Southeast Missourian",
"Southern Daily Echo",
"Southern Metropolis Daily",
"Southern Weekly",
"SouthtownStar", // same as Daily Southtown
"Southwest Journal",
"Sovetsky Sport",
"Sözcü",
"Spartanburg Herald Journal",
"Spartanburg Herald-Journal", // canonical
"Spokane Chronicle", // same as The Spokesman-Review
"Spokane Daily Chronicle",
"Sportbladet",
"Sport Express",
"Sporti Shqiptar",
"Sporting Chronicle",
"Sporting Life", // dab
"Sporting Life (British newspaper)",
"Sports Donga", // redirect to The Dong-a Ilbo
"Sports DongA",
"Sports Hochi",
"Sports Nippon",
"Sportske novosti",
"Springfield News-Leader",
"St Petersburg Times", // same as Tampa Bay Times
"Stabroek News",
"Stamford Advocate", // same as The Advocate (Stamford)
"Stampa Sera", // same as La Stampa
"Standard-Examiner", // canonical
"Standard-Speaker",
"Standart (newspaper)",
"Star Beacon",
"Star Tribune", // canonical
"Starnews", // same as Star-News
"Stars and Stripes",
"Stars and Stripes (newspaper)", // canonical
"StarTribune", // same as Star Tribune
"StarTribune.com", // same as Star Tribune
"Star-Banner", // canonical
"Star-Gazette",
"Star-News", // canonical
"Star-Tribune", // same as Star Tribune
"Staten Island Advance",
"Statesman Journal",
"Stltoday", // same as St. Louis Post-Dispatch
"Stltoday.com",
"Stornoway Gazette",
"Stuttgarter Zeitung",
"Style Weekly",
"St. Joseph News-Press",
"St. Louis Jewish Light",
"St. Louis Post Dispatch",
"St. Louis Post-Dispatch", // canonical
"St. Paul Globe",
"St. Paul Pioneer Press", // canonical
"St. Petersburg Times", // same as Tampa Bay Times
"Suara Pembaruan",
"Sud Quotidien",
"Sudbury Star",
"Sun Journal", // dab
"Sun Journal (Lewiston)",
"Sun Journal (Lewiston, Maine)", // canonical
"Sun Journal (New Bern)",
"Sun Sentinel", // same as Sun-Sentinel
"Sunday Express",
"Sunday Herald",
"Sunday Herald Sun", // same as Herald Sun
"Sunday Independent (Ireland)",
"Sunday Mail (Scotland)",
"Sunday Mirror",
"Sunday Mercury",
"Sunday Observer (Sri Lanka)",
"Sunday Star Times",
"Sunday Sun",
"Sunday Tribune", // same as Sunday Star-Times
"Sunday Star-Times", // canonical
"Sunday World",
"Sunderland Echo",
"suntimes.com", // same as Chicago Sun-Times
"Sun-Sentinel", // canonical
"Sun.Star",
"Sun.Star Superbalita Davao", // not same as Sun.Star
"Svenska Dagbladet",
"Swazi Observer",
"Sweet Home New Era", // same as The New Era (newspaper)
"Swindon Advertiser",
"Sydney Daily Telegraph", // same as The Daily Telegraph (Sydney)
"Sydsvenskan",
"Syracuse New Times",
"Süddeutsche Zeitung",
"Südwest Presse",
"Ta Nea",
"Tageblatt",
"Tagesspiegel", // same as Der Tagesspiegel
"Tages Anzeiger",
"Tages-Anzeiger", // canonical
"Taipei Times",
"Taiwan Journal", // canonical
"Taiwan News",
"Taiwan Today", // same as Taiwan Journal
"Tallahassee Democrat",
"Tampa Bay Business Journal",
"Tampa Bay Times", // canonical
"Tanzania Daily News", // same as Daily News (Tanzania)
"Technician (newspaper)",
"Technique (newspaper)", // canonical
"Teesside Gazette",
"Tehachapi News",
"Tehran Times",
"Telangana Today",
"Telegraaf",
"Telegraf", // dab
"Telegrafi",
"Telegram & Gazette", // canonical
"telegram.com", // same as Telegram & Gazette
"Telegraph Herald", // canonical
"Telegraph India", // same as The Telegraph (Kolkata)
"Telegraph & Argus",
"Telegraph (newspaper)", // same as The Daily Telegraph
"telegraph.co.uk", // same as The Daily Telegraph
"Telegraph-Herald", // same as Telegraph Herald
"Telegraph-Journal",
"Temple Daily Telegram",
"Texas Jewish Post",
"Thanh Nien", // same as Thanh Niên
"Thanh Nien Daily", // same as Thanh Niên
"Thanh Nien News", // same as Thanh Niên
"Thanh Niên", // canonical
"Thanhnien News",
"The Adelaide Advertiser", // same as The Advertiser (Adelaide)
"The Advertiser", // dab
"The Advertiser (Adelaide)", // canonical
"The Advocate (Baton Rouge)",
"The Advocate (Louisiana)", // canonical
"The Advocate (Stamford)", // canonical
"The Age", // canonical
"The Age (newspaper)",
"The Alabama Baptist",
"The Albany Herald",
"The Albuquerque Tribune",
"The American Israelite",
"The Amherst Student",
"The Angola Herald", // same as The Herald Republican
"The Anniston Star",
"The Aquarian",
"The Aquarian Weekly", // canonical
"The Arab American News",
"The Ardmoreite", // canonical
"The Arizona Republic",
"The Art Newspaper",
"The Asian Age",
"The Asian Today",
"The Aspen Times",
"The Atlanta Constitution", // same as The Atlanta Journal-Constitution
"The Atlanta Journal", // same as The Atlanta Journal-Constitution
"The Atlanta Journal-Constitution", // canonical
"The Augusta Chronicle",
"The Austin Chronicle",
"The Australian",
"The Australian Financial Review",
"The AV Club",
"The A.V. Club", // canonical
"The Badger Herald",
"The Ball State Daily News",
"The Baltimore Sun", // canonical
"The Banner-Press", // same as Brenham Banner-Press
"The Baton Rouge Advocate", // same as The Advocate (Louisiana)
"The Battalion",
"The Beacon Herald",
"The Beaumont Enterprise", // canonical
"The Beaver County Times", // canonical
"The Bellingham Herald",
"The Berkshire Eagle",
"The Beverly Hills Courier",
"The Big Issue",
"The Billings Gazette",
"The Birmingham News",
"The Bismarck Tribune",
"The Blade",
"The Blade (Toledo)",
"The Blade (Toledo, Ohio)", // canonical
"The Bond Buyer",
"The Border Mail",
"The Borneo Post",
"The Boston Globe", // canonical
"The Boston Journal",
"The Bradenton Herald", // canonical
"The Bradenton Times",
"The Brooklyn Daily Eagle", // same as Brooklyn Eagle
"The Brooklyn Paper",
"The Brown Daily Herald",
"The Brownsville Herald",
"The Brunei Times",
"The Bryan Times",
"The Budapest Times",
"The Buffalo News", // canonical
"The Bulletin (Bend)", // canonical
"The Bulletin (Norwich)", // canonical
"The Burlington Free Press",
"The Cairns Post",
"The California Aggie",
"The Call (Woonsocket)",
"The Camden News",
"The Canberra Times",
"The Capital Times",
"The Cardiff Times",
"The Catholic Herald",
"The Catholic Review",
"The Catholic Times (Wisconsin)",
"The Charleston Gazette",
"The Charlotte Observer", // canonical
"The Chicago Defender",
"The Chicago Maroon",
"The China Post",
"The China Press",
"The Chosun Ilbo", // canonical
"The Christian Post",
"The Christian Science Monitor", // canonical
"The Chronicle of Higher Education", // canonical
"The Chronicle Herald", // canonical
"The Chronicle Review", // same as The Chronicle of Higher Education
"The Cincinnati Enquirer",
"The Citizen (South Africa)",
"The Citizens' Voice",
"The City Paper", // canonical
"The Clarion Ledger",
"The Clarion-Ledger", // canonical
"The Cleveland Plain Dealer", // redirect to The Plain Dealer
"The Coast",
"The Coast News",
"The Collegian", // dab
"The Coloradoan", // same as
"The Columbian",
"The Columbus Dispatch",
"The Commercial Appeal", // canonical
"The Concrete Herald",
"The Connaught Telegraph",
"The Corkman",
"The Cornell Daily Sun",
"The Courier (Dundee)",
"The Courier-Journal", // canonical
"The Courier Mail",
"The Courier-Mail", // canonical
"The Coventry Telegraph",
"The Covington News",
"The Crimson", // same as The Harvard Crimson
"The Cumberland News",
"The Daily Advertiser", // dab
"The Daily Advertiser (Lafayette, Louisiana)", // canonical
"The Daily Ardmoreite", // same as The Ardmoreite
"The Daily Astorian",
"The Daily Californian",
"The Daily Collegian (Penn State)", // same as Daily Collegian
"The Daily Comet",
"The Daily Cougar",
"The Daily Courier (Arizona)", // canonical
"The Daily Eastern News",
"The Daily Edge", // same as TheJournal.ie
"The Daily Evergreen",
"The Daily Gazette", // canonical
"The Daily Gleaner",
"The Daily Graphic",
"The Daily Herald",
"The Daily Independent (Lagos)", // same as The Daily Independent (Lagos newspaper)
"The Daily Independent (Lagos newspaper)", // canonical
"The Daily Item", // dab
"The Daily Mirror (Sri Lanka)",
"The Daily Nation (Barbados)",
"The Daily Nebraskan",
"The Daily News (Halifax)",
"The Daily News (Kentucky)",
"The Daily News (Texas)", // canonical
"The Daily Nonpareil", // canonical
"The Daily Northwestern",
"The Daily Observer",
"The Daily of the University of Washington",
"The Daily Pennsylvanian",
"The Daily Post", // dab
"The Daily Press", // dab
"The Daily Princetonian",
"The Daily Progress",
"The Daily Reflector",
"The Daily Reveille",
"The Daily Star", // one of several
"The Daily Star (Bangladesh)",
"The Daily Star (Lebanon)",
"The Daily Tarheel",
"The Daily Tar Heel", // canonical
"The Daily Telegraph", // London
"The Daily Telegraph (Sydney)", // canonical
"The Daily Telegraph#Website", // same as The Daily Telegraph
"The Daily Times (Salisbury)",
"The Daily Times (Salisbury, Maryland)", // canonical
"The Daily Titan",
"The Daily Texan",
"The Daily Toreador",
"The Daily Transcript", // same as San Diego Daily Transcript
"The Daily Tribune", // same as Daily Tribune (Philippines)
"The Daily Yomiuri",
"The Dallas Morning News", // canonical
"The Dalles Chronicle",
"The Dartmouth",
"The Day (New London)",
"The Daytona Beach News-Journal", // canonical
"The Decatur Daily",
"The Denver Post",
"The Des Moines Register",
"The Desert Sun",
"The Detroit Jewish News",
"The Detroit News", // canonical
"The Dispatch", // dab
"The Dispatch (Lexington)",
"The Dominion Post (Wellington)",
"The Dong-a Ilbo",
"The Drawbridge",
"The Durango Herald",
"The Durant Daily Democrat",
"The Eagle-Tribune",
"The East African", // same as The EastAfrican
"The East Hampton Star",
"The EastAfrican", // canonical
"The Eastern Wake News",
"The Echo (Cork newspaper)", // canonical
"The Economic Times",
"The Economist", // canonical
"The Edge (Malaysia)",
"The Emory Wheel",
"The Enterprise (Brockton)",
"The Epoch Times",
"The Eugene Daily Guard", // same as The Register-Guard
"The Eugene Guard", // same as The Register-Guard
"The Evening Post (New Zealand)",
"The Examiner (Independence)", // same as The Examiner (Missouri)
"The Examiner (Missouri)", // canonical
"The Examiner (Tasmania)",
"The Exeter News-Letter",
"The Exonian",
"The Express Tribune",
"The Express-Times",
"The Faster Times",
"The Fayetteville Observer",
"The Financial Express", // same as dab Financial Express
"The Financial Express (Bangladesh)",
"The Financial Express (India)",
"The Flint Journal",
"The Forum of Fargo-Moorhead",
"The Free Lance-Star",
"The Free Lance–Star", // canonical (ndash)
"The Free Press Journal",
"The Fresno Bee",
"The Gadsden Times",
"The Gainesville Sun",
"The Gainesville Times (Georgia)",
"The Gardner News",
"The Gaston Gazette",
"The Gazette (Cedar Rapids)", // canonical
"The Gazette (Colorado Springs)",
"The Gazette (Maryland)",
"The Gazette (Montreal)", // same as Montreal Gazette
"The Georgia Straight",
"The Gettysburg Times",
"The Gleaner", // same as The Gleaner (newspaper)
"The Gleaner (newspaper)", // canonical
"The Globe and Mail", // canonical
"The Goldsboro News-Argus",
"The Grand Rapids Press",
"The Greenfield Recorder", // same as The Recorder (Massachusetts newspaper)
"The Greenville News",
"The Grid (newspaper)",
"The guardian",
"The Guardian",
"The Guardian (Charlottetown)", // canonical
"The GW Hatchet",
"The Hamilton Spectator",
"The Hankyoreh",
"The Hans India",
"The Harvard Crimson", // canonical
"The Hawk Eye",
"The Herald Bulletin", // canonical
"The Herald Republican", // canonical
"The Herald (Glasgow)", // canonical
"The Herald (Ireland)", // canonical
"The Herald (Plymouth)", // canonical
"The Herald (Scotland)", // same as The Herald (Glasgow)
"The Herald (Sharon)",
"The Herald (Zimbabwe)",
"The Herald-Dispatch", // canonical
"The Herald-News",
"The Herald-Palladium", // canonical
"The Herald-Standard",
"The Herald-Sun (Durham, North Carolina)", // canonical
"The Hill",
"The Hill (newspaper)", // canonical
"The Himalayan Times",
"The Hindu",
"The Hindu Business Line", // same as Business Line
"The Hobart Mercury", // same as The Mercury (Hobart)
"The Holland Sentinel", // canonical
"The Honolulu Advertiser",
"The Houma Courier",
"The Hounslow Chronicle",
"The Hour (newspaper)",
"The Hudson Reporter",
"The Huntsville Item",
"The Huntsville Times",
"The Hutchinson News",
"The Idaho Press-Tribune",
"The Impartial Reporter",
"The Independent", // canonical
"The Independent Florida Alligator",
"The Independent on Sunday",
"The Independent (London)",
"The Independent (Uganda)",
"The Indian Express", // canonical
"The Indianapolis Star", // canonical
"The Insider (newspaper)",
"The Intelligencer (Doylestown, Pennsylvania)", // canonical
"The Inter Ocean", // same as Chicago Inter Ocean
"The International Herald Tribune",
"The Inverness Courier",
"The Irish Emigrant",
"The Irish Independent",
"The Irish News",
"The Irish Times",
"The Island Packet",
"The Island (Sri Lanka)",
"The Item",
"The Jackson Sun",
"The Jakarta Post",
"The Jamaica Observer", // canonical
"The Japan News",
"The Japan Times", // canonical
"The Japan Times Online",
"The Jersey Journal",
"The Jerusalem Post", // canonical
"The Jewish Chronicle",
"The Jewish Chronicle of Pittsburgh",
"The Jewish Journal of Greater Los Angeles",
"The Jewish News", // same as New Jersey Jewish News
"The Jewish News of Northern California", // same as J. The Jewish News of Northern California
"The Jewish Post", // canonical
"The Jewish Week",
"The Journal Gazette", // canonical
"The Journal (newspaper)",
"The Kansas City Star", // canonical
"The Keene Sentinel",
"The Kentucky Kernel",
"The Kerryman",
"The Knoxville News-Sentinel", // same as Knoxville News Sentinel
"The Korea Herald",
"The Land (newspaper)",
"The Leaf Chronicle",
"The Leaf-Chronicle", // canonical
"The Legal Intelligencer",
"The Ledger",
"The Lima News",
"The Linfield Review",
"The London Gazette",
"The Louisiana Weekly",
"The Louisville Courier Journal", // same as The Courier-Journal
"The Louisville Courier-Journal", // same as The Courier-Journal
"The Mail on Sunday",
"The Mainichi", // same as Mainichi Shimbun
"The Malay Mail Online", // same as Malay Mail
"The Malaysian Insider",
"The Malta Independent",
"The Manchester Guardian", // same as The Guardian
"The Manila Times",
"The Manobkantha",
"The Martlet",
"The Maui News",
"The Mercer Cluster",
"The Mercury", // same as Mercury (newspaper) – a dab
"The Mercury News", // canonical
"The Mercury (Hobart)", // canonical
"The Meridian Star",
"The Messenger (newspaper)",
"The MetroWest Daily News",
"The Miami Daily News-Record", // same as Miami News Record
"The Miami News",
"The Miami Student",
"The Michigan Daily",
"The Middletown Journal",
"The Milwaukee Journal",
"The Minneapolis Journal", // same as Star Tribune
"The Minneapolis Star", // same as Star Tribune
"The Mississauga News",
"The Modesto Bee",
"The Monitor (Texas)",
"The Monroe News-Star", // same as The News-Star
"The Montana Standard",
"The Monterey County Herald", // canonical
"The Montserrat Reporter",
"The Morning Call",
"The Morning Journal",
"The Morning Sun (Pittsburg)",
"The Moscow Times",
"The Mountain Enterprise",
"The Munster Express",
"The Muslim Observer",
"The Namibian",
"The Nation (Malawi)",
"The Nation (Pakistan)", // canonical
"The Nation (Pakistani newspaper)", // same as The Nation (Pakistan)
"The Nation (Thailand)",
"The National (Abu Dhabi)",
"The Navhind Times",
"The Nelson Mail",
"The New Daily",
"The New Era (newspaper)", // canonical
"The New Indian Express", // canonical
"The New Light of Myanmar",
"The New Mexican", // same as The Santa Fe New Mexican
"The New Orleans Tribune",
"The New Paper",
"The New Times (Rwanda)",
"The New York Clipper",
"The New York Observer", // canonical
"The New York Sun", // canonical
"The New York Times", // canonical
"The New Zealand Herald", // canonical
"The Newberg Graphic",
"The Newcastle Herald",
"The News and Courier", // same as The Post and Courier
"The News and Eastern Townships Advocate",
"The News Herald (Panama City)",
"The News International",
"The News Journal", // canonical
"The News Leader",
"The News Letter",
"The News Today", // dab
"The News Today (Bangladesh)",
"The News Tribune", // canonical
"The News Virginian",
"The News & Observer", // canonical
"The News-Gazette", // dab
"The News-Palladium", // same as The Herald-Palladium
"The News-Press",
"The News-Review",
"The News-Star", // canonical
"The News-Times",
"The News (Portsmouth)", // canonical
"The Nikkei",
"The Non-League Paper",
"The Nonprofit Times",
"The Norman Transcript",
"The North Jefferson News",
"The Northern Echo",
"The Northern Scot",
"The Northern Star",
"The Northern Times",
"The Northwest Arkansas Times",
"The Norwegian American",
"The Oakland Press",
"The Oban Times",
"The Observer",
"The Observer (La Grande)",
"The Observer (Uganda)",
"The Oklahoman",
"The Olympian",
"The Orcadian",
"The Oregonian",
"The Oxford Student",
"The Oxford Times",
"The Palm Beach Post",
"The Pantagraph", // canonical
"The Paris News",
"The Patriot Ledger",
"The Patriot-News", // canonical
"The Pembrokeshire Herald and General Advertiser",
"The Peninsula (newspaper)",
"The People", // same as The Sunday People
"The Philadelphia Inquirer",
"The Philadelphia Record",
"The Philippine Star", // canonical
"The Philippine STAR",
"The Phnom Penh Post",
"The Phoenix (newspaper)", // canonical
"The Pitch (newspaper)",
"The Pitt News",
"The Pittsburgh Courier",
"The Plain Dealer", // canonical
"The Plymouth Evening Herald", // same as The Herald (Plymouth)
"The Polytechnic", // same as The Rensselaer Polytechnic
"The Point (Gambia)",
"The Point (the Gambia)", // canonical
"The Port Arthur News",
"The Portland Phoenix", // same as The Phoenix (newspaper)
"The Portland Observer", // same as Portland Observer (Oregon)
"The Portsmouth Herald",
"The Post", // dab
"The Post and Courier", // canonical
"The Post-Standard",
"The Post-Star",
"The Potpourri",
"The Prescott Courier", // same as The Daily Courier (Arizona)
"The Press",
"The Press and Journal (Scotland)",
"The Press Democrat", // canonical
"The Press of Atlantic City",
"The Press (York)",
"The Press-Enterprise", // canonical / not the same as Press Enterprise (Pennsylvania)
"The Province", // canonical
"The Pueblo Chieftain",
"The Pulse-Journal", // same as Journal-News Pulse
"The Punch",
"The Queensland Times",
"The Rakyat Post",
"The Rand Daily Mail", // canonical
"The Record (Bergen County)",
"The Record (Bergen County, New Jersey)", // canonical
"The Recorder (Massachusetts newspaper)", // canonical
"The Register Guard",
"The Register-Guard", // canonical
"The Register Herald",
"The Register-Herald", // canonical
"The Rensselaer Polytechnic", // canonical
"The Repository", // canonical
"The Republican",
"The Republican (Springfield)",
"The Republican (Springfield, Massachusetts)", // canonical
"The Rising Nepal",
"The Roanoke Times",
"The Rocket (newspaper)",
"The Royal Gazette (Bermuda)",
"The Sacramento Bee", // canonical
"The Sacramento Observer",
"The Salem News",
"The Salina Journal",
"The Saline Courier",
"The Salt Lake Tribune",
"The San Bernardino County Sun", // same as The San Bernardino Sun
"The San Bernardino Sun", // canonical
"The San Diego Reader",
"The San Diego Union", // same as The San Diego Union-Tribune
"The San Diego Union-Tribune", // canonical
"The San Francisco Call",
"The San Francisco Examiner",
"The San Pedro Sun",
"The Santa Fe New Mexican", // canonical
"The Santiago Times",
"The Saratogian",
"The Scots Independent",
"The Scotsman",
"The Scranton Times-Tribune", // canonical
"The Seattle Times", // canonical
"The Sentinel (Staffordshire)",
"The Shreveport Times", // same as The Times (Shreveport)
"The Siasat Daily",
"The Singapore Free Press",
"The Slovak Spectator",
"The Sofia Echo",
"The Southeast Missourian",
"The Southern Star (County Cork)",
"The Sowetan",
"The Spokesman-Review", // canonical
"The Stage",
"The Standard", // several ...
"The Standard (Hong Kong)",
"The Standard (Kenya)",
"The Stanford Daily",
"The Standard-Times (New Bedford)",
"The Star Online", // same as The Star (Malaysia)
"The Star Phoenix", // same as The StarPhoenix
"The Star (Kenya)",
"The Star (Malaysia)", // canonical
"The Star (South Africa)",
"The StarPhoenix", // canonical
"The Star Ledger", // same as The Star-Ledger
"THE STAR-LEDGER",
"The Star-Ledger", // canonical
"The Star Telegram",
"The Star-Telegram", // same as Fort Worth Star-Telegram
"The State (newspaper)",
"The State Journal-Register",
"The Statesman (India)",
"The Straits Times", // canonical
"The Stranger",
"The Stranger (newspaper)", // canonical
"The Student Life",
"The St. Louis American",
"The Sun Daily", // same as The Sun (Hong Kong)
"The Sun (Hong Kong)",
"The Sun (Malaysia)",
"The Sun (New York)",
"The Sun (Nigeria)",
"The Sun (Sydney)",
"The Sun (United Kingdom)", // canonical
"The Sunday Age", // same as The Age
"The Sunday Business Post",
"The Sunday Guardian", // canonical
"The Sunday Independent (South Africa)",
"The Sunday Leader",
"The Sunday Mail", // dab
"The Sunday People", // canonical
"The Sunday Post",
"The Sunday Standard",
"The Sunday Telegraph", // UK
"The Sunday Telegraph (Sydney)",
"The Sunday Times", // canonical
"The Sunday Times (Singapore)", // same as The Straits Times
"The Sunday Times (Sri Lanka)",
"The Sunday Times (Western Australia)", // canonical
"The Sunday Times (UK)",
"The Sun-Herald",
"The Sydney Mail",
"The Sydney Morning Herald", // canonical
"The Tampa Tribune",
"The Taos News",
"The Tart",
"The Taunton Gazette",
"The Tech",
"The Tech (newspaper)",
"The Technique", // same as Technique (newspaper)
"The Telegram",
"The Telegraph",
"The Telegraph (Alton, Illinois)", // canonical
"The Telegraph (Calcutta)", // same as The Telegraph (Kolkata)
"The Telegraph (India)", // same as The Telegraph (Kolkata)
"The Telegraph (Kolkata)", // canonical
"The Telegraph (Nashua)", // canonical
"The Telegraph (UK)", // same as The Daily Telegraph
"The Tennessean",
"The Tide News",
"The Tide (Nigeria)", // canonical
"The Tifton Gazette",
"The Times",
"The Times and Democrat",
"The Times Beacon Record",
"The Times News", // redirect to dab Times-News
"The Times of India", // canonical
"The Times of Israel", // canonical
"The Times Of Israel",
"The Times of Northwest Indiana", // canonical
"The Times of Trenton", // same as The Times (Trenton)
"The Times (Malta)",
"The Times (Shreveport)", // canonical
"The Times (Trenton)", // canonical
"The Times-Picayune", // same as The Times-Picayune/The New Orleans Advocate
"The Times-Picayune/The New Orleans Advocate", // canonical
"The Times-Reporter",
"The Times-Tribune (Corbin)",
"The Times-Tribune (Scranton)", // same as The Scranton Times-Tribune
"The Tolucan Times",
"The Tombstone Epitaph",
"The Topeka Capital-Journal", // canonical
"The Topeka Daily Capital", // same as The Topeka Capital-Journal
"The Town Talk", // canonical
"The Town Talk (Alexandria)",
"The Trentonian",
"The Tribune", // there are many
"The Tribune (Chandigarh)", // canonical
"The Tribune-Democrat", // canonical
"The Trinity Tripod",
"The Tuscaloosa News",
"The Tuskegee News",
"The Ukrainian Weekly",
"The Unesco Courier", // capitalization
"The UNESCO Courier", // canonical
"The Union Democrat",
"The Union (newspaper)",
"The University Daily Kansan",
"The Valdosta Daily Times",
"The Vancouver Observer",
"The Victoria Advocate",
"The Vidette-Messenger", // same as Vidette Times
"The Village Voice", // canonical
"The Villager", // dab
"The Villager (Manhattan)",
"The Villages Daily Sun",
"The Vindicator", // same as The Vindicator (Ohio newspaper)
"The Vindicator (Ohio newspaper)", // canonical
"The Vindicator (Ulster newspaper)",
"The Virgin Islands Daily News",
"The Virginia Pilot", // same as The Virginian-Pilot
"The Virginian-Pilot", // canonical
"The Wall Street Journal", // canonical
"The Wall-Street Journal",
"The Washington Herald",
"The Washington Post", // canonical
"The Washington Post and Times Herald",
"The Washington Post and Times-Herald",
"The Washington Times", // canonical
"The Waterloo-Cedar Falls Courier", // canonical
"The West Australian",
"The West Briton",
"The Western Star (Ohio)",
"The Wichita Eagle",
"The Windsor Daily Star", // same as Windsor Star
"The Winfield Daily Courier",
"The Winnipeg Tribune",
"The York Dispatch",
"The Yorkshire Post",
"The Yorkshire Times",
"The Youngstown Vindicator", // same as The Vindicator (Ohio newspaper)
"theage.com.au", // same as The Age
"thehill.com", // same as The Hill (newspaper)
"thehindu.com", // same as The Hindu
"TheJournal.ie", // canonical
"thenational.ae", // same as The National (Abu Dhabi)
"thesun.co.uk", // same as The Sun (United Kingdom)
"This Day", // canonical
"ThisDay",
"This is South Wales", // same as South Wales Evening Post
"Time Weekly", // dab
"Times Colonist", // canonical
"Times-Colonist (Victoria)",
"Times Daily", // same as TimesDaily
"Times Free Press", // same as Chattanooga Times Free Press
"Times Herald-Record", // canonical
"Times Leader",
"Times of india", // capitalization
"Times Of India", // capitalization
"Times of London", // same as The Times
"Times of Malta",
"Times of Oman",
"Times of Swaziland",
"Times Online", // same as The Times
"Times Record News",
"Times Union (Albany)", // canonical
"Times West Virginian",
"Times & Star",
"Times & Transcript",
"TimesDaily", // canonical
"timesofindia.indiatimes.com", // same as The Times of India
"timesofisrael.com", // same as The Times of Israel
"timesonline.co.uk", // same as The Times
"Timesunion.com", // same as Times Union (Albany)
"timesunion.com",
"Times-Herald Record", // same as Times Herald-Record
"Times-News", // dab
"Times-News (Burlington, North Carolina)",
"Times-Union", // dab
"Tiverton Gazette",
"To Vima",
"Today (Singapore newspaper)",
"Today's Zaman",
"Toledo Blade", // same as The Blade (Toledo, Ohio)
"toledoblade.com",
"Tonawanda News",
"Topeka Capital Journal", // same as The Topeka Capital-Journal
"Toronto Star",
"Toronto Sun", // canonical
"Toronto SUN",
"Townsville Bulletin",
"Travel Trade Gazette",
"Travel Weekly",
"Traverse City Record-Eagle",
"Triangle Business Journal",
"Tribune Chronicle",
"Tribune India", // same as The Tribune (Chandigarh)
"Tribune-Star",
"Trinidad Guardian",
"Trinidad and Tobago Express",
"Trinidad and Tobago Guardian",
"Tri-City Herald",
"Tri-County News (Kiel, Wisconsin)",
"Trouw",
"Trud (Bulgarian newspaper)", // canonical
"Tucson Citizen", // canonical
"Tucson Daily Citizen", // same as Tucson Citizen
"Tulsa World",
"Turkish Daily News", // same as Hürriyet Daily News
"Turks and Caicos Weekly News",
"Tuttosport",
"Tuổi Trẻ", // canonical
"Tuổi trẻ Online", // same as Tuổi Trẻ
"Tyler Morning Telegraph",
"UB Post",
"UCSD Guardian",
"Ukrayinska Pravda",
"Ulster Herald",
"Undercurrent (newspaper)",
"University Times",
"UNESCO Courier",
"USA Today", // canonical
"USA TODAY",
"USAToday",
"USAToday.com",
"Utrinski vesnik", // canonical
"Utrinski vesnik (daily newspaper)",
"Utusan Malaysia",
"U-T San Diego", // same as The San Diego Union-Tribune
"Última Hora (Paraguay)",
"Vail Daily",
"Valley Morning Star",
"Valor Econômico",
"Vancouver Courier",
"Vancouver Free Press",
"Vancouver Province", // same as The Province
"Vancouver Sun", // canonical
"Vanguard Nigeria",
"Vanguard (Nigeria)", // canonical
"Vanguardia", // dab
"Vanuatu Daily Post",
"Varden (newspaper)",
"Vatan",
"Värmlands Folkblad",
"Vedomosti",
"Ventura County Star",
"Verdens Gang",
"Vestmanlands Läns Tidning",
"Victoria Times Colonist", // same as Times Colonist
"Victorville Daily Press",
"Vidette Times",
"Vietnam News", // same as Việt Nam News
"Vijayavani",
"Vijesti",
"Village News and Southwest News", // canonical
"Village News & Southwest News",
"Vineyard Gazette",
"Việt Nam News", // canonical
"Vjesnik",
"Voir",
"Vnexpress",
"VnExpress", // canonical
"VUE Weekly",
"Vue Weekly", // canonical
"Vzglyad (newspaper)",
"Waco Tribune-Herald", // canonical
"WacoTrib.com",
"Wakefield Express",
"Walla Walla Union-Bulletin",
"Waltham Forest Guardian",
"Warrington Guardian",
"Warsaw Business Journal",
"Washington Blade",
"Washington Business Journal",
"Washington City Paper",
"washingtonpost.com",
"Waterford News & Star",
"Waterloo Courier", // same as The Waterloo-Cedar Falls Courier
"Waterloo Region Record",
"Watertown Daily Times",
"Watford Observer",
"WAtoday",
"Wausau Daily Herald",
"WCF Courier", // same as The Waterloo-Cedar Falls Courier
"Weatherford Democrat",
"Weekly Alibi",
"Western Gazette",
"Western Leader",
"Western Mail (Wales)",
"Western Morning News",
"Western People",
"Western Telegraph",
"Westmeath Examiner",
"Westmoreland News",
"Wexford People",
"What's on TV",
"Whitehaven News",
"Whittier Daily News",
"Wichita Business Journal",
"Wilamette Week", // same as Willamette Week
"Willamette Week", // canonical
"Williamson Daily News",
"Williamsport Sun-Gazette",
"Williston Herald",
"Wilmington Morning Star", // same as Star-News
"Wilsonville Spokesman",
"Wiltshire Times",
"Windsor Star", // canonical
"Windy City Times",
"Winnipeg Free Press",
"Winnipeg Sun",
"Winona Daily News",
"Winston-Salem Chronicle",
"Winston-Salem Journal",
"WirtschaftsBlatt",
"Wisconsin Jewish Chronicle",
"Wisconsin State Journal",
"Woodburn Independent",
"Woodstock Sentinel-Review",
"Worcester News",
"wsj.com", // same as The Wall Street Journal
"Wyoming Tribune Eagle", // canonical
"Wyoming Tribune-Eagle",
"Xenia Daily Gazette",
"XPRESS (newspaper)",
"Yangcheng Evening News",
"Yomiuri Shimbun",
"York Daily Record",
"Yorkshire Evening Post",
"Zaman",
"Yale Daily News",
"Yemen Times",
"York News-Times",
"Yuma Daily Sun", // same as Yuma Sun
"Yuma Sun", // canonical
"zeit.de", // same as Die Zeit
"Ziarul Financiar",
"Ziua",
"Аргументы и Факты", // same as Argumenty i Fakty
"Коммерсантъ", // same as Kommersant
};
// register every name listed above in periodical_map with the value "newspaper"
foreach (string newspaper in newspapers)
	periodical_map.Add (newspaper, "newspaper");
//----------< W E B S I T E S >----------
string[] websites = {
"AbsolutePunk",
"Al Bawaba",
"All Movie", // same as AllMovie
"All Movie Guide", // same as AllMovie
"All Music Guide", // same as AllMusic
"AllHipHop", // canonical
"AllHipHop.com",
"allmovie",
"Allmovie",
"AllMovie", // canonical
"allmusic",
"Allmusic",
"AllMusic", // canonical
"Allmusic.com",
"Ars Technica",
"Atlas of U.S. Presidential Elections", // same as Dave Leip's Atlas of U.S. Presidential Elections
"Aviation Safety Network",
"Baseball Reference",
"Baseball-Reference",
"Baseball-Reference.com", // canonical
"Bleacher Report",
"Box Office India", // canonical
"Box Office Mojo", // canonical
"BoxOfficeMojo.com",
"Boxofficeindia.com", // same as Box Office India
"British Comedy Guide",
"Broadway World",
"BroadwayWorld", // canonical
"BroadwayWorld.com",
"Business Insider",
"Catholic-Hierarchy.org",
"Chortle (website)",
"Civil Georgia",
"CNET", // canonical
"Cnet.com",
"CNN Business", // canonical
"CNN Money", // same as CNN Business
"CNNmoney.com", // same as CNN Business
"Collider",
"Collider (website)",
"Comic Book Resources",
"CricketArchive", // no en.wiki article
"Dave Leip's Atlas of U.S. Presidential Elections", // canonical
"Destructoid",
"Detik",
"detik.com", // canonical
"Digital Photography Review",
"Digital Spy",
"DyeStat",
"Elite Daily",
"EurekAlert!",
"Eurogamer", // canonical
"EuroGamer",
"FanHouse",
"Find a grave",
"Find a Grave", // canonical
"FXGuide",
"Gamasutra",
"GameDev.net",
"GameFAQs",
"GamesRadar+",
"GameSpot",
"GCatholic.org",
"Gizmodo",
"gizmodo.com",
"Harvard University Gazette",
"IGN", // canonical
"IGN.com",
"IMDb", // canonical
"imdb.com",
"Inside Higher Ed", // canonical
"Inside Higher Education",
"Inside Philanthropy",
"Internet Movie Database", // same as IMDb
"It's An Honour",
"Jezebel (website)",
"Kotaku",
"Long War Journal",
"MacRumors",
"MarketWatch",
"Medline Plus",
"MedlinePlus", // canonical
"Medscape",
"Metacritic",
"MLB.com",
"Mymovies.it",
"New Advent",
"NintendoLife",
"Nintendo Life", // canonical
"Noise Creep",
"Noisecreep", // canonical
"Okayplayer",
"Phoenix New Times",
"Pitchfork",
"Pitchfork (website)", // canonical
"Polygon",
"Polygon (website)", // canonical
"Polygon.com",
"Pro-Football-Reference",
"Pro-Football-Reference.com", // canonical
"Rec.Sport.Soccer Statistics Foundation", // RSSSF canonical
"Reddit",
"Rotten Tomatoes",
"RSSSF",
"RTÉ",
"RxList",
"Science Daily",
"ScienceDaily", // canonical
"Science-Based Medicine",
"Sci-News.com",
"Screen Rant", // canonical
"Screenonline",
"Screenrant",
"Sherdog",
"Snopes", // canonical
"Snopes.com",
"Sparknotes",
"Sportskeeda",
"Sputnik Music",
"Sputnikmusic", // canonical
"SputnikMusic",
"Stereogum",
"Swissinfo",
"The Arts Desk",
"The Cairo Post",
"The Daily Beast",
"The Futon Critic",
"The Green Papers",
"The Intercept",
"The Numbers (website)",
"The Local",
"The Plant List",
"The Political Graveyard",
"The Raw Story",
"The Register",
"The Smoking Gun",
"The Straight Dope",
"The Verge",
"TheGATE.ca",
"TheWrap", // canonical
"The Wrap",
"Think Progress",
"ThinkProgress", // canonical
"TV Line",
"TV Tonight",
"TVARK",
"uselectionatlas.org", // same as Dave Leip's Atlas of U.S. Presidential Elections
"VG247",
"Vox (website)",
"WebMD",
"Wired.com",
"ZDnet",
"1UP.com",
//----------< S P O R T T E A M S I T E S >----------
"Adelaide United FC",
"Adirondack Thunder",
"Adler Mannheim",
"American Hockey League",
"American National Rugby League",
"Amsterdamsche FC",
"Alaska Aces (ECHL)",
"Allen Americans",
"Arizona Coyotes", // canonical
"Augsburger Panther",
"Bridgeport Sound Tigers",
"Brisbane Roar FC",
"Buffalo Sabres",
"Calgary Flames",
"Canadian Football League Players Association",
"Canadian Football League Players' Association", // canonical
"Cape Breton Screaming Eagles",
"Carolina Hurricanes",
"Central Hockey League",
"Champions Hockey League",
"Chicago Blackhawks",
"Cincinnati Cyclones",
"Colorado Eagles",
"Country Rugby League", // canonical
"Country Rugby league of NSW", // same as Country Rugby League
"Dallas Stars",
"Deutsche Eishockey Liga",
"Dinamo Riga",
"Djurgårdens IF Hockey",
"Düsseldorfer EG",
"EC KAC",
"EC Red Bull Salzburg",
"EC VSV",
"ECHL",
"Edmonton Oilers",
"EHC Biel",
"EHC Black Wings Linz",
"EHC Kloten",
"EHC Red Bull München",
"EHC Wolfsburg", // same as Grizzlys Wolfsburg
"Eisbären Berlin",
"Elmira Jackals",
"ERC Ingolstadt",
"Espoo Blues",
"EV Zug",
"Florida Panthers",
"Fort Worth Brahmas",
"Frisk Asker", // same as IF Frisk Asker
"Färjestad BK", // canonical
"Färjestads BK",
"Genève-Servette HC",
"Graz 99ers",
"Grizzlys Wolfsburg", // canonical
"Hamburg Freezers",
"HC Ambrì-Piotta",
"HC Bozen–Bolzano", // canonical
"HC Davos",
"HC Dinamo Minsk",
"HC Dynamo Pardubice", // canonical
"HC Fribourg-Gottéron",
"HC Innsbruck",
"HC Kometa Brno",
"HC Lada Togliatti",
"HC Lev Praha",
"HC Neftekhimik Nizhnekamsk",
"HC Nové Zámky",
"HC Oceláři Třinec",
"HC Pardubice", // same as HC Dynamo Pardubice
"HC Plzeň",
"HC Pustertal Wölfe", // canonical
"HC Sibir Novosibirsk",
"HC Spartak Moscow",
"HC TPS",
"HC TWK Innsbruck",
"HC Yugra",
"HC České Budějovice", // same as Motor České Budějovice
"HCB South Tyrol", // same as HC Bozen–Bolzano
"HIFK",
"HK Nitra",
"HKm Zvolen", // canonical
"HKM Zvolen",
"HockeyAllsvenskan",
"HPK",
"HV71",
"IF Frisk Asker", // canonical
"IIHF", // same as International Ice Hockey Federation
"International Ice Hockey Federation", // canonical
"International Rugby Board", // same as World Rugby
"Iowa Wild",
"Iraklis F.C.",
"Iraklis F.C. (Thessaloniki)", // canonical
"IRUPA", // same as Rugby Players Ireland
"Iserlohn Roosters",
"Island Storm",
"Kalamazoo Wings",
"Kontinental Hockey League",
"Lahti Pelicans",
"Lausanne HC",
"Lehigh Valley Phantoms",
"LeKi", // same as Lempäälän Kisa
"Leksands IF",
"Lempäälän Kisa", // canonical
"Liga Portuguesa de Futebol Profissional", // canonical
"Linköpings HC",
"LPFP", // same as Liga Portuguesa de Futebol Profissional
"Lukko",
"Luleå HF",
"Malmö Redhawks",
"Melbourne Victory FC",
"Metallurg Magnitogorsk",
"Mikkelin Jukurit",
"Milwaukee Admirals",
"Modo Hockey",
"Montreal Canadiens",
"Motor České Budějovice", // canonical
"Mountfield HK",
"Nashville Predators",
"National Football League", // canonical
"National Hockey League", // canonical
"National Hockey League Players' Association",
"National Rugby League",
"New Jersey Devils",
"New York Islanders",
"New York Rangers",
"NFL.com", // same as National Football League
"NHL.com", // same as National Hockey League
"Ontario Hockey League",
"Orlando Solar Bears (ECHL)",
"Orli Znojmo",
"Ottawa Redblacks",
"Ottawa Senators",
"Parramatta Eels",
"Philadelphia Flyers",
"Phoenix Coyotes", // same as Arizona Coyotes
"Polish Football Association", // canonical
"Portland Pirates",
"Portland Winterhawks",
"Providence Bruins",
"Pustertal-Val Pusteria Wolves", // same as HC Pustertal Wölfe
"PZPN", // same as Polish Football Association
"Queensland Rugby League",
"Rangers FC",
"Rangers F.C.", // canonical
"Rapid City Rush",
"Reading Royals",
"Rugby Players Ireland",
"Rugby Union Players' Association", // canonical
"RUPA", // same as Rugby Union Players' Association
"SaiPa",
"San Jose Sharks",
"Saracens F.C.",
"SC Bern",
"SC Langenthal",
"Schwenninger Wild Wings",
"Sheffield Steelers",
"Skellefteå AIK",
"Starbulls Rosenheim",
"Stockton Heat",
"St. John's IceCaps",
"Swedish Hockey League",
"Swedish Ice Hockey Association",
"Södertälje SK",
"Tappara",
"Thomas Sabo Ice Tigers",
"Toronto Maple Leafs",
"Toronto Marlies",
"Traktor Chelyabinsk",
"United States Hockey League",
"Vienna Capitals",
"Vålerenga Ishockey",
"World Rugby", // canonical
"ZSC Lions",
};
foreach (string website in websites)
periodical_map.Add (website, "website");
//----------< N E W S W O R K S >----------
string[] newsworks = {
"AgoraVox",
"All Africa", // same as AllAfrica.com
"AllAfrica.com", // canonical
"Alternative Addiction",
"Alternet",
"Anime News Network", // canonical
"AnimeNewsNetwork",
"AsiaOne",
"banglanews24", // same as Bdnews24.com
"Bdnews24.com",
"BBC Online", // canonical
"BBC news",
"BBC News", // canonical
"BBC News Asia",
"BBC News Online",
"BBC Sport",
"bbc.co.uk", // same as BBC Online
"BBC.co.uk",
"Bernews",
"Blabbermouth",
"Blabbermouth.net", // canonical
"Bollywood Hungama",
"BVI News",
"CBSSports.com",
"CNS News", // same as CNSNews.com
"CNSNews.com", // canonical
"Colombia Reports",
"Comics Alliance",
"ComicsAlliance", // canonical
"cricinfo", // same as ESPNcricinfo
"Cricinfo",
"Crosscut.com",
"Cycling News",
"Cyclingnews.com", // canonical
"DailyTech",
"Defense News",
"Digital Journal",
"Dread Central",
"Duowei",
"Duowei News", // canonical
"Duowei Times", // same as Duowei News
"ESPNcricinfo", // canonical
"espncricinfo.com",
"ESPN scrum",
"ESPN Scrum",
"ESPNscrum", // canonical
"ESPNScrum",
"Eurasianet", // canonical
"Eurasianet.org",
"Firstpost",
"FiveThirtyEight",
"Flightglobal",
"FlightGlobal", // canonical
"FlightGlobal.com",
"Free Malaysia Today",
"Gazeta Express",
"GeekWire",
"Gigwise", // canonical
"Gigwise.com",
"Global Post",
"GlobalPost", // canonical
"Guardian Unlimited", // same as TheGuardian.com
"guardian.co.uk",
"Guardian.co.uk", // same as TheGuardian.com
"HitFix",
"Hollywood.com",
"huffington Post",
"Huffington Post",
"Huffington Post Canada",
"Huffington Post UK",
"huffingtonpost.com",
"HuffPost",
"Independent Online (South Africa)",
"Indian Country Today",
"Indie Wire",
"indieWire",
"Indiewire",
"IndieWire", // canonical
"Indie-Wire",
"Inquisitr", // canonical
"IT News Africa",
"Latin Times",
"Malaysiakini",
"Medical News Today",
"Metal Injection",
"Mississippi Today",
"Mizzima News",
"MMAjunkie.com",
"Mondoweiss",
"Mother Nature Network",
"MovieWeb",
"Newsarama",
"News.com.au",
"news.com.au", // canonical
"NK News",
"Neon Tommy",
"News24",
"NJ.com", // canonical
"nj.com",
"OhmyNews",
"Oregon Live",
"OregonLive.com", // canonical
"Pegasus News",
"PhysOrg", // same as Phys.org
"Phys.org", // canonical
"PandoDaily",
"PolitiFact.com",
"Quartz",
"QUARTZ",
"Quartz (publication)", // canonical
"Rappler", // canonical
"Rappler.com",
"Recode",
"Reuters", // news agency; keep?
"Reuters UK",
"Rogerebert.com",
"Rugby Heaven",
"Salon",
"Salon (magazine)",
"Salon (website)", // canonical
"Salon.com",
"San Diego Gay & Lesbian News",
"Scottish Legal News",
"Scrum.com", // same as ESPNscrum
"Southeast European Times",
"Space Fellowship",
"space.com",
"Space.com", // canonical
"Speedcafe",
"Spiegel online",
"Spiegel Online", // canonical
"Sportnet.hr",
"Stuff.co.nz",
"Suburban News", // same as NJ.com
"Sudan Tribune",
"SwimSwam",
"TCPalm",
"Tech Radar",
"TechCrunch", // canonical
"techcrunch.com",
"TechRadar", // canonical
"The Athletic",
"The Aviation Herald",
"The Cleveland Leader",
"The Conversation",
"The Daily Caller",
"The Daily Dot",
"The Daily Signal",
"The Huffington Post",
"The Inquisitr News", // same as Inquisitr
"The Texas Tribune",
"theguardian.com",
"Theguardian.com",
"TheGuardian.com", // canonical
"TMZ",
"tmz.com",
"Tokyo Reporter",
"Truthout.org",
"TVLine", // canonical
"TVLine.com",
"Twin Cities Daily Planet",
"Universe Today",
"Universetoday.com",
"VentureBeat",
"Wales online",
"Wales Online",
"Walesonline",
"WalesOnline", // canonical
"Washington Free Beacon",
"Yahoo News", // same as Yahoo! News
"Yahoo Sports", // canonical
"Yahoo Finance",
"Yahoo! Finance", // canonical
"Yahoo! News", // canonical
"Yahoo! Sports",
"Ynet",
"Ynetnews", // same as Ynet
"ZDNet",
"zdnet.com",
};
foreach (string newswork in newsworks)
periodical_map.Add (newswork, "work");
// here we map parameter names from periodical_map to appropriate template names
Dictionary<string, string> template_map = new Dictionary<string, string>();
template_map.Add("dictionary", "cite dictionary");
template_map.Add("encyclopedia", "cite encyclopedia");
template_map.Add("journal", "cite journal");
template_map.Add("magazine", "cite magazine");
template_map.Add("newspaper", "cite news");
template_map.Add("website", "cite web");
template_map.Add("work", "cite news");
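//
// worked example of the two lookups (illustrative only; assumes param_name_get() finds an exact match and that
// no other periodical parameter is present in the template):
// periodical_map["TechCrunch"] is "work" and template_map["work"] is "cite news", so a recognized
// |publisher=TechCrunch becomes |work=TechCrunch in a {{cite news}} template
// periodical_map["The Verge"] is "website" and template_map["website"] is "cite web", so a recognized
// |publisher=The Verge becomes |website=The Verge in a {{cite web}} template
//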
//---------------------------< S T A R T >--------------------------------------------------------------------
ArticleText = hide (ArticleText, IS_CS1_PERIODICAL); // hide all templates that aren't cs1 periodical templates & hide wikilinks
//---------------------------< P U B L I S H E R W I T H I T A L I C M A R K U P >----------------------
//
// sub-task 1
//
// For ...|publisher=''<periodical name>'' |... (balanced markup at beginning and end - trailing whitespace ignored)
//
// This for periodicals (newspapers, magazines, ...) listed in the periodical_map[]. May include 'domain' names
// Salon.com and the like.
//
// |publisher= value should not have italic (or bold) wiki markup; These are usually 'work' put in the wrong
// parameter and then given the markup so that the citation looks correct after rendering
// ...|publisher=''[[The New York Times]]'' |...
//
// in this case the template should be renamed to {{cite news}} (if a cs1 template) and |publisher= renamed to
// |newspaper= and the wiki markup stripped
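//
// illustrative before and after (hypothetical citation; assumes template_rename() and nparameter_rename()
// behave as their comments here describe, whitespace handling aside):
// before: {{cite news |title=Example story |publisher=''TechCrunch''}}
// after: {{cite news |title=Example story |work=TechCrunch}}
// 'TechCrunch' is listed in newsworks[] so periodical_map[] supplies 'work' and template_map[] supplies 'cite news'
//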
pattern = @"\{\{\s*(" + IS_CS1_PERIODICAL + @")[^}]*\|\s*publisher\s*=\s*'{2,3}([^\|\}]+?)'{2,3}\s*[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a fixed citation template is assembled here
string raw_template = match.Groups[0].Value; // the whole citation template; if we can't fix the template then return raw_template
string raw_template_name = match.Groups[1].Value; // the template name in case we need to change it
string raw_periodical_name = match.Groups[2].Value; // the name inside the italic wiki markup in |publisher=; if wikilinked will have the hide keywords
string periodical_name = periodical_name_get (raw_periodical_name); // get periodical name stripped of wikilink markup
string periodical_param = param_name_get (periodical_map, periodical_name);
if (@"" == periodical_param)
{
unrecognized_periodical_count++; // bump the counter
return raw_template; // and return the raw, unmodified template
}
fixed_template = empty_param_remove (raw_template); // make a copy with all empty parameters removed from this template
pattern = @"\|\s*" + IS_PERIODICAL_PARAM; // if any periodical parameters remain (not empty), abandon this citation
if (Regex.Match (fixed_template, pattern).Success)
{
periodical_param_conflict_count++;
return raw_template;
}
fixed_template = template_rename (fixed_template, template_map[periodical_param]); // rename the template
fixed_template = nparameter_rename (fixed_template, @"publisher", periodical_param, raw_periodical_name); // rename the parameter & remove italic wiki markup
fixed_count_ital++;
gSkip = false;
return fixed_template;
});
}
//---------------------------< P U B L I S H E R U N B A L A N C E D I T A L I C M A R K U P >----------
//
// sub-task 2
//
// For ...|publisher=''<periodical name> |... (unbalanced markup at beginning only)
//
// This for periodicals (newspapers, magazines, ...) listed in the periodical_map[]. May include 'domain' names
// Salon.com and the like.
//
// |publisher= value should not have italic (or bold) wiki markup; These are usually 'work' put in the wrong
// parameter and then given the markup so that the citation looks correct after rendering
// ...|publisher=''[[The New York Times]] |...
//
// in this case the template should be renamed to {{cite news}} (if a cs1 template) and |publisher= renamed to
// |newspaper= and the wiki markup stripped
pattern = @"\{\{\s*(" + IS_CS1_PERIODICAL + @")[^}]*\|\s*publisher\s*=\s*('{2}([^\|\}]+)\s*)[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a fixed citation template is assembled here
string raw_template = match.Groups[0].Value; // the whole citation template; if we can't fix the template then return raw_template
string raw_template_name = match.Groups[1].Value; // the template name in case we need to change it
string raw_publisher_value = match.Groups[2].Value; // the whole value assigned to |publisher= including italic markup
string raw_periodical_name = match.Groups[3].Value; // the name that follows the opening italic wiki markup in |publisher=; if wikilinked will have the hide keywords
if (1 != substr_count (raw_publisher_value, @"''")) // abandon when publisher value has more than one '' markup
return raw_template;
string periodical_name = periodical_name_get (raw_periodical_name); // get periodical name stripped of wikilink markup
string periodical_param = param_name_get (periodical_map, periodical_name);
if (@"" == periodical_param)
{
unrecognized_periodical_count++; // bump the counter
return raw_template; // and return the raw, unmodified template
}
fixed_template = empty_param_remove (raw_template); // make a copy with all empty parameters removed from this template
pattern = @"\|\s*" + IS_PERIODICAL_PARAM; // if any periodical parameters remain (not empty), abandon this citation
if (Regex.Match (fixed_template, pattern).Success)
{
periodical_param_conflict_count++;
return raw_template;
}
fixed_template = template_rename (fixed_template, template_map[periodical_param]); // rename the template
fixed_template = nparameter_rename (fixed_template, @"publisher", periodical_param, raw_periodical_name); // rename the parameter & remove italic wiki markup
fixed_count_ital++;
unbalanced_count++;
gSkip = false;
return fixed_template;
});
}
//---------------------------< C I T E W E B W I T H I T A L I C P U B L I S H E R >------------------
//
// sub-task 3
//
// This for cite web where |publisher=''<domain name>''. Must end with '.tld' where 'tld' is two or more lower-
// case letters; For the purposes of this section, upper-case letters are considered to be styling that might
// better be handled by an entry in the periodical_map[].
//
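// illustrative |publisher= values (hypothetical domain names) tested against the pattern below:
// ''example.com'' matched; renamed to |website=example.com unless a periodical parameter is already present
// ''news.example.co.uk'' matched
// ''Example.com'' not matched because of the upper-case letter
// ''example'' not matched because there is no '.tld'
//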
// pattern = @"\{\{\s*[Cc]ite ?web[^}]*\|\s*publisher\s*=\s*'{2}([^\|\}]+\.[a-z]{2,})'{2}\s*[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
pattern = @"\{\{\s*[Cc]ite ?web[^}]*\|\s*publisher\s*=\s*'{2}([a-z\d\-\.]+\.[a-z]{2,})'{2}\s*[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a fixed citation template is assembled here
string raw_template = match.Groups[0].Value; // the whole citation template; if we can't fix the template then return raw_template
string raw_domain_name = match.Groups[1].Value; // the domain name inside the italic wiki markup in |publisher=
fixed_template = empty_param_remove (raw_template); // make a copy with all empty parameters removed from this template
pattern = @"\|\s*" + IS_PERIODICAL_PARAM; // if any periodical parameters remain (not empty), abandon this citation
if (Regex.Match (fixed_template, pattern).Success)
{
web_param_conflict_count++;
return raw_template;
}
fixed_template = nparameter_rename (fixed_template, @"publisher", @"website", raw_domain_name); // rename the parameter & remove italic wiki markup
web_fixed_count++;
gSkip = false;
return fixed_template;
});
}
//---------------------------< P U B L I S H E R W I T H O U T M A R K U P >------------------------------
//
// sub-task 4
//
// For ...|publisher=<periodical name> |...
//
// This for periodicals (newspapers, magazines, ...) listed in the periodical_map[]. May include 'domain' names
// Salon.com and the like.
// ...|publisher=[[The New York Times]] |...
//
// in this case the template should be renamed to {{cite news}} (if a cs1 template) and |publisher= renamed to
// |newspaper=
//
// if <periodical name> not found in periodical_map[], this sub-task does not bump the unrecognized counter
// because we can't know if |publisher=<periodical name> was intended to render as a periodical
//
pattern = @"\{\{\s*(" + IS_CS1_PERIODICAL + @")[^}]*\|\s*publisher\s*=\s*([^\|\}]+)\s*[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a fixed citation template is assembled here
string raw_template = match.Groups[0].Value; // the whole citation template; if we can't fix the template then return raw_template
string raw_template_name = match.Groups[1].Value; // the template name in case we need to change it
string raw_periodical_name = match.Groups[2].Value; // the name in |publisher=; if wikilinked will have the hide keywords
string periodical_name = periodical_name_get (raw_periodical_name); // get periodical name stripped of wikilink markup
string periodical_param = param_name_get (periodical_map, periodical_name); // get appropriate parameter name
if (@"" == periodical_param)
return raw_template; // and return the raw, unmodified template
fixed_template = empty_param_remove (raw_template); // make a copy with all empty parameters removed from this template
pattern = @"\|\s*" + IS_PERIODICAL_PARAM; // if any periodical parameters remain (not empty), abandon this citation
if (Regex.Match (fixed_template, pattern).Success)
{
periodical_param_conflict_count++;
return raw_template;
}
fixed_template = template_rename (fixed_template, template_map[periodical_param]); // rename the template
pattern = @"(\|\s*)publisher"; // replace 'publisher' with periodical parameter
fixed_template = Regex.Replace (fixed_template, pattern, "$1" + periodical_param);
fixed_count++;
gSkip = false;
return fixed_template;
});
}
//---------------------------< W O R K W I T H I T A L I C M A R K U P >--------------------------------
//
// sub-task 5
//
// This for periodicals (newspapers, magazines, ...) listed in the periodical_map[]. May include 'domain' names
// Salon.com and the like.
//
// |work= (and aliases) value should not have italic (or bold) wiki markup;
// ...|website=''Los Angeles Times'' |...
//
// in this case the template should be renamed to {{cite news}} (if a cs1 template) and |website= renamed to
// |newspaper= and the wiki markup stripped
//
// in this sub-task, we only replace periodical parameter / template name when we recognize the periodical name
// (periodical_param has a value). Wiki markup around the value in |<periodical>= will be stripped in the sub-task 5.
//
pattern = @"\{\{\s*(" + IS_CS1_PERIODICAL + @")[^}]*\|\s*" + IS_PERIODICAL_PARAM + @"\s*=\s*'{2,3}([^\|\}]+?)'{2,3}\s*[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a fixed citation template is assembled here
string raw_template = match.Groups[0].Value; // the whole citation template; if we can't fix the template then return raw_template
string raw_template_name = match.Groups[1].Value; // the template name in case we need to change it
string raw_periodical_name = match.Groups[2].Value; // the name inside the italic wiki markup in |<periodical>=; if wikilinked will have the hide keywords
string periodical_name = periodical_name_get (raw_periodical_name); // get periodical name stripped of wikilink markup
string periodical_param = param_name_get (periodical_map, periodical_name);
if (@"" == periodical_param)
{
unrecognized_work1_count++; // bump the counter
return raw_template; // and return the raw, unmodified template
}
fixed_template = empty_param_remove (raw_template); // make a copy with all empty parameters removed from this template
fixed_template = template_rename (fixed_template, template_map[periodical_param]); // rename the template
fixed_template = nparameter_rename (fixed_template, IS_PERIODICAL_PARAM, periodical_param, raw_periodical_name); // rename the parameter & remove italic wiki markup
work1_fixed_count++;
gSkip = false;
return fixed_template;
});
}
//---------------------------< W O R K U N B A L A N C E D I T A L I C M A R K U P >--------------------
//
// sub-task 5a
//
// This for periodicals (newspapers, magazines, ...) listed in the periodical_map[]. May include 'domain' names
// Salon.com and the like where the markup is unbalanced (beginning markup only).
//
// |work= (and aliases) value should not have italic (or bold) wiki markup;
// ...|website=''Los Angeles Times |...
//
// in this case the template should be renamed to {{cite news}} (if a cs1 template) and |website= renamed to
// |newspaper= and the wiki markup stripped
//
// in this sub-task, we only replace periodical parameter / template name when we recognize the periodical name
// (periodical_param has a value). Wiki markup around the value in |<periodical>= will be stripped in sub-task 5a.
//
pattern = @"\{\{\s*(" + IS_CS1_PERIODICAL + @")[^}]*\|\s*" + IS_PERIODICAL_PARAM + @"\s*=\s*'{2,3}([^\|\}]+)\s*[\|\}][^\}]*\}"; // last } omitted for when publisher is last param in template
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a fixed citation template is assembled here
string raw_template = match.Groups[0].Value; // the whole citation template; if we can't fix the template then return raw_template
string raw_template_name = match.Groups[1].Value; // the template name in case we need to change it
string raw_periodical_name = match.Groups[2].Value; // the name that follows the opening italic wiki markup in |<periodical>=; if wikilinked will have the hide keywords
string periodical_name = periodical_name_get (raw_periodical_name); // get periodical name stripped of wikilink markup
string periodical_param = param_name_get (periodical_map, periodical_name);
if (@"" == periodical_param)
{
unrecognized_work1_count++; // bump the counter
return raw_template; // and return the raw, unmodified template
}
fixed_template = empty_param_remove (raw_template); // make a copy with all empty parameters removed from this template
fixed_template = template_rename (fixed_template, template_map[periodical_param]); // rename the template
fixed_template = nparameter_rename (fixed_template, IS_PERIODICAL_PARAM, periodical_param, raw_periodical_name); // rename the parameter & remove italic wiki markup
work1_fixed_count++;
unbalanced_count++;
gSkip = false;
return fixed_template;
});
}
//---------------------------< E X T R A N E O U S T E X T C O U N T >------------------------------------
//
// matches |publisher=<anything>''<periodical name>''<anything but [|}]> so both of these match:
// ...| publisher = ''The Hindu Business Line'', 22 October 2001}}
// ...|publisher=''[[The Spring Observer]]'' at the ''[[Houston Chronicle]]''|...
//
// the first example increments ext_text_count because, after trimming whitespace, the value does not both begin
// and end with italic markup
//
// the second example begins and ends with italic markup so 'looks' like a single italicized name; it increments
// ext_text_count because it holds more than two '' markers
//
pattern = @"\{\{\s*" + IS_CS1_PERIODICAL + @"[^}]*\|\s*publisher\s*=([^\|\}]*?'{2}[^\|\}]+?'{2}[^\|\}]+)[\|\}]";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace (ArticleText, pattern, // done this way so that we can get a count
delegate(Match match)
{
string raw_periodical_name = match.Groups[1].Value; // |publisher= value
raw_periodical_name = raw_periodical_name.Trim(); // trim leading and trailing whitespace
pattern = @"^'{2}.+'{2}$"; // if first and last characters not italic markup
if (!Regex.Match (raw_periodical_name, pattern).Success)
ext_text_count++;
else
{
int count = substr_count (raw_periodical_name, @"''");
if (2 < count)
ext_text_count++;
}
return match.Groups[0].Value; // not replacing anything so always return raw match
});
}
//---------------------------< H I D E S W I T C H >--------------------------------------------------------
//
// switch from hiding all but cs1 periodical templates to hiding all but cs1 non-periodical templates
//
ArticleText = unhide (ArticleText); // unhide all templates
ArticleText = hide (ArticleText, IS_CS1_NON_PERIODICAL); // hide all templates that aren't cs1 non-periodical templates
//---------------------------< C S 1 P U B L I S H E R & W I K I M A R K U P >-----------------------
//
// sub-task 6
//
// removes italic (and bold) markup from |publisher= in cs1 non-periodical templates
//
pattern = @"(\{\{\s*" + IS_CS1_NON_PERIODICAL + @"[^}]*\|\s*publisher\s*=\s*)'{3}([^\|\}]+)'{3}(\s*[\|\}])"; // bold
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace (ArticleText, pattern, // done this way so that we can get a count
delegate(Match match)
{
string fixed_template = match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
publisher_fixed_count++;
gSkip = false;
return empty_param_remove (fixed_template); // remove empty parameters and done
});
}
pattern = @"(\{\{\s*" + IS_CS1_NON_PERIODICAL + @"[^}]*\|\s*publisher\s*=\s*)'{2}([^\|\}]+)'{2}(\s*[\|\}])"; // italic
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace (ArticleText, pattern, // done this way so that we can get a count
delegate(Match match)
{
string fixed_template = match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
publisher_fixed_count++;
gSkip = false;
return empty_param_remove (fixed_template); // remove empty parameters and done
});
}
//---------------------------< H I D E S W I T C H >--------------------------------------------------------
//
// switch from hiding all but cs1 non-periodical templates to hiding all but cs1|2 templates
//
ArticleText = unhide (ArticleText); // unhide all templates
ArticleText = hide (ArticleText, IS_CS1); // hide all templates that aren't cs1|2 templates
//---------------------------< C S 1 | 2 W O R K & W I K I M A R K U P >-----------------------------
//
// sub-task 7
//
// DISABLED FOR EARLY RUNS OF THIS BOT -- SEE DOCUMENTATION FOR SUB-TASK 6
//
// removes italic (and bold) markup from |work= (and aliases)
//
/* pattern = @"(\{\{\s*" + IS_CS1 + @"[^}]*\|\s*" + IS_PERIODICAL_PARAM + @"\s*=\s*)'{3}([^\|\}]+)'{3}(\s*[\|\}])"; // bold
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace (ArticleText, pattern, // done this way so that we can get a count
delegate(Match match)
{
string fixed_template = match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
work_fixed_count++;
gSkip = false;
return empty_param_remove (fixed_template); // remove empty parameters and done
});
}
pattern = @"(\{\{\s*" + IS_CS1 + @"[^}]*\|\s*" + IS_PERIODICAL_PARAM + @"\s*=\s*)'{2}([^\|\}]+)'{2}(\s*[\|\}])"; // italic
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace (ArticleText, pattern, // done this way so that we can get a count
delegate(Match match)
{
string fixed_template = match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value;
work_fixed_count++;
gSkip = false;
return empty_param_remove (fixed_template); // remove empty parameters and done
});
}
*/
//---------------------------< C S 2 S K I P C O U N T >--------------------------------------------------
//
// cs2 ({{citation}}) is skipped because it isn't always possible to know if the citation refers to a periodical or to a book.
//
// cite book may use |publisher= but not with italic markup; remove the italic markup;
// disabled because editors using cite book for stuff other than books:
// {{cite book|title=Dirty Dancing|date=September 3, 2000|publisher='' [[The E! True Hollywood Story]]''}}
// {{cite book|title=Billboard 7 February 1998|publisher=''[[Billboard (magazine)|Billboard]]''|url=https://books.google.com/books?id=fQ0EAAAAMBAJ&lpg=PA59&dq=Billboard%20%22denmark%22%20%22ifpi%2Fnielsen%22%201994&hl=da&pg=PA59#v=onepage&q&f=false|accessdate=2010-12-01}}
//
pattern = @"(\{\{\s*(?:[Cc]itation|[Cc]ite ?book))([^}]*\|\s*publisher\s*=\s*'{2}[^\|\}]+'{2}\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string cap1 = match.Groups[1].Value; // prefix: opening {{ and template name
string cap2 = match.Groups[2].Value; // from after the template name through the end of the |publisher= value
cs2_skip_count++; // bump the count
return cap1 + @"__1NV4L__" + cap2; // 'invalidate' the template name (so we don't loop here for ever)
});
}
pattern = @"(\{\{\s*(?:[Cc]itation|[Cc]ite ?book))__1NV4L__"; // restore the invalidated templates
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
//---------------------------< F I N I S H >------------------------------------------------------------------
ArticleText = unhide (ArticleText); // unhide all that is hidden
if (true == gSkip)
Summary = @""; // if skipping, no public edit-summary lead
else
Summary = "[[User:Monkbot/task_14: repair improper use of publisher params in cs1 templates|Task 14]]:";
// Summary = "[[User:Monkbot/task_14: repair improper use of publisher params in cs1 templates|Task 14]] ([[Wikipedia:Bots/Requests_for_approval/Monkbot_14|BRFA testing]]):";
// Summary = "[[User:Monkbot/task_14: repair improper use of publisher params in cs1 templates|Task 14]] (developmental testing):";
Summary = Summary + @" cs1 template fixes: misused |publisher= (" + fixed_count_ital + @"×/" + fixed_count + @"×);";
if (0 != unbalanced_count)
Summary = Summary + @" unbalanced (" + unbalanced_count + @"×);";
if (0 != unrecognized_periodical_count || 0 != periodical_param_conflict_count)
Summary = Summary + @" skipped:";
if (0 != unrecognized_periodical_count)
Summary = Summary + @" unrecognized periodical (" + unrecognized_periodical_count + @"×);";
if (0 != periodical_param_conflict_count)
Summary = Summary + @" conflicting periodical (" + periodical_param_conflict_count + @"×);";
if (0 != web_fixed_count)
Summary = Summary + @" fixed web site (" + web_fixed_count + @"×);";
if (0 != web_param_conflict_count)
Summary = Summary + @" skipped conflicting website (" + web_param_conflict_count + @"×);";
if (0 != work1_fixed_count)
Summary = Summary + @" fixed work alias (" + work1_fixed_count + @"×);";
if (0 != publisher_fixed_count)
Summary = Summary + @" removed markup from cs1 publisher (" + publisher_fixed_count + @"×);";
if (0 != work_fixed_count)
Summary = Summary + @" removed markup from cs1|2 work alias (" + work_fixed_count + @"×);";
if (0 != cs2_skip_count)
Summary = Summary + @" book/cs2 skip (" + cs2_skip_count + @"×);";
if (0 != ext_text_count)
Summary = Summary + @" ext text skip (" + ext_text_count + @"×);";
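// illustrative result (hypothetical counts): when gSkip is false, fixed_count_ital is 2, fixed_count is 1,
// unrecognized_periodical_count is 1, and all other counters are 0, Summary is:
// [[User:Monkbot/task_14: repair improper use of publisher params in cs1 templates|Task 14]]: cs1 template fixes: misused |publisher= (2×/1×); skipped: unrecognized periodical (1×);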
Skip = gSkip;
return ArticleText;
}
//===========================<< S U P P O R T >>==============================================================
//---------------------------< H I D E >----------------------------------------------------------------------
//
// HIDE TEMPLATES: find templates that are not <dont_hide>; replace the opening {{ with __0P3N__, the closing }}
// with __CL0S3__, and internal | (pipes) with __P1P3__
//
// single curly braces in urls and other parameter values can confuse other regex in this code so replace {
// with __0CU!21Y__ and } with __CCU!21Y__
//
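// illustrative example (hypothetical wikitext), assuming <dont_hide> matches 'cite news' and not 'reflist':
// {{cite news |title=A |work=[[Salon (website)|Salon]]}} {{reflist|2}}
// becomes:
// {{cite news |title=A |work=__WL1NK_O__Salon (website)__P1P3__Salon__WL1NK_C__}} __0P3N__reflist__P1P3__2__CL0S3__
//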
private string hide (string ArticleText, string dont_hide)
{
string pattern = @"\{\{(?!\s*" + dont_hide + @")[^\{\}]*\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string fixed_template; // a hidden template is assembled here
string raw_template = match.Groups[0].Value; // the whole template
pattern = @"\{\{"; // hide the opening {{
fixed_template = Regex.Replace (raw_template, pattern, "__0P3N__");
pattern = @"\}\}"; // hide the closing }}
fixed_template = Regex.Replace (fixed_template, pattern, "__CL0S3__");
pattern = @"\|"; // and hide the pipes
fixed_template = Regex.Replace (fixed_template, pattern, "__P1P3__");
return fixed_template;
});
}
pattern = @"([^\{])\{([^\{])"; // single opening curly brace
ArticleText = Regex.Replace(ArticleText, pattern, "$1__0CU!21Y__$2");
pattern = @"([^\}])\}([^\}])"; // single closing curly brace
ArticleText = Regex.Replace(ArticleText, pattern, "$1__CCU!21Y__$2");
pattern = @"\[\[(?![Ff]ile|[Ii]mage)([^\|\]]+)\|([^\]]+)\]\]"; // HIDE complex wikilinks: [[article title|label]] to __WL1NK_O__article title__P1P3__label__WL1NK_C__
ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__P1P3__$2__WL1NK_C__"); // [[File: with wikilinks inside can be confusing
pattern = @"\[\[([^\]]+)\]\]"; // HIDE simple wikilinks: [[article title]] to __WL1NK_O__article title__WL1NK_C__
ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__WL1NK_C__");
return ArticleText;
}
//---------------------------< U N H I D E >------------------------------------------------------------------
//
// UNHIDE TEMPLATES: find templates and wikilinks that are hidden; replace the 'hide' keywords with the
// appropriate wiki markup
//
private string unhide (string ArticleText)
{
ArticleText = Regex.Replace(ArticleText, @"__WL1NK_O__", "[["); // UNHIDE: replace __WL1NK_O__ with [[
ArticleText = Regex.Replace(ArticleText, @"__WL1NK_C__", "]]"); // UNHIDE: replace __WL1NK_C__ with ]]
ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|"); // UNHIDE: replace __P1P3__ with |
ArticleText = Regex.Replace(ArticleText, @"__0CU!21Y__", "{"); // UNHIDE: replace __0CU!21Y__ with {
ArticleText = Regex.Replace(ArticleText, @"__CCU!21Y__", "}"); // UNHIDE: replace __CCU!21Y__ with }
ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{"); // UNHIDE: replace __0P3N__ with {{
ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}"); // UNHIDE: replace __CL0S3__ with }}
return ArticleText;
}
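//---------------------------< E X A M P L E :   H I D E   R O U N D   T R I P >------------------------------
//
// Illustrative sketch only: not part of the original module and not called by it. The method name, the sample
// wikitext, and the use of a bare template name for <dont_hide> are assumptions made for this example.
// It shows a template nested inside a <dont_hide> template, and a piped wikilink, being hidden and then restored.
//
private bool example_hide_round_trip ()
{
string text = "{{cite news|title=A {{small|b}} c}} and [[Foo|bar]]";
string hidden = hide (text, "cite news"); // the outer {{cite news}} is left alone; the inner template is hidden
// hidden is expected to be:
// {{cite news|title=A __0P3N__small__P1P3__b__CL0S3__ c}} and __WL1NK_O__Foo__P1P3__bar__WL1NK_C__
return unhide (hidden) == text; // the round trip should restore the original wikitext
}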
//---------------------------< P E R I O D I C A L _ N A M E _ G E T >----------------------------------------
//
// returns the <periodical name> from one of these forms:
// <periodical name>
// [[<periodical name>]]
// [[<periodical name>|<label>]]
//
private string periodical_name_get (string raw_periodical_name)
{
string periodical_name;
periodical_name = raw_periodical_name; // make a copy
periodical_name = Regex.Replace (periodical_name, @"__WL1NK_[OC]__", ""); // replace hide wikilink open and close with empty strings
periodical_name = Regex.Replace (periodical_name, @"__P1P3__.*", ""); // replace hide wikilink pipe and label with empty strings
periodical_name = Regex.Replace(periodical_name, @"_+", " "); // replace each run of underscores with a single space character
periodical_name = periodical_name.Trim(); // remove leading and trailing whitespace
return periodical_name;
}
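//---------------------------< E X A M P L E :   P E R I O D I C A L _ N A M E _ G E T >----------------------
//
// Illustrative sketch only: not part of the original module and not called by it. The method name and the sample
// values are assumptions. Wikilinks are written in their hidden form because that is the form the function's
// regexes look for.
//
private bool example_periodical_name_get ()
{
string plain = periodical_name_get (" The Independent "); // no link markup; only trimmed
string simple = periodical_name_get ("__WL1NK_O__The Independent__WL1NK_C__"); // hidden [[The Independent]]
string piped = periodical_name_get ("__WL1NK_O__The New York Times__P1P3__NYT__WL1NK_C__"); // hidden [[The New York Times|NYT]]
return plain == "The Independent" && simple == "The Independent" && piped == "The New York Times";
}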
//---------------------------< P A R A M _ N A M E _ G E T >--------------------------------------------------
//
// this function queries periodical_map to see if it holds <periodical_name>. If an exact match is found,
// returns the associated parameter name (journal, newspaper, magazine, website, etc)
//
// when an exact match is not found, if the <periodical_name> has leading 'The ' (or 'the '), the function strips
// it so 'The Newspaper' becomes 'Newspaper'; else, it doesn't have leading 'The ' so the function adds it.
// Whichever the modification, periodical_map is queried for <mod_periodical_name>. If that also fails and
// <periodical_name> has leading 'www.', the function strips it and queries again. When found, returns the
// parameter name associated with <mod_periodical_name>; empty string else.
//
private string param_name_get (Dictionary<string, string> periodical_map, string periodical_name)
{
string mod_periodical_name;
if (periodical_map.ContainsKey (periodical_name)) // if periodical is an exact match
return periodical_map[periodical_name]; // found it, so done
if (Regex.Match(periodical_name, @"^[Tt]he\s+").Success) // if periodical_name has leading 'The ' or 'the ', try without
{
mod_periodical_name = Regex.Replace(periodical_name, @"^[Tt]he\s+", "");
if (periodical_map.ContainsKey (mod_periodical_name))
return periodical_map[mod_periodical_name]; // found it return the periodical's proper parameter
}
else // periodical_name doesn't have leading 'The ', try adding it
{
mod_periodical_name = @"The " + periodical_name;
if (periodical_map.ContainsKey (mod_periodical_name))
return periodical_map[mod_periodical_name]; // found it return the periodical's proper parameter
}
if (Regex.Match(periodical_name, @"^[Ww]{3}\.").Success) // if periodical_name has leading 'www.' (case insensitive), try without
{
mod_periodical_name = Regex.Replace(periodical_name, @"^[Ww]{3}\.", "");
if (periodical_map.ContainsKey (mod_periodical_name))
return periodical_map[mod_periodical_name]; // found it return the periodical's proper parameter
}
return @""; // not found return empty string
}
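//---------------------------< E X A M P L E :   P A R A M _ N A M E _ G E T >--------------------------------
//
// Illustrative sketch only: not part of the original module and not called by it. The method name and the
// two-entry map are assumptions; the real periodical_map is built elsewhere in the module. It exercises each
// lookup path: exact match, 'The ' added, 'www.' stripped, and not found.
//
private bool example_param_name_get ()
{
Dictionary<string, string> map = new Dictionary<string, string> (); // hypothetical stand-in for periodical_map
map.Add ("The Independent", "newspaper");
map.Add ("nytimes.com", "website");
return param_name_get (map, "The Independent") == "newspaper" // exact match
&& param_name_get (map, "Independent") == "newspaper" // found after adding 'The '
&& param_name_get (map, "www.nytimes.com") == "website" // found after stripping 'www.'
&& param_name_get (map, "Unknown Gazette") == ""; // not in the map
}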
//---------------------------< E M P T Y _ P A R A M _ R E M O V E >------------------------------------------
//
// This function removes all empty named parameters from a template, attempting to leave what remains the same form.
//
// this is a multi-step process that attempts to handle most of the vagaries of how templates are written in
// wikitext. In general there are three basic 'styles': horizontal – all parameters written on a single
// line of text, vertical – all parameters written singly one-to-a-line, and a mix of the two – multiple lines
// where each has one or more parameters.
//
// 1. where the parameter name & '=' are on one line and the value on a following line, put the value on the same line as the '='
// 2. for mixed, when empties are followed by new line; remove the empty but leave the newline
// 3. for any, when empties are followed by a pipe or the closing }; remove the empty but leave the | or }
// 4. the preceding steps can leave blank lines; remove the blank lines
//
private string empty_param_remove (string template)
{
string pattern = @"(\|[^=]+=[ \t]*)[\r\n]+(?!\s*[\|\}])"; // parameter name & '=' on one line, value on a following line
while (Regex.Match(template, pattern).Success) // put them on the same line
template = Regex.Replace(template, pattern, "$1");
pattern = @"\|[^=]+=[ \t]*([\r\n]+)"; // empty followed by new line
while (Regex.Match(template, pattern).Success)
template = Regex.Replace(template, pattern, "$1");
pattern = @"\|[^=]+=\s*([\|\}])"; // empty followed by pipe or at end of template
while (Regex.Match(template, pattern).Success)
template = Regex.Replace(template, pattern, "$1");
pattern = @"([\r\n]+)[ \t]*[\r\n]+"; // close up multiple new lines
while (Regex.Match(template, pattern).Success)
template = Regex.Replace(template, pattern, "$1");
return template;
}
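//---------------------------< E X A M P L E :   E M P T Y _ P A R A M _ R E M O V E >------------------------
//
// Illustrative sketch only: not part of the original module and not called by it. The method name and the
// sample template are assumptions. It covers the 'horizontal' style: one empty parameter followed by a pipe
// and one followed by the closing }}.
//
private bool example_empty_param_remove ()
{
string template = "{{cite web |url=http://example.com |title= |date=2016 |publisher= }}";
string cleaned = empty_param_remove (template);
return cleaned == "{{cite web |url=http://example.com |date=2016 }}"; // both empties removed; the rest keeps its form
}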
//---------------------------< T E M P L A T E _ R E N A M E >------------------------------------------------
//
// replaces existing name in <template> with <name>
//
private string template_rename (string template, string name)
{
string pattern = @"\{\{[^\|]+?(\s*\|)";
return Regex.Replace(template, pattern, "{{" + name + "$1");
}
//---------------------------< P A R A M E T E R _ R E N A M E >----------------------------------------------
//
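// replaces <old_param> and its value in <template> with <new_param_name> and <new_param_val>; the function
// does not itself strip italic wiki markup, the old value is simply replaced by <new_param_val>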
//
private string nparameter_rename (string template, string old_param, string new_param_name, string new_param_val)
{
string pattern = @"(\|\s*)" + old_param + @"(\s*=\s*)[^\|\}]+?(\s*[\|\}])";
// return Regex.Replace(template, pattern, "$1" + new_param_name + "$2" + new_param_val + "$3");
return Regex.Replace(template, pattern, // done with a delegate because a domain name can begin with digits:
delegate(Match match) // 24dash.com concatenated onto "$2" makes "$224", a capture that does not exist,
{ // and so gives |website$224dash.com; this method avoids that
return match.Groups[1].Value + new_param_name + match.Groups[2].Value + new_param_val + match.Groups[3].Value;
});
}
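//---------------------------< E X A M P L E :   R E N A M E >------------------------------------------------
//
// Illustrative sketch only: not part of the original module and not called by it. The method name and the
// sample template are assumptions. It renames |publisher= to |website= with a value that begins with digits
// (the case the delegate in nparameter_rename exists for), then renames the template itself.
//
private bool example_rename ()
{
string template = "{{cite web |publisher=''24dash.com'' |date=2016}}";
template = nparameter_rename (template, "publisher", "website", "24dash.com"); // caller supplies the cleaned value
// template should now be {{cite web |website=24dash.com |date=2016}}
template = template_rename (template, "cite news");
return template == "{{cite news |website=24dash.com |date=2016}}";
}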
//---------------------------< S U B S T R _ C O U N T >------------------------------------------------------
//
// returns the number of times <substr> appears in <str> (non-overlapping); not found returns 0
//
private int substr_count (string str, string substr)
{
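if (string.IsNullOrEmpty (str) || string.IsNullOrEmpty (substr)) // nothing to count; an empty <substr> would otherwise loop forever
return 0;
int count = 0;
int index = 0;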
while ((index = str.IndexOf(substr, index)) >= 0)
{
count++;
index += substr.Length;
}
return count;
}
//Monkbot_task_14_repair_improper_use_of_publisher_params.cs