Wikipedia talk:AutoWikiBrowser/Typos/Archive 2
This is an archive of past discussions on Wikipedia:AutoWikiBrowser. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Development list
Would it be useful to have a page where you can test new regexes that will be loaded either with, or instead of, the main typo list, so you can debug live/reduce chances of causing problems to live lists?
—Reedy 15:51, 2 August 2008 (UTC)
- I think testing should be done in Find&Replace. However, it would be FKING AWESOME if there was an "export to RETF" feature of Find&Replace once I'm done testing. --mboverload@ 17:51, 2 August 2008 (UTC)
- Just thinking, it wouldn't be difficult to have it copy it to the clipboard as a typo-style rule, as an option.... —Reedy 20:44, 22 August 2008 (UTC)
- That would be most neato.--mboverload@ 19:51, 25 August 2008 (UTC)
zero-width assertions and performance
I think that starting a search string with a zero-width look-ahead and then the desired search string, usually used to exclude certain proper names, is harder on performance than either avoiding the zero-width assertions or using a zero-width look-behind assertion after the desired search string. Putting it at the beginning doubles the effort on things like Tremelo: at each check point (in this case, between every letter), see if Tremelo is the next string; if not, see if tremelo/Tremelo/tremelos/Tremelos is the next string; if so, replace the middle with remolo. I replaced it with a (buggy, but now fixed) version with no zero-width assertions, but the look-behind version would have been: see if tremelo/Tremelo/tremelos/Tremelos is the next string; if so, and it doesn't end with an s, make sure it wasn't Tremelo; if it wasn't, replace the middle with remolo. So the extra check is only made once AWB has gotten a possible match, not on every spot.
There are a couple of other places where a similar change could be made, but I remember some possible problem with the look-behinds and some of the other tools that use this list, so I'd like to open it up for discussion first. -- JHunterJ (talk) 11:47, 25 August 2008 (UTC)
- Hmm, the zero-width look-aheads at the start of a rule are very useful, and performance of the typo list as a whole seems good to me, so I would be cautious about changing them. Could you provide an example of how the Tremelo rule would work with a look behind, as the current rule after your change and my fix looks confusing, though it works correctly? Thanks Rjwilmsi 12:07, 25 August 2008 (UTC)
- Something like this:
<Typo word="Tremolo" find="\b(T|t)remelo(s\b|\b(?<!Tremelo))" replace="$1remolo$2"/> <!-- don't match the place name Tremelo -->
- So, match either T or t, then remelo, then either an s at the end of the word, or if we're already at the end of the word with "remelo", look back and make sure we didn't just see Tremelo. The only time we stop to look around is after we've already matched either Tremelo or tremelo. -- JHunterJ (talk) 12:18, 25 August 2008 (UTC)
find="\b(T|t)remelo(s)?\b(?<!Tremelo)"
- That seems to work just as well (though XML markup is wrong...). If it's true then the change is simply to move all (?!foo)bar to bar(?<!foo), and the question is whether this causes problems for other tools using the typo list? Rjwilmsi 12:44, 25 August 2008 (UTC)
- Yours looks behind for "Tremelo" even in the case where we might have found an s at the end of the word. It should be possible to look behind only when the looked-for word could possibly appear, but either should perform better than starting with the look-ahead. -- JHunterJ (talk) 12:59, 25 August 2008 (UTC)
- One other option to use the current version with possibly less confusion:
<Typo word="Tremolo" find="\b(?|(T)remelo(s)|(t)remelo(s)?)\b" replace="$1remolo$2"/>
- ?| is a "branch-reset" grouping, so each alternative therein should start numbering at 1. (Perl 5.10.0 and later). I can test it this evening (EST) if no one does so before then. -- JHunterJ (talk) 16:52, 25 August 2008 (UTC)
- It looks good for this example but I thought we wanted a general solution/standard. I think my suggestion is the most simple/general so far, but we need some performance data to see if it improves on the current entries using (?!blah). Rjwilmsi 17:14, 25 August 2008 (UTC)
- Why does the solution need to be generic? I'd prefer the slight complication of inserting the look-behind only where it can match. -- JHunterJ (talk) 17:18, 25 August 2008 (UTC)
- (?| ... ) is an unrecognized grouping construct, according to the regexp tester in AWB. So the last bit is moot. -- JHunterJ (talk) 00:09, 26 August 2008 (UTC)
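The two placements discussed in this thread can be compared side by side. The following is only a sketch in Python rather than the .NET engine AWB uses; the look-ahead form of the rule is an assumed reconstruction of the old entry, and `fix` is a hypothetical helper. It shows that both placements leave the place name Tremelo alone while fixing the other forms; it does not measure speed.

```python
import re

# Look-ahead first: the exclusion is tested at every candidate word boundary.
# (Assumed reconstruction of the old rule, not copied from the typo list.)
LOOKAHEAD = re.compile(r"\b(?!Tremelo\b)(T|t)remelo(s)?\b")

# Look-behind last: the exclusion is tested only after a candidate has matched.
LOOKBEHIND = re.compile(r"\b(T|t)remelo(s)?\b(?<!Tremelo)")


def fix(pattern, text):
    """Apply the replacement $1remolo$2 using Python's group syntax."""
    return pattern.sub(r"\g<1>remolo\g<2>", text)


sample = "Tremelo is in Belgium; the tremelo and Tremelos were loud."
for rule in (LOOKAHEAD, LOOKBEHIND):
    print(fix(rule, sample))
```

Both rules should give the same text here; any performance difference would still need to be profiled in AWB itself.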
Double word
"My my" seems common enough to leave? Rich Farmbrough, 22:22 24 August 2008 (GMT).
- Or perhaps first replaced by "My, my"? -- JHunterJ (talk) 11:54, 25 August 2008 (UTC)
- IDK, perhaps, the Abba song is where I've bumped up against it. Rich Farmbrough, 14:23 26 August 2008 (GMT).
- Remove "my" and "on" from the doubled word check. -- JHunterJ (talk) 23:38, 26 August 2008 (UTC)
Exclusion of words from title
This is another obvious way to avoid false positives. See for example nor'easter. Rich Farmbrough, 18:33 26 August 2008 (GMT).
- Not the (perfectly reasonable) fix suggested, but I did except nor'easter from the easter match. -- JHunterJ (talk) 23:55, 26 August 2008 (UTC)
- I highly support this feature being added to AWB. --mboverload@ 04:45, 27 August 2008 (UTC)
Proposal for simplification of some rules
The typo rule standard seems to be to explicitly match all endings of a word when the typo is in the start/middle of a word. It seems to me we could simplify such rules. Example:
<Typo word="Interfere" find="\b(I|i)ntefer(e[ds]?|ence|ing)\b" replace="$1nterfer$2" />
Here it's clear that the error is a missing 'r' in the middle of the word and there's no ambiguity about which word this applies to, so the following would achieve the same result (edit summary would stay the same):
<Typo word="Interfere" find="\b(I|i)ntefer([a-z]+)\b" replace="$1nterfer$2" />
I think if we adopted such a convention for such situations (some if not a majority of the typo rules) by using [a-z]+ or [a-z]* we would benefit from: shorter rules, easier maintenance and easier addition of new rules. I would like feedback from others as to whether this seems like a good idea, particularly if there would likely be any performance change to the rules? Thanks Rjwilmsi 12:15, 25 August 2008 (UTC)
- I agree. Or even
<Typo word="Interfere" find="\b(I|i)ntefer(\w+)" replace="$1nterfer$2" />
- -- JHunterJ (talk) 12:23, 25 August 2008 (UTC)
- Sounds like an idea. Rjwilmsi, AWB has typo profiling... It might be worth me creating a temporary page with a few of these changed rules, and time them against the old version. —Reedy 12:43, 25 August 2008 (UTC)
- Yes please Reedy, and the lookbehind / lookahead change in the above section too, if possible. Thanks Rjwilmsi 12:45, 25 August 2008 (UTC)
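As a rough illustration of the proposal, the explicit-endings rule and the \w+ variant can be run over a few words. This is a Python sketch, not AWB's .NET engine, and `fix` is a hypothetical helper; the word list is made up for the example.

```python
import re

# Rule with every ending spelled out, as quoted in the proposal.
EXPLICIT = re.compile(r"\b(I|i)ntefer(e[ds]?|ence|ing)\b")
# Simplified rule that accepts any word characters after the typo.
GENERIC = re.compile(r"\b(I|i)ntefer(\w+)")


def fix(pattern, word):
    """Apply the replacement $1nterfer$2 using Python's group syntax."""
    return pattern.sub(r"\g<1>nterfer\g<2>", word)


for word in ["intefere", "Intefered", "inteference", "intefering", "inteferences"]:
    print(word, "->", fix(EXPLICIT, word), "|", fix(GENERIC, word))
```

The two rules agree on the listed endings; the generic rule also rewrites an ending the explicit list leaves out, such as the plural "inteferences".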
Profiling
Against Alexandria (Tends to be MaxSem's standard test case, long article)
[44, \b(I|i)ntefer(\w+) > $1nterfer$2]
[44, \b(I|i)ntefer([a-z]+)\b > $1nterfer$2]
[43, \b(I|i)ntefer(e[ds]?|ence|ing)\b > $1nterfer$2]
[41, \b(I|i)ntefer(\w+) > $1nterfer$2]
[41, \b(I|i)ntefer([a-z]+)\b > $1nterfer$2]
[41, \b(I|i)ntefer(e[ds]?|ence|ing)\b > $1nterfer$2]
<application restarted>
[45, \b(I|i)ntefer([a-z]+)\b > $1nterfer$2]
[44, \b(I|i)ntefer(\w+) > $1nterfer$2]
[44, \b(I|i)ntefer(e[ds]?|ence|ing)\b > $1nterfer$2]
[42, \b(I|i)ntefer(\w+) > $1nterfer$2]
[42, \b(I|i)ntefer([a-z]+)\b > $1nterfer$2]
[42, \b(I|i)ntefer(e[ds]?|ence|ing)\b > $1nterfer$2]
When the regexes have been run for the first time, they are quicker than the original run, but have the same execution time.
It would seem, according to that, that the execution time (it's in milliseconds) is slightly better on the more verbose one
—Reedy 17:18, 25 August 2008 (UTC)
- Hmm, I'd have said that within the measurement error there's no difference between the three. Perhaps we should try a longer one, where the advantage of simplification would be greater. Maybe:
<Typo word="(In)Different" find="\b(D|d|[Ii]nd)if(?:er?|f[ai]?)ren(t|tly|ce[sd]?|cing|tia(ls?|te[ds]?|ting|tions?|ble|bility|e?))\b" replace="$1ifferen$2" />
- Thanks. Rjwilmsi 17:30, 25 August 2008 (UTC)
- I was thinking that myself to be honest. Is it a case of replacing the capture groups with \w+ and [a-zA-Z]+? (just thinking that it would be case sensitive as it is) —Reedy 18:03, 25 August 2008 (UTC)
- Yes, I would envisage using a \w+ or \w* as appropriate to make suitable rules shorter and more readable, make it easier to add new rules and potentially to catch endings that have been missed to date, while supporting all existing fixes. By using \w+ rather than just cutting off the regex, we will display the complete word changed in the edit summary.
- If there are no objections I'll start making a few changes tomorrow. Rjwilmsi 18:32, 25 August 2008 (UTC)
- As a side thought, it will help reduce the size of the page to be loaded as well, which can't be a bad thing. —Reedy 20:27, 27 August 2008 (UTC)
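For anyone wanting to repeat this kind of comparison outside AWB, a small timing harness might look like the sketch below. It uses Python's re and timeit rather than AWB's typo profiler, so absolute numbers will not match the figures above; the sample text, the rule labels and the `time_rule` helper are all made up for the example.

```python
import re
import timeit

# The three variants profiled in this section.
RULES = {
    "explicit": r"\b(I|i)ntefer(e[ds]?|ence|ing)\b",
    "a-z": r"\b(I|i)ntefer([a-z]+)\b",
    "word": r"\b(I|i)ntefer(\w+)",
}

# Hypothetical stand-in for a long article.
TEXT = "The committee did not intefere with the inteference report. " * 200


def time_rule(find, number=5, repeat=3):
    """Return the best total time in seconds for `number` substitutions."""
    pattern = re.compile(find)
    runs = timeit.repeat(
        lambda: pattern.sub(r"\g<1>nterfer\g<2>", TEXT),
        number=number,
        repeat=repeat,
    )
    return min(runs)


timings = {name: time_rule(find) for name, find in RULES.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Taking the minimum over several repeats reduces the run-to-run noise that makes single measurements like those above hard to compare.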
broke again
It appears it is broken again. error picture--Rockfang (talk) 22:36, 27 August 2008 (UTC)
- It's fixed again ;) Rjwilmsi 23:40, 27 August 2008 (UTC)
Avoiding false positives on scientific (Latin) names
One of the most common false positives I come across seems to be matching on lowercase words in scientific (Latin) names. An example would be ''Blah carolina'' (what Blah is doesn't matter here). These are matched by rules like \bcarolina\b as the regex \b includes a '. So the rule wants to be \b but not '. I'm struggling to find a neat way to do that beyond an explicit set of [\.,\s-] (since there are many entries that could do with this change). Anybody have any ideas? Rjwilmsi 11:29, 29 August 2008 (UTC)
- Add a zero-width negative look-ahead to make sure the next character isn't an apostrophe: \bcarolina\b(?!'). But this will prevent a match on "I forgot to capitalize south carolina's initials." So make it look for two apostrophes: \bcarolina\b(?!''). -- JHunterJ (talk) 12:37, 29 August 2008 (UTC)
- Interesting, I'll test that later. Rjwilmsi 12:44, 29 August 2008 (UTC)
I have been putting these in {{lang|lat|Tuxedo carolina}} templates. But it's not satisfactory. I'd rather have a scientific name mark-up. Rich Farmbrough, 19:36 1 September 2008 (GMT).
- When I did that I got told off by WP:PLANT people! JHunterJ's (?!') works well though. Rjwilmsi 23:04, 1 September 2008 (UTC)
- A separate "scientific name" markup would be nice. We still give up catching "I forgot to capitalize bastard out of carolina." with the current solution. -- JHunterJ (talk) 02:27, 2 September 2008 (UTC)
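The two-apostrophe look-ahead suggested in this section can be tried on a short piece of wikitext. This is a Python sketch; the replacement "Carolina" and the sample sentence are assumptions for the example, not the actual typo-list entry.

```python
import re

# Skip "carolina" when it is immediately followed by the '' that closes
# wiki italics, as in a scientific name; a single possessive ' still matches.
RULE = re.compile(r"\bcarolina\b(?!'')")


def fix(text):
    """Capitalise unprotected occurrences of 'carolina'."""
    return RULE.sub("Carolina", text)


print(fix("''Blah carolina'' grows in south carolina's hills."))
```

Only the last word before the closing '' is protected, which is why a separate scientific-name markup was suggested above.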
Typo bug
distictly goes to districtly, when context makes it clear it should be distinctly. Should this go here, or in the main AWB bugs section? gnfnrf (talk) 18:57, 30 August 2008 (UTC)
- Here's the right place. I've added a new rule to cover 'distictly'. Thanks Rjwilmsi 00:50, 31 August 2008 (UTC)
qualified → qualifed
AWB tries to do the following: qualified → qualifed I can't figure if qualifed is even a word, it doesn't look right. Thanks. §hep • ¡Talk to me! 17:57, 2 September 2008 (UTC)
- My changes earlier broke this fix. It's correct now after this fix. Thanks Rjwilmsi 18:23, 2 September 2008 (UTC)
"approxiatemately" → "approximatemately"
The title of this section is a regex bug I just found. I might come back and fix it myself later, but I'm simply noting it here for now. {{Nihiltres|talk|log}} 16:53, 3 September 2008 (UTC)
- Thanks, I've fixed it. Rjwilmsi 17:00, 3 September 2008 (UTC)
- Great, that was fast. :) {{Nihiltres|talk|log}} 17:15, 3 September 2008 (UTC)
Das ist borked
I'm getting a duplicate rule error in AWB while trying to load errors. I can screenshot it if needed.--Rockfang (talk) 00:19, 5 September 2008 (UTC)
- Fixed yesterday. Rjwilmsi 00:22, 7 September 2008 (UTC)
spanish word for "effect"
As far as I know, the Spanish word for "effect" is "efecto". Currently, the typo part of AWB is seeing "Efecto" and suggesting it be changed to "Effecto". I'm not sure if/how other languages are tied into the typo fixing, but we may want to remove this fix.--Rockfang (talk) 20:28, 5 September 2008 (UTC)
The same is happening for the spanish word for "different". It sees "Diferente", and suggests "Differente".--Rockfang (talk) 20:33, 5 September 2008 (UTC)
- The idea with foreign text is to use the {{lang|es|effecto}} language tags, then the English typo fixes aren't applied to it. Rjwilmsi 20:39, 5 September 2008 (UTC)
- Did you mean {{lang|es|efecto}}? That is the proper Spanish spelling of the word.--Rockfang (talk) 20:43, 5 September 2008 (UTC)
- Doh! Yes, though I was just providing an example of the template syntax. Rjwilmsi 21:25, 5 September 2008 (UTC)
- Thanks for the reply and the info. I didn't even know of that template.--Rockfang (talk) 21:30, 5 September 2008 (UTC)
"annoucned" → "announcned"
AWB is currently suggesting the above change. This should probably be fixed/changed.--Rockfang (talk) 21:47, 6 September 2008 (UTC)
- Well spotted. This edit will catch it. Example of successful edit. Rjwilmsi 00:21, 7 September 2008 (UTC)
error in spellchecker
erroneously changes "spacious" to "spacitous" and "capacious" to "capacitous" Ling.Nut (WP:3IAR) 05:55, 4 September 2008 (UTC)
- I found the error. Removed the following for scrutiny:
- <Typo word="-pacity" find="\b(\w+?)paci(y|ous)\b" replace="$1pacit$2" />
- Ling.Nut (WP:3IAR) 06:01, 4 September 2008 (UTC)
- Fixed with two rules, one for aciy -> acity, one for acitous -> acious. -- JHunterJ (talk) 11:14, 5 September 2008 (UTC)
- Do you really want that "p" in the Replace field?--BillFlis (talk) 12:56, 5 September 2008 (UTC)
- Nope; I fixed just now. Thanks! -- JHunterJ (talk) 13:53, 6 September 2008 (UTC)
Also, it erroneously changes "acompany" to "anccompany" rather than "accompany" (diff). — Jeff G. (talk|contribs) 11:37, 7 September 2008 (UTC)
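For the -pacity rule quoted earlier in this section, the false positive and the two-rule split can be reproduced in a few lines. This is a Python sketch; the two replacement rules are a guess at the shape of the fix described (aciy to acity, acitous to acious), not the entries actually added to the list, and `apply_rules` is a hypothetical helper.

```python
import re

# The removed rule: one pattern tries to handle both endings.
BROKEN = [(r"\b(\w+?)paci(y|ous)\b", r"\g<1>pacit\g<2>")]

# Hypothetical split into one rule per typo.
SPLIT = [
    (r"\b(\w+)aciy\b", r"\g<1>acity"),
    (r"\b(\w+)acitous\b", r"\g<1>acious"),
]


def apply_rules(rules, text):
    """Apply each (find, replace) pair in order."""
    for find, replace in rules:
        text = re.sub(find, replace, text)
    return text


for word in ["spacious", "capacious", "capaciy", "capacitous"]:
    print(word, "->", apply_rules(BROKEN, word), "|", apply_rules(SPLIT, word))
```

The combined rule inserts a "t" before "ous" as well as before "y", which is where "spacitous" comes from.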
writter → writer
If there is a rule for the above, it's not working. If there is not, please make one. Thanks! — Jeff G. (talk|contribs) 11:53, 7 September 2008 (UTC)
- Existing rule expanded. Thanks Rjwilmsi 13:00, 7 September 2008 (UTC)
emminent
emminent currently corrects to eminent; sometimes it should become imminent instead; hypothetically, it could also be a mistaken immanent. Even though emminent is never correct, we might need to delete it. Or consider ways to eliminate the false fixes. "an emminent" might work to still catch some eminents with little chance of intending imminent, for example. -- JHunterJ (talk) 16:34, 25 August 2008 (UTC)
- Pity we can't offer the editor choices. Rich Farmbrough, 14:28 26 August 2008 (GMT).
- There is a feature request for something like that... —Reedy 17:04, 27 August 2008 (UTC)
- I requested a feature like this on 19 September last year, but the suggestion was never picked up
Status | New |
---|---|
Description | There are quite a lot of typos that have had to be rejected for the RETF page, either because the correction isn't unambiguous (e.g. 'distict' could be a typo for 'district' or 'distinct'), or because the word is valid in one context but not in another (e.g. 'Valparaiso' is correct when referring to Valparaiso, Florida, but should be corrected to Valparaíso when referring to the city in Chile). I'd like to suggest an enhancement to AWB to help with situations like those. There would be a new 'Ambiguous Typos' list, much like the current 'Typos' list, with entries along the lines of <AmbigTypo find="\b([Dd])istict\b" replaceOptions="$1istrict,$1istinct">. AWB would read this list and, on finding the RegEx value in an article, would present a panel much like the current link disambiguation panel, for the AWB user to select from the listed replace options. |
Added in revision |
Colonies Chris (talk) 10:57, 12 September 2008 (UTC)
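The proposed entry format could work roughly as sketched below. This is purely illustrative Python: AmbigTypo is a feature request rather than an existing AWB feature, and `replacement_options` is a hypothetical helper that only computes the choices a selection panel would show.

```python
import re


def replacement_options(find, replace_options, text):
    """Return (matched text, candidate replacements) for each match.

    `replace_options` is the comma-separated replaceOptions attribute,
    using .NET-style $1 group references.
    """
    pattern = re.compile(find)
    # Translate $1 into Python's \g<1> group syntax.
    templates = [
        re.sub(r"\$(\d+)", r"\\g<\1>", option)
        for option in replace_options.split(",")
    ]
    return [
        (m.group(0), [m.expand(t) for t in templates])
        for m in pattern.finditer(text)
    ]


for found, choices in replacement_options(
    r"\b([Dd])istict\b", "$1istrict,$1istinct", "A distict flavour in the Distict"
):
    print(found, "->", choices)
```

The user would then pick one of the listed choices for each match, much as with the link disambiguation panel mentioned in the request.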
Proper name getting mangled
"Beliveau", as in Jean Béliveau/"Jean Beliveau", should not be changed to "Believeau". Thanks, {{Nihiltres|talk|log}} 17:11, 10 September 2008 (UTC)
- What if we just change each "Beliveau" to "Béliveau" (with accent acute)?--BillFlis (talk) 17:18, 10 September 2008 (UTC)
- Exception added for Beliveau in the meantime. We could probably add the accent fix too. Rjwilmsi 17:24, 10 September 2008 (UTC)
- I had heard that for whatever reason, the diacritics for names like this are being excluded from some pages (e.g. Montreal Canadiens); I don't think AWB should be making the correction for the accent (despite that I think we should have the accents; consensus overrules my preference). {{Nihiltres|talk|log}} 12:44, 11 September 2008 (UTC)
- Yes, there's an agreement over at WP:HOCKEY that players' names don't show accents in the NHL context, because the NHL jerseys don't use them. But they're used in the player's own article. So AWB can't do it as a general fix. Colonies Chris (talk) 11:34, 12 September 2008 (UTC)
dispicable
"dispicable" should probably become "despicable", not "despairicable"
I didn't make the change anyway, as it was in quotes on the target page
—Reedy 12:34, 11 September 2008 (UTC)
availble → availab$2
Could someone fix whatever's doing this please? Colonies Chris (talk) 10:42, 12 September 2008 (UTC)
- Brackets were missing. Fixed. Rjwilmsi 11:12, 12 September 2008 (UTC)
Buddah
It has recently come to my attention that AWB recommends a correction of Buddah to Buddha. This is a very problematic correction because of the famous record label, Buddah Records, often shortened to just Buddah. There are probably a few hundred pages which mention Buddah Records, and because of this I'd like to ask that this correction be removed from the list. Chubbles (talk) 16:23, 13 September 2008 (UTC)
- Exception added so that Buddah Records isn't changed. Rjwilmsi 17:28, 13 September 2008 (UTC)
- I changed the expression from
- Make sure the current position doesn't lead into Buddah Records
- Look for Buddah
- to
- Look for Buddah
- Make sure the current position doesn't lead into Records
- which should be better-performing, as it has about half as much work to do. -- JHunterJ (talk) 19:05, 13 September 2008 (UTC)
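The two orderings described in this thread can be written out as follows. This is a Python sketch; the exact expressions in the typo list are not quoted here, so both patterns are assumed reconstructions from the description, and the sample sentence is made up.

```python
import re

# Before: check the exclusion first, then look for the word.
BEFORE = re.compile(r"\b(?!Buddah Records)Buddah\b")
# After: find the word first, then check what follows it.
AFTER = re.compile(r"\bBuddah\b(?! Records)")

sample = "Buddah taught here; Buddah Records released it."
for rule in (BEFORE, AFTER):
    print(rule.sub("Buddha", sample))
```

Both forms should leave the record label untouched; the second simply postpones the exclusion test until "Buddah" has been found.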
Graph fixes
Graph looks like a good candidate for wholesale replacement, without trying to identify all the prefixes and suffixes. Can we fix any instance of "grpah", regardless of surrounding letters, or is there a false positive that that would hit? -- JHunterJ (talk) 20:17, 16 September 2008 (UTC)
- We'll soon find out ;) Rjwilmsi 20:27, 16 September 2008 (UTC)
displease
This is currently replacing unpleased with displease$2.--balloonguy (talk) 21:56, 17 September 2008 (UTC)
Wonderful resource for spelling errors uncaught by AWB
- Go look at History of Ethiopia. I have already manually changed several spelling errors that AWB didn't catch (look at the diff of my AWB edit as well; I manually changed a few there as well). If you keep looking, you'll probably spot more. Ling.Nut (talk—WP:3IAR) 01:43, 20 September 2008 (UTC)
- sceptre is a word. New typos are "enroach", "asecended", "ephipany" added to typo list; Ethiopia article fixed. Thanks Rjwilmsi 08:45, 20 September 2008 (UTC)
variations on "accede" changed to "ascend"; rmvd offending regex
Here ya go. I'd fix it myself, but I'm busy washing dishes with WP:AWB:
<Typo word="Ascend" find="\b(A|a)(?:cce|sece)n?(sions?|d(?:ed|ing|s)?)\b" replace="$1scen$2" />
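To see why this rule catches forms of "accede", it can be run against a few words. The sketch below uses Python; the narrowed rule is only one hypothetical repair (dropping the "cce" alternative) and not necessarily what should go back into the list.

```python
import re

# The removed rule, as quoted above.
ORIGINAL = re.compile(r"\b(A|a)(?:cce|sece)n?(sions?|d(?:ed|ing|s)?)\b")
# Hypothetical narrower rule that only handles the "asece..." misspelling.
NARROWED = re.compile(r"\b(A|a)secen?(sions?|d(?:ed|ing|s)?)\b")


def fix(pattern, word):
    """Apply the replacement $1scen$2 using Python's group syntax."""
    return pattern.sub(r"\g<1>scen\g<2>", word)


for word in ["acceded", "acceding", "asecended"]:
    print(word, "->", fix(ORIGINAL, word), "|", fix(NARROWED, word))
```

Because the "n" is optional, "acceded" and "acceding" satisfy the original pattern and are rewritten as "ascended" and "ascending".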
Noth shouldn't be changed to North
As per usage in Deuteronomist, it's legit.
Or at least \bNoth\b shouldn't be; other forms are all right to change
—Reedy 11:19, 11 September 2008 (UTC)
- Fixed. -- JHunterJ (talk) 11:50, 11 September 2008 (UTC)
- Don't overlook the comment I posted few minutes ago (below), but another noth-north problem at Australian English. Ling.Nut (talk—WP:3IAR) 11:02, 23 September 2008 (UTC)
Aberravon
Something, probably this regex
<Typo word="Aberration" find="\b(A|a)b(?:ber?|e)ra(\w+)\b" replace="$1berra$2" />
is wrongly converting Aberavon to Aberravon, but I'm not sure how to fix it. Colonies Chris (talk) 18:56, 24 September 2008 (UTC)
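One possible way to handle this, sketched in Python, is to add a trailing look-behind exception for the place name, in the style discussed earlier on this page. The guarded rule is an untested suggestion, not something that has been added to the list.

```python
import re

# The rule as quoted above.
ORIGINAL = re.compile(r"\b(A|a)b(?:ber?|e)ra(\w+)\b")
# Hypothetical variant: reject the match if the word just matched is Aberavon.
GUARDED = re.compile(r"\b(A|a)b(?:ber?|e)ra(\w+)\b(?<!Aberavon)")


def fix(pattern, word):
    """Apply the replacement $1berra$2 using Python's group syntax."""
    return pattern.sub(r"\g<1>berra\g<2>", word)


for word in ["Aberavon", "aberation", "abberation", "aberration"]:
    print(word, "->", fix(ORIGINAL, word), "|", fix(GUARDED, word))
```

The original rule matches Aberavon because "Ab" + "e" + "ra" + "von" fits the pattern; other proper names beginning "Abera" would need their own exceptions.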
quatermain & quaternion --> quartermain & quarternion
- Attempt to fix "quater-->quarter" hoses words that legitimately contain "quater-". Ling.Nut (talk—WP:3IAR) 10:45, 23 September 2008 (UTC)
- ...doesn't catch plurals; see Cross product forex. Ling.Nut (talk—WP:3IAR) 08:31, 27 September 2008 (UTC)
- obvious problems. Ling.Nut (talk—WP:3IAR) 15:13, 25 September 2008 (UTC)
medially→medically
The above change is in the list of typos. I suggest it be removed as medially is a word.--Rockfang (talk) 21:23, 4 October 2008 (UTC)
- Exception added. Thanks Rjwilmsi 22:36, 4 October 2008 (UTC)
passable
The ending "-(s)ible" incorrectly converts "passable/-ably/-ability" to "passible/-ibly/-ibility". I think it's fixable by replacing [Pp](?:[ao]s|lau) with [Pp](?:os|lau), but I'm not confident enough to do it. Am I close? —SMALLJIM 22:15, 4 October 2008 (UTC)
- Yes. Change made. Thanks Rjwilmsi 22:36, 4 October 2008 (UTC)
Xbox
\b(?i)xbox\b
Wouldn't using something like the above make more sense? I.e. do it all case-insensitively, so that it matches any of the variations (saving having various hardcoded versions). —Reedy 10:23, 5 October 2008 (UTC)
- As long as it doesn't match the correct variation (and isn't a performance problem). -- JHunterJ (talk) 12:03, 5 October 2008 (UTC)
\b((?i)xbox)\b(?<!Xbox)
perhaps -- JHunterJ (talk) 12:09, 5 October 2008 (UTC)
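The combined idea (match any capitalisation, then reject the already-correct form) can be sketched in Python as below. Recent Python versions reject a mid-pattern (?i), so the sketch uses the scoped form (?i:...) instead; the sample text is made up.

```python
import re

# Match "xbox" in any capitalisation, then use a case-sensitive look-behind
# to skip the spelling that is already correct.
RULE = re.compile(r"\b(?i:xbox)\b(?<!Xbox)")

sample = "XBOX, xbox, XBox and Xbox"
print(RULE.sub("Xbox", sample))
```

The look-behind sits outside the case-insensitive group, so it compares against "Xbox" exactly.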
AWB replaces noting with nothing
Status | New |
---|---|
Description | AWB replaces "noting" by "nothing". Noting is a word.Headbomb {ταλκ – WP Physics: PotW} 15:12, 5 October 2008 (UTC) |
To duplicate: | Type noting in your sandbox, run AWB |
Operating system | Win XP sp 3 |
.NET FW Version | Unknown |
AWB version | 4.4.1.0 |
Workaround | Manually telling AWB to not replace noting by nothing everytime. |
Fixed in version | Unknown |
- Exception added. Thanks Rjwilmsi 15:21, 5 October 2008 (UTC)
This page is HUGE!
My Opera hangs for at least 10 seconds when loading the typos page. This is intolerable, let's take some measures to reduce it. I've already tweaked AWB to use Gzip compression when loading typos, but the list is still huge, and this has no effect on people who maintain or view the list from their browsers.
- I've also dropped the requirement for word="foo" attribute to be present in the rules in the next version of AWB, so removing them all will somewhat reduce the size, but will make it harder to understand what a rule is supposed to do.
- We could also replace those fancy <syntaxhighlight lang=""> tags with simple <pre>'s - that will reduce the size of HTML output by about 225 bytes per rule, at the expense of a bit of human readability.
Please opine. MaxSem(Han shot first!) 15:55, 4 October 2008 (UTC)
- I wouldn't say it's intolerable. I would prefer to have a large, comprehensive list at the expense of a slow-ish load time than have some arbitrary size limit. There aren't many users who regularly load the page in browser so I'm not sure it's such an issue. Some changes have already reduced the size a little, any further changes to reduce it without cutting functionality/usability are welcome. Rjwilmsi 16:56, 4 October 2008 (UTC)
- I agree, it's too big. I'm reluctant to edit it because loading and saving it - even just a section - takes so long. Could it be split alphabetically, or by category, and recombined by AWB when loading it? Colonies Chris (talk) 14:10, 7 October 2008 (UTC)
Thunderdome
... is replaced by Thundredome. Colonies Chris (talk) 14:05, 7 October 2008 (UTC)
superintendants → superintendentts
I even missed this one, though fortunately someone else caught me before I found it in my contributions just a moment ago. {{Nihiltres|talk|log}} 05:17, 21 October 2008 (UTC)
- BillFlis fixed the incorrect entry yesterday. Thanks Rjwilmsi 11:15, 21 October 2008 (UTC)
Complaints to Compliants
Could someone fix this? Thanks, I wish I could but I am an utter newb at regex. Blame the actual edit on me being half asleep. :P — neuro(talk) 21:49, 22 October 2008 (UTC)
- I fixed it yesterday. Thanks Rjwilmsi 07:03, 23 October 2008 (UTC)
How did this happen?
I really don't get this one --Closedmouth (talk) 07:21, 27 October 2008 (UTC)
- Now fixed.
- BTW, do you use the typoscan plugin? Try it if you don't (place the typoscan.dll next to autowikibrowser.exe then edit as normal and look at the typos tab) – for one thing you can then use it to report which regex is wrong. Thanks Rjwilmsi 07:54, 27 October 2008 (UTC)
- I do use TypoScan, but I didn't even think of doing that. Silly me. --Closedmouth (talk) 11:04, 27 October 2008 (UTC)
disitributed -> dissitributed error
\b(D|d)isi([a-kmo-z]|m[a-nq-z])\B to $1issi$2 causes the error in the title. Don't know how to fix it. --Closedmouth (talk) 05:56, 28 October 2008 (UTC)
- fixed by expanding "Distribute" to catch "disitribute-". Thanks Rjwilmsi 08:10, 28 October 2008 (UTC)
Vertibrate to Vnvertebrate
Not sure what is causing this. — neuro(talk) 00:09, 29 October 2008 (UTC)
- I think I fixed it here. --Closedmouth (talk) 08:53, 29 October 2008 (UTC)
Add ecspecailly -> especially
The "especially" regex is much too complicated for me. Diff. --Closedmouth (talk) 09:05, 29 October 2008 (UTC)
- That should do it. Rjwilmsi 12:07, 29 October 2008 (UTC)
Hyphenation Sea and Grand
OK.. first my admission my OED is missing Vols VI, VII, and X so I couldn't check some other hyphenations. But for "grand-" used as a familial relation quantifier my OED has:
- Grand-aunt
- Grand-dad and granddad
- Grand-daughter
- Grandfather
- Grandmama
- Grandmother
- Grand-nephew
- Grand-niece
- Grandpapa
- Grandparent
- Grandpaternal
- Grandsire
- Grandson
- Grand-uncle
For "sea" words where the idea of sea is part of the meaning, over 36 pages all compound words (mainly things like sea-fox) were hyphenated except where shown in the following list:
- Seafaring and seafarer but sea-fare
- Seaman
- Sealess
- Seamost
- Seaport (but example contains hyphen)
- Seaquake and sea-quake
- Seaside and sea-side
- Seaweed
- Seaworthiness (but example contains hyphen)
- Seaworthy
The following were split into two words
- Sea air
- Sea cucumber
- Sea legs
- Sea spider (but example contains hyphen)
This is, of course, not to say that other presentations of these words are wrong.
Rgds, Rich Farmbrough, 14:40 26 August 2008 (GMT).
- It is worth asking: What edition of the OED do you have? Hyphenation has most certainly not remained constant over the centuries. "Deluxe", for example, appears in some old texts as "de-luxe". --Cryptic C62 · Talk 04:05, 17 November 2008 (UTC)
Disimpaction to dissimpaction
The former is correct. — neuro(talk) 11:57, 29 October 2008 (UTC)
Historical
AWB regularly tells me to change "an historical" to "a historical." Can this be fixed? --Andrew Kelly (talk) 04:14, 4 November 2008 (UTC)
- Same here, I get this error a lot with NRHP articles. §hep • ¡Talk to me! 01:09, 13 November 2008 (UTC)
- What is wrong with this? 'A historical' is correct, 'an historical' is not. AWB is doing the right thing. — neuro(talk) 10:57, 23 November 2008 (UTC)
First and only
Only things invariably are the first: the phrase has inexcusable redundancy and should read "only". But there are very occasional false positives: First and Only (book), "first (and only the first)". Is this annoyance within the scope of AWB regexp magic? --Tagishsimon (talk) 23:43, 12 November 2008 (UTC)
- I don't want to cut and paste the whole thing here, but there is more detail and discussion of pitfalls on the bot request page. 71.156.33.248 (talk) 04:25, 13 November 2008 (UTC)
- See Wikipedia:Bot requests/Archive 23#Repeat edit--Tagishsimon (talk)
- I can create and add a rule for this tomorrow, but it might have to be removed again if there are too many false positives that can't be handled as exceptions within the rule. Rjwilmsi 00:09, 14 November 2008 (UTC)
- Thank you; I'm very grateful. The need to remove it if too many FPs is well understood; it'll be interesting to hear if that need transpires. --Tagishsimon (talk) 01:04, 14 November 2008 (UTC)
- Fix now added. I'll aim to have a run through some matches over the weekend. Rjwilmsi 13:54, 14 November 2008 (UTC)
"The only" doesn't necessarily mean the first in contexts outside this encyclopedia, is there a section for matching inside the project?
- Could you expand on that impenetrable assertion? What "first and only" things, if reduced to "only" would no longer be first? --Tagishsimon (talk) 18:53, 14 November 2008 (UTC)
- I'm thinking about small wikis that might copy our rules and don't have the manpower or realize when something is no longer "only" to change it to "first". — Dispenser 19:58, 14 November 2008 (UTC)
Coming here from this comment. I think many things can be the only without being first -- the thrift shop example I gave, or what if as in the example I cited at Rjwilmsi's page, GSH no longer performs that in the future but instead Nyack Hospital does. Or if they both do, GSH is no longer the only, but was the first. To me I think it adds ambiguity. StarM 16:04, 16 November 2008 (UTC)
- I must say, I quite agree with StarM. Dilemma: Poison xxx cannot be detected by current autopsy techniques. It is, in fact, the only poison that cannot be detected. If, however, there was, at some point in the past, a poison which could not be detected for a significant chunk of history, but which recent medical advancements have made entirely detectable, poison xxx would be only, but not first. If, on the other hand, by some strange coincidence poison xxx were the first poison of which we have accurate record of being undetectable, and that undetectability were to somehow remain true unto this day, it would be first and only. While I do agree that in many cases, having both words is often redundant, it seems obvious to me that, depending on the surrounding context, adding first may very well help to clarify things. An anti-redundancy task force may work in this case. AWB would not. --Cryptic C62 · Talk 04:03, 17 November 2008 (UTC)
- Also, sometimes we might want to replace "is the first and only" with "was the first and remains the only". As in the case of two institutions, say, where the second one closes, so the first was not always the only. There is also the converse, where the first closes and the second remains the only: "second and only"!--BillFlis (talk) 12:42, 17 November 2008 (UTC)
Add me to those who disagree with this regex being there, though the false positive issue is a side matter. I don't think this phrase is necessarily "bad" all the time. I'm sure some people dislike the phrase due to considering it trite, but that's hardly reason to eliminate it entirely. It indicates that by default the event was supposed to be just the "first" one, but something went horribly wrong and it ended being the "only" one as well. Bad usage example: Eddie Gaedel's first, and only, at-bat was on August 19, 1951. Obviously just "only" suffices here. Decent usage example: Bob chaired the meeting for his first, and only, time on Saturday. (but it's later explained it was such a disgrace he got kicked off the council) This becomes even more clear if you think of examples where the adjective first has become part of the word: "His First Communion was his only one." SnowFire (talk) 00:51, 24 November 2008 (UTC)
- I have to disagree also. Although I think "first and only" is not the best phrase to use, it will often add clarity, and changing it to "only" may remove that clarity. Darryl.matheson (talk) 01:59, 24 November 2008 (UTC)
Draughts to droughts
I have no idea with regex here, anyone know how to fix this? — neuro(talk) 20:05, 18 November 2008 (UTC)
When going from womens' to women's, AWB bunches the words together
Like Womens' tennis to Women'stennis. — neuro(talk) 10:55, 23 November 2008 (UTC)
- That should fix it. Rjwilmsi 11:09, 23 November 2008 (UTC)
Unacceptable to unacceptible
I got this incorrect change in the article Hyperion sewage treatment plant. Auntof6 (talk) 01:51, 28 November 2008 (UTC)
Later got same error (plus acceptable to acceptible) again in other articles. Auntof6 (talk) 02:23, 28 November 2008 (UTC)
- BillFlis has fixed the error. Rjwilmsi 12:33, 28 November 2008 (UTC)
A few more that need fixing
- From Chester P: constructivly to construct$1ively (s/b constructively)
- From Canadian Gaelic: Pronunciation to Pronounciation (already correct)
- From Giovanni Lanfranco: Annunciation to Announciation (already correct)
- From Kirklareli: "the womens' bath" to "the women's' bath" (s/b "the women's bath")
The articles now have the correct spellings: I mention them in case you need to see the original text. - Auntof6 (talk) 04:43, 28 November 2008 (UTC)
- All four should now be fixed following these changes. Rjwilmsi 12:48, 28 November 2008 (UTC)
'Acceptable'
Regex keeps on changing 'acceptable' to 'acceptible'. I believe this is not a correct spelling. Ohconfucius (talk) 09:23, 28 November 2008 (UTC)
- BillFlis has fixed the error. Rjwilmsi 12:33, 28 November 2008 (UTC)
Traffics to trafficks
In Ravnica: City of Guilds. Wiktionary says "traffics" is correct. - Auntof6 (talk) 09:24, 29 November 2008 (UTC)
Undid edit which broke AWB
Per this, I did this pending a fix. If anyone could fix it and add it back in, that'd be great. Thanks! :) — neuroIT'S MY BIRTHDAY! 20:01, 1 December 2008 (UTC)
- I've fixed the entry. Rjwilmsi 20:19, 1 December 2008 (UTC)
you shouldn't fix every false positive that exists
Hi, I just happened to see an edit summary " -> fix false positive: Aroud=name". I'm not picking on that particular edit or editor; please don't be offended. Just in general, you shouldn't fix every false positive that exists. "Aroud" may be a name, but I have never seen it before, and I doubt it's a common one. Let's check: I see 73 ghits for "Aroud" on Wikipedia. Nine of those are in Categories; non-editable. So 64 instances of "Aroud" . I see about four that are probably names and about 60 that are errors.
you shouldn't fix every false positive that exists. Please use logic & common sense. Ling.Nut (talk—WP:3IAR) 02:30, 1 December 2008 (UTC)
- I agree. If we could account for all the false positives, then we wouldn't need humans to check each typo fix.
- What I've been wishing for is a way to flag text as not misspelled. It could be used for foreign words, things that are misspelled on purpose (I've seen a lot of band names and song titles like that), etc. However, I doubt it would be worth developing that, and things would probably get marked that shouldn't. Auntof6 (talk) 06:59, 1 December 2008 (UTC)
- It exists – see the documentation on the various (hidden) uses of the {{sic}} template. Rjwilmsi 08:25, 1 December 2008 (UTC)
- On review of the recent changes to the article list the only one I would say is incorrect is the change to "(Un)Official" as this removes a common typo when avoiding the false positives. The other changes are only to avoid capitalised names, which seems reasonable to me. Rjwilmsi 08:32, 1 December 2008 (UTC)
- As the person who made all these changes, I must disagree. Every change I made avoids more false positives than actual errors. (I checked via a WP search before each change.) As Rjwilmsi pointed out, in the majority of the cases I excluded only the capitalized form of the word. That way nearly all legitimate misspelling are still caught since the word will usually be lower case when not a proper noun. In regards to (Un)Official, the foreign language spelling "oficial" is quite common on Wikipedia. For example, of the first 50 WP search listings there are 44 legit spellings and 6 errors. Of the 6 errors, 4 were oficially which will still be picked up by the separate rule for officially. --ThaddeusB (talk) 16:36, 2 December 2008 (UTC)
clarification
The typo page says "Although this project was started with the aim of 100% accuracy, the less accurate but more inclusive list we have now is better." I assume this is supposed to mean that a rule that detects some false positives is OK, as long as such matches are rare. Is that a correct interpretation, and what is a good "rule of thumb" as to what this actually means in practical terms? Thanks --ThaddeusB (talk) 19:25, 2 December 2008 (UTC)
- Your interpretation seems about right. As for "a rule of thumb" I don't think it's really possible to provide one, as searching for a particular typo at any one time to compare false positives and genuine typos is not an accurate measure, because the number of genuine typos fluctuates as users fix errors and others introduce them. Therefore it is difficult to be any more precise than the existing wording you cite. I believe any false positives should be treated on a case-by-case basis to determine whether there are indeed too many false positives. Rjwilmsi 19:33, 2 December 2008 (UTC)
- That typos are underrepresented in any given search did occur to me, since they are constantly being corrected while correct false positive spellings will (hopefully) stay around. What I was trying to get at is whether there is a frequency of occurrence that would make a word definitely out. For example, there are several hundred legit "oficial"s so I took it off, but you seemed to think this was a mistake on my part. On the other hand, "teh" is perhaps the most common typo of all, but isn't detected since it also has many legit uses. Sorry if I am being difficult, I am just trying to understand the logic used (if any ;) ). Thanks --ThaddeusB (talk) 20:19, 2 December 2008 (UTC)
abolish
I believe the correct noun for the word 'abolish' is 'abolition'. However, I have come across a few instances where 'abolishment' has been used. Ohconfucius (talk) 05:23, 3 December 2008 (UTC)
- Both are correct and despite their similar form actually have different word origins. In other words, abolition is simply a synonym of abolishment, not a form of the word abolish. --ThaddeusB (talk) 05:48, 3 December 2008 (UTC)
word endings question
A number of the word endings regexes are of the form "\b(\w+)[ending]\b". Unless I'm missing something, the "\b(\w+)" part will be true in every case except " ending ". I would think it would save a good deal of processor time if these were changed to simply "[ending]\b" (excluding cases where " ending " actually should be excluded). Am I missing something? --ThaddeusB (talk) 03:40, 3 December 2008 (UTC)
- Yes, it's the edit summary displayed to other editors: consider the "-solutely" fix. As it stands corrections will show 'typos fixed absolutly --> absolutely' etc. whereas the change you suggest would just show 'typos fixed solutly --> solutely'. A reviewer wouldn't be able to see what word was actually corrected. Rjwilmsi 08:22, 3 December 2008 (UTC)
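A minimal sketch of the point about edit summaries, written in Python purely for illustration (AWB itself uses .NET regexes). The two patterns and the `summarise` helper are assumptions made up for this example, not the actual "-solutely" rule; the idea is only that the summary is built from the whole matched text, so a pattern without the leading `\b(\w+)` reports just the word ending.

```python
import re

# Hypothetical stand-ins for a word-ending rule, with and without the stem capture.
WITH_STEM = re.compile(r"\b(\w+)solutly\b")
SUFFIX_ONLY = re.compile(r"solutly\b")


def summarise(pattern, replacement, text):
    """Fix the first match and return (fixed_text, 'old --> new' summary)."""
    m = pattern.search(text)
    if m is None:
        return text, ""
    old = m.group(0)            # the whole matched text is what a summary can show
    new = m.expand(replacement)
    return text[:m.start()] + new + text[m.end():], f"{old} --> {new}"


text = "This is absolutly fine."
print(summarise(WITH_STEM, r"\1solutely", text))
print(summarise(SUFFIX_ONLY, "solutely", text))
```

Both variants produce the same corrected text; only the summary differs, which is the trade-off against the extra matching work.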
Balenciaga
is being incorrectly changed to Balanciaga. Colonies Chris (talk) 00:38, 5 December 2008 (UTC)
- Thanks for the notice. Fixed here [2]. --ThaddeusB (talk) 03:07, 5 December 2008 (UTC)
A quick aside for anyone who thinks these sorts of exclusions are silly: I found 3 "Balanciaga"s on Wikipedia and all 3 had been "fixed" from Balenciaga using AWB. One was even done by Reedy. (All fixed now.) Not blaming anyone, as it is natural to assume the regex knows what it is doing and not realize it was actually trying to fix a "version of balance." --ThaddeusB (talk) 04:41, 5 December 2008 (UTC)
- Thanks for sorting those out. Colonies Chris (talk) 12:00, 5 December 2008 (UTC)
Departement → Department
This should probably be removed, all I have ever got from this rule is false positives on French articles (mostly related to wine, meh). — neuro(talk) 03:08, 6 December 2008 (UTC)
- The French word should have an accent. According to dictionary.com, département is considered valid in English as well. Therefore, I modified the old rule to avoid dropping the e (and still fix other department typos), but added a new one to add the accent. [3] --ThaddeusB (talk) 03:46, 6 December 2008 (UTC)
Included → includeed
Anyone got any ideas on this? — neuro(talk) 03:42, 6 December 2008 (UTC)
- Ok this was a really freaky error. The text showed as "included" on Heribert of Cologne, however the character after the d was NOT an e exactly. If you copy and paste the text into Notepad it transforms into "includ-ed", which is (sort of) the way the regex saw it. I guess it is some weird Unicode character that looks exactly like an e, but doesn't behave nicely. I retyped 'included' on that page and saved [4] and the problem was fixed. So strange.
- P.S. I was only able to fix it because of the screenshot - without it I would've had no clue which article it came from and would have bashed my brains in trying to figure it out. ;) So thanks for that. Next time you make one, click on the typos tab first so it'll show the regex causing the problem to make it even easier to fix. (I had to load the article to get this info before I figured out it was a problem with the text, not the regexes.) --ThaddeusB (talk) 04:12, 6 December 2008 (UTC)
- I have seen this sort of problem before -- I believe it is something to do with a non-Unicode character being pasted into the edit box at some point, which the typo rules will treat as a word boundary. Rjwilmsi 23:19, 10 December 2008 (UTC)
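For anyone chasing a similar report, a small Python sketch of one way to spot invisible characters in pasted text. The sample string is an assumption: it uses a soft hyphen (U+00AD) only because that would be consistent with the "includ-ed" seen in Notepad; the thread does not establish which character was actually in the article.

```python
import unicodedata


def find_hidden(text):
    """List (index, code point, name) for format/control characters in text."""
    hits = []
    for i, ch in enumerate(text):
        # Cf = format characters (soft hyphen, zero-width space, ...), Cc = control
        if unicodedata.category(ch) in ("Cf", "Cc") and ch not in "\n\t":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


sample = "includ\u00aded"       # hypothetical: looks like "included" when rendered
print(find_hidden(sample))
print(find_hidden("included"))  # plain ASCII text gives an empty list
```

Running something like this over the wikitext of a suspect article would show the offending position directly, without needing to guess from a screenshot.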
Unconscience → Unconscious$3
Looks like something is wrong with some regex somewhere. — neuro(talk) 03:45, 6 December 2008 (UTC)
fixed. [5] --ThaddeusB (talk) 03:54, 6 December 2008 (UTC)
Access
Access keeps trying to be changed to "Accesss" which is not the correct spelling QueenCake (talk) 20:50, 10 December 2008 (UTC)
- Similar errors: accessibility → accesssibility and accessible → accesssible
- Another one accessory → accesssory QueenCake (talk) 21:57, 10 December 2008 (UTC)
- Hello, please reload the typo list. I corrected this error while you were writing your message (I saw it too) :). [6] --ThaddeusB (talk) 22:23, 10 December 2008 (UTC)
RegEx advice please
I use a regex that converts \[\[(.*?)\]\], \[\[s1|s2|..|s50\]\] (where the sn are the US states), to [[$1, $2]]. This fixes ambiguous link pairs such as [[Jackson]], [[Mississippi]] (the occasional false positive I just undo manually before saving). However, it fails badly when there is a list of states, e.g. [[Utah]], [[Nevada]] gets converted to [[Utah, Nevada]]. Is there a way to exclude cases where the first element is also a US state? Colonies Chris (talk) 23:36, 10 December 2008 (UTC)
- Try negative look behind: \[\[(.*?)\]\](?<!\[\[(?:s1|s2...)\]\]), \[\[(s1|s2|..|s50)\]\] --ThaddeusB (talk) 01:40, 11 December 2008 (UTC)
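A hedged sketch of the same idea in Python, for testing outside AWB. Note the swap: Python's `re` does not accept the variable-width look-behind suggested above, so this version uses a negative look-ahead placed right after the opening brackets instead. The three-state list is an illustrative subset, and the first link is matched with `[^\[\]|]+` rather than `.*?` so that a match cannot run across several links; both choices are assumptions of this example, not part of the original rule.

```python
import re

STATES = ["Mississippi", "Nevada", "Utah"]  # illustrative subset of the 50 states
_alt = "|".join(re.escape(s) for s in STATES)

PAIR = re.compile(
    r"\[\[(?!(?:" + _alt + r")\]\])"  # the first link must not itself be a bare state
    r"([^\[\]|]+)\]\], "              # first link target, no brackets or pipes
    r"\[\[(" + _alt + r")\]\]"        # second link must be a state
)


def merge_city_state(text):
    """Turn [[City]], [[State]] into [[City, State]], leaving state lists alone."""
    return PAIR.sub(r"[[\1, \2]]", text)


print(merge_city_state("born in [[Jackson]], [[Mississippi]]"))
print(merge_city_state("[[Utah]], [[Nevada]]"))
```

Because the look-ahead requires the closing brackets straight after the state name, a link such as [[Utah County]] would still be eligible as a first element.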
Remove extra-curricular as a misspelling
Per WP:ENGVAR, please remove extra-curricular from the list of misspellings. [7], [8]. Thanks, DoubleBlue (talk) 17:48, 13 December 2008 (UTC)
- I'll acknowledge this one. I kept getting hits on extra-curricular on AWB/TypoScan, and actually started "fixing" a number of articles. US dictionaries show it as one word, extracurricular, but UK shows as two words, hyphenated. ♪BMWΔ 18:03, 13 December 2008 (UTC)
- I have removed this "fix" and the one for extra-marital. OED lists both as valid. Thanks for drawing my attention to this. I have personally "fixed" several dozen "extra-curricular"s without realizing the error. --ThaddeusB (talk) 02:25, 14 December 2008 (UTC)
I just added extrajudicial, extramundane, extraordinary, extraposable, extraprovincial, extraterritorial to the extra- rule. All are one word (no hyphen) per OED. I trust there are no complaints about these? --ThaddeusB (talk) 03:04, 14 December 2008 (UTC)
- I extrapolated that you might do that ;-) ♪BMWΔ 12:20, 15 December 2008 (UTC)
Recommending a fix to Naught
Naught and nought are spelling variations, this fix is not necessary? - 142.167.83.243 (talk) 07:07, 14 December 2008 (UTC)
- Fixed Hello, I have corrected the naught->nought "correction." It's ironic that someone made it since naught is the more common variation. FYI though, the two words aren't always interchangeable - certain senses require one word or the other. --ThaddeusB (talk) 17:26, 14 December 2008 (UTC)
opinion question: "lifeform"
I am considering adding a rule to correct "lifeform" which is not found in OED or Webster's Unabridged. It is, however, commonly written as one word here. OED lists the preferred spelling as "life-form" but also accepts "life form." So the question is should the word be corrected? If yes then correct it to "life-form" or "life form"? (Leaving both hyphenated and spaced words unchanged.) --ThaddeusB (talk) 21:07, 14 December 2008 (UTC)
- We should definitely change the one-word version. Change it to the "preferred", and as noted, let the 2-word version remain. ♪BMWΔ 12:20, 15 December 2008 (UTC)
an historic → a historic
I just saw this edit, and wanted to point out that in many regional pronunciations/dialects of English, the "h" in "history" is silent, so "an historic" would be appropriate and not a typo. This seems like a case of British vs. American English, and it would be like saying colour or colonise were typos. Just wanted to see if this "typo" could be removed from the list (if it hasn't been already). Thanks!-Andrew c [talk] 17:36, 15 December 2008 (UTC)
- I eliminated the rule [9] because there is no consensus on which is correct (a or an). Here is a link to some discussion on the subject for anyone interested: [10]. Including this from Oxford: "... an was common in the 18th and 19th centuries, because the initial h was commonly not pronounced for these words. In standard modern English the norm is for the h to be pronounced in words like hotel and historical, and therefore, the indefinite article a is used; however, the older form, with the silent h and the indefinite article an, is still encountered, especially among older speakers." And the Google stats: 68% a/32% an.
- Well, there is correctness and then there is consistency. Are you voting against the latter? Furthermore, does anybody, anywhere, really say "an history"?--BillFlis (talk) 00:08, 17 December 2008 (UTC)
- It's not really up to me, you know. :) Clearly someone says "an history" or it would never appear in Wikipedia. :p The real question is does any authority view it as correct? I seriously doubt any does. (Google usage stats are like 95% to 5%) However, it does appear in old book titles and such (it once was correct). These cases *should* be in the form "An History..." so I don't see much problem with having a rule to match "(A|a)n history" --ThaddeusB (talk) 03:19, 17 December 2008 (UTC)
- Come to think of it, Mary Poppins and her chimneysweep squeeze say "an history". Who are we to discriminate against them?--BillFlis (talk) 03:49, 17 December 2008 (UTC)
Sea-
Hello, someone recently deleted the sea- rule. I have partially restored it to correct only a few of the previous cases where it should never be hyphenated in modern English. The following is from the archive:
For "sea" words where the idea of sea is part of the meaning, over 36 pages all compound words (mainly things like sea-fox) were hyphenated except where shown in the following list:
* Seafaring and seafarer but sea-fare
* Seaman
* Sealess
* Seamost
* Seaport (but example contains hyphen)
* Seaquake and sea-quake
* Seaside and sea-side
* Seaweed
* Seaworthiness (but example contains hyphen)
* Seaworthy
It's a good start, but the author missed a few unhyphenated by OED: seaboard, seafood, seaplane, and seaward. The new rule corrects those 4, seaman, seaport, seaweed, and seaworthy plus derivatives thereof. Most of the others in the old rule are hyphenated in OED, but in no other major dictionary...
Words frequently used in close association tend to become unified in form as they are in meaning, and ultimately to acquire a single accent. There are three stages in the development of compounds. At first the components of the compound expression are written separately; next they are united by a hyphen; finally, when the separate significance and accent of these components have been lost sight of, they are combined into one word. The hyphenated stage may thus be considered merely preparatory to the coalescence of the various members into one word. [11]
Considering the last OED was published in 1989, I think words like "seabird" and "seacoast" have since reached this final stage of development. (E.g., no other dictionary lists them as hyphenated.) I am confident that once current revisions of OED reach S it too will list them as one word. However, until that happens (most likely within the next year or two) I think we have to accept both the hyphenated and unhyphenated forms.
See also Hyphen, Wikipedia:HYPHEN --ThaddeusB (talk) 21:17, 16 December 2008 (UTC)
- Wait, is this the American Wikipedia or the British Wikipedia here? They're not the same language, you know. I think that would make a big difference in what stays and what goes. Do the Powers That Be recogni(s/z)e this?--BillFlis (talk) 00:21, 17 December 2008 (UTC)
- (I'm pretty sure you already knew this, but for anyone else reading who might not have known...) Uh, this is the English Wikipedia. We recognize all forms of English (see Wikipedia:ENGVAR). Hence we shouldn't auto-correct hyphenated forms to unhyphenated forms unless the word is written as such in ALL forms of English. There are times where such changes are appropriate (such as articles written about American subjects), however it is not ALWAYS appropriate.
- Also the OED doesn't track just British English, but rather "attempt[s] to record a word's most-known usages and variants in all varieties of English past and present, world-wide." (from OED article.)
- In conclusion, removing the hyphen in some cases is similar to (although not as obvious as and probably less controversial than) changing colour to color. --ThaddeusB (talk) 02:43, 17 December 2008 (UTC)
- P.S. just for kicks, here is OED's variant section for recogni(s/z)e:
5 Sc. racwnnis, racunnys, recognis, (6 -eis); 6 recognish(e, -yse, -yce); 6- recognise, -ize.
- The Sc. means Scottish. The 5 means used in the 15th century, the 6 in the 16th and the 6- in the 16th century to present. Thus, both -ise and -ize are acceptable modern spellings. The headword line, however, simply reads "recognize, v.1" which is usually interpreted as what they view as "most correct" (sometimes they list multiple forms in the headword line such as "colour, color, n.1") but according to the OED usage guide actually "shows the most common modern spelling of the word." (Per policy -ise/-ize words are listed under -ize headwords normally.) --ThaddeusB (talk) 03:10, 17 December 2008 (UTC)
False positive
John Mangos's surname shouldn't be changed to Mangoes. --Closedmouth (talk) 07:18, 20 December 2008 (UTC)
- Actually, the fruit shouldn't be corrected either since mangos is listed as correct in some dictionaries. The rule has been updated accordingly: [12]. --ThaddeusB (talk) 18:49, 20 December 2008 (UTC)
Correction to "-goes"
I'm apparently not smart enough to figure out the right sequence of parentheses to correct this entry. Error says it has "not enough )'s".
<Typo word="-goes" find="\b((?:[Ee]mbar|[JjLl]in|[Uu]nder)gos\b" replace="$1goes" />
Could someone please fix it? Thanks, Pigman☿ 19:16, 20 December 2008 (UTC)
- I've taken the chance of correcting it but someone should really double-check me. I am not very experienced with coding arguments.
- <Typo word="-goes" find="\b((?:[Ee]mbar|[JjLl]in|[Uu]nder)gos)\b" replace="$1goes" />
- Cheers, Pigman☿ 19:28, 20 December 2008 (UTC)
- Thanks for trying to correct the error, in fact you weren't quite right but BillFlis has now corrected the entry. Rjwilmsi 22:02, 20 December 2008 (UTC)
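For reference, a Python sketch of why the first patch was "not quite right". The first two patterns are copied from the entries above; the third is only a guess at what a working form could look like (BillFlis's actual correction is not shown in this thread), and `\1` is Python's spelling of the `$1` used in the typo list.

```python
import re

AS_REPORTED = r"\b((?:[Ee]mbar|[JjLl]in|[Uu]nder)gos\b"   # unbalanced parenthesis
FIRST_PATCH = r"\b((?:[Ee]mbar|[JjLl]in|[Uu]nder)gos)\b"  # balanced, but captures "gos" too
STEM_ONLY = r"\b([Ee]mbar|[JjLl]in|[Uu]nder)gos\b"        # hypothetical: capture the stem only


def compiles(pattern):
    """Return True if the pattern is syntactically valid."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(AS_REPORTED))                         # the "not enough )'s" case
print(re.sub(FIRST_PATCH, r"\1goes", "embargos"))    # the captured "gos" is kept, then "goes" is added
print(re.sub(STEM_ONLY, r"\1goes", "embargos"))
```

The general point is that whatever sits inside the captured group is copied back by `$1`, so the misspelled ending has to stay outside it.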
enmity
Something's incorrectly changing 'enmity' to 'emmity'. Colonies Chris (talk) 23:05, 21 December 2008 (UTC)
- Fixed [13] There was an extra m in the exclusion rule. --ThaddeusB (talk) 03:58, 22 December 2008 (UTC)
coffee -> coffeee
Can't figure out why it's doing that. --Closedmouth (talk) 11:22, 22 December 2008 (UTC)
- BillFlis has already fixed the error. Rjwilmsi 13:38, 22 December 2008 (UTC)
Quatermass
Please can Quatermass → Quartermass (Bernard Quatermass, The Quatermass Experiment etc) be added as an exclusion? Thanks, mattbr 15:28, 24 December 2008 (UTC)
- That should do it. Rjwilmsi 15:46, 24 December 2008 (UTC)
- It has, thanks! mattbr 19:26, 24 December 2008 (UTC)
Typo in AWB
See this diff. "Emmity" is not an English word (the correct word is "enmity"). siℓℓy rabbit (talk) 04:45, 27 December 2008 (UTC)
- This has been fixed – see entry above. AWB users need to 'refresh status/typos' on File menu to pick up the corrected typo list. Rjwilmsi 11:32, 27 December 2008 (UTC)
Medical jargon
While I was using AWB, it tried to correct wikt:Serous into wikt:Serious and wikt:distention into wikt:distantion. --Steven Fruitsmaak (Reply) 13:35, 28 December 2008 (UTC)
- Fixed Thank you for reporting these errors. I have fixed them here [14] and here [15]. --ThaddeusB (talk) 16:36, 28 December 2008 (UTC)
THe → The
Could THe → The be added as a rule? I have come across this error many times in articles! QueenCake (talk) 21:04, 28 December 2008 (UTC)
- Good suggestion! I added a regex to convert "tHe" or "THe" to "The" here: [16] --ThaddeusB (talk) 23:23, 28 December 2008 (UTC)
- That's done it, cheers! QueenCake (talk) 23:31, 28 December 2008 (UTC)
empplaning
Can someone fix whatever is causing it to try and change "enplaning" to "empplaning". Both emplaning and enplaning are correct but don't ask me which is the more common term. CambridgeBayWeather Have a gorilla 04:12, 2 January 2009 (UTC)
- Fixed [17] "enplane" was excluded from the enp->emp rule but derivatives thereof were not. --ThaddeusB (talk) 04:32, 2 January 2009 (UTC)
Upper case instead of lower after condensing piped links
In Bruce Morrison (cricketer), "[[Not out|not out]]" was changed to "[[Not out]]"
In Philippine Idol, "[[A cappella|a cappella]]" was changed to "[[A cappella]]".
Shouldn't these have been "[[not out]]" and "[[a cappella]]", respectively, with lower case? --Auntof6 (talk) 06:31, 3 January 2009 (UTC)
- That's an AWB bug fixed in the latest svn --Closedmouth (talk) 06:52, 3 January 2009 (UTC)
Tottaly to tottally
This was in Radha Burnier. I guess it's just looking at the suffix, but can/should this be fixed? --Auntof6 (talk) 06:33, 3 January 2009 (UTC)
- This should do it. Rjwilmsi 10:06, 3 January 2009 (UTC)
excatly -> excately
For some reason, the typofixer currently fixes "excatly" to "excately" which is really no better; the correct fix would probably be "exactly". I'm not quite sure how to track this down -- it doesn't seem to be the "(In)Exact" correction that's the problem. -- intgr [talk] 13:32, 3 January 2009 (UTC)
- Rule amended to catch 'excatly'. Thanks Rjwilmsi 14:15, 3 January 2009 (UTC)
Could of
What false positives would "could of" hit that "should of" and "would of" don't? I've been using that construct in my AWB for a while now, and "of course" is the only false positive I think I've hit. ("of necessity" is perhaps more often set off by commas.) -- JHunterJ (talk) 02:13, 10 January 2009 (UTC)
- I did a wiki search for all three. Between would of/should of there were around 10 "of necessity" matches - not much, but enough to add the exclusion. I didn't see any other FPs for those. "Could of" is the most common construct, but also generates a lot of false positives: [[18]]. Most of the current matches are FPs in a variety of different constructs. Things like "replicate as much as he could of the observations", "preserved what it could of the professional capabilities", "taking measures to save what she could of the family property", and "relieved them as best she could of the filth". None of these examples are eloquent sentences, but neither do they use an "of" where a "have" should be used. --ThaddeusB (talk) 02:40, 10 January 2009 (UTC)
- A new rule (or modification of the current one) could be made to correct specific cases - "could of been" for example. --ThaddeusB (talk) 02:43, 10 January 2009 (UTC)
- That sounds like a better approach, then. Gone, done, been, and said sound like the likeliest candidates. -- JHunterJ (talk) 13:47, 10 January 2009 (UTC)
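A rough Python sketch of the narrower approach discussed here: only rewrite "could/should/would of" when it is followed by one of the participles named above. The pattern is an assumption for illustration and is not the rule that was eventually added to the list.

```python
import re

# Hypothetical rule: modal + "of" + one of the four participles suggested above.
MODAL_OF = re.compile(r"\b((?:[Cc]|[Ss]h|[Ww])ould) of (been|done|gone|said)\b")


def fix_modal_of(text):
    """Replace e.g. 'could of been' with 'could have been'."""
    return MODAL_OF.sub(r"\1 have \2", text)


print(fix_modal_of("He could of been a contender."))
print(fix_modal_of("taking measures to save what she could of the family property"))
```

The second call leaves the text alone, which is the kind of false positive a bare "could of" rule would hit.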
word="Pre-"
I've disabled it for the time being, it takes ages for AWB to execute it (60 times that of the next regex)
[4969, \b(P|p)er?(rogative[sd]?|scri(ber?[sd]?|bing|ptiw+)|sident\w*)\b > $1re$2]
[81, (?!\b(?:(?:Vit|K[ei]ns?e|Cre|Ann|Don|Glene|Kilte|(?:Spez|[Bb]r?)i)aly|(?:[Ss]i|[Ll]in)alyl)\b)\b(\w+)(ic?|\w[nu]|\we)alyl?\b > $1$2ally]
Done against the AWB Sandbox
—Reedy 13:47, 11 January 2009 (UTC)
- "ptiw+" should have been "pti\w+", but
\b(P|p)er?(rogative\w*|scri[bp]\w+|sident\w*)\b > $1re$2
- might be more efficient. I don't see anything horribly inefficient in the previous one though. -- JHunterJ (talk) 14:26, 11 January 2009 (UTC)
- I don't see any reason why the first regex would take so long - might have just been a fluke. How did you get the numbers (I'd be interested in testing some regexes myself). --ThaddeusB (talk) 14:52, 11 January 2009 (UTC)
- AWB has a profiling option for the regextypofix under the File menu. Cant actually remember if its only enabled in debug mode... Will check when im back on my main desktop later on today. —Reedy 16:14, 11 January 2009 (UTC)
- Yeah.. Its only in debug builds... I could enable it in release builds if wanted? —Reedy 23:58, 11 January 2009 (UTC)
- If you don't mind, enabling it would be nice. I don't really know anything about how to get different builds and such. I'm sure I could figure it out... but I'd rather not bother if I don't have to. :) --ThaddeusB (talk) 00:45, 12 January 2009 (UTC)
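A small Python check of the three variants discussed in this thread, for anyone who wants to compare their behaviour (not their speed; timings in AWB's .NET engine may differ, and nothing here reproduces the profiler figures above). The patterns are copied from the thread; the `fix` helper and the test words are assumptions for illustration.

```python
import re

AS_PROFILED = re.compile(r"\b(P|p)er?(rogative[sd]?|scri(ber?[sd]?|bing|ptiw+)|sident\w*)\b")
ESCAPED = re.compile(r"\b(P|p)er?(rogative[sd]?|scri(ber?[sd]?|bing|pti\w+)|sident\w*)\b")
SIMPLER = re.compile(r"\b(P|p)er?(rogative\w*|scri[bp]\w+|sident\w*)\b")


def fix(pattern, text):
    """Apply the rule's replacement ($1re$2 in AWB syntax)."""
    return pattern.sub(r"\1re\2", text)


for word in ("perscription", "perscribe", "Persident", "perogative", "president"):
    print(word, "|", fix(AS_PROFILED, word), "|", fix(ESCAPED, word), "|", fix(SIMPLER, word))
```

With "ptiw+" the rule needs a literal "ptiw", so words like "perscription" slip through; the other two variants catch them, and all three leave the correctly spelled "president" alone.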
laborious / labourious
How does AWB cope with differing spellings on both sides of the Atlantic? I ask this because several editors have visited Barry Island Pleasure Park and changed the correctly spelled 'labouriously' to 'laboriously'. Is there any way of preventing these edits? Thanks. 21stCenturyGreenstuff (talk) 22:01, 11 January 2009 (UTC)
- AWB should not change any correct spelling (whether American or British). In this case, labouriously would indeed seem to be an incorrect spelling with the correct one being laboriously (based on laborious), according to my UK English dictionary, which also lists labour. Do you have any reference for labouriously being the correct spelling? mattbr 22:27, 11 January 2009 (UTC)
- Google it, there are loads...but here are just a few http://en.wiktionary.org/wiki/labourious http://www.websters-online-dictionary.org/la/labourious.html http://www.lexipedia.com/english/labourious http://www.answers.com/topic/labourious. Labouriously and labourious are definitely British English spelling variations on the later American spelling. 21stCenturyGreenstuff (talk) 22:53, 11 January 2009 (UTC)
- Searches on the BBC News website and various newspaper sites are weighted heavily towards laborious(ly) (hundreds compared to less than 5) suggesting that variation is heavily favoured. Compare also http://www.askoxford.com/concise_oed/laborious?view=uk (found) and http://www.askoxford.com/concise_oed/labourious?view=uk (not found). I'll move the discussion to the AWB typos talk page for more input. mattbr 23:24, 11 January 2009 (UTC)
Copied from Wikipedia talk:AutoWikiBrowser. mattbr 23:24, 11 January 2009 (UTC)
The Oxford English Dictionary does NOT list labourious at all in its definition for laborious. It doesn't list it as an alternative spelling or an archaic spelling, nor do any of the examples use that spelling. It also is not in Webster's unabridged international dictionary. Per above, British news organizations only rarely let it slip through editing. All of this very strongly implies it is a non-standard spelling of the word. Of course, English has no official authority to say it's "wrong" per se, but neither is there one to say "labuorious", "laboorious", or "laborius" is wrong either.
A few might accept "labourious", but a few also accept "alot." IMO, Wikipedia should correct to standard spelling when one can be determined. Therefore, I think this correction is perfectly reasonable. --ThaddeusB (talk) 00:19, 12 January 2009 (UTC)
- Incidentally, the word most probably derives either from the French "laborieux" or the Latin "labōriōsus" and not from "labour" (which itself came from old French "labour"), which would explain the spelling. --ThaddeusB (talk) 01:26, 12 January 2009 (UTC)
Link to Compact Oxford English dictionary definition [19] noq (talk) 14:08, 12 January 2009 (UTC)
- (shows only laborious spelling)
Word up: feature in next release: Don't fix a typo if the word is in the article title
I remember a while ago there was a good suggestion that a typo shouldn't be applied if it's also a word in the article's title, as this was a source of many false positives that could be avoided on surnames and archaic/unusual spellings etc. Well, using my new powers as an AWB developer I've implemented this feature in the SVN version. If all goes well the feature will be in the next AWB release. Note, other typos will still be applied to the article if they don't match the title. Thanks Rjwilmsi 00:03, 14 January 2009 (UTC)
- Smashing. --Closedmouth (talk) 13:08, 14 January 2009 (UTC)
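A minimal sketch of the idea described above, for illustration only. It is not the AWB code: the function name, the per-match check, and the sample rule are all assumptions, and the rule uses Python's `\1` backreference syntax rather than the `$1` form used in the typo list.

```python
import re

def apply_rule_unless_in_title(text, title, find, replace):
    """Apply one typo rule, but leave alone any match that is also a word in the title.

    Hypothetical helper: 'find' and 'replace' are a regex and a Python-style template.
    """
    title_words = {w.lower() for w in re.findall(r"\w+", title)}

    def fix(match):
        # Skip the fix when the matched word also appears in the article title.
        if match.group(0).lower() in title_words:
            return match.group(0)
        return match.expand(replace)

    return re.sub(find, fix, text)
```

With a made-up rule such as `find=r"\b([Nn])estin\b"`, `replace=r"\1esting"`, an article titled "Nestin (protein)" keeps "Nestin", while the same misspelling in an article with an unrelated title is still fixed.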
"convertion" to "conversiov "?
See here where I committed the edit then corrected it. --John (talk) 08:19, 14 January 2009 (UTC)
Honourary
AWB seems to be replacing Honourary with Honorary. Considering the rules on US/Commonwealth spelling could someone remove this? Ironholds (talk) 00:21, 3 January 2009 (UTC)
<Typo word="(Dis)Hono(u)r" find="\b(H|h|[Dd]ish)ouno(u?r)(s|e[de]|ing|ifics?|abl[ey])?\b" replace="$1ono$2$3" />
<Typo word="Honorary" find="\b(H|h)ono(u?r)a?y\b" replace="$1ono$2ary" />
Unless I'm being thick, neither of those actually make that change (even with my minor tweak to the 2nd one). Another rule? —Reedy 00:24, 3 January 2009 (UTC)
- Fixed as per below..? —Reedy 20:24, 5 January 2009 (UTC)
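One way to check the observation above is to run the two quoted rules against the word in question. This is only a sketch in Python, assuming the patterns behave the same way as in AWB's .NET regex engine for this simple case; the `$1` backreferences are rewritten as `\1`, and `apply_rules` is a hypothetical helper name.

```python
import re

# The two rules exactly as quoted above, with $n rewritten as \n for Python.
RULES = [
    (r"\b(H|h|[Dd]ish)ouno(u?r)(s|e[de]|ing|ifics?|abl[ey])?\b", r"\1ono\2\3"),
    (r"\b(H|h)ono(u?r)a?y\b", r"\1ono\2ary"),
]

def apply_rules(text):
    """Apply each quoted rule in turn and return the resulting text."""
    for find, replace in RULES:
        text = re.sub(find, replace, text)
    return text

print(apply_rules("Honourary"))  # neither pattern matches, so the word is left alone
print(apply_rules("Honoray"))    # the second rule fills in the missing "r"
```

Under those assumptions neither rule touches "Honourary", which is consistent with the suspicion that a different rule was responsible for the change.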
It seems the typo fixer is fixing Honourary to Honorary which is a case of WP:ENGVAR and it shouldn't be changing that. -Djsasso (talk) 15:27, 3 January 2009 (UTC)
- This correction was made to the list earlier today. After refreshing the typo list is the error still present? Rjwilmsi 15:33, 3 January 2009 (UTC)
- Never mind, didn't notice a fix was made. I am looking at edits from about 12 hours ago so it's probably not a problem now. -Djsasso (talk) 15:43, 3 January 2009 (UTC)
- Not a case of ENGVAR. Honorary is correct worldwide. See Oxford for example. --John (talk) 18:04, 3 January 2009 (UTC)
- To add to the above, the full OED entry for "honorary" lists "honourary" as a derivative spelling that went out of use in the 19th century. No other major dictionary lists that spelling of the word. Therefore, IMO, it is both accurate and desirable to change honourary to honorary. --ThaddeusB (talk) 19:45, 3 January 2009 (UTC)
More data
BBC News - 2780 honorary vs. 53 honourary
Times - 4558 honorary vs. 16 honourary
Guardian - 3420 honorary vs. 82 honourary
Conclusion: major UK news organizations prefer honorary, but sometimes let honourary slip through. 97:1 usage overall, compared to about 30:1 for honour vs. honor.--ThaddeusB (talk) 20:11, 3 January 2009 (UTC)
- This may also be of interest; the OUP regards it as a spelling error. It may also be a form of hypercorrection, as it looks like it might be a US/UK English variant, even though it isn't. --John (talk) 20:31, 3 January 2009 (UTC)
Any objections?
To re-adding the honourary -> honorary fix based on the above information? --ThaddeusB (talk) 17:41, 18 January 2009 (UTC)
technololgy → technolology
See here. --John (talk) 21:22, 25 January 2009 (UTC)
beautiful
Note to self: This is quite a beautiful feature. If only I understood it properly. Can we translate it into Simple English perhaps, one day? innit. --Voletyvole (talk) 21:47, 25 January 2009 (UTC)
- Can put a typo list on simple for AWB too... —Reedy 22:02, 25 January 2009 (UTC)
this page: calm it down, or shut it down?
I love playing with regular expressions. It's fun fun fun. But.. this page.. seems to have become The Regular Expression Game. I think it needs to be shut down. Not deleted... its contents are largely very useful.. but frozen. Folks have been adding pointless fixes for a very long time. They are adding things.. not because the things need to be added.. but because they can add them. And because it's fun to do so. Ling.Nut (talk—WP:3IAR) 03:02, 10 January 2009 (UTC)
- Do you have something specific in mind when you say "pointless fixes"? I've made a good chunk of the recent additions and I stand by every one as being useful. Perhaps you think it is pointless to change "Christmas day" into "Christmas Day", for example? Or "a American" into "an American"? As far as I'm concerned these are just as legitimate as a spelling correction. If the spelling/grammar/capitalization/punctuation is wrong, it is wrong. If one is scanning the page anyway why not fix every error you can? An edited, published text would certainly fix these sorts of things - why shouldn't we? --ThaddeusB (talk) 03:20, 10 January 2009 (UTC)
- Question from someone with a programming background, but not this kind of programming: At what point do additional entries become a performance issue? As long as 1) it's not too much of a performance issue, 2) the people doing the maintenance can cope with the size, and 3) we aren't trying to change "a xxx" into "an xxx" where "xxx" is every noun that starts with a vowel (and the reverse case), what's the problem? --Auntof6 (talk) 05:38, 10 January 2009 (UTC)
- I started essentially the same thread back on 1 Dec of last year. You guys need to realize that this page is the equivalent of a header file, and headers should be far more stable. This whole page should also be subject to a top-to-bottom, item by item review to see how many items are truly worthwhile. Ling.Nut (talk—WP:3IAR) 05:41, 10 January 2009 (UTC)
- I guess that depends entirely on what "worthwhile" means. Does it mean that the error is common? If so, that can be hard to determine because if a user goes through and changes all instances of "some error" that error will appear to be non-existent until another user makes it again. I'm currently working on gathering word-by-word statistics on the last db dump (which is pretty out of date). Maybe that will help determine "useless" rules when I'm done.
- Not all expressions require equal processor effort. For example, the endings section tends to be especially taxing since they match EVERY word for a while only to be rejected at the end. This should probably be taken into account - BUT our guidelines call for combining words into generic patterns when possible. So these types of rules are in a way both encouraged and discouraged. Hmmm....
- We do have a guideline that says "remove uncommon errors", which I have done on occasion. I will personally keep a closer eye out for rare errors in the future. --ThaddeusB (talk) 06:24, 10 January 2009 (UTC)
- Yep, I'm having fun adding things. Wikipedia is being improved (more typos are being fixed), so the fixes aren't pointless. I don't think of this as a header file, more like an INI file or configuration file - meant to be being tweaked. So unless there's some revelation of an actual problem, no, I don't think it needs either calming or shutting. -- JHunterJ (talk) 13:45, 10 January 2009 (UTC)
- See below: a regex should only be disabled when its execution time is more than 60 times that of the next regex. —Reedy 13:49, 11 January 2009 (UTC)
- "add Kveta Peschke->Květa Peschke)". I rest my case. Ling.Nut (talk—WP:3IAR) 17:02, 25 January 2009 (UTC)
- This was a specific request by someone and we already have several similar rules to correct accents on people's names. I really fail to see a problem. I am not specifically seeking out names to correct, but if someone points one out, I fail to see a problem with correcting it. --ThaddeusB (talk) 17:24, 25 January 2009 (UTC)
- I review every new rule before adding it to make sure that the error actually exists in wikipedia (by searching for the erroneous spelling). Also, I occasionally review previously added rules the same way, and, if the error does not exist, I delete the rule. There has also been a lot of progress in consolidating rules, many of which have been moved to the "Beginnings" and "Endings" sections. And sure, it's fun; you got a problem with that? Why are you here?--BillFlis (talk) 20:17, 25 January 2009 (UTC)
(undent) No, nothing is wrong with having fun. Are there performance issues involved? People with slow Internet connections having long waits for info to download? What about performance issues in checking each page against such a heavy set of regexes? Just on the face of it, it makes no sense at all: every page I spellcheck will check for "Kveta Peschke"? I've done thousands of pages, on one occasion, and may do so again... Ling.Nut (talk—WP:3IAR) 09:44, 26 January 2009 (UTC)
bug: capitalizes every instance of words beginning with "th"
I just added the RegExp button to wikEd and tried it out. It capitalized every word beginning with "th" - the, them, their, etc. Obviously a bug since most aren't the first words of sentences. I didn't commit these changes, of course. -Armchair info guy (talk) 03:08, 19 January 2009 (UTC)
- I'd certainly think this is a bug related to wikEd - this certainly doesn't happen in AWB. The rule in question is likely
<Typo word="The" find="\b[Tt]He(n?|irs?|re|se|y)\b" replace="The$1" />
which should only capitalize things like tHe and THere. I'm guessing it is matching case insensitively, which is going to mess up a number of other rules as well. I'm not familiar with the software, but perhaps this is an option that can be toggled? --ThaddeusB (talk) 03:56, 19 January 2009 (UTC)
- I personally have no idea. I started a new topic on the wikEd talkpage and linked to here. Hopefully you guys can figure out what's going on. Thanks for all your work on these tools! --Armchair info guy (talk) 04:01, 19 January 2009 (UTC)
- ThaddeusB was right and I have fixed this in the latest release of wikEd. Cacycle (talk) 13:43, 26 January 2009 (UTC)
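The case-sensitivity explanation above can be reproduced with a small sketch. It uses Python's `re` module rather than wikEd or AWB, so it only illustrates the effect of an ignore-case flag on the quoted rule; the sample sentence is made up.

```python
import re

# The rule quoted above, with $1 rewritten as \1 for Python.
FIND = r"\b[Tt]He(n?|irs?|re|se|y)\b"
REPLACE = r"The\1"

sample = "tHe cat saw the dog over there"

# Case-sensitive matching only fixes the mis-capitalized "tHe".
case_sensitive = re.sub(FIND, REPLACE, sample)

# Ignoring case makes "He" match "he", so every "the", "there", etc. is capitalized.
case_insensitive = re.sub(FIND, REPLACE, sample, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

The first result changes only "tHe"; the second also capitalizes "the" and "there", which matches the behaviour reported in this thread.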
Error
I think something is wrong with this one: <Typo word="Gandhi" find="\bghandi\b" replace="Gandhi"/>. Plrk (talk) 21:01, 27 January 2009 (UTC)
- No wait, it is supposed to change "ghandi" to "Gandhi"? Is that really a good idea? Plrk (talk) 21:02, 27 January 2009 (UTC)
Restauranteur
AWB is currently suggesting that restauranteur be changed to restaurateur. At least according to Wiktionary, both appear to be valid spellings of the word. It may be prudent to remove the word change from the list.--Rockfang (talk) 17:19, 29 January 2009 (UTC)
- I have just discovered this page, and removed that rule. I knew about the problem because I had earlier reverted such a change. —AlanBarrett (talk) 09:02, 31 January 2009 (UTC)
- Done Rules corrected to allow 'restauranteur' as a correct spelling variant. Rjwilmsi 09:50, 31 January 2009 (UTC)
"cataloged" changed to "catalogued"
This edit changed "cataloged" to "catalogued", which I think should not be done, because both spellings are acceptable. However, I can't find the rule that would have made the change. Can anybody find the rule, and either fix it (if this was a false positive for a rule that has a legitimate purpose) or remove it? —AlanBarrett (talk) 09:13, 31 January 2009 (UTC)
- Done 'Cataloged' is the US variant, per Concise OED. I think that 'correction' was introduced by mistake. List corrected. Thanks Rjwilmsi 09:40, 31 January 2009 (UTC)
Nestin
Nestin shouldn't be changed to nesting, see Nestin (protein) --Closedmouth (talk) 07:57, 3 February 2009 (UTC)
- Done Rule removed. Thanks Rjwilmsi 08:59, 3 February 2009 (UTC)
Two incorrect typo "fixes"
In the article Greenwich, Connecticut, "disibilities" was changed to "dissibilities". It should be "disabilities".
In the article Jail Killing Day, "acquitted" was changed to "acquit". It should have been left as it was.
Thanks. --Auntof6 (talk) 06:47, 16 February 2009 (UTC)
- Done New rule added for first problem, second was just non-printing character in middle of word in article, no change to typo rules needed for it. Thanks Rjwilmsi 08:15, 16 February 2009 (UTC)
Date Fix
How do you use AWB to change "2006-05-07" to "May 7, 2006". I've seen many pages use the former date format and it's a little unclear (e.g. List of Eureka Seven episodes). Thanks. - plau (talk) 06:41, 8 March 2009 (UTC)
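The question above was not answered in this thread. As a rough sketch of the kind of regex replacement involved (written in Python, not as an AWB setting), the following assumes the dates are always in YYYY-MM-DD order; `iso_to_mdy` is a hypothetical helper name.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def iso_to_mdy(text):
    """Rewrite YYYY-MM-DD dates as 'Month D, YYYY', leaving implausible values alone."""
    def repl(match):
        year = match.group(1)
        month, day = int(match.group(2)), int(match.group(3))
        if not (1 <= month <= 12 and 1 <= day <= 31):
            return match.group(0)  # not a plausible date, so do not touch it
        return f"{MONTHS[month - 1]} {day}, {year}"
    return ISO_DATE.sub(repl, text)

print(iso_to_mdy("Aired on 2006-05-07."))
```

Any such replacement would still need a human check, since strings of this shape are not always dates.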
"is is" to "it is"
I've seen AWB catch this typo frequently, however, the solution has never been to make that change. It has always been to just remove the first is (aka "is is" to "is"). Any way that can be fixed? --Kbdank71 20:23, 25 March 2009 (UTC)
- Done Rich Farmbrough changed it just recently here. Rjwilmsi 08:04, 26 March 2009 (UTC)
- Perhaps I wasn't clear. I meant to request undoing that change, as every time I've come across "is is", the correct typo fix is to just drop one is. I have never encountered a situation when "it is" was the correct solution. --Kbdank71 15:23, 27 March 2009 (UTC)
- Examples: [20] [21] [22] [23] [24] [25] --Kbdank71 15:43, 27 March 2009 (UTC)
Two common misspellings I've come across that are not in the list
Firstly there's "enoble" (91 article hits), which should be "ennoble".
Then there's "meterorite" which should be "meteorite" however I'm not so sure about this one, it could just be an American/British thing.
I'm very unfamiliar with how to add these, I haven't learned the proper rules/expressions yet and don't want to screw it up so can someone add these please? -- OlEnglish (Talk) 23:22, 27 March 2009 (UTC)
- Done. I added those two here. -- JHunterJ (talk) 23:30, 27 March 2009 (UTC)
- "meterorite" gets only one hit, a redirect to the article with the correct spelling. I think it ought not to have been added.--BillFlis (talk) 19:46, 2 April 2009 (UTC)
- OlEnglish fixed several of them on March 25. -- JHunterJ (talk) 19:51, 2 April 2009 (UTC)
Sadly passed
Does AWB typos extend to dealing with unnecessary phrases such as "Sadly passed" (6,791 hits) "Passed away" (65,434) "sadly passed away" (4,909) and "Sadly died" (6,986), the vast majority of which really want to say "died"?
If not, can you advise of anywhere that does deal with this sort of issue; thanks --Tagishsimon (talk) 20:17, 26 March 2009 (UTC)
- I keep my own pet-peeve wordy phrases in my replacement list in AWB, but mostly they're not in AWB Typos unless they're wrong (as opposed to just verbose or over-written). -- JHunterJ (talk) 20:35, 26 March 2009 (UTC)
- Feel free to add these. I noticed Wikipedia:AutoWikiBrowser/Typos#Incorrect phrases and thought these might be candidates for that, currently empty, space. --Tagishsimon (talk) 20:39, 26 March 2009 (UTC)
- I removed the passed away additions. As I mentioned, there is nothing incorrect about saying someone passed away; since it isn't incorrect, it can't be corrected. -- JHunterJ (talk) 23:28, 27 March 2009 (UTC)
I also removed corrections for "at a young age", "sady died" (sic), and "tragically died", for the same reasons. "at a young age" -> "young", in particular, will result in awkward sentences (see http://www.google.com/search?q=%22at+a+young+age%22+site%3Awiki.riteme.site ). And removing adverbs from sentences, while often useful from an editorial standpoint, is not typo fixing. -- JHunterJ (talk) 11:41, 1 April 2009 (UTC)
- I think your removals are not justified by your explanation: "since it isn't incorrect, it can't be corrected". There is clear guideline support for doing away with the death euphemisms, above, in Wikipedia:Words to avoid#Death and dying. Your "is not typo fixing" does not seem to mesh with Wikipedia:AutoWikiBrowser/Typos#Incorrect phrases. Like the person who added the phrases, like User:BillFlis, who probably knows his way around this place with his 2,700 odd contributions, I would wish to keep these. Perhaps you would consider reinstating them. --Tagishsimon (talk) 00:11, 2 April 2009 (UTC)
For what it's worth, I agree that at least some of these "corrections" are appropriate. While it may not be technically incorrect to say 'passed away' it is against the style guide, which is a good enough reason to change it as far as I'm concerned. After all, many of our corrections already in use aren't, strictly speaking, "typo fixes."
I would tentatively support the following changes, but likely no others (as I feel other phrases may lead to undesirable changes). However, I could be persuaded against them if they are shown to cause false positives/undesirable changes.
- "passed away" (all lower case only) -> "died"
- "gave his(/her) life" -> "died"
- "died tragically" / "tragically died" -> "died"
--ThaddeusB (talk) 01:26, 2 April 2009 (UTC)
- If the typo fixing rules can be used to assist in compliance to the agreed style guides then let's do it. Though as ThaddeusB says, if there are too many false positives we might have to remove or restrict the entries just like with any other typo rule. Rjwilmsi 11:06, 2 April 2009 (UTC)
- I was not aware that the style guide covered them. Ones that are covered by a WP style guide and avoid false-positive problems, yes, I (no longer) have any objection to them. -- JHunterJ (talk) 11:35, 2 April 2009 (UTC)
- I think that "gave his/her life" has too many possible false positives. A quick search shows it's being used in at least two other contexts: devotion to religion (e.g., "gave his life to Jesus"), and "gave his life new direction". — TKD::{talk} 12:53, 2 April 2009 (UTC)
- Thanks for reconsidering this; much appreciated. --Tagishsimon (talk) 13:39, 2 April 2009 (UTC)
- How do we amend eg Euphemisms, to show it should not be changed by AWB from passed away to died? Kittybrewster ☎ 20:17, 5 April 2009 (UTC)
False positive -> Airbourne (band) being corrected to Airborne
i.e. [26] –xeno (talk) 17:56, 10 April 2009 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 20:12, 10 April 2009 (UTC)
Homberg changed to Homburg - except there is a place called Homberg
One of the typo fixes changes "Homberg" to "Homburg" (it's buried under endings, search for word="-burg"). Fair enough most of the time, except there is a place called Homberg, see Homberg (Efze). I assumed it was the correct Anglicisation of a German word, but now I suspect it is not. Mr Stephen (talk) 23:24, 9 April 2009 (UTC)
- ... and several other Hombergs. Mr Stephen (talk) 23:26, 9 April 2009 (UTC)
- Done, it should not catch Homberg any more.--Dycedarg ж 02:27, 10 April 2009 (UTC)
- Thanks. Mr Stephen (talk) 10:03, 11 April 2009 (UTC)
Is there a way to tweak the corrections for "answer", so that it doesn't systematically suggest to correct the above, e.g. on Maharana Pratap Sagar? Generally it's used in one of the species of Anser (genus)#Living species and taxonomy. -- User:Docu
- Fixed with this edit. -- JHunterJ (talk) 12:44, 11 April 2009 (UTC)
- Thanks. -- User:Docu
Nee/Née
Nee and née are both acceptable.--BillFlis (talk) 10:11, 13 April 2009 (UTC)
- If née is not preferred (I still think it could be preferred), then we should leave the rule so that it fixes incorrect accenting (e.g., neé) or remove the rule entirely? -- JHunterJ (talk) 11:02, 13 April 2009 (UTC)
- Surely it is preferred. Kittybrewster ☎ 12:15, 13 April 2009 (UTC)
- I would think so. --ThaddeusB (talk) 02:35, 17 April 2009 (UTC)
- I second that. --bender235 (talk) 15:18, 17 April 2009 (UTC)
Suggestion for large-scale addition to the typos list
There are many redirects from titles without diacritics to the correct article title, with diacritics - e.g. Jerome Bonaparte, Brunswick-Luneburg. I believe it would be possible to use these redirects to set up regexes to automatically add the missing diacritics wherever the non-diacritic version is used (but I don't have the skills to do it). Here's how I think it could be done:
1. For each item in Category:Redirects from title without diacritics, select only those where
- a. the only difference between the source and target is the addition of diacritics (despite the name of the category, this isn't always the case)
- b. there is at least one link to the redirect (an optional filter to reduce the size of the list)
and from each selected redirect, create an XML/regex (in the style of the typo list) to map source --> target
2. Add the generated list of corrections to the AWB typo list.
Does this sound feasible/desirable? Colonies Chris (talk) 11:16, 15 April 2009 (UTC)
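The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the redirect list is passed in as hypothetical `(source, target, link_count)` tuples rather than read from the category, the helper names are made up, and the generated `<Typo>` entries simply follow the attribute layout of the rules quoted elsewhere on this page.

```python
import re
import unicodedata
from xml.sax.saxutils import quoteattr

def strip_diacritics(s):
    """Remove combining marks after Unicode decomposition (e.g. 'é' becomes 'e')."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def rules_from_redirects(redirects, min_links=1):
    """Build typo-list entries from (source, target, link_count) tuples."""
    rules = []
    for source, target, link_count in redirects:
        # Step 1a: the only difference must be the added diacritics.
        if source == target or strip_diacritics(target) != source:
            continue
        # Step 1b: optionally require at least min_links links to the redirect.
        if link_count < min_links:
            continue
        find = r"\b" + re.escape(source) + r"\b"
        rules.append(
            f"<Typo word={quoteattr(target)} find={quoteattr(find)} "
            f"replace={quoteattr(target)} />"
        )
    return rules

for rule in rules_from_redirects([("Jerome Bonaparte", "Jérôme Bonaparte", 3)]):
    print(rule)
```

Letters that do not decompose into a base letter plus a combining mark (for example "ø" or "ł") would be rejected by the step 1a check as written, so a real implementation would need an extra mapping for them.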
Replace double hyphen with em dash
Is it possible to have the AutoWikiBrowser detect double hyphens between letters (such as "abc--xyz", or spaced like "abc -- xyz") and replace them with correct em dashes? (see also MOS:EMDASH) --bender235 (talk) 22:20, 17 April 2009 (UTC)
- It is possible, but should be added as a general fix if anything. I have requested it for you here. --ThaddeusB (talk) 00:08, 18 April 2009 (UTC)
- And I've done it. Rjwilmsi 17:54, 19 April 2009 (UTC)
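For illustration, a replacement of the kind requested above might look like the sketch below. It is not the general fix that was added to AWB; the pattern and function name are assumptions, and it only handles a double hyphen with a word character on each side, optionally separated by single spaces.

```python
import re

# Exactly two hyphens between word characters, with at most one space on each side.
DOUBLE_HYPHEN = re.compile(r"(?<=\w) ?-- ?(?=\w)")

def fix_double_hyphen(text):
    """Replace 'abc--xyz' or 'abc -- xyz' with an unspaced em dash (U+2014)."""
    return DOUBLE_HYPHEN.sub("\u2014", text)

print(fix_double_hyphen("abc--xyz and abc -- xyz"))
```

Requiring a word character on both sides keeps the pattern away from HTML comment markers such as `<!--` and `-->`, but other uses of `--` (for example in code samples) would still need a manual check.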
False positives
I had a couple of false positives for Welsh place-names when using AWB earlier - it wanted to turn Aberaeron to Aberraeron and Aberafon to Aberrafon. In both cases, the existing spelling is correct. — Tivedshambo (t/c) 22:18, 18 April 2009 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 15:13, 19 April 2009 (UTC)
Scenarios
[27] appears to fail to fix the misspelling. MBisanz talk 23:24, 20 April 2009 (UTC)
- Done That should sort it out. Rjwilmsi 23:39, 20 April 2009 (UTC)
Telecommunications
Not fixing at [28]. MBisanz talk 23:53, 20 April 2009 (UTC)
- I expanded communicate to match telecommunicate cases here (I assume this is what you wanted done). Although, it won't actually match your example since telecommunications actually has one 'l' not two :) --ThaddeusB (talk) 03:16, 21 April 2009 (UTC)
Discernable
Currently it changes discernable → discernible, but I think both are acceptable. See Merriam Webster. shirulashem (talk) 00:23, 21 April 2009 (UTC)
- [29] You are correct 'discernable' is listed in several dictionaries, and thus should probably not be corrected. Interestingly, 'indiscernable' is listed in none. Thus, I left the correction for indiscernable cases only. --ThaddeusB (talk) 03:29, 21 April 2009 (UTC)
Wicher/Witcher
This is almost always a false positive. I've encountered many false positives but never a correction. -download | sign! 02:14, 21 April 2009 (UTC)
- fixed here --ThaddeusB (talk) 03:37, 21 April 2009 (UTC)
"2×" instead of "2x"
Oftentimes in athletes' infoboxes there are things like "2x National Champion" or "4x Most Valuable Player". But it should be "2× ..." or "4× ...", respectively, using the multiplication sign. --bender235 (talk) 08:46, 7 April 2009 (UTC)
- I've been reverted when making that kind of change on sports pages, because of other editors' preference for the ASCII representation x. -- JHunterJ (talk) 11:34, 7 April 2009 (UTC)
- Where and why? Don't we replace - with – as well, because "p. 12-15" would be wrong (and "p. 12–15" correct)? --bender235 (talk) 12:47, 7 April 2009 (UTC)
- Here. Don't know why. I didn't have the drive to pursue it. -- JHunterJ (talk) 14:53, 7 April 2009 (UTC)
- Okay, let me do the dirty work. ;-) --bender235 (talk) 15:01, 7 April 2009 (UTC)
- Since no one seems to oppose this proposal, I guess it's fair to add this to the typo fixes, isn't it? --bender235 (talk) 23:25, 15 April 2009 (UTC)
- Has anyone added this fix as of now? --bender235 (talk) 13:28, 24 April 2009 (UTC)
None the less
Hello, you (or, at least, the AWB bot) have been treating "none the less" (three words) as a typo, and changing it to nonetheless (one word).
Most dictionaries say it can be either. The Oxford Dictionary for Writers and Editors (ODWE), which I have always gone to when in doubt, says the three-word version is actually to be preferred (unlike "nevertheless", which is always one word).
It's a very small matter in the great scheme of things, but I think at the very least there is no need to change "none the less" when it appears as three words. Alarics (talk) 20:15, 21 April 2009 (UTC)
- Thanks, I'll point it out to the devs. MBisanz talk 04:48, 22 April 2009 (UTC)
Subsequently
Could someone please add "subsequently", replacing misspellings like "supsequently" or "subsiquently"? --bender235 (talk) 14:02, 24 April 2009 (UTC)
- The latter is already there ("-sequent" rule). I'll add a rule for the first. Rjwilmsi 17:56, 24 April 2009 (UTC)
Academey?
I was just using TypoRegex, and AWB tried to correct "Acadmey" with "Academey". Shouldn't it be "Academy"? --bender235 (talk) 21:23, 24 April 2009 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 11:44, 27 April 2009 (UTC)
requirments -> requirements
Please add this one to the Regex database. --bender235 (talk) 22:15, 24 April 2009 (UTC)
- Done Existing rule expanded. Rjwilmsi 07:09, 25 April 2009 (UTC)
Replacing "1/2" with "½", etc.
I don't know whether this should be added as a "general fixes" request, but misspelled fractions like "1/2" or "3/4" should be replaced with ½ and ¾, respectively. That would include ½, ⅓, ⅔, ¼, ¾, ⅛, ⅜, ⅝, and ⅞. --bender235 (talk) 16:46, 27 April 2009 (UTC)
- 1/2 isn't misspelled, but I get your point. There is the possibility for many false positives this way, though, in dates, military unit designations, etc. etc. -- JHunterJ (talk) 16:59, 27 April 2009 (UTC)
- I think we have a guideline NOT to replace these with the Unicode characters somewhere, instead we should use upper/lowercase. Cacycle (talk) 12:20, 28 April 2009 (UTC)
- WP:MOSNUM#Fractions specifies using the {{frac}} template. Square and cube exponents are guidelined against using their Unicode characters though. -- JHunterJ (talk) 18:05, 28 April 2009 (UTC)
- This seems like more of an AWB general fix than a typo rule. Rjwilmsi 18:17, 28 April 2009 (UTC)
- It is definitely not a typo fix and probably not appropriate as a general fix either since "1/2" can mean a lot more things than just "one half". --ThaddeusB (talk) 00:31, 30 April 2009 (UTC)
- If somebody can come up with an extremely reliable set of cases where fractions could be replaced then AWB could do it as a new general fix, otherwise, I think this can't go anywhere. Rjwilmsi 11:26, 30 April 2009 (UTC)
Example --> Exemple
Many false positives, as this is a word in French. I suggest it be removed. -download | sign! 23:31, 29 April 2009 (UTC)
- The word in French should be cast within a {{lang}} template, which will enclose it within a span identifying the language and protect it from automatic English-language fixes on the English-language projects. I don't think we wish to remove all strings that are words in other languages. -- JHunterJ (talk) 00:42, 30 April 2009 (UTC)
- I agree. In some cases it could be a misspelling of the English word -- this is, after all, the English Wikipedia. Besides, this kind of thing is the reason that AWB changes are supposed to be checked by a human before being saved. --Auntof6 (talk) 05:00, 30 April 2009 (UTC)
fourtunate
Currently this corrects to ffortunate. Not sure if it's worth fixing. -- User:Docu
- Fixed[30] - The problem was an extra "f" in the replacement part. --ThaddeusB (talk) 19:08, 1 May 2009 (UTC)
Sources of revenue
"Corrected" here to References of revenue, which is nonsense. This is the second time this has happened; is there some way to encourage AWBers to look before they edit? Can the article be templated to be left alone? Septentrionalis PMAnderson 22:39, 6 May 2009 (UTC)
- I'd hazard a guess that that's a problem in the general fixes, not in the Typo list. -- JHunterJ (talk) 00:44, 7 May 2009 (UTC)
- Does not appear to happen in the current version of AWB. --ThaddeusB (talk) 01:46, 7 May 2009 (UTC)
- It was a header; if there is a subprogram correcting sources to references in headers, I can see why it exists; but urge it be reconsidered. Septentrionalis PMAnderson 02:11, 7 May 2009 (UTC)
- I know - what I mean is I loaded the page in my current AWB and it didn't try to make the correction. Presumably, this means the "fix" was taken out or fixed to only match "==Sources==" and not "==Sources XXX==" at some point. I would have to guess that the user who made the change is using an older version or something. --ThaddeusB (talk) 03:22, 7 May 2009 (UTC)
I suggest you contact the user who made the edit to ask them why it happened. It is not caused by any core AWB functionality. Rjwilmsi 06:44, 7 May 2009 (UTC)
- Ah, I see you already have. The user in question just needs to improve their logic to make sure 'sources' is the entire text of the heading, rather than just the start of it. Rjwilmsi 06:48, 7 May 2009 (UTC)
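The difference between matching the whole heading and only its start can be shown with a short sketch. The user's actual find-and-replace settings are not given in this thread, so the pattern and function below are purely hypothetical.

```python
import re

# Match a heading whose entire text is "Sources", e.g. "==Sources==" or "== Sources ==".
WHOLE_HEADING = re.compile(r"^(=+[ \t]*)Sources([ \t]*=+)[ \t]*$", re.MULTILINE)

def rename_sources_heading(wikitext):
    """Rename a 'Sources' heading only when 'Sources' is the entire heading text."""
    return WHOLE_HEADING.sub(r"\1References\2", wikitext)

print(rename_sources_heading("==Sources==\nText\n==Sources of revenue==\nMore text"))
```

Because the closing `=` signs must follow "Sources" directly (apart from spaces), a heading such as "==Sources of revenue==" is left untouched.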
&nbsp; before units
I can't see a FAQ around here so... Why is AWB replacing spaces with &nbsp; before units? E.g. "12 mm" to "12&nbsp;mm"? ··gracefool☺ 15:28, 10 May 2009 (UTC)
- So that the unit description doesn't fall on the next line; it will always be right next to the unit value. –xeno talk 15:34, 10 May 2009 (UTC)
See also WP:NBSP --ThaddeusB (talk) 16:59, 10 May 2009 (UTC)
Saxon possessive plurals
We do womens => women's and childrens => children's; should we also correct mens? (And maybe oxens, vixens, and sheeps?) Rich Farmbrough, 10:24 12 May 2009 (UTC).
- Unfortunately, it looks like these errors are ambiguous in that half are incorrect plural forms and half are incorrect possessive forms. Thus a typo rule is probably not ideal. --ThaddeusB (talk) 14:35, 12 May 2009 (UTC)
- I added oxens & sheeps to the manual typo fixing list. Vixens is quite often correct as a proper noun, so I didn't add it. --ThaddeusB (talk) 14:40, 12 May 2009 (UTC)
Fine tuning
Petersberg Agreement is correct. Rich Farmbrough, 01:12, 4 June 2009 (UTC).
- Hun? What is the correction you want adjusted here? --ThaddeusB (talk) 01:23, 4 June 2009 (UTC)
- Fixed with these edits. -- JHunterJ (talk) 11:07, 4 June 2009 (UTC)
Double spacing
How about removing double spacing? I.e. replacing ".  X" (two spaces) with ". X" (one space)? This was mentioned earlier as part of a bunch of changes. So far I've got "\. [ ]+([A-Za-z\[])" → ". $1" but I'll probably find room for improvement. ··gracefool☺ 14:43, 10 May 2009 (UTC)
- The problem with this is some people are running AWB "skip if no typo fix" and then this non-visible change would be considered a typo fix, effectively causing them to break the rule against insignificant edits. –xeno talk 15:38, 10 May 2009 (UTC)
- Are there many other non-visible changes like this? If so, we could make a new "skip non-visible changes" checkbox... ··gracefool☺ 16:28, 10 May 2009 (UTC)
- This change is against MOS (unless it has changed since I last read) since we don't endorse one system of spacing over another (2 spaces is standard in American English). Also, it is completely pointless since most browsers compress multiple spaces into one. --ThaddeusB (talk) 17:01, 10 May 2009 (UTC)
- It won't make a visible change, and is potentially controversial (though given it's not a visible change that seems a contradiction...), so doesn't seem worthwhile. Rjwilmsi 17:21, 10 May 2009 (UTC)
- Indeed many of us prefer the double space after a full-stop even though it doesn't show. Rich Farmbrough, 10:18 12 May 2009 (UTC).
- MOS says there is no guideline because it doesn't matter. But obviously it shouldn't be done by itself since that would be breaking the rule against insignificant edits. ··gracefool☺ 05:39, 13 May 2009 (UTC)
- The supposed "rule" of two spaces after sentence-ending punctuation is not standard "American English", whatever that means. It is a hold-over from the bygone days of typewriters, with their (generally) non-proportional fonts. In type-set text, one space has always been the standard (see, e.g., U.S. Government Printing Office Style Manual, 1973, p. 11: "To conform with trade practice, a single justification space (close spacing) will be used between sentences.").--BillFlis (talk) 18:10, 17 June 2009 (UTC)
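For anyone who wants to see what the proposed double-spacing rule would actually touch, here is a minimal sketch in Python. It is an assumption-laden translation: AWB uses .NET regular expressions, so the `$1` in the replacement becomes `\1` here, and the function name `collapse_sentence_spacing` is made up for illustration.

```python
import re

# The pattern quoted in the proposal above: a full stop, one space,
# one or more further spaces, then a letter or an opening square bracket.
DOUBLE_SPACE = re.compile(r"\. [ ]+([A-Za-z\[])")


def collapse_sentence_spacing(text):
    """Collapse runs of spaces after a full stop to a single space.

    Hypothetical helper name; `\\1` is Python's spelling of the `$1` in the rule.
    """
    return DOUBLE_SPACE.sub(r". \1", text)


print(collapse_sentence_spacing("First sentence.  Second sentence."))
print(collapse_sentence_spacing("See above.   [[Link]] follows."))
```

Text that already has a single space after the full stop is left alone, which is why the change is invisible to readers and counts as an insignificant edit when made on its own.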
Finally
Note: AWB tried to correct "finnaly" with "finnally", although it's "finally". --bender235 (talk) 14:43, 17 June 2009 (UTC)
- Fixed now. Thanks. Rjwilmsi 16:54, 17 June 2009 (UTC)
I before e except after C
I looked through the list and did not see these, and I think they would be good to add if not there already.
- Recieved to Received
- Decieved to Deceived
- Percieved to Perceived --Kumioko (talk) 18:57, 24 June 2009 (UTC)
- Those three are already covered. Rjwilmsi 16:36, 25 June 2009 (UTC)
- I thought they probably were but I couldn't find them so I wanted to ask. --Kumioko (talk) 16:54, 25 June 2009 (UTC)
Archaeology
AWB tried to correct "archeaology" with "archeology", but it should be "archaeology". --bender235 (talk) 20:18, 25 June 2009 (UTC)
- Archaeology and archeology are both acceptable. -- JHunterJ (talk) 21:44, 25 June 2009 (UTC)
- But for a good reason all archaeological journals are spelled with "ae", and let's not forget the Wikipedia article is named "Archaeology". --bender235 (talk) 23:05, 25 June 2009 (UTC)
- We didn't forget. Is there a Wikipedia style guideline for opting for ae? Should we use æ instead? Should we remove "archeology" from archaeology? -- JHunterJ (talk) 00:35, 26 June 2009 (UTC)
Ellipse etc.
I thought about this rule:
<Typo word="Ellips(e/is/es)" find="\b(E|e)lips(e|is|es)\b" replace="$1llips$2" />
I'm pretty sure it would work, adding the second 'l' to elipse, elipsis, elipses. I've not just added it for a few reasons:
- The one-'l' version is apparently correct in a number of languages. For example there seems to be a Serbian band "Elipse".
- I just went ahead and fixed all the unambiguous cases I could find
- 'elipse' may be a typo for 'eclipse' as well
--ospalh (talk) 14:40, 25 June 2009 (UTC)
- You can use a negative look-behind to allow Elipse:
<Typo word="Ellips(e/is/es)" find="\b(E|e)lips(es?|is)\b(?<!Elipse)" replace="$1llips$2" />
- Did you find cases where elipse was/might have been a typo for eclipse? -- JHunterJ (talk) 14:47, 25 June 2009 (UTC)
- Yes, two. But that typo isn't too hard to spot.--ospalh (talk) 07:06, 26 June 2009 (UTC)
- If we want to avoid it (and eclipses typos), we'd be left with just fixing "elipsis":
<Typo word="Ellipsis" find="\b(E|e)lipsis\b" replace="$1llipsis" />
- I'm not sure how cautious we should be here. -- JHunterJ (talk) 11:33, 26 June 2009 (UTC)
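To make the look-behind suggestion concrete, here is a small sketch of how that rule behaves. It is written in Python rather than AWB's .NET regex flavour, so the replacement `$1llips$2` is spelled `\1llips\2`, and the helper name `fix_ellipse` is invented for the example.

```python
import re

# The look-behind variant from the thread: after a candidate match ends,
# reject it if the six characters just consumed were exactly "Elipse".
ELLIPSE_RULE = re.compile(r"\b(E|e)lips(es?|is)\b(?<!Elipse)")


def fix_ellipse(text):
    """Apply the rule once over `text` (hypothetical helper name)."""
    return ELLIPSE_RULE.sub(r"\1llips\2", text)


print(fix_ellipse("an elipse"))            # lowercase typo is corrected
print(fix_ellipse("the band Elipse"))      # capitalised proper name is kept
print(fix_ellipse("Elipses and elipsis"))  # plural and -is forms are corrected
```

One consequence of this design is that a genuine typo at the start of a sentence ("Elipse" for "Ellipse") is skipped too, and the rule still cannot tell "elipse" for "ellipse" from "elipse" for "eclipse".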
Hindenburg
I wanted to change the -burg rule to include Hindenberg->Hindenburg.
<Typo word="-burg" find="\b([Gg]ettys|[Gg]othen|[Hh]a[bp]s|[Hh]am|[Ll]ynch|[Vv]icks)b(?:e|ou)rg\b" replace="$1burg" />
to
<Typo word="-burg" find="\b([Gg]ettys|[Gg]othen|[Hh]a[bp]s|[Hh]am|[Hh]inden|[Ll]ynch|[Vv]icks)b(?:e|ou)rg\b" replace="$1burg" />
(O.K., I did and then undid it.)
There are some typos where Hindenberg should be fixed to Hindenburg. But there is also Basil Cameron, known as "Basil George Cameron Hindenberg" or "Basil Hindenberg". I don't know how to avoid those false positives. I think an extra rule for Basil will not help as we can't be sure of the order the rules are applied.--ospalh (talk) 08:45, 29 June 2009 (UTC)
- I don't think there is a regexp to determine which Hindenbergs should be changed and which shouldn't. Both spellings appear to be valid surnames, and people are often referred to by just their surname in article bodies. -- JHunterJ (talk) 11:19, 29 June 2009 (UTC)
- I just used the regexp on its own and most "Hindenberg"s needed to be changed. But there were a few that had to stay. (Most of those did, in a way, mean Hindenburg, too, but were quotes or file names.) So in the end it's too complicated for an automatic rule and should probably not be included.--ospalh (talk) 14:27, 29 June 2009 (UTC)
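The false positive described above is easy to reproduce. The sketch below runs the extended -burg rule in Python (an assumption: AWB itself uses .NET regexes, where the replacement is written `$1burg`); the helper name `fix_burg` is made up for the example.

```python
import re

# The extended -burg rule from the thread, including the proposed [Hh]inden.
BURG_RULE = re.compile(
    r"\b([Gg]ettys|[Gg]othen|[Hh]a[bp]s|[Hh]am|[Hh]inden|[Ll]ynch|[Vv]icks)"
    r"b(?:e|ou)rg\b"
)


def fix_burg(text):
    """Rewrite -berg/-bourg to -burg for the listed stems (hypothetical helper)."""
    return BURG_RULE.sub(r"\1burg", text)


print(fix_burg("the Hindenberg disaster"))  # intended correction
print(fix_burg("Basil Hindenberg"))         # surname is rewritten as well
```

The pattern only sees the word itself, so nothing in it can distinguish the airship from the surname; that would need context the rule does not have.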
Journal parameters cleanup
You can look through (Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia alphabetical) and see patterns. For example, many journal parameters start with a ' for no reason, others are italicized twice (templates place entries in italics automatically, no need to tell it twice), and so on. Headbomb {ταλκκοντριβς – WP Physics} 01:17, 30 June 2009 (UTC)
Capitalisation in URLs
Is there a way we can prevent the capitalisation rules happening inside URLs? ··gracefool☺ 04:22, 25 June 2009 (UTC)
- I have been thinking about the same problem!! Let's wait together for an answer!! --Siddhant (talk) 07:13, 4 July 2009 (UTC)
New words
- The list already has "tamil" → "Tamil". Can someone add "tamil nadu" → "Tamil Nadu".
- "indore" → "Indore". (However if the name is in a URL leave it uncapitalized.)
- "jallandhar" → "Jalandhar". (Wrong spelling of the city name.)
Thanks.--Siddhant (talk) 07:11, 4 July 2009 (UTC)
- Done Jalandhara lists Jallandhar as an alternate spelling, so I don't think we can include it here. Others added with this edit. -- JHunterJ (talk) 20:50, 4 July 2009 (UTC)
humorous
AWB tried to replace "humourous" with "$umorous", but it should be "humorous". --bender235 (talk) 20:33, 4 July 2009 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 20:41, 4 July 2009 (UTC)
trilogy
AWB tried to replace "trilolgy" with "trilology", yet it should be "trilogy". --bender235 (talk) 15:30, 5 July 2009 (UTC)
- AWB's reg exp typo tab should tell you which regexp was "hit" for this one. In this case, I suspect -olgy --> -ology as a general suffix hit. I don't think it needs to be changed, although possibly an earlier "trilogy" rule that catches "trilolgy" could be added. Since it appears that you fixed the only instance of "trilolgy" on Wikipedia, I don't think a change is needed. -- JHunterJ (talk) 15:51, 5 July 2009 (UTC)
screenwriter
AWB tried to replace "scrennwriter" with "screennwriter", although it should be "screenwriter". --bender235 (talk) 20:28, 10 July 2009 (UTC)
- Done This will catch it. Thanks Rjwilmsi 07:06, 11 July 2009 (UTC)
Xbox
In {{Video game multiple console reviews}}, "XBOX" is all caps and doesn't work as "Xbox". BOVINEBOY2008 16:31, 16 July 2009 (UTC)
- Typo fixes are not applied within templates. Do you have an example diff of a problem? Rjwilmsi 16:41, 16 July 2009 (UTC)
- [31] BOVINEBOY2008 16:51, 16 July 2009 (UTC)
- Unfortunately, the link you have posted points to this page? Rjwilmsi 20:37, 17 July 2009 (UTC)
- [32] sorry. BOVINEBOY2008 20:44, 17 July 2009 (UTC)
- Okay, thank you for the link. I'm confused as to why this happened, because for me no typo fixes are applied to the article as I would expect, since the "XBOX" under question is within a template, so is ignored by AWB when applying typo corrections. I can only suppose that the user who made the edit has some customised logic running on AWB that does not implement this standard restriction. Rjwilmsi 22:10, 17 July 2009 (UTC)
I think I figured it out. One of the templates earlier in the article wasn't closed, so it may have voided something. I don't know... Either way thank you for taking note. BOVINEBOY2008 22:14, 17 July 2009 (UTC)
Moiré/moire
I'm not sure that the rule for moiré should be kept. I think I've found a false positive: Moire (fabric). It's not strictly an error to spell the fabric with an accent, but apparently not standard.--ospalh (talk) 07:51, 20 July 2009 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 10:58, 20 July 2009 (UTC)
"emporer"
"emporer" -> "emperor" -shirulashem(talk) 12:41, 22 July 2009 (UTC)
- Done along with a few other emperor fixes: [33] --ThaddeusB (talk) 04:52, 25 July 2009 (UTC)
Regex out of links
Hello!
I'm trying to write a regex that does not match inside links or templates.
The example is:
string is : " a [[ b ]] c d [[ d ]] [[c]] "
The match should be only the c outside the links (the bolded one).
Thanks for helping--Zorlot (talk) 04:17, 25 July 2009 (UTC)
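One possible approach, offered as a sketch rather than a definitive answer, is a negative look-ahead that rejects a match when the next bracket run after it is a closing `]]`. The example is in Python; the function name `find_outside_links` is hypothetical, and the idea assumes links are not nested and does not handle `{{templates}}`.

```python
import re


def find_outside_links(text, word):
    """Return start offsets of `word` that are not inside [[...]].

    Hypothetical helper. The look-ahead fails whenever the text after the
    match reaches "]]" without first meeting any other square bracket,
    which is what happens inside an unnested wikilink.
    """
    pattern = re.compile(re.escape(word) + r"(?![^\[\]]*\]\])")
    return [m.start() for m in pattern.finditer(text)]


sample = " a [[ b ]] c d [[ d ]] [[c]] "
print(find_outside_links(sample, "c"))
```

On the sample string only the free-standing c (offset 11) is reported; the c inside `[[c]]` is skipped.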
à la
Please add a typo fix for "à la", replacing things like "a la", "a lá" or "ala". --bender235 (talk) 09:49, 31 July 2009 (UTC)
- Ala appears to have some legitimate lowercase usages. Otherwise Done with this edit -- JHunterJ (talk) 11:29, 31 July 2009 (UTC)
(equals) (equals) (space)
I use autoed when I edit, and one of the edits it recommends a lot is deleting the space that is often between the "==" (of the header) and the actual section name. The proper format for a section header is ==sectionname==, NOT: ==(space)sectionname(space)==
so this is essentially two rules (as there needs to be one rule for the two equal signs on either side of the page)
replace "== " with "=="
and
Replace " ==" with "=="
tell me what you think, and if i need to elaborate--Tim1357 (talk) 17:23, 6 August 2009 (UTC)
- There's no consensus for this change, and as it's not visible to an article reader I don't see much value in it. Rjwilmsi 18:33, 6 August 2009 (UTC)
Clinitian -> Clinician
eh. –xenotalk 18:44, 7 August 2009 (UTC)
- No hits in WP search for "clinitian". Did you already fix a bunch of them? -- JHunterJ (talk) 12:12, 8 August 2009 (UTC)
- I just saw someone use it once. Is this only for common typos? –xenotalk 19:34, 13 August 2009 (UTC)
notally?
AWB tried to replace "notaly" with "notally" here, although IMO it should've been "notably". --bender235 (talk) 09:39, 8 August 2009 (UTC)
- This was the application of a suffix rule. (There's a tab in AWB that will display the rules that had matches on the current page; it can be helpful to include that info.) But I don't think it's a prevalent-enough typo to need a separate fix. -- you fixed the only occurrence. -- JHunterJ (talk) 12:10, 8 August 2009 (UTC)
- Well, I added it anyway. Rjwilmsi 12:12, 8 August 2009 (UTC)
Acheievment?
AWB also tried to replace "acheievment" with "acheievement" here, although it should've been "achievement". --bender235 (talk) 11:05, 8 August 2009 (UTC)
- Done with this edit. -- JHunterJ (talk) 12:23, 8 August 2009 (UTC)
"Passed away"
As I got reverted.. isn't this page supposed to be used to fix typos, and not to enforce WP:EUPHEMISM? We could just as well add a rule that exchanges "perversion" with "paraphilia". --Conti|✉ 19:28, 13 August 2009 (UTC)
- Yes, it's a bit of a stretch to include style changes in a typo list. Maybe this should be included with WP:FRONDS instead? –xenotalk 19:33, 13 August 2009 (UTC)
- Hmm, that would be fine by me. I don't think terms like these should be replaced among all the typos (which means that people won't really think about whether "passed away" might be appropriate after all in some situations). --Conti|✉ 19:40, 13 August 2009 (UTC)
- Conti, I reverted your removal of "passed away", but I think your argument has merit and should be discussed. I remember the first time I saw the plugin change pass away to die and I was pretty surprised. After all, like you and xeno said, "pass away" isn't a typo. The reason I was so quick to revert your change is that I felt removing an entry like that needed some discussion first, and in the meantime, you can just do what I do: ignore the change that AWB wants to make when it comes across "pass away" in an article. -shirulashem(talk) 19:46, 13 August 2009 (UTC)
- Well, if we're all in agreement, what about making that change, then? :) Usually this list is only used for things that are blatantly wrong, and so far I only had to cancel a change because this list wasn't perfect, not because I disagree with it. And I'd like it to stay that way. --Conti|✉ 20:03, 13 August 2009 (UTC)
- Done... Are there any more like this? –xenotalk 20:23, 13 August 2009 (UTC)
- None that I know of, at least. Thanks. :) --Conti|✉ 20:57, 13 August 2009 (UTC)
- Hmm. I'm not convinced that a whole hour and a half represents sufficient time to debate this issue and arrive at consensus. Here's the debate which led to the introduction of "passed away" et al. I think the argument is as strong now as then for its inclusion on grounds of policy (Wikipedia:Words to avoid#Death and dying) and suitability (Wikipedia:AutoWikiBrowser/Typos#Incorrect phrases). --Tagishsimon (talk) 21:06, 13 August 2009 (UTC)
- First of all, don't confuse guidelines with policy. In most cases, "died" is more appropriate than "passed away", I don't disagree with that. But I still disagree with including this entry here for two reasons: a) As I said above, this is the typo list, it contains terms that need to be fixed and are wrong 100% of the time. Which leads to b) "passed away" is usually not appropriate, but not always. Plot summaries come to mind, and of course quotes (or are we supposed to add a [sic] to someone being quoted as "He passed away", like we do with all typos?). Adding this term to WP:FRONDS instead, which people can use to hunt for badly phrased sentences, sounds much better to me. --Conti|✉ 21:14, 13 August 2009 (UTC)
- Behold, it came to pass that three hundred and twenty years had passed away, and the more wicked part of the Nephites were destroyed –xenotalk 21:18, 13 August 2009 (UTC)
- That's why editors need to preview EVERY tool edit before they make them, because there are times that the suggested edit will be wrong. Also, I agree with Tagishsimon. The discussion began, I left my office to commute home, ate a slice of pizza, turned on my computer, and the discussion was over and the change was made. I think it needs to be discussed more. -shirulashem(talk) 00:54, 14 August 2009 (UTC)
- My concern is that we're moving from typos to enforcing stylistic changes. Perhaps a different checkbox should be created for this, so editors don't blindly approve the fixes (even though they aren't supposed to). There's a reason the "phrases" section was, until "passed away", empty. It's a bit of a different bird. I'm not particularly fussed though, so if you want to put it back in while more people weigh in, I won't consider it to be edit warring or anything. –xenotalk 00:57, 14 August 2009 (UTC)
- (EC with Xeno) And, Conti, you have yet to demolish a couple of arguments: 1) AWB/Typos has for a long, long time had a section for "incorrect phrases", which seems to indicate an intention to deal with incorrect phrases. According to the guidelines, passed away et al are incorrect phrases. 2) Your own typo and [sic] argument reveals, per Shirulashem, that there are instances where 100% turns into slightly less than 100%; you're probably as likely to get a false positive with a conventional typo regex as you are with this phrase regex. I do take Xeno's point that phrases are a different kind of bird, but am concerned that WP:FRONDS is too immature to be considered a solution. Like Xeno, I'm happy that we keep passed away removed while we discuss; the discussion is more important than whether passed away happens to be in or out as we discuss. --Tagishsimon (talk) 01:07, 14 August 2009 (UTC)
- 1) Yes, and as far as I can see, it has been empty from the day it's been added. Regardless of whether there ever was an intention to use this page to fix incorrect phrases, I simply disagree with the use of this page for that purpose. 2) I disagree here, too. Just do a search for "passed away" and see how many false positives you can find. There are a lot more than you will find when searching for actual typos. --Conti|✉ 08:52, 14 August 2009 (UTC)
← Best of both worlds: Wikipedia talk:AutoWikiBrowser/Feature requests#Provide a separate checkbox for "Incorrect phrases" during Regex typo fixing. –xenotalk 01:17, 14 August 2009 (UTC)
- Yup, that'd be a fine solution. --Tagishsimon (talk) 01:50, 14 August 2009 (UTC)
occasionally
AWB tried to replace "ocaissionaly" with "ocaissionally", but it should've been "occasionally". --bender235 (talk) 22:05, 18 August 2009 (UTC)
- There's a tab in AWB that will show you which rule matched. In this case, I'm betting it was a suffix rule replacing -aly with -ally, which indeed did the expected thing here. Are you suggesting the addition of a new fix to apply to "ocaission"? -- JHunterJ (talk) 23:18, 18 August 2009 (UTC)
- Sure, if that's what's necessary. --bender235 (talk) 12:52, 19 August 2009 (UTC)
- One of the project "to-dos" is to remove rare words. The only instance of "ocaissionaly" has been fixed. I'm postulating that the addition of a fix for it is not necessary. -- JHunterJ (talk) 21:12, 20 August 2009 (UTC)
Search method
How does AWB search for typos? Does it search the wikisource of the page or actual text that we see on article tab? Thanks! —Preceding unsigned comment added by 70.26.3.12 (talk) 00:02, 25 August 2009 (UTC)
- I need this information because I am trying to develop a list of typos for a different language Wikipedia. —Preceding unsigned comment added by 70.26.3.12 (talk) 09:29, 25 August 2009 (UTC)
- I believe AWB will look for typos in the wikitext itself, not the displayed text. –xenotalk 21:37, 29 August 2009 (UTC)
Linament instead of the correct liniment
Some people keep changing the correct spelling of liniment to "linament", using AWB in the article Slough. As you can see there is even an article for it with the correct spelling. Obviously this must be spelt incorrectly in the Browser, it wouldn't occur otherwise. Please, someone, change this. Dieter Simon (talk) 23:13, 3 September 2009 (UTC)
- This is the line, but I have no idea how to add an exception. –xenotalk 23:22, 3 September 2009 (UTC)
<Typo word="-ament" find="\b([Ff]il|[Ll]i[gn]|[Tt]est|[Tt]ourn)ia?ment(s?|ary)\b" replace="$1ament$2"/>
- Fixed with this edit. -- JHunterJ (talk) 03:21, 4 September 2009 (UTC)
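For reference, the false positive can be reproduced with the rule quoted above, and one way of adding an exception is a negative look-ahead at the start of the pattern. This is only a sketch in Python (AWB uses .NET regexes, where the replacement is `$1ament$2`); the guarded pattern is an assumption for illustration and is not necessarily what the linked edit did.

```python
import re

STEMS = r"([Ff]il|[Ll]i[gn]|[Tt]est|[Tt]ourn)"

# The rule as quoted in the thread.
AMENT_RULE = re.compile(r"\b" + STEMS + r"ia?ment(s?|ary)\b")

# Hypothetical guarded variant: refuse to start a match on liniment(s).
AMENT_GUARDED = re.compile(r"\b(?![Ll]iniments?\b)" + STEMS + r"ia?ment(s?|ary)\b")


def apply_rule(rule, text):
    """Apply one compiled -ament rule to `text` (hypothetical helper)."""
    return rule.sub(r"\1ament\2", text)


print(apply_rule(AMENT_RULE, "a bottle of liniment"))     # wrongly rewritten
print(apply_rule(AMENT_GUARDED, "a bottle of liniment"))  # left alone
print(apply_rule(AMENT_GUARDED, "torn ligiments"))        # still corrected
```

The look-ahead is checked at every word boundary, so it costs a little on each candidate position; that trade-off is the one discussed in the zero-width assertions section of this archive.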
Enmáscarado
In the common beginnings section, there is code which changes Enm to Emm. Could someone add Enmascarado and Enmáscarado to the list of exceptions? Thanks! Plastikspork ―Œ(talk) 15:10, 16 September 2009 (UTC)
- Those should probably be better exempted by wrapping them in {{lang|es}} templates, since they aren't in English usage (they don't appear in the destination article, for example). -- JHunterJ (talk) 18:20, 16 September 2009 (UTC)
- It's a Spanish word and it's a pain to fix it every time someone uses AWB and comes by the articles, I've had to do it like 5 times in the last 2-3 months alone. MPJ-DK (No Drama) Talk 18:34, 19 September 2009 (UTC)
- Done with this edit. To illustrate what I was saying before, I also blocked the possibility of AWB altering it on one page with these edits. AWB won't "fix" foreign-language text that's identified as foreign language text by the use of the {{lang}} template. -- JHunterJ (talk) 19:50, 19 September 2009 (UTC)
- Thank you, I really appreciate it. MPJ-DK (No Drama) Talk 20:03, 19 September 2009 (UTC)
webiste - website
I've done this manually, but think it could be botted for the future. ϢereSpielChequers 20:13, 18 September 2009 (UTC)
- Done with this edit -- JHunterJ (talk) 20:22, 19 September 2009 (UTC)
Petersberg - Petersburg
In this edit, an AWB user replaced a "Petersberg" referring to Petersberg, Hesse into "Petersburg" claiming it was a typo. I'm not sure this spelling should be included in the list; it might do more harm than good, considering that there are several plausible legitimate uses of Petersberg. —JAO • T • C 09:18, 6 October 2009 (UTC)
- Fixed with this edit -- JHunterJ (talk) 10:36, 6 October 2009 (UTC)
Two additions
- 1 AWB does not detect "fondation" as misspelling of "foundation", but it should.
- 2 AWB detects "Musial" (as in Stan Musial, for example) as misspelling of "musical", but it should not. --bender235 (talk) 12:11, 19 October 2009 (UTC)
Done with these edits -- JHunterJ (talk) 12:52, 19 October 2009 (UTC)
earnign
earnign currently corrects to eearning. Can't see how to fix that myself. --Closedmouth (talk) 14:36, 21 October 2009 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 14:47, 21 October 2009 (UTC)
- Ta. I should also mention that advertizing is correcting to advertising. I'd remove the rule myself, but I'm not sure if it's just faulty. --Closedmouth (talk) 14:51, 21 October 2009 (UTC)
- Advertising doesn't mention "advertizing". Is it an acceptable alternate spelling? -- JHunterJ (talk) 21:59, 21 October 2009 (UTC)
- Isn't that how the Americans spell it? --Closedmouth (talk) 07:03, 22 October 2009 (UTC)
- No[34]. -- JHunterJ (talk) 11:09, 22 October 2009 (UTC)
- Well, I'm an idiot. --Closedmouth (talk) 23:19, 22 October 2009 (UTC)
- Nah, English is just a really goofy language. :) --ThaddeusB (talk) 00:35, 23 October 2009 (UTC)
Fenerbahçe
Please add "Fenerbahçe" as fix of "Fenerbahce". --bender235 (talk) 08:48, 31 October 2009 (UTC)
RegExp documentation
It took me awhile to find the external link to the syntax summary on the AWB home page. For the benefit of those who know the RegExp principles, but are not acquainted with Microsoft's take on it, I suggest
- linking the documentation: official MSDN documentation (is this standardized somewhere?) Well House summary
- clarifying which elements of the Microsoft mess should not be used, and which ones should be avoided, be it for performance reasons or compatibility or whatnot
- making our own summaries for both the quick and dirty and for the advanced messers
Paradoctor (talk) 12:17, 31 October 2009 (UTC)
Collection maintenance
A few automation suggestions:
- Checking the list for expressions that match their output, i. e. matching "foo", then replacing it with "foo".
- Overlapping search expressions, e. g. if someone added a rule "ibm" -> "ibn", this would clash with "ibm" -> "IBM".
- Crawling redirects, disambiguation pages and AJAX suggestions (from search results) for useful information.
- Utilities that convert between regexps and lists of match-replacement pairs, for not-too-complex rules this could save a lot of headaches for beginners, and time for advanced users, and these cases make up the vast majority of rules.
- Writing the above it occurred to me that a simple wizard would probably be the simplest solution: You enter a match and/or replacement term, and the wizard shows you whether the matchword already has a replacement, or what words match to a given replacement term. Then, you get to choose the appropriate editing options. Forming efficient regexps can be left to the software. How does that sound?
Paradoctor (talk) 13:14, 31 October 2009 (UTC)
- First one is already done by AWB. The rest sounds great, but hard to do. Rjwilmsi 14:14, 31 October 2009 (UTC)
Welcome
Things like "wellcome" should be corrected with "welcome". --bender235 (talk) 17:00, 1 November 2009 (UTC)
- Some would not Wellcome that. ;) Paradoctor (talk) 22:16, 1 November 2009 (UTC)
- Okay. --bender235 (talk) 11:23, 7 November 2009 (UTC)
- A case-sensitive search on "wellcome" could be added though. -- JHunterJ (talk) 13:51, 11 November 2009 (UTC)
Eulerian
Please don't correct an Eulerian to *a Eulerian. Being Swiss, he isn't pronounced like that. —Blotwell 23:28, 3 November 2009 (UTC)
- Being the one who erroneously edited several articles, replacing "an Eulerian" to "a Eulerian" with AWB, I support. --bender235 (talk) 11:27, 7 November 2009 (UTC)
addition to geographic canada
someone should add "Mississauga", "Calgary", "New Brunswick", "Nova Scotia", "Prince Edward Island", and "Edmonton" tablo (talk) 22:20, 8 November 2009 (UTC)
- What are their common misspellings? -- JHunterJ (talk) 13:50, 11 November 2009 (UTC)
continous
Hi, can we add continous as a typo for continuous? ϢereSpielChequers 12:35, 9 November 2009 (UTC)
- The "(Dis)Continuous" rule already catches it. Rjwilmsi 10:33, 11 November 2009 (UTC)
- Oh I thought that any typo with examples more than two months old was probably not on the list. As a rule of thumb how old would you suggest examples need to be for a typo not to be on the list? ϢereSpielChequers 13:33, 11 November 2009 (UTC)
- There's no rule of thumb. Typos older than the last time an editor used AWB with RETF enabled are probably not on the list. The way to check to see if it's on the list is to point AWB at the page (with RETF enabled) and see if it catches it. Since AWB usage is human-initiated, not automatic, a page that is five years old but hasn't bubbled up to some AWB editors list won't get corrected. -- JHunterJ (talk) 13:49, 11 November 2009 (UTC)
False Positives in Sixteenth-Century Titles
Hi, is there a way to stop people using AWB to change sixteenth-century spellings in titles of sixteenth-century books into modern spelling? I keep reverting the corrections of agenst → against, breif → brief, mariage → marriage in the article on George Joye but people using AWB keep changing it back without thinking or reading the text in context. A note in the Discussion page did not help. GJ1535 (talk) 09:41, 11 November 2009 (UTC)
- One option would be to use {{sic}} with the 'hide' flag on. Rjwilmsi 09:50, 11 November 2009 (UTC)
- Thanks a lot. That helps a lot! GJ1535 (talk) 16:31, 11 November 2009 (UTC)
Not sure where to report this, but a similar incidence is this change of the name of a painting. Perhaps the script could be careful when the suspect word is capitalized, indicating a name? Skomorokh, barbarian 15:50, 14 November 2009 (UTC)
- It certainly could ignore capitalization, but the trade off, obviously, would be actual errors going uncaught.
- Another suggestion would be to contact the offending party and ask them to be a bit more careful. --ThaddeusB (talk) 16:51, 14 November 2009 (UTC)
Misc typofix suggestions
Some possibly autofixable typos I came across, and suggested correction.
- positionned positioned
- crittercism criticism
- successsful successful
- definately definitely
- posotive positive
- Retrivied Retrieved
- lilie lily
- alzheimers Alzheimer's
- privides provides
- battlions battalions
- determin determine
--HamburgerRadio (talk) 01:57, 16 November 2009 (UTC)
Les Mis typo
I just scanned an article about Les Miserables, where every occurrence of "Rue Plumet" was suggested to be changed to "Rue Plummet". Opinions on the best way to handle this going forward? --SarekOfVulcan (talk) 19:14, 25 November 2009 (UTC)
- Which rule was catching it? We can address the rule, and/or the "Rue Plumets" can be tagged as French with the {{lang}} template. -- JHunterJ (talk) 19:29, 25 November 2009 (UTC)
based of -> based on
This rule:
"<Typo word="Based (off) of" find="\b(B|b)ased\s+(off\s+)?of\b" replace="$1ased on" />"
produced a false positive: it tried to fix "... the most dynamic, action-based of these ..." to "... the most dynamic, action-based on these ..." (from "Bacone school").
I'm not sure how often "based of" is part of a "<foo> based of <bar>" construction and how often it should be changed to "based on".--ospalh (talk) 14:04, 7 December 2009 (UTC)
- Would a negative look-behind matching the hyphen to avoid fixes of "-based" suit this problem? -- JHunterJ (talk) 14:34, 7 December 2009 (UTC)
- Looks like a good idea.--ospalh (talk) 15:10, 7 December 2009 (UTC)
Fixed with this edit. -- JHunterJ (talk) 22:43, 7 December 2009 (UTC)
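The negative look-behind suggested above can be sketched as follows. This is only an illustration in Python, not the rule that was actually committed (the linked edit is not reproduced here). The `(?<!-)` prefix and the name `fix_based_of` are assumptions, and the .NET-style `$1` is written as `\1` for Python's `re` module.

```python
import re

# The rule's pattern as quoted above, with a hypothetical negative
# look-behind added so that a hyphen right before "based" blocks the match.
BASED_OF = re.compile(r"(?<!-)\b(B|b)ased\s+(off\s+)?of\b")


def fix_based_of(text: str) -> str:
    # "\1" keeps the original capitalisation of the first letter.
    return BASED_OF.sub(r"\1ased on", text)


print(fix_based_of("The film is based off of a novel."))
print(fix_based_of("the most dynamic, action-based of these"))
```

The look-behind is only evaluated once the engine is already at a candidate "based", so it should add little work on text that does not contain the word.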
Better typo correction
AWB changes indias to Indias, but it should change indias to India's (with an apostrophe). Please correct it. --Siddhant (talk) 19:05, 8 December 2009 (UTC)
- indias -> Indias is a good change. Indias -> India's has the possibility of false positives. http://www.google.com/search?q=%22many+indias , for example. -- JHunterJ (talk) 01:35, 9 December 2009 (UTC)
- I understand. Thanks for explaining. --Siddhant (talk) 16:03, 9 December 2009 (UTC)
correction for "indiscernible"
The replacement yields "iiscernible" as it now stands. Also the word "indiscernible" may exist as a stray on a line above. LilHelpa (talk) 01:58, 17 December 2009 (UTC)
- Done Thanks for reporting – fixed. Rjwilmsi 07:57, 17 December 2009 (UTC)
Wikipedia:Lists of common misspellings
This program can incorporate data from Wikipedia:Lists of common misspellings. -- Wavelength (talk) 16:14, 17 December 2009 (UTC)
"concerned" is being changed to "concearned"
And I can't find why. Actually, could be "concerning" to "concearning" or both. Can't recall. LilHelpa (talk) 01:40, 21 December 2009 (UTC)
- There's a regexp tab in AWB that will tell you which patterns hit. (BTW, adding comments to talk pages aren't minor edits.) -- JHunterJ (talk) 02:11, 21 December 2009 (UTC)
- Bah, it's my own setting, not one from the list. Nevermind. Sorry. LilHelpa (talk) 00:20, 22 December 2009 (UTC)
- No problem. Please, though, don't mark talk page comments as minor. See WP:MINOR. Thanks. -- JHunterJ (talk) 01:46, 22 December 2009 (UTC)
- Will try to remember that. Difficult when almost everything I do is minor ;) LilHelpa (talk) 00:14, 23 December 2009 (UTC)
himself herself
Please would somebody add himslef herslef themsleves. Kittybrewster ☎ 09:05, 23 December 2009 (UTC)
- Done Corrected existing rule here. Thanks Rjwilmsi 09:51, 23 December 2009 (UTC)
Error: enployed → empployed
AWB erroneously fixes "enployed" to "empployed" using the "Emp-" beginning rule. MANdARAX • XAЯAbИAM 11:26, 26 December 2009 (UTC)
- Done That should fix it. Rjwilmsi 11:39, 26 December 2009 (UTC)
Women's'
" Womens' " gets changed to " women's' " instead of " women's ". (Spaces added so you could actually see what I was talking about.) --Closedmouth (talk) 14:24, 26 December 2009 (UTC)
Capitalization of titles in other languages
A recent edit at Nicole Oresme cleaned up a lot of things, but incorrectly changed the word latin to Latin in the title of the following book in French.
- Wolowski, ed., Traictié de la première invention des monnoies de Nicole Oresme, textes français et latin d'après les manuscrits de la Bibliothèque Impériale, et Traité de la monnoie de Copernic, texte latin et traduction française (Paris, 1864)
French usage minimizes capitalization, and the lower cased latin was correct. Is there a way to make your capitalization changes language sensitive? Thanks. --SteveMcCluskey (talk) 22:01, 1 January 2010 (UTC)
- I would have updated the article but you've now removed those references. The answer is to use the {{lang}} template to enclose the foreign-language text. e.g. Smith, F. {{lang|fr|Quelques mots en français}}. Rjwilmsi 22:35, 1 January 2010 (UTC)
Purportrated
The word "purpotrated" gets fixed to "purportrated" by
<Typo word="Purport" find="\b(P|p)(?:urpo|erpor?)t(\w*)\b" replace="$1urport$2" />
.
It should, of course, become "perpetrated". I don't know if it's worth making a new rule for this uncommon typo, but I do think the "Purport" rule should be fixed so it doesn't catch it. MANdARAX • XAЯAbИAM 23:11, 1 January 2010 (UTC)
- Done "Purport" rule updated. Rjwilmsi 00:16, 3 January 2010 (UTC)
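To see why the quoted rule catches "purpotrated", here is a small Python sketch. The first pattern is the rule as quoted above; the second, with a `(?!rat)` look-ahead, is a hypothetical narrowing for illustration and is not necessarily what the updated rule does. The helper name `apply_rule` is also an assumption.

```python
import re

# The rule as quoted above (.NET "$1"/"$2" become "\1"/"\2" in Python).
PURPORT = re.compile(r"\b(P|p)(?:urpo|erpor?)t(\w*)\b")

# Hypothetical narrowing: refuse to match when "rat" follows the "t",
# so "purpotrated" is left alone for a human (or another rule) to handle.
PURPORT_NARROW = re.compile(r"\b(P|p)(?:urpo|erpor?)t(?!rat)(\w*)\b")


def apply_rule(rule: re.Pattern, text: str) -> str:
    return rule.sub(r"\1urport\2", text)


print(apply_rule(PURPORT, "purpotrated"))         # the reported mis-fix
print(apply_rule(PURPORT_NARROW, "purpotrated"))  # left unchanged
print(apply_rule(PURPORT_NARROW, "purpoted"))     # still corrected
```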
"à la" fix disabled
I've disabled the fix for "à la" because there are lots of false positives (particularly on Spanish/Italian text). If we are to keep the rule we need to find a more restrictive version with many fewer false positives. Rjwilmsi 17:28, 3 January 2010 (UTC)
rhtyhmic not detected
For some reason, AWB did not find "rhtyhmic" as misspelling of "rhythmic" here. Could someone please fix that? --bender235 (talk) 13:57, 5 January 2010 (UTC)
- Done here. Thanks Rjwilmsi 14:47, 5 January 2010 (UTC)
dissigner
"disigner" gets changed to "dissigner" instead of "designer" for some reason. --Closedmouth (talk) 13:56, 9 January 2010 (UTC)
- That would be because of this prefix regex:
- <Typo word="Dissi-" find="\b(D|d)isi([a-ko-z]|m[a-nq-z])(\w+)\b" replace="$1issi$2$3" />
- Doing a little back of the envelope testing with an English wordlist, I anticipate at least 50 words that if spelled incorrectly, could be transformed nonproductively (like your example), including things like:
- desiccant
- designed
- designator
- designer
- desirable
- The good news is they were spelled wrong before, so a new rule could anticipate those extra S's.
I could propose one, but it would require more testing than I can do right now before going live.
- I count only a few that should have one s and will be made incorrect, (disidentify, disimitate, disimitation), and about 30 words that the filter will correct (dissimilar, dissipated, dissipation).
- Those somewhat strange rules in the middle serve to exclude about 100 or so correct words that would be changed. These include words like disinterested, disinfect, disincline.
- One possible solution is to remove r from the range: \b(D|d)isi([a-ko-qs-z]|m[a-nq-z])(\w+)\b (the changed part is the range o-qs-z, formerly o-z). This eliminates about half of the problem words, including your example, and only eliminates two of the legitimate corrections. This is at a cost of about 5% of the legitimate corrections.
- Again, these are all estimates and don't take into account the frequency with which the words are used, which is a big factor. Shadowjams (talk) 06:27, 13 January 2010 (UTC)
- There already is a fix for "Design", and it should fix the misspelling above.
- It is: <Typo word="Design" find="\b(D|d)[ei]s(?:sigi?n|gin|ing)(s?|ed|ers?|ing)\b" replace="$1esign$2" />
- Interestingly enough though, it won't catch these misspellings: desigins, desiginer, desiging. I think that could be added ( add |igin after |ing ) without breaking anything, but I can't test it right now. Shadowjams (talk) 07:21, 13 January 2010 (UTC)
- Done Design rule expanded for 'disign'. Rjwilmsi 08:24, 13 January 2010 (UTC)
- Thanks guys :) --Closedmouth (talk) 08:28, 13 January 2010 (UTC)
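The patterns quoted in this thread can be tried out directly. The sketch below uses Python's `re` module as a stand-in for the .NET engine (an assumption; the patterns use no .NET-only syntax), and the helper names `dissi` and `design` are made up for the illustration. The "Design" pattern is the pre-expansion version quoted above.

```python
import re

DISSI = re.compile(r"\b(D|d)isi([a-ko-z]|m[a-nq-z])(\w+)\b")
# Proposed variant with "r" removed from the character range.
DISSI_NO_R = re.compile(r"\b(D|d)isi([a-ko-qs-z]|m[a-nq-z])(\w+)\b")
DESIGN = re.compile(r"\b(D|d)[ei]s(?:sigi?n|gin|ing)(s?|ed|ers?|ing)\b")


def dissi(rule: re.Pattern, word: str) -> str:
    return rule.sub(r"\1issi\2\3", word)


def design(word: str) -> str:
    return DESIGN.sub(r"\1esign\2", word)


for word in ["disigner", "disimilar", "disinfect", "disirable"]:
    # Note: "disigner" has "g" after "disi", so both variants still match it;
    # removing "r" only affects words such as "disirable".
    print(word, "->", dissi(DISSI, word), "|", dissi(DISSI_NO_R, word))

# As quoted, the Design pattern fixes "desginer" but has no branch for
# "disigner" (it would need something like "ign" after the "s").
print(design("desginer"), design("disigner"))
```

As quoted, the "Design" pattern leaves "disigner" untouched in this sketch, which seems consistent with the rule later being expanded for 'disign'.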
Rule for "platform"
Please add a rule that corrects things like "plattform" or "plataform" to "platform". --bender235 (talk) 22:37, 16 January 2010 (UTC)
- That's a start. Rule added for the two misspellings you give. Rjwilmsi 23:08, 16 January 2010 (UTC)
- Please check usage beforehand. The variant "Plattform" has numerous legitimate uses, among them PlattForm Advertising. "plataform" looks good, only exception seems to be PLATAFORM BL
"long tenured" -> "long-tenured"
Please add a rule that replaces "long tenured" with "long-tenured". Thanks. --bender235 (talk) 14:05, 11 January 2010 (UTC)
- Anyone? --bender235 (talk) 20:07, 19 January 2010 (UTC)
"approximatley" -> "approximately"
Could someone please add that rule? AWB did not detect it here (I changed it per hand). --bender235 (talk) 00:59, 17 January 2010 (UTC)
Spelling corrections in URLs
AWB permanently tries to correct spellings in URLs, like "www.xyz.com/india" -> "www.xyz.com/India". Can this be prevented? --bender235 (talk) 19:50, 16 January 2010 (UTC)
- Possibly. Do you have examples of an article with such a problem? Rjwilmsi 20:43, 16 January 2010 (UTC)
- E.g. Concepcion Quetzaltepeque El Salvador. AWB tried to correct "http://www.lonelyplanet.com/worldguide/destinations/central-america/el-salvador/essential?a=culture" to "http://www.lonelyplanet.com/worldguide/destinations/central-America/el-salvador/essential?a=culture" --bender235 (talk) 22:36, 16 January 2010 (UTC)
- Hmm, if that page were reformatted to use external links or citation templates the typo fixing would know to leave the URLs alone. Rjwilmsi 23:10, 16 January 2010 (UTC)
- Done There was an AWB bug report, which has been fixed for the next release. Rjwilmsi 20:17, 20 January 2010 (UTC)
Dates in succession boxes
Hi, I have noticed that AWB removes the spaces between the years and the – in succession boxes, which is contrary to Wikipedia:WikiProject_Succession_Box_Standardization/Guidelines#B._Years_and_dates (point vii, a). Could someone please fix this. Thanks ~~ Phoe talk ~~ 19:38, 20 January 2010 (UTC)
- That guideline contravenes WP:YEAR. Rjwilmsi 20:13, 20 January 2010 (UTC)
- Yes, but WP:Year doesn't mention succession boxes. Additionally sometimes full dates are used in succession boxes (for example in articles about music albums or about boxes), which wouldn't come under WP:Year, but perhaps rather under MOS:DOB. Finally MOS:DASH allows exceptions in lists, so why shouldn't it also apply for succession boxes? (Yes I know that succession boxes are an odd version of a list). If I have not convinced you, then please consider this as settled. Best wishes ~~ Phoe talk ~~ 20:48, 20 January 2010 (UTC)
- Yes, date ranges are different to year ranges. WP:DASH perhaps has a clearer explanation of why. AWB is not removing spaces in date ranges, only year ranges. The WP:DASH exception for lists is the extra use of en dashes, not an exception to allow year ranges to be spaced. Rjwilmsi 21:16, 20 January 2010 (UTC)
- I agree with Rjwilmsi on this one. --bender235 (talk) 16:20, 23 January 2010 (UTC)
"whoom" -> "whom"
Could someone please add that rule? Thanks. --bender235 (talk) 16:21, 23 January 2010 (UTC)
- Testing: \b([Ww])hoo+m\b => $1hom right now. I'll see if there are problematic false positives. Shadowjams (talk) 00:01, 25 January 2010 (UTC)
- That expression works, but it's not a common typo. Scanning the November database dump I only find that misspelling used in 4 articles, 2 of which are intentional, and 2 of which I corrected. The two misspellings were added by one editor. I'm going to hold off adding it. Shadowjams (talk) 01:55, 25 January 2010 (UTC)
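For anyone who wants to try the expression outside AWB, here is a minimal Python sketch of it. The function name `fix_whoom` is hypothetical, and `$1` is written as `\1` for Python's `re` module.

```python
import re

# "hoo+" needs at least two o's, so the correct "whom" is never matched.
WHOOM = re.compile(r"\b([Ww])hoo+m\b")


def fix_whoom(text: str) -> str:
    return WHOOM.sub(r"\1hom", text)


print(fix_whoom("To whoom it may concern"))
print(fix_whoom("Whooom did you ask, and whom did you tell?"))
```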
"intitled" -> "entitled"
Please add that one. Found it here, but AWB did not detect (changed it manually). --bender235 (talk) 22:03, 27 January 2010 (UTC)
- Make sure not to correct "intitle", this is used in query strings for Google Books URLs, e. g. Populares. Paradoctor (talk) 22:34, 27 January 2010 (UTC)
- Do you have any indication that there is another rule that generally handles this, but didn't in this case? I can't find (in a very quick search, admittedly) a rule that would have matched this. I'll work on a new one, but if there's an old one that should have gotten it, knowing that would be very helpful. Shadowjams (talk) 08:26, 29 January 2010 (UTC)
- Ok, this should work. I don't want to add it in quite yet because I haven't tested it very much, but feel free to add it to your add/replaces, and if you don't see any problems then go ahead and add it to the typo list.
- I'm not 100% that "intitled" is a typo, the dictionary references I looked up were a little unclear. But I don't think it's a problem edit either. In most cases "entitled" is going to be more right than "intitled", although I wonder if there are cases where "intitled" is correct. I'm not sure.
- The other downside, I can't offhand think of a way to keep the case correct while transforming letters, so you'll need two rules, one for "Intitled" and one for "intitled". Just change the first letters, respectively. This one should also catch a simple transposition or deletion in the middle (the most likely typo).
- Find: \binti[tl]{1,2}ed\b
- Replace: entitled
- Let me know how it works out. I'm using it on my own personal set at the moment. Shadowjams (talk) 08:52, 29 January 2010 (UTC)
- I'm finding a lot of English language quotes, particularly in legal opinions, from the 1800s and before use "intitled". Perhaps we need to make sure any edit doesn't change a quote. Shadowjams (talk) 08:59, 29 January 2010 (UTC)
- Don't know why I missed this, but my Merriam Webster lists "intitle" as an archaic version of "entitle". Paradoctor (talk) 09:08, 29 January 2010 (UTC)
- There are ways to exclude quoted statements like this, but all of them that I'm coming up with right now are pretty processor intensive. There might be a way to creatively limit this, at some expense of type 2 errors, that is less processor intensive. I might revisit it at another time. I would recommend against using the above regex unless you're extremely careful you're not changing a quote. Shadowjams (talk) 09:10, 29 January 2010 (UTC)
- Paradoctor - That is what I found, more or less as well. I don't think there's a problem converting modern text, but we certainly don't want to alter any quotes that use it. Because AWB uses the .net regex library there are some non-greedy expressions that aren't possible in most other regexes that might fix this nicely... but I'm concerned that most solutions will eat a lot of processing power. If some others have ideas I'd like any advice. Shadowjams (talk) 09:14, 29 January 2010 (UTC)
- AWB does not apply the typo fixing rules within templates e.g. {{cquote}} or within quote marks e.g. " and all the common variations. Rjwilmsi 09:38, 29 January 2010 (UTC)
- Oh, ok, so in a find-replace yes, but not within AWB/t? Shadowjams (talk) 09:42, 29 January 2010 (UTC)
- I'm not sure I understand your question. I'll explain my answer again in more detail in the hope it does answer your question: when AWB executes a typo rule from the WP:AWB/T list it first hides the quotes then applies the typo regexes, then unhides the quotes again. If you apply the regex by other means you will not get this quote hiding (unless you write a custom module to access the functions). Rjwilmsi 09:55, 29 January 2010 (UTC)
- Sorry for the confusion. That wasn't very clear. You understood what I meant though. I believe, in that case, that the above should fix what the OP was talking about. Of course, the question of whether or not the i version is appropriate in the modern context is still open, although I would assume not especially controversial. Shadowjams (talk) 10:14, 29 January 2010 (UTC)
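The hide / apply / unhide sequence described above can be sketched roughly as follows. This is not AWB's code: the placeholder scheme, the function name `fix_outside_quotes`, and the restriction to straight double quotes on one line are all simplifying assumptions, and the two rules are the lower-case and capitalised variants of the regex proposed earlier in this thread.

```python
import re

# Two rules, one per capitalisation, as suggested earlier in the thread.
RULES = [
    (re.compile(r"\binti[tl]{1,2}ed\b"), "entitled"),
    (re.compile(r"\bInti[tl]{1,2}ed\b"), "Entitled"),
]

# Simplification: only straight double quotes on a single line are hidden.
QUOTED = re.compile(r'"[^"\n]*"')
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def fix_outside_quotes(text: str) -> str:
    hidden = []

    def hide(match: re.Match) -> str:
        hidden.append(match.group(0))
        return f"\x00{len(hidden) - 1}\x00"

    masked = QUOTED.sub(hide, text)              # 1. hide the quotes
    for pattern, replacement in RULES:           # 2. apply the typo rules
        masked = pattern.sub(replacement, masked)
    # 3. put the hidden quotes back, untouched
    return PLACEHOLDER.sub(lambda m: hidden[int(m.group(1))], masked)


sample = 'Intitled works: the bill, intitled "An Act intitled Relief", passed.'
print(fix_outside_quotes(sample))
```

Because the rules never see the quoted text, the cost of protecting quotes is paid once per page rather than inside every regex.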
E.g.
The rule for “e.g.” (currently fourth among new additions) adds left bracket, for example “eg.” → “(e.g.”. This should be fixed by removing the bracket. Svick (talk) 04:08, 30 January 2010 (UTC)
- I originally put it there, and then its structure was changed, and then User:Marek69 disabled it, then made some changes and re-enabled it. The original one had a leading ( because the overwhelming majority of examples I found were at the beginning of parentheticals, which makes sense when you consider how people use the abbreviation. It is probably adding it because it was removed by Marek without changing the corresponding output.
- I had tested the first version and was reasonably confident it didn't have many (I never found any) false positives. I cannot say the same about this new version. I am going to revert it back to the earlier version with a note. If someone wants to test it and change it that's fine too, but I think we're seeing some problems with it right now. Shadowjams (talk) 22:38, 30 January 2010 (UTC)
- Another small question. Is E.g. ever proper in the Manual of style? (compared to e.g.). I don't know the answer, but wanted to bring it up. Shadowjams (talk) 22:41, 30 January 2010 (UTC)
- The last version didn't work again (changed “eg.” to “(e.g.”, but didn't change “(eg.”), so I disabled it. Before it is turned on again, please make sure it works as it should. Svick (talk) 23:36, 30 January 2010 (UTC)
- Looks fixed now. My mistake for not noticing that Marek's change was correct; the simplification is where it caused the problem.
- If there are false positives without the (, then we'll need to note those here. Shadowjams (talk) 02:41, 31 January 2010 (UTC)
"Discoverinig" -> "Discovering"
AWB accidentally replaced "Discoverinig" with "Discoverining" here, but it should be "Discovering", of course. --bender235 (talk) 23:31, 6 February 2010 (UTC)
- That appears to be a result of the "-ining" regex, which is (?!\b(?:(?:Br|Kl|M|H|St)e|Nar|Kurt|Lap)inig\b)\b(\w+)inig(s|ly)?\b. I don't see any systematic way to fix this class of typos without interfering with the others. In other words, "inig" that should be "ing" are virtually indistinguishable from "inig" that should be "ining". If someone has some way to distinguish the two that would be useful, but I can't think of one right now.
- I also don't know which is more common, but that could be a useful exercise. Shadowjams (talk) 08:24, 7 February 2010 (UTC)
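The ambiguity can be reproduced with a short Python sketch. The find pattern is the one quoted above; the replacement `\1ining\2` is an assumption (the rule's replace string is not quoted in this thread), as is the function name `fix_ining`.

```python
import re

INING = re.compile(
    r"(?!\b(?:(?:Br|Kl|M|H|St)e|Nar|Kurt|Lap)inig\b)\b(\w+)inig(s|ly)?\b"
)


def fix_ining(text: str) -> str:
    # Assumed replacement; an unmatched optional group expands to "".
    return INING.sub(r"\1ining\2", text)


print(fix_ining("trainig"))       # missing "n": the intended kind of fix
print(fix_ining("Discoverinig"))  # stray "i": the same rewrite goes wrong
print(fix_ining("Meinig"))        # excluded proper name, left alone
```

Both typos end in the same four letters, so the pattern alone has nothing to tell them apart.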
Fluorescent
Using the "-escent" rule, AWB changes "floresent" to "florescent". Although that is a valid word, the more likely intended word is "fluorescent". A wiki search for "fluorescent" produced 1042 articles, and "florescent" found 32 pages. For those 32, I fixed the incorrect usages, discovering that all except 3 were actually intended to be "fluorescent". MANdARAX • XAЯAbИAM 21:31, 9 February 2010 (UTC)
- I've expanded the "Fluoresce" rule and removed "|[Ff]lu?or" from the "-escent" rule. I excluded "florescent" and "florescence" from "fixing" as they are correctly spelled words; however, as noted above, they're extremely rare on Wikipedia and the "fluo..." word is almost always the intended one, so if anyone thinks it's better without the exclusion, feel free to remove it. MANdARAX • XAЯAbИAM 04:09, 21 February 2010 (UTC)
New or Fix existing typos
I have come across a couple typos that are either not working or need to be added. Below are a few that I have found that either need to be added or don't seem to be working.
- occassion to occasion. This exists in the typo list but doesn't seem to work all the time.
- Philidelphia to Philadelphia. This exists in the typo list but doesn't seem to work all the time.
- Pitsburg, Pittsberg, Pittsburg to Pittsburgh --Kumioko (talk) 18:37, 23 February 2010 (UTC)
- But there are a lot of places named Pittsburg without the "h".--BillFlis (talk) 20:44, 9 March 2010 (UTC)
- The rule for "Occasion" seemed correct for the case you cite, but I expanded it a little anyway to catch more misspellings.--BillFlis (talk) 20:51, 9 March 2010 (UTC)
- Thanks, not great at regex development myself. --Kumioko (talk) 20:58, 9 March 2010 (UTC)
Workign -> Working
For some reason, AWB tried to replace "Workign" with "Wooking" here (I corrected it manually), but it should be "Working". --bender235 (talk) 20:10, 9 March 2010 (UTC)
- Fixed.--BillFlis (talk) 20:42, 9 March 2010 (UTC)
'yound' / 'young' and 'switchs' / 'switches'
I've noticed both while looking over today's recent changes. Are they sufficiently notable to include in the list? Mephistophelian (talk ● contributions) 22:37, 17 March 2010 (UTC)
Distict
Looks like Distict is changed to Distinct. ("<Typo word="Distinct_" find="\b(D|d)is(?:ctinc|tic|inc|t[ai]n(?=ti))t(i(ve|on|vely)|ly)?\b" replace="$1istinct$2" /> ") But it might as well be a typo for District. (Especially if capitalized).--ospalh (talk) 09:58, 18 March 2010 (UTC)
- Yes, it can be. I haven't found any good ways to differentiate between the two. Rjwilmsi 10:27, 18 March 2010 (UTC)
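A quick Python sketch of the quoted rule shows the behaviour being discussed. The pattern and replacement are taken from the rule above (with `$1`/`$2` written as `\1`/`\2`); the function name `fix_distinct` is made up for the example.

```python
import re

DISTINCT = re.compile(
    r"\b(D|d)is(?:ctinc|tic|inc|t[ai]n(?=ti))t(i(ve|on|vely)|ly)?\b"
)


def fix_distinct(text: str) -> str:
    return DISTINCT.sub(r"\1istinct\2", text)


print(fix_distinct("a distictly different sound"))  # helpful fix
print(fix_distinct("the Distict of Columbia"))      # wrong word chosen
print(fix_distinct("a distinct district"))          # correct words untouched
```

The pattern sees only the misspelled word itself, so any attempt to prefer "District" would have to look at neighbouring words or capitalisation.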
Exception for "antarctica" rule
Could someone please add an exception for Sinfonia antartica to that "Antarctica" rule, because I falsely "fixed" that on Vernon Handley [35], and I don't think many people know that it in fact isn't a typo. --bender235 (talk) 14:31, 19 March 2010 (UTC)
- Done updated rules. Rjwilmsi 08:27, 25 March 2010 (UTC)
Occasionally
Why is "occasionanlly" corrected to "occasionnally", from an incorrect spelling to another incorrect spelling? I know that there is a rule for -anlly -> -nally, but it shouldn't apply to that case. PleaseStand (talk) 02:40, 25 March 2010 (UTC)
- Done Rule updated to avoid that one. Rjwilmsi 08:22, 25 March 2010 (UTC)
'on bored' / 'on board'
While AWB caught that 'their' should've been 'there', it missed 'on bored'. Mephistophelian † 14:52, 26 March 2010 (UTC)
- This doesn't appear to be a very common misspelling: [36] and there are also appropriate uses of the words "on bored" together, such as: "...blames anti-social behaviour in her area on bored News Night presenters...". –xenotalk 14:55, 26 March 2010 (UTC)
Nearly
AWB tried to replace "neraly" with "nerally" here (I fixed it manually), but it should be "nearly". --bender235 (talk) 16:58, 26 March 2010 (UTC)
- Looks like it comes from the "ally" suffix. Not sure how to fix. –xenotalk 17:02, 26 March 2010 (UTC)
<Typo word="-ally" find="\b(\w+(?:[cdglntv]i|nt|ic|io?n|er|son))aly\b" replace="$1ally" /> <!--Don't match B(r)ialy, Castaly--><!--see also "-ually"-->
- "Neraly" is evidently a very rare error—I just fixed the only other one I found in wikipedia.—BillFlis (talk) 17:54, 26 March 2010 (UTC)
- Yet it could happen again. Don't forget that. --bender235 (talk) 23:17, 26 March 2010 (UTC)
Done Expanded year rule to catch "neraly". Rjwilmsi 14:04, 27 March 2010 (UTC)
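For reference, this is how the "-ally" pattern quoted above ends up matching "neraly", sketched in Python (`$1` becomes `\1`; the name `fix_ally` is hypothetical). It shows the rule as quoted in this thread, not the later expansion.

```python
import re

ALLY = re.compile(r"\b(\w+(?:[cdglntv]i|nt|ic|io?n|er|son))aly\b")


def fix_ally(text: str) -> str:
    return ALLY.sub(r"\1ally", text)


# "generaly" -> stem "gener" ends in "er": the intended correction.
print(fix_ally("generaly"))
# "neraly" -> stem "ner" also ends in "er", so it is rewritten the same way.
print(fix_ally("neraly"))
# "Castaly" has no listed stem ending before "aly", so it is skipped.
print(fix_ally("Castaly"))
```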
A couple more possible changes
I have stumbled upon a couple more typos that I think might be useful additions to the list
- adn to and
- thier to their -- rule exists
- establishement to establishment Done
- etal to et. al.
- amry to army Done
- aviaror to aviator --Kumioko (talk) 18:02, 26 March 2010 (UTC)
- Added "Establishment". "Amry" seems to be a proper name, as are definitely "Thier", "Thiers", and "Etal", so need caution. I didn't find any occurrences of "aviaror" in wikipedia.--BillFlis (talk) 18:56, 26 March 2010 (UTC)
- Thanks, how did you search WP for that? --Kumioko (talk) 19:10, 26 March 2010 (UTC)
- Enter whatever in "search" and click "Search" (not "Go"). But even "Go" will find Thiers and Etal, as they have their own articles.--BillFlis (talk) 07:25, 27 March 2010 (UTC)
- It's not "et. al." but "et al.", which is short for "et alii" (meaning "and others"). --bender235 (talk) 23:20, 26 March 2010 (UTC)
I suggest:
- "the hoi polloi" be changed to "hoi polloi" (tautology).
- Also "return back" to "return".
- Also "their were" to "there were". -- rule exists Rjwilmsi 14:08, 27 March 2010 (UTC)
Kittybrewster ☎ 12:43, 27 March 2010 (UTC)
accidently
AWB fixed "acidentaly" with "acidentally" here, but it should've been "accidently" of course (I later fixed it manually in the article). --bender235 (talk) 22:56, 27 March 2010 (UTC)
- What "accidently"? -- wikt:accidentally. Rjwilmsi 23:47, 27 March 2010 (UTC)
- Done New rule for "Accident" to fix single 'c'. Rjwilmsi 23:54, 27 March 2010 (UTC)
XML
I would like to suggest we put this in a <syntaxhighlight lang="xml">, add an <?xml> note, create a simple inline DTD, and encase the typos in a <typos> tag. This would allow easier use by common XML parsers like expat and DOM.--Ipatrol (talk) 03:03, 26 March 2010 (UTC)
- Each section is already so encased; it doesn't show up on the page, you have to edit it to see it. What are the advantages of doing what you're suggesting? Why, for example, would anyone ever want to use an XML parser (whatever that is!)?--BillFlis (talk) 10:28, 26 March 2010 (UTC)
- It's a programming technique that makes it easier for applications like AWB to pull the information in. It could potentially be used by applications outside WP. For example if you created a program that looked at word documents for typos you could use this list rather than create your own. --Kumioko (talk) 11:11, 26 March 2010 (UTC)
Exactly! All I'm proposing is to follow a few standards here. --Ipatrol (talk) 05:18, 8 April 2010 (UTC)
- That would require rewriting whatever AWB uses to parse this input, which I assume is a very slight modification of some Microsoft .Net format (or whatever this thing's written in). The Microsoft regex library is actually pretty advanced with its look backs and look forwards, etc. Anyway, wouldn't it be simpler (or less disruptive maybe) to write a parser to translate the awb page into the XML format?
- Actually, if I get really bored that's something I might be interested in doing. What exactly is the target format (what xml ref specifically)? Shadowjams (talk) 05:21, 8 April 2010 (UTC)
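As a rough idea of what reading the list with a standard XML parser could look like, here is a Python sketch using `xml.etree.ElementTree`. The `<typos>` wrapper follows the proposal above; the two sample rules are copied from elsewhere on this page, and the function names and the `$1` to `\g<1>` conversion are assumptions made for the illustration.

```python
import re
import xml.etree.ElementTree as ET

RULES_XML = r"""<typos>
<Typo word="Switch" find="\b(S|s)wti?ch\b" replace="$1witch" />
<Typo word="Purport" find="\b(P|p)(?:urpo|erpor?)t(\w*)\b" replace="$1urport$2" />
</typos>"""


def load_rules(xml_text: str):
    rules = []
    for node in ET.fromstring(xml_text).iter("Typo"):
        # Convert .NET-style "$1" group references to Python's "\g<1>".
        replacement = re.sub(r"\$(\d+)", r"\\g<\1>", node.get("replace"))
        rules.append((node.get("word"), re.compile(node.get("find")), replacement))
    return rules


def apply_rules(rules, text: str) -> str:
    for _word, pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


rules = load_rules(RULES_XML)
print([word for word, _, _ in rules])
print(apply_rules(rules, "Swtich it off, they purpoted."))
```

One caveat for strict parsers: a pattern containing a literal `<` (for example a look-behind such as `(?<!-)`) or `&` would have to be escaped as `&lt;` or `&amp;` inside the attribute to be well-formed XML.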
Transsexual vs. Transexual Menace
While granting that "transsexual" is the more common spelling (albeit not a politically neutral one), people using this bot keep changing the name of the "Transexual Menace", which is definitely spelled with one s. Can somebody with a better grasp of the flavor of regex used here put in an exception for that organization? Shmuel (talk) 20:26, 29 March 2010 (UTC)
- Done Exception added. Rjwilmsi 22:17, 29 March 2010 (UTC)
- Thanks! Shmuel (talk) 18:35, 30 March 2010 (UTC)
Swtich -> Switch
As you can see by looking at this one it is easy to miss with the naked eye and there are quite a few hits for it. Many if you include WP: and User: etc. It doesn't seem to be covered on the list (I had a go at writing the line but definitely didn't have it right sorry). There is also a section above which has the request for switchs -> switches. Many thanks ~ R.T.G 21:31, 1 April 2010 (UTC)
- 'Swtich' seems like a genuine typo, although I can also imagine cases where the user mistyped the 'w' and the result should have been 'stich'. 'Switchs' could be a typo of "switch's" or 'switches'. Mephistophelian † 21:48, 1 April 2010 (UTC)
- That would mean "switch ownings" possessive or "switch has", "switch is" etc., wouldn't it? The page switch has hundreds of the word but none of "switchs" or "switch's". The same goes for w:switch with 25 matches for "switch" but none for the others. ~ R.T.G 01:02, 2 April 2010 (UTC)
- Actually in searching for "switch's" there are 53 hits for it in the possessive. "The switch's place...", "The switch's number..." etc. ~ R.T.G 01:05, 2 April 2010 (UTC)
- I may have missed some of the above, but this regex should work:
- <Typo word="Switch" find="\b(S|s)wtit?ch\b" replace="$1witch" />
- I haven't tested it yet, I may later, but if someone does before me and it performs well, please put it on the list. Shadowjams (talk) 01:50, 2 April 2010 (UTC)
- The rule was <Typo word="Switch" find="\b(S|s)wti?ch\b" replace="$1witch" />, and it's been added and expanded. Mephistophelian † 04:36, 2 April 2010 (UTC)
- It's testing well. I had the extra t? in there to catch for another typo, but I didn't see any examples of it when testing, so I doubt it's a common typo. I also didn't know you could use the (|x|y|z) construction to work the same as (x|y|z)?. I don't think there's a meaningful difference between the two, but interesting to note. Shadowjams (talk) 05:12, 2 April 2010 (UTC)
- Switch-er/s? ~ R.T.G 08:16, 2 April 2010 (UTC)
- It took me a minute to realize what you were saying... yeah, those are unlikely words, but I doubt they'd be false positives either. On a scan of the last database dump I only found 9 instances of "switches" being misspelled in the same way. We do have a somewhat reflexive tendency to reform the regex rules, but I don't think there's any harm in this instance. Shadowjams (talk) 08:28, 2 April 2010 (UTC)
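To make the difference between the two quoted rules concrete, here is a minimal sketch in Python. It assumes the patterns behave the same under Python's `re` module as under AWB's .NET engine for these simple cases; AWB's `$1` is written `\1`, and the names `PROPOSED`, `ADDED` and `fix` are invented for the illustration.

```python
import re

# The two rules quoted in the thread, transcribed to Python syntax.
PROPOSED = re.compile(r"\b(S|s)wtit?ch\b")  # draft rule: optional extra "t"
ADDED = re.compile(r"\b(S|s)wti?ch\b")      # rule as added: optional "i"


def fix(rule, text):
    """Apply one rule, keeping the captured capital or lowercase s."""
    return rule.sub(r"\1witch", text)


print(fix(ADDED, "Flip the swtich"))   # handled by both rules
print(fix(PROPOSED, "swtitch"))        # only the draft rule handles this
print(fix(ADDED, "Swtch on"))          # only the added rule handles this
```

Both versions leave the correct spelling "switch" alone, because each requires the "wt" transposition before anything else can match.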
restricitve -->restrictive
It looks like AWB misses this incorrect spelling "restricitve" for the word "restrictive". Here is the diff for this one: [37].
And it appears that AWB consistently misses misspellings of the word "metamaterial". Here is one example: [38], although it is the plural of the word. I think the singular comes out the same. As you can see, "Metamaterails" is this particular misspelling. Another misspelling that I myself make is "Metamterials" (I leave out an "a" from time to time). If you can add these, it would be much appreciated. Steve Quinn (formerly Ti-30X) (talk) 02:11, 9 April 2010 (UTC)
- restrictive Done here. Rjwilmsi 10:50, 9 April 2010 (UTC)
- "Metamaterails" Done here. Rjwilmsi 10:58, 9 April 2010 (UTC)
incoroporated --> incorporated
Done Requesting addition. I've already fixed most of them manually. -- Ϫ 10:39, 15 April 2010 (UTC)
- I expanded the previous incorporate rule. Shadowjams (talk) 00:49, 16 April 2010 (UTC)
mileage → fuel economy
This is a suggestion, perhaps the term mileage should be replaced with fuel economy; since it is technically more accurate and can resolve English variations. --JovianEye (talk) 18:32, 19 April 2010 (UTC)
- I don't think changing English variations is a good fit for AWB/T. Mileage also has other uses besides the fuel economy one, which would be hard to detect with a regular expression. -- JHunterJ (talk) 18:49, 19 April 2010 (UTC)
- Agreed. This is a good sort of thing to do with the find replace portion in AWB, because then the user's aware of the peculiarities of the expression. I have a few of these listed that I use here. They can work well, but they occasionally require a human touch.
- In fact, just searching for the word mileage, I didn't see a single example on the first page where that change would be correct, or beneficial. In most vernaculars too, even when the change would be ok, it makes the sentence more awkward. I'd be cautious before changing too many of these. Shadowjams (talk) 19:37, 19 April 2010 (UTC)
- As they say, your fuel economy may vary. But I can see myself changing some instances of "better mileage" with "greater fuel economy", although only by hand. In the U.S., the government-recognized measure of a car's fuel economy is in miles per gallon (mpg), hence the prevalence of "mileage" in this sense.--BillFlis (talk) 22:19, 19 April 2010 (UTC)
Not done Not a good fit for Typofixing module. –xenotalk 19:38, 19 April 2010 (UTC)
Etc -> Etc.
I'm not entirely sure about this one. On the one hand, it is correct to add a dot to show the abbreviation; on the other hand, most of the time the dot then reads as a full stop in the middle of a sentence, which is not useful either. Regards, SunCreator (talk) 16:43, 27 April 2010 (UTC)
- I'm not sure I follow. The rule right now won't add an additional dot if there already is one, and there's no difference typographically between a period and a fullstop. It also fixes "ect" typos. Shadowjams (talk) 18:58, 27 April 2010 (UTC)
- The rule adds a dot if there isn't one. One sentence looks like it is two as 'etc.' can look like the sentence ended. Regards, SunCreator (talk) 19:27, 27 April 2010 (UTC)
- e.g. "Table showing key dates, mileages, running numbers, etc for all class members." -> "Table showing key dates, mileages, running numbers, etc. for all class members." Regards, SunCreator (talk) 19:35, 27 April 2010 (UTC)
- Ah, I see. Well, that's its expected behavior. I pulled the rule from the table in Wikipedia:Manual of Style (abbreviations). Actually I've done most of the rules there that are amenable to being fixed without a lot of issues. For example, I haven't done this with "Ltd" or "St" or "Rd" because in Commonwealth spellings the period is left off... but I have done it for latinized abbreviations because I think (maybe I'm wrong) that the punctuation is appropriate in those cases in all English traditions.
- A side note, I fix "a.k.a." myself, but I haven't put it into the typo rules because it's used in templates (which typo might not mess with) as well as it comes up occasionally in legitimate areas, so typo-fixing is too blunt an instrument for those cases. I haven't seen any trouble with etc. like that, but I'll see what others have to say too. Shadowjams (talk) 19:41, 27 April 2010 (UTC)
parallelled -> paralleled
This recommended change is incorrect and not a typo. It's only a difference in national varieties of English. Regards, SunCreator (talk) 20:55, 27 April 2010 (UTC)
- Done Rule corrected. Thanks Rjwilmsi 18:51, 28 April 2010 (UTC)
philadelphia -> Philadelphia
There are some exceptions. Chroicocephalus philadelphia, Larus philadelphia and Oporornis philadelphia, all birds. Regards, SunCreator (talk) 18:13, 28 April 2010 (UTC)
- Done Rule updated. Rjwilmsi 18:55, 28 April 2010 (UTC)
Record false positives?
Can you create a list of false positives with AWB? I don't mean log the article but the actual spelling that you are double clicking to skip? Regards, SunCreator (talk) 15:00, 1 May 2010 (UTC)
- You might want to cross post this to Wikipedia talk:AutoWikiBrowser/Feature requests. I don't know if the devs watch this talk. Shadowjams (talk) 19:51, 1 May 2010 (UTC)
reasoninig
is changed to reasonining instead of reasoning. --Closedmouth (talk) 09:52, 2 May 2010 (UTC)
- Done new rule added. Rjwilmsi 12:28, 2 May 2010 (UTC)
- I found no occurrences of "reasoninig" on wikipedia.--BillFlis (talk) 12:29, 2 May 2010 (UTC)
- I've added a generic rule. Rjwilmsi 13:20, 2 May 2010 (UTC)
Hone in on -> Home in on
I believe this is a false positive, e.g on article Architect. The correct word is 'hone' to my knowledge. http://dictionary.reference.com/browse/hone 5. to make more acute or effective; improve; perfect: to hone one's skills. Regards, SunCreator (talk) 21:41, 4 May 2010 (UTC)
- Quite right. Removed [39]. –xenotalk 22:00, 4 May 2010 (UTC)
- An old one too! –xenotalk 22:03, 4 May 2010 (UTC)
- You would not of thought I failed English at school would you? ;) Regards, SunCreator (talk) 22:11, 4 May 2010 (UTC)
- FWIW, I did some research and at one point "hone in on" was considered incorrect; but language being socially-constructed, it appears to be taking hold. –xenotalk 22:17, 4 May 2010 (UTC)
- 'Hone in on' has been around for a while, as has the more common 'home in on'. Regards, SunCreator (talk) 22:37, 4 May 2010 (UTC)
Gabrilites => Gabrilities
A false positive I believe on Camber MacRorie (Gwynedd). Who or what are Gabrilities? Google has no idea. Regards, SunCreator (talk) 12:41, 5 May 2010 (UTC)
- Done Rule updated. Rjwilmsi 13:12, 5 May 2010 (UTC)
Can stomache => stomach be added.
A regular typo. Regards, SunCreator (talk) 13:40, 5 May 2010 (UTC)
- Done, also catches "stomoch" and "stumach", which were also found in a search.--BillFlis (talk) 14:38, 5 May 2010 (UTC)
- Thank you. Regards, SunCreator (talk) 15:28, 5 May 2010 (UTC)
Can cheerfull => cheerful be added.
Noticed it on Shubhendu. Regards, SunCreator (talk) 17:00, 5 May 2010 (UTC)
- Done Rule updated. Rjwilmsi 18:06, 5 May 2010 (UTC)
Understook => Understood
From Jagdish Lal Raj Soni "He understook a course in Design from London University in 1968." A false positive to change Understook to Understood but instead should be "He undertook a course" etc. Regards, SunCreator (talk) 16:23, 5 May 2010 (UTC)
- This one is hard. The underlying rule is <Typo word="(Mis)Understood" find="\b(U|u|[Mm]isu)nderstoo[^d]\b" replace="$1nderstood" /> which will correct any "understoo" phrase whose last letter is not a d. I suppose we could change it to <Typo word="(Mis)Understood" find="\b(U|u|[Mm]isu)nderstoo[^dk]\b" replace="$1nderstood" /> and then create a new rule of find="\b(U|u|[Mm]isu)nd[ea]rs(took(?:en)?|taken?)\b" replace="$1nder$2", but really all that rule fixes for is typos with an extra s. I don't know how common those are. (also that rule needs tested before it's inserted). Shadowjams (talk) 23:53, 5 May 2010 (UTC)
- It is not common at all. I would just leave it I think. Regards, SunCreator (talk) 00:07, 6 May 2010 (UTC)
Not done, not required. Regards, SunCreator (talk) 23:38, 7 May 2010 (UTC)
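For anyone wanting to see the effect of the rules discussed above, here is a rough Python sketch. It assumes Python's `re` module treats these patterns like AWB's .NET engine does, uses `\1` where AWB uses `$1`, and the names `CURRENT`, `NARROWED`, `UNDERTAKE` and `apply_rules` are hypothetical.

```python
import re

# Existing rule: any "understoo?" whose last letter is not "d".
CURRENT = re.compile(r"\b(U|u|[Mm]isu)nderstoo[^d]\b")
# Suggested narrowing: also leave words ending in "k" alone.
NARROWED = re.compile(r"\b(U|u|[Mm]isu)nderstoo[^dk]\b")
# Suggested companion rule: drop the stray "s" in "understook" etc.
UNDERTAKE = re.compile(r"\b(U|u|[Mm]isu)nd[ea]rs(took(?:en)?|taken?)\b")


def apply_rules(text, rules):
    """Apply (pattern, replacement) pairs in order, like a rule list."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


sentence = "He understook a course in Design."
before = apply_rules(sentence, [(CURRENT, r"\1nderstood")])
after = apply_rules(
    sentence,
    [(NARROWED, r"\1nderstood"), (UNDERTAKE, r"\1nder\2")],
)
print(before)
print(after)
```

With the existing rule the sentence becomes "He understood a course", while the narrowed rule plus the companion rule gives "He undertook a course", which is the reading the reporter wanted.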
saskatoon
About <Typo word="Saskat(chewa/oo)n" find="\bsaskat(chewa|oo)n\b" replace="Saskat$1n" /> : there is a berry called saskatoon, or amelanchier alnifolia as well as the city of Saskatoon, so maybe we shouldn't automatically capitalise it.--ospalh (talk) 14:29, 7 May 2010 (UTC)
- I'll change it to preserve the case, because I don't think there's any good way to distinguish between the two cases. Shadowjams (talk) 23:31, 7 May 2010 (UTC)
- Actually I disabled it because it didn't make any changes except to capitalize it. If there are common misspellings of it though, let me know what those are and I can try and work them into the rule and have it preserve case. Shadowjams (talk) 23:37, 7 May 2010 (UTC)
Done
immediately
Missed this one, it was idmediatly a hit single. Regards, SunCreator (talk) 22:41, 9 May 2010 (UTC)
- I'm not sure what you mean by "missed this one", but this is apparently a single instance of this error in all of wikipedia. Do you really think we need to establish a permanent rule to search for this weird error ("idmediatly") every time somebody runs AWB?--BillFlis (talk) 01:34, 10 May 2010 (UTC)
- Yes, we want to avoid rare/implausible misspellings but few current matches doesn't necessarily mean there won't be some more on a regular basis in the future. If we can expand an existing rule then let's do it. Rjwilmsi 10:24, 10 May 2010 (UTC)
- This should do it. I've tried to make it broader to justify its purpose more. Someone might test it first.
- find="(I|i)[^m]?med[ai]+(?:et+e|t+)(ly)?" replace="$1mmediate$2"
- The 2nd noncapturing grouping ensures it misses legitimate spellings (for performance issues). Shadowjams (talk) 10:53, 10 May 2010 (UTC)
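Since the rule was posted untested, a quick check may help. The sketch below transcribes the quoted find/replace pair to Python (`\1` and `\2` instead of `$1` and `$2`), on the assumption that Python's `re` behaves like the .NET engine for this pattern; `IMMEDIATE` and `fix_immediately` are names made up for the example.

```python
import re

# Quoted rule, transcribed as-is (note: no \b anchors in the original).
IMMEDIATE = re.compile(r"(I|i)[^m]?med[ai]+(?:et+e|t+)(ly)?")


def fix_immediately(text):
    """Rewrite a match as 'immediate' plus the optional captured 'ly'."""
    return IMMEDIATE.sub(r"\1mmediate\2", text)


for sample in ("idmediatly", "Imediatly", "immediately", "imediately"):
    print(sample, "->", fix_immediately(sample))
```

Under this transcription the reported "idmediatly" is fixed and the correct spelling is skipped, but "imediately" appears to come out as "immediateely", because the match stops after the "t" and the leftover "ely" is kept. That suggests the rule needs a trailing word boundary or a broader suffix group before being added.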
Typo errors to be fixed
Hi, Pls suggest if the following can be done.
Spacing
- space after a full-stop except in i.e., Eg. My name is ABC.<space>This is an example.
- space after a comma including in i.e., Eg. My name is ABC,<space>working on typos.
Punctuations
- i.e. should have comma at end like i.e., Eg. - Brihadeeswarar Temple
- full-stop added after etc but not removed after reference tag.
General
- The templates are not moved to the bottom of the page along with the categories. Any particular reason for doing so? - Chittoor
Spelling corrections
- spelling correction for diffrenet corrected as differenet instead of different - Madhyamaheshwar - required manual correction
- spelling correction for precident corrected as precedent instead of president - Nathdwara - required manual correction
- spelling correction for Dhinig corrected as Dhining instead of Dihing - Negheriting Shiva Doul - required manual correction
--Thaejas (talk) 12:00, 13 May 2010 (UTC)
- The spelling corrections are easy, so long as there aren't conflicts with other correctly spelled words. The spacing is an issue because periods and commas are used in other contexts where spacing wouldn't be appropriate. For example, in chemistry articles (off the top of my head). In addition, I worry about the extra processing required to have regex threads running on every punctuation.
- Thanks. Is it possible to include a spelling and grammar checkers in the edit window similar to the email systems or as in MS Word? I am not sure if a javascript can handle such a plug-in. But if this is possible, it would help the editors a lot before saving articles. --Thaejas (talk) 00:51, 14 May 2010 (UTC)
- That would be a developer issue. The typo engine runs (I think) as a plugin and so it can't affect the UI like that. Shadowjams (talk) 03:44, 14 May 2010 (UTC)
- As for moving templates, there are lots of templates that belong on a specific part of a page, not at the end. Infoboxes, for example. In fact, anytime you see something in {{ something here }}, that's a transclusion, meaning it's copying content from another source to the current page. If there's no prefix to it, it's assumed to be a template (e.g., {{Template:Infobox person}} is the same as {{Infobox person}}. Also, moves like that are better handled through the AWB developers directly rather than through the typo rules, which are less powerful than what you can do with a full programming language.
- Thanks. Have raised the issue here. --Thaejas (talk) 00:51, 14 May 2010 (UTC)
- Etc., i.e., and e.g., don't include trailing commas intentionally, because they can be used in cases where that is not advised, there might be other punctuation, at the end of a sentence, etc., although maybe the rule should ensure some sort of punctuation afterwards. Anyone else have ideas about that?
- Finally, I'm not sure what you mean by the full-stop after reference tag. Reference tag punctuation is a common problem, but it's not quick to fix. I have an expression I use to fix it here. You could use it in your own personal find-replace settings. Shadowjams (talk) 18:05, 13 May 2010 (UTC)
- Thanks. Your fix is precisely what I am looking for. But can this fix be a part of the general typo fixes instead of being in editor's find-replace? --Thaejas (talk) 00:51, 14 May 2010 (UTC)
- I would probably be ok with that, but others might not be. I've been using that fix for a long time and I have almost no problems with it. However, it doesn't handle indefinite number of reference tags, only a finite number, and each time you up that number it increases the processing workload. I don't know how it would affect the overall processing power, but I'd also like to hear others discuss that.
- I'll take a look at those spelling ones above after a bit. Shadowjams (talk) 03:44, 14 May 2010 (UTC)
1 by 1 -- Different
I'll do these in their own section so they're easier to follow. The "different" rule doesn't correct, or corrects as it does, because the 2nd group (the non capturing one) has a rule for "f" alone by itself, without the e, but most of all because it accepts any suffix to it. I think that's probably a poor design on the end, so I've changed it to require a trailing t, or some similar suffix. If this makes it miss a lot of stuff, let me know. Shadowjams (talk) 05:01, 14 May 2010 (UTC)
- Actually I changed it by adding an extra optional e. This is easier and allows the expansive word capture on the end. I don't think that's good design generally, but it seems to work for now. Shadowjams (talk) 05:08, 14 May 2010 (UTC)
Manoeuver maneuver UK/US
Quoting user Mjroots on my talk: "Manoeuver is British English spelling, whereas maneuver is American English."
I think AWB changed it because of this regexp:
<Typo word="Maneuverable" find="\b(M|m)anoeuverab(ility|le)\b" replace="$1aneuverab$2" />
Greetings from Amsterdam, Kwiki (talk) 17:54, 15 May 2010 (UTC)
- This alternative spelling wikt:manoeuvre needs consideration too. Mjroots2 (talk) 05:20, 16 May 2010 (UTC)
Exception to "other than" rule
The "other then" -> "other than" rule produces some false positives in cases like "Other then-popular things [...]" or "Other then-known stuff [...]". I suggest someone should add an exception to that rule, saying that "if 'then' is followed by any letter, it should not be replaced with 'than'". --bender235 (talk) 15:46, 18 May 2010 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 16:01, 18 May 2010 (UTC)
Church or church?
I'm curious why AWB replaces "Catholic church" with "Catholic Church", and "Methodist Church" with "Methodist church". I think the spelling should be consistent (I'd prefer lower case, for that matter). --bender235 (talk) 15:46, 18 May 2010 (UTC)
- But the Catholic Church uses uppercase. The Catholic Church is the worldwide entity, each Methodist church is a building serving a local congregation, as I understand it. -- JHunterJ (talk) 15:57, 18 May 2010 (UTC)
Other than
If you run RegExpTypoFix on Ibn Battuta, you will see that "The other then sailed away without him" is changed to "The other thansailed away without him". This is a false positive, of course, and it would be too hard to put that right, but why did the rule drop the space after "then"? Should the "replace" string be "$1 than$2"?
<Typo word="More/Greater/Less/Rather/Other than" find="\b([Mm]ore|[Gg]reater|[Ll]ess|(?:[Rr]a|O|o)ther)\s+then(?:\s)" replace="$1 than" /> John of Reading (talk) 20:24, 18 May 2010 (UTC)
- I had a (non-fatal) error in my fix above. It's refixed now. -- JHunterJ (talk) 20:29, 18 May 2010 (UTC)
- Thanks John of Reading (talk) 20:32, 18 May 2010 (UTC)
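The dropped space can be reproduced outside AWB. The following Python sketch assumes the quoted pattern behaves the same under Python's `re` as under .NET; `ORIGINAL` is the rule as quoted, while `CAPTURING` is only one possible repair along the lines John of Reading suggested, not necessarily the fix that was actually applied.

```python
import re

HEAD = r"\b([Mm]ore|[Gg]reater|[Ll]ess|(?:[Rr]a|O|o)ther)\s+then"

# As quoted: the trailing whitespace is consumed but never captured.
ORIGINAL = re.compile(HEAD + r"(?:\s)")
# Hypothetical repair: capture the whitespace and put it back.
CAPTURING = re.compile(HEAD + r"(\s)")

text = "The other then sailed away without him"
dropped = ORIGINAL.sub(r"\1 than", text)
kept = CAPTURING.sub(r"\1 than\2", text)
print(dropped)
print(kept)
```

Because the non-capturing group is part of the match, the whole matched span including the space is replaced by `$1 than`. A zero-width lookahead `(?=\s)` would be another way to test for the space without consuming it. Either way, "Other then-popular things" is left alone, since a hyphen is not whitespace.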
Fondation - Foundation
Fondation occurs a lot, it's a French word I believe and mostly a false positive. Regards, SunCreator (talk) 06:12, 23 May 2010 (UTC)
- If the rule excluded capitalized versions and italicized versions, would that eliminate most of these false positives? Shadowjams (talk) 06:20, 23 May 2010 (UTC)
- That is a good idea. Yes I think that capitalized Fondation is most false positives. Regards, SunCreator (talk) 06:28, 23 May 2010 (UTC)
- Ok, I'll test out a version like that. Exclusions are a reg-ex nightmare :P but let me see if I can make it work. Shadowjams (talk) 06:32, 23 May 2010 (UTC)
- I'm having some trouble with this. For example, try \b(!:Fondation)(F|f)o(?:ud?n|nd)ation(s|al|ally|less)?\b and replace it with what's in the rule right now ($1oundation$2). If your input is "fondation" then it skips it, although I would think it shouldn't. Is it not case sensitive on the (!:.) capture group? Is this a .net thing or am I just making some mistake here? (by the way, this is a simplified version of a broader issue I'm having with the !: groups). Shadowjams (talk) 07:01, 23 May 2010 (UTC)
- I think you mean (?!pattern), the zero-width negative lookahead assertion? But I prefer the negative look-behind at the end, as theoretically faster (so that it only looks behind if a potential match has been identified): (?<!pattern) -- JHunterJ (talk) 12:36, 23 May 2010 (UTC)
- That's it. Thank you. I always get that wrong. Shadowjams (talk) 01:42, 24 May 2010 (UTC)
- Fixed [40]. Shadowjams (talk) 02:27, 24 May 2010 (UTC)
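The three constructions mentioned in this thread can be compared side by side. This is a sketch in Python, assuming its `re` module handles lookarounds like .NET does for these fixed-width cases; it is not the rule that was actually committed, and `LITERAL`, `LOOKAHEAD`, `LOOKBEHIND` and `fix` are illustrative names.

```python
import re

BODY = r"(F|f)o(?:ud?n|nd)ation(s|al|ally|less)?\b"

# "(!:Fondation)" is an ordinary group that demands the literal text
# "!:Fondation", so this never matches a plain word.
LITERAL = re.compile(r"\b(!:Fondation)" + BODY)
# Negative lookahead at the start: checked at every candidate position.
LOOKAHEAD = re.compile(r"\b(?!Fondation)" + BODY)
# Negative lookbehind at the end: checked only after a potential match.
LOOKBEHIND = re.compile(r"\b" + BODY + r"(?<!Fondation)")


def fix(rule, text):
    """Rewrite a match as 'foundation', keeping case and suffix."""
    return rule.sub(r"\1oundation\2", text)


print(LITERAL.search("fondation"))             # None: nothing to match
print(fix(LOOKAHEAD, "a solid fondation"))
print(fix(LOOKAHEAD, "Fondation Cartier"))
print(fix(LOOKBEHIND, "Fondation Cartier"))
```

One difference worth noting: the lookahead version skips anything beginning with "Fondation", including "Fondations", while the lookbehind version as written only skips the bare word, so it would still change "Fondations" to "Foundations".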
Sect > etc.
Would someone technical enough mind fixing whatever caused this mistaken typo-correction? Thanks! ╟─TreasuryTag►draftsman─╢ 14:04, 23 May 2010 (UTC)
- Yeah, that was my mistake, it's been fixed above. If you reload it it should be fixed. Shadowjams (talk) 01:44, 24 May 2010 (UTC)
etc again
etc => etc. is the most common correction by far, so it's worth going over.
- /etc => /etc. ✗ Fail - the /etc is a common computer folder
- etc) => etc.) ✓ Pass - common and correct
- etc(end of line character) => etc. ✓ Pass again common
- etc, => etc., ✓ Pass common
- etc; => etc.; ✓ Pass
- etc. => etc. i.e No change ✓ Pass
- etc any word => etc. any word ✓ Pass✗ Fail technically correct, but I would prefer => etc., I amend most of these manually. The following space indicates it's used mid sentence and so etc., fits nicely.
Regards, SunCreator (talk) 06:46, 23 May 2010 (UTC)
- I authored that rule; good catch on the unix filestructure ones -- that needs to be an exclusion. The last one, are you talking about the fact there's no trailing comma? If that's it, it's really hard to distinguish between when there should and shouldn't be one (I think) unless I'm misunderstanding that. Shadowjams (talk) 06:57, 23 May 2010 (UTC)
- When etc is followed by a space (indicating a word is following), a comma is good. So 'etc ' => 'etc., ', but when it is followed by another character, like a closing bracket or a comma, or by the end of the line, then 'etc' => 'etc.' Regards, SunCreator (talk) 07:03, 23 May 2010 (UTC)
- Ok. I fixed the unix filestructure issue (I hope). I'll handle that other one tomorrow (if someone else doesn't first). Thanks Shadowjams (talk) 07:07, 23 May 2010 (UTC)
- Thanks. Regards, SunCreator (talk) 07:27, 23 May 2010 (UTC)
- This rule now removes leading spaces in front of the word "etc". Please fix that. --Closedmouth (talk) 10:12, 23 May 2010 (UTC)
- I reverted this. Hopefully if you reload it will be working again? Regards, SunCreator (talk) 11:18, 23 May 2010 (UTC)
- Sorry for the trouble on this one. Could someone help me with why this doesn't work: find="(?!/etc)(E|e)(tc\b([^\.\w])|ct\b\.?)" Replace="$1tc.$3" Shadowjams (talk) 02:29, 24 May 2010 (UTC)
- That should do it. Rjwilmsi 11:09, 24 May 2010 (UTC)
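On the question of why the quoted pattern does not exclude `/etc`: the lookahead `(?!/etc)` is evaluated at the position where the `e` is about to be matched, so it looks forward at `etc...`, never at the preceding slash, and therefore always succeeds. A lookbehind is needed to inspect the character before the match. The Python sketch below illustrates this; it assumes Python's `re` matches .NET behaviour here, and `SKETCH` is a hypothetical alternative, not the rule that was actually committed.

```python
import re

# Quoted attempt: the lookahead never sees the "/" before "etc".
BROKEN = re.compile(r"(?!/etc)(E|e)(tc\b([^\.\w])|ct\b\.?)")
# Hypothetical alternative: look behind for the slash, and require a
# word boundary so the rule cannot fire inside words such as "Sect".
SKETCH = re.compile(r"(?<!/)\b(E|e)(tc\b([^\.\w])|ct\b\.?)")


def fix(rule, text):
    """Replace with 'etc.' followed by the captured trailing character."""
    return rule.sub(r"\1tc.\3", text)


print(fix(BROKEN, "see /etc for config"))
print(fix(SKETCH, "see /etc for config"))
print(fix(SKETCH, "running numbers, etc for all class members"))
```

Group 3 is empty whenever the "ect" alternative matches; Python 3.5 and later substitute an empty string for an unmatched group, which is also how `$3` behaves in .NET replacements.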
didnt and thats
Thats goes to that's in some cases. Didnt goes to didn't but according to WP:CONTRACTION they would go to that is and did not. Regards, SunCreator (talk) 08:01, 23 May 2010 (UTC)
- Maybe there is some reason? Possibility of being in quotes? Regards, SunCreator (talk) 08:04, 23 May 2010 (UTC)
- Hmm, I'm not really in favour of the typo list enforcing WP policy like that. Rjwilmsi 10:30, 24 May 2010 (UTC)
- Okay, well quite a few will be in quotes, which makes it part of general fixes rather than typos anyhow. Regards, SunCreator (talk) 15:38, 24 May 2010 (UTC)
Capitalisation of countries, religions etcetera in parts of words
- "the Slavic population was germanized by Germans" => "the Slavic population was Germanized by Germans"
- "during the christianization of the eleventh century" => "during the Christianization of the eleventh century"
- "pro-american politics" => "pro-American politics"
Capitalization like those seems a bit odd. I have been skipping them, but I'm unsure of what is correct. Regards, SunCreator (talk) 11:29, 23 May 2010 (UTC)
- All proper nouns, so believe they're all correct. Rjwilmsi 15:22, 23 May 2010 (UTC)
- Okay, just me then. Regards, SunCreator (talk) 15:36, 24 May 2010 (UTC)
Can we add certifed => certified
A re-occurring typo. Regards, SunCreator (talk) 15:34, 24 May 2010 (UTC)
NF-kB => NF-κB
the protein NF-κB should not be written with k, but rather the greek letter κ (kappa)
it should also be written exactly in that way: with 3 capital letters and the greek one
correct:
- NF-κB
wrong:
- NF-kB
- NF-kb
- nf-kb
- NFKB
- nfkb
- etc....
--Time9 (talk) 16:56, 24 May 2010 (UTC)
- Done with this edit. -- JHunterJ (talk) 19:35, 24 May 2010 (UTC)
departement
A false positive occurred on Ida Copeland. Harrods departement stores => Harrods département stores. Yet the correct form is: Harrods department stores. Regards, SunCreator (talk) 18:59, 24 May 2010 (UTC)
- I think a false positive is when the typo list tries to fix something that's not broken. This is broken, but the fixer guessed wrong on the fix. The AWB user should catch that, but if not, the article is no worse off -- it's just as wrong as it was to start with. In this case, I'm not sure how to make it distinguish between those two possible fixes. -- JHunterJ (talk) 19:10, 24 May 2010 (UTC)
- 50/50 chance I guess. So can forget this. Regards, SunCreator (talk) 19:25, 24 May 2010 (UTC)
HoTCays?
Why would it want to do this or this? And it's not because I added it to the search and replace. Enter CBW, waits for audience applause, not a sausage. 21:38, 24 May 2010 (UTC)
It also wants to replace utilidoor with utiTCoor. Nothing in the settings file and it wants to make the changes even if the typo fixing is off. Enter CBW, waits for audience applause, not a sausage. 21:49, 24 May 2010 (UTC)
- Unless you've hacked the edit summary those are find & replace changes, not typo fixes. Rjwilmsi 22:00, 24 May 2010 (UTC)
- Never mind. I have no idea what caused that but deleting AWB and the settings that were stored in a separate folder followed by redownloading it has fixed the problem. Enter CBW, waits for audience applause, not a sausage. 22:35, 24 May 2010 (UTC)
ie
I've been getting a lot of false positives changing "ie" to "i.e." in hostnames with ie, the Irish top-level domain. For example, downloadmusic.ie in the article 2008 in Irish music. The current rule is:
<Typo word="i.e." find="\bi(?:\.?e|e\.)(['\s,:;\)&-])" replace="i.e.$1" />
MANdARAX • XAЯAbИAM 06:11, 26 May 2010 (UTC)
- Yeah, I'm aware of that problem. Most of those should be avoided if they're in a full url, but the ones that aren't in link templates won't be. It also shows up on a few other web addresses. One possibility is to add (?!\.ie\b) as an exclusion to the beginning (I've had a lot of trouble with those lately so I'll let someone else test that before adding it in). Shadowjams (talk) 10:33, 26 May 2010 (UTC)
- I noticed there are quite a few to skip past. Why can't .ie be ignored? Surely it's enough to have a dot in front rather than checking for a complete url? Regards, SunCreator (talk) 14:19, 26 May 2010 (UTC)
Done with this update. Rjwilmsi 18:42, 26 May 2010 (UTC)
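To show why the quoted rule fires on hostnames, and what a "dot in front" exclusion might look like, here is a small Python sketch. It assumes Python's `re` treats the pattern like .NET does; `SKETCH` uses a negative lookbehind as one hypothetical way to express the idea and is not necessarily what the update above contains.

```python
import re

TAIL = r"i(?:\.?e|e\.)(['\s,:;\)&-])"

# Rule as quoted: \b is satisfied between "." and "i" in "music.ie".
CURRENT = re.compile(r"\b" + TAIL)
# Hypothetical exclusion: refuse to match when a dot precedes the "i".
SKETCH = re.compile(r"(?<!\.)\b" + TAIL)


def fix(rule, text):
    """Replace the match with 'i.e.' plus the captured trailing character."""
    return rule.sub(r"i.e.\1", text)


print(fix(CURRENT, "at downloadmusic.ie in 2008"))
print(fix(SKETCH, "at downloadmusic.ie in 2008"))
print(fix(SKETCH, "ie, the Irish domain"))
```

The lookbehind is only evaluated at positions where an "i" could start a match, so the extra cost should be small, though that is an expectation rather than a measurement.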
Can we add milatary => military
Occurrences here. Regards, SunCreator (talk) 00:30, 29 May 2010 (UTC)
- Fixed here. Shadowjams (talk) 05:36, 29 May 2010 (UTC) Done
french => French
We don't have this Capitalisation in Cultures, languages, and ethnic groups or elsewhere. Regards, SunCreator (talk) 00:34, 29 May 2010 (UTC)
- Done Shadowjams (talk) 05:31, 29 May 2010 (UTC)
- Except for french fries.--BillFlis (talk) 12:55, 29 May 2010 (UTC)
- french fries says you can use French fries with a reference. Regards, SunCreator (talk) 16:48, 29 May 2010 (UTC)
- Well, you can use "French" but "french fries" ("sometimes capitalized") and "french-fried" don't need "correcting".--BillFlis (talk)
- Good point. Should we exclude that one example, or is the rule generally problematic? I think we do the language capitalizations generally, notwithstanding other similar examples. Shadowjams (talk) 08:52, 30 May 2010 (UTC)
- There's also the verb french ("often capitalized"), which doesn't take any particular words after it. Also, french curve is only "often capitalized F".--BillFlis (talk) 12:28, 30 May 2010 (UTC)
winnining
Hello, winninig gets changed to winnining instead of winning. It's probably not very common but whatever. --Closedmouth (talk) 12:55, 8 June 2010 (UTC)
- This rule would be the problem <Typo word="-ining" find="(?!\b(?:(?:Br|Kl|M|H|St)e|Nar|Kurt|Lap)inig\b)\b(\w+)inig(s|ly)?\b" replace="$1ining$2" /><!--Don't match (Br/Kl/M/H/St)einig, (Nar/Kurt/Lap)inig-->.
- I'm honestly not sure exactly what that rule's fixing. Maybe someone can explain it, in which case I'd be more comfortable adding the exclusion for Closedmouth's example. Shadowjams (talk) 06:02, 9 June 2010 (UTC)
- It fixes typos like "beginig". No harm to add a new rule for "-inninig" to "-inning" above this one. Rjwilmsi 09:39, 9 June 2010 (UTC)
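The interaction described here can be sketched in Python. `GENERIC` is the quoted rule transcribed directly; `SPECIFIC` is a hypothetical version of the suggested "-inninig" rule, placed first so that it runs before the generic one. The sketch assumes Python's `re` agrees with .NET on these patterns, and all names are invented for the example.

```python
import re

# Quoted rule, with its exclusions for names such as Steinig.
GENERIC = re.compile(
    r"(?!\b(?:(?:Br|Kl|M|H|St)e|Nar|Kurt|Lap)inig\b)\b(\w+)inig(s|ly)?\b"
)
# Hypothetical earlier rule for "-inninig" -> "-inning".
SPECIFIC = re.compile(r"\b(\w+)inninig\b")

RULES = [(SPECIFIC, r"\1inning"), (GENERIC, r"\1ining\2")]


def apply_rules(text, rules=RULES):
    """Apply each rule in list order, as a typo list would."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(GENERIC.sub(r"\1ining\2", "winninig"))  # generic rule on its own
print(apply_rules("winninig"))                # specific rule runs first
print(apply_rules("joinig"))
print(apply_rules("Steinig"))
```

On its own the generic rule captures "winn" and rewrites the ending, giving "winnining". Once the specific rule has produced "winning", the generic rule no longer finds "inig" and leaves the word alone.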
Deferred from Wikipedia:AutoWikiBrowser/Tasks
I noticed today that there are many articles with the word "Olympic" or "Olympics" misspelled. Common misspellings are "Oylmpic", "Olmypic", and "Olypmic". Would a bot be able to fix these spellings, or am I in the wrong place? Thanks, GaryColemanFan User Talk:GaryColemanFan 9:05 pm, 27 May 2010, Thursday (19 days ago) (UTC−6) -- Cit helper (talk) 06:04, 16 June 2010 (UTC)
- I added a rule here. It corrects your suggestions "Olmypic" and "Olypmic", as well as "Olypic" and "Olymic" (and of course all their plurals), but I was not able to find any instances of "Oylmpic", so that's not included.--BillFlis (talk)
False Positives
N'Sync ---> NSYNC
Cit helper (talk) 01:46, 15 June 2010 (UTC)
- I may not understand you correctly: I couldn't find any occurrences of "sync" on any of those pages.--BillFlis (talk) 09:58, 15 June 2010 (UTC)
- Yes, that was a suggestion that has been brought to my attention, not a FP...Cit helper (talk) 06:04, 16 June 2010 (UTC)
- The numbered entries have False Positives with various words (this was just a dump from false_positive.txt).Cit helper (talk) 06:04, 16 June 2010 (UTC)
Axel Finet -> Axel Finite (False Positive, Name) Article: Nick Tarabay
—Preceding unsigned comment added by Cit helper (talk • contribs) 07:46, 16 June 2010 (UTC)
- Agree about Finet it has several uses. Regards, SunCreator (talk) 10:53, 16 June 2010 (UTC)
- I corrected the various "Finite" and "-finite" rules not to change "Finet".--BillFlis (talk) 12:07, 16 June 2010 (UTC)
"achiveved" -> "achieveved"
AWB replaced "achiveved" with "achieveved" here, which is obviously incorrect. Could someone please fix the rule? --bender235 (talk) 13:08, 17 June 2010 (UTC)
Nurnberg
Someone please add a rule that replaces "Nurnberg" with either "Nürnberg" or "Nuremberg" (I suggest the latter would be more appropriate). --bender235 (talk) 21:21, 18 June 2010 (UTC)
- There are several articles with "Nürnberg" in the title (e.g., German cruiser Nürnberg), although the city is under "Nuremberg". I found some (probably incorrect) occurrences of "Nurnburg" (with U for E).--BillFlis (talk) 11:56, 19 June 2010 (UTC)
Incorrect pluralizations
Please check this regex to see if it would be a good addition:
<Typo word="-eys" find="\b([Aa]ttorn|[Dd]onk|[Mm]edl|[Pp]ull|[Tt]urk)(?:ie|y)s\b" replace="$1eys" />
In particular, I question whether "Medlys" and "Medlies" should be included here, in a separate rule ("Medly" seems quite common), or not at all. PleaseStand (talk) 23:56, 19 June 2010 (UTC)
- I'm testing it now. It's mostly catching "attornies". Don't see any issues with it yet. Shadowjams (talk) 04:09, 20 June 2010
- It's pretty frequent too, more frequent than many of our rules. I'll add it in to the new additions. Shadowjams (talk) 04:14, 20 June 2010 (UTC)
- I removed the "Medlies" rule because I found the quoted term "Monstrous Medlies" at Colley Cibber (it's sourced to a book). You can add that back if you like. Shadowjams (talk) 04:18, 20 June 2010 (UTC)
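For anyone wanting to check the proposed "-eys" rule without AWB, here is a small sketch in Python. The pattern is the one proposed above (including "Medl", which was later removed); the function name is made up for illustration, and Python's `\1` stands in for AWB's `$1`.

```python
import re

# Proposed rule from this thread, ported to Python's re syntax.
EYS = re.compile(r"\b([Aa]ttorn|[Dd]onk|[Mm]edl|[Pp]ull|[Tt]urk)(?:ie|y)s\b")


def fix_eys(text):
    # Equivalent of replace="$1eys"
    return EYS.sub(r"\1eys", text)


for sample in ["attornies", "Turkys", "attorneys", "Monstrous Medlies"]:
    print(sample, "->", fix_eys(sample))
```

The last sample shows why the "Medl" alternative is risky: the sourced quotation "Monstrous Medlies" would be rewritten.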
Importing Typo list for other languages
I would like to use this great plugin for my language, but when I try to enable the RegexTypoFix checkbox it says it will load the typos list from the English Wikipedia. I want to set it to download from my own language's Wikipedia instead. How can I do this? -- Mahir78 (talk) 10:29, 22 June 2010 (UTC)
- Add <!--Typos:http://wiki.riteme.site/w/index.php?title=Wikipedia:AutoWikiBrowser/Typos&action=raw-->, replacing the en with whatever language you want, to the local checkpage. —Reedy 10:40, 22 June 2010 (UTC)
Playright -> Playwright
There is a publishing house "Playright publishing". Is there a way to make sure the word is not replaced when it is either 1. capitalized or 2. followed by the word "publishing" ?--Muhandes (talk) 10:37, 23 June 2010 (UTC)
- Yes, you can protect that deliberate misspelling by applying the Sic template.--BillFlis (talk) 16:00, 23 June 2010 (UTC)
- Thanks, I obfuscated it with Sic on wherever it was used. --Muhandes (talk) 14:13, 24 June 2010 (UTC)
centerfield -> center field?
Concise Oxford has "centerfield" as a valid word. Should it really be replaced with center field? --Muhandes (talk) 14:13, 24 June 2010 (UTC)
childrens' → children's
I'm not sure how to add this but it is very common. Currently it does childrens' → children's' which is incorrect. If someone could add this it would be most helpful. --Muhandes (talk) 14:11, 24 June 2010 (UTC)
- I just had it work correctly at least in one case. It might be that the times when it didn't work were due to ’ used instead of '? I will have to supply an example of a page not working correctly I guess. --Muhandes (talk) 14:43, 24 June 2010 (UTC)
- Sorry for multiple edits, but I was right. The problem is indeed with the use of the second type of apostrophe. Namely, childrens’ → children's’ see Amerika-Gedenkbibliothek for example. --Muhandes (talk) 14:48, 24 June 2010 (UTC)
- I modified the rule to handle both types of apostrophe.--BillFlis (talk) 17:07, 24 June 2010 (UTC)
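The modified rule is not quoted in this thread, so the following is only a guess at its shape: a Python sketch with a hypothetical pattern that accepts either the straight apostrophe (') or the typographic one (’) after "childrens" and always writes the straight form.

```python
import re

# Hypothetical pattern (an assumption, not the live rule):
# "childrens" followed by either apostrophe style.
CHILDRENS = re.compile(r"\b([Cc])hildrens['\u2019]")


def fix_childrens(text):
    # Replace the whole "childrens'" / "childrens’" with "children's",
    # so no stray apostrophe is left behind.
    return CHILDRENS.sub(r"\1hildren's", text)


print(fix_childrens("a childrens' library"))
print(fix_childrens("a childrens\u2019 library"))
print(fix_childrens("a children's library"))
```

Consuming the apostrophe inside the match is what avoids the children's’ result reported above.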
mens → men's
We have childrens → children's and womens → women's, why not mens → men's ? If this is appropriate, can anyone add it please? --Muhandes (talk) 09:17, 25 June 2010 (UTC)
- Because of Mens, Mens sana in corpore sano, Mens rea, etc. (Latin phrases), as well as Mens Sana Basket. However, I did add a rule to change "mens'" to "men's".--BillFlis (talk) 11:55, 25 June 2010 (UTC)
Widly
Please extend the -ely rule to catch that. I am also considering "falsly" and "sparsly" but am unsure whether it would be worth the processing time. PleaseStand (talk) 01:15, 26 June 2010 (UTC)
- I'll check it out. I wouldn't worry about the processing time for those too much. Strangely though, that rule only finds those roots that have "in" or "un" at the front. I think that's unintentional... adding a ? to that first group would allow it to find all permutations. I'm testing that rule right now to see if there's some reason for it. Shadowjams (talk) 02:41, 26 June 2010 (UTC)
- Added here. Done Shadowjams (talk) 06:30, 26 June 2010 (UTC)
More then > More than
There must be something strange about this rule - it doesn't show up in the edit summary in the same way as the others. diff -- John of Reading (talk) 15:11, 27 June 2010 (UTC)
- Done Fixed. Rjwilmsi 13:16, 28 June 2010 (UTC)
- Is there a reason the search string ends with a space (\s) rather than a simple word boundary (\b)? The "then" (for "than") could be followed by a comma (perhaps separating a parenthetical phrase); e.g., "other than, say, sausages".--BillFlis (talk) 15:11, 28 June 2010 (UTC)
- Because we want whitespace, not a word boundary, to avoid false positives when "then" is an adverb and not a misspelled preposition. For instance, since (back then) I thought that was the explanation, I didn't say any more then. -- JHunterJ (talk) 15:23, 28 June 2010 (UTC)
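The trade-off between `\s` and `\b` discussed above can be seen in a short sketch. Both patterns below are hypothetical reconstructions in Python (the live rule is not quoted in this thread), and the whitespace variant uses a lookahead `(?=\s)` instead of consuming the space, purely to keep the example short.

```python
import re

# Hypothetical reconstructions; neither is the live RETF rule.
WITH_SPACE = re.compile(r"\b([Mm])ore then(?=\s)")   # requires whitespace next
WITH_BOUNDARY = re.compile(r"\b([Mm])ore then\b")    # any word boundary will do


def fix(pattern, text):
    return pattern.sub(r"\1ore than", text)


samples = [
    "more then five people",           # misspelled preposition
    "I didn't say any more then.",     # adverb, should be left alone
    "more then, say, sausages",        # preposition followed by a comma
]
for s in samples:
    print(repr(fix(WITH_SPACE, s)), "|", repr(fix(WITH_BOUNDARY, s)))
```

The whitespace variant misses the comma case BillFlis raises, while the boundary variant rewrites the adverb case JHunterJ raises; neither is free of cost.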
Metropolitan: Is this a bug?
AWB ignores typos in the version containing the link [[metropolitan bishop|metropoltian]], but it works with the version without that link.--Diwas (talk) 13:08, 29 June 2010 (UTC) (I had added the new typo rule yesterday.)--Diwas (talk) 13:11, 29 June 2010 (UTC)
This links Metropolitan bishop metropolitan are incompatible with this RETF-rule too. Is the rule incomplete? --Diwas (talk) 13:49, 29 June 2010 (UTC)
The AWB Regex Tester is replacing [[metropolitan bishop|metropoltian]] with [[metropolitan bishop|metropolitan]]--Diwas (talk) 14:06, 29 June 2010 (UTC)
- No bug, deliberate behaviour: under rev 5537 we did: fix https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia_talk:AutoWikiBrowser/Bugs#names_often_spelled_differently don't apply a typo fix if there is a wikilink target using that spelling. Rjwilmsi 10:09, 1 July 2010 (UTC)
- But it was finding false positives, which I have just corrected.--BillFlis (talk) 11:23, 1 July 2010 (UTC)
- I guess the link is https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia_talk:AutoWikiBrowser/Bugs/Archive_13#names_often_spelled_differently now.
- after edit conflict: Thank you for your answer. Now it works. Originally the link was correct, but I guess this correction of my simple rule was making it working. I guess if a rule match a link, the rule will be ignored in this article. But my bad rule was matching the correct spelling. thanx --Diwas (talk) 12:51, 1 July 2010 (UTC)
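The behaviour described in this thread (a rule is skipped for the whole article when it matches a wikilink target) might be modelled roughly as below. This is a sketch in Python of the described behaviour only, not AWB's actual implementation; the function name, the link-target pattern, the replacement string and the deliberately over-broad "bad" rule are all assumptions made for illustration.

```python
import re

# Captures the target part of [[target|label]] or [[target]] (simplified).
LINK_TARGET = re.compile(r"\[\[([^\]|#]+)")


def apply_typo_rule(text, find, replace):
    rule = re.compile(find)
    targets = LINK_TARGET.findall(text)
    # If the rule matches any link target, assume that spelling is
    # intentional and skip the rule for this article.
    if any(rule.search(t) for t in targets):
        return text
    return rule.sub(replace, text)


article = "[[metropolitan bishop|metropoltian]]"

# The shorter rule quoted further down this page; it does not match the correct spelling.
good = r"\b(M|m)etr(?:(?:op|po)lit|(?:opo?|po)lti)(\w*)\b"
# Hypothetical over-broad rule that also matches the correct spelling.
bad = r"\b(M|m)etropo?l[it]{2}(\w*)\b"

print(apply_typo_rule(article, good, r"\1etropolit\2"))
print(apply_typo_rule(article, bad, r"\1etropolit\2"))
```

Under this model, a rule that matches the correct spelling is silently disabled on any page linking to that spelling, which fits what Diwas observed.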
separete/separeted
I notice there is separeble but not separete/separeted. It is quite a common typo. Would be nice if someone could add it. --Muhandes (talk) 15:25, 29 June 2010 (UTC)
- The existing "(In)Separable" rule covers those variations too. Rjwilmsi 13:08, 1 July 2010 (UTC)
- It didn't until I modified it a couple of days ago. I should have commented here that it had been handled.--BillFlis (talk) 13:33, 1 July 2010 (UTC)
Request addition
Could someone please add:
ascession --> accession
it's in many many "list of monarchs" type articles and it's a blatant misspelling, there's too many for me to fix them all manually. -- Ϫ 22:12, 22 June 2010 (UTC)
- Sometimes, maybe "ascession" should be "ascension", no? They only differ by one letter.--BillFlis (talk) 00:03, 23 June 2010 (UTC)
- I considered that and actually checked and pretty much all of them deal with accession. The search turns up nothing but lists of consorts etc. Besides, ascession is much more likely to be mistaken for accession because of the similar sound, and "ascension" isn't commonly misspelled. -- Ϫ 05:34, 23 June 2010 (UTC)
Okay so is no one going to add this? -- Ϫ 02:33, 25 June 2010 (UTC)
- I'm not in the process of testing it right now, but if you'd like to, try this: Find: "\b(A|a)sc+es+[io]{2}n\b" Replace: "$1ccession". The extra stuff in the middle should catch the "io" "oi" switch, and I'd guess that ascension misspellings will probably include an "n" somewhere, which would exempt it from that regex. Shadowjams (talk) 05:58, 25 June 2010 (UTC)
- Oh you mean for me to test it? No I don't normally use AWB, I don't have it installed.
- I'm curious though, why people are avoiding adding this? -- Ϫ 03:17, 27 June 2010 (UTC)
- Hah. I'm sorry; I'm not avoiding it, I'm not sure if anyone else is, but I wouldn't see a reason why if that were so. I don't have a wiki dump handy right now which is why I can't test it immediately [I did earlier but I forgot about this one]. I'll try and take a look soon. I don't foresee any issues with what I proposed above, but I get a little cautious around these British monarch-related changes because they're used in all sorts of ways that I can't begin to comprehend, so I like to test those. I am pretty cautious though; it's not a catastrophic event if they're added and then later tweaked. Shadowjams (talk) 09:00, 27 June 2010 (UTC)
- It just seemed like some are hesitant about adding it. So if some readers here need some reassurance, I did my homework on this. The search for "ascession" gives only 47 results, while "accession" gives 11,334 results! The search for "ascession" turns up almost all "List of ____ consorts" type articles. In all of these articles the word is used in the context of the definition of "accession", not "ascension", or anything else. These articles all have similar tables in which this word appears multiple times, so I'm thinking the same person created all these tables and used the same misspelled word in all of them, not knowing that "ascession" isn't even a word! I checked in multiple dictionaries and even asked the gurus over at Wiktionary's Tea room. So I'm quite certain it's safe to add this to the list! :) Ϫ 08:26, 28 June 2010 (UTC)
Meh. I'm taking this page off my watchlist. -- Ϫ 00:01, 1 July 2010 (UTC)
- Wow. Sorry that things here weren't happening fast enough to please you. We hate to see you go, really, because we are entirely at your service, and your complete satisfaction is our only goal. The thing is, some of us are Old Farts, who check our email only about every couple of hours. Even then, we tend to think a bit before we act. Oh, and you forgot to take a number, so we didn't even see you there at the end of the queue.--BillFlis (talk) 00:17, 1 July 2010 (UTC)
- I did put in a rule that you could try out. Presumably you use AWB, so you could plug it in and try a few. I haven't gotten around to doing that. It's nothing personal. I think that rule will work without any problems and someone can axe it if it starts acting up. Shadowjams (talk) 09:01, 1 July 2010 (UTC)
lol, thanks Bill. Ϫ 08:30, 2 July 2010 (UTC)
- Let's bury the hatchet you two. My regex from above will probably blank the main page. Actually... that'd be much more impressive than anything I've actually contributed. Let's hope for disaster. Shadowjams (talk) 10:47, 2 July 2010 (UTC)
- It looks OK to me. I found one "ascesion" that should be "ascension", which I fixed by hand.--BillFlis (talk) 11:02, 2 July 2010 (UTC)
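The regex proposed earlier in this thread can be exercised with a few sample words. This is a sketch in Python rather than AWB itself; the pattern is Shadowjams's, with `\1` standing in for `$1`, and the function name is made up for illustration.

```python
import re

# Shadowjams's proposed find string, ported to Python's re syntax.
ASCESSION = re.compile(r"\b(A|a)sc+es+[io]{2}n\b")


def fix_ascession(text):
    # Equivalent of Replace: "$1ccession"
    return ASCESSION.sub(r"\1ccession", text)


for word in ["ascession", "Ascessoin", "ascension", "accession", "ascesion"]:
    print(word, "->", fix_ascession(word))
```

The "n" in "ascension" is what keeps it out of the match, as predicted. The single-s "ascesion" is caught as well, which is worth knowing given that BillFlis found one such case that should have been "ascension".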
Genious
It seems to be a common misspelling of "genius". PleaseStand (talk) 02:48, 3 July 2010 (UTC)
- Done Here. I haven't tested it yet, but offhand doesn't seem like a large risk. Shadowjams (talk) 06:10, 3 July 2010 (UTC)
"Practive"
May someone please add this to the misspelling list, to be replaced with "practice"? I'm a bit intimidated by the code. :) Search results bring up quite a few occurrences that are tedious to be fixed manually. Thanks, Airplaneman ✈ 06:19, 5 July 2010 (UTC)
- ...or maybe not. "Proactive" could be a possibility as well. I'll go through the search and manually fix them :) Airplaneman ✈ 06:26, 5 July 2010 (UTC)
non-metropolitan
Metropolitan: The shorter rule
"\b(M|m)etr(?:(?:op|po)lit|(?:opo?|po)lti)(\w*)\b"
is correct, too. Isn't it? --Diwas (talk) 20:17, 5 July 2010 (UTC)
- Thanx BillFlis, now I see you have it done already.--Diwas (talk) 20:26, 5 July 2010 (UTC)
non-metropolitan: What about this rule for finding words like semi-metroplitan, too?
"\b((?:\w+-)?(?:M|m))etr(?:(?:op|po)lit|(?:opo?|po)lti)(\w*)\b" --Diwas (talk) 20:17, 5 July 2010 (UTC)
- I think "non-metropolitan" should be replaced by "rural." And "semi-metropolitan" by "small-town". What do you think?--BillFlis (talk) 22:01, 5 July 2010 (UTC)
- I am not sure, as I am not a native English speaker, but the word rural entered my mind when I was reading non-metropolitan. But non-metropolitan is a legal term in England and the rule above covers all ...-metropolitan words. Semi-metropolitan is a rare word. I am not sure if there are other words with -metropolitan. --Diwas (talk) 07:56, 6 July 2010 (UTC)
- Non-metropolitan isn't a word I think I've ever heard, and semi-metropolitan is just as weird. I am a native speaker, and rural is not an antonym of metropolitan. This is the kind of example of what this project isn't appropriate for, although may be an appropriate fix in some cases. Shadowjams (talk) 08:00, 6 July 2010 (UTC)
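As a quick check of the prefixed rule Diwas proposes above, here is a sketch in Python. The find string is the one quoted in this section; the replacement string is not given in the thread, so `\1etropolit\2` is an assumption, as is the function name.

```python
import re

# Diwas's proposed find string with the optional "word-" prefix.
PREFIXED = re.compile(
    r"\b((?:\w+-)?(?:M|m))etr(?:(?:op|po)lit|(?:opo?|po)lti)(\w*)\b"
)


def fix_metropolitan(text):
    # Assumed replacement: keep the prefix and first letter, rewrite the
    # middle, keep the suffix.
    return PREFIXED.sub(r"\1etropolit\2", text)


for word in ["semi-metroplitan", "non-metropoltian", "Metrpolitan", "non-metropolitan"]:
    print(word, "->", fix_metropolitan(word))
```

Correctly spelled forms fall through untouched because none of the alternatives in the middle group matches "opolit".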
Staus
Does this make sense? I want to replace "staus" with "status", but only when not capitalized (to avoid the surname). The misspelling seems to be very common. Thanks, PleaseStand (talk) 02:15, 6 July 2010 (UTC)
- Yeah. AWB is case sensitive, so that's possible. Shadowjams (talk) 02:25, 6 July 2010 (UTC)
- I knew that, so I have now added that rule. All or almost all occurrences of lowercase "staus" shown in a Wikipedia search should have been "status". PleaseStand (talk) 02:49, 6 July 2010 (UTC)
- Looks like a good addition. Shadowjams (talk) 07:02, 8 July 2010 (UTC)
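Case sensitivity is what makes a lowercase-only rule possible. The sketch below uses Python, whose `re` module is also case sensitive by default; the pattern and function name are hypothetical, since the rule that was added is not quoted here.

```python
import re

# Hypothetical lowercase-only rule; no IGNORECASE flag, so "Staus" never matches.
STAUS = re.compile(r"\bstaus\b")


def fix_staus(text):
    return STAUS.sub("status", text)


print(fix_staus("the current staus of the project"))
print(fix_staus("written by Helmut Staus"))
```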
Merovingian
For some reason, AWB tried to replace "merovingian" with "Merovingia$2". Could someone please fix this? --bender235 (talk) 11:24, 8 July 2010 (UTC)
- Fixed.--BillFlis (talk) 11:48, 8 July 2010 (UTC)
Tamil Nadu
In an abundance of caution I have removed the following line from the New section of RETF.
- <Typo word="Tamil Nadu" find="\b[Tt]amil\s*[Nn]adu\b(?<!Tamil Nadu)" replace="Tamil Nadu" />
This appears to be affecting many articles and may be a legitimate spelling of the name since it's so prolific across Wikipedia. Can we discuss this? Just want some reassurances that all these edits I'm doing won't have to be reverted. Not against it if it's right.--mboverload@ 00:49, 28 June 2010 (UTC)
- The Indian Government website has "Tamil Nadu". Also thehindu.com. -- John of Reading (talk) 06:06, 28 June 2010 (UTC)
- It appears to me that the intent of this rule is only to capitalize it and make it two words if it appears as one. Is it doing something else?--BillFlis (talk) 09:44, 28 June 2010 (UTC)
- Being prolific across Wikipedia is not an indication of legitimacy. Unless there is a reliable source that indicates it should not be capitalized or should not be two words, you have at least my reassurances that those edits shouldn't be reverted. -- JHunterJ (talk) 11:25, 28 June 2010 (UTC)
The rule has been restored. That's all it was doing. The reference desk said either could be accurate. Might as well stick with one. --mboverload@ 06:21, 5 July 2010 (UTC)
- I'd like to point out (in case it wasn't clear) that although the official name is indeed Tamil Nadu, correcting it is in many cases wrong. Specifically, as part of an organization's name, as we all agree organization names should not be "corrected" (my favorite example being Childrens Hospital Los Angeles). As some/most people are not aware of this and might be tempted to "correct" such instances, and it is indeed very prolific, it might be best to be prudent and not include this rule. --Muhandes (talk) 08:44, 12 July 2010 (UTC)
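To see exactly what the rule in this section changes, it can be run in Python. The pattern is the one quoted above and needs no changes, because the look-behind `(?<!Tamil Nadu)` has a fixed width; the function name is made up for illustration.

```python
import re

# Rule from this thread: match any casing/spacing variant, then reject the
# match if what was just matched is already exactly "Tamil Nadu".
TAMIL_NADU = re.compile(r"\b[Tt]amil\s*[Nn]adu\b(?<!Tamil Nadu)")


def fix_tamil_nadu(text):
    return TAMIL_NADU.sub("Tamil Nadu", text)


for sample in ["Tamilnadu", "tamil nadu", "Tamil nadu", "Tamil Nadu"]:
    print(repr(sample), "->", repr(fix_tamil_nadu(sample)))
```

This supports BillFlis's reading: the rule only capitalises and splits the name. It has no way to tell an organisation's chosen spelling from a typo, which is the concern Muhandes raises.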
Manoeuvre - Manouvre
Per wikt:manoeuvre, this is a British English spelling, not a typo. Mjroots (talk) 10:21, 10 July 2010 (UTC)
- And what is "manouvre"?--BillFlis (talk) 05:49, 11 July 2010 (UTC)
"-keted"
I changed the "-keted" rule so it won't catch racketts, but still catches bracketted. I hope I did it correctly; this is the first time I have tried my hand at this. --Muhandes (talk) 11:00, 12 July 2010 (UTC)
- It seems that "rackett" is a noun, not a verb, so there would be no such word as "racketted".--BillFlis (talk) 11:35, 12 July 2010 (UTC)
- Looking at the rule, it also captures the ending "s" and "ing", so in fact it catches "-keted", "-kets", "-keting". --Muhandes (talk) 12:07, 12 July 2010 (UTC)
heavily
AWB tried to replace "heaively" with "heaively", but it should've been "heavily". Please fix. --bender235 (talk) 20:22, 3 July 2010 (UTC)
- Anyone? --bender235 (talk) 00:39, 16 July 2010 (UTC)
'Publisher=' parameter of cite template
I've noticed that the "publisher=" parameter of the cite templates is widely misused to specify the name of the newspaper or magazine; sometimes the person responsible realised that the name should be italicised, so they've manually added italics e.g. "publisher=''The Times''". Of course, the real problem is that it's the wrong parameter - what's really needed is "work=The Times". I've set up my own find and replace regex to correct anything of that form, specifying a long list of widely quoted newspapers and magazines. Could/should this be added to the list of automatic corrections somehow? Colonies Chris (talk) 16:24, 13 July 2010 (UTC)
- cannot be done as a typo fix, could be as a genfix. Rjwilmsi 17:28, 13 July 2010 (UTC)
- Was this dispute settled? I thought there was still a discussion on it. My (very limited) understanding is that the website is the work, the publisher is the entity behind it, so isn't "publisher=The Times" correct? --Muhandes (talk) 18:22, 13 July 2010 (UTC)
- |work= is the name of the publication/periodical/newspaper/website so is "The Times" for www.timesonline.co.uk etc. If |publisher= is used then it's the parent company of the website (perhaps Times Newspapers Limited or News Corporation in this case); publisher isn't used for well known publications as it's no extra use. Rjwilmsi 18:44, 13 July 2010 (UTC)
two-fold, four-fold, hundred-fold etc.
Any reason why these are corrected? They show as valid in many dictionaries. --Muhandes (talk) 23:26, 14 July 2010 (UTC)
- Please, can anyone check this? I have seen several edits in the last few days using this rule (here's one) and I am hesitant. See wikt:two-fold, and wordnet also has four-fold, five-fold, six-fold seven-fold eight-fold nine-fold --Muhandes (talk) 10:31, 20 July 2010 (UTC)
- Hyphenated versions don't seem to be in the OED, so rule seems fine: twofold. Rjwilmsi 13:00, 20 July 2010 (UTC)
- But they do appear on Merriam-Webster two-fold, so it might be more American English.--Muhandes (talk) 14:04, 20 July 2010 (UTC)
Subsidiary
<Typo word="Subsidiary" find="\b(S|s)u(?!bsidia)(?:[bd][is]+[bd][iu]?|b[ds]i?|d[ds]i)ar(ies|y)\b" replace="$1ubsidiar$2"/>
I believe we can do better than what we currently have. I was considering the above proposed regex to match "subsidiery" and its variants, but I don't want it to match "subseries". PleaseStand (talk) 04:13, 21 July 2010 (UTC)
- It is a little messy though, and it's surprising how little it matches in terms of misspellings.
<Typo word="Subsidiary" find="\b(S|s)u(?!bsidia)(?:[bd]+[is]+[bd][iu]?|b[ds]i?|[ds]+i)[ia]r(ies|y)\b" replace="$1ubsidiar$2"/>
- That might work. Shadowjams (talk) 04:54, 21 July 2010 (UTC)
<Typo word="Subsidiary" find="\b(S|s)u(?!bsidiar)[bd]?[is]+[abd][aeiu]*r(ies|y)\b" replace="$1ubsidiar$2"/>
Is this better? PleaseStand (talk) 19:09, 21 July 2010 (UTC)
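The third proposal above can be tried against the words mentioned in this section. This is a sketch in Python, not a test inside AWB; the pattern is PleaseStand's last version with `\1`/`\2` standing in for `$1`/`$2`, and the function name is made up for illustration.

```python
import re

# PleaseStand's third proposal, ported to Python's re syntax.
SUBSIDIARY = re.compile(
    r"\b(S|s)u(?!bsidiar)[bd]?[is]+[abd][aeiu]*r(ies|y)\b"
)


def fix_subsidiary(text):
    # Equivalent of replace="$1ubsidiar$2"
    return SUBSIDIARY.sub(r"\1ubsidiar\2", text)


for word in ["subsidiery", "subsidary", "Subsidaries", "subseries", "subsidiary"]:
    print(word, "->", fix_subsidiary(word))
```

"subseries" is rejected because the letter after "subs" is "e", which is not in `[abd]`, and the negative look-ahead keeps the correct spelling from matching itself. A database scan would still be needed to look for unrelated words that happen to fit the pattern.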
Nera
Currently, "nera" is fixed to "near". Glyka Nera is a place in Greece and AWB suggested changing "Nera" to "Near". This of course is wrong and had it been in a large article full of suggested changes, I may not have noticed it. "Nera" with a capital N should not be corrected. McLerristarr (Mclay1) (talk) 12:18, 1 August 2010 (UTC)
- The word "near" could begin a sentence, such as, "Near the opera house is the city hall." Some of these things just have to be tolerated -- not saying this is necessarily one, but just sayin'. --Auntof6 (talk) 17:00, 1 August 2010 (UTC)
- Well, we can't possibly correct all typos. Someone could type "three" instead of "there", so that will never be corrected. It's better to be safe than sorry, we shouldn't rely on machines to do everything for us – good ol' copy-editing is always best. So in the case of three/there and Near/Nera, they should be left alone for people to find when reading. Perhaps "Nera" could only be left alone if it follows "Glyka". I don't know if that's possible. McLerristarr (Mclay1) (talk) 02:36, 2 August 2010 (UTC)
- It's trivial to exclude the uppercase version, or to exclude "Glyka Nera" or similar constructions. Is the proper use of "Nera" identifiable from the typos by excluding times it's followed by Glyka? Shadowjams (talk) 03:06, 2 August 2010 (UTC)
- According to Nera, "Nera" is the name of a company, a goddess and several places, so I think it should not be corrected. McLerristarr (Mclay1) (talk) 03:27, 2 August 2010 (UTC)
- I'm fine with that. I doubt it's a common typo, and it's easily spotted by regular editing. I'll go half-way and change the rule to only correct non-capitalized versions. Someone else can remove it completely if that seems appropriate. Shadowjams (talk) 03:59, 2 August 2010 (UTC)
- Thank you Just removing capitalised Nera is what I wanted. McLerristarr (Mclay1) (talk) 07:28, 2 August 2010 (UTC)
sapces
Can somebody please add "sapce" to change to "space" and "sapces" to change to "spaces". It is an easy typo to make and currently the typo exists in seven articles. In every case, it is a typo and not a foreign word. McLerristarr (Mclay1) (talk) 07:50, 2 August 2010 (UTC)
- Done With this change. Rjwilmsi 17:49, 2 August 2010 (UTC)
Enmei vs. Emmei and Ie vs. i.e. in Japanese pages
I noticed that AWB tries to change Enmei to Emmei in places such as "Enmei ryu" (a martial arts school) and "Enmei ji" (the name of a Buddhist temple). I always leave the page at Enmei because I have seen this spelling in various places online. But I have not been able to find a definitive answer as to which is correct. Also I notice that the family name "Ie" gets picked up and changed to "i.e.". So those using AWB on Japan related pages need to take extra care before saving. Colincbn (talk) 06:26, 3 August 2010 (UTC)
- Exception added for "Enmei". Rjwilmsi 08:33, 3 August 2010 (UTC)
Compilaton
Compilaton - Compilation
There are 11 currently and I've started fixing them but it might as well go here. ϢereSpielChequers 12:49, 4 August 2010 (UTC)
- The existing "Compilation" rule already covers that one. Rjwilmsi 16:30, 4 August 2010 (UTC)
Italicise foreign words and phrases
As per WP:MOS, foreign words and phrases should be italicised. Common foreign words and phrases used in English include those in List of Latin phrases. I brought this up on Wikipedia talk:AutoWikiBrowser/Feature requests and someone suggested it would be better if the typo finder did it. McLerristarr (Mclay1) (talk) 02:39, 11 August 2010 (UTC)
- That could work although it of course has to be case by case. Non-English words are forever an issue when trying to write a new rule. Shadowjams (talk) 08:10, 11 August 2010 (UTC)
setle
Can somebody please change "setle" to "settle", "setler" to "settler", "setlers" to "settlers", "setling" to "settling" and "setled" to "settled"? McLerristarr (Mclay1) (talk) 04:54, 11 August 2010 (UTC)
- Done New rule added. Rjwilmsi 06:53, 11 August 2010 (UTC)
canvern
Can somebody please correct "canvern" to "cavern". I always make this mistake. McLerristarr (Mclay1) (talk) 02:40, 11 August 2010 (UTC)
- A search turned up no instances of "canvern" on wikipedia. You must be doing a good job of correcting yourself!--BillFlis (talk) 17:59, 13 August 2010 (UTC)
- Well, I usually edit with Safari, which has an automatic spell check so I usually notice when I make a mistake. I was thinking more for other editor's sake, but since the typo does not exist at the moment, it's probably not worth adding. McLerristarr (Mclay1) (talk) 07:28, 14 August 2010 (UTC)
i.e. and e.g.
- "i.e" should be corrected to "i.e." ("e.g" already corrects to "e.g.")
- a colon after "i.e.", "i.e", "ie", "e.g.", "e.g" or "eg" should be removed as it is completely unnecessary and yet common
- McLerristarr (Mclay1) (talk) 07:35, 11 August 2010 (UTC)
- Interesting. As to your first point, it took me a little bit to figure out why it's doing that because when I wrote the rule I did it largely to correct that problem. Whatever you're running it on that doesn't correct is a case where "i.e" is not followed by either a single quote, a space, a colon, a comma, a semi-colon, a close parenthesis mark, an ampersand (for non-breakable spaces, etc.), or a dash. Do you have an example of a page with that in the wild? It was somewhat intentional as a safety feature to not over-correct. Perhaps using \b would be sufficient, but the rule as it is now is very stable.
- As to the second, I'd invite others to comment on that. I'm not enough of a style wonk to know the right answer to that. Shadowjams (talk) 08:05, 11 August 2010 (UTC)
- Here's a proof of concept on the first point:
perl -e '$x="i.e";$x=~s/\bi(?:\.?e|e\.)([\s,:;\)&-])(?<!\.ie.)/i.e.$1/;print "$x\n"' does not correct, while perl -e '$x="i.e ";$x=~s/\bi(?:\.?e|e\.)([\s,:;\)&-])(?<!\.ie.)/i.e.$1/;print "$x\n"' does. Shadowjams (talk) 08:06, 11 August 2010 (UTC)
- Wikipedia:AutoWikiBrowser/Sandbox is what I used to test the first point. It does not correct "i.e" but it does correct "ie", "eg" and "e.g". McLerristarr (Mclay1) (talk) 08:29, 11 August 2010 (UTC)
- Right. It will correct "i.e " but not "i.e" It's rare if not non-existent in articles (i.e. supposes some text after it so it should have one of the demarcating characters; if it doesn't, it likely isn't the abbreviation). Shadowjams (talk) 08:33, 11 August 2010 (UTC)
- "i.e" could exist in a list. For example:
- List of Latin abbreviations:
- c.
- e.g.
- etc.
- i.e
- McLerristarr (Mclay1) (talk) 10:02, 11 August 2010 (UTC)
- I don't think that's at all likely.--BillFlis (talk) 11:26, 11 August 2010 (UTC)
- It's more likely than "i.e" not being related to "i.e." McLerristarr (Mclay1) (talk) 12:33, 11 August 2010 (UTC)
- I meant that it's so unlikely that it's not worth making a rule here for. An error in a far-fetched list like that is less likely than someone trying to type "Ile" or "ile" and accidentally hitting the period key for the "l".--BillFlis (talk) 13:45, 11 August 2010 (UTC)
- If that were true, it would have to be in a list as well, or at the end of a paragraph that is missing a full stop. I just thought that making "i.e" always correct to "i.e." no matter what followed it would only require deleting the code that specified something followed it. I don't know though, I have no idea how this thing works. Either way, what's happening about the second point? McLerristarr (Mclay1) (talk) 03:16, 12 August 2010 (UTC)
- If "i.e" only corrects to "i.e." if followed by a space, what if "i.e" was followed by a punctuation mark such as a comma or colon? McLerristarr (Mclay1) (talk) 12:33, 13 August 2010 (UTC)
- It will work if it's followed by a space or any of these characters (in bold): ' : , ; ) & -. My reason for writing it this way was to avoid situations where ie might be used in some different, but correct way. I don't remember what exactly prompted that, maybe I found something testing or maybe I was being overly cautious. It's also important that rules don't catch correct versions of the words, and this helps with that, although you could do it other ways too. Shadowjams (talk) 19:27, 13 August 2010 (UTC)
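The behaviour argued over in this section can be checked with the same pattern used in the Perl one-liners above. This is a sketch in Python; the pattern carries over unchanged because the look-behind `(?<!\.ie.)` has a fixed width, and the function name is made up for illustration.

```python
import re

# Pattern from the Perl proof of concept: "ie", "i.e" or "ie." followed by one
# of the listed characters, and not the tail of something like ".ie".
IE = re.compile(r"\bi(?:\.?e|e\.)([\s,:;\)&-])(?<!\.ie.)")


def fix_ie(text):
    # Equivalent of the Perl replacement i.e.$1
    return IE.sub(r"i.e.\1", text)


for sample in ["i.e", "i.e ", "ie, that", "i.e. that", "example.ie "]:
    print(repr(sample), "->", repr(fix_ie(sample)))
```

A bare "i.e" at the very end of the text is left alone because nothing follows it to satisfy the character class, which is the list scenario raised above. The last sample shows the look-behind protecting text ending in ".ie".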
Metres per seconds?
I use find (\d)(\s)?m/s, which I replace with $1 m/s. Hasn't caused me any problems so far. Headbomb {talk / contribs / physics / books} 23:33, 10 August 2010 (UTC)
- Doesn't AWB already do that internally? I see what you're doing... you're adding spaces in those conversions. If you wanted to expand that rule though you could do: "(\b\d+)\s*m(etere?s)?(/| per | a )s(econd)\b" and replace it with "$1 m/s", although that's more expensive. Shadowjams (talk) 23:46, 10 August 2010 (UTC)
- I have a few bucks here....--BillFlis (talk) 02:14, 11 August 2010 (UTC)
- I've no idea why you'd want to clutter the regex that way, but I ain't the AWB guru, so what do I know. Use whatever works, I'll be happy with it. Also this should just cover the symbols, and not the words "metres/second", the point is to add the non-breaking space in before m/s.Headbomb {talk / contribs / physics / books} 07:10, 11 August 2010 (UTC)
- Your first version's better than my convoluted second. Shadowjams (talk) 08:07, 11 August 2010 (UTC)
Any updates on this? Headbomb {talk / contribs / physics / books} 16:02, 13 August 2010 (UTC)
- Added to AWB general fixes: rev 7015 support m/s as an SI unit for non-breaking space insertion. Rjwilmsi 11:34, 18 August 2010 (UTC)
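The find-and-replace pair discussed in this thread can be tried outside AWB. The following is only a minimal sketch in Python, assuming the replacement is meant to contain a non-breaking space (U+00A0) as the thread states; the function name is made up for illustration, and Python's regex engine is not the .NET engine AWB uses.

```python
import re

NBSP = "\u00a0"  # non-breaking space; assumed to be what the replacement inserts

# Find pattern quoted in the thread: a digit, an optional whitespace character, then "m/s".
MS_RULE = re.compile(r"(\d)(\s)?m/s")


def fix_metres_per_second(text: str) -> str:
    """Put exactly one non-breaking space between the number and "m/s"."""
    return MS_RULE.sub(lambda m: m.group(1) + NBSP + "m/s", text)


if __name__ == "__main__":
    print(repr(fix_metres_per_second("a speed of 12m/s or 340 m/s")))
```

Because the pattern needs a digit directly before the optional space, text such as "rpm/s" is left alone.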
Phrases
Is it entirely a good idea to correct the phrases at the bottom of the project page? If they were part of a quote, they would not need a sic tag since they are technically not incorrect. An editor may not notice they have corrected something that should not have been corrected. McLerristarr | Mclay1 23:40, 17 August 2010 (UTC)
- When typo fixing all editors have to look out for untemplated quoted material. For such situations if there are problems {{sic}} can be used in hidden mode. Rjwilmsi 11:27, 18 August 2010 (UTC)
Fixing decent --> descent
This is a surprisingly common misspelling, in phrases like ".. he is of Asian decent ..", but obviously isn't suitable for a general typo fix. However, I think a regex to pick up anything of the form "of U(.*?)(an|ish|ic) decent" (where U represents an uppercase character) would find most of them without any false positives. My regex skills aren't up to it though - could someone more knowledgeable add this to the list? Colonies Chris (talk) 11:08, 18 August 2010 (UTC)
- I'll do a database scan for this one first, and if it goes well I'll add it as a new rule. Rjwilmsi 11:23, 18 August 2010 (UTC)
- Done New rule added (~140 matches in database scan). Rjwilmsi 14:12, 18 August 2010 (UTC)
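The thread does not quote the rule that was actually added, so the following is only a sketch of the pattern Colonies Chris describes, written in Python. The pattern, the constant name and the function name are assumptions for illustration and are not taken from the typo list.

```python
import re

# "of", a capitalised word ending in -an, -ish or -ic, then the misspelt "decent".
# This is an assumed reading of the proposal above, not the rule on the typo list.
DECENT_RULE = re.compile(r"\bof ([A-Z]\w*?(?:an|ish|ic)) decent\b")


def fix_descent(text: str) -> str:
    """Change "of <Nationality> decent" to "of <Nationality> descent"."""
    return DECENT_RULE.sub(r"of \1 descent", text)


if __name__ == "__main__":
    print(fix_descent("He is of Asian decent and she is of Irish decent."))
```

Requiring the capitalised nationality word is what keeps ordinary uses such as "a man of decent character" from matching.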
Propellor
Can somebody please add 'propellor' to change to 'propeller'. McLerristarr | Mclay1 23:38, 17 August 2010 (UTC)
- According to Merriam-Webster, "propellor" is an alternative spelling.--BillFlis (talk) 03:44, 18 August 2010 (UTC)
- Wiktionary says propeller is 'more correct'. On that basis I'd say it's fair to add it. Rjwilmsi 11:25, 18 August 2010 (UTC)
- Are we sure it's not an WP:ENGVAR issue? –xenotalk 14:15, 18 August 2010 (UTC)
- I think not an ENGVAR issue – the Concise OED doesn't identify the two variations as being so. Rjwilmsi 15:51, 18 August 2010 (UTC)
- Alright, thanks. –xenotalk 15:52, 18 August 2010 (UTC)
- So you're going to follow the guidance of a single person at wiktionary who says it's "considered more correct by most authorities" (without a reference to even a single "authority") instead of Merriam-Webster and the OED? Maybe you want to check back with that wiktionary person first.--BillFlis (talk) 01:48, 19 August 2010 (UTC)
- The full online OED lists propellor as "nonstandard". Rjwilmsi 09:44, 19 August 2010 (UTC)
- The free Oxford online dictionary says "Propeller can also be spelled propellor: both are correct, but propeller is much more common." McLerristarr | Mclay1 11:09, 19 August 2010 (UTC)
masturbatch
The "masturbate"-rule,<Typo word="Masturbate" find="\b(M|m)asterbat(\w+)\b" replace="$1asturbat$2" /> , tried to change masterbatch to masturbatch. I found "masterbatch" on five pages. Is that enough to add an exception? I'm not quite sure how to do that myself.--ospalh (talk) 11:57, 20 August 2010 (UTC)
- Fixed.--BillFlis (talk) 13:38, 20 August 2010 (UTC)
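The thread does not show how the exception was written. One common way to add such an exception is a negative lookahead, sketched here in Python; the `(?!ch)` part is an assumption for illustration and may differ from the fix that was applied.

```python
import re

# The quoted rule with an assumed exception: do not match when "ch" follows
# "masterbat", which protects "masterbatch" and "masterbatches".
MASTERBAT_RULE = re.compile(r"\b([Mm])asterbat(?!ch)(\w+)\b")


def fix_masterbat(text: str) -> str:
    """Apply the typo fix while leaving the plastics term "masterbatch" alone."""
    return MASTERBAT_RULE.sub(r"\1asturbat\2", text)


if __name__ == "__main__":
    print(fix_masterbat("A colour masterbatch is added to the polymer."))
```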
Commemorate
"<Typo word="Commemorate" find="\b(C|c)ommerat(es|ed|ing|ions?)\b" replace="$1ommemorat$2" /> ": Is "commerates" &c. really the most common misspelling? I thought things like "comemorate" (one m before e) or "comemerate" (e instead of o) would be more common. "<Typo word="Commemorate" find="\b(C|c)om{1,2}e(?:mo|me)?rat(e|es|ed|ing|ions?)\b" replace="$1ommemorat$2" />" would find all of these, but would also change "comerates" to "commemorates". "Comerates" is a bit too close to "Comrades" for my taste. So, "<Typo word="Commemorate" find="\b(C|c)om{1,2}e(?:mo|me)rat(e|es|ed|ing|ions?)\b" replace="$1ommemorat$2" />" would fix "comemorate" and "commemerate", but not "comerates". Any thoughts?--ospalh (talk) 11:52, 25 August 2010 (UTC)
- (Note to self: research before you type) Looks like a) "commerates" etc. is somewhat common, but b) there seems to be an actor called "Sheridan Comerate", so 'find="\b(C|c)om{1,2}e(?:mo|me)?rat(e|es|ed|ing|ions?)\b"' would give some false positives and 'find="\b(C|c)om{1,2}e(?:mo|me)rat(e|es|ed|ing|ions?)\b"' would miss some misspellings.--ospalh (talk) 12:01, 25 August 2010 (UTC)
- We can use a lookbehind to specifically exclude "Comerate", so what then is the best rule? Rjwilmsi 07:06, 26 August 2010 (UTC)
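As one possible answer to the question above, here is a sketch in Python that combines ospalh's broader pattern with a trailing lookbehind that rejects the surname "Comerate". The placement of the lookbehind and the function name are assumptions, and the rule has not been checked against a database scan.

```python
import re

# Broad pattern from the thread (optional "mo"/"me"), plus an assumed
# negative lookbehind so that the exact word "Comerate" is never changed.
COMMEMORATE_RULE = re.compile(
    r"\b([Cc])om{1,2}e(?:mo|me)?rat(e|es|ed|ing|ions?)\b(?<!Comerate)"
)


def fix_commemorate(text: str) -> str:
    """Normalise the misspellings discussed above to "commemorat...". """
    return COMMEMORATE_RULE.sub(r"\1ommemorat\2", text)


if __name__ == "__main__":
    for word in ("commerates", "comemorate", "commemerate", "Sheridan Comerate"):
        print(word, "->", fix_commemorate(word))
```

The lookbehind is case-sensitive and only protects the capitalised singular form, so a lowercase "comerate" would still be changed.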
Double superlatives
<Typo word="Most -liest" find="\b[Mm]ost\s+(\w+)liest\b" replace="$1liest" />
Is this worth it? The most common matches seem to be "most earliest", "most holiest", and "most costliest" (not necessarily in that order). PleaseStand (talk) 19:15, 25 August 2010 (UTC)
- Is there a point to deleting it? Unless there is an exception to the rule, I don't see a reason not to include something. McLerristarr / Mclay1 13:50, 26 August 2010 (UTC)
- As far as I know, the typo rule does not exist yet. My question is whether it is worth adding. PleaseStand (talk) 17:41, 26 August 2010 (UTC)
- I don't know how common it is, but I think that kind of fix is legitimate subject matter for the typo fixes. Shadowjams (talk) 19:47, 26 August 2010 (UTC)
- Ah, I see. I thought you meant is it worth keeping, as in you wanted to delete it. My mistake. One of the many problems of communicating by text. McLerristarr / Mclay1 09:28, 27 August 2010 (UTC)
- It's not always going to work as intended: When "Most" is capitalized, the adjective after correction will not be (will remain as it was). I would leave out the "M"; the error will probably be preceded by "the" anyway.--BillFlis (talk) 13:10, 27 August 2010 (UTC)
- You could remove the "l" and catch things like "most greediest" too. Shadowjams (talk) 00:35, 28 August 2010 (UTC)
<Typo word="Most -liest" find="\b[Mm]ost\s+(\w)(\w*)iest\b" replace="$1iest" />
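Both proposed rules can be tried on sample text. This is a sketch in Python; for the second rule it assumes the intended replacement was `$1$2iest`, because `$1iest` as posted would keep only the first letter of the adjective (turning "most greediest" into "giest"). The function names are made up for illustration.

```python
import re

# First proposal, as posted: only adjectives ending in "-liest".
LIEST_RULE = re.compile(r"\b[Mm]ost\s+(\w+)liest\b")

# Second proposal's find pattern, with the "l" dropped as Shadowjams suggested.
IEST_RULE = re.compile(r"\b[Mm]ost\s+(\w)(\w*)iest\b")


def fix_liest(text: str) -> str:
    """Apply the first rule with its posted replacement, "$1liest"."""
    return LIEST_RULE.sub(r"\1liest", text)


def fix_iest(text: str) -> str:
    """Apply the second rule, assuming "$1$2iest" was the intended replacement."""
    return IEST_RULE.sub(r"\1\2iest", text)


if __name__ == "__main__":
    print(fix_liest("the most earliest record"))
    print(fix_iest("the most greediest player"))
    print(fix_liest("Most earliest records are lost."))
```

The last call shows BillFlis's caveat: when "Most" starts a sentence, the corrected adjective stays lower case.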
Melbourne
Can somebody please correct 'Melbounre' to 'Melbourne'? McLerristarr / Mclay1 06:31, 24 August 2010 (UTC)
- I'll let others opine on if there's some risk of a false positive, but this should do it: <find="\b(M|m)elbo(rn|unr)e\b" replace="$elbourne" />. It should catch "Melborne" and "Melbounre" and will capitalize any lower case versions. Shadowjams (talk) 07:01, 24 August 2010 (UTC)
- Not safe to correct missing 'u' due to Melborne Camp and Melborne surname. Rjwilmsi 12:41, 28 August 2010 (UTC)
- Done Fixed version done here. Shadowjams (talk) 20:12, 28 August 2010 (UTC)
Phenonema --> Phenomena
Could someone please add the plural form of Phenomenon? It should be "Phenomena" but a very common misspelling is "Phenonema", with only two letters, the n and the m, switched around, making it very hard to spot. There's also a fairly large amount of search results in Wikipedia for this misspelling. I checked the current entry for "Phenomenon" in the list, and I do believe it does not take into account this particular misspelling of the plural form. -- Ϫ 23:14, 29 August 2010 (UTC)
- Done BillFlis updated the rules. Rjwilmsi 08:05, 30 August 2010 (UTC)
Do we want to hide italics from typo fixing?
For a feature request I added the capability for AWB to hide text in italics as part of its HideMore()
function ('Ignore templates, refs, link targets...'). Do we want hiding of italics on or off for typos? We already hide untemplated quotes (text between " and related curly quotes). Rjwilmsi 09:01, 30 August 2010 (UTC)
- Sometimes we use italics to emphasise a word or a sentence. Italics are used for many reasons. Typo fixing should apply inside italics exactly the same way it applies outside them. -- Magioladitis (talk) 09:03, 30 August 2010 (UTC)
- Was the original concern over foreign and proper terms (like book/movie titles) or is there something else I'm not thinking of? Shadowjams (talk) 18:23, 30 August 2010 (UTC)
- Italics hiding was added for a feature request. We now have the option to apply it for typo fixing or not. Rjwilmsi 08:11, 31 August 2010 (UTC)
- I see. I tend to agree with Magioladitis on this point, there're a lot of these that fit within typo territory, but perhaps it cuts down on false positives. Just something to be aware of, it's obviously not an ideological issue. Shadowjams (talk) 08:51, 31 August 2010 (UTC)
Catepillar → Caterpillar
Could someone please update the entry for Caterpillar to also fix the incorrect "Catepillar" (missing the first "r")? GoingBatty (talk) 03:55, 1 September 2010 (UTC)
Apostrophe fix contested
I changed series's to series' using AWB. It was subsequently reverted[41]. Does the rule need to be removed or edited? -- JHunterJ (talk) 12:13, 5 September 2010 (UTC)
- I think the rule, and your fix, is correct, since the phrase is going to be pronounced "the seeriz antagonist", not "the seeriziz antagonist". The advice at Apostrophe#Singular nouns ending with an “s” or “z” sound is not at all clear, though. -- John of Reading (talk) 13:24, 5 September 2010 (UTC)
- The guideline is laid out here: Wikipedia:APOSTROPHE#Possessives. If you pronounce "series'[s] antagonist" as "sireez antagonist", then Wikipedia says not to use the additional s. On the other hand, it says if there are two possible pronunciations, you can use either. I definitely pronounce the phrase "series's antagonist" as "sireeziz antagonist". — the Man in Question (in question) 17:07, 5 September 2010 (UTC)
- If that's the guideline then the rule should be removed. It was added by Mboverload (talk · contribs) on 4th August 2008 apparently without any discussion on this talk page. I've pinged that user's talk page. -- John of Reading (talk) 21:01, 5 September 2010 (UTC)
- I've removed the rule. Per the guidelines on apostrophes, both versions are potentially correct, as long as usage is consistent (with the 's, without the 's, or with the 's if pronounced as iz) on a given article. -- JHunterJ (talk) 11:29, 6 September 2010 (UTC)
specail -> special
Manually fixed one here. Regards, SunCreator (talk) 18:33, 5 September 2010 (UTC)
Besancon
I think there is a false positive here: AWB changes Besancon -> Besançon, but there is a place in France called Besançon and one in New Haven, Indiana called Besancon. Regards, SunCreator (talk) 21:44, 5 September 2010 (UTC)
Retropective → Retrospectiv
This edit changed Retropective → Retrospectiv instead of Retrospective. I've manually fixed this article, but could someone please update the rule? Thanks! GoingBatty (talk) 05:17, 12 September 2010 (UTC)
- Fixed.--BillFlis (talk) 06:20, 12 September 2010 (UTC)
- Thanks BillFlis - I didn't find the rule under the "R" section - should have looked under the new additions section too. GoingBatty (talk) 06:38, 12 September 2010 (UTC)
heavily, 2nd try
AWB tried to replace "heaively" with "heaively", but it should've been "heavily". Please fix. --bender235 (talk) 20:22, 3 July 2010 (UTC) (—bender235 (talk) 00:50, 13 September 2010 (UTC))
- I can't find the rule that would make such a change, and I can't find any instances of "heaively" (or "heaivly", which seems more likely) in wikipedia. It looks like it's no longer a problem.--BillFlis (talk) 11:19, 13 September 2010 (UTC)
- Either bender's original post has a typo, or it's replacing "heaively" with itself, which I too can't find a rule that would do. Perhaps you meant it was replacing "heavily" with "heaiviley", which would make sense given this rule: <Typo word="-ively" find="\b(\w+)ivly\b" replace="$1ively" />. Before changing that, beware that "ively" is an equally, if not more, common version of that ending. Anyone have ideas about how to distinguish which ending is right based on the base? Shadowjams (talk) 17:40, 13 September 2010 (UTC)
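The behaviour of the "-ively" rule quoted above can be checked directly. This sketch in Python applies the rule exactly as quoted (only the function name is made up) and shows one way the reported output could arise: the misspelling "heaivly" ends in "ivly", so the rule turns it into "heaively" rather than "heavily".

```python
import re

# The "-ively" rule as quoted in the thread: find="\b(\w+)ivly\b", replace="$1ively".
IVELY_RULE = re.compile(r"\b(\w+)ivly\b")


def apply_ively_rule(text: str) -> str:
    """Insert the missing "e" in words ending in "ivly"."""
    return IVELY_RULE.sub(r"\1ively", text)


if __name__ == "__main__":
    for word in ("positivly", "heaivly", "heavily"):
        print(word, "->", apply_ively_rule(word))
```

The correctly spelt "heavily" is not touched, since it does not end in "ivly".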
Alternation vs. character classes
Hall with Schwartz calls using alternation (A|a) instead of character class [Aa] a "classic mistake" in Effective Perl Programming, and that it takes a speed penalty, perhaps on the order of 4x. Maybe the processing here has gotten smarter since then, and it does save characters when capturing, (A|a) instead of ([Aa]), but we may still want to change it back. -- JHunterJ (talk) 19:25, 13 September 2010 (UTC)
- I'll investigate what difference, if any, there is for AWB/C#. Rjwilmsi 20:31, 13 September 2010 (UTC)
- ISBN 0596528124 page 237 has a benchmark for .NET that lists character classes as being 4.7x faster. I don't know how old that is... but worth considering. There are probably other optimizations like this as well. Shadowjams (talk) 00:40, 14 September 2010 (UTC)
- VB.NET, we use C#: I profiled 1000 replace operations for "\b(R|r)ec(?:ie|ei?)pient(s?)\b" and "\b([Rr])ec(?:ie|ei?)pient(s?)\b" (details on request) and the numbers were 13463 and 12860 ms respectively i.e. around a 5% difference only. So I conclude there's not much difference for C#. We cannot take a 4x or 5x difference in another language and assume it applies for ours. Rjwilmsi 20:54, 14 September 2010 (UTC)
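A comparison of the same kind can be reproduced in other environments. The sketch below uses Python's `re` and `timeit` with the two patterns quoted above; the sample text and repetition counts are arbitrary assumptions, and since Python's engine is not the .NET engine, any timings it prints say nothing about AWB itself.

```python
import re
import timeit

# The two patterns profiled in the thread.
ALTERNATION = re.compile(r"\b(R|r)ec(?:ie|ei?)pient(s?)\b")
CHAR_CLASS = re.compile(r"\b([Rr])ec(?:ie|ei?)pient(s?)\b")
REPLACEMENT = r"\1ecipient\2"

# Arbitrary sample text containing two misspellings and one correct spelling.
SAMPLE = "The reciepient thanked the Recepients and the recipient. " * 200


def run(pattern: re.Pattern) -> str:
    """Apply one pattern to the sample text and return the corrected text."""
    return pattern.sub(REPLACEMENT, SAMPLE)


if __name__ == "__main__":
    for name, pattern in (("(R|r)", ALTERNATION), ("([Rr])", CHAR_CLASS)):
        seconds = timeit.timeit(lambda: run(pattern), number=100)
        print(f"{name}: {seconds:.3f} s for 100 runs")
```

Both patterns produce identical output, so the only question a benchmark like this can answer is speed on the engine actually in use.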
Opiod --> Opioid
Very common misspelling, hard to spot. Please add, thanks. -- Ϫ 07:16, 14 September 2010 (UTC)
- Wow that is common. Added a rule here. I looked around in a few dictionaries thinking it might be an alternative spelling just based on how common it is, but I couldn't find anything. Done Shadowjams (talk) 15:28, 14 September 2010 (UTC)
Sargent's cypress
I had typo fixing switched on. It made this error. It is a false positive for Sargent's cypress or Sargent cypress Regards Lightmouse (talk) 09:56, 14 September 2010 (UTC)
- Not done Only an error as the article incorrectly had the word in lower case. Rjwilmsi 21:06, 14 September 2010 (UTC)
Thanks for investigating it. Lightmouse (talk) 21:47, 14 September 2010 (UTC)
km/kg corrections OK, but summary incorrect
This edit correctly changed "67 Kg" and "800 Km" to "67 kg" and "800 km". However, the edit summary reads (Typo fixing, typos fixed: 7 Kg → 7 kg (2) using AWB).
Anyone want to try updating the rule to make the edit summary better? Thanks! GoingBatty (talk) 04:49, 14 September 2010 (UTC)
- One could make the summary more accurate by putting a quantifier (+ in this case) on the \d in the rule, but that would increase the time (infinitesimally, albeit) the regex runs across every page scanned. It probably doesn't matter either way; if you want to put it in there that's how one would do it. Shadowjams (talk) 05:48, 14 September 2010 (UTC)
- Actually, on second look, that's not a Typo rule, that's a built-in program rule. I'm guessing that internal rule uses regex too though, so the same applies. Shadowjams (talk) 05:51, 14 September 2010 (UTC)
- Typo rule is for Kg to kg (case conversion). Rjwilmsi 07:22, 14 September 2010 (UTC)
- I see now. Shadowjams (talk) 16:45, 14 September 2010 (UTC)
- So should I move this from this talk page to a bug report? GoingBatty (talk) 16:34, 14 September 2010 (UTC)
- No, it is a typo issue. My second point was wrong (Rjwilmsi was correcting me). I was confused because I was looking for a rule that would add &nbsp; to the output, and there isn't a rule that did that (that part is internal). However, there is a rule that did the capitalization, and updating that would fix the OP's issue. It's this one: <Typo word="kg/km (kilogram/kilometer)" find="(\d(?:\s|&nbsp;|-)?)K(g|m)\b" replace="$1k$2" />.
- Change it to <Typo word="kg/km (kilogram/kilometer)" find="(\d+(?:\s|&nbsp;|-)?)K(g|m)\b" replace="$1k$2" /> and you've fixed the issue (see above for speed considerations). Shadowjams (talk) 16:45, 14 September 2010 (UTC)
- All of the rules have been updated with the +. Now I see in this edit that AWB accurately changed "16KHZ" → "16 kHz", but the edit summary says: (Typo fixing, typos fixed: 16KHZ → 16kHz using AWB) (without the space) GoingBatty (talk) 03:27, 17 September 2010 (UTC)
- Also this edit changed "710 KHz" and "970 KHz" to "710 kHz" and "970 kHz", but the edit summary is (Typo fixing, typos fixed: 710 KHz → 710 kHz (2) using AWB) GoingBatty (talk) 03:53, 17 September 2010 (UTC)
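The effect of the added "+" on the edit summary can be seen by comparing what each version of the pattern actually matches. This is a sketch in Python that assumes the summary is built from the matched text and its replacement, which is what the thread suggests; the separator alternatives are simplified to `\s` and `-`, and the function name is made up.

```python
import re

# Simplified versions of the kg/km rule before and after the "+" was added.
SINGLE_DIGIT = re.compile(r"(\d(?:\s|-)?)K(g|m)\b")
ALL_DIGITS = re.compile(r"(\d+(?:\s|-)?)K(g|m)\b")
REPLACEMENT = r"\1k\2"


def summary_entry(rule: re.Pattern, text: str):
    """Return "<matched text> → <replacement>" for the first match, or None."""
    match = rule.search(text)
    if match is None:
        return None
    return match.group(0) + " → " + match.expand(REPLACEMENT)


if __name__ == "__main__":
    print(summary_entry(SINGLE_DIGIT, "It weighs 67 Kg."))
    print(summary_entry(ALL_DIGITS, "It weighs 67 Kg."))
```

With a single `\d` the match starts at the last digit, which is why the summary showed "7 Kg" even though the article text read "67 Kg".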
Supress --> Suppress
Another very common misspelling (over 2000 search results!) Including supressed/supressing/supression and whatever other prefixes there are. I'm surprised this one wasn't in there already..
Actually I did find "(Immuno)Suppress" in the list, but that doesn't seem correct.. it's already got the double-p, so maybe that's just a mistake? or what, but I don't know if maybe the (Immuno) part is affecting the detection somehow too.
Opress --> Oppress is another one we could add, that one is a bit less common but still coming up in search results. Except that the search results come up with the false positive "of-press" for some reason, which is slightly annoying, but I don't think that would affect AWB's typo detection anyway. -- Ϫ 22:50, 15 September 2010 (UTC)
- The existing "(Immuno)Suppress" rule already covers all of the suppress variations you've listed. Rule expanded for oppress too. Rjwilmsi 09:32, 16 September 2010 (UTC)
- Oh! okay. These regexes still confuse me. :) But, is it normal for there to still be so many existing misspellings? I thought that once a typo gets added to the list they usually all get fixed pretty quickly.. Is it just that no one has patrolled these articles yet with AWB? -- Ϫ 17:05, 16 September 2010 (UTC)
- The WP:TYPOSCAN project should go through these regularly but it's waiting for new data at the moment. Rjwilmsi 17:10, 16 September 2010 (UTC)
achitecture → architecture
Could someone please update the existing entry for "architecture" so it also catches "achitecture"? Thanks! GoingBatty (talk) 01:53, 17 September 2010 (UTC)
- I modified the rule for "Architect" to catch this.--BillFlis (talk) 08:55, 18 September 2010 (UTC)
Inconsistent use of formats such as '(C|c)' and '[Cc]'. Propose change all to '[Cc]'
The list is inconsistent in whether the regex uses '(C|c)' or '[Cc]'. I propose changing them all to the format '[Cc]'. It's trivial but using the same format makes it slightly easier to notice the real differences. Any objections? Lightmouse (talk) 15:15, 17 September 2010 (UTC)
- They are not equivalent. "(C|c)" is equivalent to "([Cc])". Also, I know there was some discussion about speed, but a more important consideration might be space. This page is already huge, and changing every instance of this would add another character to each of the affected rules, which is the large majority of them.--BillFlis (talk) 18:54, 17 September 2010 (UTC)
You're quite right, the pairings are '(C|c)' with '([Cc])', or '(?:C|c)' with '[Cc]'. I agree that compact code is a good thing. I'll leave it to you. Incidentally, I'm sure there are more units of measure that would be useful, also I only see one square unit of length and there could be cubes too. Lightmouse (talk) 20:23, 17 September 2010 (UTC)
- Bill sums up the issue exactly. I can see positives to both. In some ways I think ([Cc]) is conceptually clearer, but that's a personal preference. I made the changes to all of the New additions thinking the speed tradeoff was more important than later testing demonstrated. There is 1 character difference between the two; I don't see any reason to prefer one over the other. I think it's best to leave them as they're originally created, with whatever idiom the creator chooses. Shadowjams (talk) 21:58, 17 September 2010 (UTC)
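The pairing described above comes down to whether the construct creates a capture group that `$1` can refer to. A short sketch in Python makes the difference visible by counting groups; the helper name and the "cat" and "Catepillar" samples are made up for illustration.

```python
import re


def group_count(pattern: str) -> int:
    """Return how many capture groups a pattern defines."""
    return re.compile(pattern).groups


if __name__ == "__main__":
    for pattern in (r"(C|c)at", r"([Cc])at", r"(?:C|c)at", r"[Cc]at"):
        print(pattern, "->", group_count(pattern), "capture group(s)")

    # Both capturing forms support a backreference in the replacement.
    print(re.sub(r"\b(C|c)atepillar", r"\1aterpillar", "Catepillar"))
    print(re.sub(r"\b([Cc])atepillar", r"\1aterpillar", "catepillar"))
```

So '(C|c)' and '([Cc])' are interchangeable in a rule that uses `$1`, while a bare '[Cc]' is not.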
Units of measure
There is km². Would it also be possible to do km³, m², m³, ft², ft³ ? Lightmouse (talk) 08:50, 18 September 2010 (UTC)
etc... → etc.
Could the Etc. rule be changed so that it would also remove extra periods? (e.g. change "etc..." → to "etc.") Thanks! GoingBatty (talk) 02:44, 17 September 2010 (UTC)
- I think this should do it. Shadowjams (talk) 03:20, 17 September 2010 (UTC) Done
- I think you're on the right track. According to the AWB Regex Tester, that will fix "ect...." (which is great), but not "etc....." GoingBatty (talk)
- Ah. That makes sense. Ok, one more try.... Shadowjams (talk) 04:44, 17 September 2010 (UTC)
- See if that did it. Shadowjams (talk) 04:46, 17 September 2010 (UTC)
- Sorry - tried the AWB Regex Tester, and it still doesn't fix "etc...." or "etc" (with no periods) GoingBatty (talk) 16:23, 17 September 2010 (UTC)
- I took another look. What it's doing is it's looking for anything with an "Etc" followed by something that's not either a period or a word character (0-9,a-z). In the case of "etc....." it's skipping it because there's already a period, and not looking at the rest. This is intentional for two reasons. One, it terminates the search early on correct matches (which are the majority) and saves processing time, and second, it allows for unanticipated but correct uses, like an ellipsis. It not fixing "etc" is related... because there's nothing following the c, it doesn't catch. However, in a real article etc won't be alone. It will be followed by something: "etc more words". This sometimes comes up in testing. We try to design rules so they don't catch on correct spellings (even if they correct them back to themselves) because I assume they take more processing (they run entirely, as opposed to stopping midway through). Maybe that's unnecessary, but most of the rules adhere to that format. Shadowjams (talk) 22:10, 17 September 2010 (UTC)
- I appreciate your reply. I made this request because I thought that "etc." plus an ellipsis was not a correct use. Why would an ellipsis be necessary? Thanks! GoingBatty (talk) 15:26, 19 September 2010 (UTC)
- That's a good point. I tended towards the cautious with some of these when I started, and I added the etc. rule that's currently in use (although there was a simpler one earlier) earlier on. I think the change you're talking about would be fine. Shadowjams (talk) 05:12, 20 September 2010 (UTC)
- Thanks Shadowjams. I was playing around with how to edit the rule to fix "etc....", but couldn't get it to skip "etc." Could you please help me with this? Thanks! GoingBatty (talk) 17:07, 20 September 2010 (UTC)
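One possible shape for the rule requested above is to require at least two periods, so that a correct "etc." never matches. This is only a sketch in Python, separate from the existing Etc. rule (which is not quoted in this thread); the pattern and function name are assumptions.

```python
import re

# "etc" followed by two or more periods; a single period does not match.
ETC_EXTRA_DOTS = re.compile(r"\b([Ee])tc\.{2,}")


def collapse_etc_dots(text: str) -> str:
    """Reduce "etc..", "etc..." and longer runs to "etc."."""
    return ETC_EXTRA_DOTS.sub(r"\1tc.", text)


if __name__ == "__main__":
    print(collapse_etc_dots("apples, pears, etc...."))
    print(collapse_etc_dots("apples, pears, etc."))
```

Because correct text is skipped rather than rewritten to itself, this keeps the convention mentioned earlier in the thread of not matching on correct spellings.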
Should regex be using an escape character.
I notice that square kilometre contains:
[-.\s]
Should it be:
[-\.\s]
Regards Lightmouse (talk) 16:46, 19 September 2010 (UTC)
- I don't think you need to escape characters inside character classes (says as much). Shadowjams (talk) 21:01, 19 September 2010 (UTC)
- There's another problem with that though. The - needs to be at the end of the class, otherwise it's looking for a range. I'm not sure what it does in that case, but it might explain any strange effects you're seeing. Shadowjams (talk) 21:02, 19 September 2010 (UTC)
- No, a hyphen immediately after a "[" counts as a literal hyphen. [42] -- John of Reading (talk) 06:13, 20 September 2010 (UTC)
- Interesting. That's actually a little new... it doesn't work with grep for instance. Perl calls this version 8 regex (I think). Apparently - at either the beginning or end is fine, but in the middle, of course, it's ambiguous. Shadowjams (talk) 06:17, 20 September 2010 (UTC)
Aha - "the dot is not a metacharacter inside a character class, so we do not need to escape it with a backslash.". Very interesting, thanks. Lightmouse (talk) 17:15, 20 September 2010 (UTC)
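The two spellings of the character class can be compared directly. This sketch uses Python's `re`, which treats a leading hyphen and an unescaped dot inside a class as literals in the same way the thread describes; the sample characters and function name are arbitrary.

```python
import re

UNESCAPED = re.compile(r"[-.\s]")   # as found in the square kilometre rule
ESCAPED = re.compile(r"[-\.\s]")    # the proposed alternative

SAMPLES = ["-", ".", " ", "\t", "a", "5", "\\"]


def same_behaviour() -> bool:
    """Check that both classes accept and reject exactly the same samples."""
    return all(
        bool(UNESCAPED.fullmatch(char)) == bool(ESCAPED.fullmatch(char))
        for char in SAMPLES
    )


if __name__ == "__main__":
    print(same_behaviour())
```

The dot does not act as a wildcard inside the class: the letter "a" is rejected by both versions.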
Not fixing "hungarian" ?
Although there's an existing rule for "Hungary" that includes "Hungarian", it doesn't want to fix "hungarian" and "hungarians" in Culture of Hungary. When I tried the rule in the AWB Regex tester, it seems to work fine. Any ideas? GoingBatty (talk) 04:22, 20 September 2010 (UTC)
- Typo fixing rules are not applied when a wikilink target also matches on the typo rule in order to avoid false positives on uncommon names etc. In this case there's an image linked in the article with a lowercase 'hungarian' in the file name, hence the typo fix is not applied. From looking at the Commons:File Renaming page it would appear that asking for the file to be renamed might be refused. I've now applied the typo fixing to the article. Feel free to try to get the image renamed. Rjwilmsi 16:29, 23 September 2010 (UTC)
- Thanks for the explanation - having an example makes it more clear than the manual, but I'll try to be more diligent about reading the manual first. GoingBatty (talk) 01:53, 24 September 2010 (UTC)
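The safeguard Rjwilmsi describes can be pictured roughly as follows. This is a simplified illustration in Python, not AWB's implementation: the link-target pattern, the lower-case "hungarian" rule and the function name are all assumptions made for the example.

```python
import re

# Text between "[[" and the first "|" or "]" is treated as the link target.
WIKILINK_TARGET = re.compile(r"\[\[([^\]|]+)")

# Hypothetical stand-in for the relevant part of the "Hungary" rule.
HUNGARIAN_RULE = re.compile(r"\bhungarian(s?)\b")


def apply_unless_in_link_target(rule: re.Pattern, replacement: str, wikitext: str) -> str:
    """Skip the rule for the whole page if any wikilink target matches it."""
    for target in WIKILINK_TARGET.findall(wikitext):
        if rule.search(target):
            return wikitext
    return rule.sub(replacement, wikitext)


if __name__ == "__main__":
    with_file = "[[File:Old hungarian map.jpg|thumb]] The hungarian kings ruled."
    without_file = "The hungarian kings ruled [[Hungary]]."
    print(apply_unless_in_link_target(HUNGARIAN_RULE, r"Hungarian\1", with_file))
    print(apply_unless_in_link_target(HUNGARIAN_RULE, r"Hungarian\1", without_file))
```

In the first sample the lower-case file name blocks the fix for the whole page, which matches the behaviour seen on Culture of Hungary.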
criticized
AWB replaced "critiziced" with "criticiziced" here, but it should have been "criticized". Please fix. —bender235 (talk) 14:07, 23 September 2010 (UTC)
- I limited the rule for "Critical", which was evidently making this change, to not make this particular change. We'll need a new rule to correct "critiziced" to "criticized", which I was surprised to find has more than a dozen occurrences on wikipedia.--BillFlis (talk) 16:22, 23 September 2010 (UTC)