User:The Transhumanist/Regexes
Below are examples of regular expressions successfully used in AWB to search/replace.
Regexes used in country outlines project
- Multi-line find and replace involving addition of text after an ordinal numeral (first, second, etc.), done using these strings of RegEx in AWB:
find="(th|nd|rd|st)(\]\])(\r\n?|\n)(\* \[\[Area of \]\]:)" replace="$1 most populous country$2$3$4"
find="(th|nd|rd|st)(\]\])(\r\n?|\n)(\* \[\[:commons:Atlas of )" replace="$1 largest country$2$3$4"
- (See User talk:Robert Skyhawk/Country Outline task list#Regex tasks for full details)
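As a rough illustration of what the first rule does, the sketch below runs the same pattern through JavaScript's `String.prototype.replace`, which accepts the same `$1` to `$4` group references. The sample text and the function name `addPopulationRank` are made up for the demonstration; the actual outline text is not shown here, and AWB uses the .NET regex engine, so this only approximates its behaviour.

```javascript
// Same pattern as the first AWB rule above, written as a JavaScript regex literal.
// Groups: $1 ordinal suffix, $2 closing "]]", $3 line break, $4 start of the next list item.
const rankRule = /(th|nd|rd|st)(\]\])(\r\n?|\n)(\* \[\[Area of \]\]:)/g;

// Hypothetical helper name; inserts the phrase between the suffix and the "]]".
function addPopulationRank(text) {
  return text.replace(rankRule, "$1 most populous country$2$3$4");
}

// Made-up sample text, not copied from a real outline.
const sample =
  "* Population: [[List of countries by population|37th]]\n" +
  "* [[Area of ]]: 100 km2";

console.log(addPopulationRank(sample));
```

Because the fourth group requires the next line to begin with the Area entry, an ordinal followed by any other line is left alone. The second rule works the same way with a different following line and phrase.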
Grabbing data off another page and inserting the text into country outlines
OK, here is the way I do it. There isn't really any code to share; it was a one-off.
1. Use wget to retrieve the data.
2. Use a Perl script to create a set of AWB rules (regexes encapsulated in XML).
3. Insert a suitable tag into each page using the %%title%% feature of AWB or {{subst:PAGENAME}}.
4. Run AWB against the pages.
Note: steps 3 and 4 can be done in one pass, although I took two passes.
Rich Farmbrough, 21:52 22 February 2009 (UTC).
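Since the original script was not shared, here is only a minimal sketch of the idea behind steps 2 to 4, in JavaScript rather than Perl. Everything in it is an assumption for illustration: the `<!--DATA:title-->` marker, the function names, and the sample data. It builds plain find/replace pairs and does not attempt to write AWB's XML settings format, which is not shown here.

```javascript
// Escape regex metacharacters so a page title is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one find/replace rule per page from the retrieved data.
// The "<!--DATA:title-->" marker format is a hypothetical stand-in for the tag in step 3.
function buildRules(data) {
  return Object.entries(data).map(([title, value]) => ({
    find: "<!--DATA:" + escapeRegex(title) + "-->",
    // "$" is special in replacement strings, so double each one to keep it literal.
    replace: value.replace(/\$/g, "$$$$"),
  }));
}

// Stand-in for step 4: apply every rule to a page's text.
function applyRules(text, rules) {
  return rules.reduce(
    (t, rule) => t.replace(new RegExp(rule.find, "g"), rule.replace),
    text
  );
}

// Made-up data; in the workflow above this would come from the wget download.
const rules = buildRules({ "France": "Capital: Paris" });
console.log(applyRules("* <!--DATA:France-->", rules));
```

Keying each rule on the page title is what lets a single rule set run against every page: only the rule whose marker matches the tag on that page has any effect.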
Regex question
(I have the regex gadget installed above the edit window).
Below is a watchlist for use with Related changes. How would I use regex to add the corresponding talk page to the end of every entry on the list?
Wikipedia:WikiProject Outline of knowledge/Watchlist using Related changes
I look forward to your reply.
The Transhumanist 21:24, 16 June 2009 (UTC)
Not sure how that works exactly but you'd want to do something that has the effect of this, where txt is the content of the edit-window:
txt = txt.replace(/\n\*\s*\[\[([^\]]+)\]\]/g, "\n*[[$1]] ([[Talk:$1|talk]])");
// or better yet, if you want a bunch of other links, use a template
txt = txt.replace(/\n\*\s*\[\[([^\]]+)\]\]/g, "\n*{{article|$1}}");
That would work for the article pages anyway. The other stuff would be more complicated. — CharlotteWebb 21:38, 16 June 2009 (UTC)
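One thing to watch with the pattern above: it begins with `\n`, so an entry on the very first line of the text has no preceding line break and is not matched. A minimal sketch of a variant that anchors on line starts with `^` and the `m` flag instead; the function name `addTalkLinks` and the sample list are hypothetical, and like the original it only suits article pages with unpiped links.

```javascript
// Variant of the replacement above that also catches an entry on the first line.
function addTalkLinks(txt) {
  // ^ with the m flag matches at the start of every line.
  // [ \t]* is used instead of \s* so the match cannot run across a line break.
  return txt.replace(/^\*[ \t]*\[\[([^\]]+)\]\]/gm, "*[[$1]] ([[Talk:$1|talk]])");
}

// Made-up watchlist entries for illustration.
const list = "* [[Outline of France]]\n*[[Outline of Japan]]";
console.log(addTalkLinks(list));
```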
- Hold on a sec... I think that I can do this. I've done it with watchlists before, thanks to the handy {{swl}} template. –Drilnoth (T • C • L) 22:05, 16 June 2009 (UTC)
- It looks like it's now done, but if you wanted to do it with the Regex-tab script, you could have replaced
\* \[\[([^\]\[]*)\]\]
- with
* {{swl|$1}}
- which is effectively what Drilnoth did on the page in question. The '\[' means a literal bracket character, the '\*' is a literal asterisk, the '[^\]\[]' means any character other than brackets, and the parentheses save the match so it can be referenced later as '$1'. I hope this makes sense. Plastikspork (talk) 00:04, 17 June 2009 (UTC)
- Thank you. That helps a lot! The Transhumanist 19:08, 18 June 2009 (UTC)
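For anyone wanting to try the substitution described above outside the Regex-tab script, here is a small sketch using JavaScript's `replace`. The function name `toSwl` and the sample entries are invented for the example; the pattern and replacement are the ones quoted in the thread.

```javascript
// Wrap each "* [[Page]]" entry in the {{swl}} template.
function toSwl(text) {
  // \* and \[ are literal characters; ([^\]\[]*) captures the page name as $1.
  return text.replace(/\* \[\[([^\]\[]*)\]\]/g, "* {{swl|$1}}");
}

// Made-up watchlist entries for illustration.
const entries = "* [[Outline of chess]]\n* [[Portal:Contents]]";
console.log(toSwl(entries));
```

Note that the pattern expects exactly one space between the asterisk and the link, so an entry written as `*[[Page]]` is left unchanged.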