User:PerfektesChaos/js/stringLib
JavaScript function library of utilities to analyze and manipulate strings.
Usage
Import
- Include the following lines in your common.js:
mw.loader.load("https://wiki.riteme.site/w/index.php?title=User:PerfektesChaos/js/stringLib/r.js&action=raw&bcache=&maxage=604800&ctype=text/javascript",
"text/javascript");
- This also works with non-WMF sites that run MediaWiki.
- In fact, it can be used anywhere, since it does not depend on anything else. It even works outside browsers. Please respect the license statements: CC-BY-SA and GNU FDL.
Activation
- In MediaWiki context the library establishes itself as the PerfektesChaos_stringLib component of the mw.libs collection.
- Otherwise, within an interactive environment, the library is built on the fly under the window.mw.libs component.
- Under non-interactive circumstances, mw.libs is put into the global object without a var declaration.
After loading, you are supposed to integrate the library into your own application object by calling something like
yourAppObj.str = mw.libs.PerfektesChaos_stringLib
Within yourAppObj the functions can now be referred to as yourAppObj.str.fff()
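The mapping step might be wrapped as in the following sketch. `yourAppObj` is the placeholder name used in the text; the helper `attachStringLib` and its `root` parameter are hypothetical and exist only so the example can be tried outside a wiki page.

```javascript
// Hypothetical helper, not part of the library: copy the library reference
// into an application object. `root` stands for the global object that
// carries `mw` (an assumption made to keep the sketch self-contained).
function attachStringLib(yourAppObj, root) {
    var libs = root && root.mw && root.mw.libs;
    if (!libs || !libs.PerfektesChaos_stringLib) {
        return false;                       // library not loaded (yet)
    }
    yourAppObj.str = libs.PerfektesChaos_stringLib;
    return true;
}
```

In a browser this would presumably be called as `attachStringLib(yourAppObj, window)` once loading has finished.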
MediaWiki environment
In a MediaWiki environment the following hook function may be declared:
mw.hook( "PerfektesChaos_stringLib.ready" ).add( callback );
That callback function (e.g. myTask) triggers the actual functionality of the user application. It is called as soon as loading has completed successfully.
function myTask( application ) may use one parameter. That is the application object for the library, which is also supposed to be available as mw.libs.PerfektesChaos_stringLib. It might be mapped into your environment by yourAppObj.str = application.
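A minimal sketch of the hook registration follows. The wrapper `whenStringLibReady` and the idea of passing the `mw` object in as a parameter are assumptions for illustration; on a wiki page the global `mw` would be used directly.

```javascript
// Sketch: register a callback on the ready hook and map the library.
// `mwObject` is whatever provides hook() (normally the global `mw`).
function whenStringLibReady(mwObject, yourAppObj, myTask) {
    mwObject.hook("PerfektesChaos_stringLib.ready").add(function (application) {
        yourAppObj.str = application;   // map the library into the app object
        myTask(application);            // start the actual user functionality
    });
}
```

Registering through the hook avoids guessing when the asynchronous load has finished.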
Codes
Source code |
ResourceLoader |
mw.libs | PerfektesChaos_stringLib
mw.hook | PerfektesChaos_stringLib.ready
Documentation
yourAppObj is represented by a leading dot here.
The following items may be overwritten by the user, defaulting to false:
.charEnt5single
.charEnt5single
HTML 5 single character entities
object
.locateEntities
.locateEntities
User option: Expect any HTML entity
true or false
.sortLang
.sortLang
User option: Language used for sorting
string or false
.sortMode
.sortMode
User option: Special sorting mode
string like 'de-DIN31638' or false
.spaces
.spaces
Various types of spaces
string with all spaces except ASCII
.sticks
.sticks
Various types of horizontal dashes and lines
string with all dashes including ASCII hyphen-minus in front
.camelCasing()
.camelCasing(alter)
Uppercase first character, keep inner camelCasing
- Parameters
alter – string to be camelCased
- Returns
- camelCased string
.capitalize()
.capitalize(alter)
Uppercase first character, lowercase anything else
- Parameters
alter – string to be capitalized
- Returns
- capitalized string
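The difference between .camelCasing() and .capitalize() can be illustrated with two stand-alone functions. These are hypothetical sketches of the described behaviour, not the library's own code, and they ignore locale-specific casing rules.

```javascript
// Sketch: uppercase the first character, leave the rest untouched.
function camelCasingSketch(alter) {
    return alter.charAt(0).toUpperCase() + alter.slice(1);
}

// Sketch: uppercase the first character, lowercase everything else.
function capitalizeSketch(alter) {
    return alter.charAt(0).toUpperCase() + alter.slice(1).toLowerCase();
}

console.log(camelCasingSketch("fooBar"));   // FooBar
console.log(capitalizeSketch("fooBar"));    // Foobar
```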
.charEntity()
.charEntity(adjust)
Retrieve character code (UCS) for named HTML4 or numeric entity
- Parameters
adjust – string to be examined
- Returns
- information about character
false – if not resolved
number – UCS code of single character
- Since
- JavaScript 1.3 String.charCodeAt()
.charEntityAt()
.charEntityAt(adjust, address, advance)
Retrieve character code of ML entity at position
- Parameters
adjust – string to be examined
address – position in adjust
advance – true: '&' at address; false: ';' at address
- Returns
- Array with entity information, or false
[0] – code value
[1] – entity position
[2] – length of entity
- Since
- JavaScript 1.3 String.charCodeAt()
.charEntityCode()
.charEntityCode(adjust)
Retrieve character code (UCS) for numeric ML entity
- Parameters
adjust – string with character entity like "&#xHH;" or "&#NN;"
first two characters are assumed to be '&#'
third character may be 'x' or digit
last character is assumed to be ';'
- Returns
- information about character
false – if not resolved
number – UCS code of single character
- Since
- JavaScript 1.3 String.charCodeAt()
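Resolving a numeric entity as described for .charEntityCode() could be sketched as follows. The function name and the use of a regular expression are assumptions; the library itself may proceed differently, for example character by character.

```javascript
// Sketch: resolve "&#xHH;" (hexadecimal) or "&#NN;" (decimal) to a code.
// Returns false if the string does not have one of these two shapes.
function charEntityCodeSketch(adjust) {
    var m = /^&#(?:x([0-9A-Fa-f]+)|([0-9]+));$/.exec(adjust);
    if (!m) {
        return false;
    }
    return m[1] ? parseInt(m[1], 16) : parseInt(m[2], 10);
}

console.log(charEntityCodeSketch("&#x41;"));   // 65
console.log(charEntityCodeSketch("&#65;"));    // 65
console.log(charEntityCodeSketch("&amp;"));    // false
```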
.charEntityHTML4()
.charEntityHTML4(adjust)
Retrieve character code (UCS) for named HTML4 (or similar) entity
- deprecated – replaced by .charEntityHTML5single()
- Parameters
adjust – string with character named entity "&xyz;"
first character is assumed to be '&'
last character is assumed to be ';'
- Returns
- information about character
false – if not resolved
number – UCS code of single character
.charEntityHTML5single()
.charEntityHTML5single(adjust)
Retrieve single character code (UCS) for named HTML5 entity
- Parameters
adjust – string with character named entity "&xyz;"
first character is assumed to be '&'
last character is assumed to be ';'
- Returns
- information about character
false – if not resolved
number – UCS code of single character
.deCapitalize()
.deCapitalize(alter)
Lowercase first character; keep inner camelCasing
- Parameters
alter – string to be decapitalized
- Returns
- decapitalized string
.decodeOctet()
.decodeOctet(assembly, address)
Retrieve hexadecimal value of octet, similar to parseInt() with base 16, but considering uppercase A-F only
- Parameters
assembly – string to be analyzed
address – index in string
- Returns
- parsed number 0...15, or -1 if invalid
- Since
- JavaScript 1.3 String.charCodeAt()
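The documented behaviour of .decodeOctet() (one hexadecimal digit, uppercase letters only, -1 on failure) can be approximated with plain character code comparisons. The function below is a hypothetical sketch, not the library's implementation.

```javascript
// Sketch: value of the hexadecimal digit at position `address`,
// accepting 0-9 and uppercase A-F only; -1 for anything else.
function decodeOctetSketch(assembly, address) {
    var c = assembly.charCodeAt(address);   // NaN if address is out of range
    if (c >= 48 && c <= 57) {
        return c - 48;                      // '0'..'9'
    }
    if (c >= 65 && c <= 70) {
        return c - 55;                      // 'A'..'F'
    }
    return -1;
}

console.log(decodeOctetSketch("7F", 1));   // 15
console.log(decodeOctetSketch("7f", 1));   // -1
```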
.decodeXML()
.decodeXML(alter)
Convert string with XML entities into unescaped string
- Parameters
alter – string to be analyzed
- Returns
- string, may be unchanged
- Since
- JavaScript 1.3 String.charCodeAt() String.fromCharCode()
.escapeLight()
.escapeLight(alter)
Minimal escaping for HTML
- Parameters
alter – string to be escaped
- Returns
- string with escaping
.fromCharCode()
.fromCharCode(apply)
Extended fromCharCode for UCS > 0xFFFF (4 bytes/char)
- Parameters
apply – number, UCS
- Returns
- single character, which might have a string length of 2 instead of 1
- Since
- JavaScript 1.3 String.fromCharCode() (2 byte chars only)
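Code points above 0xFFFF are represented in JavaScript strings as a UTF-16 surrogate pair, which is why the result may have length 2. The following is a sketch of that conversion under the standard UTF-16 rules; the function name is hypothetical and the library's code may differ in detail.

```javascript
// Sketch: build a one-character string for any UCS code point,
// using a surrogate pair for values above 0xFFFF.
function fromCharCodeSketch(apply) {
    if (apply <= 0xFFFF) {
        return String.fromCharCode(apply);
    }
    var offset = apply - 0x10000;
    return String.fromCharCode(0xD800 + (offset >> 10),      // high surrogate
                               0xDC00 + (offset & 0x3FF));   // low surrogate
}

console.log(fromCharCodeSketch(0x1F600).length);   // 2
```

Modern engines offer String.fromCodePoint() for the same purpose.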
.fromNum()
.fromNum(adjust)
Format number as string
- Parameters
adjust – number to be formatted
- Returns
- adjust as string
.hexcode()
.hexcode(amount, align, allow)
Retrieve hexadecimal representation
- Parameters
amount – number: decimal
align – left padded number of digits, or false
allow – true: use lowercase letters
- Returns
- string with hex number
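A hexadecimal formatter with the three parameters described for .hexcode() might look like the sketch below. It assumes that "left padded" means padding with leading zeros and that uppercase letters are the default; neither detail is confirmed by the text.

```javascript
// Sketch: hexadecimal representation of a non-negative integer.
// Assumptions: zero padding on the left, uppercase unless `allow` is true.
function hexcodeSketch(amount, align, allow) {
    var s = amount.toString(16);
    if (!allow) {
        s = s.toUpperCase();
    }
    while (align && s.length < align) {
        s = "0" + s;
    }
    return s;
}

console.log(hexcodeSketch(255, 4, false));    // 00FF
console.log(hexcodeSketch(255, false, true)); // ff
```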
.isASCII()
.isASCII(ask)
Test for ASCII only characters
- Parameters
ask – string to be examined
- Returns
- true iff ask consists of ASCII characters only
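An ASCII-only test can be expressed with a single regular expression. This is a hypothetical sketch; treating the empty string as ASCII is an assumption.

```javascript
// Sketch: true if every character code is in the range 0..127.
function isASCIISketch(ask) {
    return /^[\x00-\x7F]*$/.test(ask);
}

console.log(isASCIISketch("plain text"));   // true
console.log(isASCIISketch("Käse"));         // false
```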
.isBlank()
.isBlank(ask, any)
Test for invisible character
- Parameters
ask – character code to be examined
any – true: include zero width and marks
- Returns
- true iff ask is any space or other invisible character code
.isLetter()
.isLetter(ask)
Test whether a character is a letter (Latin based, Greek, Cyrillic)
- Parameters
ask – character code to be examined, or string (first char)
- Returns
- true iff ask is identified as any kind of letter
- Since
- JavaScript 1.3 String.charCodeAt()
.isWhiteBlank()
.isWhiteBlank(ask, any, against)
Test for invisible character or newline
- Parameters
ask – character code to be examined
any – true: include zero width and direction marks
against – true: behave like .isBlank()
- Returns
- true iff ask is any whitespace or other invisible character
- See
- .isBlank()
.makeString()
.makeString(apply, amount)
Return string of certain length with repeated character
- Parameters
apply – character code to be set
amount – number of repetitions of character apply
- Returns
- new string
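Building a string from a repeated character code, as .makeString() does, can be sketched with a simple loop. The function name is hypothetical.

```javascript
// Sketch: string consisting of `amount` copies of the character
// with code `apply`.
function makeStringSketch(apply, amount) {
    var c = String.fromCharCode(apply);
    var r = "";
    for (var i = 0; i < amount; i++) {
        r += c;
    }
    return r;
}

console.log(makeStringSketch(45, 3));   // ---
```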
.parseIntNumber()
.parseIntNumber(apply, assign)
Parse integer number string, but do not return NaN
- Parameters
apply – string to be manipulated, or undefined
assign – number base: 10 or 16; if false detect leading 'x'
- Returns
- number, 0 if not caught
- Since
- JavaScript 1.3 String.charCodeAt()
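A NaN-free integer parser along the lines of .parseIntNumber() could be sketched as below. How exactly the library detects the leading 'x' is not stated; this sketch assumes it is the very first character and that the base is 10 otherwise.

```javascript
// Sketch: parse an integer, returning 0 instead of NaN.
// Assumption: with a false `assign`, a leading 'x' selects base 16.
function parseIntNumberSketch(apply, assign) {
    if (typeof apply !== "string") {
        return 0;                       // e.g. undefined
    }
    var s = apply;
    var base = assign;
    if (!base) {
        if (s.charAt(0) === "x") {
            base = 16;
            s = s.slice(1);
        } else {
            base = 10;
        }
    }
    var n = parseInt(s, base);
    return isNaN(n) ? 0 : n;
}

console.log(parseIntNumberSketch("x1F", false));   // 31
console.log(parseIntNumberSketch("abc", 10));      // 0
```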
.setChar()
.setChar(array, apply, address)
Set character or string at certain string position
- Parameters
array – string to be manipulated
apply – character code or string to be set
address – single character position to be replaced
- Returns
- modified string
- Since
- JavaScript 1.3 String.fromCharCode()
One day direct array[i] setting might work in a JavaScript String.
.setString()
.setString(array, address, adjust, apply)
Modify string in certain range
- Parameters
array – string to be manipulated
address – character position to start replacement
adjust – range specification
number – number of characters to be removed at address
string – adjust.length is used as number
apply – string to replace range
- Returns
- modified string
- Since
- JavaScript 1.3 String.fromCharCode()
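Because JavaScript strings are immutable, replacing a range means assembling a new string from the parts before and after it. The sketch below follows the parameter description of .setString(); the function name is hypothetical.

```javascript
// Sketch: replace a range of `array` starting at `address`.
// `adjust` is either the number of characters to remove or a string
// whose length is used as that number.
function setStringSketch(array, address, adjust, apply) {
    var n = (typeof adjust === "string") ? adjust.length : adjust;
    return array.slice(0, address) + apply + array.slice(address + n);
}

console.log(setStringSketch("Hello World", 6, 5, "There"));   // Hello There
```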
.sortAppropriate()
.sortAppropriate(adjust)
Retrieve sortable character(s) in particular local environment (hook)
(RegExp is not modified)
- Parameters
adjust – character code of a single character
196 * Ä
197 * Å
198 * Æ *always*
228 * ä
229 * å
230 * æ *always*
208 * Ð
272 * Dstroke
240 * ð
273 * dstroke
568 * db digraph *always*
452 * D with Z caron *always*
497 * D with Z *always*
453 * D with z caron *always*
498 * D with z *always*
454 * d with z caron *always*
499 * d with z *always*
455 * L with J *always*
456 * L with j *always*
457 * l with j *always*
458 * N with J *always*
459 * N with j *always*
460 * n with j *always*
214 * Ö
246 * ö
338 * OElig *always*
339 * oelig *always*
546 * OU *always*
547 * ou *always*
569 * qp digraph *always*
223 * ß *always*
7838 * capital sharp S *always*
222 * Þ *always*
254 * þ *always*
220 * Ü
252 * ü
- Returns
- information about sortable character
false – no particular local request
true – remove character from sort key
number – with ASCII code of single character
string – of two ASCII characters; (first) character case will be kept, second char (if any) lowercase
- See
- .sortLang
.sortChar()
.sortChar(adjust)
Retrieve sortable character(s) for non-ASCII Latin based Unicode
(RegExp is not modified)
- Parameters
adjust – character code of a single character (expecting adjust from 160 up)
- Returns
- information about sortable character
false – if nothing to do
true – remove character from sort key
number – with ASCII code of single character
string – of two ASCII characters; (first) character case will be kept, second char (if any) lowercase
Only glyphs used in any (European) language are considered.
.sortLocale()
.sortLocale(adjust, area)
Retrieve sortcode char or string for Unicode
- Parameters
adjust – string to be checked
area – language code, or false
de – German DIN 31638 (DIN 5007) requests umlaut "Ae" when sorting names of persons
- Returns
- sortable string or character
false – no particular local request
Replace by two character string for German umlauts or Scandinavian "aa" for Aring.
- See
- .sortMode
.sortString()
.sortString(adjust, advanced)
Retrieve sortable string for non-ASCII Latin based Unicode
Trailing or multiple whitespace shrinks.
- Parameters
adjust – string to be checked or modified
advanced – optional
true – Replace by two character string for German umlauts and Scandinavian Aring.
German DIN 31638 (DIN 5007) requests umlaut "Ae" when sorting names of persons, and Scandinavian languages use the same transcription as well as "aa" for aring.
- Returns
- information about sortable string
false – if nothing to do, adjust is fine
string – changes against adjust
Only glyphs used in any (European) language are considered.
- Since
- JavaScript 1.3 String.charCodeAt() String.fromCharCode()
.spaced()
.spaced(adjust, any, allow)
Turn spacing charcodes of any kind into ASCII spaces, and trim
- Parameters
adjust – string to be standardized
any – true: remove also zero width and direction marks
allow – true: keep entities
- Returns
- modified string
- See
- .isWhiteBlank()
- .charEntityAt()
.substrEnd()
.substrEnd(apply, amount, after)
Retrieve last characters from string like Mozilla substr(-n, n)
- Parameters
apply – string
amount – position counted from end
after – optional: number of chars, if not amount
- Returns
- string at end
- Since
- JavaScript 1.0 String.substr()
This function has been included for compatibility reasons. With ECMAScript 3, String.slice() with a negative start argument will work. String.slice() with a negative argument was not defined in earlier JS. String.substr() with a negative argument does not work in IE.
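The behaviour described for .substrEnd() can be reproduced without negative arguments by computing the start position explicitly. This is a sketch with a hypothetical name; clamping the start at 0 for short strings is an assumption.

```javascript
// Sketch: characters starting `amount` positions before the end;
// `after` optionally limits the number of characters returned.
function substrEndSketch(apply, amount, after) {
    var start = Math.max(apply.length - amount, 0);
    var n = (typeof after === "number") ? after : amount;
    return apply.slice(start, start + n);
}

console.log(substrEndSketch("abcdef", 3));      // def
console.log(substrEndSketch("abcdef", 3, 2));   // de
```

Without `after`, the result equals `apply.slice(-amount)` in current engines.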
.substrExcept()
.substrExcept(apply, amount)
Retrieve all but last characters from string
- Parameters
apply – string
amount – position counted from end
- Returns
- string until near end
- See
- .substrEnd()
.terminated()
.terminated(adjust, at)
Return substring terminated by separator, or entirely
- Parameters
adjust – string to be extracted
at – string with separator to be excluded
- Returns
- modified string, excluding at
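Cutting a string at a separator, as described for .terminated(), might be sketched as follows. The function name is hypothetical, and cutting at the first occurrence of the separator is an assumption.

```javascript
// Sketch: part of `adjust` before the first occurrence of `at`,
// or the whole string if `at` does not occur.
function terminatedSketch(adjust, at) {
    var i = adjust.indexOf(at);
    return (i < 0) ? adjust : adjust.slice(0, i);
}

console.log(terminatedSketch("key=value", "="));   // key
console.log(terminatedSketch("plain", "="));       // plain
```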
.trim()
.trim(adjust, any, aware, allow)
Remove leading or trailing spacing charcodes of any kind
- Parameters
adjust – string to be trimmed
any – true: remove also zero width and direction marks
aware – true: remove also trailing line breaks
allow – true: keep entities
- Returns
- modified string
.trimL()
.trimL(adjust, any, aware, allow)
Return string without leading spacing charcodes of any kind
- Parameters
adjust – string to be trimmed
any – true: remove also zero width and direction marks
aware – true: remove also line breaks
allow – true: keep entities
- Since
- JavaScript 1.3 String.charCodeAt()
- See
- .locateEntities
.trimR()
.trimR(adjust, any, aware, align, allow)
Return string without trailing spacing charcodes of any kind
- Parameters
adjust – string to be trimmed
any – true: remove also zero width and direction marks
aware – true: remove also line breaks
align – true: re-establish line breaks after trimming
allow – true: keep entities
- Since
- JavaScript 1.3 String.charCodeAt()
- See
- .locateEntities
.uniques()
.uniques(adjust, against)
Return string with unique sequence of items
- Parameters
adjust – string to be reduced, items separated by against
against – string with character for separation
- Returns
- string with all items in adjust, separated by against (neither leading nor trailing against)
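Reducing a separated list to unique items, as described for .uniques(), can be sketched with split and join. The function name is hypothetical; keeping the first occurrence of each item and dropping empty items are assumptions.

```javascript
// Sketch: remove duplicate items from a separator-delimited string.
// Empty items are dropped, so no leading or trailing separator remains.
function uniquesSketch(adjust, against) {
    var seen = new Set();
    var out = [];
    adjust.split(against).forEach(function (item) {
        if (item !== "" && !seen.has(item)) {
            seen.add(item);
            out.push(item);
        }
    });
    return out.join(against);
}

console.log(uniquesSketch("a|b|a|c|b", "|"));   // a|b|c
```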