Help:Manipulating strings

Source: Wikipedia, the free encyclopedia.

The English Wikipedia has several templates and Lua modules which can format or manipulate strings. In this context a "string" is any piece of text forming part of a page. This help page covers a few useful techniques; look in the navbox below for the full catalogue of templates.

Substrings

The simplest operation is taking a substring, a snippet of the string taken at a certain offset (called an "index") from the start or end. There are a number of legacy templates offering this (see navbox) but for new code use {{#invoke:String|sub|string|startIndex|endIndex}}. The indices are one-based (meaning the first is number one), inclusive (meaning the indices you specify are included), and may be negative to count from the other end. For example, {{#invoke:string|sub|12345678|2|-3}} → 23456. Not all the legacy substring templates use this numbering scheme, so check the documentation of unfamiliar templates.
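
The numbering scheme can be easier to follow when written out as ordinary code. The following is a minimal Python sketch of the one-based, inclusive, negative-aware convention described above. The function name sub_one_based and its error handling are assumptions made for illustration; this is not the code of Module:String.

```python
def sub_one_based(s, start=1, end=-1):
    """Return the part of s from start to end, one-based and inclusive.

    Negative indices count from the end of the string (-1 is the last character).
    """
    length = len(s)
    # Convert negative indices to their positive one-based equivalents.
    if start < 0:
        start = length + start + 1
    if end < 0:
        end = length + end + 1
    # Assumption: out-of-range or reversed indices are treated as errors.
    if start < 1 or end < 1 or start > length or end > length:
        raise ValueError("index out of range")
    if end < start:
        raise ValueError("indices out of order")
    # Python slices are zero-based and exclude the end, hence start - 1.
    return s[start - 1:end]


print(sub_one_based("12345678", 2, -3))  # 23456
```

Here -3 on an eight-character string resolves to position 6, so the result runs from the second to the sixth character, matching the example above.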

Using existing templates

If you think that someone will have done what you want before, look in the navbox below and check. It is much easier to find and use an existing template than to write complex code to do it all in one place.

Look for a template that will do what you want all in one go. For example, rather than taking the final six characters of a string and checking if they are equal to "navbox", use {{str endswith|string|navbox}}.
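
The same principle applies in most programming languages. As a rough Python analogy (the function names are hypothetical and have no connection to any Wikipedia template), compare a single purpose-built call with the manual slice-and-compare approach:

```python
def ends_with(text, suffix):
    # One purpose-built call that states the intent directly.
    return text.endswith(suffix)


def ends_with_by_hand(text, suffix):
    # Manual version: take the last len(suffix) characters, then compare.
    if not suffix:
        return True  # every string ends with the empty string
    return text[-len(suffix):] == suffix


print(ends_with("Template:Example navbox", "navbox"))          # True
print(ends_with_by_hand("Template:Example navbox", "navbox"))  # True
```

Both give the same answer, but the first is shorter and leaves less room for off-by-one mistakes.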

Automatically trimmed whitespace

If you pass the string " abc def " (without quotes) to a template via a named or explicitly numbered parameter (like {{template|1= abc def }}), the spaces on the outside are trimmed off and are not counted in anything the template does with that parameter. The template sees the string abc def.

If you use automatically numbered parameters ({{template| abc def }}) the spaces on the outside do count, but some templates may still choose to remove them themselves.
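
The difference between the two cases can be modelled in a few lines. This is a simplified Python sketch, not MediaWiki's parser; the function name value_seen_by_template is an assumption made for illustration.

```python
def value_seen_by_template(raw, explicitly_named):
    """Model the value a template receives for one parameter."""
    if explicitly_named:
        # Named or explicitly numbered (e.g. 1=...): outer whitespace is trimmed.
        return raw.strip()
    # Automatically numbered: the value is passed through unchanged.
    return raw


print(repr(value_seen_by_template(" abc def ", True)))   # 'abc def'
print(repr(value_seen_by_template(" abc def ", False)))  # ' abc def '
```

Only the outer whitespace is affected; the space between abc and def survives in both cases.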

Lua patterns (regex)

Regular expressions (or regex) are a common and very versatile programming technique for manipulating strings. On Wikipedia you can use a limited version of regex called a Lua pattern to select and modify bits of text from a string. The pattern is a piece of code describing what you are looking for in the string. The symbols you can use in a pattern are:

  • . means any individual character. ... would mean any three characters, etc.
  • *, +, ?, and - are the quantifiers. Each one says how many times the preceding character (or bracketed set) may be repeated: * means zero or more times, + means one or more times, ? means zero or one time, and - also means zero or more times. (The difference is that - is "non-greedy": it matches as few characters as possible given the rest of the pattern, whereas * matches as many as possible.)
  • ^ means the start of the string, and $ means the end.
  • [abc] means any symbol out of a, b or c, and [^abc] means anything that isn't a, b or c.
  • Preceding any of the above with a % takes away their normal meaning and makes them mean "literally" the symbol they are. Preceding anything else with a % (like %a) has a special meaning which you can check in the manual.

Putting this all together, ^[Aa]*b?c matches the first six characters of "AaAabcccc".
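
Readers who know regex from other languages may find a comparison useful. The sketch below uses Python's re module, which is a different and fuller regex dialect, purely as an analogy: this particular pattern happens to be written the same way in both, while Lua's - quantifier corresponds to *? in Python and Lua's % escape corresponds to a backslash.

```python
import re

# The example pattern is also valid Python regex syntax.
match = re.match(r"^[Aa]*b?c", "AaAabcccc")
print(match.group())  # AaAabc

# Greedy versus non-greedy: Lua's "-" behaves like Python's "*?".
greedy = re.match(r"<.*>", "<a><b>").group()
lazy = re.match(r"<.*?>", "<a><b>").group()
print(greedy)  # <a><b>
print(lazy)    # <a>

# Escaping: Lua writes %. for a literal dot, Python writes \.
literal_dot = re.match(r"a\.b", "a.b") is not None
any_char = re.match(r"a.b", "axb") is not None
print(literal_dot, any_char)  # True True
```

The first match is six characters long: [Aa]* takes AaAa, b? takes the b, and c takes a single c, leaving the remaining ccc unmatched.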

By wrapping part of the pattern in round brackets (parentheses), you can capture it and refer to the captured text in the replacement with the code %1. Example:

  • The find-replace instruction {{#invoke:string|replace|AaAabc XYZ|^([Aa]*)b?c|%1|plain=false}} gives AaAa XYZ
  • We can discard the XYZ by putting .* at the end of the search string; this picks up anything after the rest of the pattern. {{#invoke:string|replace|AaAabc XYZ|^([Aa]*)b?c.*|%1|plain=false}} gives AaAa.
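
The two replacements above can be reproduced in Python for comparison. This is an analogy using Python's re module, not Lua patterns: the capture is written \1 rather than %1, and the function name extract_prefix is hypothetical.

```python
import re


def extract_prefix(text, discard_rest=False):
    """Replace the matched text with only its captured run of A/a characters."""
    pattern = r"^([Aa]*)b?c"
    if discard_rest:
        # Appending .* makes the match swallow everything after the pattern.
        pattern += r".*"
    # \1 refers to the text captured by the first pair of parentheses.
    return re.sub(pattern, r"\1", text)


print(extract_prefix("AaAabc XYZ"))                     # AaAa XYZ
print(extract_prefix("AaAabc XYZ", discard_rest=True))  # AaAa
```

In the first call only AaAabc is matched and replaced, so the trailing " XYZ" is left in place; in the second the whole string is matched, so only the captured AaAa remains.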

StringFunctions (from ParserFunctions)

Wikipedia does not have the "StringFunctions" series of parser functions (listed below), and is not going to get them (per phab:T8455). Instead, templates use Lua (via Module:String or otherwise), alongside existing parser functions.

None of these functions will work, but they have alternatives:

Testing code

If you're not sure what some code is going to do, paste it into Special:ExpandTemplates, which will evaluate it for you to view.

See also

See the navbox chart below for various string-handling templates or parser functions.