Combinatorics on words
![The Thue–Morse sequence](http://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Morse-Thue_sequence.gif/210px-Morse-Thue_sequence.gif)
Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There has been a wide range of contributions to the field. Some of the first work was on square-free words by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It has led to developments in abstract algebra and to the resolution of open questions.
Definition
Combinatorics is an area of discrete mathematics. Discrete mathematics is the study of countable structures. These objects have a definite beginning and end. The study of enumerable objects is the opposite of disciplines such as analysis, where calculus and infinite structures are studied. Combinatorics studies how to count these objects using various representations. Combinatorics on words is a recent development in this field that focuses on the study of words and formal languages. A formal language is any set of symbols and combinations of symbols that people use to communicate information.[1]
Some terminology relevant to the study of words should first be explained. First and foremost, a word is basically a sequence of symbols, or letters, drawn from a finite set called an alphabet.
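As a minimal illustration (a Python sketch with names chosen for this example, not from the article), the factors of a word — its contiguous blocks of letters — can be enumerated directly:

```python
def factors(w, n=None):
    """All distinct factors of the word w; if n is given, only those of length n.

    A factor (or subword) is any contiguous block of letters of w."""
    result = set()
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            if n is None or j - i == n:
                result.add(w[i:j])
    return result

# The word "abab" over the alphabet {a, b} has two distinct factors of length 2.
print(sorted(factors("abab", 2)))   # ['ab', 'ba']
```

Counting factors of each length is one of the basic ways words are classified in this field (see the Sturmian words section below for an example).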
In addition to examining sequences in themselves, another area of combinatorics on words to consider is how they can be represented visually. In mathematics various structures are used to encode data. A common structure used in combinatorics is the tree structure. A tree is a graph in which any two vertices are connected by exactly one path; trees contain no cycles, and may or may not be complete. Since a word is constructed from symbols, it is possible to encode a word, and thereby its data, using a tree.[1] This gives a visual representation of the object.
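As a small illustration (a Python sketch, not from the article's sources), a set of words can be encoded in a prefix tree, or trie, whose root-to-marker paths spell exactly the stored words:

```python
def build_trie(words):
    """Encode a set of words as a nested-dict trie; '$' marks a word's end."""
    root = {}
    for w in words:
        node = root
        for letter in w:
            node = node.setdefault(letter, {})
        node["$"] = {}   # end-of-word marker
    return root

def contains(trie, w):
    """Follow w letter by letter; w is stored iff the end marker is reached."""
    node = trie
    for letter in w:
        if letter not in node:
            return False
        node = node[letter]
    return "$" in node

trie = build_trie(["ab", "aba", "ba"])
print(contains(trie, "aba"), contains(trie, "a"))   # True False
```

Shared prefixes occupy shared branches, which is what makes the tree a compact encoding of the word set.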
Major contributions
The first books on combinatorics on words that summarize the origins of the subject were written by a group of mathematicians that collectively went by the name of M. Lothaire. Their first book was published in 1983, when combinatorics on words became more widespread.[1]
Patterns
Patterns within words
A main contributor to the development of combinatorics on words was Axel Thue, through his research on repetitions within words.
Thue wrote two papers on square-free words, the second of which was on the Thue–Morse word. Marston Morse is included in the name because he discovered the same result as Thue did, yet they worked independently. Thue also proved the existence of an overlap-free word. An overlap-free word is one in which, for a symbol x and a word y, the pattern xyxyx does not occur. He continues in his second paper to prove a relationship between infinite overlap-free words and square-free words. He takes overlap-free words that are created using two different letters, and demonstrates how they can be transformed into square-free words of three letters using substitution.[1]
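Thue's word can be sketched in Python (an illustrative script, not from the article's sources): the Thue–Morse word is generated from the parity of 1-bits, and a brute-force check confirms that a long prefix contains no overlap — no factor of the form zza with a the first letter of z, which is equivalent to avoiding the pattern xyxyx with x a single symbol.

```python
def thue_morse(n):
    """Prefix of length n of the Thue–Morse word: t[i] = parity of 1-bits of i."""
    return [bin(i).count("1") % 2 for i in range(n)]

def has_overlap(w):
    """True if w contains an overlap: a factor z z a with a = z[0]
    (equivalently, a factor x y x y x with x a single symbol)."""
    for i in range(len(w)):
        for l in range(1, (len(w) - i) // 2 + 1):
            if i + 2 * l < len(w) and w[i:i+l] == w[i+l:i+2*l] and w[i+2*l] == w[i]:
                return True
    return False

print(thue_morse(8))                    # [0, 1, 1, 0, 1, 0, 0, 1]
print(has_overlap(thue_morse(200)))     # False: the prefix is overlap-free
```

The square "0101" is allowed, but extending it to "01010" creates an overlap, which the checker detects.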
As was previously described, words are studied by examining the sequences made by their symbols. Patterns are found, and they can be described mathematically. Patterns can be either avoidable or unavoidable. A significant contributor to the study of unavoidable patterns, or regularities, was Frank Ramsey in 1930.
Other contributors to the study of unavoidable patterns include van der Waerden, whose theorem on arithmetic progressions is a classic unavoidable regularity.
When examining unavoidable patterns, sesquipowers are also studied. For words x, x1, x2, …, a sesquipower is a word of the form x, x x1 x, x x1 x x2 x x1 x, …, where each term is obtained by sandwiching a new word between two copies of the previous term. This is another pattern, like square-freeness or unavoidability. Coudrain and Schützenberger mainly studied these sesquipowers for group theory applications. In addition, Zimin proved that sesquipowers are all unavoidable. Whether the entire pattern shows up, or only some piece of the sesquipower shows up repetitively, it is not possible to avoid it.[1]
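The recursive construction u1 = x, u(n+1) = un x(n+1) un described above can be sketched in a few lines of Python (an illustration, not from the article):

```python
def sesquipowers(x, xs):
    """Yield the sesquipowers u1 = x, u_{n+1} = u_n * x_{n+1} * u_n."""
    u = x
    yield u
    for xi in xs:
        u = u + xi + u   # wrap the next word between two copies of the last term
        yield u

print(list(sesquipowers("x", ["y", "z"])))
# ['x', 'xyx', 'xyxzxyx']
```

Each term contains two copies of its predecessor, which is the structural reason the pattern cannot be avoided by any sufficiently long word.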
Patterns within alphabets
Necklaces are constructed from words of circular sequences. They are most frequently used in music and astronomy. Flye Sainte-Marie in 1894 proved there are 2^(2^(n−1) − n) binary de Bruijn necklaces of length 2^n. A de Bruijn necklace contains factors made of words of length n over a certain number of letters; each such word appears exactly once as a factor of the necklace.[1]
In 1874, Baudot developed the code that would eventually take the place of Morse code by applying the theory of binary de Bruijn necklaces. The problem continued from Sainte-Marie to Martin in 1934, who began looking at algorithms to make words of the de Bruijn structure. It was then worked on by Posthumus in 1943.[1]
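Martin's approach can be sketched by the greedy "prefer-one" construction, its standard presentation (a hedged Python reconstruction, not the original 1934 formulation): start from n zeros and repeatedly append the largest bit that creates a previously unseen length-n window.

```python
def de_bruijn_greedy(n):
    """Greedy construction of a binary de Bruijn sequence of order n."""
    seq = [0] * n
    seen = {tuple(seq)}
    while True:
        for b in (1, 0):  # prefer appending 1
            window = tuple(seq[len(seq) - n + 1:]) + (b,)
            if window not in seen:
                seen.add(window)
                seq.append(b)
                break
        else:
            break  # neither bit yields a new length-n factor: done
    return seq[:2 ** n]  # drop the wrapped-around tail of n-1 symbols

s = de_bruijn_greedy(3)
print(s)   # a cyclic sequence of length 8 containing every binary triple once
```

Read cyclically, the output contains each of the 2^n binary words of length n exactly once, which is the defining property of a de Bruijn necklace.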
Language hierarchy
Possibly the most applied result in combinatorics on words is the Chomsky hierarchy, developed by Noam Chomsky, who studied formal language in the 1950s.[2] His way of looking at language simplified the subject: he disregards the actual meaning of a word, does not consider factors such as frequency and context, and applies patterns found in short terms to terms of all lengths. The basic idea of Chomsky's work is to divide language into four levels, the language hierarchy. The four levels are: regular, context-free, context-sensitive, and computably enumerable or unrestricted.[2] Regular is the least complex while computably enumerable is the most complex. While his work grew out of combinatorics on words, it drastically affected other disciplines, especially computer science.[3]
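The gap between the first two levels can be illustrated with a short Python sketch (the names here are chosen for the example, not taken from the article): the language {aⁿbⁿ} is context-free but not regular — recognizing it needs a counter or stack, not just a finite automaton — while (ab)* is regular and matched by an ordinary regex.

```python
import re

def in_anbn(w):
    """Membership in {a^n b^n : n >= 0}, a context-free language:
    a single counter suffices, but no finite automaton does."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

# A regular language, by contrast, is matched by a finite automaton / regex:
regular = re.compile(r"^(ab)*$")   # the regular language {(ab)^n}

print(in_anbn("aaabbb"), in_anbn("aabbb"))   # True False
print(bool(regular.match("ababab")))          # True
```

Each level of the hierarchy strictly contains the previous one; this pair of languages witnesses the first of those strict inclusions.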
Word types
Sturmian words
Sturmian words, named after Jacques Charles François Sturm, have roots in combinatorics on words. There exist several equivalent definitions of Sturmian words. For example, an infinite word is Sturmian if and only if it has exactly n + 1 distinct factors of length n, for every non-negative integer n.[1]
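The factor-counting characterization can be checked numerically. This Python sketch (an illustration, not from the article) builds a long prefix of the Fibonacci word — a standard Sturmian example — via the substitution 0 → 01, 1 → 0, and verifies that it has n + 1 distinct factors of each small length n:

```python
def fibonacci_word(k):
    """Apply the substitution 0 -> 01, 1 -> 0 to the word "0", k times.
    The limit of this process is the Fibonacci word, a Sturmian word."""
    w = "0"
    for _ in range(k):
        w = "".join("01" if c == "0" else "0" for c in w)
    return w

w = fibonacci_word(12)   # a prefix of length 377
for n in range(1, 6):
    assert len({w[i:i+n] for i in range(len(w) - n + 1)}) == n + 1
print("factor complexity p(n) = n + 1 holds for n = 1..5")
```

The prefix used here is long enough that every factor of the infinite word of length up to 5 already occurs in it, so the counts agree with the infinite word's.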
Lyndon word
A Lyndon word is a word over a given ordered alphabet that is strictly smaller, in lexicographic order, than all of its rotations; it is the canonical representative of its conjugacy class. Lyndon words are important because, for any given Lyndon word w of length greater than one, there exist Lyndon words u and v, with u < v, such that w = uv. Further, the Chen–Fox–Lyndon theorem states that any word can be factored uniquely as a non-increasing product of Lyndon words.[1]
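The factorization into non-increasing Lyndon words can be computed by Duval's linear-time algorithm, sketched here in Python (the algorithm is standard, though not described in this article):

```python
def chen_fox_lyndon(s):
    """Duval's algorithm: the unique factorization of s into a
    non-increasing sequence of Lyndon words, in linear time."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors

print(chen_fox_lyndon("banana"))   # ['b', 'an', 'an', 'a']
```

Each emitted piece is a Lyndon word, and the pieces never increase lexicographically, matching the statement of the theorem.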
Visual representation
Group theory
One aspect of combinatorics on words studied in group theory is reduced words. A group is constructed with words on some alphabet of generators and their inverses. A word is reduced if it contains no factor of the form aa⁻¹ or a⁻¹a; repeatedly cancelling such factors brings every word to a unique reduced form.
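The cancellation process can be sketched with a stack (illustrative Python; the integer encoding of generators is an assumption of this sketch, with -x standing for the inverse of generator x):

```python
def reduce_word(word):
    """Free-group reduction: letters are nonzero integers, and -x is the
    inverse of generator x.  A single left-to-right pass with a stack
    cancels adjacent inverse pairs and yields the unique reduced word."""
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()   # cancel a factor of the form a a^{-1} or a^{-1} a
        else:
            stack.append(letter)
    return stack

# a b b^{-1} a  reduces to  a a
print(reduce_word([1, 2, -2, 1]))   # [1, 1]
```

Because each letter is pushed or cancelled exactly once, the reduction runs in linear time, and the result does not depend on the order in which cancellations are applied.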
Nielsen transformations were also developed. For a set of elements of a free group, a Nielsen transformation is achieved by three elementary transformations: replacing an element with its inverse, replacing an element with the product of itself and another element, and eliminating any element equal to the identity. By applying these transformations, Nielsen reduced sets are formed; in a reduced set, no element can be multiplied by other elements to cancel out completely. There are also connections between Nielsen transformations and Sturmian words.[1]
Considered problems
One problem considered in the study of combinatorics on words in group theory is the following: for two elements x, y of a semigroup, does x = y modulo the defining relations of x and y?
The Burnside question, for example, was answered using the existence of an infinite cube-free word.
Many word problems are undecidable based on the Post correspondence problem. Any two homomorphisms g, h with a common domain and a common codomain form an instance of the Post correspondence problem, which asks whether there exists a word w in the domain such that g(w) = h(w). Post proved that this problem is undecidable; consequently, any word problem that can be reduced to this basic problem is likewise undecidable.[1]
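A small instance can still be explored by bounded brute force (a Python sketch; the instance below is a standard textbook example, not from the article). Since the problem is undecidable in general, any such search must be cut off at some length:

```python
from itertools import product

def solve_pcp(pairs, max_len=8):
    """Search for a Post correspondence solution: an index sequence making
    the top and bottom concatenations equal.  Undecidable in general, so
    the search is truncated at sequences of length max_len."""
    for n in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=n):
            top = "".join(pairs[i][0] for i in seq)
            bottom = "".join(pairs[i][1] for i in seq)
            if top == bottom:
                return list(seq)
    return None

# a classic solvable instance: each pair is (g(letter), h(letter))
pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
seq = solve_pcp(pairs)
print(seq)   # [2, 1, 2, 0]: both sides spell "bbaabbbaa"
```

Finding a solution certifies a "yes" instance; no bounded search, however long, can certify a "no" instance, which is the intuition behind the undecidability.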
Other applications
Combinatorics on words has applications to equations. Makanin proved that it is possible to decide whether a finite system of equations over words has a solution.
See also
- Fibonacci word
- Kolakoski sequence
- Levi's lemma
- Partial word
- Shift space
- Word metric
- Word problem (computability)
- Word problem (mathematics)
- Word problem for groups
- Young–Fibonacci lattice
References
Further reading
- The origins of combinatorics on words, Jean Berstel, Dominique Perrin, European Journal of Combinatorics 28 (2007) 996–1022
- Combinatorics on words – a tutorial, Jean Berstel and Juhani Karhumäki. Bull. Eur. Assoc. Theor. Comput. Sci. EATCS, 79:178–228, 2003.
- Combinatorics on Words: A New Challenging Topic, Juhani Karhumäki.
- Choffrut, Christian; Karhumäki, Juhani (1997). "Combinatorics of words". In Rozenberg, Grzegorz; Salomaa, Arto (eds.). Handbook of formal languages. Vol. 1. Springer. ISBN 978-3-540-60420-4.
- "Infinite words: automata, semigroups, logic and games", Dominique Perrin, Jean Éric Pin, Academic Press, 2004, ISBN 978-0-12-532111-2.
- Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009), Combinatorics on words. Christoffel words and repetitions in words, CRM Monograph Series, vol. 27, Providence, RI: American Mathematical Society. Zbl 1161.68043
- "Combinatorics of Compositions and Words", ISBN 978-1-4200-7267-9.
- "Word equations and related topics: 1st international workshop, IWWERT '90, Tübingen, Germany, October 1–3, 1990 : proceedings", Editor: Klaus Ulrich Schulz, Springer, 1992, ISBN 978-3-540-55124-9
- Berstel, Jean; Reutenauer, Christophe (2011). Noncommutative rational series with applications. Encyclopedia of Mathematics and Its Applications. Vol. 137. Cambridge: Cambridge University Press. Zbl 1250.68007.
- "Patterns in Permutations and Words", ISBN 9783642173325
- "Distribution Modulo One and Diophantine Approximation", Yann Bugeaud, Cambridge University Press, 2012, ISBN 9780521111690
External links