Maximal munch
In computer programming and computer science, "maximal munch" or "longest match" is the principle that when creating some construct, as much of the available input as possible should be consumed.
The earliest known use of this term is by R.G.G. Cattell in his PhD thesis
Application
For instance, the
[a-z]+
(one or more lower-case letters).[2]The term is also used in
Drawbacks
In some situations, "maximal munch" leads to undesirable or unintuitive outcomes. For instance, in the C programming language, the statement x=y/*z;
(without any whitespace) will probably lead to a syntax error since the /*
character sequence (unintentionally) initiates a comment that is either unterminated or terminated by the end token */
of some later, unrelated actual comment (comments in C do not nest). What was actually meant in the statement was to assign to the variable x
the result of dividing the value in y
by the value obtained by dereferencing pointer z
; this would be valid code. It can be stated by making use of whitespace or using x=y/(*z);
.
Another example, in
>>
.[4] std::vector<std::vector<int>> my_mat_11; //Incorrect in C++03, correct in C++11.
std::vector<std::vector<int> > my_mat_03; //Correct in either C++03 or C++11.
The
Alternatives
Programming languages researchers have also responded by replacing or supplementing the principle of maximal munch with other lexical disambiguation tactics. One approach is to utilize "follow restrictions", which instead of directly taking the longest match will put some restrictions on what characters can follow a valid match. For example, stipulating that strings matching [a-z]+
cannot be followed by an alphabetic character achieves the same effect as maximal munch with that regular expression.[5] (In the context of regular expressions, the maximal munch principle is referred to as greediness and contrasted with laziness.) Another approach is to keep the principle of maximal munch but make it subordinate to some other principle, such as context (e.g., the right-shift token in Java would not be matched in the context of a generics expression, where it is syntactically invalid).[6]
References
Bibliography
- Aho, Alfred V.; Lam, Monica S.; Sethi, Ravi; Ullman, Jeffrey D. (2007). ISBN 978-0-321-48681-3.
- Page, Daniel (2009). "Compilers". Practical Introduction to Computer Architecture. Texts in Computer Science. London: Springer. pp. 451–493. ISBN 978-1-84882-255-9.
- Van den Brand, Mark G.J.; Scheerder, Jeroen; Vinju, Jurgen J.; Visser, Eelco (2002). "Disambiguation Filters for Scannerless Generalized LR Parsers". Compiler Construction. Lecture Notes in Computer Science. Vol. 2304/2002. Berlin/Heidelberg: Springer. pp. 21–44. ISSN 0302-9743.
- Vandevoorde, Daveed (14 January 2005). "Right Angle Brackets". Retrieved 31 March 2010.
- Van Wyk, Eric; Schwerdfeger, August (2007). "Context-aware scanning for parsing extensible languages". Proceedings of the 6th international conference on Generative programming and component engineering. New York: ACM. pp. 63–72. S2CID 9145863.