
Tip of the Day
Language: C++
Expertise: Intermediate
Mar 8, 2000




The "Maximal Munch" Principle

Every compiler has a tokenizer, a component that parses a source file into distinct tokens (keywords, operators, identifiers, etc.). One of the tokenizer's rules is called "maximal munch": the tokenizer should keep reading characters from the source file until adding one more character would cause the current token to stop making sense. For example, if the letters 'c', 'h', and 'a' have been read, and the following character is 'r', the tokenizer will read it too and complete the token "char". In certain contexts, though, the maximal munch rule can have surprising effects. Consider the following declaration:

  vector<stack<int>> vs; // error

The programmer wanted to declare a vector of stacks. However, the tokenizer didn't parse this declaration correctly. Because of the maximal munch rule, the sequence >> is parsed as a single token (i.e., the right shift operator) rather than two tokens, each of which terminates a template's argument list. At a later stage, the syntactic analyzer will detect that a right shift operator doesn't make sense in this context. Consequently, the compiler will issue an error message.

How can you fix this? Simply insert a space between the two > characters. The tokenizer treats whitespace as a token terminator, so it will now parse the following declaration correctly:

  vector<stack<int> > vs; // now OK
Danny Kalev