The “Maximal Munch” Principle

Every compiler has a tokenizer, a component that splits a source file into distinct tokens (keywords, operators, identifiers, etc.). One of the tokenizer’s rules is called “maximal munch”: the tokenizer keeps reading characters from the source file until adding one more character would cause the current token to stop making sense. For example, if the letters ‘c’, ‘h’, and ‘a’ have been read and the following character is ‘r’, the tokenizer reads it too and completes the token “char”. In certain contexts, though, the maximal munch rule can have surprising effects. Consider the following declaration:

   vector<stack<int>> vs; // error

The programmer wanted to declare a vector of stacks. However, the tokenizer didn’t parse this declaration correctly. Because of the maximal munch rule, the sequence >> is parsed as a single token (i.e., the right shift operator) rather than two tokens, each of which terminates a template’s argument list. At a later stage, the syntactic analyzer will detect that a right shift operator doesn’t make sense in this context. Consequently, the compiler will issue an error message.

How can you fix this? Simply insert a space between the two > characters. The tokenizer treats whitespace as a token terminator, so it will now parse the following declaration correctly:

   vector<stack<int> > vs; // now OK