
The “Maximal Munch” Principle


Every compiler has a tokenizer, a component that splits a source file into distinct tokens (keywords, operators, identifiers, etc.). One of the tokenizer’s rules is called “maximal munch”: the tokenizer should keep reading characters from the source file until adding one more character would cause the current token to stop making sense. For example, if the letters ‘c’, ‘h’, and ‘a’ have been read and the following character is ‘r’, the tokenizer reads it too and completes the token “char”. In certain contexts, though, the maximal munch rule can have surprising effects. Consider the following declaration:

   vector<stack<int>> vs; // error

The programmer wanted to declare a vector of stacks. However, the tokenizer doesn’t parse this declaration as intended. Because of the maximal munch rule, the sequence >> is read as a single token (the right-shift operator) rather than as two tokens, each of which would close a template argument list. At a later stage, the syntactic analyzer detects that a right-shift operator doesn’t make sense in this context, and the compiler issues an error message.

How can you fix the >> problem? Simply insert a space between the two > characters. The tokenizer treats whitespace as a token terminator, so it parses the following declaration correctly:

   vector<stack<int> > vs; // now OK