Whitespace

Overview

The exact lexical form of Carbon whitespace has not yet been settled. However, Carbon will follow lexical conventions for whitespace based on Unicode Annex #31. TODO: Update this once the precise rules are decided; see the Unicode source files proposal.

Unicode Annex #31 suggests treating as whitespace the characters that have the Unicode property Pattern_White_Space, which currently comprises these 11 characters:

  • U+0009 CHARACTER TABULATION (horizontal tab)
  • U+000A LINE FEED (traditional newline)
  • U+000B LINE TABULATION (vertical tab)
  • U+000C FORM FEED (page break)
  • U+000D CARRIAGE RETURN
  • U+0020 SPACE
  • U+0085 NEXT LINE (Unicode newline)
  • U+200E LEFT-TO-RIGHT MARK
  • U+200F RIGHT-TO-LEFT MARK
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR
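
As a minimal sketch of the list above, the following Python snippet tests whether a character is in Pattern_White_Space. The names `PATTERN_WHITE_SPACE` and `is_pattern_white_space` are hypothetical and are not part of Carbon or its toolchain. The set is hard-coded from the 11 characters listed here, so it would need updating if the Unicode property ever changes.

```python
# Hypothetical helper: the 11 code points listed above, hard-coded.
PATTERN_WHITE_SPACE = frozenset(
    {
        "\u0009",  # CHARACTER TABULATION (horizontal tab)
        "\u000A",  # LINE FEED
        "\u000B",  # LINE TABULATION (vertical tab)
        "\u000C",  # FORM FEED
        "\u000D",  # CARRIAGE RETURN
        "\u0020",  # SPACE
        "\u0085",  # NEXT LINE
        "\u200E",  # LEFT-TO-RIGHT MARK
        "\u200F",  # RIGHT-TO-LEFT MARK
        "\u2028",  # LINE SEPARATOR
        "\u2029",  # PARAGRAPH SEPARATOR
    }
)


def is_pattern_white_space(ch: str) -> bool:
    """Return True if the single character `ch` is in the set above."""
    return ch in PATTERN_WHITE_SPACE
```

Note that this set is narrower than what general-purpose whitespace tests usually accept: for example, U+00A0 NO-BREAK SPACE is not in Pattern_White_Space, so `is_pattern_white_space("\u00A0")` returns `False`.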

The quantity and kind of whitespace separating tokens are ignored except where otherwise specified.
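
To illustrate the rule above, here is a toy sketch in Python, not Carbon's actual lexer. It assumes whitespace is exactly the 11 Pattern_White_Space characters and treats every maximal run of other characters as one token. The function name `split_tokens` is hypothetical, and the "except where otherwise specified" cases are not modeled.

```python
# Assumption: whitespace is exactly the 11 Pattern_White_Space characters.
PATTERN_WHITE_SPACE = frozenset(
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u200E\u200F\u2028\u2029"
)


def split_tokens(source: str) -> list[str]:
    """Split `source` into runs of non-whitespace characters.

    Any non-empty run of whitespace acts as a single separator, so the
    number and kind of whitespace characters do not affect the result.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in source:
        if ch in PATTERN_WHITE_SPACE:
            # End the token in progress, if any; extra whitespace is dropped.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

Under these assumptions, `split_tokens("var x = 1")` and `split_tokens("var\t\tx\n=\u2028 1")` produce the same token list, because only the positions of the separators matter, not how many or which whitespace characters form them.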