Change raw string literal syntax: `[#]*"` represents single-line string and `[#]*'''` represents block string (#1360)

Use `"` for simple string literals and `'''` for block string literals.

Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Zenong Zhang, 3 years ago
parent
commit
07d6a03b32
2 files changed, with 143 additions and 61 deletions
  1. 46 61
      docs/design/lexical_conventions/string_literals.md
  2. 97 0
      proposals/p1360.md

+ 46 - 61
docs/design/lexical_conventions/string_literals.md

@@ -24,9 +24,9 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 ## Overview
 
 Carbon supports both simple literals that are single-line using one double
-quotation mark (`"`) and block literals that are multi-line using three double
-quotation marks (`"""`). A block string literal may have a file type indicator
-after the first `"""`; this does not affect the string itself, but may assist
+quotation mark (`"`) and block literals that are multi-line using three single
+quotation marks (`'''`). A block string literal may have a file type indicator
+after the first `'''`; this does not affect the string itself, but may assist
 other tooling. For example:
 
 ```carbon
@@ -34,22 +34,22 @@ other tooling. For example:
 var simple: String = "example";
 
 // Block string literal:
-var block: String = """
+var block: String = '''
     The winds grow high; so do your stomachs, lords.
     How irksome is this music to my heart!
     When such strings jar, what hope of harmony?
     I pray, my lords, let me compound this strife.
         -- History of Henry VI, Part II, Act II, Scene 1, W. Shakespeare
-    """;
+    ''';
 
 // Block string literal with file type indicator:
-var code_block: String = """cpp
+var code_block: String = '''cpp
     #include <iostream>
     int main() {
         std::cout << "Hello world!";
         return 0;
     }
-    """
+    '''
 ```
 
 The indentation of a block string literal's terminating line is removed from all
@@ -98,16 +98,26 @@ This sequence is enclosed in `"`s. For example, this is a simple string literal:
 var String: lucius = "The strings, my lord, are false.";
 ```
 
-A _block string literal_ starts with `"""`, followed by an optional file type
-indicator, followed by a newline, and ends at the next instance of three double
-quotation marks whose first `"` is not part of a `\"` escape sequence. The
-closing `"""` shall be the first non-whitespace characters on that line. The
-lines between the opening line and the closing line (exclusive) are _content
-lines_. The content lines shall not contain `\` characters that do not form part
-of an escape sequence.
+Adjacent string literals are disallowed, like the following:
+
+```carbon
+// The three adjacent simple string literals `""`, `"abc"` and `""` are invalid.
+var String: block = """abc""";
+```
+
+String literals starting with triple double quotation marks `"""` are adjacent
+string literals. It is important to reject and diagnose them.
+
+A _block string literal_ starts with `'''`. Characters on the same line
+following the `'''` are an optional file type indicator. The literal ends at the
+next instance of three single quotation marks whose first `'` is not part of a
+`\'` escape sequence. The closing `'''` shall be the first non-whitespace
+characters on that line. The lines between the opening line and the closing line
+(exclusive) are _content lines_. The content lines shall not contain `\`
+characters that do not form part of an escape sequence.
 
 The _indentation_ of a block string literal is the sequence of horizontal
-whitespace preceding the closing `"""`. Each non-empty content line shall begin
+whitespace preceding the closing `'''`. Each non-empty content line shall begin
 with the indentation of the string literal. The content of the literal is formed
 as follows:
 
@@ -122,16 +132,16 @@ as follows:
 A content line is considered empty if it contains only whitespace characters.
 
 ```carbon
-var String: w = """
+var String: w = '''
   This is a string literal. Its first character is 'T' and its last character is
   a newline character. It contains another newline between 'is' and 'a'.
-  """;
+  ''';
 
-// This string literal is invalid because the """ after 'closing' terminates
+// This string literal is invalid because the ''' after 'closing' terminates
 // the literal, but is not at the start of the line.
-var String: invalid = """
-  error: closing """ is not on its own line.
-  """;
+var String: invalid = '''
+  error: closing ''' is not on its own line.
+  ''';
 ```
 
 A _file type indicator_ is any sequence of non-whitespace characters other than
@@ -143,10 +153,10 @@ the string literal's content.
 ```carbon
 // This is a block string literal. Its first two characters are spaces, and its
 // last character is a line feed. It has a file type of 'c++'.
-var String: starts_with_whitespace = """c++
+var String: starts_with_whitespace = '''c++
     int x = 1; // This line starts with two spaces.
     int y = 2; // This line starts with two spaces.
-  """;
+  ''';
 ```
 
 The file type indicator might contain semantic information beyond the file type
@@ -226,7 +236,7 @@ trailing whitespace is replaced by a line feed character, so a `\` followed by
 horizontal whitespace followed by a line terminator removes the whitespace up to
 and including the line terminator. Unlike in Rust, but like in Swift, leading
 whitespace on the line after an escaped newline is not removed, other than
-whitespace that matches the indentation of the terminating `"""`.
+whitespace that matches the indentation of the terminating `'''`.
 
 A character sequence starting with a backslash that doesn't match any known
 escape sequence is invalid. Whitespace characters other than space and, for
@@ -248,15 +258,15 @@ var String: fret = "I would 'twere something that would fret the string,\n" +
 var String: password = "\u{1F3F9}2";
 
 // This string contains no newline characters.
-var String: type_mismatch = """
+var String: type_mismatch = '''
   Shall I compare thee to a summer's day? Thou art \
   more lovely and more temperate.\
-  """;
+  ''';
 
-var String: trailing_whitespace = """
+var String: trailing_whitespace = '''
   This line ends in a space followed by a newline. \n\
       This line starts with four spaces.
-  """;
+  ''';
 ```
 
 ### Raw string literals
@@ -266,25 +276,25 @@ of string literals can be customized by prefixing the opening delimiter with _N_
 `#` characters. A closing delimiter for such a string is only recognized if it
 is followed by _N_ `#` characters, and similarly, escape sequences in such
 string literals are recognized only if the `\` is also followed by _N_ `#`
-characters. A `\`, `"`, or `"""` not followed by _N_ `#` characters has no
+characters. A `\`, `"`, or `'''` not followed by _N_ `#` characters has no
 special meaning.
 
 | Opening delimiter | Escape sequence introducer    | Closing delimiter |
 | ----------------- | ----------------------------- | ----------------- |
-| `"` / `"""`       | `\` (for example, `\n`)       | `"` / `"""`       |
-| `#"` / `#"""`     | `\#` (for example, `\#n`)     | `"#` / `"""#`     |
-| `##"` / `##"""`   | `\##` (for example, `\##n`)   | `"##` / `"""##`   |
-| `###"` / `###"""` | `\###` (for example, `\###n`) | `"###` / `"""###` |
+| `"` / `'''`       | `\` (for example, `\n`)       | `"` / `'''`       |
+| `#"` / `#'''`     | `\#` (for example, `\#n`)     | `"#` / `'''#`     |
+| `##"` / `##'''`   | `\##` (for example, `\##n`)   | `"##` / `'''##`   |
+| `###"` / `###'''` | `\###` (for example, `\###n`) | `"###` / `'''###` |
 | ...               | ...                           | ...               |
 
 For example:
 
 ```carbon
-var String: x = #"""
+var String: x = #'''
   This is the content of the string. The 'T' is the first character
   of the string.
-  """ <-- This is not the end of the string.
-  """#;
+  ''' <-- This is not the end of the string.
+  '''#;
   // But the preceding line does end the string.
 // OK, final character is \
 var String: y = #"Hello\"#;
@@ -292,31 +302,6 @@ var String: z = ##"Raw strings #"nesting"#"##;
 var String: w = #"Tab is expressed as \t. Example: '\#t'"#;
 ```
 
-Note that both a raw simple string literal and a raw block string literal can
-begin with `#"""`. These cases can be distinguished by the presence or absence
-of additional `"`s later in the same line:
-
--   In a raw simple string literal, there must be a `"` and one or more `#`s
-    later in the same line terminating the string.
--   In a raw block string literal, the rest of the line is a file type
-    indicator, which can contain neither `"` nor `#`.
-
-```carbon
-// This string is a single-line raw string literal.
-// The contents of this string start and end with exactly two "s.
-var String: ambig1 = #"""This is a raw string literal starting with """#;
-
-// This string is a raw block string literal with file-type 'This', whose
-// contents start with "is a ".
-var String: ambig2 = #"""This
-  is a block string literal with file type 'This', first character 'i',
-  and last character 'X': X\#
-  """#;
-
-// This is a single-line raw string literal, equivalent to "\"".
-var String: ambig3 = #"""#;
-```
-
 ### Encoding
 
 A string literal results in a sequence of 8-bit bytes. Like Carbon source files,
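
The raw string delimiter table in the hunk above follows a regular pattern: _N_ `#` characters are prepended to the opening quote and appended to both the escape introducer and the closing quote. A minimal Python sketch of that pattern follows. The helper name `raw_delimiters` is hypothetical and is not part of the Carbon toolchain; it only reproduces the rows of the table.

```python
from typing import Tuple


def raw_delimiters(hashes: int, block: bool = False) -> Tuple[str, str, str]:
    """Return (opening delimiter, escape introducer, closing delimiter).

    `hashes` is the number of '#' characters prefixed to the literal.
    `block` selects the block form (''') instead of the simple form (").
    Hypothetical helper for illustration only.
    """
    if hashes < 0:
        raise ValueError("hashes must be non-negative")
    quote = "'''" if block else '"'
    pad = "#" * hashes
    # Opening: hashes come before the quote. Escape introducer and closing
    # delimiter: hashes come after the backslash or the quote.
    return pad + quote, "\\" + pad, quote + pad


for n in range(4):
    print(raw_delimiters(n), raw_delimiters(n, block=True))
```

Each printed pair corresponds to one row of the table, with the simple form first and the block form second.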

+ 97 - 0
proposals/p1360.md

@@ -0,0 +1,97 @@
+# Change raw string literal syntax: `[#]\*"` represents single-line string and `[#]\*'''` represents block string
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/1360)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [The current implementation](#the-current-implementation)
+    -   [Using `"""` to start block string literals](#using--to-start-block-string-literals)
+    -   [Non-quote marker after the open quote](#non-quote-marker-after-the-open-quote)
+    -   [Use different quotes to allow `#'"'#`](#use-different-quotes-to-allow-)
+
+<!-- tocstop -->
+
+## Problem
+
+Under the current design of string literals, users may assume that a leading
+`[#]*"""` always starts a block string, and so misread the syntax.
+
+## Background
+
+The design of
+[string literals](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/string_literals.md)
+specifies that a block string literal starts with `[#]*"""`. Users may assume
+the converse: that any string literal starting with `[#]*"""` is a block string
+literal. This does not hold. Two counterexamples are `"""abc"""`, which is the
+three tokens `""`, `"abc"` and `""`, and `#"""#`, which is equivalent to `"\""`.
+Neither is a block string literal, and both may be visually confusing.
+
+## Proposal
+
+Interpret `[#]*"` as the start of single-line string literals, and `[#]*'''` as
+the start of block string literals. Disallow adjacent string literals like
+`"""abc"""`.
+
+## Details
+
+Users can easily distinguish single-line string literals from block string
+literals with the proposed change. Confusion about `"""abc"""` is eliminated
+because adjacent string literals are invalid under the proposal. Likewise,
+`#"""#` clearly represents `"\""`, because `"""` no longer starts a block
+string literal. More details can be found
+[here](/docs/design/lexical_conventions/string_literals.md).
+
+## Rationale
+
+This change helps make Carbon code
+[easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write),
+because it avoids confusion about which kind of literal a given string is.
+
+## Alternatives considered
+
+### The current implementation
+
+In addition to the confusion described above, the
+[current implementation](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/string_literals.md)
+complicates lexing. When the lexer sees `[#]+"""`, it temporarily accepts the
+syntax of both string types because the type of the string is undecided.
+Specifically, it accepts vertical whitespace even though it is not allowed in
+single-line strings. The type of the string is not decided until the lexer sees
+a closing `"#` or a new line. If it sees a closing `"#`, so that the string is
+single-line, the lexer looks back over the scanned characters for vertical
+whitespace to decide whether the single-line string is valid.
+
+### Using `"""` to start block string literals
+
+This approach loses some convenience in using raw string literals while
+addressing the problem. For example, as discussed in
+[issue #1359](https://github.com/carbon-language/carbon-lang/issues/1359),
+`#"""#` is a natural way to write a string consisting of a single `"`.
+
+### Non-quote marker after the open quote
+
+Although a marker in the style of C++ raw strings, such as `"(`, solves the
+problem, the syntax becomes more complicated and hurts readability. In addition,
+`"(")"` is no simpler than `"\""` or `#"""#`.
+
+### Use different quotes to allow `#'"'#`
+
+When disallowing adjacent string literals, we can additionally allow `[#]+'` on
+single-line string literals. Another option is to use `[#]+'` for single-line
+string literals and `[#]+"` for block string literals. In either case, `#'"'#`
+is easily confused with the character literal `'"'`, which hurts readability.
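
The effect of the proposal on lexing can be illustrated with a small sketch. Under the new rule, the kind of literal is decided from the opening delimiter alone, with no look-back over scanned characters. The Python function below is a simplified illustration only: the name `classify_opening` and its return labels are hypothetical, it inspects only the start of the text, and it does not reflect the actual Carbon lexer.

```python
def classify_opening(text: str) -> str:
    """Classify the string literal that `text` starts with.

    Returns one of the hypothetical labels "block", "simple", "invalid",
    or "not-a-string". Only the opening delimiter is inspected.
    """
    # Count the leading '#' characters that make the literal raw.
    hashes = len(text) - len(text.lstrip("#"))
    rest = text[hashes:]

    if rest.startswith("'''"):
        # [#]*''' always starts a block string literal.
        return "block"
    if rest.startswith('"'):
        # Without '#', a leading """ is the empty literal "" immediately
        # followed by another literal, which the proposal rejects.
        if hashes == 0 and rest.startswith('"""'):
            return "invalid"
        # [#]*" always starts a single-line string literal; in a raw
        # literal such as #"""#, the inner quotes are ordinary content.
        return "simple"
    return "not-a-string"


print(classify_opening('"example"'))   # simple
print(classify_opening("'''cpp"))      # block
print(classify_opening('"""abc"""'))   # invalid
print(classify_opening('#"""#'))       # simple
```

The decision never depends on what follows the opening delimiter, which is the simplification the proposal describes relative to the earlier `[#]+"""` handling.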