2 years ago · 6907870a18
--- a/proposals/p3797.md
+++ b/proposals/p3797.md
@@ -0,0 +1,274 @@
 
				+# Raw identifier syntax
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+[Pull request](https://github.com/carbon-language/carbon-lang/pull/3797)
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Abstract](#abstract)
			
 
				+-   [Problem](#problem)
			
 
				+-   [Background](#background)
			
 
				+    -   [Prior discussion](#prior-discussion)
			
 
				+    -   [Other languages](#other-languages)
			
 
				+-   [Proposal](#proposal)
			
 
				+    -   [Diagnostics](#diagnostics)
			
 
				+-   [Rationale](#rationale)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+    -   [Other raw identifier syntaxes](#other-raw-identifier-syntaxes)
			
 
				+    -   [Restrict raw identifier syntax to current and future keywords](#restrict-raw-identifier-syntax-to-current-and-future-keywords)
			
 
				+    -   [Don't require syntax for references to raw identifiers](#dont-require-syntax-for-references-to-raw-identifiers)
			
 
				+    -   [Don't provide raw identifier syntax](#dont-provide-raw-identifier-syntax)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Abstract
			
 
				+
			
 
				+We want to support legacy identifiers that overlap with new keywords (for
			
 
				+example, `base`). This is called "raw identifier syntax", written as
			
 
				+`r#<identifier>`, and is based on
			
 
				+[Rust](https://doc.rust-lang.org/reference/identifiers.html).
			
 
				+
			
 
				+Note this proposal is derived from
			
 
				+[Proposal #17: Lexical conventions](https://github.com/carbon-language/carbon-lang/pull/17).
			
 
				+
			
 
				+## Problem
			
 
				+
			
 
				+One of Carbon's most important goals is to support program and language
			
 
				+evolution. We know that the set of keywords in Carbon will grow over time, and
			
 
				+the easiest kind of language change from an evolutionary perspective is one that
			
 
				+is known to break no programs, that lets programs migrate incrementally to the
			
 
				+new language rule, and that either has no migration cost or only imposes
			
 
				+automatable migration cost on the code that intends to use the new feature.
			
 
				+
			
 
				+## Background
			
 
				+
			
 
				+### Prior discussion
			
 
				+
			
 
				+We have proposals that discussed using `r#` but did not make a decision in favor
			
 
				+of it:
			
 
				+
			
 
				+-   [Proposal #17: Lexical conventions](https://github.com/carbon-language/carbon-lang/pull/17)
			
 
				+    originally proposed it, but when it was split into multiple proposals, raw
			
 
				+    identifiers were not retained.
			
 
				+    -   This proposal copies substantial parts of its text from here.
			
 
				+-   [Proposal #2107: Clarify rules around `Self` and `.Self`](https://github.com/carbon-language/carbon-lang/pull/2107)
			
 
				+    mentions `r#` syntax as proposed but not in use.
			
 
				+
			
 
				+### Other languages
			
 
				+
			
 
				+[Rust](https://doc.rust-lang.org/reference/identifiers.html) provides this as
			
 
				+"Raw identifiers", using `r#` as a prefix (`r#match`). The documented syntax is:
			
 
				+
			
 
				+```
			
 
				+RAW_IDENTIFIER : r# IDENTIFIER_OR_KEYWORD Except crate, self, super, Self
			
 
				+```
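
As a small illustration of the Rust behavior described above, the following sketch declares and calls a function named after the keyword `match`. It also shows that, for a non-keyword, the prefixed and unprefixed spellings name the same thing. The names `r#match`, `value`, and `demo` are chosen for illustration only and do not come from the proposal.

```rust
// `match` is a Rust keyword, so a function with that name must be
// declared and referenced with the `r#` prefix.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn demo() -> (bool, i32) {
    // `value` is not a keyword, so `r#value` and `value` are the same name.
    let value = 41;
    let next = r#value + 1;
    (r#match("foo", "foobar"), next)
}
```

Note that the grammar above excludes `crate`, `self`, `super`, and `Self`, so those four words cannot be written as raw identifiers in Rust.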
			
 
				+
			
 
				+[C#](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/verbatim)
			
 
				+provides this as "verbatim identifiers", using `@` as a prefix (`@self`). The
			
 
				+[documented syntax](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/lexical-structure#643-identifiers)
			
 
				+is:
			
 
				+
			
 
				+```
			
 
				+fragment Escaped_Identifier
			
 
				+    // Includes keywords and contextual keywords prefixed by '@'.
			
 
				+    // See note below.
			
 
				+    : '@' Basic_Identifier
			
 
				+    ;
			
 
				+```
			
 
				+
			
 
				+[Swift](https://docs.swift.org/swift-book/documentation/the-swift-programming-language/lexicalstructure/#Identifiers)
			
 
				+provides this as part of the identifier grammar, using backticks (\`self\`). The
			
 
				+documented syntax is:
			
 
				+
			
 
				+```
			
 
				+identifier → `identifier-head identifier-characters?`
			
 
				+```
			
 
				+
			
 
				+## Proposal
			
 
				+
			
 
				+A _raw identifier_ can be specified by prefixing a word with `r#`, such as
			
 
				+`r#requires`. Raw identifiers can be used to introduce and use names that are
			
 
				+lexically identical to keywords. The declaration of a raw identifier does not
			
 
				+prevent the base word from being interpreted as a keyword. In all other
			
 
				+respects, a raw identifier behaves identically to the word formed by removing
			
 
				+the `r#` prefix.
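
One way to picture the rule above is a word classifier in a lexer. This is a minimal sketch written in Rust, not the Carbon toolchain's implementation: the `Token` type, the `lex_word` function, and the three-entry `KEYWORDS` list are all hypothetical, and the sketch assumes the word has already been split out of the source text.

```rust
/// Token kinds for this sketch (hypothetical, not the toolchain's types).
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(String),
    Identifier(String),
}

/// Hypothetical keyword subset, for illustration only.
const KEYWORDS: [&str; 3] = ["base", "if", "requires"];

/// Classifies one word that has already been split out of the source.
fn lex_word(text: &str) -> Token {
    match text.strip_prefix("r#") {
        // With the prefix, the word is always an identifier, and the
        // prefix is not part of the resulting name.
        Some(name) => Token::Identifier(name.to_string()),
        // Without the prefix, a keyword spelling stays a keyword.
        None if KEYWORDS.iter().any(|k| *k == text) => Token::Keyword(text.to_string()),
        None => Token::Identifier(text.to_string()),
    }
}
```

Under these assumptions, `lex_word("base")` yields a keyword and `lex_word("r#base")` yields the identifier `base`. `lex_word("r#foo")` and `lex_word("foo")` yield the same identifier.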
			
 
				+
			
 
				+### Diagnostics
			
 
				+
			
 
				+In diagnostics, if a raw identifier's name matches a keyword, the identifier
			
 
				+should be expected to print with the `r#` prefix. Otherwise, it will typically
			
 
				+print as the non-prefixed identifier name for consistency.
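
The printing rule above could be sketched as follows. This is an illustrative Rust helper under stated assumptions, not the toolchain's diagnostic code: `spell_for_diagnostic` and the `DIAGNOSTIC_KEYWORDS` list are hypothetical names, and the keyword list is a small stand-in subset.

```rust
/// Hypothetical keyword subset, for illustration only.
const DIAGNOSTIC_KEYWORDS: [&str; 3] = ["base", "if", "requires"];

/// Returns the spelling of an identifier name for a diagnostic message.
fn spell_for_diagnostic(name: &str) -> String {
    if DIAGNOSTIC_KEYWORDS.iter().any(|k| *k == name) {
        // The name collides with a keyword, so keep the `r#` prefix to
        // make clear that the identifier is meant.
        format!("r#{}", name)
    } else {
        // No collision: print the plain name, however it was written.
        name.to_string()
    }
}
```

With this sketch, an identifier declared as `r#base` prints as `r#base`, while one declared as `r#count` prints as `count`.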
			
 
				+
			
 
				+## Rationale
			
 
				+
			
 
				+-   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
			
 
				+    -   Raw identifier syntax provides a way to add keywords to the language
			
 
				+        while still offering code a reasonable upgrade path, which can also be
			
 
				+        automated.
			
 
				+-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
			
 
				+    -   The `r#` syntax is consistent with raw string literals, and should
			
 
				+        signal to readers that something unusual is being done.
			
 
				+-   [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
			
 
				+    -   C++ code using identifiers that are keywords in Carbon can use raw
			
 
				+        identifier syntax.
			
 
				+    -   The converse does not work: if Carbon code has an identifier that is a
			
 
				+        C++ keyword, it needs to be renamed for use from C++ code.
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+### Other raw identifier syntaxes
			
 
				+
			
 
				+When considering other syntaxes, a few initial considerations for
			
 
				+`r#identifier` prefixing are:
			
 
				+
			
 
				+-   We use `#` prefixes for
			
 
				+    [string literals](/docs/design/lexical_conventions/string_literals.md), and
			
 
				+    it's likely we'll support syntax similar to `f#"..."` for interpolated
			
 
				+    string literals. The `r#` syntax offers consistency with this, and will
			
 
				+    hopefully be recognizable to users.
			
 
				+-   Consistency with Rust.
			
 
				+    -   Rust uses `r#"..."` for raw string literals, whereas Carbon uses
			
 
				+        `#"..."`.
			
 
				+-   Introduces another code execution path in lexing identifiers. This likely
			
 
				+    causes a slowdown;
			
 
				+    [PR #3044](https://github.com/carbon-language/carbon-lang/pull/3344)
			
 
				+    indicates roughly 2%, although that was run on a system with noisy
			
 
				+    benchmarks -- details would require a better system for benchmarking. Note 2%
			
 
				+    could represent that `r` is 1-in-50 identifiers with a 100% slowdown with
			
 
				+    linear cost scaling for other similar code, or it could indicate that the
			
 
				+    additional code path causes incremental slowdown but if other code (such as
			
 
				+    `f#"..."`) used the same codepath it may instead have constant cost scaling
			
 
				+    (negligible incremental cost). This may also be either reduced or become
			
 
				+    more significant if we enable tail calls and other optimizations. As a
			
 
				+    consequence, the precise overhead is difficult to quantify at this time.
			
 
				+
			
 
				+Various other prefixes have been discussed, mostly using a special character
			
 
				+prefix in order to restrict the lexing impact. In particular:
			
 
				+
			
 
				+-   `\` prefix, as in `\identifier`.
			
 
				+    -   Similar to `\` escaping in strings.
			
 
				+    -   More intuitive "escaping" semantic for some developers versus `r#`.
			
 
				+    -   Creates a different meaning for `\n` as an identifier versus `\n` as a
			
 
				+        character escape.
			
 
				+        -   Some of this could be addressed by restricting `\` raw identifiers
			
 
				+            to only keywords in the language, meaning `\n` would only be a
			
 
				+            character escape. The alternative
			
 
				+            [Restrict raw identifier syntax to current and future keywords](#restrict-raw-identifier-syntax-to-current-and-future-keywords)
			
 
				+            applies to this solution.
			
 
				+-   `#` prefix without `r`, as in `#identifier`.
			
 
				+    -   Would be more consistent with string literals, and avoid the lexing
			
 
				+        overhead.
			
 
				+    -   We are considering using a `#` prefix for metaprogramming, so the `r`
			
 
				+        offers a way to keep the `#` prefix available for other purposes.
			
 
				+    -   `#if` may look to C++ developers like a compiler directive, rather than
			
 
				+        a raw identifier for `if`.
			
 
				+-   `@` prefix, as in `@identifier`.
			
 
				+    -   Consistent with C#.
			
 
				+    -   We've also discussed using a `@` prefix for attributes, similar to
			
 
				+        Python. As with `#`, this would conflict.
			
 
				+-   `` ` `` wrapping, as in `` `identifier` ``.
			
 
				+    -   Consistent with Swift.
			
 
				+    -   We prefer not to use backticks for Carbon syntax so that it is easy to
			
 
				+        write in Markdown, which uses backticks for inline code. For example, to
			
 
				+        render a backtick there are a couple options:
			
 
				+        -   Use more backticks: ``` `` ` `` ```
			
 
				+        -   Use inline HTML: ``<code>\`</code>``
			
 
				+-   Other currently unused characters as prefix, such as `~identifier`,
			
 
				+    `$identifier`, or `%identifier`.
			
 
				+    -   We expect raw identifiers to be relatively rare. There may be future
			
 
				+        uses for these characters that allow us to serve a broader use-case.
			
 
				+    -   While we could change raw string literal syntax to use the same
			
 
				+        character, it would be helpful if raw string literal syntax had some
			
 
				+        degree of cross-language syntactic consistency in order to reduce
			
 
				+        learning curves.
			
 
				+
			
 
				+Raw identifier syntax is expected to be an edge case of the language. As a
			
 
				+consequence, it should probably be expected that developers reading it will be
			
 
				+more likely to rely on their understanding of the syntax either from other parts
			
 
				+of Carbon, or from other languages. This means it's helpful if the syntax can be
			
 
				+understood on its own, but if it's confusable with C++ syntax, the relative
			
 
				+rarity could exacerbate understandability issues.
			
 
				+
			
 
				+If performance of the `r#` prefix is prohibitive, that would be a justification
			
 
				+for changing approaches.
			
 
				+
			
 
				+### Restrict raw identifier syntax to current and future keywords
			
 
				+
			
 
				+We had discussed maintaining a list of current and future keywords, and only
			
 
				+allowing raw identifier syntax in those cases. If this were done as part of the
			
 
				+toolchain, releases would need to push versions that "declare" future keywords
			
 
				+without turning them into actual keywords. For a library that used those
			
 
				+identifiers, it would initially be compatible with compiler versions up to and
			
 
				+including the "future" keyword version; upon using raw identifier syntax, that
			
 
				+would become the minimum compiler version. This creates a compiler versioning
			
 
				+dependency that it might be helpful to avoid.
			
 
				+
			
 
				+As an alternative approach, Carbon could provide a command line option which
			
 
				+libraries could use to specify future keywords that are used in the program.
			
 
				+While some systems such as `bazel` allow libraries to indicate options they need
			
 
				+for compilation, other build systems such as `cmake` might require library users
			
 
				+to update their dependencies as well. The consequence would be that library
			
 
				+users might need to more carefully monitor options when updating compilers.
			
 
				+
			
 
				+### Don't require syntax for references to raw identifiers
			
 
				+
			
 
				+We could say that, in a scope where a raw identifier has been declared, the
			
 
				+token without `r#` now refers to the identifier instead of the keyword. If the
			
 
				+user actually needs the keyword within that scope, they could instead use `k#`
			
 
				+or something similar.
			
 
				+
			
 
				+A particular example of this can be seen with the `base` keyword:
			
 
				+
			
 
				+```
			
 
				+class C {
			
 
				+    // `base` now means this name in the scope of `C`.
			
 
				+    var r#base: i32;
			
 
				+    // To extend, `k#base` is now required.
			
 
				+    extend k#base: T;
			
 
				+}
			
 
				+
			
 
				+fn MakeC() -> C {
			
 
				+  // The struct literal's `base` is outside the scope of `C`, so must use
			
 
				+  // `r#base`.
			
 
				+  var c: C = {.r#base = 0, .base = { ... }};
			
 
				+  // A member reference could use the identifier-default for `base` in `C`.
			
 
				+  c.base = 1;
			
 
				+  c.k#base = {...};
			
 
				+  return c;
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+The equivalent under the proposed syntax (uniformly using `r#base`) is:
			
 
				+
			
 
				+```
			
 
				+class C {
			
 
				+    var r#base: i32;
			
 
				+    extend base: T;
			
 
				+}
			
 
				+
			
 
				+fn MakeC() -> C {
			
 
				+  var c: C = {.r#base = 0, .base = { ... }};
			
 
				+  c.r#base = 1;
			
 
				+  c.base = {...};
			
 
				+  return c;
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+At present, we consider this unnecessary complexity, and think it's better to
			
 
				+require `r#` in all references to the identifier.
			
 
				+
			
 
				+### Don't provide raw identifier syntax
			
 
				+
			
 
				+We could omit raw identifier syntax. It introduces a novel risk of underhanded
			
 
				+code that appears to mean one thing but means a different thing, by shadowing a
			
 
				+keyword with an identifier. This risk is discussed in
			
 
				+[Initial Analysis of Underhanded Source Code (Wheeler 2020)](https://www.ida.org/-/media/feature/publications/i/in/initial-analysis-of-underhanded-source-code/d-13166.ashx)
			
 
				+(page 4-2).
			
 
				+
			
 
				+This concern is considered non-blocking.