We want to support legacy identifiers that overlap with new keywords (for
example, base). This is being called "raw identifier syntax" using
r#<identifier>, and is based on
Rust.
Note this proposal is derived from Proposal #17: Lexical conventions.
One of Carbon's most important goals is to support program and language evolution. We know that the set of keywords in Carbon will grow over time, and the easiest kind of language change from an evolutionary perspective is one that is known to break no programs, that lets programs migrate incrementally to the new language rule, and that either has no migration cost or only imposes automatable migration cost on the code that intends to use the new feature.
We have proposals that discussed using r# but did not make a decision in favor
of it:
- Self and .Self mentions r# syntax as proposed but not in use.

Rust provides this as "Raw identifiers", using r# as a prefix (r#self). The
documented syntax is:
RAW_IDENTIFIER : r# IDENTIFIER_OR_KEYWORD (except crate, self, super, Self)
C# provides this as "verbatim identifiers", using @ as a prefix (@self). The
documented syntax is:
fragment Escaped_Identifier
    // Includes keywords and contextual keywords prefixed by '@'.
    // See note below.
    : '@' Basic_Identifier
    ;
Swift provides this as part of the identifier grammar, using backticks (`self`). The documented syntax is:
identifier → `identifier-head identifier-characters?`
A raw identifier can be specified by prefixing a word with r#, such as
r#requires. Raw identifiers can be used to introduce and use names that are
lexically identical to keywords. The declaration of a raw identifier does not
prevent the base word from being interpreted as a keyword; otherwise, they
behave identically to the word formed by removing the r# prefix.
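The rule above can be sketched as a small word classifier. This is an illustrative Python sketch, not the Carbon toolchain's lexer: the keyword set, the identifier character class, and the classify function are all assumptions made for the example.

```python
import re

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"base", "class", "extend", "fn", "requires", "return", "var"}

# An optional r# prefix followed by a word. If r# is not followed by a word,
# the regex backtracks and matches the plain word "r" instead.
_WORD = re.compile(r"(r#)?([A-Za-z_][A-Za-z0-9_]*)")


def classify(text, pos=0):
    """Return (kind, name, end) for the word starting at pos, or None."""
    match = _WORD.match(text, pos)
    if match is None:
        return None
    raw_prefix, word = match.group(1), match.group(2)
    if raw_prefix:
        # A raw identifier is always an identifier, named by the word with
        # the r# prefix removed.
        return ("identifier", word, match.end())
    # Without the prefix, the keyword interpretation still applies.
    kind = "keyword" if word in KEYWORDS else "identifier"
    return (kind, word, match.end())
```

In this sketch, r#requires and requires classify differently (identifier versus keyword), while r#foo and foo both produce an identifier named foo.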
In diagnostics, if <identifier> is also a keyword, then the raw identifier
should be expected to print with the r# prefix. Otherwise, diagnostics will
typically use the non-prefixed identifier name for consistency.
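The printing rule for diagnostics can be sketched as follows. This is a minimal Python illustration under stated assumptions: the keyword set and the format_identifier helper are hypothetical, and a real toolchain may make different choices.

```python
# Hypothetical keyword set, for illustration only.
KEYWORDS = {"base", "class", "extend", "fn", "requires", "return", "var"}


def format_identifier(name):
    """Render an identifier name for a diagnostic message.

    Names that collide with a keyword are printed with the r# prefix so the
    reader can tell them apart from the keyword; other names are printed
    without the prefix.
    """
    return f"r#{name}" if name in KEYWORDS else name
```

With this helper, an identifier named base would print as r#base, while an identifier named foo would print as foo even if the source spelled it r#foo.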
r# syntax is consistent with raw string literals, and should signal to readers
that something unusual is being done.

When considering other syntaxes, a couple of initial considerations for
r#identifier prefixing are:
- Carbon uses # prefixes for string literals, and it's likely we'll support
  syntax similar to f#"..." for interpolated string literals. The r# syntax
  offers consistency with this, and will hopefully be recognizable to users.
  - Note that Rust uses r#"..." for raw string literals, whereas Carbon uses
    #"...".
- The lexing cost of the r# prefix is hard to interpret. It could be that r is
  1-in-55 identifiers with a 100% slowdown, with linear cost scaling for
  other similar code. Or it could indicate that the additional code path
  causes incremental slowdown, in which case other code (such as f#"...")
  using the same code path may instead give constant cost scaling (negligible
  incremental cost). This may also either be reduced or become more
  significant if we enable tail calls and other optimizations. As a
  consequence, the precise overhead is difficult to quantify at this time.

Various other prefixes have been discussed, mostly using a special character prefix in order to restrict the lexing impact. In particular:
- \ prefix, as in \identifier.
  - Strings use \ escaping.
  - There is a risk of confusion between \n as an identifier and \n as a
    character escape.
    - One option is to restrict \ raw identifiers to only keywords in the
      language, meaning \n would only be a character escape. The alternative
      "Restrict raw identifier syntax to current and future keywords"
      applies to this solution.
- # prefix without r, as in #identifier.
  - The # prefix may be wanted for metaprogramming, so the r offers a way to
    keep the # prefix available for other purposes.
  - #if may look to C++ developers like a compiler directive, rather than a
    raw identifier for if.
- @ prefix, as in @identifier.
  - The @ prefix may be wanted for attributes, similar to Python. Similar to
    #, this would be conflicting.
- ` wrapping, as in `identifier`.
- Other prefixes, such as ~identifier, $identifier, or %identifier.
Raw identifier syntax is expected to be an edge case of the language. As a consequence, it should probably be expected that developers reading it will be more likely to rely on their understanding of the syntax either from other parts of Carbon, or from other languages. This means it's helpful if the syntax can be understood on its own, but if it's confusable with C++ syntax, the relative rarity could exacerbate understandability issues.
If performance of the r# prefix is prohibitive, that would be a justification
for changing approaches.
We had discussed maintaining a list of current and future keywords, and only allowing raw identifier syntax in those cases. If this were done as part of the toolchain, releases would need to push versions that "declare" future keywords without turning them into actual keywords. For a library that used those identifiers, it would initially be compatible with compiler versions up to and including the "future" keyword version; upon using raw identifier syntax, that would become the minimum compiler version. This creates a compiler versioning dependency that it might be helpful to avoid.
As an alternative approach, Carbon could provide a command line option that
libraries could use to specify future keywords that are used in the program.
While some systems such as Bazel allow libraries to indicate options they need
for compilation, other build systems such as CMake might require library users
to update their dependencies as well. The consequence would be that library
users might need to monitor options more carefully when updating compilers.
We could say that, in a scope where a raw identifier has been declared, the
token without r# now refers to the identifier instead of the keyword. If the
user actually needs the keyword within that scope, they could instead use k#
or something similar.
A particular example of this can be seen with the base keyword:
class C {
  // `base` now means this name in the scope of `C`.
  var r#base: i32;
  // To extend, `k#base` is now required.
  extend k#base: T;
}
fn MakeC() -> C {
  // The struct literal's `base` is outside the scope of `C`, so must use
  // `r#base`.
  var c: C = {.r#base = 0, .base = { ... }};
  // A member reference could use the identifier-default for `base` in `C`.
  c.base = 1;
  c.k#base = {...};
  return c;
}
The equivalent under proposed syntax (uniformly using r#base) is:
class C {
  var r#base: i32;
  extend base: T;
}
fn MakeC() -> C {
  var c: C = {.r#base = 0, .base = { ... }};
  c.r#base = 1;
  c.base = {...};
  return c;
}
At present we are deciding this is unnecessary complexity, and it's better to
require r# in all references to the identifier.
We could omit raw identifier syntax. It introduces a novel risk of underhanded code that appears to mean one thing but means a different thing, by shadowing a keyword with an identifier. This risk is discussed in Initial Analysis of Underhanded Source Code (Wheeler 2020) (page 4-2).
This concern is considered non-blocking.