
No predeclared identifiers, `Core` is a keyword (#4864)

Introduce a principle that the Carbon language should not encroach on the
developer's namespace. Satisfy this principle by making `Core` a keyword.

---------

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Richard Smith, 1 year ago
Parent commit: e257051612

+ 4 - 1
docs/design/code_and_name_organization/README.md

@@ -546,6 +546,7 @@ the caller.
 
 ```regex
 import IDENTIFIER (library NAME_PATH)?;
+import Core (library NAME_PATH)?;
 import library NAME_PATH;
 import library default;
 ```
@@ -554,7 +555,9 @@ An import with a package name `IDENTIFIER` declares a package entity named after
 the imported package, and makes API entities from the imported library available
 through it. `Main` cannot be imported from other packages; in other words, only
 `import library NAME_PATH` syntax can be used to import from `Main`. Imports of
-`Main//default` are invalid.
+`Main//default` are invalid. The keyword `Core` can be used as a package name in
+an import in order to import portions of the standard library that are not part
+of the prelude.
 
 The full name path is a concatenation of the names of the package entity, any
 namespace entities applied, and the final entity addressed. Child namespaces or

+ 3 - 1
docs/design/lexical_conventions/words.md

@@ -39,7 +39,7 @@ in Unicode Normalization Form C (NFC).
 
 <!--
 Keep in sync:
-- utils/textmate/Syntaxes/Carbon.plist
+- utils/textmate/Syntaxes/carbon.tmLanguage.json
 - utils/tree_sitter/queries/highlights.scm
 -->
 
@@ -54,6 +54,7 @@ The following words are interpreted as keywords:
 -   `auto`
 -   `base`
 -   `break`
+-   `Core`
 -   `case`
 -   `choice`
 -   `class`
@@ -92,6 +93,7 @@ The following words are interpreted as keywords:
 -   `return`
 -   `returned`
 -   `Self`
+-   `self`
 -   `template`
 -   `then`
 -   `type`

+ 122 - 0
docs/project/principles/namespace_cleanliness.md

@@ -0,0 +1,122 @@
+# Principle: Namespace cleanliness
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Background](#background)
+-   [Principle](#principle)
+-   [Applications of this principle](#applications-of-this-principle)
+-   [Exceptions](#exceptions)
+-   [Alternatives considered](#alternatives-considered)
+
+<!-- tocstop -->
+
+## Background
+
+Names and entities in a program can come from multiple sources -- from a local
+declaration, from an import, from the standard library, or from the prelude.
+Names can be imported from Carbon code or imported or derived from code written
+in another language such as C++ or an interface description language such as
+that of Protobuf, MIDL, or CORBA. Names can be selected for use in a program
+that language designers later decide they want to use as keywords. And in order
+to use a library, it is sometimes necessary to redeclare the same name that that
+library chose.
+
+This puts a lot of pressure on the language to support a free choice of naming
+for entities. Different languages make different choices in this space:
+
+-   Many languages have a set of keywords that are not usable as identifiers,
+    with no workaround. If this set collides with a name needed by user code,
+    the user is left to solve this problem, often by rewriting the identifier in
+    some way (`klass` or `class_`), which sometimes conflicts with the general
+    naming convention used by the code. And conversely, suboptimal choices are
+    made for new language keywords to avoid causing problems for existing code.
+-   C and C++ reserve a family of identifiers, such as those beginning with an
+    underscore and a capital letter. However, it's not clear which audiences the
+    reserved identifiers are for, and this leads to collisions between standard
+    library vendors and compiler authors, as well as between implementation
+    extensions and language extensions.
+    -   MSVC provides a `__identifier(keyword)` extension that allows using a
+        keyword as an identifier. This extension is also implemented by Clang in
+        `-fms-extensions` mode.
+    -   GCC provides an `__asm__(symbol)` extension that allows a specific
+        symbol to be assigned to an object or function, which provides ABI
+        compatibility but not source compatibility with code that uses a keyword
+        as a symbol name. This extension is also implemented by Clang.
+-   Python reserves some identifiers but still allows them to be freely
+    overwritten (such as `bool`) and reserves some identifiers but rejects
+    assignment to them (such as `True`).
+-   Rust provides a raw identifier syntax to allow most identifiers with
+    reserved meaning to be used by a program, but
+    [not all](https://internals.rust-lang.org/t/raw-identifiers-dont-work-for-all-identifiers/9094):
+    `self`, `Self`, `super`, `extern`, and `crate` cannot be used as raw
+    identifiers. Rust also predeclares a large number of library names in every
+    file, but allows them to be shadowed by user declarations with the same
+    name.
+-   Swift provides a raw identifier syntax using backticks: `` `class` ``, and
+    is
+    [considering](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0451-escaped-identifiers.md)
+    extending this to allow arbitrary non-word-shaped character sequences
+    between the `` ` ``s.
+
+Carbon provides
+[raw identifier syntax](/docs/design/lexical_conventions/words.md#raw-identifiers),
+for example `r#for`, to allow using keywords as identifiers. Carbon also intends
+to have strict shadowing rules that may make predeclared identifiers that are
+_not_ keywords difficult or impossible to redeclare and use in inner scopes.
+
+## Principle
+
+In Carbon, the language does not encroach on the developer's namespace. There
+are no predeclared or reserved identifiers. In cases where the language gives
+special meaning to a word or to word-shaped syntax such as `i32`, that special
+meaning can always be undone with raw identifier syntax, `r#`.
+
+Conversely, when adding language keywords, we will not select an inferior
+keyword merely to avoid the risk of breaking existing programs. We will still
+take into account how often it is desirable to use the word as an identifier,
+including in domain-specific contexts, because that is a factor in whether it
+would make a good keyword, and will manage the rollout of new keywords to make
+it straightforward to migrate existing uses to `r#` or a different name.
+
+## Applications of this principle
+
+-   Words like `final` and `base` that only have special meaning in a few
+    contexts, and could otherwise be made available as identifiers, are keywords
+    in Carbon. `{.base = ...}` and `{.r#base = ...}` specify different member
+    names.
+-   Words like `self` that are declared by the developer but nonetheless have
+    special language-recognized meaning are keywords in Carbon. `[self: Self]`
+    introduces a self parameter; `[r#self:! Self]` introduces a deduced
+    parameter.
+-   Words like `Self` that are implicitly declared by the language in some
+    contexts are keywords, even though we could treat them as user-declared
+    identifiers that are merely implicitly declared in some cases.
+-   Words like `i32` that are treated as type literals rather than keywords can
+    be used as identifiers with raw identifier syntax `r#i32`.
+-   There are no predeclared identifiers imported from the prelude. If an entity
+    is important enough to be available by default, we should add a keyword, and
+    allow the name of the entity to be used for other purposes with `r#`.
+-   The predeclared package name `Core` is a keyword. A package named `r#Core`
+    is an unrelated package, and `Core.foo` always refers to members of the
+    predeclared `Core` package.
+
+## Exceptions
+
+For now, we reserve the package names `Main` and `Cpp`. These names aren't
+predeclared in any scope, and the name `Main` is not even usable from within
+source files to refer to the main package. However, there is currently no way to
+avoid collisions between the package name `Cpp` and a top-level entity named
+`Cpp` if they are both used in the same source file.
+
+## Alternatives considered
+
+-   [Have both predeclared identifiers and keywords](/proposals/p4864.md#have-both-predeclared-identifiers-and-keywords)
+-   [Reserve words with a certain spelling](/proposals/p4864.md#reserve-words-with-a-certain-spelling)
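
To illustrate the principle that any special meaning can be undone with `r#`, here is a minimal Python sketch of how a lexer might classify word-shaped tokens. It is not the Carbon toolchain's implementation: the function `classify_word`, the abbreviated `KEYWORDS` set, and the `TYPE_LITERAL` pattern approximating sized type literals such as `i32` are illustrative assumptions.

```python
import re

# Hypothetical, abbreviated subset of Carbon keywords taken from the list in
# docs/design/lexical_conventions/words.md; not the full list.
KEYWORDS = {
    "auto", "base", "break", "case", "choice", "class", "Core",
    "return", "returned", "Self", "self", "template", "then", "type",
}

# Assumed approximation of sized type literals like `i32`, `u8`, or `f64`:
# a prefix letter followed by a positive integer without leading zeros.
TYPE_LITERAL = re.compile(r"[iuf][1-9][0-9]*")


def classify_word(word: str) -> tuple[str, str]:
    """Classifies a word-shaped token, returning (kind, spelling)."""
    if word.startswith("r#"):
        # Raw identifier syntax removes any special meaning, whatever the
        # spelling after `r#` would otherwise mean.
        return ("identifier", word[2:])
    if word in KEYWORDS:
        return ("keyword", word)
    if TYPE_LITERAL.fullmatch(word):
        return ("type_literal", word)
    # Everything else is an ordinary identifier: nothing is predeclared.
    return ("identifier", word)
```

In this sketch `Core` classifies as a keyword while `r#Core` is an ordinary identifier spelled `Core`, mirroring how `Core.foo` always refers to the predeclared package and `r#Core` names an unrelated entity. Checking the `r#` prefix before anything else is what makes the escape hatch uniform for keywords and type literals alike.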

+ 177 - 0
proposals/p4864.md

@@ -0,0 +1,177 @@
+# No predeclared identifiers, `Core` is a keyword
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/4864)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+-   [Rationale](#rationale)
+-   [Future work](#future-work)
+    -   [Package name `Cpp`](#package-name-cpp)
+    -   [Package name `Main`](#package-name-main)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Have both predeclared identifiers and keywords](#have-both-predeclared-identifiers-and-keywords)
+    -   [Reserve words with a certain spelling](#reserve-words-with-a-certain-spelling)
+
+<!-- tocstop -->
+
+## Abstract
+
+Introduce a principle that the Carbon language should not encroach on the
+developer's namespace. Satisfy this principle by making `Core` a keyword.
+
+## Problem
+
+Ongoing design work needs rules for how to expose types such as a primitive
+array type to Carbon code, and in particular, if we choose to make it available
+by default, whether that should be accomplished by a keyword or a predeclared
+identifier.
+
+## Background
+
+See the
+[Background section of the added principle](/docs/project/principles/namespace_cleanliness.md#background).
+
+## Proposal
+
+We choose not to have any predeclared identifiers in Carbon. If a word has
+special meaning to the language, then that word is a keyword, and a plain
+identifier with no special meaning is always available using raw identifier
+syntax.
+
+## Details
+
+See [the principle document](/docs/project/principles/namespace_cleanliness.md)
+for details of the added principle. In addition, we make one change and one
+clarification:
+
+-   `Core` is changed from being an identifier that happens to be the name of
+    the Carbon standard library, and happens to be predeclared in every source
+    file as naming that library, to being a keyword. The keyword can only be
+    used:
+
+    -   When importing the `Core` package.
+    -   When implementing the `Core` package as part of the language
+        implementation.
+    -   As a keyword naming the `Core` package, much like the `package` keyword.
+
+    The identifier `r#Core` can be used freely and does not conflict with the
+    keyword. This includes use of `r#Core` as the name of a package. Language
+    constructs that are defined in terms of entities in the `Core` package refer
+    specifically to the package named with the _keyword_ `Core`, not to any
+    other entity named `Core`.
+
+-   The `self` keyword is now included in the list of keywords. It is already
+    treated as a keyword by the toolchain.
+
+## Rationale
+
+-   [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
+    -   Code generation tools can have a uniform handling for all words with
+        special meaning, with no need to alter the spelling of names from other
+        languages.
+    -   Language tools can determine the meaning of `Core.<name>` without
+        needing to do any name lookup or sophisticated analysis.
+-   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
+    -   Migration between versions of Carbon with a changed set of reserved
+        words can be done uniformly.
+    -   Adding names to the prelude remains a non-breaking change. Adding new
+        predeclared names requires adding a keyword, with the same cost and
+        value tradeoffs regardless of whether the keyword names a library
+        declaration or introduces new language syntax.
+-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+    -   Syntax highlighting tools can easily distinguish between words with
+        special meaning and words with program-defined meaning.
+    -   The meaning of core language constructs can be defined as a rewrite in
+        terms of `Core.<name>` without concern that `Core` may have some
+        different local interpretation.
+-   [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
+    -   All C++ identifiers are nameable from Carbon code without conflicts.
+        Virtual functions introduced in C++ can be overridden in Carbon
+        regardless of their name. C++ code can be migrated to Carbon even if its
+        name in C++ has special meaning in Carbon.
+-   [Principle: Prefer providing only one way to do a given thing](/docs/project/principles/one_way.md)
+    -   This proposal specifies that there is only one way to give words special
+        meaning in Carbon, and one way to resolve issues if that special meaning
+        conflicts with another desired meaning.
+
+## Future work
+
+### Package name `Cpp`
+
+The special package name `Cpp` that refers to code written in C++ is not made a
+keyword by this proposal, but this proposal is also not deciding that it should
+_not_ be a keyword. While this name has special meaning to the language, it's
+not predeclared in any context, so it's considered to be out of scope. A future
+proposal that describes the details of C++ import should determine whether this
+name becomes a keyword. Notably, making `Cpp` a keyword would also allow an
+`import Cpp` declaration to have custom syntax, which may be useful.
+
+### Package name `Main`
+
+The special package name `Main` that is currently reserved in all package name
+contexts is not made a keyword in this proposal either. There would be no
+meaning in making it a keyword, as it is never used as a special package name in
+Carbon source files. However, we could consider using an empty package name as
+the name of the main package, and unreserving the package name `Main`, if it
+becomes a concern that we reserve this name.
+
+## Alternatives considered
+
+### Have both predeclared identifiers and keywords
+
+We could provide both predeclared identifiers and keywords. Many languages
+follow this path. However, predeclared identifiers have some problems compared
+to keywords:
+
+-   In order to locally declare a name matching a predeclared identifier, the
+    name would need to be shadowed.
+    -   Such shadowing may be invalid, depending on how the name is used.
+    -   Readability is harmed by using a name used as basic vocabulary with a
+        different, local meaning.
+    -   Shadowing a predeclared identifier typically makes the original name
+        hard to access -- an alias or similar must be established in advance.
+-   There need to be two different stories for how to deal with adding a new
+    word with special meaning to the language, depending on whether it is a
+    keyword.
+-   For each word with special meaning, we must make an arbitrary decision as to
+    which kind it is, resulting in a largely meaningless distinction that
+    nonetheless is visible and would need to be known by developers in some
+    contexts.
+
+### Reserve words with a certain spelling
+
+We could reserve words with certain spellings for future use as keywords or as
+vendor extensions. Some languages do this:
+
+-   C reserves words starting with an underscore followed by a capital letter or
+    an underscore.
+-   C++ additionally reserves words containing a double underscore anywhere.
+-   Python uses the `__name__` namespace for certain special names, and by
+    convention these names are reserved for that purpose.
+
+In Carbon we could accomplish this by saying that all words of the reserved
+forms are keywords, with no meaning ascribed to them yet.
+
+However, we do not have a clear need for such reserved words at this time, and
+we would not want to use such spellings when we do add language keywords later.
+Moreover, C++ programs frequently declare reserved words in practice, and we
+should expect the same in Carbon. Without enforcement, the names are not
+effectively reserved.
+
+If we find a need at a later time to introduce vendor-specific language
+extension keywords, we can revisit this, but should also consider alternatives
+such as a `k#foo` spelling to turn what is normally an identifier into a
+(potentially vendor-specific) keyword.
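
The C and C++ reserved spellings listed under "Reserve words with a certain spelling" can be checked mechanically. Below is a minimal Python sketch; the helper names are hypothetical, and it covers only the spelling rules described in that section, ignoring scope-dependent rules such as C reserving file-scope identifiers that begin with a single underscore.

```python
import re

# Reserved in C (and therefore C++): an identifier starting with an
# underscore followed by an uppercase letter or a second underscore.
_C_RESERVED_PREFIX = re.compile(r"_[A-Z_]")


def is_reserved_in_c(name: str) -> bool:
    """Returns True if `name` has a spelling C reserves in every scope."""
    # re.match anchors at the start of the string.
    return _C_RESERVED_PREFIX.match(name) is not None


def is_reserved_in_cpp(name: str) -> bool:
    """Returns True if `name` has a spelling C++ reserves for any use."""
    # C++ additionally reserves any identifier containing `__` anywhere.
    return is_reserved_in_c(name) or "__" in name
```

A checker like this can only diagnose; as the section notes, without enforcement such names are not effectively reserved.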

+ 1 - 1
utils/textmate/Syntaxes/carbon.tmLanguage.json

@@ -223,7 +223,7 @@
       "patterns": [
         {
           "name": "support.class.carbon",
-          "match": "(?<=\\bpackage\\s)\\w+"
+          "match": "(?<=\\b(package|Core)\\s)\\w+"
         },
         {
           "name": "support.variable.carbon",

+ 2 - 0
utils/tree_sitter/queries/highlights.scm

@@ -89,6 +89,7 @@
   "auto"
   "base"
   "break"
+  ; "Core"
   "case"
   "choice"
   "class"
@@ -126,6 +127,7 @@
   "return"
   "returned"
   "Self"
+  ; "self"
   "template"
   "then"
   "type"