
No predeclared identifiers, `Core` is a keyword (#4864)

Introduce a principle that the Carbon language should not encroach on the
developer's namespace. Satisfy this principle by making `Core` a keyword.

---------

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Richard Smith, 1 year ago
Commit e257051612

+ 4 - 1
docs/design/code_and_name_organization/README.md

@@ -546,6 +546,7 @@ the caller.
 
 ```regex
 import IDENTIFIER (library NAME_PATH)?;
+import Core (library NAME_PATH)?;
 import library NAME_PATH;
 import library default;
 ```
@@ -554,7 +555,9 @@ An import with a package name `IDENTIFIER` declares a package entity named after
 the imported package, and makes API entities from the imported library available
 through it. `Main` cannot be imported from other packages; in other words, only
 `import library NAME_PATH` syntax can be used to import from `Main`. Imports of
-`Main//default` are invalid.
+`Main//default` are invalid. The keyword `Core` can be used as a package name in
+an import in order to import portions of the standard library that are not part
+of the prelude.
 
 The full name path is a concatenation of the names of the package entity, any
 namespace entities applied, and the final entity addressed. Child namespaces or
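The import forms above can be checked with a small recognizer. The following is a hypothetical Python sketch, not part of the Carbon toolchain. It assumes simplified token shapes: `IDENTIFIER` is ASCII letters, digits, and underscores, with an optional `r#` prefix, and `NAME_PATH` is written as a double-quoted string. It also uses only a small subset of the keyword list. The sketch shows that the keyword `Core` always denotes the standard library, while `r#Core` is an ordinary package name.

```python
import re

# Hypothetical, simplified token shapes; not Carbon's real lexical grammar.
IDENT = r"(?:r#)?[A-Za-z_][A-Za-z0-9_]*"
NAME_PATH = r'"[^"]*"'  # assumption: library names are string literals

# A small subset of Carbon keywords, enough for this sketch.
KEYWORDS = {"Core", "default", "import", "library", "package", "self", "Self"}


def classify_import(decl: str) -> str:
    """Classify a single import declaration by the forms listed above."""
    if re.fullmatch(r"import library default;", decl):
        return "current package, default library"
    if re.fullmatch(rf"import library {NAME_PATH};", decl):
        return "current package, named library"
    m = re.fullmatch(rf"import ({IDENT})(?: library {NAME_PATH})?;", decl)
    if m is None:
        return "invalid"
    word = m.group(1)
    if word == "Core":
        # The keyword `Core` always names the standard library package.
        return "package Core (standard library)"
    if word in KEYWORDS:
        # Other keywords cannot name a package unless written with `r#`.
        return "invalid"
    name = word[2:] if word.startswith("r#") else word
    if name == "Main":
        # `Main` cannot be imported from other packages.
        return "invalid"
    # `r#Core` ends up here: an unrelated package that is merely named Core.
    return f"package {name}"
```

For example, `classify_import("import r#Core;")` returns `"package Core"`, which is distinct from the standard library result for `import Core;`.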

+ 3 - 1
docs/design/lexical_conventions/words.md

@@ -39,7 +39,7 @@ in Unicode Normalization Form C (NFC).
 
 <!--
 Keep in sync:
-- utils/textmate/Syntaxes/Carbon.plist
+- utils/textmate/Syntaxes/carbon.tmLanguage.json
 - utils/tree_sitter/queries/highlights.scm
 -->
 
@@ -54,6 +54,7 @@ The following words are interpreted as keywords:
 -   `auto`
 -   `base`
 -   `break`
+-   `Core`
 -   `case`
 -   `choice`
 -   `class`
@@ -92,6 +93,7 @@ The following words are interpreted as keywords:
 -   `return`
 -   `returned`
 -   `Self`
+-   `self`
 -   `template`
 -   `then`
 -   `type`
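With `Core` and `self` added to this list, every word that has special meaning lexes as a keyword, and the `r#` prefix recovers a plain identifier with the same spelling. The following is a hypothetical Python model of that rule. It is not the toolchain's lexer, `classify_word` is an illustrative name, and the keyword set is only a subset of the list above.

```python
# A small subset of the keyword list above; hypothetical, for illustration.
KEYWORDS = frozenset({"auto", "base", "break", "Core", "case", "return", "Self", "self"})


def classify_word(word: str) -> tuple[str, str]:
    """Return (kind, spelling) for one word-shaped token."""
    if word.startswith("r#"):
        # Raw identifier: never a keyword; the `r#` is not part of the name.
        return ("identifier", word[2:])
    if word in KEYWORDS:
        # Matching is case-sensitive: `Core` is a keyword, `core` is not.
        return ("keyword", word)
    return ("identifier", word)
```

For example, `classify_word("r#Core")` yields `("identifier", "Core")`, while `classify_word("Core")` yields `("keyword", "Core")`.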

+ 122 - 0
docs/project/principles/namespace_cleanliness.md

@@ -0,0 +1,122 @@
+# Principle: Namespace cleanliness
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Background](#background)
+-   [Principle](#principle)
+-   [Applications of this principle](#applications-of-this-principle)
+-   [Exceptions](#exceptions)
+-   [Alternatives considered](#alternatives-considered)
+
+<!-- tocstop -->
+
+## Background
+
+Names and entities in a program can come from multiple sources -- from a local
+declaration, from an import, from the standard library, or from the prelude.
+Names can be imported from Carbon code or imported or derived from code written
+in another language such as C++ or an interface description language such as
+that of Protobuf, MIDL, or CORBA. A program may use a name that language
+designers later decide they want to use as a keyword. And in order to use a
+library, it is sometimes necessary to redeclare a name that the library
+chose.
+
+This puts a lot of pressure on the language to support a free choice of naming
+for entities. Different languages make different choices in this space:
+
+-   Many languages have a set of keywords that are not usable as identifiers,
+    with no workaround. If this set collides with a name needed by user code,
+    the user is left to solve this problem, often by rewriting the identifier in
+    some way (`klass` or `class_`), which sometimes conflicts with the general
+    naming convention used by the code. And conversely, suboptimal choices are
+    made for new language keywords to avoid causing problems for existing code.
+-   C and C++ reserve a family of identifiers, such as those beginning with an
+    underscore and a capital letter. However, it's not clear which audiences the
+    reserved identifiers are for, and this leads to collisions between standard
+    library vendors and compiler authors, as well as between implementation
+    extensions and language extensions.
+    -   MSVC provides a `__identifier(keyword)` extension that allows using a
+        keyword as an identifier. This extension is also implemented by Clang in
+        `-fms-extensions` mode.
+    -   GCC provides an `__asm__(symbol)` extension that allows a specific
+        symbol to be assigned to an object or function, which provides ABI
+        compatibility but not source compatibility with code that uses a keyword
+        as a symbol name. This extension is also implemented by Clang.
+-   Python reserves some identifiers but still allows them to be freely
+    overwritten (such as `bool`) and reserves some identifiers but rejects
+    assignment to them (such as `True`).
+-   Rust provides a raw identifier syntax to allow most identifiers with
+    reserved meaning to be used by a program, but
+    [not all](https://internals.rust-lang.org/t/raw-identifiers-dont-work-for-all-identifiers/9094):
+    `self`, `Self`, `super`, `extern`, and `crate` cannot be used as raw
+    identifiers. Rust also predeclares a large number of library names in every
+    file, but allows them to be shadowed by user declarations with the same
+    name.
+-   Swift provides a raw identifier syntax using backticks: `` `class` ``, and
+    is
+    [considering](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0451-escaped-identifiers.md)
+    extending this to allow arbitrary non-word-shaped character sequences
+    between the `` ` ``s.
+
+Carbon provides
+[raw identifier syntax](/docs/design/lexical_conventions/words.md#raw-identifiers),
+for example `r#for`, to allow using keywords as identifiers. Carbon also intends
+to have strict shadowing rules that may make predeclared identifiers that are
+_not_ keywords difficult or impossible to redeclare and use in inner scopes.
+
+## Principle
+
+In Carbon, the language does not encroach on the developer's namespace. There
+are no predeclared or reserved identifiers. In cases where the language gives
+special meaning to a word or to word-shaped syntax such as `i32`, that special
+meaning can always be undone with raw identifier syntax, `r#`.
+
+Conversely, when adding language keywords, we will not select an inferior
+keyword merely to avoid the risk of breaking existing programs. We will still
+take into account how often it is desirable to use the word as an identifier,
+including in domain-specific contexts, because that is a factor in whether it
+would make a good keyword, and will manage the rollout of new keywords to make
+it straightforward to migrate existing uses to `r#` or a different name.
+
+## Applications of this principle
+
+-   Words like `final` and `base` that only have special meaning in a few
+    contexts, and could otherwise be made available as identifiers, are keywords
+    in Carbon. `{.base = ...}` and `{.r#base = ...}` specify different member
+    names.
+-   Words like `self` that are declared by the developer but nonetheless have
+    special language-recognized meaning are keywords in Carbon. `[self:! Self]`
+    introduces a self parameter; `[r#self:! Self]` introduces a deduced
+    parameter.
+-   Words like `Self` that are implicitly declared by the language in some
+    contexts are keywords, even though we could treat them as user-declared
+    identifiers that are merely implicitly declared in some cases.
+-   Words like `i32` that are treated as type literals rather than keywords can
+    be used as identifiers with raw identifier syntax `r#i32`.
+-   There are no predeclared identifiers imported from the prelude. If an entity
+    is important enough to be available by default, we should add a keyword, and
+    allow the name of the entity to be used for other purposes with `r#`.
+-   The predeclared package name `Core` is a keyword. A package named `r#Core`
+    is an unrelated package, and `Core.foo` always refers to members of the
+    predeclared `Core` package.
+
+## Exceptions
+
+For now, we reserve the package names `Main` and `Cpp`. These names aren't
+predeclared in any scope, and the name `Main` is not even usable from within
+source files to refer to the main package. However, there is currently no way to
+avoid collisions between the package name `Cpp` and a top-level entity named
+`Cpp` if they are both used in the same source file.
+
+## Alternatives considered
+
+-   [Have both predeclared identifiers and keywords](/proposals/p4864.md#have-both-predeclared-identifiers-and-keywords)
+-   [Reserve words with a certain spelling](/proposals/p4864.md#reserve-words-with-a-certain-spelling)
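Because `Core` is a keyword, a tool can resolve `Core.<name>` without consulting any scope, while `r#Core` goes through ordinary name lookup. The following is a hypothetical Python sketch of that split. `Scope` and `resolve_qualifier` are illustrative names, and the sketch ignores Carbon's actual shadowing rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A hypothetical lexical scope mapping identifiers to entities."""

    names: dict = field(default_factory=dict)
    parent: Optional["Scope"] = None

    def lookup(self, name: str) -> str:
        # Walk outward through enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise KeyError(name)


def resolve_qualifier(kind: str, spelling: str, scope: Scope) -> str:
    """Resolve the `X` in `X.member`, given the token kind from the lexer."""
    if kind == "keyword" and spelling == "Core":
        # The keyword always names the standard library; no lookup needed.
        return "standard library package Core"
    # Identifiers, including `r#Core` (spelled "Core"), use ordinary lookup.
    return scope.lookup(spelling)
```

Even when a user package named `r#Core` is visible, the keyword form still resolves to the standard library:

    file_scope = Scope({"Core": "package r#Core (user)"})
    resolve_qualifier("keyword", "Core", file_scope)     # standard library
    resolve_qualifier("identifier", "Core", file_scope)  # the user package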

+ 177 - 0
proposals/p4864.md

@@ -0,0 +1,177 @@
+# No predeclared identifiers, `Core` is a keyword
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/4864)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+-   [Rationale](#rationale)
+-   [Future work](#future-work)
+    -   [Package name `Cpp`](#package-name-cpp)
+    -   [Package name `Main`](#package-name-main)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Have both predeclared identifiers and keywords](#have-both-predeclared-identifiers-and-keywords)
+    -   [Reserve words with a certain spelling](#reserve-words-with-a-certain-spelling)
+
+<!-- tocstop -->
+
+## Abstract
+
+Introduce a principle that the Carbon language should not encroach on the
+developer's namespace. Satisfy this principle by making `Core` a keyword.
+
+## Problem
+
+Ongoing design work needs rules for how to expose types such as a primitive
+array type to Carbon code, and in particular, if we choose to make it available
+by default, whether that should be accomplished by a keyword or a predeclared
+identifier.
+
+## Background
+
+See the
+[Background section of the added principle](/docs/project/principles/namespace_cleanliness.md#background).
+
+## Proposal
+
+We choose not to have any predeclared identifiers in Carbon. If a word has
+special meaning to the language, then that word is a keyword, and a plain
+identifier with no special meaning is always available using raw identifier
+syntax.
+
+## Details
+
+See [the principle document](/docs/project/principles/namespace_cleanliness.md)
+for details of the added principle. In addition, we make one change and one
+clarification:
+
+-   `Core` is changed from being an identifier that happens to be the name of
+    the Carbon standard library, and happens to be predeclared in every source
+    file as naming that library, to being a keyword. The keyword can only be
+    used:
+
+    -   When importing the `Core` package.
+    -   When implementing the `Core` package as part of the language
+        implementation.
+    -   As a keyword naming the `Core` package, much like the `package` keyword.
+
+    The identifier `r#Core` can be used freely and does not conflict with the
+    keyword. This includes use of `r#Core` as the name of a package. Language
+    constructs that are defined in terms of entities in the `Core` package refer
+    specifically to the package named with the _keyword_ `Core`, not to any
+    other entity named `Core`.
+
+-   The `self` keyword is now included in the list of keywords. It is already
+    treated as a keyword by the toolchain.
+
+## Rationale
+
+-   [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
+    -   Code generation tools can have a uniform handling for all words with
+        special meaning, with no need to alter the spelling of names from other
+        languages.
+    -   Language tools can determine the meaning of `Core.<name>` without
+        needing to do any name lookup or sophisticated analysis.
+-   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
+    -   Migration between versions of Carbon with a changed set of reserved
+        words can be done uniformly.
+    -   Adding names to the prelude remains a non-breaking change. Adding new
+        predeclared names requires adding a keyword, with the same cost and
+        value tradeoffs regardless of whether the keyword names a library
+        declaration or introduces new language syntax.
+-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+    -   Syntax highlighting tools can easily distinguish between words with
+        special meaning and words with program-defined meaning.
+    -   The meaning of core language constructs can be defined as a rewrite in
+        terms of `Core.<name>` without concern that `Core` may have some
+        different local interpretation.
+-   [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
+    -   All C++ identifiers are nameable from Carbon code without conflicts.
+        Virtual functions introduced in Carbon can be overridden in Carbon
+        regardless of their name. C++ code can be migrated to Carbon even if it
+        uses names that have special meaning in Carbon.
+-   [Principle: Prefer providing only one way to do a given thing](/docs/project/principles/one_way.md)
+    -   This proposal specifies that there is only one way to give words special
+        meaning in Carbon, and one way to resolve issues if that special meaning
+        conflicts with another desired meaning.
+
+## Future work
+
+### Package name `Cpp`
+
+The special package name `Cpp` that refers to code written in C++ is not made a
+keyword by this proposal, but this proposal is also not deciding that it should
+_not_ be a keyword. While this name has special meaning to the language, it's
+not predeclared in any context, so it's considered to be out of scope. A future
+proposal that describes the details of C++ import should determine whether this
+name becomes a keyword. Notably, making `Cpp` a keyword would also allow an
+`import Cpp` declaration to have custom syntax, which may be useful.
+
+### Package name `Main`
+
+The special package name `Main` that is currently reserved in all package name
+contexts is not made a keyword in this proposal either. There would be no
+meaning in making it a keyword, as it is never used as a special package name in
+Carbon source files. However, we could consider using an empty package name as
+the name of the main package, and unreserving the package name `Main`, if it
+becomes a concern that we reserve this name.
+
+## Alternatives considered
+
+### Have both predeclared identifiers and keywords
+
+We could provide both predeclared identifiers and keywords. Many languages
+follow this path. However, predeclared identifiers have some problems compared
+to keywords:
+
+-   In order to locally declare a name matching a predeclared identifier, the
+    name would need to be shadowed.
+    -   Such shadowing may be invalid, depending on how the name is used.
+    -   Readability suffers when a name that is part of the basic vocabulary
+        is given a different, local meaning.
+    -   Shadowing a predeclared identifier typically makes the original name
+        hard to access -- an alias or similar must be established in advance.
+-   There need to be two different stories for how to deal with adding a new
+    word with special meaning to the language, depending on whether it is a
+    keyword.
+-   For each word with special meaning, we must make an arbitrary decision as to
+    which kind it is, resulting in a largely meaningless distinction that
+    nonetheless is visible and would need to be known by developers in some
+    contexts.
+
+### Reserve words with a certain spelling
+
+We could reserve words with certain spellings for future use as keywords or as
+vendor extensions. Some languages do this:
+
+-   C reserves words starting with an underscore followed by a capital letter or
+    an underscore.
+-   C++ additionally reserves words containing a double underscore anywhere.
+-   Python uses the `__name__` namespace for certain special names, and by
+    convention these names are reserved for that purpose.
+
+In Carbon we could accomplish this by saying that all words of the reserved
+forms are keywords, with no meaning ascribed to them yet.
+
+However, we do not have a clear need for such reserved words at this time, and
+we would not want to use such spellings when we do add language keywords later.
+Moreover, C++ programs frequently declare reserved words in practice, and we
+should expect the same in Carbon. Without enforcement, the names are not
+effectively reserved.
+
+If we find a need at a later time to introduce vendor-specific language
+extension keywords, we can revisit this, but should also consider alternatives
+such as a `k#foo` spelling to turn what is normally an identifier into a
+(potentially vendor-specific) keyword.
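The two C and C++ spelling rules listed under "Reserve words with a certain spelling" can be expressed as a short predicate. The Python helpers `reserved_in_c` and `reserved_in_cpp` are hypothetical names. They encode only the rules as stated in this section and ignore scope-dependent reservations, such as names that begin with a single underscore at file or global scope.

```python
def reserved_in_c(name: str) -> bool:
    """Underscore followed by an uppercase letter or another underscore."""
    return (
        len(name) >= 2
        and name[0] == "_"
        and (name[1] == "_" or "A" <= name[1] <= "Z")
    )


def reserved_in_cpp(name: str) -> bool:
    """C's rule, plus any name containing a double underscore anywhere."""
    return reserved_in_c(name) or "__" in name
```

Neither language requires compilers to diagnose declarations of such names, which is the enforcement gap that keeps them from being effectively reserved in practice.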

+ 1 - 1
utils/textmate/Syntaxes/carbon.tmLanguage.json

@@ -223,7 +223,7 @@
       "patterns": [
         {
           "name": "support.class.carbon",
-          "match": "(?<=\\bpackage\\s)\\w+"
+          "match": "(?<=\\b(package|Core)\\s)\\w+"
         },
         {
           "name": "support.variable.carbon",
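Python's `re` module rejects lookbehinds whose alternatives have different lengths, so the new pattern `(?<=\b(package|Core)\s)` cannot be compiled there as written. The following hypothetical sketch splits it into two fixed-width lookbehinds. For ASCII input, these match at the same positions. This makes it easy to check which word the rule would scope as `support.class.carbon` on a given line. The sketch does not model the surrounding grammar context in `carbon.tmLanguage.json`, and `highlighted_words` is an illustrative name.

```python
import re

# The TextMate rule (Oniguruma syntax) is:
#   (?<=\b(package|Core)\s)\w+
# Python requires each lookbehind to be fixed-width, so the alternation is
# split into two lookbehinds with the same meaning.
PACKAGE_NAME = re.compile(r"(?:(?<=\bpackage\s)|(?<=\bCore\s))\w+")


def highlighted_words(line: str) -> list[str]:
    """Return the words this rule would match on one line of source."""
    return PACKAGE_NAME.findall(line)
```

For example, `highlighted_words("package Geometry;")` returns `["Geometry"]`, and `mypackage Geometry;` yields no match because of the `\b` word boundary.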

+ 2 - 0
utils/tree_sitter/queries/highlights.scm

@@ -89,6 +89,7 @@
   "auto"
   "base"
   "break"
+  ; "Core"
   "case"
   "choice"
   "class"
@@ -126,6 +127,7 @@
   "return"
   "returned"
   "Self"
+  ; "self"
   "template"
   "then"
   "type"