Bladeren bron

Add `var <type> <identifier> [ = <value> ];` syntax for variables (#339)

Co-authored-by: Geoff Romer <gromer@google.com>
Co-authored-by: austern <austern@google.com>
Jon Meow 5 jaren geleden
bovenliggende
commit
9d1863efb7
2 gewijzigde bestanden met toevoegingen van 541 en 0 verwijderingen
  1. 1 0
      proposals/README.md
  2. 540 0
      proposals/p0339.md

+ 1 - 0
proposals/README.md

@@ -46,6 +46,7 @@ request:
 -   [0253 - 2021 Roadmap](p0253.md)
 -   [0285 - if/else](p0285.md)
 -   [0301 - Principle: Errors are values](p0301.md)
+-   [0339 - `var` statement](p0339.md)
 -   [0340 - while loops](p0340.md)
 -   [0426 - Governance & evolution revamp](p0426.md)
 -   [0444 - GitHub Discussions](p0444.md)

+ 540 - 0
proposals/p0339.md

@@ -0,0 +1,540 @@
+# `var` statement
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/339)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+    -   [Constants](#constants)
+-   [Background](#background)
+    -   [Terminology](#terminology)
+    -   [Out of scope features](#out-of-scope-features)
+    -   [Language comparison](#language-comparison)
+-   [Proposal](#proposal)
+    -   [Executable semantics form](#executable-semantics-form)
+-   [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
+-   [Caveats](#caveats)
+    -   [`var` name may change](#var-name-may-change)
+        -   [Changing to `let mut`](#changing-to-let-mut)
+    -   [Multiple identifiers in one statement](#multiple-identifiers-in-one-statement)
+    -   [Update provisional pattern matching syntax](#update-provisional-pattern-matching-syntax)
+    -   [Update provisional `$` syntax](#update-provisional--syntax)
+-   [Alternatives considered](#alternatives-considered)
+    -   [No `var` introducer keyword](#no-var-introducer-keyword)
+    -   [Name of the `var` statement introducer](#name-of-the-var-statement-introducer)
+    -   [Colon between type and identifier](#colon-between-type-and-identifier)
+        -   [Syntax ambiguity](#syntax-ambiguity)
+        -   [Confusion with other languages and alternatives](#confusion-with-other-languages-and-alternatives)
+        -   [Use in pattern matching](#use-in-pattern-matching)
+        -   [Advantages and disadvantages](#advantages-and-disadvantages)
+        -   [Conclusion](#conclusion)
+    -   [Type after identifier](#type-after-identifier)
+        -   [Ordering as a way to quickly answer questions](#ordering-as-a-way-to-quickly-answer-questions)
+        -   [Syntax popularity](#syntax-popularity)
+    -   [Type elision](#type-elision)
+        -   [Advantages and disadvantages](#advantages-and-disadvantages-1)
+        -   [Conclusion](#conclusion-1)
+
+<!-- tocstop -->
+
+## Problem
+
+The `var` statement is noted in the
+[language overview](https://github.com/carbon-language/carbon-lang/tree/trunk/docs/design#ifelse),
+but is provisional — no justification has been provided. Variable declarations
+are fundamental, and it should be clear to what degree the current syntax is
+adopted.
+
+It's expected that after the adoption of this proposal, `var` syntax will still
+not be finalized: the [proposal](#proposal) is an experiment.
+
+### Constants
+
+Although constants are naturally related to variables, this proposal does not
+include any syntax for constants. This is expected to be revisited later.
+
+## Background
+
+### Terminology
+
+In this proposal, "variable" is defined as an identifier referring to a mutable
+value.
+
+### Out of scope features
+
+Questions have come up about:
+
+-   The type system
+-   Type checking
+-   Scoping
+
+All of these are important features. However, in the interest of small
+proposals, they are out of scope of this proposal.
+
+### Language comparison
+
+Variables are standard in many languages. Some various forms to consider are:
+
+-   C++:
+
+    ```cc
+    int x;
+    int y = 0;
+    bool a = true, *b = nullptr, c;
+    ```
+
+-   Python:
+
+    ```python
+    x = None
+    y = 0
+    z: int = 7  # Added by PEP 526.
+    ```
+
+-   Swift:
+
+    ```swift
+    var x = 0
+    var y: Int = 0
+    var z: Int
+    ```
+
+-   TypeScript
+
+    ```ts
+    let y: Number = 0;
+    var x = 0;  # Legacy from JavaScript.
+    ```
+
+-   Rust
+
+    ```rust
+    let mut x = 0;
+    let mut y: i32 = 0;
+    let mut z: i32;
+    ```
+
+-   Go
+
+    ```go
+    var x = 0
+    y := 0
+    var z int
+    var a, b = true, false
+    ```
+
+-   Visual Basic
+
+    ```vb
+    Dim x As Integer = 3
+    ```
+
+## Proposal
+
+Carbon should adopt `var <type> <identifier> [ = <value> ];` syntax for variable
+statements.
+
+Considerations for this syntax are:
+
+-   **Type and identifier ordering:** The ordering of `<type>` before
+    `<identifier>` reflects the typical C++ ordering syntax.
+    -   Some C++ syntax can put type information after the identifier, such as
+        `int x[6];`. Carbon should be expected to place that as part of the
+        type.
+-   **`var` introducer keyword:** The use of `var` makes it clearer for readers
+    to skim and see where variables are being declared. It also reduces
+    complexity and potential ambiguity in language parsing.
+-   **One variable:** In C++, multiple variables can be declared in a single
+    statement. An equivalent Carbon syntax may end up looking like
+    `var (Int x, String y) = (0, "foo");`, so limiting to one declaration is not
+    fundamentally restrictive. However, by breaking with C++ and requiring the
+    full type to be specified with each identifier, we achieve two important
+    things:
+    -   It's clear what the full type is, preventing difficult-to-read
+        statements with a mix of stack variables, pointers, and similar.
+    -   As the language grows, a function returning a tuple may be assigned to
+        distinctly named and typed variables.
+
+**Experiment:** The ordering of type and identifier will be researched. For more
+information, see the [alternative](#type-after-identifier).
+
+### Executable semantics form
+
+Example bison syntax for executable semantics is:
+
+```bison
+statement:
+  "var" expression identifier optional_assignment;
+| /* pre-existing statements elided */
+;
+
+optional_assignment:
+  /* empty */
+| "=" expression
+;
+```
+
+## Rationale based on Carbon's goals
+
+Carbon needs variables in order to be writable by developers. That functionality
+needs a syntax.
+
+Relevant goals are:
+
+-   [3. Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write):
+
+    -   Adding a keyword makes it easy for developers to visually identify
+        functions.
+
+-   [5. Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development):
+
+    -   The addition of a keyword should make parsing easy.
+
+-   [7. Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code):
+
+    -   Keeping syntax close to C++ will make it easier for developers to
+        transition.
+
+## Caveats
+
+### `var` name may change
+
+The name `var` could still change. However, it's used with similar meaning in
+other languages including Swift, Go, and TypeScript, and so it's reasonable to
+expect it will not.
+
+#### Changing to `let mut`
+
+The idea that `var` may change includes the possibility that `var` may become
+something like `let mut` in Rust. However, this is not assumed by this proposal:
+
+-   This proposal [omits constant syntax](#constants).
+-   It would need to be considered whether the syntax tax of `let mut`
+    appropriately
+    [focuses on encouraging appropriate usage of features rather than restricting misuse](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)).
+-   Lower verbosity syntax for variables is more
+    [consistent with C++](c-as-baseline), even if constants are made less
+    verbose by way of `let`.
+
+### Multiple identifiers in one statement
+
+Although `var (Int x, String y) = (0, "foo");` syntax is mentioned, this
+proposal is not intended to propose such a syntax. It's noted primarily to
+explain the likely path, that this does not _rule out_ abbreviated syntax such
+as that. That should probably be covered as part of tuples.
+
+### Update provisional pattern matching syntax
+
+Pattern matching syntax in [the overview](/docs/design/README.md) uses syntax
+similar to `Int: x`. As part of removing the
+[colon between type and identifier](#colon-between-type-and-identifier) from the
+provisional `var` syntax, that syntax should be changed to remove the `:`.
+Details should be resolved as part of the eventual pattern matching proposal,
+but if changes are needed to add a separator, the `var` syntax should be updated
+to remain consistent. The precise form of that implementation will be part of
+normal Carbon evolution.
+
+For example, replacing `fn Sum(Int: a, Int: b) -> Int;` with
+`fn Sum(Int a, Int b) -> Int;` and
+`case (Int: p, (Float: x, Float: _)) if (p < 13) => {` with
+`case (Int p, (Float x, Float _)) if (p < 13) => {`.
+
+### Update provisional `$` syntax
+
+Variables using `Type:$` and similar should drop the `:`, as in `Type$`.
+
+## Alternatives considered
+
+Noted alternatives are key differences from C++ variable declaration syntax.
+
+### No `var` introducer keyword
+
+The intent of the `var` statement is to improve readability and parsability, and
+it's related to `fn` for functions. Although code is more succinct without
+introducers, the noted benefits are expected to be significant. Most other
+modern languages use similar introducers, and so this break from C++ is adopting
+a different norm.
+
+### Name of the `var` statement introducer
+
+`var` is used with a similar meaning in several other languages, including
+Swift, Go, and JavaScript. `let` is used by TypeScript. `let mut` is used by
+Rust, with `let` used for constants (this use of `let` alone is consistent with
+other languages). In general `var` appears to be a more common choice.
+
+### Colon between type and identifier
+
+The use of a colon (`:`) between the type and identifier is intended to reduce
+potential parsing ambiguity, and to make reading code easier. As proposed, there
+is no colon between the type and identifier.
+
+#### Syntax ambiguity
+
+Using a colon or other separator could make it easier to avoid certain kinds of
+ambiguities. For example, suppose we decided to use a postfix `*` operator to
+form pointer types, as in C++. In such a setup, we could have code like the
+following:
+
+```carbon
+var T * x = 3;
+var T * x = 3 y;
+```
+
+In the first statement, `*` is a unary operator and so `T*` is the type and `x`
+is the identifier. However, in the second statement, `*` is a binary operator
+and so `T * x = 3` is the type, and `y` is the identifier; the resulting
+compiler errors may be confusing to users. Furthermore, the place of `3` could
+be taken by an an arbitrarily complex expression; this could cause resolving the
+ambiguity between unary and binary `*` to require unbounded look-ahead,
+adversely impacting
+[code compilation time goals](/docs/project/goals.html#fast-and-scalable-development).
+
+Consider instead the code:
+
+```carbon
+var T *: x = 3;
+var T * x = 3: y;
+```
+
+The colon makes it unambiguous whether the `*` in each case is unary or binary
+with only one token of look-ahead. More importantly, this syntax immediately
+calls the reader's attention to the fact that the second declaration has a
+highly unusual type.
+
+There are other ways of resolving ambiguities like this. For example, we could
+avoid allowing the same operator to have both postfix and infix forms, or we
+could distinguish them by the presence or absence of whitespace. However, even
+if we avoid _formal_ ambiguity by such means, a separator like `:` may be useful
+for reducing _visual_ ambiguity for human readers.
+
+#### Confusion with other languages and alternatives
+
+One of the disadvantages of `:` is that with `var Int: x`, ordering is
+inconsistent with other languages using `:`, such as Rust and Swift, which would
+say `var x: Int`.
+
+It may be worth considering other syntax options. A few to consider are:
+
+```carbon
+var(Int) x;
+var Int# x;
+var Int @x;
+var Int -> x;
+```
+
+These aren't part of the proposed syntax mainly because it's not clear any would
+gain as much support as `:`. However, this is an opportunity to make suggestions
+and see if there's a good compromise.
+
+#### Use in pattern matching
+
+The
+[old draft pattern matching proposal](https://github.com/carbon-language/carbon-lang/pull/87)
+used `:` as a separator. In pattern matching, the `:` may be particularly
+important to distinguish between value matching and type name matching. However,
+the pattern matching proposal should examine these choices and alternatives
+before we reach a conclusion that `:` is necessary for pattern matching. Per
+[syntax ambiguity](#syntax-ambiguity), it is expected that `:` has some
+advantages, but may not turn out to make a compiler difference due to prevailing
+constraints on type expression syntax.
+
+This proposal suggests we
+[update provisional pattern matching syntax](#update-provisional-pattern-matching-syntax)
+to match the proposed `var` syntax.
+
+#### Advantages and disadvantages
+
+Advantages:
+
+-   Reduces syntax ambiguity.
+    -   This should improve readability and parsability.
+    -   It should make it easier to debug issues during development.
+
+Disadvantages:
+
+-   Deviates from the common syntax used by most languages with the type before
+    the identifier, including C, C++, Java, and C#.
+    -   Changing from C++ is especially significant because of Carbon's goals
+        for interoperability and migration which will mean an especially large
+        portion of Carbon developers will be actively reading both Carbon and
+        C++ code.
+-   Other notable languages that us `:` in variable statements, including Swift,
+    put the type after the identifier.
+    -   It may be worth considering
+        [alternatives to `:`](#confusion-with-other-languages-and-alternatives).
+    -   If the alternative of [type after identifier](#type-after-identifier) is
+        adopted, it's likely a `:` separator will be adopted.
+
+#### Conclusion
+
+Right now the proposal is to not have anything between the type and identifier
+in order to avoid cross-language ambiguity, and to retain syntax that is closer
+to C++. However, the ultimate decision may hinge on type and identifier
+ordering, as well as related future evolution.
+
+### Type after identifier
+
+There are many languages that put the type after the identifier. A common format
+used by Swift and Rust is `var x: Int`.
+
+It's worth considering the sentence-like readings:
+
+-   `var x Int` (or `var x: Int`) may be read as "declare x as an int" or "make
+    a variable x and give it int storage".
+-   `var Int x` may be read as "declare an int called x" or "make a variable
+    with int storage called x".
+
+These readings might be of similar quality, and are presented to offer different
+perspectives on how to read the possible statement orderings.
+
+#### Ordering as a way to quickly answer questions
+
+Ordering is essentially a question of pairing identifiers and types. This can be
+cast as asking which question developers consider more important when reading
+code:
+
+1. What is the type of variable `x`?
+2. What is the identifier of the `Int` variable?
+
+We assert the first question is the more important one: developers will see an
+identifier in later code, and want to know its type. However, how do we
+determine which order is better for this purpose?
+
+Unfortunately, little research has been done on this. All we're aware of right
+now is a study from an unpublished undergraduate project from Germany. The study
+was done in Java with 50 students. Its data indicates that it's faster to answer
+question 1 if the type comes first, and faster to answer question 2 if the
+identifier comes first. We do not want to make decisions based on the study
+because it isn't published, studied a small group, and doesn't directly compare
+possible `var` syntaxes; however, it still influences our thoughts.
+
+#### Syntax popularity
+
+When considering what to use for now, we can consider the popularity of various
+languages. The top 10 on several sources (with percentages noted by sources that
+have them) are:
+
+| TIOBE        | Pct | GitHut     | Pct | PYPL        | Pct | Octoverse  |
+| ------------ | --: | ---------- | --: | ----------- | --: | ---------- |
+| C            | 16% | JavaScript | 19% | Python      | 30% | JavaScript |
+| Java         | 11% | Python     | 16% | Java        | 17% | Python     |
+| Python       | 11% | Java       | 11% | JavaScript  |  8% | Java       |
+| C++          |  7% | Go         |  8% | C#          |  7% | TypeScript |
+| C#           |  4% | C++        |  7% | C/C++       |  7% | C#         |
+| Visual Basic |  4% | Ruby       |  7% | PHP         |  6% | PHP        |
+| JavaScript   |  2% | TypeScript |  7% | R           |  4% | C++        |
+| PHP          |  2% | PHP        |  6% | Objective-C |  4% | C          |
+| SQL          |  2% | C#         |  4% | Swift       |  2% | Shell      |
+| Assembly     |  2% | C          |  3% | TypeScript  |  2% | Ruby       |
+
+Sources:
+
+-   [TIOBE](https://www.tiobe.com/tiobe-index/) as of 2021-02
+-   [GitHut](https://madnight.github.io/githut/) as of 2020 Q4
+-   [PYPL](https://pypl.github.io/PYPL.html) as of 2021-02
+-   [Octoverse](https://octoverse.github.com/) as of 2020-10
+
+For these languages:
+
+-   C, C++, C#, Objective-C, and Java put the identifier after the type.
+    -   These use `<type> <identifier>`, with no keyword.
+-   Python, Go, TypeScript, Visual Basic, SQL, and Swift put the type after the
+    identifier.
+    -   Python uses `<identifier>: <type>`, with no keyword. This was added in
+        [Python 3.6](https://www.python.org/dev/peps/pep-0526/), and reflects
+        language evolution.
+    -   Go uses `var <identifier> <type>`, with no colon.
+    -   TypeScript and Swift use `var <identifier>: <type>`.
+    -   Visual Basic uses `Dim <identifier> as <type>`.
+    -   SQL uses `DECLARE @<identifier> AS <type>`, where `AS` is optional.
+-   JavaScript, R, Ruby, PHP, Shell, and Assembly language do not specify types
+    in the same ways as other languages.
+
+### Type elision
+
+First, with Carbon it's been discussed to require use of `auto` (or a similar
+explicit syntax marker) instead of allowing developers to elide the type
+entirely from a `var` statement. In other words, while `var Int x = 0;` is
+valid, and `var auto x = 0;` is equivalent, there is no form such as
+`var x = 0;` which removes the type entirely.
+
+Most languages that write `var x: Int` also allow eliding the type when
+assigning a value. For example, Go allows `x := 0` and Swift allows
+`var x = 0;`. As a result, there is no need for an `auto` keyword.
+
+This would be more surprising with `var Int x` syntax because removing the `Int`
+now places the identifier immediately after `var`, where the type normally is.
+This may be subtly confusing to developers. However, if `auto` is required for
+explicitness, the issue is moot.
+
+Retaining `auto` does not eliminate the need to consider type elision as part of
+advantages and disadvantages: if the type is put after the identifier and
+`var x: auto` syntax is used, it now becomes an inconsistency with other
+languages. This inconsistency would be a Carbon innovation that may confuse
+developers, leading to long-term pressure to remove `auto` for consistency with
+similar languages, and thus a disadvantage.
+
+#### Advantages and disadvantages
+
+Advantages:
+
+-   Consistent with languages that put a `:` in the type declaration.
+    -   Notable languages using `x: Int` syntax include Swift, Rust, Kotlin and
+        TypeScript. Go is similar but does not include `:`.
+    -   Swift puts
+        [argument labels](https://docs.swift.org/swift-book/LanguageGuide/Functions.html#ID166)
+        before variable declaration. Keeping the identifier first allows
+        consistency with Swift's argument label syntax while also keeping names
+        adjacent.
+-   More opportunity to unify a concept of putting the identifier immediately
+    after the keyword.
+    -   `fn`, `import`, and `package` will all have the identifier immediately
+        after the keyword, with non-identifier content following.
+        -   It's expected that a typical function declaration will look like
+            `fn Foo(Int bar)`, with `var` only added when a storage for a copy
+            is _required_. Thus, this advantage is primarily about the function
+            identifier `Foo`, not parameter identifiers.
+    -   Other cases, such as `alias`, are likely flexible and could follow `var`
+        in concept by putting the resulting name at the end. That is,
+        `alias To = From;` for consistency with `var x: Int;` versus
+        `alias From as To;` similar to `var Int x;` ordering.
+
+Disadvantages:
+
+-   If Carbon doesn't support [type elision](#type-elision), it creates an
+    inconsistency with other notable languages using `x: Int` syntax.
+    -   It may be worth considering
+        [alternatives to `:`](#confusion-with-other-languages-and-alternatives).
+-   The ordering of `x: Int` is inconsistent with C++ variable syntax.
+    -   Negatively affects
+        [Familiarity for experienced C++ developers with a gentle learning curve](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code).
+    -   It is consistent with other parts of C++ syntax, particularly
+        `using To = From;`, although not `typedef From To;`.
+-   Early signs are that putting the identifier first makes it slower for
+    developers to answer the question, "what is the type of variable `x`?"
+    -   This indicates worse readability in spite of the sentence ordering, the
+        essential part of
+        [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
+-   Popular languages tend to use `int x` syntax, including C, Java, C++, and
+    C#. Other notable languages include Groovy and Dart.
+
+#### Conclusion
+
+We should conduct a larger study on the topic of type and identifier ordering
+and syntax. Until then, we should adopt C++-like syntax. This meets the
+migration sub-goal
+
+[Familiarity for experienced C++ developers with a gentle learning curve](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code),
+and allows applying the higher-priority goal
+[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+if supporting evidence is found.
+
+**Experiment**: The ordering of type and identifier will be researched.