# `var` statement [Pull request](https://github.com/carbon-language/carbon-lang/pull/339) ## Table of contents - [Problem](#problem) - [Constants](#constants) - [Background](#background) - [Terminology](#terminology) - [Out of scope features](#out-of-scope-features) - [Language comparison](#language-comparison) - [Proposal](#proposal) - [Executable semantics form](#executable-semantics-form) - [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Caveats](#caveats) - [`var` name may change](#var-name-may-change) - [Changing to `let mut`](#changing-to-let-mut) - [Multiple identifiers in one statement](#multiple-identifiers-in-one-statement) - [Update provisional pattern matching syntax](#update-provisional-pattern-matching-syntax) - [Update provisional `$` syntax](#update-provisional--syntax) - [Alternatives considered](#alternatives-considered) - [No `var` introducer keyword](#no-var-introducer-keyword) - [Name of the `var` statement introducer](#name-of-the-var-statement-introducer) - [Colon between type and identifier](#colon-between-type-and-identifier) - [Syntax ambiguity](#syntax-ambiguity) - [Confusion with other languages and alternatives](#confusion-with-other-languages-and-alternatives) - [Use in pattern matching](#use-in-pattern-matching) - [Advantages and disadvantages](#advantages-and-disadvantages) - [Conclusion](#conclusion) - [Type after identifier](#type-after-identifier) - [Ordering as a way to quickly answer questions](#ordering-as-a-way-to-quickly-answer-questions) - [Syntax popularity](#syntax-popularity) - [Type elision](#type-elision) - [Advantages and disadvantages](#advantages-and-disadvantages-1) - [Conclusion](#conclusion-1) ## Problem The `var` statement is noted in the [language overview](https://github.com/carbon-language/carbon-lang/tree/trunk/docs/design#ifelse), but is provisional — no justification has been provided. Variable declarations are fundamental, and it should be clear to what degree the current syntax is adopted. It's expected that after the adoption of this proposal, `var` syntax will still not be finalized: the [proposal](#proposal) is an experiment. ### Constants Although constants are naturally related to variables, this proposal does not include any syntax for constants. This is expected to be revisited later. ## Background ### Terminology In this proposal, "variable" is defined as an identifier referring to a mutable value. ### Out of scope features Questions have come up about: - The type system - Type checking - Scoping All of these are important features. However, in the interest of small proposals, they are out of scope of this proposal. ### Language comparison Variables are standard in many languages. Some various forms to consider are: - C++: ```c++ int x; int y = 0; bool a = true, *b = nullptr, c; ``` - Python: ```python x = None y = 0 z: int = 7 # Added by PEP 526. ``` - Swift: ```swift var x = 0 var y: Int = 0 var z: Int ``` - TypeScript ```ts let y: Number = 0; var x = 0; # Legacy from JavaScript. ``` - Rust ```rust let mut x = 0; let mut y: i32 = 0; let mut z: i32; ``` - Go ```go var x = 0 y := 0 var z int var a, b = true, false ``` - Visual Basic ```vb Dim x As Integer = 3 ``` ## Proposal Carbon should adopt `var [ = ];` syntax for variable statements. Considerations for this syntax are: - **Type and identifier ordering:** The ordering of `` before `` reflects the typical C++ ordering syntax. - Some C++ syntax can put type information after the identifier, such as `int x[6];`. Carbon should be expected to place that as part of the type. - **`var` introducer keyword:** The use of `var` makes it clearer for readers to skim and see where variables are being declared. It also reduces complexity and potential ambiguity in language parsing. - **One variable:** In C++, multiple variables can be declared in a single statement. An equivalent Carbon syntax may end up looking like `var (Int x, String y) = (0, "foo");`, so limiting to one declaration is not fundamentally restrictive. However, by breaking with C++ and requiring the full type to be specified with each identifier, we achieve two important things: - It's clear what the full type is, preventing difficult-to-read statements with a mix of stack variables, pointers, and similar. - As the language grows, a function returning a tuple may be assigned to distinctly named and typed variables. **Experiment:** The ordering of type and identifier will be researched. For more information, see the [alternative](#type-after-identifier). ### Executable semantics form Example bison syntax for executable semantics is: ```bison statement: "var" expression identifier optional_assignment; | /* preexisting statements elided */ ; optional_assignment: /* empty */ | "=" expression ; ``` ## Rationale based on Carbon's goals Carbon needs variables in order to be writable by developers. That functionality needs a syntax. Relevant goals are: - [3. Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write): - Adding a keyword makes it easy for developers to visually identify functions. - [5. Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development): - The addition of a keyword should make parsing easy. - [7. Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code): - Keeping syntax close to C++ will make it easier for developers to transition. ## Caveats ### `var` name may change The name `var` could still change. However, it's used with similar meaning in other languages including Swift, Go, and TypeScript, and so it's reasonable to expect it will not. #### Changing to `let mut` The idea that `var` may change includes the possibility that `var` may become something like `let mut` in Rust. However, this is not assumed by this proposal: - This proposal [omits constant syntax](#constants). - It would need to be considered whether the syntax tax of `let mut` appropriately [focuses on encouraging appropriate usage of features rather than restricting misuse](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)). - Lower verbosity syntax for variables is more [consistent with C++](p0285.md#c-as-baseline), even if constants are made less verbose by way of `let`. ### Multiple identifiers in one statement Although `var (Int x, String y) = (0, "foo");` syntax is mentioned, this proposal is not intended to propose such a syntax. It's noted primarily to explain the likely path, that this does not _rule out_ abbreviated syntax such as that. That should probably be covered as part of tuples. ### Update provisional pattern matching syntax Pattern matching syntax in [the overview](/docs/design/README.md) uses syntax similar to `Int: x`. As part of removing the [colon between type and identifier](#colon-between-type-and-identifier) from the provisional `var` syntax, that syntax should be changed to remove the `:`. Details should be resolved as part of the eventual pattern matching proposal, but if changes are needed to add a separator, the `var` syntax should be updated to remain consistent. The precise form of that implementation will be part of normal Carbon evolution. For example, replacing `fn Sum(Int: a, Int: b) -> Int;` with `fn Sum(Int a, Int b) -> Int;` and `case (Int: p, (Float: x, Float: _)) if (p < 13) => {` with `case (Int p, (Float x, Float _)) if (p < 13) => {`. ### Update provisional `$` syntax Variables using `Type:$` and similar should drop the `:`, as in `Type$`. ## Alternatives considered Noted alternatives are key differences from C++ variable declaration syntax. ### No `var` introducer keyword The intent of the `var` statement is to improve readability and parsability, and it's related to `fn` for functions. Although code is more succinct without introducers, the noted benefits are expected to be significant. Most other modern languages use similar introducers, and so this break from C++ is adopting a different norm. ### Name of the `var` statement introducer `var` is used with a similar meaning in several other languages, including Swift, Go, and JavaScript. `let` is used by TypeScript. `let mut` is used by Rust, with `let` used for constants (this use of `let` alone is consistent with other languages). In general `var` appears to be a more common choice. ### Colon between type and identifier The use of a colon (`:`) between the type and identifier is intended to reduce potential parsing ambiguity, and to make reading code easier. As proposed, there is no colon between the type and identifier. #### Syntax ambiguity Using a colon or other separator could make it easier to avoid certain kinds of ambiguities. For example, suppose we decided to use a postfix `*` operator to form pointer types, as in C++. In such a setup, we could have code like the following: ```carbon var T * x = 3; var T * x = 3 y; ``` In the first statement, `*` is a unary operator and so `T*` is the type and `x` is the identifier. However, in the second statement, `*` is a binary operator and so `T * x = 3` is the type, and `y` is the identifier; the resulting compiler errors may be confusing to users. Furthermore, the place of `3` could be taken by an arbitrarily complex expression; this could cause resolving the ambiguity between unary and binary `*` to require unbounded look-ahead, adversely impacting [code compilation time goals](/docs/project/goals.md#fast-and-scalable-development). Consider instead the code: ```carbon var T *: x = 3; var T * x = 3: y; ``` The colon makes it unambiguous whether the `*` in each case is unary or binary with only one token of look-ahead. More importantly, this syntax immediately calls the reader's attention to the fact that the second declaration has a highly unusual type. There are other ways of resolving ambiguities like this. For example, we could avoid allowing the same operator to have both postfix and infix forms, or we could distinguish them by the presence or absence of whitespace. However, even if we avoid _formal_ ambiguity by such means, a separator like `:` may be useful for reducing _visual_ ambiguity for human readers. #### Confusion with other languages and alternatives One of the disadvantages of `:` is that with `var Int: x`, ordering is inconsistent with other languages using `:`, such as Rust and Swift, which would say `var x: Int`. It may be worth considering other syntax options. A few to consider are: ```carbon var(Int) x; var Int# x; var Int @x; var Int -> x; ``` These aren't part of the proposed syntax mainly because it's not clear any would gain as much support as `:`. However, this is an opportunity to make suggestions and see if there's a good compromise. #### Use in pattern matching The [old draft pattern matching proposal](https://github.com/carbon-language/carbon-lang/pull/87) used `:` as a separator. In pattern matching, the `:` may be particularly important to distinguish between value matching and type name matching. However, the pattern matching proposal should examine these choices and alternatives before we reach a conclusion that `:` is necessary for pattern matching. Per [syntax ambiguity](#syntax-ambiguity), it is expected that `:` has some advantages, but may not turn out to make a compiler difference due to prevailing constraints on type expression syntax. This proposal suggests we [update provisional pattern matching syntax](#update-provisional-pattern-matching-syntax) to match the proposed `var` syntax. #### Advantages and disadvantages Advantages: - Reduces syntax ambiguity. - This should improve readability and parsability. - It should make it easier to debug issues during development. Disadvantages: - Deviates from the common syntax used by most languages with the type before the identifier, including C, C++, Java, and C#. - Changing from C++ is especially significant because of Carbon's goals for interoperability and migration which will mean an especially large portion of Carbon developers will be actively reading both Carbon and C++ code. - Other notable languages that us `:` in variable statements, including Swift, put the type after the identifier. - It may be worth considering [alternatives to `:`](#confusion-with-other-languages-and-alternatives). - If the alternative of [type after identifier](#type-after-identifier) is adopted, it's likely a `:` separator will be adopted. #### Conclusion Right now the proposal is to not have anything between the type and identifier in order to avoid cross-language ambiguity, and to retain syntax that is closer to C++. However, the ultimate decision may hinge on type and identifier ordering, as well as related future evolution. ### Type after identifier There are many languages that put the type after the identifier. A common format used by Swift and Rust is `var x: Int`. It's worth considering the sentence-like readings: - `var x Int` (or `var x: Int`) may be read as "declare x as an int" or "make a variable x and give it int storage". - `var Int x` may be read as "declare an int called x" or "make a variable with int storage called x". These readings might be of similar quality, and are presented to offer different perspectives on how to read the possible statement orderings. #### Ordering as a way to quickly answer questions Ordering is essentially a question of pairing identifiers and types. This can be cast as asking which question developers consider more important when reading code: 1. What is the type of variable `x`? 2. What is the identifier of the `Int` variable? We assert the first question is the more important one: developers will see an identifier in later code, and want to know its type. However, how do we determine which order is better for this purpose? Unfortunately, little research has been done on this. All we're aware of right now is a study from an unpublished undergraduate project from Germany. The study was done in Java with 50 students. Its data indicates that it's faster to answer question 1 if the type comes first, and faster to answer question 2 if the identifier comes first. We do not want to make decisions based on the study because it isn't published, studied a small group, and doesn't directly compare possible `var` syntaxes; however, it still influences our thoughts. #### Syntax popularity When considering what to use for now, we can consider the popularity of various languages. The top 10 on several sources (with percentages noted by sources that have them) are: | TIOBE | Pct | GitHut | Pct | PYPL | Pct | Octoverse | | ------------ | --: | ---------- | --: | ----------- | --: | ---------- | | C | 16% | JavaScript | 19% | Python | 30% | JavaScript | | Java | 11% | Python | 16% | Java | 17% | Python | | Python | 11% | Java | 11% | JavaScript | 8% | Java | | C++ | 7% | Go | 8% | C# | 7% | TypeScript | | C# | 4% | C++ | 7% | C and C++ | 7% | C# | | Visual Basic | 4% | Ruby | 7% | PHP | 6% | PHP | | JavaScript | 2% | TypeScript | 7% | R | 4% | C++ | | PHP | 2% | PHP | 6% | Objective-C | 4% | C | | SQL | 2% | C# | 4% | Swift | 2% | Shell | | Assembly | 2% | C | 3% | TypeScript | 2% | Ruby | Sources: - [TIOBE](https://www.tiobe.com/tiobe-index/) as of 2021-02 - [GitHut](https://madnight.github.io/githut/) as of 2020 Q4 - [PYPL](https://pypl.github.io/PYPL.html) as of 2021-02 - [Octoverse](https://octoverse.github.com/) as of 2020-10 For these languages: - C, C++, C#, Objective-C, and Java put the identifier after the type. - These use ` `, with no keyword. - Python, Go, TypeScript, Visual Basic, SQL, and Swift put the type after the identifier. - Python uses `: `, with no keyword. This was added in [Python 3.6](https://www.python.org/dev/peps/pep-0526/), and reflects language evolution. - Go uses `var `, with no colon. - TypeScript and Swift use `var : `. - Visual Basic uses `Dim as `. - SQL uses `DECLARE @ AS `, where `AS` is optional. - JavaScript, R, Ruby, PHP, Shell, and Assembly language do not specify types in the same ways as other languages. ### Type elision First, with Carbon it's been discussed to require use of `auto` (or a similar explicit syntax marker) instead of allowing developers to elide the type entirely from a `var` statement. In other words, while `var Int x = 0;` is valid, and `var auto x = 0;` is equivalent, there is no form such as `var x = 0;` which removes the type entirely. Most languages that write `var x: Int` also allow eliding the type when assigning a value. For example, Go allows `x := 0` and Swift allows `var x = 0;`. As a result, there is no need for an `auto` keyword. This would be more surprising with `var Int x` syntax because removing the `Int` now places the identifier immediately after `var`, where the type normally is. This may be subtly confusing to developers. However, if `auto` is required for explicitness, the issue is moot. Retaining `auto` does not eliminate the need to consider type elision as part of advantages and disadvantages: if the type is put after the identifier and `var x: auto` syntax is used, it now becomes an inconsistency with other languages. This inconsistency would be a Carbon innovation that may confuse developers, leading to long-term pressure to remove `auto` for consistency with similar languages, and thus a disadvantage. #### Advantages and disadvantages Advantages: - Consistent with languages that put a `:` in the type declaration. - Notable languages using `x: Int` syntax include Swift, Rust, Kotlin and TypeScript. Go is similar but does not include `:`. - Swift puts [argument labels](https://docs.swift.org/swift-book/LanguageGuide/Functions.html#ID166) before variable declaration. Keeping the identifier first allows consistency with Swift's argument label syntax while also keeping names adjacent. - More opportunity to unify a concept of putting the identifier immediately after the keyword. - `fn`, `import`, and `package` will all have the identifier immediately after the keyword, with non-identifier content following. - It's expected that a typical function declaration will look like `fn Foo(Int bar)`, with `var` only added when a storage for a copy is _required_. Thus, this advantage is primarily about the function identifier `Foo`, not parameter identifiers. - Other cases, such as `alias`, are likely flexible and could follow `var` in concept by putting the resulting name at the end. That is, `alias To = From;` for consistency with `var x: Int;` versus `alias From as To;` similar to `var Int x;` ordering. Disadvantages: - If Carbon doesn't support [type elision](#type-elision), it creates an inconsistency with other notable languages using `x: Int` syntax. - It may be worth considering [alternatives to `:`](#confusion-with-other-languages-and-alternatives). - The ordering of `x: Int` is inconsistent with C++ variable syntax. - Negatively affects [Familiarity for experienced C++ developers with a gentle learning curve](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code). - It is consistent with other parts of C++ syntax, particularly `using To = From;`, although not `typedef From To;`. - Early signs are that putting the identifier first makes it slower for developers to answer the question, "what is the type of variable `x`?" - This indicates worse readability in spite of the sentence ordering, the essential part of [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write). - Popular languages tend to use `int x` syntax, including C, Java, C++, and C#. Other notable languages include Groovy and Dart. #### Conclusion We should conduct a larger study on the topic of type and identifier ordering and syntax. Until then, we should adopt C++-like syntax. This meets the migration sub-goal [Familiarity for experienced C++ developers with a gentle learning curve](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code), and allows applying the higher-priority goal [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) if supporting evidence is found. **Experiment**: The ordering of type and identifier will be researched.