|
|
@@ -0,0 +1,516 @@
|
|
|
+# Basic Syntax
|
|
|
+
|
|
|
+<!--
|
|
|
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
|
|
|
+Exceptions. See /LICENSE for license information.
|
|
|
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
+-->
|
|
|
+
|
|
|
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/0162)
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+<!-- toc -->
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+- [Problem](#problem)
|
|
|
+- [Background](#background)
|
|
|
+- [Proposal](#proposal)
|
|
|
+- [Details](#details)
|
|
|
+ - [Expressions and Patterns](#expressions-and-patterns)
|
|
|
+ - [Statements](#statements)
|
|
|
+ - [Declarations](#declarations)
|
|
|
+ - [Precedence and Associativity](#precedence-and-associativity)
|
|
|
+ - [Abstract Syntax](#abstract-syntax)
|
|
|
+ - [Abstract Syntax for Expressions](#abstract-syntax-for-expressions)
|
|
|
+ - [Abstract Syntax for Statements](#abstract-syntax-for-statements)
|
|
|
+ - [Abstract Syntax for Declarations](#abstract-syntax-for-declarations)
|
|
|
+- [Alternatives considered](#alternatives-considered)
|
|
|
+
|
|
|
+<!-- tocstop -->
|
|
|
+
|
|
|
+## Problem
|
|
|
+
|
|
|
+The purpose of this proposal is to establish some basic syntactic elements of
|
|
|
+the Carbon language and make sure that the grammar is unambiguous and can be
|
|
|
+parsed by an LALR parser such as `yacc` or `bison`. The grammar presented here
|
|
|
+has indeed been checked by `bison`. The language features in this basic grammar
|
|
|
+include control flow by way of `if` and `while`, functions, simple structures,
|
|
|
+choice, pattern matching, and a sample of operators on Booleans and integers.
|
|
|
+The main syntactic categories are `declaration`, `statement`, and `expression`.
|
|
|
+Establishing these syntactic categories should help the other proposals choose
|
|
|
+syntax that is compatible with the rest of the language.
|
|
|
+
|
|
|
+## Background
|
|
|
+
|
|
|
+The grammar proposed here is based on the following proposals:
|
|
|
+
|
|
|
+- [Carbon language overview](https://github.com/carbon-language/carbon-lang/tree/trunk/docs/design)
|
|
|
+- pattern matching
|
|
|
+ [#87](https://github.com/carbon-language/carbon-lang/pull/87),
|
|
|
+- structs [#98](https://github.com/carbon-language/carbon-lang/pull/98),
|
|
|
+- tuples [#111](https://github.com/carbon-language/carbon-lang/pull/111),
|
|
|
+- sum types [#157](https://github.com/carbon-language/carbon-lang/pull/157),
|
|
|
+ and
|
|
|
+- metaprogramming
|
|
|
+ [#89](https://github.com/carbon-language/carbon-lang/pull/89).
|
|
|
+
|
|
|
+## Proposal
|
|
|
+
|
|
|
+We summarize the four main syntactic categories here and define the grammar in
|
|
|
+the next section.
|
|
|
+
|
|
|
+- `declaration` includes function, structure, and choice definitions.
|
|
|
+
|
|
|
+- `statement` includes local variable definitions, assignment, blocks, `if`,
|
|
|
+ `while`, `match`, `break`, `continue`, and `return`.
|
|
|
+
|
|
|
+- `expression` includes function calls, arithmetic, literals, and other
|
|
|
+ syntaxes that evaluate to a value. In Carbon, a type is a kind of value, and
|
|
|
+ Carbon makes no syntactic distinction between type-valued expressions and
|
|
|
+ other kinds of expressions, so the `expression` nonterminal is used in
|
|
|
+ positions where a type is expected.
|
|
|
+
|
|
|
+- `pattern` for the patterns in a `match` statement, for the left-hand side of
|
|
|
+ a variable definition, and for describing the parameters of a function. The
|
|
|
+ grammar treats patterns and expressions identically, but the type system
|
|
|
+ will only allow pattern variables (`expression ':' identifier`) in patterns
|
|
|
+ and not in value or type-producing expressions. Having the separate
|
|
|
+ non-terminals for `pattern` and `expression` helps to document this
|
|
|
+ distinction.
|
|
|
+
|
|
|
+The proposal also specifies the abstract syntax.
|
|
|
+
|
|
|
+When this proposal is accepted, the accompanying `flex`, `bison`, and C++ files
|
|
|
+that define the grammar and implement the parser actions will be placed in the
|
|
|
+`executable_semantics` directory of the `carbon-lang` repository.
|
|
|
+
|
|
|
+In the near future there will be proposals regarding the semantic analysis (aka.
|
|
|
+type checking) and specification of runtime behavior for this subset of Carbon
|
|
|
+with the plan to add the accompanying executable forms of those specifications
|
|
|
+to the `executable_semantics` directory.
|
|
|
+
|
|
|
+Looking towards growing the Carbon language, the intent is for other proposals
|
|
|
+to add to this grammar by modifying the `flex` and `bison` files and the
|
|
|
+abstract syntax definitions in the `executable_semantics` directory.
|
|
|
+
|
|
|
+## Details
|
|
|
+
|
|
|
+### Expressions and Patterns
|
|
|
+
|
|
|
+The following grammar defines the concrete syntax for expressions. Below we
|
|
|
+comment on a few aspects of the grammar.
|
|
|
+
|
|
|
+```Bison
|
|
|
+pattern:
|
|
|
+ expression
|
|
|
+;
|
|
|
+expression:
|
|
|
+ identifier
|
|
|
+| expression designator
|
|
|
+| expression '[' expression ']'
|
|
|
+| expression ':' identifier
|
|
|
+| integer_literal
|
|
|
+| "true"
|
|
|
+| "false"
|
|
|
+| tuple
|
|
|
+| expression "==" expression
|
|
|
+| expression '+' expression
|
|
|
+| expression '-' expression
|
|
|
+| expression "and" expression
|
|
|
+| expression "or" expression
|
|
|
+| "not" expression
|
|
|
+| '-' expression
|
|
|
+| expression tuple
|
|
|
+| "auto"
|
|
|
+| "fnty" tuple return_type
|
|
|
+;
|
|
|
+designator:
|
|
|
+ '.' identifier
|
|
|
+;
|
|
|
+tuple:
|
|
|
+ '(' field_list ')'
|
|
|
+;
|
|
|
+field_list:
|
|
|
+ /* empty */
|
|
|
+| field
|
|
|
+| field ',' field_list
|
|
|
+;
|
|
|
+field:
|
|
|
+ pattern
|
|
|
+| designator '=' pattern
|
|
|
+;
|
|
|
+return_type:
|
|
|
+ /* empty */
|
|
|
+| "->" expression
|
|
|
+;
|
|
|
+```
|
|
|
+
|
|
|
+The grammar rule
|
|
|
+
|
|
|
+```Bison
|
|
|
+expression: expression ':' identifier
|
|
|
+```
|
|
|
+
|
|
|
+is for pattern variables. For example, in a variable definition such as
|
|
|
+
|
|
|
+ var Int: x = 0;
|
|
|
+
|
|
|
+the `Int: x` is parsed with the grammar rule for pattern variables. In the
|
|
|
+right-hand side of the above grammar rule, the `expression` to the left of the
|
|
|
+`:` must evaluate to a type at compile time.
|
|
|
+
|
|
|
+The grammar rule
|
|
|
+
|
|
|
+```Bison
|
|
|
+expression: "fnty" tuple return_type
|
|
|
+```
|
|
|
+
|
|
|
+is for function types. They are meant to play the role that function pointers
|
|
|
+play in C and that `std::function` plays in C++.
|
|
|
+
|
|
|
+Regarding the grammar rule for the return type of a function:
|
|
|
+
|
|
|
+```Bison
|
|
|
+return_type: "->" expression
|
|
|
+```
|
|
|
+
|
|
|
+the `expression` is expected to evaluate to a type at compile-time.
|
|
|
+
|
|
|
+The grammar rule
|
|
|
+
|
|
|
+```Bison
|
|
|
+tuple: '(' field_list ')'
|
|
|
+```
|
|
|
+
|
|
|
+is primarily for constructing a tuple, but it is also used for creating tuple
|
|
|
+types and tuple patterns, depending on the context in which the expression
|
|
|
+occurs.
|
|
|
+
|
|
|
+Regarding the grammar rules for `designator` and for a named argument
|
|
|
+
|
|
|
+```Bison
|
|
|
+designator: '.' identifier
|
|
|
+field: designator '=' pattern
|
|
|
+```
|
|
|
+
|
|
|
+The period enables the addition of new keywords to Carbon without colliding with
|
|
|
+field names. The period also aids integrated development environments with
|
|
|
+auto-completion. The issue is that without the period, when the programmer is
|
|
|
+typing the identifier (and not yet typed the `=`), the IDE doesn't know whether
|
|
|
+the identifier is for a positional argument, in which case it is a variable
|
|
|
+occurrence, or whether the identifier is a field name.
|
|
|
+
|
|
|
+### Statements
|
|
|
+
|
|
|
+The following grammar defines the concrete syntax for statements.
|
|
|
+
|
|
|
+```Bison
|
|
|
+statement:
|
|
|
+ "var" pattern '=' expression ';'
|
|
|
+| expression '=' expression ';'
|
|
|
+| expression ';'
|
|
|
+| "if" '(' expression ')' statement "else" statement
|
|
|
+| "while" '(' expression ')' statement
|
|
|
+| "break" ';'
|
|
|
+| "continue" ';'
|
|
|
+| "return" expression ';'
|
|
|
+| '{' statement_list '}'
|
|
|
+| "match" '(' expression ')' '{' clause_list '}'
|
|
|
+;
|
|
|
+statement_list:
|
|
|
+ /* empty */
|
|
|
+| statement statement_list
|
|
|
+;
|
|
|
+clause_list:
|
|
|
+ /* empty */
|
|
|
+| clause clause_list
|
|
|
+;
|
|
|
+clause:
|
|
|
+ "case" pattern "=>" statement
|
|
|
+| "default" "=>" statement
|
|
|
+;
|
|
|
+```
|
|
|
+
|
|
|
+### Declarations
|
|
|
+
|
|
|
+The following grammar defines the concrete syntax for declarations.
|
|
|
+
|
|
|
+```Bison
|
|
|
+declaration:
|
|
|
+ "fn" identifier tuple return_type '{' statement_list '}'
|
|
|
+| "fn" identifier tuple "=>" expression ';'
|
|
|
+| "fn" identifier tuple return_type ';'
|
|
|
+| "struct" identifier '{' member_list '}'
|
|
|
+| "choice" identifier '{' alternative_list '}'
|
|
|
+;
|
|
|
+member:
|
|
|
+ "var" expression ':' identifier ';'
|
|
|
+;
|
|
|
+member_list:
|
|
|
+ /* empty */
|
|
|
+| member member_list
|
|
|
+;
|
|
|
+alternative:
|
|
|
+ identifier tuple ';'
|
|
|
+;
|
|
|
+alternative_list:
|
|
|
+ /* empty */
|
|
|
+| alternative alternative_list
|
|
|
+;
|
|
|
+declaration_list:
|
|
|
+ /* empty */
|
|
|
+| declaration declaration_list
|
|
|
+;
|
|
|
+```
|
|
|
+
|
|
|
+In the grammar rule for function definitions
|
|
|
+
|
|
|
+```Bison
|
|
|
+declaration: "fn" identifier tuple return_type '{' statement_list '}'
|
|
|
+```
|
|
|
+
|
|
|
+the `tuple` is used as a pattern to describe the parameters of the function. The
|
|
|
+grammar for function definitions does not currently include
|
|
|
+[implicit parameters](https://github.com/josh11b/carbon-lang/blob/generics-docs/docs/design/generics/terminology.md#implicit-parameter),
|
|
|
+but the intent is to add them to the grammar in the future.
|
|
|
+
|
|
|
+In the rule for field declarations
|
|
|
+
|
|
|
+```Bison
|
|
|
+member: "var" expression ':' identifier ';'
|
|
|
+```
|
|
|
+
|
|
|
+the `expression` must evaluate to a type at compile time. The same is true for
|
|
|
+the `tuple` in the grammar rule for an alternative:
|
|
|
+
|
|
|
+ alternative: identifier tuple ';'
|
|
|
+
|
|
|
+### Precedence and Associativity
|
|
|
+
|
|
|
+The following precedence and associativity specification is meant to approximate
|
|
|
+the definitions in the operators and precedence proposal
|
|
|
+[#168](https://github.com/carbon-language/carbon-lang/pull/168) to the extent
|
|
|
+that is possible in a `yacc`/`bison` generated parser. The ordering is from
|
|
|
+lowest to highest precedence, with operators on the same line having equal
|
|
|
+precedence. Proposal 168 differs in that the operator groups are partially
|
|
|
+ordered instead of being totally ordered.
|
|
|
+
|
|
|
+ nonassoc '{' '}'
|
|
|
+ nonassoc ':' ','
|
|
|
+ left "or" "and"
|
|
|
+ nonassoc "==" "not"
|
|
|
+ left '+' '-'
|
|
|
+ left '.' "->"
|
|
|
+ nonassoc '(' ')' '[' ']'
|
|
|
+
|
|
|
+### Abstract Syntax
|
|
|
+
|
|
|
+The output of parsing is an abstract syntax tree. There are many ways to define
|
|
|
+abstract syntax. Here we simply use C-style `struct` definitions. The definition
|
|
|
+of the abstract syntax is important because it is used in the specification of
|
|
|
+the semantic analysis and the specification of runtime behavior.
|
|
|
+
|
|
|
+#### Abstract Syntax for Expressions
|
|
|
+
|
|
|
+```C++
|
|
|
+enum ExpressionKind { Variable, PatternVariable, Int, Bool,
|
|
|
+ PrimitiveOp, Call, Tuple, Index, GetField,
|
|
|
+ IntT, BoolT, TypeT, FunctionT, AutoT };
|
|
|
+enum Operator { Neg, Add, Sub, Not, And, Or, Eq };
|
|
|
+
|
|
|
+struct Expression {
|
|
|
+ ExpressionKind tag;
|
|
|
+ union {
|
|
|
+ struct { string* name; } variable;
|
|
|
+ struct { Expression* aggregate; string* field; } get_field;
|
|
|
+ struct { Expression* aggregate; Expression* offset; } index;
|
|
|
+ struct { string* name; Expression* type; } pattern_variable;
|
|
|
+ int integer;
|
|
|
+ bool boolean;
|
|
|
+ struct { vector<pair<string,Expression*> >* fields; } tuple;
|
|
|
+ struct {
|
|
|
+ Operator operator_;
|
|
|
+ vector<Expression*>* arguments;
|
|
|
+ } primitive_op;
|
|
|
+ struct { Expression* function; Expression* argument; } call;
|
|
|
+ struct { Expression* parameter; Expression* return_type;} function_type;
|
|
|
+ } u;
|
|
|
+};
|
|
|
+```
|
|
|
+
|
|
|
+The correspondence between most of the grammar rules and the abstract syntax is
|
|
|
+straightforward. However, the parsing of the `field_list` deserves some
|
|
|
+explanation. The fields can be labeled with the grammar rule:
|
|
|
+
|
|
|
+```Bison
|
|
|
+field: designator '=' pattern
|
|
|
+```
|
|
|
+
|
|
|
+or unlabeled, with the rule
|
|
|
+
|
|
|
+```Bison
|
|
|
+field: pattern
|
|
|
+```
|
|
|
+
|
|
|
+The unlabeled fields are given numbers (represented as strings) for field
|
|
|
+labels, starting with 0 and going up from left to right.
|
|
|
+
|
|
|
+Regarding the rule for tuples:
|
|
|
+
|
|
|
+```Bison
|
|
|
+tuple: '(' field_list ')'
|
|
|
+```
|
|
|
+
|
|
|
+if the field list only has a single unlabeled item without a trailing comma,
|
|
|
+then the parse result for that item is returned. Otherwise a `tuple` AST node is
|
|
|
+created containing the parse results for the fields.
|
|
|
+
|
|
|
+In the following grammar rules, if the parse result for `tuple` is not a tuple
|
|
|
+(because the field list was a single unlabeled item without a trailing comma),
|
|
|
+then a tuple is wrapped around the expression to ensure that it is a tuple.
|
|
|
+
|
|
|
+```Bison
|
|
|
+expression: expression tuple
|
|
|
+expression: "fnty" tuple return_type
|
|
|
+function_definition:
|
|
|
+ "fn" identifier tuple return_type '{' statement_list '}'
|
|
|
+| "fn" identifier tuple DBLARROW expression ';'
|
|
|
+| "fn" identifier tuple return_type ';'
|
|
|
+alternative: identifier tuple ';'
|
|
|
+```
|
|
|
+
|
|
|
+#### Abstract Syntax for Statements
|
|
|
+
|
|
|
+```C++
|
|
|
+enum StatementKind { ExpressionStatement, Assign, VariableDefinition,
|
|
|
+ If, Return, Sequence, Block, While, Break, Continue,
|
|
|
+ Match };
|
|
|
+
|
|
|
+struct Statement {
|
|
|
+ StatementKind tag;
|
|
|
+ union {
|
|
|
+ Expression* exp;
|
|
|
+ struct { Expression* lhs; Expression* rhs; } assign;
|
|
|
+ struct { Expression* pat; Expression* init; } variable_definition;
|
|
|
+ struct { Expression* cond; Statement* then_; Statement* else_; } if_stmt;
|
|
|
+ Expression* return_stmt;
|
|
|
+ struct { Statement* stmt; Statement* next; } sequence;
|
|
|
+ struct { Statement* stmt; } block;
|
|
|
+ struct { Expression* cond; Statement* body; } while_stmt;
|
|
|
+ struct {
|
|
|
+ Expression* exp;
|
|
|
+ list< pair<Expression*,Statement*> >* clauses;
|
|
|
+ } match_stmt;
|
|
|
+ } u;
|
|
|
+};
|
|
|
+```
|
|
|
+
|
|
|
+#### Abstract Syntax for Declarations
|
|
|
+
|
|
|
+```C++
|
|
|
+struct FunctionDefinition {
|
|
|
+ string name;
|
|
|
+ Expression* param_pattern;
|
|
|
+ Expression* return_type;
|
|
|
+ Statement* body;
|
|
|
+};
|
|
|
+
|
|
|
+enum MemberKind { FieldMember };
|
|
|
+
|
|
|
+struct Member {
|
|
|
+ MemberKind tag;
|
|
|
+ union {
|
|
|
+ struct { string* name; Expression* type; } field;
|
|
|
+ } u;
|
|
|
+};
|
|
|
+
|
|
|
+struct StructDefinition {
|
|
|
+ string* name;
|
|
|
+ list<Member*>* members;
|
|
|
+};
|
|
|
+
|
|
|
+enum DeclarationKind { FunctionDeclaration, StructDeclaration,
|
|
|
+ ChoiceDeclaration };
|
|
|
+
|
|
|
+struct Declaration {
|
|
|
+ DeclarationKind tag;
|
|
|
+ union {
|
|
|
+ struct FunctionDefinition* fun_def;
|
|
|
+ struct StructDefinition* struct_def;
|
|
|
+ struct {
|
|
|
+ string* name;
|
|
|
+ list<pair<string, Expression*> >* alts;
|
|
|
+ } choice_def;
|
|
|
+ } u;
|
|
|
+};
|
|
|
+```
|
|
|
+
|
|
|
+## Alternatives considered
|
|
|
+
|
|
|
+Expressions, type expressions, and patterns could instead be defined using
|
|
|
+completely disjoint grammar productions, but that would require adding more
|
|
|
+keywords or symbols to avoid ambiguity and there would be a fair amount of
|
|
|
+repetition in grammar productions.
|
|
|
+
|
|
|
+The proposal does not include declarations for uninitialized variables, leaving
|
|
|
+that to a later proposal.
|
|
|
+
|
|
|
+In this proposal, assignment is a statement. It could instead be an expression
|
|
|
+as it is in C and C++. The arguments against assignment-as-an-expression
|
|
|
+include 1) it complicates reasoning about the ordering of side-effects and 2) it
|
|
|
+can cause confusion between `=` and `==`
|
|
|
+[(SEI CERT C Coding Standard)](https://wiki.sei.cmu.edu/confluence/display/c/EXP45-C.+Do+not+perform+assignments+in+selection+statements)
|
|
|
+[Visual Studio Warning](https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4706?view=vs-2019).
|
|
|
+
|
|
|
+The syntax for variable initialization uses `=`, which is the same token used in
|
|
|
+assignment. An alternative would be to use a different syntax such as `:=` to
|
|
|
+better signal the semantic differences between initialization and assignment.
|
|
|
+The author does not have a preference on this choice. The `:=` alternative does
|
|
|
+not introduce any complications regarding ambiguity despite the other uses of
|
|
|
+`:` in the grammar.
|
|
|
+
|
|
|
+Inside a `choice`, an `alternative` could instead begin with a keyword such as
|
|
|
+`alt`, `fn`, or `var`, as discussed in the proposal for sum types
|
|
|
+[#157](https://github.com/carbon-language/carbon-lang/pull/157). The author does
|
|
|
+not have a preference in this regard.
|
|
|
+
|
|
|
+The precedence for `and` and `or` are equal, following the suggestion of
|
|
|
+proposal [#168](https://github.com/carbon-language/carbon-lang/pull/168),
|
|
|
+whereas in C++ `&&` has higher precedence than `||`.
|
|
|
+
|
|
|
+The parameters of a function are described using the syntax of a pattern. There
|
|
|
+could instead be separate syntax that is special to function parameters, as
|
|
|
+there is in C++.
|
|
|
+
|
|
|
+There are open questions regarding the design of function types, so we could
|
|
|
+alternatively postpone the specification of the syntax for function types. The
|
|
|
+choice to include them is based on the idea that Carbon will need something that
|
|
|
+fills the roles of `std::function` and C-style function pointers. Also, the
|
|
|
+features were selected for the basic syntax proposal in a way that aimed to make
|
|
|
+the proposal complete in the sense that each value-oriented feature (functions,
|
|
|
+tuples, structures) comes with syntax for 1) creating values, 2) using values,
|
|
|
+and 3) a type for classifying values.
|
|
|
+
|
|
|
+The parameters of a function type are described using the syntax for type
|
|
|
+expression. There could instead be separate syntax that is special to the
|
|
|
+parameters of a function type, as there is in C++.
|
|
|
+
|
|
|
+The keyword for introducing a function type is `fnty`, which is an arbitrary
|
|
|
+choice. Other alternatives include `fn_type` and `fn`. The `fn` alternative
|
|
|
+would conflict with future plans to use `fn` for lambda expressions. Some
|
|
|
+dislike of `fn_type` was voiced, which motivated the choice of `fnty`. It would
|
|
|
+be nice to do without a keyword, but as the grammar currently stands, that
|
|
|
+introduce ambiguity.
|
|
|
+
|
|
|
+The `pattern` non-terminal is defined in terms of `expression`. It could instead
|
|
|
+be the other way around, defining `expression` in terms of `pattern`.
|
|
|
+
|
|
|
+Regarding tuples and tuple types, this proposal follows
|
|
|
+[#111](https://github.com/carbon-language/carbon-lang/pull/111) in that there is
|
|
|
+not a distinct syntax for tuple types. Instead, the intent is that when a tuple
|
|
|
+of types results from an expression in a context that expects a type, the type
|
|
|
+system will convert the tuple to a tuple type. An alternative approach has been
|
|
|
+discussed in which tuple types have a distinct syntax, such as
|
|
|
+`Tuple(Int, Bool)`.
|