Kaynağa Gözat

Basic Syntax (#162)

* first draft of proposal for basic syntax

* rename proposal

* formating

* fixing typos

* precendence and associativity

* minor edit

* added abstract syntax

* some revisions based on feedback from meeting today

* optional return type

* oops, not optional for function declarations

* replacing abbreviations with full names

* name changes

* Update proposals/p0162.md

Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>

* removed = from precedence table

* for fun decl, back to optional return type, shorthand for void return

* more fiddling with return types

* change base case of statement_list to empty

* added pattern non-terminal

* added  expression style function definitions

* change && to and, || to or

* change  and  to have equal precedence

* updating text to match grammar, fix typo in grammar regarding pattern

* adding trailing comma thing to tuples

* removed 'alt' keyword, not necessary

* flipped expression and pattern

* added period for named arguments and documented the reason

* change alternative syntax to use tuple instead of expression

* comment about abstract syntax

* code block language annotations

* added alternative designs, some cleanup for pre-commit

* spell checking

* filled out the TOC

* minor edits

* minor edit

* trying to fix pre-commit error

* changes from pre-commit?

* changes based on meeting today

* describe alternatives regarding methods

* more rationale in discussion of alternatives

* addressing comments

* typo fix, added text about next steps

* removed *, changed a ! to not

* Update proposals/p0162.md

Co-authored-by: Geoff Romer <gromer@google.com>

* edits from feedback

* pre-commit

* added second reason for period in field initializer

* added executable semantics, fixed misunderstanding regarding associativity

* update README

* added a paragraph about tuples and tuple types

* Update proposals/p0162.md

Co-authored-by: josh11b <josh11b@users.noreply.github.com>

* addressing comments from josh11b

* Update executable-semantics/README.md

Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>

* link to implicit parameters in generics proposal

* added non-terminal for designator as per geoffromer's suggestion

* changed handling of tuples in function call and similar places as per josh11b and zygoloid

* moving code to separate PR

Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>
Co-authored-by: Geoff Romer <gromer@google.com>
Co-authored-by: josh11b <josh11b@users.noreply.github.com>
Jeremy G. Siek 5 yıl önce
ebeveyn
işleme
136a8ae470
2 değiştirilmiş dosya ile 517 ekleme ve 0 silme
  1. 1 0
      proposals/README.md
  2. 516 0
      proposals/p0162.md

+ 1 - 0
proposals/README.md

@@ -49,6 +49,7 @@ request:
     -   [Decision](p0143_decision.md)
 -   [0149 - Change documentation style guide](p0149.md)
     -   [Decision](p0149_decision.md)
+-   [0162 - Basic Syntax](p0162.md)
 -   [0175 - C++ interoperability goals](p0175.md)
 
 <!-- endproposals -->

+ 516 - 0
proposals/p0162.md

@@ -0,0 +1,516 @@
+# Basic Syntax
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/0162)
+
+## Table of contents
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [Expressions and Patterns](#expressions-and-patterns)
+    -   [Statements](#statements)
+    -   [Declarations](#declarations)
+    -   [Precedence and Associativity](#precedence-and-associativity)
+    -   [Abstract Syntax](#abstract-syntax)
+        -   [Abstract Syntax for Expressions](#abstract-syntax-for-expressions)
+        -   [Abstract Syntax for Statements](#abstract-syntax-for-statements)
+        -   [Abstract Syntax for Declarations](#abstract-syntax-for-declarations)
+-   [Alternatives considered](#alternatives-considered)
+
+<!-- tocstop -->
+
+## Problem
+
+The purpose of this proposal is to establish some basic syntactic elements of
+the Carbon language and make sure that the grammar is unambiguous and can be
+parsed by an LALR parser such as `yacc` or `bison`. The grammar presented here
+has indeed been checked by `bison`. The language features in this basic grammar
+include control flow by way of `if` and `while`, functions, simple structures,
+choice, pattern matching, and a sample of operators on Booleans and integers.
+The main syntactic categories are `declaration`, `statement`, and `expression`.
+Establishing these syntactic categories should help the other proposals choose
+syntax that is compatible with the rest of the language.
+
+## Background
+
+The grammar proposed here is based on the following proposals:
+
+-   [Carbon language overview](https://github.com/carbon-language/carbon-lang/tree/trunk/docs/design)
+-   pattern matching
+    [#87](https://github.com/carbon-language/carbon-lang/pull/87),
+-   structs [#98](https://github.com/carbon-language/carbon-lang/pull/98),
+-   tuples [#111](https://github.com/carbon-language/carbon-lang/pull/111),
+-   sum types [#157](https://github.com/carbon-language/carbon-lang/pull/157),
+    and
+-   metaprogramming
+    [#89](https://github.com/carbon-language/carbon-lang/pull/89).
+
+## Proposal
+
+We summarize the four main syntactic categories here and define the grammar in
+the next section.
+
+-   `declaration` includes function, structure, and choice definitions.
+
+-   `statement` includes local variable definitions, assignment, blocks, `if`,
+    `while`, `match`, `break`, `continue`, and `return`.
+
+-   `expression` includes function calls, arithmetic, literals, and other
+    syntaxes that evaluate to a value. In Carbon, a type is a kind of value, and
+    Carbon makes no syntactic distinction between type-valued expressions and
+    other kinds of expressions, so the `expression` nonterminal is used in
+    positions where a type is expected.
+
+-   `pattern` for the patterns in a `match` statement, for the left-hand side of
+    a variable definition, and for describing the parameters of a function. The
+    grammar treats patterns and expressions identically, but the type system
+    will only allow pattern variables (`expression ':' identifier`) in patterns
+    and not in value or type-producing expressions. Having the separate
+    non-terminals for `pattern` and `expression` helps to document this
+    distinction.
+
+The proposal also specifies the abstract syntax.
+
+When this proposal is accepted, the accompanying `flex`, `bison`, and C++ files
+that define the grammar and implement the parser actions will be placed in the
+`executable_semantics` directory of the `carbon-lang` repository.
+
+In the near future there will be proposals regarding the semantic analysis (aka.
+type checking) and specification of runtime behavior for this subset of Carbon
+with the plan to add the accompanying executable forms of those specifications
+to the `executable_semantics` directory.
+
+Looking towards growing the Carbon language, the intent is for other proposals
+to add to this grammar by modifying the `flex` and `bison` files and the
+abstract syntax definitions in the `executable_semantics` directory.
+
+## Details
+
+### Expressions and Patterns
+
+The following grammar defines the concrete syntax for expressions. Below we
+comment on a few aspects of the grammar.
+
+```Bison
+pattern:
+  expression
+;
+expression:
+  identifier
+| expression designator
+| expression '[' expression ']'
+| expression ':' identifier
+| integer_literal
+| "true"
+| "false"
+| tuple
+| expression "==" expression
+| expression '+' expression
+| expression '-' expression
+| expression "and" expression
+| expression "or" expression
+| "not" expression
+| '-' expression
+| expression tuple
+| "auto"
+| "fnty" tuple return_type
+;
+designator:
+  '.' identifier
+;
+tuple:
+  '(' field_list ')'
+;
+field_list:
+  /* empty */
+| field
+| field ',' field_list
+;
+field:
+  pattern
+| designator '=' pattern
+;
+return_type:
+  /* empty */
+| "->" expression
+;
+```
+
+The grammar rule
+
+```Bison
+expression:  expression ':' identifier
+```
+
+is for pattern variables. For example, in a variable definition such as
+
+    var Int: x = 0;
+
+the `Int: x` is parsed with the grammar rule for pattern variables. In the
+right-hand side of the above grammar rule, the `expression` to the left of the
+`:` must evaluate to a type at compile time.
+
+The grammar rule
+
+```Bison
+expression:  "fnty" tuple return_type
+```
+
+is for function types. They are meant to play the role that function pointers
+play in C and that `std::function` plays in C++.
+
+Regarding the grammar rule for the return type of a function:
+
+```Bison
+return_type:  "->" expression
+```
+
+the `expression` is expected to evaluate to a type at compile-time.
+
+The grammar rule
+
+```Bison
+tuple:  '(' field_list ')'
+```
+
+is primarily for constructing a tuple, but it is also used for creating tuple
+types and tuple patterns, depending on the context in which the expression
+occurs.
+
+Regarding the grammar rules for `designator` and for a named argument
+
+```Bison
+designator: '.' identifier
+field: designator '=' pattern
+```
+
+The period enables the addition of new keywords to Carbon without colliding with
+field names. The period also aids integrated development environments with
+auto-completion. The issue is that without the period, when the programmer is
+typing the identifier (and not yet typed the `=`), the IDE doesn't know whether
+the identifier is for a positional argument, in which case it is a variable
+occurrence, or whether the identifier is a field name.
+
+### Statements
+
+The following grammar defines the concrete syntax for statements.
+
+```Bison
+statement:
+  "var" pattern '=' expression ';'
+| expression '=' expression ';'
+| expression ';'
+| "if" '(' expression ')' statement "else" statement
+| "while" '(' expression ')' statement
+| "break" ';'
+| "continue" ';'
+| "return" expression ';'
+| '{' statement_list '}'
+| "match" '(' expression ')' '{' clause_list '}'
+;
+statement_list:
+  /* empty */
+| statement statement_list
+;
+clause_list:
+  /* empty */
+| clause clause_list
+;
+clause:
+  "case" pattern "=>" statement
+| "default" "=>" statement
+;
+```
+
+### Declarations
+
+The following grammar defines the concrete syntax for declarations.
+
+```Bison
+declaration:
+  "fn" identifier tuple return_type '{' statement_list '}'
+| "fn" identifier tuple "=>" expression ';'
+| "fn" identifier tuple return_type ';'
+| "struct" identifier '{' member_list '}'
+| "choice" identifier '{' alternative_list '}'
+;
+member:
+  "var" expression ':' identifier ';'
+;
+member_list:
+  /* empty */
+| member member_list
+;
+alternative:
+  identifier tuple ';'
+;
+alternative_list:
+  /* empty */
+| alternative alternative_list
+;
+declaration_list:
+  /* empty */
+| declaration declaration_list
+;
+```
+
+In the grammar rule for function definitions
+
+```Bison
+declaration:  "fn" identifier tuple return_type '{' statement_list '}'
+```
+
+the `tuple` is used as a pattern to describe the parameters of the function. The
+grammar for function definitions does not currently include
+[implicit parameters](https://github.com/josh11b/carbon-lang/blob/generics-docs/docs/design/generics/terminology.md#implicit-parameter),
+but the intent is to add them to the grammar in the future.
+
+In the rule for field declarations
+
+```Bison
+member:  "var" expression ':' identifier ';'
+```
+
+the `expression` must evaluate to a type at compile time. The same is true for
+the `tuple` in the grammar rule for an alternative:
+
+    alternative:  identifier tuple ';'
+
+### Precedence and Associativity
+
+The following precedence and associativity specification is meant to approximate
+the definitions in the operators and precedence proposal
+[#168](https://github.com/carbon-language/carbon-lang/pull/168) to the extent
+that is possible in a `yacc`/`bison` generated parser. The ordering is from
+lowest to highest precedence, with operators on the same line having equal
+precedence. Proposal 168 differs in that the operator groups are partially
+ordered instead of being totally ordered.
+
+    nonassoc '{' '}'
+    nonassoc ':' ','
+    left "or" "and"
+    nonassoc "==" "not"
+    left '+' '-'
+    left '.' "->"
+    nonassoc '(' ')' '[' ']'
+
+### Abstract Syntax
+
+The output of parsing is an abstract syntax tree. There are many ways to define
+abstract syntax. Here we simply use C-style `struct` definitions. The definition
+of the abstract syntax is important because it is used in the specification of
+the semantic analysis and the specification of runtime behavior.
+
+#### Abstract Syntax for Expressions
+
+```C++
+enum ExpressionKind { Variable, PatternVariable, Int, Bool,
+                      PrimitiveOp, Call, Tuple, Index, GetField,
+                      IntT, BoolT, TypeT, FunctionT, AutoT };
+enum Operator { Neg, Add, Sub, Not, And, Or, Eq };
+
+struct Expression {
+  ExpressionKind tag;
+  union {
+    struct { string* name; } variable;
+    struct { Expression* aggregate; string* field; } get_field;
+    struct { Expression* aggregate; Expression* offset; } index;
+    struct { string* name; Expression* type; } pattern_variable;
+    int integer;
+    bool boolean;
+    struct { vector<pair<string,Expression*> >* fields; } tuple;
+    struct {
+      Operator operator_;
+      vector<Expression*>* arguments;
+    } primitive_op;
+    struct { Expression* function; Expression* argument; } call;
+    struct { Expression* parameter; Expression* return_type;} function_type;
+  } u;
+};
+```
+
+The correspondence between most of the grammar rules and the abstract syntax is
+straightforward. However, the parsing of the `field_list` deserves some
+explanation. The fields can be labeled with the grammar rule:
+
+```Bison
+field:  designator '=' pattern
+```
+
+or unlabeled, with the rule
+
+```Bison
+field: pattern
+```
+
+The unlabeled fields are given numbers (represented as strings) for field
+labels, starting with 0 and going up from left to right.
+
+Regarding the rule for tuples:
+
+```Bison
+tuple:  '(' field_list ')'
+```
+
+if the field list only has a single unlabeled item without a trailing comma,
+then the parse result for that item is returned. Otherwise a `tuple` AST node is
+created containing the parse results for the fields.
+
+In the following grammar rules, if the parse result for `tuple` is not a tuple
+(because the field list was a single unlabeled item without a trailing comma),
+then a tuple is wrapped around the expression to ensure that it is a tuple.
+
+```Bison
+expression: expression tuple
+expression: "fnty" tuple return_type
+function_definition:
+  "fn" identifier tuple return_type '{' statement_list '}'
+| "fn" identifier tuple DBLARROW expression ';'
+| "fn" identifier tuple return_type ';'
+alternative: identifier tuple ';'
+```
+
+#### Abstract Syntax for Statements
+
+```C++
+enum StatementKind { ExpressionStatement, Assign, VariableDefinition,
+                     If,  Return, Sequence, Block, While, Break, Continue,
+                     Match };
+
+struct Statement {
+  StatementKind tag;
+  union {
+    Expression* exp;
+    struct { Expression* lhs; Expression* rhs; } assign;
+    struct { Expression* pat; Expression* init; } variable_definition;
+    struct { Expression* cond; Statement* then_; Statement* else_; } if_stmt;
+    Expression* return_stmt;
+    struct { Statement* stmt; Statement* next; } sequence;
+    struct { Statement* stmt; } block;
+    struct { Expression* cond; Statement* body; } while_stmt;
+    struct {
+      Expression* exp;
+      list< pair<Expression*,Statement*> >* clauses;
+    } match_stmt;
+  } u;
+};
+```
+
+#### Abstract Syntax for Declarations
+
+```C++
+struct FunctionDefinition {
+  string name;
+  Expression* param_pattern;
+  Expression* return_type;
+  Statement* body;
+};
+
+enum MemberKind { FieldMember };
+
+struct Member {
+  MemberKind tag;
+  union {
+    struct { string* name; Expression* type; } field;
+  } u;
+};
+
+struct StructDefinition {
+  string* name;
+  list<Member*>* members;
+};
+
+enum DeclarationKind { FunctionDeclaration, StructDeclaration,
+                       ChoiceDeclaration };
+
+struct Declaration {
+  DeclarationKind tag;
+  union {
+    struct FunctionDefinition* fun_def;
+    struct StructDefinition* struct_def;
+    struct {
+      string* name;
+      list<pair<string, Expression*> >* alts;
+    } choice_def;
+  } u;
+};
+```
+
+## Alternatives considered
+
+Expressions, type expressions, and patterns could instead be defined using
+completely disjoint grammar productions, but that would require adding more
+keywords or symbols to avoid ambiguity and there would be a fair amount of
+repetition in grammar productions.
+
+The proposal does not include declarations for uninitialized variables, leaving
+that to a later proposal.
+
+In this proposal, assignment is a statement. It could instead be an expression
+as it is in C and C++. The arguments against assignment-as-an-expression
+include 1) it complicates reasoning about the ordering of side-effects and 2) it
+can cause confusion between `=` and `==`
+[(SEI CERT C Coding Standard)](https://wiki.sei.cmu.edu/confluence/display/c/EXP45-C.+Do+not+perform+assignments+in+selection+statements)
+[Visual Studio Warning](https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4706?view=vs-2019).
+
+The syntax for variable initialization uses `=`, which is the same token used in
+assignment. An alternative would be to use a different syntax such as `:=` to
+better signal the semantic differences between initialization and assignment.
+The author does not have a preference on this choice. The `:=` alternative does
+not introduce any complications regarding ambiguity despite the other uses of
+`:` in the grammar.
+
+Inside a `choice`, an `alternative` could instead begin with a keyword such as
+`alt`, `fn`, or `var`, as discussed in the proposal for sum types
+[#157](https://github.com/carbon-language/carbon-lang/pull/157). The author does
+not have a preference in this regard.
+
+The precedence for `and` and `or` are equal, following the suggestion of
+proposal [#168](https://github.com/carbon-language/carbon-lang/pull/168),
+whereas in C++ `&&` has higher precedence than `||`.
+
+The parameters of a function are described using the syntax of a pattern. There
+could instead be separate syntax that is special to function parameters, as
+there is in C++.
+
+There are open questions regarding the design of function types, so we could
+alternatively postpone the specification of the syntax for function types. The
+choice to include them is based on the idea that Carbon will need something that
+fills the roles of `std::function` and C-style function pointers. Also, the
+features were selected for the basic syntax proposal in a way that aimed to make
+the proposal complete in the sense that each value-oriented feature (functions,
+tuples, structures) comes with syntax for 1) creating values, 2) using values,
+and 3) a type for classifying values.
+
+The parameters of a function type are described using the syntax for type
+expression. There could instead be separate syntax that is special to the
+parameters of a function type, as there is in C++.
+
+The keyword for introducing a function type is `fnty`, which is an arbitrary
+choice. Other alternatives include `fn_type` and `fn`. The `fn` alternative
+would conflict with future plans to use `fn` for lambda expressions. Some
+dislike of `fn_type` was voiced, which motivated the choice of `fnty`. It would
+be nice to do without a keyword, but as the grammar currently stands, that
+introduce ambiguity.
+
+The `pattern` non-terminal is defined in terms of `expression`. It could instead
+be the other way around, defining `expression` in terms of `pattern`.
+
+Regarding tuples and tuple types, this proposal follows
+[#111](https://github.com/carbon-language/carbon-lang/pull/111) in that there is
+not a distinct syntax for tuple types. Instead, the intent is that when a tuple
+of types results from an expression in a context that expects a type, the type
+system will convert the tuple to a tuple type. An alternative approach has been
+discussed in which tuple types have a distinct syntax, such as
+`Tuple(Int, Bool)`.