# Tuples and tuple indexing

[Pull request](https://github.com/carbon-language/carbon-lang/pull/3646)

## Table of contents

- [Abstract](#abstract)
- [Problem](#problem)
- [Background](#background)
- [Proposal](#proposal)
- [Details](#details)
    - [Lexing](#lexing)
    - [Indexes as names](#indexes-as-names)
    - [Precedence](#precedence)
    - [Expression operand](#expression-operand)
    - [Bounds](#bounds)
    - [Tuple slicing](#tuple-slicing)
- [Rationale](#rationale)
- [Alternatives considered](#alternatives-considered)
    - [Alternative lexing rule](#alternative-lexing-rule)
    - [Decimal indexing restriction](#decimal-indexing-restriction)
    - [Square bracket notation](#square-bracket-notation)
    - [Negative indexing from the end of the tuple](#negative-indexing-from-the-end-of-the-tuple)
    - [Trailing commas](#trailing-commas)

## Abstract

Add support for extracting elements of a tuple by their numerical index. Also formally add the well-established basic syntactic and semantic rules for tuples, for which we have had leads issues but no proposal, into the design.

## Problem

Currently, the only way to access the elements of a tuple is through pattern matching. While this handles many cases well, it is sometimes desirable to access an element of a tuple more succinctly, especially in cases where only a single element's value is needed.

## Background

In Python, tuple indexing is performed using square brackets:

```python
tup = (1, 2, 3)
# Prints 2.
print(tup[1])
```

In C++, `std::pair` is indexed using `.first` and `.second`, and `std::tuple` is indexed using `std::get`.

In Rust and Swift, a tuple is indexed using `.N`, where `N` is a decimal integer literal.

- Rust disallows digit separators and base prefixes in `N`, but allows certain literal suffixes [for historical reasons](https://github.com/rust-lang/rust/issues/60210).
- Swift disallows digit separators and base prefixes in `N`.
  `swiftc` allows leading `0` digits, although this appears to be an unintentional consequence of `llvm::StringRef::getAsInteger` allowing them.

The current Carbon documentation suggests using `tuple[i]` for tuple indexing, but this has not been the subject of an approved proposal.

## Proposal

Formally, we have not yet approved a proposal that says that Carbon has tuple types, although we have approved several proposals that explicitly include support for tuples. So, this proposal does that: tuples exist in Carbon, and are product types with unnamed positional elements.

This proposal also updates the design to match other decisions that have been made in leads issues but not captured by a proposal, specifically:

- Leads issue #2191 (one-tuples and one-tuple syntax), despite being focused on one-tuples, established the syntax for tuples of all arities.
- Leads issue #710 established rules for assignment, comparison, and implicit conversion of tuples. These operations are performed elementwise, with relational comparisons being performed lexicographically.

Finally, the main intent of this proposal is to add support for indexing tuples, using the following syntaxes:

- `.` _N_, where _N_ is an integer literal, and
- `.` `(` _expr_ `)`, where _expr_ is a template constant of integer type.

For pointers to tuples, `->` _N_ and `->` `(` _expr_ `)` are also supported.

## Details

### Lexing

Multi-level tuple indexing will result in constructs such as `tuple_of_tuples.1.2`. It's important that these are lexed as two tuple indexing operations, not as `tuple_of_tuples` `.` `1.2`, as it would be under the current lexical rules, so a new rule is introduced:

- When a `.` or `->` token is followed immediately by a digit, it is lexed as a `.` or `->` token followed by an integer literal, never a real literal.

Note that this results in lexing being slightly contextual: the rule to lex a token after a `.` or `->` is different from the rule to lex a token in any other context.
However, there is an alternative equivalent formulation of the rule that is not context-sensitive: that `.integer` is treated as a single lexeme that produces two tokens, and likewise for `->integer`.

### Indexes as names

The elements of a tuple are treated as if they had decimal integers as their names: `.0`, `.1`, and so on. It is an error to use a different spelling of that integer in a simple member access, because that spelling would not match the element name. For example, `(1, 2).0x0` is invalid, as is `large_tuple.1_2`.

These spellings can be used as an [expression operand](#expression-operand) as described below: `(1, 2).(0x0)` and `large_tuple.(1_2)` are both valid.

### Precedence

The `.` _N_ syntax has the same precedence as postfix member access syntax, `.` _name_, and can be combined in the same expression: `a.0.x.1` is valid.

The `.` `(` _expr_ `)` syntax is not new in this proposal, and continues to have the same precedence as `.` _name_.

### Expression operand

In the `.` `(` _expr_ `)` syntax, if the first operand is a tuple and the second operand is a constant of any integer type, the result is the corresponding tuple element, as if specified by a decimal integer literal. This rule is built into the language; the `.` `(` ... `)` notation is not currently overloadable.

### Bounds

If the tuple index is not between 0 and one less than the number of elements in the tuple, inclusive, the indexing is invalid.

### Tuple slicing

The current skeleton design suggests using `tuple[a .. b]` to slice tuples. For example, `tuple[0 .. 2]` could be used to extract the first two elements of a tuple.

Tuple slicing support is not covered by this proposal, but could be added in the future with syntax such as `tuple.(0 .. 2)`. However, note that there is a risk that this syntax may lead to an incorrect theory about how Carbon works: namely, that `tuple.__` gives an element whereas `tuple.(__)` gives a tuple.
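
The contextual rule from [Lexing](#lexing) can be sketched in a few lines of Python. This is a minimal illustration rather than Carbon's actual lexer: the token kinds, the `lex` function, and the simplified literal grammar (decimal digits only, with no base prefixes, digit separators, or exponents) are all assumptions made for the sketch.

```python
import re

# Hypothetical, simplified token patterns; real Carbon literals are richer.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
REAL = re.compile(r"[0-9]+\.[0-9]+")
INT = re.compile(r"[0-9]+")
PUNCT = ("->", ".", "(", ")", ",")


def lex(source):
    """Return a list of (kind, text) tokens for a tiny expression subset."""
    tokens = []
    pos = 0
    # True only when the previous token is `.` or `->` with nothing in between.
    after_access = False
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            # Whitespace means the next digit no longer follows "immediately".
            pos += 1
            after_access = False
            continue
        if ch in "0123456789":
            # Directly after `.` or `->`, never try the real-literal pattern.
            match = None if after_access else REAL.match(source, pos)
            kind = "real" if match else "int"
            match = match or INT.match(source, pos)
            text = match.group()
        elif IDENT.match(source, pos):
            kind, text = "ident", IDENT.match(source, pos).group()
        else:
            text = next((p for p in PUNCT if source.startswith(p, pos)), None)
            if text is None:
                raise ValueError(f"unexpected character {ch!r} at offset {pos}")
            kind = "punct"
        tokens.append((kind, text))
        pos += len(text)
        after_access = text in (".", "->")
    return tokens
```

With this sketch, `lex("t.1.2")` yields an identifier followed by two `.`/integer pairs, while `lex("t . 0.1")` still produces a real literal, because the digit does not immediately follow the `.`.
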
## Rationale

Goals:

- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
    - The lexing rule is relatively simple to implement. Tools such as syntax highlighters can treat `.i` as a distinct kind of token rather than implementing any kind of context-sensitive lexing.
- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
    - Consistent use of tuple field indexes can be used to support code that adds new tuple elements over time.
- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
    - This feature allows tuple access to be written more concisely than pattern matching would allow.
    - Lexing `.1.2` as four tokens rather than two avoids a surprise that would make chained member access hard to write.
    - For simple member access, requiring a decimal integer with no digit separators allows the member access to be treated as an element name, making the indexing easier to understand.
- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
    - This feature provides a migration syntax for existing use of `.first`, `.second`, and `std::get`. The permission to use expressions rather than only literals supports migration of `std::get`.

Principles:

- [Low context sensitivity](/docs/project/principles/low_context_sensitivity.md).
    - We look only at the character immediately before a numeric literal to determine whether it is lexed as a tuple index that stops before the next `.` or as a general numeric literal.

## Alternatives considered

### Alternative lexing rule

We could lex `.0`, `.1`, ... as a single token rather than as separate `.` and `0`, `1`, ... tokens. This would somewhat simplify the lexing rules, because they would no longer be contextual. We choose to not do this because:

- This would be inconsistent with our handling of `struct.fieldname`.
- Either `tuple . 0` would be invalid, unlike `struct.fieldname`, or it would need to use a distinct grammar production from `tuple.0`.

We could lex an integer literal when the previous token is `.`, regardless of whether the literal follows the `.` immediately. For example, we could treat

```carbon
let n: i32 = ((1, 2, 3), 4) . 0.1;
```

as tuple indexing, rather than as a tuple followed by a `.` and a real literal. This is what Swift does. We choose to not do this because:

- The `0.1` literal in this case looks like a real literal, not tuple indexing, so this would likely cause surprise for readers.
- This would make the context-sensitive lexing non-local. The chosen rule can be interpreted as lexing `.[0-9]*` as a single lexeme, but forming two tokens from it, whereas this alternative rule would be much more firmly a context-sensitive lexing rule.

We could get a similar result in other ways:

- We could allow a real literal after a `.`, and split it into a pair of member accesses when needed. This is [what `rustc` does](https://github.com/rust-lang/rust/pull/71322).
- We could lex a real literal as three tokens: an integer token, a `.` token, and a suffix token, and merge them back together in the parser. This is [what `intellij` does](https://github.com/intellij-rust/intellij-rust/commit/f82f6cd68567e574bf1e30f5e0d263ee15d1d36e) when parsing Rust.

Note that these approaches are not entirely equivalent to each other. In Rust, for example, the difference is observable in proc macros. Also, using any kind of token merging or splitting approach would result in the token stream not matching the interpretation of the program, which is problematic for tooling. For example, many common Rust syntax highlighters do not properly highlight chained tuple indexing.

### Decimal indexing restriction

Carbon follows Rust and Swift in restricting tuple indexes to being decimal integers:

```carbon
// OK
let a: i32 = (1, 2, 3).0;
// Error, invalid index for tuple element.
let b: i32 = (1, 2, 3).0x0;
```

This restriction introduces an inconsistency between `.0x0` and `.(0x0)`, and we could easily drop it. However, the restriction allows us to consider `.0`, `.1`, and so on to simply be the names of the tuple elements, analogous to struct field names, and there isn't a clear utility for permitting a base prefix or a digit separator in a tuple index.

### Square bracket notation

Instead of `tuple.0` and `tuple.(IndexConstant)`, we could use `tuple[0]` and `tuple[IndexConstant]`. This would result in more consistent syntax for indexing with a constant versus with an expression, but would make accessing an element of a tuple less consistent with accessing an element of a struct. We expect tuple access with a non-literal index to be a rare operation, so the consistency with that syntax seems to have lower value.

Also, the use of `.` notation aims to convey the intent of the developer better: we intend `x[n]` notation to be used primarily for _homogeneous_ indexing, whereas `.` notation is used for _heterogeneous_ access. This also reflects the difference in phase: tuple indexing requires a constant index in the same way that struct member access requires a constant name, whereas array or container indexing would typically be expected to permit a runtime index.

The `.N` notation can also be extended to perform member indexing into a struct or class, at least the latter of which would not be reasonable to support with `[]` notation. However, such support is not part of this proposal.

Use of `[]` notation has the advantage of reducing visual ambiguity for cases such as `O.0`, `l.0`, and `Z.0`, which might be visually confused with `0.0`, `1.0`, and `2.0`, respectively. However, we're not aware of this being a problem in practice in Rust or Swift, which use this notation, and the same problem exists even without the `.0` suffix: `F(O, l, Z)` may resemble `F(0, 1, 2)`.
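
The difference between `.0x0` and `.(0x0)` discussed under [Decimal indexing restriction](#decimal-indexing-restriction) can be modeled as two separate checks: a name lookup for `.` _N_, and a bounds check on a value for `.` `(` _expr_ `)`. The following Python sketch is only an illustration. The function names are hypothetical, and rejecting a spelling with leading zeros (such as `00`) is an assumption of the sketch, not something this proposal states.

```python
def resolve_simple_index(spelling, arity):
    """Model `tuple.N`: the literal's spelling must equal an element name."""
    # Element names are exactly the decimal spellings "0", "1", ..., arity - 1.
    names = [str(i) for i in range(arity)]
    if spelling not in names:
        raise ValueError(f"no tuple element named .{spelling}")
    return names.index(spelling)


def resolve_expr_index(value, arity):
    """Model `tuple.(expr)`: any constant integer value in bounds is accepted."""
    if not 0 <= value < arity:
        raise IndexError(f"index {value} is out of bounds for a {arity}-tuple")
    return value
```

Under this model, `resolve_simple_index("0x0", 2)` fails because no element is named `0x0`, while `resolve_expr_index(0x0, 2)` succeeds because only the value `0` matters once the index is an expression.
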
### Negative indexing from the end of the tuple

We could support `tuple.-1`, or perhaps `tuple.(-1)`, as a notation for "the last element of the tuple", as used for example in Python.

We choose not to support this at this time because such notation can be confusing and has awkward edge cases. An off-by-one error, or an attempt to access a one-past-the-start element, will sometimes be accepted and silently do the wrong thing.

If a future proposal introduces tuple slicing, it should revisit this question, because this kind of indexing from the end is often desirable when forming a slice. The possibility of using a different notation for this operation should be considered, such as `tuple.(.size - 1)`.

### Trailing commas

Carbon permits optional trailing commas in tuples, with mandatory trailing commas for one-tuples. Alternatives to this choice were considered in [leads issue #2191](https://github.com/carbon-language/carbon-lang/issues/2191).
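
Python, whose tuple syntax appears in [Background](#background), uses the same convention. There, the comma is what distinguishes a one-tuple from a parenthesized expression. The snippet below shows Python behavior only, as an analogy; it is not Carbon code, and the variable names are illustrative.

```python
one_tuple = (1,)  # One-tuple: the trailing comma is required.
grouped = (1)     # No comma: just the integer 1 in parentheses.
pair = (1, 2,)    # For longer tuples the trailing comma is optional.

# Prints: tuple int True
print(type(one_tuple).__name__, type(grouped).__name__, pair == (1, 2))
```
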