# Tuples and tuple indexing

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/3646)

<!-- toc -->

## Table of contents

-   [Abstract](#abstract)
-   [Problem](#problem)
-   [Background](#background)
-   [Proposal](#proposal)
-   [Details](#details)
    -   [Lexing](#lexing)
    -   [Indexes as names](#indexes-as-names)
    -   [Precedence](#precedence)
    -   [Expression operand](#expression-operand)
    -   [Bounds](#bounds)
    -   [Tuple slicing](#tuple-slicing)
-   [Rationale](#rationale)
-   [Alternatives considered](#alternatives-considered)
    -   [Alternative lexing rule](#alternative-lexing-rule)
    -   [Decimal indexing restriction](#decimal-indexing-restriction)
    -   [Square bracket notation](#square-bracket-notation)
    -   [Negative indexing from the end of the tuple](#negative-indexing-from-the-end-of-the-tuple)
    -   [Trailing commas](#trailing-commas)

<!-- tocstop -->

## Abstract

Add support for extracting elements of a tuple by their numerical index.

Also formally add the well-established basic syntactic and semantic rules for
tuples, for which we have had leads issues but no proposal, into the design.

## Problem

Currently, the only way to access the elements of a tuple is through pattern
matching. While this handles many cases well, it is sometimes desirable to
access an element of a tuple more succinctly, especially in cases where only a
single element's value is needed.

## Background

In Python, tuple indexing is performed using square brackets:

```python
tup = (1, 2, 3)
# Prints 2.
print(tup[1])
```

In C++, `std::pair` is indexed using `.first` and `.second`, and `std::tuple` is
indexed using `std::get<I>`.

In Rust and Swift, a tuple is indexed using `.N`, where `N` is a decimal integer
literal.

-   Rust disallows digit separators and base prefixes in `N`, but allows certain
    literal suffixes
    [for historical reasons](https://github.com/rust-lang/rust/issues/60210).
-   Swift disallows digit separators and base prefixes in `N`. `swiftc` allows
    leading `0` digits, although this appears to be an unintentional consequence
    of `llvm::StringRef::getAsInteger` allowing them.

The current Carbon documentation suggests using `tuple[i]` for tuple indexing,
but this has not been the subject of an approved proposal.

## Proposal

Formally, we have not yet approved a proposal that says that Carbon has tuple
types, although we have approved several proposals that explicitly include
support for tuples. So, this proposal does that: tuples exist in Carbon, and are
product types with unnamed positional elements.

This proposal also updates the design to match other decisions that have been
made in leads issues but not captured by a proposal, specifically:

-   Leads issue #2191 (one-tuples and one-tuple syntax), despite being focused
    on one-tuples, established the syntax for tuples of all arities.
-   Leads issue #710 established rules for assignment, comparison, and implicit
    conversion of tuples. These operations are performed elementwise, with
    relational comparisons being performed lexicographically.
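
As an illustration of the comparison rule above, the following Python sketch models elementwise equality and lexicographic ordering. The helper names `tuple_equal` and `tuple_less` are hypothetical, and rejecting tuples of different arity with an exception is an assumption of this sketch rather than something the leads issue is quoted as specifying.

```python
def _check_same_arity(lhs, rhs):
    # Assumption of this sketch: only tuples of the same arity are comparable.
    if len(lhs) != len(rhs):
        raise TypeError("sketch only compares tuples of the same arity")


def tuple_equal(lhs, rhs):
    """Elementwise equality: every pair of corresponding elements is equal."""
    _check_same_arity(lhs, rhs)
    return all(a == b for a, b in zip(lhs, rhs))


def tuple_less(lhs, rhs):
    """Lexicographic `<`: the first unequal pair of elements decides."""
    _check_same_arity(lhs, rhs)
    for a, b in zip(lhs, rhs):
        if a != b:
            return a < b
    return False  # All elements are equal, so neither tuple is less.
```

With these definitions, `tuple_less((1, 2, 3), (1, 3, 0))` is true because the second elements are the first to differ, even though the third element of the left operand is larger.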

Finally, the main intent of this proposal is to add support for indexing tuples,
using the following syntaxes:

-   `.` _N_, where _N_ is an integer literal, and
-   `.` `(` _expr_ `)`, where _expr_ is a template constant of integer type.

For pointers to tuples, `->` _N_ and `->` `(` _expr_ `)` are also supported.

## Details

### Lexing

Multi-level tuple indexing will result in constructs such as
`tuple_of_tuples.1.2`. These must be lexed as two tuple indexing operations,
not as `tuple_of_tuples` `.` `1.2`, which is how the current lexical rules
would lex them, so a new rule is introduced:

-   When a `.` or `->` token is followed immediately by a digit, it is lexed as
    a `.` or `->` token followed by an integer literal, never a real literal.

Note that this results in lexing being slightly contextual: the rule to lex a
token after a `.` or `->` is different from the rule to lex a token in any other
context. However, there is an alternative equivalent formulation of the rule
that is not context-sensitive: that `.integer` is treated as a single lexeme
that produces two tokens, and likewise for `->integer`.
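
The rule can be sketched as a toy tokenizer. This is a simplified Python model, not Carbon's lexer: the patterns `REAL`, `INTEGER`, and `IDENTIFIER` and the function `lex` are hypothetical, and they cover only decimal digits, identifiers, `.`, and `->`.

```python
import re

# Simplified, hypothetical token patterns; Carbon's literal grammar is richer.
REAL = re.compile(r"[0-9]+\.[0-9]+")
INTEGER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def lex(source):
    """Split `source` into a list of token spellings."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        if source.startswith("->", pos):
            access = "->"
        elif source[pos] == ".":
            access = "."
        else:
            access = None
        if access is not None:
            tokens.append(access)
            pos += len(access)
            # Digits immediately after `.` or `->` form an integer literal,
            # so the REAL pattern is deliberately not tried here.
            match = INTEGER.match(source, pos)
            if match:
                tokens.append(match.group())
                pos = match.end()
            continue
        match = (
            REAL.match(source, pos)
            or INTEGER.match(source, pos)
            or IDENTIFIER.match(source, pos)
        )
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens
```

In this model, `lex("tuple_of_tuples.1.2")` yields `["tuple_of_tuples", ".", "1", ".", "2"]`, while `lex("1.2")` still yields the single real literal `["1.2"]`. Because the integer is only forced when the digit follows the `.` immediately, `lex("t . 0.1")` yields `["t", ".", "0.1"]`.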

### Indexes as names

The elements of a tuple are treated as if they had decimal integers as their
names: `.0`, `.1`, and so on. It is an error to use a different spelling of that
integer in a simple member access, because that spelling would not match the
element name. For example, `(1, 2).0x0` is invalid, as is `large_tuple.1_2`.
These spellings can be used as an [expression operand](#expression-operand) as
described below: `(1, 2).(0x0)` and `large_tuple.(1_2)` are both valid.
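
The distinction between an element name and an index value can be modeled as follows. This is a Python sketch with hypothetical helper names. Rejecting leading zeros is an assumption of the sketch, and Python's `int(text, 0)` is used only as a stand-in for evaluating an integer literal, since it happens to accept `0x` prefixes and `_` separators.

```python
def is_element_name(spelling):
    """True if `spelling` can follow `.` in a simple member access.

    Assumption: an element name is exactly the canonical decimal spelling
    of the index, with no base prefix, digit separator, or leading zero.
    """
    return (
        spelling.isascii()
        and spelling.isdigit()
        and str(int(spelling)) == spelling
    )


def index_value(literal):
    """Value of an integer literal used as an expression operand."""
    # Base 0 lets Python infer the base from the prefix and allows `_`.
    return int(literal, 0)
```

Here `is_element_name("0x0")` and `is_element_name("1_2")` are both false, matching the rejection of `(1, 2).0x0` and `large_tuple.1_2`, while `index_value("0x0")` is `0` and `index_value("1_2")` is `12`, matching the elements selected by `(1, 2).(0x0)` and `large_tuple.(1_2)`.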

### Precedence

The `.` _N_ syntax has the same precedence as postfix member access syntax, `.`
_name_, and can be combined in the same expression: `a.0.x.1` is valid.

The `.` `(` _expr_ `)` syntax is not new in this proposal, and continues to have
the same precedence as `.` _name_.

### Expression operand

In the `.` `(` _expr_ `)` syntax, if the first operand is a tuple and the second
operand is a constant of any integer type, the result is the corresponding tuple
element, as if specified by a decimal integer literal. This rule is built into
the language; the `.` `(` ... `)` notation is not currently overloadable.

### Bounds

If the tuple index is less than 0, or is greater than or equal to the number of
elements in the tuple, the indexing is invalid.
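
A minimal Python sketch of this check follows. The function name is hypothetical, and the Python exception only stands in for whatever diagnostic Carbon reports when the constant index is checked.

```python
def check_tuple_index(index, size):
    """Return `index` if it names an element of a tuple with `size` elements."""
    if not 0 <= index < size:
        raise IndexError(
            f"tuple index {index} is out of range for {size} elements"
        )
    return index
```

One consequence is that an empty tuple has no valid index at all, because no integer satisfies `0 <= index < 0`.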

### Tuple slicing

The current skeleton design suggests using `tuple[a .. b]` to slice tuples. For
example, `tuple[0 .. 2]` could be used to extract the first two elements of a
tuple. Tuple slicing support is not covered by this proposal, but could be added
in the future with syntax such as `tuple.(0 .. 2)`. However, note that there is
a risk that this syntax may lead to an incorrect theory about how Carbon works:
namely, that `tuple.__` gives an element whereas `tuple.(__)` gives a tuple.

## Rationale

Goals:

-   [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
    -   The lexing rule is relatively simple to implement. Tools such as syntax
        highlighters can treat `.i` as a distinct kind of token rather than
        implementing any kind of context-sensitive lexing.
-   [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
    -   Consistent use of tuple field indexes can be used to support code that
        adds new tuple elements over time.
-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
    -   This feature allows tuple access to be written more concisely than
        pattern matching would allow.
    -   Lexing `.1.2` as four tokens rather than two avoids a surprise that
        would make chained member access hard to write.
    -   For simple member access, requiring a decimal integer with no digit
        separators allows the member access to be treated as an element name,
        making the indexing easier to understand.
-   [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
    -   This feature provides a migration syntax for existing use of `.first`,
        `.second`, and `std::get<I>`. The permission to use expressions rather
        than only literals supports migration of `std::get<expression>`.

Principles:

-   [Low context sensitivity](/docs/project/principles/low_context_sensitivity.md).
    -   We look only at the character immediately before a numeric literal to
        determine whether it is lexed as a tuple index that stops before the
        next `.` or as a general numeric literal.

## Alternatives considered

### Alternative lexing rule

We could lex `.0`, `.1`, ... as a single token rather than as separate `.` and
`0`, `1`, ... tokens. This would somewhat simplify the lexing rules, because
they would no longer be contextual. We choose to not do this because:

-   This would be inconsistent with our handling of `struct.fieldname`.
-   Either `tuple . 0` would be invalid, unlike `struct.fieldname`, or it would
    need to use a distinct grammar production from `tuple.0`.

We could lex an integer literal when the previous token is `.`, regardless of
whether the literal follows the `.` immediately. For example, we could treat

```carbon
let n: i32 = ((1, 2, 3), 4) . 0.1;
```

as tuple indexing, rather than as a tuple followed by a `.` and a real literal.
This is what Swift does. We choose to not do this because:

-   The `0.1` literal in this case looks like a real literal, not tuple
    indexing, so this would likely cause surprise for readers.
-   This would make the context-sensitive lexing non-local. The chosen rule
    can be interpreted as lexing `.[0-9]*` as a single lexeme, but forming two
    tokens from it, whereas this alternative rule would be much more firmly a
    context-sensitive lexing rule.

We could get a similar result in other ways:

-   We could allow a real literal after a `.`, and split it into a pair of
    member accesses when needed. This is
    [what `rustc` does](https://github.com/rust-lang/rust/pull/71322).
-   We could lex a real literal as three tokens: an integer token, a `.` token,
    and a suffix token, and merge them back together in the parser. This is
    [what `intellij` does](https://github.com/intellij-rust/intellij-rust/commit/f82f6cd68567e574bf1e30f5e0d263ee15d1d36e)
    when parsing Rust.

Note that these approaches are not entirely equivalent to each other. In Rust,
for example, the difference is observable in proc macros. Also, using any kind
of token merging or splitting approach would result in the token stream not
matching the interpretation of the program, which is problematic for tooling.
For example, many common Rust syntax highlighters do not properly highlight
chained tuple indexing.

### Decimal indexing restriction

Carbon follows Rust and Swift in restricting tuple indexes to being decimal
integers:

```carbon
// OK
let a: i32 = (1, 2, 3).0;

// Error, invalid index for tuple element.
let b: i32 = (1, 2, 3).0x0;
```

This restriction introduces an inconsistency between `.0x0` and `.(0x0)`, and we
could easily drop it. However, the restriction allows us to consider `.0`, `.1`,
and so on to simply be the names of the tuple elements, analogous to struct
field names, and there isn't a clear utility for permitting a base prefix or a
digit separator in a tuple index.

### Square bracket notation

Instead of `tuple.0` and `tuple.(IndexConstant)`, we could use `tuple[0]` and
`tuple[IndexConstant]`. This would result in more consistent syntax for indexing
with a constant versus with an expression, but would make accessing an element
of a tuple less consistent with accessing an element of a struct. We expect
tuple access with a non-literal index to be a rare operation, so consistency
between the literal and non-literal forms seems less valuable than consistency
with struct member access.

Also, the use of `.` notation aims to convey the intent of the developer better:
we intend `x[n]` notation to be used primarily for _homogeneous_ indexing,
whereas `.` notation is used for _heterogeneous_ access. This also reflects the
difference in phase: tuple indexing requires a constant index in the same way
that struct member access requires a constant name, whereas array or container
indexing would typically be expected to permit a runtime index.

The `.N` notation can also be extended to perform member indexing into a struct
or class, at least the latter of which would not be reasonable to support with
`[]` notation. However, such support is not part of this proposal.

Use of `[]` notation has the advantage of reducing visual ambiguity for cases
such as `O.0`, `l.0`, and `Z.0`, which might be visually confused with `0.0`,
`1.0`, and `2.0`, respectively. However, we're not aware of this being a problem
in practice in Rust or Swift, which use this notation, and the same problem
exists even without the `.0` suffix: `F(O, l, Z)` may resemble `F(0, 1, 2)`.

### Negative indexing from the end of the tuple

We could support `tuple.-1`, or perhaps `tuple.(-1)`, as a notation for "the
last element of the tuple", as used for example in Python. We choose not to
support this at this time because such notation can be confusing and has awkward
edge cases. An off-by-one error, or an attempt to access a one-past-the-start
element, will sometimes be accepted and silently do the wrong thing.

If a future proposal introduces tuple slicing, it should revisit this question,
because this kind of indexing from the end is often desirable when forming a
slice. The possibility of using a different notation for this operation should
be considered, such as `tuple.(.size - 1)`.

### Trailing commas

Carbon permits optional trailing commas in tuples, with mandatory trailing
commas for one-tuples. Alternatives to this choice were considered in
[leads issue #2191](https://github.com/carbon-language/carbon-lang/issues/2191).