# `char` redesign

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/6710)

<!-- toc -->

## Table of contents

-   [Abstract](#abstract)
-   [Problem](#problem)
-   [Background](#background)
-   [Proposal](#proposal)
-   [Details](#details)
    -   [Add a `char` type literal](#add-a-char-type-literal)
        -   [Escape sequences](#escape-sequences)
    -   [Add a `Core.CharLiteral` type for character literals](#add-a-corecharliteral-type-for-character-literals)
    -   [Operators](#operators)
        -   [Conversion operators](#conversion-operators)
        -   [Comparison operators](#comparison-operators)
        -   [Arithmetic operators](#arithmetic-operators)
            -   [`char` integer parameters](#char-integer-parameters)
            -   [Overflow semantics](#overflow-semantics)
            -   [Preferring i32 returns](#preferring-i32-returns)
    -   [Revoke and replace proposal #1964: Character Literals](#revoke-and-replace-proposal-1964-character-literals)
-   [Rationale](#rationale)
-   [Future work](#future-work)
-   [Alternatives considered](#alternatives-considered)
    -   [Align `char` fully with C++, or make it fully valid](#align-char-fully-with-c-or-make-it-fully-valid)
    -   [Raw character literals](#raw-character-literals)
    -   [Disallow hex escape sequences in character literals](#disallow-hex-escape-sequences-in-character-literals)
    -   [Allow grapheme clusters in character literals](#allow-grapheme-clusters-in-character-literals)
    -   [Reuse string literal syntax for character literals](#reuse-string-literal-syntax-for-character-literals)
        -   [Treat single-character string literals as a third "text literal" type](#treat-single-character-string-literals-as-a-third-text-literal-type)

<!-- tocstop -->

## Abstract

-   Add a `char` type literal mapping to `Core.Char` and equivalent to C++'s
    `char`.
    -   8 bits, unsigned, treated as a single UTF-8
        [code unit](https://en.wikipedia.org/wiki/Character_encoding#Code_unit).
-   Add a `Core.CharLiteral` type for character literals, similar to
    `Core.IntLiteral`.
-   Allow operations for `char` and `Core.CharLiteral` which reinforce the
    "character" concept, versus an integer value.
-   Revokes and replaces
    [#1964: Character Literals](https://github.com/carbon-language/carbon-lang/pull/1964).

## Problem

`char` is an important type due to its common use in C++ code. However, the
related proposal
[#1964: Character Literals](https://github.com/carbon-language/carbon-lang/pull/1964)
has several issues, including:

-   Lacks a decision for `char` handling; it is not mentioned in proposal #1964.
    -   Similarly, decides there are character literals, but more detail is
        needed for implementation.
-   Type literal naming no longer reflects naming consensus.
    -   `Char8` seems potentially more equivalent to `std::char8_t` instead of
        `char`, and for interop purposes these are slightly different types.
        Similar applies to `Char16` and `Char32`.
    -   As a design direction, we have been lowercasing type literals (such as
        `u8`).
-   Conflicting statements about behavior.
    -   For example, "Rationale" states that `var b: u8 = 'a' + 1` would be
        supported, while "Operations" states that `+` is returning a character
        literal (not a `u8`).
    -   For character literals, states "Escape sequences which would result in
        non-UTF-8 encodings or more than one code point are not included."
        However, it goes on to say that `let smiley: Char16 = '\u{1F600}'` is
        valid even though `1F600` would require multiple code units in both
        UTF-8 and UTF-16.
-   Unclear that it gives us a good UTF plan.
    -   Does not decide what a single character in a Carbon string is.
    -   No consideration regarding interop with the `std::char32_t` family of
        types or [ICU](https://github.com/unicode-org/icu) compatibility.
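
To make the code unit counts in the `'\u{1F600}'` example concrete, the following C++ sketch computes how many code units a code point needs in UTF-8 and in UTF-16. The helper names `Utf8Length` and `Utf16Length` are illustrative only and are not part of any Carbon or C++ API; both assume the input is a valid Unicode scalar value.

```cpp
#include <cstdint>

// Number of UTF-8 code units (bytes) needed to encode `cp`.
// Assumes `cp` is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
constexpr int Utf8Length(std::uint32_t cp) {
  if (cp <= 0x7F) return 1;
  if (cp <= 0x7FF) return 2;
  if (cp <= 0xFFFF) return 3;
  return 4;
}

// Number of UTF-16 code units needed to encode `cp`, under the same assumption.
constexpr int Utf16Length(std::uint32_t cp) {
  return cp <= 0xFFFF ? 1 : 2;
}

// U+1F600 needs four UTF-8 code units and a UTF-16 surrogate pair.
static_assert(Utf8Length(0x1F600) == 4);
static_assert(Utf16Length(0x1F600) == 2);
```

Under either encoding, U+1F600 cannot be held in a single code unit, which is the conflict with the quoted statement in #1964.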

In other words, it's likely we want something similar to `Char32`, but it may be
named something like `Core.Char32` and have slightly different type behaviors
than decided in #1964. On the other hand, we need something compatible with the
C++ `char` in order to proceed with basic C++ interop, and #1964 doesn't provide
that.

## Background

-   [Proposal #1964: Character Literals](https://github.com/carbon-language/carbon-lang/pull/1964)
    is fundamental, and a lot of the underlying thoughts still apply. In
    particular, we still want character types to be distinct from numeric types.
-   [Proposal #199: String literals](https://github.com/carbon-language/carbon-lang/pull/199)
    is important because we want character and string literals to have mirrored
    escaping concepts.
-   [Proposal #5448: Carbon &lt;-> C++ Interop: Primitive Types](https://github.com/carbon-language/carbon-lang/pull/5448)
    left the question of character type mappings open. This proposal aims to
    answer it for `char`.
-   [Issue #5903: Built-in character type questions](https://github.com/carbon-language/carbon-lang/issues/5903)
    addressed type questions.
-   [Issue #5922: Built-in character operators](https://github.com/carbon-language/carbon-lang/issues/5922)
    addressed operators.

## Proposal

The way `char` will work is:

-   Add a `char` type literal.
    -   Carbon's `str` type will use `char` for elements.
    -   For interop, map Carbon's `char` to C++'s `char`.
-   Add a `Core.CharLiteral` type for character literals, similar to
    `Core.IntLiteral`.
-   Provide operators which are consistent with the character concept.

This proposal additionally revokes and replaces proposal #1964, rather than
trying to define which parts we are keeping and which are changing.

## Details

### Add a `char` type literal

`char` is intended to offer a basic construct for Carbon's strings that is both
compatible with UTF-8, and has high fidelity with C++ strings.

In support of that, important notes are:

-   `char` itself will be a
    [type literal](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/words.md#type-literals).
-   `char` notionally represents a UTF-8 code unit.
    -   It can contain invalid code units, as long as it remains 8 bits. We do
        not assume runtime validation.
-   `char` will be backed by `Core.Char`, in the prelude.
    -   `Core.Char` will adapt `u8`.
-   C++ interoperability will transparently map `char` and `Cpp.char` on API
    boundaries.
    -   When used with Carbon, C++ `char` will be unsigned by default
        (`-funsigned-char`). A program can switch back to signed
        (`-fno-unsigned-char`), and Carbon will maintain interoperability but
        bits will be interpreted differently in each language.
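
The signedness note can be illustrated from the C++ side. The sketch below uses two hypothetical helpers, `CodeUnitValue` and `PlainValue`, to show that the same bits read as different numbers depending on whether plain `char` is signed; it is an illustration of the C++ behavior, not of Carbon's interop implementation.

```cpp
#include <type_traits>

// Reads a `char` as an unsigned code unit value in [0, 255], independent of
// whether the compiler treats plain `char` as signed or unsigned.
constexpr int CodeUnitValue(char c) {
  return static_cast<unsigned char>(c);
}

// Reads a `char` using the compiler's plain `char` signedness. When plain
// `char` is signed, bytes >= 0x80 come out negative.
constexpr int PlainValue(char c) { return c; }
```

For the byte `0xE2`, `CodeUnitValue` always yields 226, while `PlainValue` yields 226 when `char` is unsigned (for example, under `-funsigned-char`) and -30 when it is signed.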

#### Escape sequences

Escape sequences are the same as for a string literal, except that `\x` escapes
are limited to values up to `7F`. Only one escape sequence may be provided in a
character literal.

### Add a `Core.CharLiteral` type for character literals

`Core.CharLiteral` is the type of a character literal, similar to how
`Core.IntLiteral` is the type of integer literals. It abstractly represents a
single Unicode code point. This gives us a compile-time structure for characters
that can be typed and referred to in programs.

Semantics of a character literal will be equivalent to a
[simple string literal](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/string_literals.md#simple-and-block-string-literals),
except that:

-   A character literal has a validated Unicode code point value.
-   The enclosing character is `'`.
-   The contents are precisely one character or escape sequence.
    -   The `\x` escape sequence is limited to values up to `7F`, where the
        UTF-8 code unit and Unicode code point values are identical.

An important detail of the character literal type is that it gives us a way to
track constant values at compile time. For example, `'a' + 1` has a constant
value of `'b'`. This means we can diagnose uses of character literals that
don't represent a valid Unicode code point, such as `'a' + 0xFFFFFF`.
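
As a rough model of this compile-time tracking, the C++ sketch below evaluates literal arithmetic in constant expressions and rejects results outside the Unicode range. `CharLiteralModel`, `IsValidCodePoint`, and `Add` are hypothetical names, and the validity rule is an assumption: it only checks the range `[0, 0x10FFFF]`, while the actual rule may exclude more.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a character literal: a code point value.
struct CharLiteralModel {
  std::uint32_t code_point;
};

// Assumption: "valid" here means only "within the Unicode range"; the real
// rule may exclude more (for example, surrogates).
constexpr bool IsValidCodePoint(std::int64_t v) {
  return v >= 0 && v <= 0x10FFFF;
}

// Adds an integer offset. Returns std::nullopt where a compiler following the
// proposal would diagnose an invalid code point.
constexpr std::optional<CharLiteralModel> Add(CharLiteralModel c,
                                              std::int64_t offset) {
  // Reject offsets that cannot possibly land in range, which also avoids
  // signed overflow in the addition below.
  if (offset < -0x10FFFF || offset > 0x10FFFF) return std::nullopt;
  const std::int64_t sum = static_cast<std::int64_t>(c.code_point) + offset;
  if (!IsValidCodePoint(sum)) return std::nullopt;
  return CharLiteralModel{static_cast<std::uint32_t>(sum)};
}

static_assert(Add({U'a'}, 1)->code_point == U'b');
static_assert(!Add({U'a'}, 0xFFFFFF).has_value());
```

Because both checks are `static_assert`s, the out-of-range case is detected without running the program, which mirrors the diagnostic described above.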

### Operators

The provided operators are intended to cover the common operations a user would
want to perform on characters. It is a non-goal to support use of `char` as an
arbitrary byte or integer: developers should use `u8` for that.

In general, `char` and `Core.CharLiteral` operators are intended to be mirrors
of each other.

#### Conversion operators

-   `char`
    -   `ImplicitAs`: None
    -   `ExplicitAs`: To/from `u8`, plus the set of `ImplicitAs` for `u8`.
        -   For example, `u8` has `ImplicitAs` to `u16`, so `char` has
            `ExplicitAs` to `u16`.
-   `Core.CharLiteral`
    -   `ImplicitAs`: to `char` only
    -   `ExplicitAs`: To/from the set of `ImplicitAs` for `i32` and `u32`.
        -   For example, `i32` has `ImplicitAs` to `i64`, so `Core.CharLiteral`
            has `ExplicitAs` to `i64`.
        -   For example, `i64` does not have `ImplicitAs` to `i32`; conversion
            requires two casts, `((i64_val as i32) as Core.CharLiteral)`.

Casting from a `char` to a `Core.CharLiteral` is not supported.

See also
[implicit numeric conversions](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/implicit_conversions.md#data-types).
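
The "explicit only" conversion rules for `char` can be approximated in C++ with `explicit` constructors and conversion operators. `CharModel` below is a hypothetical stand-in for `Core.Char` adapting `u8`; it covers only the `u8` and `u16` conversions as examples and is not how Carbon implements them.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical C++ model of `Core.Char` adapting `u8`: same representation,
// but every conversion to or from an integer must be spelled out.
class CharModel {
 public:
  constexpr explicit CharModel(std::uint8_t code_unit) : value_(code_unit) {}

  // Mirrors `ExplicitAs` to `u8`.
  constexpr explicit operator std::uint8_t() const { return value_; }
  // Mirrors `ExplicitAs` to `u16`, a type that `u8` converts to implicitly.
  constexpr explicit operator std::uint16_t() const { return value_; }

 private:
  std::uint8_t value_;
};

// No implicit conversions in either direction.
static_assert(!std::is_convertible_v<CharModel, std::uint8_t>);
static_assert(!std::is_convertible_v<std::uint8_t, CharModel>);
// Explicit casts are allowed.
static_assert(std::is_constructible_v<std::uint8_t, CharModel>);
static_assert(std::is_constructible_v<std::uint16_t, CharModel>);
static_assert(std::is_constructible_v<CharModel, std::uint8_t>);
// The wrapper has the same size as the type it adapts.
static_assert(sizeof(CharModel) == sizeof(std::uint8_t));
```

With this model, `std::uint8_t n = c;` fails to compile, while `static_cast<std::uint8_t>(c)` succeeds, which is the distinction between `ImplicitAs` and `ExplicitAs` drawn above.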

#### Comparison operators

-   `char`
    -   `EqWith` and `OrderedWith` when both operands are `char`.
    -   `ImplicitAs` should allow substituting one operand with
        `Core.CharLiteral`.
-   `Core.CharLiteral`
    -   `EqWith` and `OrderedWith` when operands are `Core.CharLiteral`.

#### Arithmetic operators

-   `char`
    -   `AddWith`: `char + <integer> -> char` (with reversible operands)
        -   Equivalent to `((((char as i16) + <integer>) as u8) as char)`
    -   `SubWith`:
        -   `char - <integer> -> char` (non-reversible operands)
            -   Equivalent to `((((char as i16) - <integer>) as u8) as char)`
        -   `char - char -> i32`
            -   Equivalent to `(lhs as i32) - (rhs as i32)`.
            -   `ImplicitAs` should allow substituting one operand with
                `Core.CharLiteral`.
-   `Core.CharLiteral`
    -   `AddWith`: `Core.CharLiteral + <integer> -> Core.CharLiteral` (with
        reversible operands)
    -   `SubWith`:
        -   `Core.CharLiteral - <integer> -> Core.CharLiteral`
            (non-reversible operands)
        -   `Core.CharLiteral - Core.CharLiteral -> i32`
            -   Provides a Unicode code point delta.

##### `char` integer parameters

Arbitrary integers are supported for most of these operations. For example, we
want to allow addition of negative numbers, even though the representation of
`char` is unsigned, without requiring additional casts.

##### Overflow semantics

Operations will use error overflow semantics,
[similar to signed integers](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/expressions/arithmetic.md#overflow-and-other-error-conditions).
For example, `(('a' as char) + 500)` is invalid code because it causes `char`
overflow. That's why conversions are to signed values (for example,
`char as i16`).
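
The C++ sketch below models the overflow rule for `char + <integer>`. `AddToChar` is a hypothetical helper; it widens to a signed type before adding, as the equivalences above do, and uses `std::nullopt` to stand in for the overflow error. How such an error is actually reported is not modeled here.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of `char + <integer>`: the result is computed in a wider
// signed type, and anything outside [0, 255] is treated as an overflow error.
constexpr std::optional<std::uint8_t> AddToChar(std::uint8_t c,
                                                std::int64_t offset) {
  // Offsets beyond +/-255 can never produce a value in range.
  if (offset < -255 || offset > 255) return std::nullopt;
  const int sum = static_cast<int>(c) + static_cast<int>(offset);
  if (sum < 0 || sum > 255) return std::nullopt;
  return static_cast<std::uint8_t>(sum);
}

static_assert(AddToChar('a', 1).value() == 'b');
static_assert(AddToChar('b', -1).value() == 'a');
static_assert(!AddToChar('a', 500).has_value());
```

Widening to a signed type is what lets a negative offset such as `-1` work without the caller adding casts, even though the stored value is unsigned.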

##### Preferring i32 returns

In arithmetic, `i32` returns are preferred for deltas because they should be
valid for Unicode code points. Even though `char` is only 8 bits, using `i32`
for returns there too creates consistency with `Core.CharLiteral`.
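
To see why a 32-bit result is a comfortable fit, the C++ sketch below computes both kinds of delta. `CharDelta` and `CodePointDelta` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical model of `char - char -> i32`: both operands are code units in
// [0, 255], so the delta is in [-255, 255].
constexpr std::int32_t CharDelta(std::uint8_t lhs, std::uint8_t rhs) {
  return static_cast<std::int32_t>(lhs) - static_cast<std::int32_t>(rhs);
}

// Hypothetical model of `Core.CharLiteral - Core.CharLiteral -> i32`: both
// operands are code points in [0, 0x10FFFF], so the delta can need more than
// 16 bits but always fits in 32.
constexpr std::int32_t CodePointDelta(char32_t lhs, char32_t rhs) {
  return static_cast<std::int32_t>(lhs) - static_cast<std::int32_t>(rhs);
}

static_assert(CharDelta('z', 'a') == 25);
static_assert(CharDelta('a', 'z') == -25);
static_assert(CodePointDelta(U'\U0010FFFF', U'\0') == 0x10FFFF);
static_assert(CodePointDelta(U'\U0001F600', U'a') > INT16_MAX);
```

The last assertion shows a code point delta that would not fit in `i16`, while the `char` delta would; returning `i32` from both keeps the two operators consistent.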

### Revoke and replace proposal #1964: Character Literals

This revokes proposal #1964 for simplicity. Rather than trying to detail which
decisions still apply and which don't, this proposal assumes that none of the
decisions there still apply. Where we want to keep a decision, we can still
point to the rationale in #1964, but each decision should be explicitly made
again.

## Rationale

-   [Performance-critical software](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#performance-critical-software)
    -   The intent is that Carbon's main string type privileges UTF-8 over other
        potential encodings. A `char` represents a single code unit within that,
        and is consequently efficient to access. It can also be invalid, meaning
        we don't guarantee performing runtime validation for users (avoiding
        performance overhead), and that users might be able to use it for other
        encodings.
-   [Software and language evolution](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#software-and-language-evolution)
    -   `Core.CharLiteral` is designed as a Unicode code point, and even though
        this design doesn't include a way to use values over `7F`, we anticipate
        those will be added in the future. It's being provided as a building
        block for more elaborate Unicode functionality, including both UTF-16
        and UTF-32, even as we prioritize UTF-8.
-   [Code that is easy to read, understand, and write](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
    -   Character literal syntax mirrors string literal syntax. The main
        divergence is that `\x80` and higher `\x` escapes are not supported
        because their behavior is potentially ambiguous; that restriction is
        itself in furtherance of this goal.
-   [Practical safety and testing mechanisms](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#practical-safety-and-testing-mechanisms)
    -   Restricting the set of operators valid for `char` gives us a way to do
        different sorts of validation that can be more character-oriented than
        if we treated it as an arbitrary byte.
    -   Treating `Core.CharLiteral` as a valid Unicode character allows us to
        provide static checking for some operations, such as `'a' + 1` resulting
        in another valid Unicode code point; more is also transitively possible,
        including involving `char`.
-   [Interoperability with and migration from existing C++ code](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
    -   Modeling `char` as a UTF-8 code unit creates behavior which is very
        similar to C++, but still shifts towards a more character-oriented
        approach. We do expect some migration friction as a consequence (as
        use-cases might need either more casts, or to switch to a byte type).

## Future work

There's still significant future work, including:

-   `signed char`, `unsigned char`
-   `std::char8_t`, `std::char16_t`, `std::char32_t`
-   UTF-16 and UTF-32 support

It should not be assumed that there's any restriction on the designs of those
features, particularly no restrictions from #1964.

## Alternatives considered

### Align `char` fully with C++, or make it fully valid

Alternatives were discussed in
[zygoloid's comment on #5903](https://github.com/carbon-language/carbon-lang/issues/5903#issuecomment-3494068591).

The comment notes that three options were proposed:

1. `char` is fully aligned with C++.

    There is no universal convention for what the value in a `char` means, and
    the numerical encoding of Unicode characters into `char` sequences might
    even be platform-dependent. For example, we might use some code page on
    Windows, EBCDIC on some IBM targets, and probably UTF-8 everywhere else.
    Likely the encoding would match what a character literal in C++ code would
    do for that target. Even when the target normally uses UTF-8, it would be
    reasonable to use an array of `char` as the type of the output buffer when
    transcoding from UTF-8 to some other encoding, and generally an encoded text
    buffer (in any encoding) would typically be represented as an array of
    `char`. It might also be reasonable to use an array of `char` for things
    that aren't necessarily text, such as file contents.

2. `char` models a UTF-8 code unit, although it may not necessarily be valid,
   and may appear in a sequence that is not a valid UTF-8 encoding.

    As with the first option, `char` can represent an integer in [0, 255],
    although it is not an integer type. Higher-level abstractions would likely
    (eventually) be provided to represent different views of the code unit
    sequence as (for example) a sequence of code points or a sequence of
    graphemes, but the fundamental model exposes the encoding. Functions taking
    `char` or `char` sequences would assume UTF-8 encoding, and would need to
    consider how to handle invalid `char`s and invalid `char` sequences.

3. Use a foundation that enforces Unicode string validity, for some definition
   of "Unicode string validity".

    The `char` type is a Unicode character. Strings would notionally be a
    sequence of Unicode characters, possibly also maintaining some higher-level
    string invariants. String indexing, if it exists, would likely treat the
    string as a sequence of Unicode characters. String invariants would be
    enforced by type conversion into the string type rather than within the
    string operations, and certain classes of invalid strings would be
    unrepresentable.

The rationale, as evaluated, is:

-   **Privilege UTF-8 over other encodings:** UTF-8 is
    [typically the best choice](https://utf8everywhere.org/) for representing
    text, even when targeting languages where characters are 3 bytes in UTF-8
    but 2 in UTF-16, and even on Windows where the system APIs typically operate
    primarily in UTF-16 or UCS-2. We should create affordances that encourage
    use of UTF-8 (such as having the `char` type be conventionally UTF-8).
    -   Our overall goal to support (only) modern environments and a general
        desire for consistency and portability argues against supporting
        non-Unicode encodings for character types.
    -   Having _some_ convention for the meaning of the value of a `char` seems
        important, and the lack of one in C++ has been a notable problem over
        time, leading to the addition of `char8_t` et al, which have not been
        entirely satisfactory solutions due to the existing widespread usage of
        plain `char`.
-   **Do not privilege any particular meaning of "validity":** There are many
    different ways in which you can view a sequence of UTF-8 code units as being
    valid or invalid. For example: Can a string start with a combining
    character? Can it have mismatched LRE/RLE/PDF characters in it? Can it be
    unnormalized, or must it be in NFC, or in NFD? Can it contain unassigned
    Unicode characters? Can it contain PUA characters? Can it contain
    non-characters? Picking any set of answers to these questions as being our
    canonical notion of "validity" is somewhat arbitrary.
-   **Do not privilege any particular level for accessing elements of the string
    other than code units:** There are many different layers of abstraction at
    which you can interpret the contents of a string. The atoms that users want
    to interact with, such as glyphs or grapheme clusters in rendering, or
    combining characters when editing or performing substring searches, aren't
    in one-to-one correspondence with Unicode characters any more than they're
    in one-to-one correspondence with UTF-8 code units. So it's not clear that
    privileging Unicode-character-oriented access (or indeed any of the other
    higher-level Unicode views) is appropriate. However, code units are in
    direct correspondence with bytes of memory, which is directly relevant for
    low-level operations, so there is a reason to provide direct access to
    byte-level / code-unit-level operations.
    -   If string indexing operates on Unicode characters, it would either be
        non-constant-time or would require not storing strings as just a
        sequence of UTF-8. Having a constant-time indexing operation on strings
        seems very important (especially for interop and for meeting C++
        developers where they are), even though a lot of the desired
        functionality (perhaps all of it) can be provided with iterator- or
        cursor-like machinery instead.
-   **Enforcing validity is problematic for existing API structures:** Requiring
    strings to be valid UTF-8 presents difficulties when moving text into or out
    of other sources. For example, when reading text from a validly-encoded
    UTF-8 file into a text buffer, one would need to deal with a read that ends
    in the middle of an encoding of a character. It is not clear how Rust deals
    with this, but it seems like it would create a significant impedance
    mismatch with C-like buffered I/O utilities. Similarly, when interoperating
    with C++, it would create friction if our string representation requires
    strings to be valid UTF-8 encodings.
-   **We can allow additional invariants without requiring them:** For a
    known-to-be-valid UTF-8 sequence, a higher-level abstraction can be built,
    and similarly, yet-higher-level abstractions can be built for whatever other
    invariants we want to enforce. So using option 2 rather than option 3 as our
    foundation doesn't prevent enforcing invariants in the type system (but nor
    does it encourage doing so).
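
The difference between code unit access and code point access can be sketched in C++ as follows. `IsContinuation` and `CountCodePoints` are hypothetical helpers; they assume the input is valid UTF-8 and perform no validation, which matches the "no privileged validity" position above.

```cpp
#include <cstddef>
#include <string_view>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
constexpr bool IsContinuation(char c) {
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Counts code points by scanning every code unit: O(n), in contrast to the
// O(1) code unit access that `text[i]` provides. Assumes `text` is valid
// UTF-8; no validation is performed.
constexpr std::size_t CountCodePoints(std::string_view text) {
  std::size_t count = 0;
  for (char c : text) {
    if (!IsContinuation(c)) ++count;
  }
  return count;
}
```

For the UTF-8 encoding of "héllo" (`"h\xC3\xA9" "llo"`), `size()` reports 6 code units while `CountCodePoints` reports 5 code points, and finding the Nth code point requires a scan like the one above.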

This proposal is choosing option 2, that `char` models a UTF-8 code unit without
validation. In some sense, option 2 is still "fully aligned with C++", but with
C++'s `char8_t` rather than with C++'s `char`.

### Raw character literals

[Raw string literals](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/string_literals.md#raw-string-literals)
use a `#` prefix. There's limited use for this in character literals;
technically, `'\\'` could instead be `#'\'#`, but that's longer and extra
characters may prove distracting. Raw string literals are more useful when
there's a longer character sequence, whereas character literals have one
character by definition. For simplicity, character literals won't have raw
syntax.

### Disallow hex escape sequences in character literals

A `\x##` escape sequence abstractly represents a UTF-8 code unit. Whereas values
over 7F are valid in string literals (allowing arbitrary byte values), these are
disallowed in character literals because we want a more validated Unicode
behavior. Developers could instead use `\u` escapes in place of `\x`.

It can still be useful to allow `\x` escapes for low-range values because some
developers will still need to specify
[ANSI escapes](https://en.wikipedia.org/wiki/ANSI_escape_code). Carbon
[drops support for some escape sequences](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/lexical_conventions/string_literals.md#escape-sequences),
such as `\a`, and specifically advises `\x` as an alternative for developers
that need it. Requiring `\a` -> `\x07` -> `\u{07}` is incrementally more verbose
syntax, and developers may be confused why `"\x1B"` is allowed for strings but
`'\u{1B}'` is required for characters.

Values over 7F are ambiguous between an arbitrary byte value and a Unicode code
point, and so should be invalid. However, where both interpretations are
identical for UTF-8 (values up to and including 7F), we will allow `\x` escape
sequences.
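
The ambiguity above 7F can be shown with a small UTF-8 encoder in C++. `EncodeUtf8` is a hypothetical helper written for this illustration, and it assumes its input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes one code point as UTF-8 code units.
// Precondition: `cp` is a valid Unicode scalar value.
inline std::string EncodeUtf8(std::uint32_t cp) {
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  std::string out;
  if (cp <= 0x7F) {
    out += static_cast<char>(cp);
  } else if (cp <= 0x7FF) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0xFFFF) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

`EncodeUtf8(0x1B)` is the single byte `1B`, so the byte and code point readings of `\x1B` agree. `EncodeUtf8(0x80)` is the two bytes `C2 80`, so `\x80` read as a byte and `\x80` read as a code point would produce different data.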

### Allow grapheme clusters in character literals

This proposal carries forward the decision in #1964
[to not support grapheme clusters](https://github.com/carbon-language/carbon-lang/pull/1964/files#diff-192d5568d8c1d15e68abe0c46cc52cc0b375a372d1dad8d2154d09f8b29666c5R340)
in character literals.

### Reuse string literal syntax for character literals

Instead of using single quotes (for example, `'a'`), we could use string literal
syntax with a conversion (for example, `"a" as char`) for character literals.
This was proposed because it would free up the single quote for other,
unspecified syntax uses.

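For background, character literals are common in C++. For example, Sourcegraph
search statistics show them appearing more often than the `>>` or `%`
operators, though less often than `<<` (some of the matches are in comments,
which is a search limitation):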

-   `'(.|\\.)'`:
    [46.2 million](https://sourcegraph.com/search?q=context:global+lang:c%2B%2B+count:50000000+/%27%28.%7C%5C%5C.%29%27/&patternType=keyword&sm=0)
-   `<<`:
    [over 100 million](https://sourcegraph.com/search?q=context:global+lang:c%2B%2B+count:100000000+/+%3C%3C+/&patternType=keyword&sm=0)
-   `>>`:
    [10.4 million](https://sourcegraph.com/search?q=context:global+lang:c%2B%2B+count:50000000+/+%3E%3E+/&patternType=keyword&sm=0)
-   `%`:
    [5.3 million](https://sourcegraph.com/search?q=context:global+lang:c%2B%2B+count:10000000+/+%25+/&patternType=keyword&sm=0)

This creates several disadvantages for removing character literals in Carbon:

-   **Migrating C++ developers to Carbon:** The frequency of use can be expected
    to have trained developers to expect single quotes to be used for
    characters, especially the C++ developers that Carbon is targeting.
    Repurposing them would create friction, because C++ developers would need
    to understand different meanings for the same syntax in C++ and Carbon,
    something Carbon prefers to avoid.

-   **Increased runtime error risks:** The cost could be as simple as added
    runtime overhead, such as converting a string literal to a `str` and then
    to a `char`. However, the risk could also be more insidious, such as doing
    `[0]` on a string literal without validating that the string is exactly
    one character (this would also likely return a null byte for `""[0]`). By
    having a character literal type, Carbon encourages developers to stay
    within guardrails that make it easier to get compile-time behavior and
    program validation.

-   **Block string literal use:** We already have another use for single quotes
    in Carbon:
    [block string literals](/docs/design/lexical_conventions/string_literals.md).
    The syntax may need to change along with removing character literals, to
    make room for other uses of single quotes.

    -   If retained, it would constrain uses of single quotes. For example, a
        unary operator syntax has overlap (that is, if `'a` and `''a` are valid,
        then `'''a` is ambiguous).

    -   The choice of single quotes in proposal
        [#1360: Change raw string literal syntax](https://github.com/carbon-language/carbon-lang/pull/1360)
        was made accounting for single quotes in character literals, and that
        commonality would be lost.

-   **Tooling:** The prevalence of single quotes being used for either strings
    or characters also affects their treatment in tools not specialized to
    Carbon: they expect them to be used for strings. For example, Rust's use of
    single quotes for lifetime annotations has been observed to break
    language-agnostic syntax highlighting.
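
The indexing risk described above already exists in C++ and can be shown directly. The constant names below are hypothetical; the snippet illustrates C++ behavior only and does not claim anything about how Carbon string literals will behave.

```cpp
// Indexing a string literal compiles for any length; nothing checks that the
// literal holds exactly one character.
constexpr char kFirstOfMany = "ab"[0];  // Silently drops 'b'.
constexpr char kFromEmpty = ""[0];      // Reads the terminating null byte.

static_assert(kFirstOfMany == 'a');
static_assert(kFromEmpty == '\0');

// A character literal states the intent directly.
constexpr char kLiteral = 'a';
static_assert(kLiteral == kFirstOfMany);
```

Both indexed forms compile without complaint, which is the kind of silent mistake a dedicated character literal syntax helps avoid.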

While a compelling proposal for a different use of single quotes may come up in
the future, freeing up the character for other purposes is insufficient to
justify a different syntax for character literals.

#### Treat single-character string literals as a third "text literal" type

A related alternative with the same goal of eliminating single quotes for
character literals is that, rather than requiring single-character string
literals be explicitly converted to `char`, they could instead have a third type
of text literal. This would implicitly cast to either `str` or `char`.

This approach would lead to three literal types: `StrLiteral`, `CharLiteral`,
and `TextLiteral`. The distinction of `CharLiteral` is important because we
still want to support arithmetic on character literals, such as `'a' + 1` (which
we would not want to be allowed for `StrLiteral`).

The existence of a third type would be important for generic code, even when not
trying to use character literals. For example:

```carbon
  fn StoreValue[U:! type](ref a: Optional(U), b: U) {
    a = b;
  }

  fn StrLogic[T:! type](a: T) {
    var x: Optional(T) = a;
    StoreValue(x, "str");
  }

  fn F() {
    StrLogic("a");
  }
```

Here, `T` is deduced to be `TextLiteral`. However, `U` has no valid value: it's
passed `Optional(TextLiteral)`, while `"str"` is a `StrLiteral` (which should
not be convertible to `TextLiteral`). As a consequence, this code is invalid,
even though the same code would be valid if there were not `TextLiteral` type.

Advantages:

-   Avoids an explicit cast.

Disadvantages:

-   Shares most of the disadvantages of the primary explicit conversion
    approach.
    -   This includes the risk that developers will write `"..."[0]` instead of
        `"..." as char` when they need a character, although the frequency may
        be reduced.
-   Having additional types in common literals could lead to programmer errors
    in deducing generic types, as described above.
-   Implicit casts cause more operator ambiguity.
    -   How are operators that have different meanings for string and character
        literals handled, such as `Cpp.std.cout <<` or `<=>`?
    -   In Carbon, we'd probably still want string operators to work; for
        example, `"a" + "b" => "ab"`, and that can be compile-time. Is `"a" + 1`
        a pointer to the null byte as it is in C++ (similar to `&("a"[1])`), a
        character addition (`'a' + 1 => 'b'`), or does it require an explicit
        cast in order to ensure behavior is deliberate?
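
For reference, the two C++ behaviors that the last question contrasts can be checked with a short sketch. `AfterFirst` is a hypothetical helper name; the snippet shows existing C++ semantics and does not prescribe an answer for Carbon.

```cpp
#include <type_traits>

// In C++, adding an integer to a string literal is pointer arithmetic.
inline char AfterFirst() {
  const char* text = "a";
  return *(text + 1);  // The terminating null byte, same as "a"[1].
}

// Adding an integer to a character literal promotes to `int`, so the result is
// a number rather than a character.
static_assert(std::is_same_v<decltype('a' + 1), int>);
static_assert('a' + 1 == 'b');
```

The same `+ 1` yields a pointer into the string in one case and an integer in the other, so a literal type that converts implicitly to both string and character would have to pick one meaning or require a cast.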