Просмотр исходного кода

Weaken digit separator placement rules (#1983)

[Proposal #143: Numeric literals](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0143.md) added digit separators with strict rules for placement. It missed some use-cases. In order to address this, remove placement rules for numeric literals.

Related issue: #1485 

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Jon Ross-Perkins 3 лет назад
Родитель
Сommit
ba2cfc2d0d
1 измененных файлов с 183 добавлено и 0 удалено
  1. 183 0
      proposals/p1983.md

+ 183 - 0
proposals/p1983.md

@@ -0,0 +1,183 @@
+# Weaken digit separator placement rules
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/1983)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [3-digit decimal groupings](#3-digit-decimal-groupings)
+    -   [2-digit or 4-digit hexadecimal digit groupings](#2-digit-or-4-digit-hexadecimal-digit-groupings)
+    -   [Disallow digit separators in fractions](#disallow-digit-separators-in-fractions)
+
+<!-- tocstop -->
+
+## Abstract
+
+[Proposal #143: Numeric literals](/proposals/p0143.md) added digit separators
+with strict rules for placement. It missed some use-cases. In order to address
+this, remove placement rules for numeric literals.
+
+## Problem
+
+Digit separator placement rules are too strict:
+
+-   For integers, while the original proposal mentioned Indian digit groupings,
+    Chinese, Japanese, and Korean cultures use 4-digit groupings. This oversight
+    is likely due to the description on
+    [wikipedia](https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping):
+    "but the delimiter commonly separates every three digits".
+-   There are microformats where different placement rules may be desirable. See
+    [alternatives](#alternatives-considered) for more specific examples.
+
+## Background
+
+[Proposal #143: Numeric literals](/proposals/p0143.md) added digit separators
+with strict rules for placement:
+
+-   For decimal integers, the digit separators shall occur every three digits
+    starting from the right. For example, `2_147_483_648`.
+-   For hexadecimal integers, the digit separators shall occur every four digits
+    starting from the right. For example, `0x7FFF_FFFF`.
+-   For real number literals, digit separators can appear in the decimal and
+    hexadecimal integer portions (prior to the period and after the optional `e`
+    or mandatory `p`) as described in the previous bullets. For example,
+    `2_147.483648e12_345` or `0x1_00CA.FEF00Dp+24`
+-   For binary literals, digit separators can appear between any two digits. For
+    example, `0b1_000_101_11`.
+
+This was asked on
+[Issue #1485: Reconsider digit separators](https://github.com/carbon-language/carbon-lang/issues/1485),
+where a proposal was requested.
+
+## Proposal
+
+Switch to simple rules: a single digit separator can appear between any two
+digits. For example:
+
+-   Decimal literals: `1_2_3`
+-   Hexadecimal literals: `0xA_B_C`
+-   Real literals: `1_2.3_4e5_6`
+-   Binary literals are unaffected.
+
+## Rationale
+
+-   [Community and culture](/docs/project/goals.md#community-and-culture)
+    -   Removing restrictions on decimal literals provides flexibility for
+        developers who don't use 3-digit groupings.
+-   [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+    -   This allows for mistakes in groupings, which is undesirable. However,
+        it's useful for microformats to be able to provide their own particular
+        groupings.
+-   [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
+    -   Reduces migration hurdles for C++ code by providing a rule that's more
+        consistent with the C++ rule.
+
+## Alternatives considered
+
+### 3-digit decimal groupings
+
+For decimal separators, we could enforce 3-digit groupings if digit separators
+are used at all.
+
+Advantages:
+
+-   More enforcement of digit groupings can prevent bugs and misreadings
+    resulting from accidentally incorrect groupings.
+
+Disadvantages:
+
+-   Indian, Chinese, Japanese, and Korean digit grouping conventions don't fit
+    under this rule.
+    -   These are worth considering similar to how Carbon allows
+        [Unicode identifiers](/docs/design/lexical_conventions/words.md):
+        although keywords are in English, we can be more flexible for both
+        identifiers and literals.
+-   Microformats could also pose issues. For example, microseconds (`1_00000`),
+    dates (`01_12_1983`), credit cards (`1234_5678_9012_3456`), or identity
+    numbers such as US social security numbers (`123_45_6789`).
+    -   Since the original proposal, more cases have been raised.
+
+Note that any regular grouping rule can present similar issues for Indian digit
+grouping conventions.
+
+[Proposal #143: Numeric literals](/proposals/p0143.md) chose 3-digit decimal
+groupings.
+
+Given there are overall advantages to not enforcing regular digit conventions,
+including for hex digits, it seems unnecessary to conform to the currently
+established decimal conventions. While this removes the enforcement, the
+resulting accidents are considered a reasonable risk.
+
+In theory a code linter could be told to prefer certain formats with options to
+change behavior, although that may remain too low benefit to implement.
+
+### 2-digit or 4-digit hexadecimal digit groupings
+
+Hexadecimal digit groupings could be enforced along two axes:
+
+-   Every 2 (`F_FF`) or 4 (`F_FFFF`) characters, corresponding to one or two
+    bytes respectively.
+-   Regular or irregular placement. For example, if a digit separator is placed
+    every byte (every other digit), we could require regular placement every
+    byte as in `FF_FF_FF_FF`, or irregular as in `FF_FFFFF_FF` which skips one
+    placement.
+
+[Proposal #143: Numeric literals](/proposals/p0143.md) chose 4-digit with
+regular placement.
+
+Advantages:
+
+-   More enforcement of digit groupings can prevent bugs and misreadings
+    resulting from accidentally incorrect groupings.
+
+Disadvantages:
+
+-   Again, microformats become an issue.
+    -   As noted in background, MAC addresses are typically in byte pairs
+        (`00_B0_D0_63_C2_26`) and UUIDs have particular groupings
+        (`123E4567_E89B_12D3_A456_426614174000`).
+-   If all of the other literal formats (decimal, real, and binary) allow
+    arbitrary digit separator placement, enforcing placement in hexadecimal
+    numbers would be an outlier.
+
+While it may be that enforcing 2 character (one byte) groupings with irregular
+placement would prevent sufficient errors versus the developer inconvenience
+risks, it risks becoming an outlier and for only very loose enforcement.
+
+Similar to the preceding conclusion, leaving these issues to code linters may be
+best.
+
+### Disallow digit separators in fractions
+
+[Proposal #143: Numeric literals](/proposals/p0143.md) appears to disallow digit
+separators in fractions. That is, in `1.2345`, `1.23_45` is disallowed. This
+proposal changes that.
+
+Advantages:
+
+-   Reduces the chance of confusion resulting from mistaking a `.` for another
+    `_` separator, as in `1_234_567.890_123_456`.
+
+Disadvantages:
+
+-   While sometimes commas aren't written for digit separators, it's notable
+    that SI advice is to use digit spacing instead
+    ([Digit spacing, rule #16](https://physics.nist.gov/cuu/Units/checklist.html)).
+    As a consequence, it's likely common to want some kind of digit separator
+    after the decimal point.
+-   Would create an inconsistency in placement rules.
+
+We'll allow digit separators in fractions.