Переглянути джерело

Introduce git-based target selection. (#3106)

This adds access to the `target-determinator` tool that computes the
possibly impacted set of targets between the current checkout and a
specific `git` commit. The tool is downloaded (and cached) with a Python
script that wraps it and allows us to easily run it in our CI
environment.

And finally, switches the CI for pull requests and the merge queue to
use this script and run a minimal set of impacted tests rather than all
of them. In order to make this as simple as possible, this also merges
the previously separate build and test steps of our CI.

Note that the CI testing post-submit (push events) continues to use a
blanket target pattern so that we have a good backstop in case something
outside of what is tracked here changes. The important paths, especially
the ones that might be in an interactive critical path, are the PR and
merge queue.

This has been tested on the `action-test` branch to try to make sure it
works. Some example runs for folks to inspect:

- `push` event with blanket test:
https://github.com/carbon-language/carbon-lang/actions/runs/6070060958/job/16465422569
- A "large" PR that impacts a bunch of toolchain tests:
- ubuntu-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070063050/job/16465428107?pr=3188
- macos-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070063050/job/16465428420?pr=3188
- The merge queue for this PR:
- ubuntu-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070202665/job/16465825387
- macos-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070202665/job/16465825740
- A "small" PR that impacts one test:
- ubuntu-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070659668/job/16467147641?pr=3189
- macos-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070659668/job/16467147933?pr=3189
- The merge queue for the small PR:
- ubuntu-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070754584/job/16467427268
- macos-opt:
https://github.com/carbon-language/carbon-lang/actions/runs/6070754584/job/16467427736

I have observed one failure while testing with this PR, but I've not yet
been able to reproduce it. Every other failure (including empty lists,
etc.) I've tried to address. However, we may have to keep an eye on
actions after this to make sure there isn't some scenario I've not been
able to work through in a test branch.

---------

Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Geoff Romer <gromer@google.com>
Chandler Carruth 2 роки тому
батько
коміт
2751f02258

+ 61 - 8
.github/workflows/tests.yaml

@@ -159,9 +159,28 @@ jobs:
           CACHE_VERSION: 1
         run: |
           cat >user.bazelrc <<EOF
-          # Enable remote cache for our CI.
+          # Enable remote cache for our CI but minimize downloads.
           build --remote_cache=https://storage.googleapis.com/carbon-builds-github-v${CACHE_VERSION}-${{ env.os_for_cache }}
           build --google_credentials=$HOME/gcp-builds-service-account.json
+          build --remote_download_minimal
+
+          # Set an artificially high jobs count. This flag controls the number
+          # of concurrency Bazel itself uses, which is essential for actions
+          # that are internally blocked on for example downloading results form
+          # the cache above. Without setting this high, Bazel will pick a small
+          # number based on the available host CPUs and the reality will be a
+          # long chain of largely serialized download events with little or no
+          # usage of the host machine. Fortunately, local actions are
+          # *separately* gated on `--local_*_resources` that will avoid a large
+          # jobs value overwhelming the host. There is a bug to make downloads
+          # behave completely asynchronously and remove the need for this filed
+          # back in 2018 but work seemed to not finish:
+          # https://github.com/bazelbuild/bazel/issues/6394
+          #
+          # There is a new effort (yay!) but until then it seems worth using the
+          # workaround of a high jobs value. The biggest downside (increased
+          # heap usage) seems like it isn't currently a big loss for our builds.
+          build --jobs=200
 
           # Set an artificially high jobs count. This flag controls the number
           # of concurrency Bazel itself uses, which is essential for actions
@@ -192,23 +211,57 @@ jobs:
         if: steps.filter.outputs.ignore == 'false'
         run: df -h
 
-      # Build all targets first to isolate build failures.
-      - name: Build (${{ matrix.build_mode }})
-        if: steps.filter.outputs.ignore == 'false'
+      # Build and run all targets on branch pushes to ensure we always have a
+      # clean tree. We don't expect this to be an interactive path and so don't
+      # optimize the latency of this step.
+      - name: Test (${{ matrix.build_mode }})
+        if:
+          steps.filter.outputs.ignore == 'false' && github.event_name == 'push'
         env:
           # 'libtool_check_unique failed to generate' workaround.
           # https://github.com/bazelbuild/bazel/issues/14113#issuecomment-999794586
           BAZEL_USE_CPP_ONLY_TOOLCHAIN: 1
-        run: bazelisk build -c ${{ matrix.build_mode }} //...:all
+        run: |
+          bazelisk test -c ${{ matrix.build_mode }} //...
 
-      # Run all test targets.
+      # Compute the set of possible rules impacted by this change using
+      # Bazel-based diffing. This lets PRs and the merge queue have a much more
+      # efficient test CI action by avoiding even enumerating (and downloading)
+      # all of the unaffected Bazel targets.
+      - name: Compute impacted pull request targets
+        if:
+          steps.filter.outputs.ignore == 'false' && github.event_name != 'push'
+        env:
+          # Compute the base SHA from the different event structures.
+          GIT_BASE_SHA:
+            ${{ github.event_name == 'pull_request_target' &&
+            github.event.pull_request.base.sha ||
+            github.event.merge_group.base_sha }}
+          TARGETS_FILE: ${{ runner.temp }}/targets
+        run: |
+          # First fetch the relevant base into the git repository.
+          git fetch --depth=1 origin $GIT_BASE_SHA
+
+          # Then use `target-determinator` as wrapped by our script.
+          ./scripts/target_determinator.py $GIT_BASE_SHA >$TARGETS_FILE
+
+      # Build and run just the tests impacted by the PR or merge group.
       - name: Test (${{ matrix.build_mode }})
-        if: steps.filter.outputs.ignore == 'false'
+        if:
+          steps.filter.outputs.ignore == 'false' && github.event_name != 'push'
         env:
           # 'libtool_check_unique failed to generate' workaround.
           # https://github.com/bazelbuild/bazel/issues/14113#issuecomment-999794586
           BAZEL_USE_CPP_ONLY_TOOLCHAIN: 1
-        run: bazelisk test -c ${{ matrix.build_mode }} //...:all
+          TARGETS_FILE: ${{ runner.temp }}/targets
+        run: |
+          # Bazel requires a test target to run the test command. There may be
+          # no targets or there may only be non-test targets that we want to
+          # build, so simply inject an explicit no-op test target.
+          echo "//scripts:no_op_test" >> $TARGETS_FILE
+
+          bazelisk test -c ${{ matrix.build_mode }} \
+            --target_pattern_file=$TARGETS_FILE
 
       # See "Disk space before build".
       - name: Disk space after build

+ 1 - 0
bazel/sh_run/rules.bzl

@@ -12,6 +12,7 @@ def sh_run(name, args, **kwargs):
     native.sh_binary(
         name = name,
         srcs = ["//bazel/sh_run:exec.sh"],
+        tags = ["manual"],
         args = args,
         **kwargs
     )

+ 11 - 0
scripts/BUILD

@@ -0,0 +1,11 @@
+# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+# Exceptions. See /LICENSE for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+load("@rules_python//python:defs.bzl", "py_test")
+
+py_test(
+    name = "no_op_test",
+    srcs = ["no_op_test.py"],
+    main = "no_op_test.py",
+)

+ 13 - 0
scripts/no_op_test.py

@@ -0,0 +1,13 @@
+#!/usr/bin/env python3
+
+"""No-op test that should always pass.
+
+This is designed to have the fewest avoidable dependencies to make no-op build
+and test runs in CI as inexpensive as possible.
+"""
+
+__copyright__ = """
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+"""

+ 76 - 32
scripts/scripts_utils.py

@@ -14,10 +14,12 @@ from pathlib import Path
 import platform
 import shutil
 import time
-from typing import Optional
+from typing import Dict, Optional
 import urllib.request
 
-_URL = "https://github.com/bazelbuild/buildtools/releases/download/v6.3.3/"
+_BAZEL_TOOLS_URL = (
+    "https://github.com/bazelbuild/buildtools/releases/download/v6.3.3/"
+)
 
 """Version SHAs.
 
@@ -32,7 +34,7 @@ Gather shas with:
         echo },
     done
 """
-_VERSION_SHAS = {
+_BAZEL_TOOLS_VERSION_SHAS = {
     "buildozer": {
         "darwin-amd64": "9b0bbecb3745250e5ad5a9c36da456699cb55e52999451c3c74047d2b1f0085f",  # noqa: E501
         "darwin-arm64": "085928dd4deffa1a7fd38c66c4475e37326b2d4942408e8e3d993953ae4c626c",  # noqa: E501
@@ -49,6 +51,25 @@ _VERSION_SHAS = {
     },
 }
 
+_TARGET_DETERMINATOR_URL = "https://github.com/bazel-contrib/target-determinator/releases/download/v0.23.0/"  # noqa: E501
+
+"""Version SHAs.
+
+Gather shas with:
+    for v in darwin.amd64 darwin.arm64 linux.amd64 linux.arm64 \
+        windows.amd64.exe
+    do
+        echo "\"$v\": \"$(wget -q -O - https://github.com/bazel-contrib/target-determinator/releases/download/v0.23.0/target-determinator.$v | sha256sum | cut -d ' ' -f1)\", # noqa: E501"
+    done
+"""
+_TARGET_DETERMINATOR_SHAS = {
+    "darwin.amd64": "aba6dce8a978d2174b37dd1355eecba86db93be1ff77742d0753d8efd6a8a316",  # noqa: E501
+    "darwin.arm64": "6c3c308dcfc651408ed5490245ea3e0180fc49d4cc9b762ab84a4b979bcb07b8",  # noqa: E501
+    "linux.amd64": "5200dbca0dd4980690d5060cf8e04abac927efaca143567c51fe24cf973364d2",  # noqa: E501
+    "linux.arm64": "3c04f8bb2742219eb3415c6d675dcfe9175745eb7b1d6c3706085a9987f9f719",  # noqa: E501
+    "windows.amd64.exe": "3aea5bd52fdf29bfe6995ffcacc2b2c2299af02dc58f1039022ff758b58214c3",  # noqa: E501
+}
+
 
 class Release(Enum):
     BUILDOZER = "buildozer"
@@ -85,44 +106,22 @@ def _download(url: str, local_path: Path) -> Optional[int]:
     return None
 
 
-def get_release(release: Release) -> str:
-    """Install a file to carbon-lang's cache.
-
-    release: The release to cache.
-    """
+def _get_cached_binary(name: str, url: str, want_hash: str) -> str:
     cache_dir = Path.home().joinpath(".cache", "carbon-lang-scripts")
     cache_dir.mkdir(parents=True, exist_ok=True)
 
-    # Translate platform information into Bazel's release form.
-    machine = platform.machine()
-    if machine == "x86_64":
-        machine = "amd64"
-    version = f"{platform.system().lower()}-{machine}"
-
-    # Get ready to add .exe for Windows.
-    ext = ""
-    if platform.system() == "Windows":
-        ext = ".exe"
-
-    # Ensure the platform is supported, and grab its hash.
-    if version not in _VERSION_SHAS[release.value]:
-        # If this because a platform support issue, we may need to print errors.
-        exit(f"No {release.value} release available for platform: {version}")
-    want_hash = _VERSION_SHAS[release.value][version]
-
     # Hold a lock while checksumming and downloading the path. Otherwise,
     # parallel runs by pre-commit may conflict with one another with
     # simultaneous downloads.
-    with open(cache_dir.joinpath(f"{release.value}.lock"), "w") as lock_file:
+    with open(cache_dir.joinpath(f"{name}.lock"), "w") as lock_file:
         fcntl.lockf(lock_file.fileno(), fcntl.LOCK_EX)
 
         # Check if there's a cached file that can be used.
-        local_path = cache_dir.joinpath(f"{release.value}{ext}")
+        local_path = cache_dir.joinpath(name)
         if local_path.is_file() and want_hash == _get_hash(local_path):
             return str(local_path)
 
         # Download the file.
-        url = f"{_URL}/{release.value}-{version}{ext}"
         retries = 5
         while True:
             err = _download(url, local_path)
@@ -130,9 +129,7 @@ def get_release(release: Release) -> str:
                 break
             retries -= 1
             if retries == 0:
-                exit(
-                    f"Failed to download {release.value}-{version}: HTTP {err}."
-                )
+                exit(f"Failed to download {url}: HTTP {err}.")
             time.sleep(1)
         local_path.chmod(0o755)
 
@@ -140,7 +137,7 @@ def get_release(release: Release) -> str:
         found_hash = _get_hash(local_path)
         if want_hash != found_hash:
             exit(
-                f"Downloaded {release.value}-{version} but found sha256 "
+                f"Downloaded {url} but found sha256 "
                 f"{found_hash} ({local_path.stat().st_size} bytes), wanted "
                 f"{want_hash}"
             )
@@ -148,6 +145,53 @@ def get_release(release: Release) -> str:
     return str(local_path)
 
 
+def _get_machine() -> str:
+    machine = platform.machine()
+    if machine == "x86_64":
+        machine = "amd64"
+    return machine
+
+
+def _get_platform_ext() -> str:
+    if platform.system() == "Windows":
+        return ".exe"
+    else:
+        return ""
+
+
+def _select_hash(hashes: Dict[str, str], version: str) -> str:
+    # Ensure the platform version is supported and has a hash.
+    if version not in hashes:
+        # If this because a platform support issue, we may need to print errors.
+        exit(f"No release available for platform: {version}")
+    return hashes[version]
+
+
+def get_target_determinator() -> str:
+    """Install the Bazel target-determinator tool to carbon-lang's cache."""
+    # Translate platform information into this tool's release binary form.
+    version = f"{platform.system().lower()}.{_get_machine()}"
+    ext = _get_platform_ext()
+    url = f"{_TARGET_DETERMINATOR_URL}/target-determinator.{version}{ext}"
+    want_hash = _select_hash(_TARGET_DETERMINATOR_SHAS, version)
+
+    return _get_cached_binary(f"target-determinator{ext}", url, want_hash)
+
+
+def get_release(release: Release) -> str:
+    """Install a Bazel-released tool to carbon-lang's cache.
+
+    release: The release to cache.
+    """
+    # Translate platform information into Bazel's release form.
+    version = f"{platform.system().lower()}-{_get_machine()}"
+    ext = _get_platform_ext()
+    url = f"{_BAZEL_TOOLS_URL}/{release.value}-{version}{ext}"
+    want_hash = _select_hash(_BAZEL_TOOLS_VERSION_SHAS[release.value], version)
+
+    return _get_cached_binary(f"{release.value}{ext}", url, want_hash)
+
+
 def locate_bazel() -> str:
     """Returns the bazel command.
 

+ 102 - 0
scripts/target_determinator.py

@@ -0,0 +1,102 @@
+#!/usr/bin/env python3
+
+"""Computes the potentially differing rules from some git commit.
+
+Wraps the "target-determinator" Go program here:
+https://github.com/bazel-contrib/target-determinator
+
+The purpose is to compute the potentially impacted set of targets from some
+provided Git commit to the current checkout.
+
+This script will ensure a cached version of the latest release is available, and
+then forward a limited set of flags to it. This script also filters the
+resulting targets using `bazel query` to make it the most relevant list for
+continuous integration.
+"""
+
+__copyright__ = """
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+"""
+
+import subprocess
+import argparse
+import tempfile
+import sys
+from pathlib import Path
+
+import scripts_utils
+
+
+def log(s: str) -> None:
+    print(s, file=sys.stderr)
+
+
+def filter_targets(bazel: Path, targets: str) -> str:
+    with tempfile.NamedTemporaryFile(mode="w+") as tmp:
+        query = (
+            f"let t = set({targets}) in "
+            "kind(rule, $t) except attr(tags, manual, $t)\n"
+        )
+        query_lines = query.splitlines()
+        if len(query_lines) <= 10:
+            query_snippet = "\n".join(query_lines)
+        else:
+            query_snippet = "\n".join(
+                query_lines[:5] + ["..."] + query_lines[-5:]
+            )
+        log(f"Bazel query snippet:\n```\n{query_snippet}\n```")
+        tmp.write(query)
+        try:
+            p = subprocess.run(
+                [str(bazel), "query", f"--query_file={tmp.name}"],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                check=True,
+                encoding="utf-8",
+            )
+            return p.stdout
+        except subprocess.CalledProcessError as err:
+            log(err.stderr)
+            raise
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(__doc__)
+    parser.add_argument(
+        "baseline", nargs=1, help="Git commit of the diff baseline."
+    )
+    parser.add_argument(
+        "args",
+        nargs="*",
+        help="Remaining args to forward to the underlying tool.",
+    )
+    parsed_args = parser.parse_args()
+
+    scripts_utils.chdir_repo_root()
+    bazel = Path(scripts_utils.locate_bazel())
+    target_determinator = scripts_utils.get_target_determinator()
+
+    p = subprocess.run(
+        [
+            target_determinator,
+            f"--bazel={bazel}",
+            parsed_args.baseline[0],
+        ]
+        + parsed_args.args,
+        check=True,
+        stdout=subprocess.PIPE,
+        encoding="utf-8",
+    )
+
+    targets = p.stdout
+    if targets.strip() != "":
+        targets = filter_targets(bazel, targets)
+    log(f"Found {len(targets.splitlines())} impacted targets!")
+
+    print(targets.rstrip())
+
+
+if __name__ == "__main__":
+    main()