
Adds a bazel query to retry syncing deps (#5386)

This tries to improve resilience against failures such as:

```
INFO: Repository rules_jvm_external+ instantiated at:
  <builtin>: in <toplevel>
Repository rule http_archive defined at:
  /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_tools/tools/build_defs/repo/http.bzl:392:31: in <toplevel>
ERROR: /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_tools/tools/build_defs/repo/http.bzl:137:45: An error occurred during the fetch of repository 'rules_jvm_external+':
   Traceback (most recent call last):
	File "/home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_tools/tools/build_defs/repo/http.bzl", line 137, column 45, in _http_archive_impl
		download_info = ctx.download_and_extract(
```


https://github.com/carbon-language/carbon-lang/actions/runs/14719625211/job/41311145495?pr=5379

GitHub currently seems to have a high rate of these failures. It
shouldn't, but we can probably do a little more to weather these
service issues.

To show flag behavior:

```
╚╡./scripts/run_bazel.py --attempts=5 --retry-all-errors :foo
Command ':foo' not found. Try 'bazel help'.
Retrying exit code 2 because it may be transient...
Command ':foo' not found. Try 'bazel help'.
Retrying exit code 2 because it may be transient...
Command ':foo' not found. Try 'bazel help'.
Retrying exit code 2 because it may be transient...
Command ':foo' not found. Try 'bazel help'.
Retrying exit code 2 because it may be transient...
Command ':foo' not found. Try 'bazel help'.

╚╡./scripts/run_bazel.py --attempts=5 --retry-all-errors query //... | wc -l
INFO: Invocation ID: ffdf9480-0245-442d-885f-ee91a3d86b68
Loading: 0 packages loaded
367

╚╡./scripts/run_bazel.py --attempts=5 :foo
Command ':foo' not found. Try 'bazel help'.
```
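
The retry behavior shown above can be sketched roughly as follows. This is a simplified illustration based on the diff in this commit, not the actual contents of `scripts/run_bazel.py`: the names `should_retry` and `run_with_retries` are hypothetical, and the injectable `run` and `sleep` callables are an assumption added so the logic can be exercised without invoking Bazel.

```python
import time
from typing import Callable

# Exit codes treated as reliably permanent (the tuple used in the diff).
PERMANENT_ERRORS = (1, 2, 3, 4, 8)


def should_retry(
    returncode: int, attempt: int, attempts: int, retry_all_errors: bool
) -> bool:
    """Returns True if another attempt should be made (hypothetical helper)."""
    # Success, or no attempts left: we're done.
    if returncode == 0 or attempt == attempts:
        return False
    # Permanent errors are only retried when explicitly requested.
    if not retry_all_errors and returncode in PERMANENT_ERRORS:
        return False
    return True


def run_with_retries(
    run: Callable[[], int],
    attempts: int,
    retry_all_errors: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Calls `run` up to `attempts` times and returns the last exit code.

    `run` and `sleep` are injected for illustration; a real script would
    call `subprocess.run` and `time.sleep` directly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    returncode = 0
    for attempt in range(1, attempts + 1):
        returncode = run()
        if not should_retry(returncode, attempt, attempts, retry_all_errors):
            break
        print(f"Retrying exit code {returncode} because it may be transient...")
        # Sleep a little longer after each failed attempt.
        sleep(attempt)
    return returncode
```

With `attempts=5` and `retry_all_errors=True`, a command that always exits with `2` prints the retry message four times before giving up, which matches the first transcript above. Without the flag, the same command stops after the first attempt.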

On the last run, [test (ubuntu-22.04,
opt)](https://github.com/carbon-language/carbon-lang/actions/runs/14737410033/job/41366888473?pr=5386)
has an example of this working:

```
INFO: Invocation ID: 3f276297-6dc5-4007-a333-dcadd4db55f4
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
WARNING: Download from https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz failed: class java.io.IOException GET returned 618 jwt:jwt-not-provided
INFO: Repository bazel_skylib+ instantiated at:
  <builtin>: in <toplevel>
Repository rule http_archive defined at:
  /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_tools/tools/build_defs/repo/http.bzl:392:31: in <toplevel>
ERROR: /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_tools/tools/build_defs/repo/http.bzl:137:45: An error occurred during the fetch of repository 'bazel_skylib+':
   Traceback (most recent call last):
	File "/home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_tools/tools/build_defs/repo/http.bzl", line 137, column 45, in _http_archive_impl
		download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz] to /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_skylib+/temp6278486909382546624/bazel-skylib-1.7.1.tar.gz: GET returned 618 jwt:jwt-not-provided
 no actions running
 no actions running
ERROR: Error loading '@@rules_python+//python/extensions:python.bzl' for module extensions, requested by /home/runner/work/carbon-lang/carbon-lang/MODULE.bazel:147:23: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/rules_python+/python/extensions/python.bzl:48:6: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/rules_python+/python/private/python.bzl:17:6: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_features+/features.bzl:3:6: Encountered error while reading extension file 'globals.bzl': no such package '@@bazel_features++version_extension+bazel_features_globals//': no such package '@@bazel_skylib+//lib': java.io.IOException: Error downloading [https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz] to /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_skylib+/temp6278486909382546624/bazel-skylib-1.7.1.tar.gz: GET re
ERROR: Error loading '@@rules_cc+//cc:extensions.bzl' for module extensions, requested by https://bcr.bazel.build/modules/rules_cc/0.1.1/MODULE.bazel:12:29: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/rules_cc+/cc/extensions.bzl:16:6: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_features+/features.bzl:3:6: Encountered error while reading extension file 'globals.bzl': no such package '@@bazel_features++version_extension+bazel_features_globals//': no such package '@@bazel_skylib+//lib': java.io.IOException: Error downloading [https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz] to /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_skylib+/temp6278486909382546624/bazel-skylib-1.7.1.tar.gz: GET returned 618 jwt:jwt-not-provided: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/rules_cc+/cc/extensions.bzl:16:6: at /h
ERROR: Error loading '@@rules_python+//python/extensions:python.bzl' for module extensions, requested by /home/runner/work/carbon-lang/carbon-lang/MODULE.bazel:147:23: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/rules_python+/python/extensions/python.bzl:48:6: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/rules_python+/python/private/python.bzl:17:6: at /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_features+/features.bzl:3:6: Encountered error while reading extension file 'globals.bzl': no such package '@@bazel_features++version_extension+bazel_features_globals//': no such package '@@bazel_skylib+//lib': java.io.IOException: Error downloading [https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz] to /home/runner/.cache/bazel/_bazel_runner/8f839eaeb716f9d034eabdfa7ebecdb0/external/bazel_skylib+/temp6278486909382546624/bazel-skylib-1.7.1.tar.gz: GET re
INFO: Invocation ID: e2062912-715f-47ec-a0bc-9d1b4fee9e5d
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
 no actions running
<root> (carbon@_)
Retrying a failure because it may be transient...
INFO: Invocation ID: 7d4ee250-882f-480d-9c8d-2d92e67462fc
Loading: 0 packages loaded
367
```
Jon Ross-Perkins 1 year ago
parent
commit
bd99b74608
2 files changed with 31 additions and 5 deletions
  1. 19 0
      .github/actions/build-setup-common/action.yml
  2. 12 5
      scripts/run_bazel.py

+ 19 - 0
.github/actions/build-setup-common/action.yml

@@ -111,3 +111,22 @@ runs:
         test --test_output=errors
         EOF
         ./scripts/run_bazel.py info
+
+    - name: Run bazel to sync deps with retry
+      shell: bash
+      run: |
+        # GitHub sometimes has a high failure rate for Bazel's downloads (even
+        # from GitHub URLs). Bazel exits with `1` on HTTP errors, which is hard
+        # to distinguish from a normal, permanent error.
+        #
+        # This workaround runs fast commands that should always pass (although
+        # they may be broken by an invalid PR). All errors are retried. The hope
+        # is that this caches necessary downloads, allowing later commands to
+        # more reliably succeed without retrying "permanent" errors.
+        #
+        # Disable lockfile updates, because some actions want to see
+        # differences.
+        ./scripts/run_bazel.py --attempts=5 --retry-all-errors \
+          mod --lockfile_mode=off deps
+        ./scripts/run_bazel.py --attempts=5 --retry-all-errors \
+          cquery --lockfile_mode=off //... | wc -l

+ 12 - 5
scripts/run_bazel.py

@@ -36,6 +36,11 @@ def main() -> None:
         help="Sets the number of jobs in user.bazelrc on the last attempt. If "
         "there is only one attempt, this will be set immediately.",
     )
+    parser.add_argument(
+        "--retry-all-errors",
+        action="store_true",
+        help="Retries permanent errors in addition to transient.",
+    )
     script_args, bazel_args = parser.parse_known_args()
 
     bazel = scripts_utils.locate_bazel()
@@ -50,12 +55,11 @@ def main() -> None:
 
         p = subprocess.run([bazel] + bazel_args)
 
-        # If this was the last attempt, we're done.
-        if attempt == script_args.attempts:
+        # If this was the last attempt, or it succeeded, we're done.
+        if attempt == script_args.attempts or p.returncode == 0:
             exit(p.returncode)
 
         # Several error codes are reliably permanent, break immediately.
-        # `0`  -- Success.
         # `1`  -- The build failed.
         # `2`  -- Command line or environment problem.
         # `3`  -- Tests failed or timed out, we don't retry at this layer
@@ -66,10 +70,13 @@ def main() -> None:
         # Note that `36` is documented as "likely permanent", but we retry
         # it as most of our transient failures actually produce that error
         # code.
-        if p.returncode in (0, 1, 2, 3, 4, 8):
+        perm_error = (1, 2, 3, 4, 8)
+        if not script_args.retry_all_errors and p.returncode in perm_error:
             exit(p.returncode)
 
-        print("Retrying a failure because it may be transient...")
+        print(
+            f"Retrying exit code {p.returncode} because it may be transient..."
+        )
         # Also sleep a bit to try to skip over transient machine load.
         time.sleep(attempt)