Просмотр исходного кода

Fix a stack use after scope and a heap use after free found by fuzzing. (#3126)

There were two issues contributing to this crash:

- Primarily, the issue is that we queue up diagnostics and don't format
them into a string until we reach the end of compilation. In some code
paths in the driver, we destroyed the Semantics IR object before this
happened. But diagnostics can contain references to Semantics IR
objects, such as strings stored in the string table, which can lead to a
use after destruction bug.

This is fixed by ensuring the diagnotics consumer is flushed before
destroying any of the objects that it can refer to. The current approach
to this is not especially clean, unfortunately, but this requires
fighting C++ as this isn't the order in which it wants to destroy
things.

- This issue was obscured by the Semantics IR's string table holding a
reference to whatever underlying storage it was given rather than its
own string storage, so sometimes it would hold a reference to a string
from the source file, and sometimes a string from the tokenized buffer's
string table. The diagnostics were always flushed before the source file
was destroyed, but not before the tokenized buffer was destroyed. So to
see the issue, you'd need to have a string literal with certain contents
followed by an identifier with a name that matched those contents.

The crash is made more reliable by holding references to the Semantics
IR's string map in its string table, rather than references to someone
else's strings. This also fixes a latent bug where passing a string
temporary to SemanticsIR::AddString would store a dangling reference in
the string table. Incidentally, AddString is also changed to perform
only one hash table lookup rather than two for each added string.
Richard Smith 2 лет назад
Родитель
Сommit
e05523db21

+ 12 - 1
toolchain/driver/driver.cpp

@@ -400,7 +400,7 @@ auto Driver::Compile(const CompileOptions& options) -> bool {
   CARBON_VLOG() << "*** SourceBuffer::CreateFromFile done ***\n";
   // Require flushing the consumer before the source buffer is destroyed,
   // because diagnostics may reference the buffer.
-  auto flush = llvm::make_scope_exit([&]() { consumer->Flush(); });
+  auto flush_for_source = llvm::make_scope_exit([&]() { consumer->Flush(); });
   if (!source.ok()) {
     error_stream_ << "ERROR: Unable to open input source file: "
                   << source.error();
@@ -410,6 +410,8 @@ auto Driver::Compile(const CompileOptions& options) -> bool {
 
   CARBON_VLOG() << "*** TokenizedBuffer::Lex ***\n";
   auto tokenized_source = TokenizedBuffer::Lex(*source, *consumer);
+  // Diagnostics may reference the tokenized buffer.
+  auto flush_for_tokens = llvm::make_scope_exit([&]() { consumer->Flush(); });
   bool has_errors = tokenized_source.has_errors();
   CARBON_VLOG() << "*** TokenizedBuffer::Lex done ***\n";
   if (options.dump_tokens) {
@@ -424,6 +426,8 @@ auto Driver::Compile(const CompileOptions& options) -> bool {
 
   CARBON_VLOG() << "*** ParseTree::Parse ***\n";
   auto parse_tree = ParseTree::Parse(tokenized_source, *consumer, vlog_stream_);
+  // Diagnostics may reference the parse tree.
+  auto flush_for_parse = llvm::make_scope_exit([&]() { consumer->Flush(); });
   has_errors |= parse_tree.has_errors();
   CARBON_VLOG() << "*** ParseTree::Parse done ***\n";
   if (options.dump_parse_tree) {
@@ -439,6 +443,8 @@ auto Driver::Compile(const CompileOptions& options) -> bool {
   CARBON_VLOG() << "*** SemanticsIR::MakeFromParseTree ***\n";
   const SemanticsIR semantics_ir = SemanticsIR::MakeFromParseTree(
       builtin_ir, tokenized_source, parse_tree, *consumer, vlog_stream_);
+  // Diagnostics may reference the semantics IR.
+  auto flush_for_ir = llvm::make_scope_exit([&]() { consumer->Flush(); });
   has_errors |= semantics_ir.has_errors();
   CARBON_VLOG() << "*** SemanticsIR::MakeFromParseTree done ***\n";
   if (options.dump_raw_semantics_ir) {
@@ -463,6 +469,11 @@ auto Driver::Compile(const CompileOptions& options) -> bool {
     CARBON_VLOG() << "*** Stopping before lowering due to syntax errors ***";
     return false;
   }
+
+  // Emit diagnostics now, so that the developer sees them sooner and doesn't
+  // need to wait for code generation.
+  // TODO: If we allow lowering to produce warnings, should we interleave them
+  // with diagnostics produced by earlier steps?
   consumer->Flush();
 
   CARBON_VLOG() << "*** LowerToLLVM ***\n";

+ 32 - 0
toolchain/driver/testdata/fail_flush_errors.carbon

@@ -0,0 +1,32 @@
+// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+// Exceptions. See /LICENSE for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+// ARGS: compile %s
+//
+// AUTOUPDATE
+
+fn F() {
+  // Create diagnostics containing string references, and trigger reallocation
+  // of the string table.
+  // CHECK:STDERR: fail_flush_errors.carbon:[[@LINE+3]]:3: Name `undeclared1` not found.
+  // CHECK:STDERR:   undeclared1;
+  // CHECK:STDERR:   ^
+  undeclared1;
+
+  // Add the name into the string table from the tokenized buffer's string
+  // literal storage. Use a hex escape to ensure that the tokenized buffer
+  // allocates separate storage for the result.
+  "undec\x6Cared2";
+  // CHECK:STDERR: fail_flush_errors.carbon:[[@LINE+3]]:3: Name `undeclared2` not found.
+  // CHECK:STDERR:   undeclared2;
+  // CHECK:STDERR:   ^
+  undeclared2;
+
+  // Add the name into the string table via a declaration rather than an expression.
+  if (true) { var undeclared3: i32 = 0; }
+  // CHECK:STDERR: fail_flush_errors.carbon:[[@LINE+3]]:3: Name `undeclared3` not found.
+  // CHECK:STDERR:   undeclared3;
+  // CHECK:STDERR:   ^
+  undeclared3;
+}

+ 9 - 17
toolchain/semantics/semantics_ir.h

@@ -200,16 +200,17 @@ class SemanticsIR {
 
   // Adds an string, returning an ID to reference it.
   auto AddString(llvm::StringRef str) -> SemanticsStringId {
-    // If the string has already been stored, return the corresponding ID.
-    if (auto existing_id = GetStringID(str)) {
-      return *existing_id;
+    // Look up the string, or add it if it's new.
+    SemanticsStringId next_id(strings_.size());
+    auto [it, added] = string_to_id_.insert({str, next_id});
+
+    if (added) {
+      // Update the reverse mapping from IDs to strings.
+      CARBON_CHECK(it->second == next_id);
+      strings_.push_back(it->first());
     }
 
-    // Allocate the string and store it in the map.
-    SemanticsStringId id(strings_.size());
-    strings_.push_back(str);
-    CARBON_CHECK(string_to_id_.insert({str, id}).second);
-    return id;
+    return it->second;
   }
 
   // Returns the requested string.
@@ -217,15 +218,6 @@ class SemanticsIR {
     return strings_[string_id.index];
   }
 
-  // Returns an ID for the string if it's previously been stored.
-  auto GetStringID(llvm::StringRef str) -> std::optional<SemanticsStringId> {
-    auto str_find = string_to_id_.find(str);
-    if (str_find != string_to_id_.end()) {
-      return str_find->second;
-    }
-    return std::nullopt;
-  }
-
   // Adds a type, returning an ID to reference it.
   auto AddType(SemanticsNodeId node_id) -> SemanticsTypeId {
     SemanticsTypeId type_id(types_.size());