[WebAssembly] Allow AsmTypeCheck detect multiple errors in function #109705

aheejin · 2024-09-23T19:00:46Z

This allows multiple errors to be reported within a function, rather than returning on the first error and not looking at the rest of the function.

I think the rationale for the previous behavior was that upon encountering the first error, the value stack was not in the correct status anymore and the rest of the function checking was not very meaningful. But this patch makes the instruction push the correct result type upon its completion, so the while the possibility of previous error affecting later instructions is not zero, I think this can be more helpful to assembly hand-writers. This also allows us to write multiple error test cases without creating as many functions.

This is what Wabt and Binaryen wast checker/validator do as well.

Also this makes sure we return a value (true/false) within an if for each instruction, removing the need for the long if-else if-else if chain and making them all just ifs. I also added newlines between the ifs, which I feel is easier to read.

llvmbot · 2024-09-23T19:01:20Z

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-webassembly

Author: Heejin Ahn (aheejin)

Changes

This allows multiple errors to be reported within a function, rather than returning on the first error and not looking at the rest of the function.

I think the rationale for the previous behavior was that upon encountering the first error, the value stack was not in the correct status anymore and the rest of the function checking was not very meaningful. But this patch makes the instruction push the correct result type upon its completion, so the while the possibility of previous error affecting later instructions is not zero, I think this can be more helpful to assembly hand-writers. This also allows us to write multiple error test cases without creating as many functions.

This is what Wabt and Binaryen wast parsers do as well.

Full diff: https://github.com/llvm/llvm-project/pull/109705.diff

3 Files Affected:

(modified) llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp (+192-156)
(modified) llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.h (-2)
(modified) llvm/test/MC/WebAssembly/type-checker-errors.s (+22)

diff --git a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp
index 9f8f78a5b6adb7..59f3a9f81a3ffb 100644
--- a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp
+++ b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp
@@ -70,14 +70,9 @@ void WebAssemblyAsmTypeCheck::dumpTypeStack(Twine Msg) {
 }
 
 bool WebAssemblyAsmTypeCheck::typeError(SMLoc ErrorLoc, const Twine &Msg) {
-  // Once you get one type error in a function, it will likely trigger more
-  // which are mostly not helpful.
-  if (TypeErrorThisFunction)
-    return true;
   // If we're currently in unreachable code, we suppress errors completely.
   if (Unreachable)
     return false;
-  TypeErrorThisFunction = true;
   dumpTypeStack("current stack: ");
   return Parser.Error(ErrorLoc, Msg);
 }
@@ -171,11 +166,11 @@ bool WebAssemblyAsmTypeCheck::checkEnd(SMLoc ErrorLoc, bool PopVals) {
 
 bool WebAssemblyAsmTypeCheck::checkSig(SMLoc ErrorLoc,
                                        const wasm::WasmSignature &Sig) {
+  bool Error = false;
   for (auto VT : llvm::reverse(Sig.Params))
-    if (popType(ErrorLoc, VT))
-      return true;
+    Error |= popType(ErrorLoc, VT);
   Stack.insert(Stack.end(), Sig.Returns.begin(), Sig.Returns.end());
-  return false;
+  return Error;
 }
 
 bool WebAssemblyAsmTypeCheck::getSymRef(SMLoc ErrorLoc, const MCOperand &SymOp,
@@ -260,17 +255,16 @@ bool WebAssemblyAsmTypeCheck::getSignature(SMLoc ErrorLoc,
 }
 
 bool WebAssemblyAsmTypeCheck::endOfFunction(SMLoc ErrorLoc) {
+  bool Error = false;
   // Check the return types.
-  for (auto RVT : llvm::reverse(ReturnTypes)) {
-    if (popType(ErrorLoc, RVT))
-      return true;
-  }
+  for (auto RVT : llvm::reverse(ReturnTypes))
+    Error |= popType(ErrorLoc, RVT);
   if (!Stack.empty()) {
     return typeError(ErrorLoc, std::to_string(Stack.size()) +
                                    " superfluous return values");
   }
   Unreachable = true;
-  return false;
+  return Error;
 }
 
 bool WebAssemblyAsmTypeCheck::typeCheck(SMLoc ErrorLoc, const MCInst &Inst,
@@ -279,179 +273,221 @@ bool WebAssemblyAsmTypeCheck::typeCheck(SMLoc ErrorLoc, const MCInst &Inst,
   auto Name = GetMnemonic(Opc);
   dumpTypeStack("typechecking " + Name + ": ");
   wasm::ValType Type;
+
   if (Name == "local.get") {
-    if (getLocal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    Stack.push_back(Type);
-  } else if (Name == "local.set") {
-    if (getLocal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-  } else if (Name == "local.tee") {
-    if (getLocal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-    Stack.push_back(Type);
-  } else if (Name == "global.get") {
-    if (getGlobal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    Stack.push_back(Type);
-  } else if (Name == "global.set") {
-    if (getGlobal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-  } else if (Name == "table.get") {
-    if (getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    Stack.push_back(Type);
-  } else if (Name == "table.set") {
-    if (getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-  } else if (Name == "table.size") {
-    if (getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
+    if (!getLocal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type)) {
+      Stack.push_back(Type);
+      return false;
+    }
+    return true;
+  }
+
+  if (Name == "local.set") {
+    if (!getLocal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
+      return popType(ErrorLoc, Type);
+    return true;
+  }
+
+  if (Name == "local.tee") {
+    if (!getLocal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type)) {
+      bool Error = popType(ErrorLoc, Type);
+      Stack.push_back(Type);
+      return Error;
+    }
+    return true;
+  }
+
+  if (Name == "global.get") {
+    if (!getGlobal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type)) {
+      Stack.push_back(Type);
+      return false;
+    }
+    return true;
+  }
+
+  if (Name == "global.set") {
+    if (!getGlobal(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
+      return popType(ErrorLoc, Type);
+    return true;
+  }
+
+  if (Name == "table.get") {
+    bool Error = popType(ErrorLoc, wasm::ValType::I32);
+    if (!getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type)) {
+      Stack.push_back(Type);
+      return Error;
+    }
+    return true;
+  }
+
+  if (Name == "table.set") {
+    bool Error = false;
+    if (!getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
+      Error |= popType(ErrorLoc, Type);
+    else
+      Error = true;
+    Error |= popType(ErrorLoc, wasm::ValType::I32);
+    return Error;
+  }
+
+  if (Name == "table.size") {
+    bool Error = getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type);
     Stack.push_back(wasm::ValType::I32);
-  } else if (Name == "table.grow") {
-    if (getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
+    return Error;
+  }
+
+  if (Name == "table.grow") {
+    bool Error = popType(ErrorLoc, wasm::ValType::I32);
+    if (!getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
+      Error |= popType(ErrorLoc, Type);
+    else
+      Error = true;
     Stack.push_back(wasm::ValType::I32);
-  } else if (Name == "table.fill") {
-    if (getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-  } else if (Name == "memory.fill") {
+    return Error;
+  }
+
+  if (Name == "table.fill") {
+    bool Error = popType(ErrorLoc, wasm::ValType::I32);
+    if (!getTable(Operands[1]->getStartLoc(), Inst.getOperand(0), Type))
+      Error |= popType(ErrorLoc, Type);
+    else
+      Error = true;
+    Error |= popType(ErrorLoc, wasm::ValType::I32);
+    return Error;
+  }
+
+  if (Name == "memory.fill") {
     Type = is64 ? wasm::ValType::I64 : wasm::ValType::I32;
-    if (popType(ErrorLoc, Type))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-  } else if (Name == "memory.copy") {
+    bool Error = popType(ErrorLoc, Type);
+    Error |= popType(ErrorLoc, wasm::ValType::I32);
+    Error |= popType(ErrorLoc, Type);
+    return Error;
+  }
+
+  if (Name == "memory.copy") {
     Type = is64 ? wasm::ValType::I64 : wasm::ValType::I32;
-    if (popType(ErrorLoc, Type))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-  } else if (Name == "memory.init") {
+    bool Error = popType(ErrorLoc, Type);
+    Error |= popType(ErrorLoc, Type);
+    Error |= popType(ErrorLoc, Type);
+    return Error;
+  }
+
+  if (Name == "memory.init") {
     Type = is64 ? wasm::ValType::I64 : wasm::ValType::I32;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    if (popType(ErrorLoc, Type))
-      return true;
-  } else if (Name == "drop") {
-    if (popType(ErrorLoc, {}))
-      return true;
-  } else if (Name == "try" || Name == "block" || Name == "loop" ||
-             Name == "if") {
-    if (Name == "if" && popType(ErrorLoc, wasm::ValType::I32))
-      return true;
+    bool Error = popType(ErrorLoc, wasm::ValType::I32);
+    Error |= popType(ErrorLoc, wasm::ValType::I32);
+    Error |= popType(ErrorLoc, Type);
+    return Error;
+  }
+
+  if (Name == "drop") {
+    return popType(ErrorLoc, {});
+  }
+
+  if (Name == "try" || Name == "block" || Name == "loop" || Name == "if") {
     if (Name == "loop")
       BrStack.emplace_back(LastSig.Params.begin(), LastSig.Params.end());
     else
       BrStack.emplace_back(LastSig.Returns.begin(), LastSig.Returns.end());
-  } else if (Name == "end_block" || Name == "end_loop" || Name == "end_if" ||
-             Name == "else" || Name == "end_try" || Name == "catch" ||
-             Name == "catch_all" || Name == "delegate") {
-    if (checkEnd(ErrorLoc,
-                 Name == "else" || Name == "catch" || Name == "catch_all"))
+    if (Name == "if" && popType(ErrorLoc, wasm::ValType::I32))
       return true;
+    return false;
+  }
+
+  if (Name == "end_block" || Name == "end_loop" || Name == "end_if" ||
+      Name == "else" || Name == "end_try" || Name == "catch" ||
+      Name == "catch_all" || Name == "delegate") {
+    bool Error = checkEnd(ErrorLoc, Name == "else" || Name == "catch" ||
+                                        Name == "catch_all");
     Unreachable = false;
     if (Name == "catch") {
       const wasm::WasmSignature *Sig = nullptr;
-      if (getSignature(Operands[1]->getStartLoc(), Inst.getOperand(0),
-                       wasm::WASM_SYMBOL_TYPE_TAG, Sig))
-        return true;
-      // catch instruction pushes values whose types are specified in the tag's
-      // "params" part
-      Stack.insert(Stack.end(), Sig->Params.begin(), Sig->Params.end());
+      if (!getSignature(Operands[1]->getStartLoc(), Inst.getOperand(0),
+                        wasm::WASM_SYMBOL_TYPE_TAG, Sig))
+        // catch instruction pushes values whose types are specified in the
+        // tag's "params" part
+        Stack.insert(Stack.end(), Sig->Params.begin(), Sig->Params.end());
+      else
+        Error = true;
     }
-  } else if (Name == "br") {
+    return Error;
+  }
+
+  if (Name == "br") {
     const MCOperand &Operand = Inst.getOperand(0);
     if (!Operand.isImm())
-      return false;
-    if (checkBr(ErrorLoc, static_cast<size_t>(Operand.getImm())))
-      return true;
-  } else if (Name == "return") {
-    if (endOfFunction(ErrorLoc))
       return true;
-  } else if (Name == "call_indirect" || Name == "return_call_indirect") {
+    return checkBr(ErrorLoc, static_cast<size_t>(Operand.getImm()));
+  }
+
+  if (Name == "return") {
+    return endOfFunction(ErrorLoc);
+  }
+
+  if (Name == "call_indirect" || Name == "return_call_indirect") {
     // Function value.
-    if (popType(ErrorLoc, wasm::ValType::I32))
-      return true;
-    if (checkSig(ErrorLoc, LastSig))
-      return true;
+    bool Error = popType(ErrorLoc, wasm::ValType::I32);
+    Error |= checkSig(ErrorLoc, LastSig);
     if (Name == "return_call_indirect" && endOfFunction(ErrorLoc))
       return true;
-  } else if (Name == "call" || Name == "return_call") {
+    return Error;
+  }
+
+  if (Name == "call" || Name == "return_call") {
+    bool Error = false;
     const wasm::WasmSignature *Sig = nullptr;
-    if (getSignature(Operands[1]->getStartLoc(), Inst.getOperand(0),
-                     wasm::WASM_SYMBOL_TYPE_FUNCTION, Sig))
-      return true;
-    if (checkSig(ErrorLoc, *Sig))
-      return true;
+    if (!getSignature(Operands[1]->getStartLoc(), Inst.getOperand(0),
+                      wasm::WASM_SYMBOL_TYPE_FUNCTION, Sig))
+      Error |= checkSig(ErrorLoc, *Sig);
+    else
+      Error = true;
     if (Name == "return_call" && endOfFunction(ErrorLoc))
       return true;
-  } else if (Name == "unreachable") {
+    return Error;
+  }
+
+  if (Name == "unreachable") {
     Unreachable = true;
-  } else if (Name == "ref.is_null") {
-    if (popRefType(ErrorLoc))
-      return true;
+    return false;
+  }
+
+  if (Name == "ref.is_null") {
+    bool Error = popRefType(ErrorLoc);
     Stack.push_back(wasm::ValType::I32);
-  } else if (Name == "throw") {
+    return Error;
+  }
+
+  if (Name == "throw") {
     const wasm::WasmSignature *Sig = nullptr;
-    if (getSignature(Operands[1]->getStartLoc(), Inst.getOperand(0),
-                     wasm::WASM_SYMBOL_TYPE_TAG, Sig))
-      return true;
-    if (checkSig(ErrorLoc, *Sig))
-      return true;
-  } else {
-    // The current instruction is a stack instruction which doesn't have
-    // explicit operands that indicate push/pop types, so we get those from
-    // the register version of the same instruction.
-    auto RegOpc = WebAssembly::getRegisterOpcode(Opc);
-    assert(RegOpc != -1 && "Failed to get register version of MC instruction");
-    const auto &II = MII.get(RegOpc);
-    // First pop all the uses off the stack and check them.
-    for (unsigned I = II.getNumOperands(); I > II.getNumDefs(); I--) {
-      const auto &Op = II.operands()[I - 1];
-      if (Op.OperandType == MCOI::OPERAND_REGISTER) {
-        auto VT = WebAssembly::regClassToValType(Op.RegClass);
-        if (popType(ErrorLoc, VT))
-          return true;
-      }
-    }
-    // Now push all the defs onto the stack.
-    for (unsigned I = 0; I < II.getNumDefs(); I++) {
-      const auto &Op = II.operands()[I];
-      assert(Op.OperandType == MCOI::OPERAND_REGISTER && "Register expected");
+    if (!getSignature(Operands[1]->getStartLoc(), Inst.getOperand(0),
+                      wasm::WASM_SYMBOL_TYPE_TAG, Sig))
+      return checkSig(ErrorLoc, *Sig);
+    return true;
+  }
+
+  // The current instruction is a stack instruction which doesn't have
+  // explicit operands that indicate push/pop types, so we get those from
+  // the register version of the same instruction.
+  auto RegOpc = WebAssembly::getRegisterOpcode(Opc);
+  assert(RegOpc != -1 && "Failed to get register version of MC instruction");
+  const auto &II = MII.get(RegOpc);
+  bool Error = false;
+  // First pop all the uses off the stack and check them.
+  for (unsigned I = II.getNumOperands(); I > II.getNumDefs(); I--) {
+    const auto &Op = II.operands()[I - 1];
+    if (Op.OperandType == MCOI::OPERAND_REGISTER) {
       auto VT = WebAssembly::regClassToValType(Op.RegClass);
-      Stack.push_back(VT);
+      Error |= popType(ErrorLoc, VT);
     }
   }
-  return false;
+  // Now push all the defs onto the stack.
+  for (unsigned I = 0; I < II.getNumDefs(); I++) {
+    const auto &Op = II.operands()[I];
+    assert(Op.OperandType == MCOI::OPERAND_REGISTER && "Register expected");
+    auto VT = WebAssembly::regClassToValType(Op.RegClass);
+    Stack.push_back(VT);
+  }
+  return Error;
 }
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.h b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.h
index 9ba5693719e91a..4020234e2432ff 100644
--- a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.h
+++ b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.h
@@ -33,7 +33,6 @@ class WebAssemblyAsmTypeCheck final {
   SmallVector<wasm::ValType, 16> LocalTypes;
   SmallVector<wasm::ValType, 4> ReturnTypes;
   wasm::WasmSignature LastSig;
-  bool TypeErrorThisFunction = false;
   bool Unreachable = false;
   bool is64;
 
@@ -68,7 +67,6 @@ class WebAssemblyAsmTypeCheck final {
     BrStack.clear();
     LocalTypes.clear();
     ReturnTypes.clear();
-    TypeErrorThisFunction = false;
     Unreachable = false;
   }
 };
diff --git a/llvm/test/MC/WebAssembly/type-checker-errors.s b/llvm/test/MC/WebAssembly/type-checker-errors.s
index e8b8274036a832..3106fe76c8449f 100644
--- a/llvm/test/MC/WebAssembly/type-checker-errors.s
+++ b/llvm/test/MC/WebAssembly/type-checker-errors.s
@@ -93,12 +93,14 @@ global_set_type_mismatch:
 
 table_get_expected_expression_operand:
   .functype table_get_expected_expression_operand () -> ()
+  i32.const 0
 # CHECK: :[[@LINE+1]]:13: error: expected expression operand
   table.get 1
   end_function
 
 table_get_missing_tabletype:
   .functype table_get_missing_tabletype () -> ()
+  i32.const 0
 # CHECK: :[[@LINE+1]]:13: error: symbol foo: missing .tabletype
   table.get foo
   end_function
@@ -851,3 +853,23 @@ br_incorrect_func_signature:
   drop
   i32.const 1
   end_function
+
+multiple_errors_in_function:
+  .functype multiple_errors_in_function () -> ()
+# CHECK: :[[@LINE+2]]:3: error: empty stack while popping i32
+# CHECK: :[[@LINE+1]]:13: error: expected expression operand
+  table.get 1
+
+# CHECK: :[[@LINE+3]]:3: error: empty stack while popping i32
+# CHECK: :[[@LINE+2]]:3: error: empty stack while popping externref
+# CHECK: :[[@LINE+1]]:3: error: empty stack while popping i32
+  table.fill valid_table
+
+  f32.const 0.0
+  ref.null_extern
+# CHECK: :[[@LINE+2]]:3: error: popped externref, expected i32
+# CHECK: :[[@LINE+1]]:3: error: popped f32, expected i32
+  i32.add
+  drop
+
+  end_function

This allows multiple errors to be reported within a function, rather than returning on the first error and not looking at the rest of the function. I think the rationale for the previous behavior was that upon encountering the first error, the value stack was not in the correct status anymore and the rest of the function checking was not very meaningful. But this patch makes the instruction push the correct result type upon its completion, so the while the possibility of previous error affecting later instructions is not zero, I think this can be more helpful to assembly hand-writers. This also allows us to write multiple error test cases without creating as many functions. This is what Wabt and Binaryen wast parsers do as well. Also this makes sure we return a value (true/false) within an `if` for each instruction, removing the need for the long `if`-`else if`- `else if` chain and making them all just `if`s. I also added a newline between each `if`s, which I feel is easier to read.

sbc100 · 2024-09-23T19:26:57Z

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmTypeCheck.cpp

@@ -70,14 +70,9 @@ void WebAssemblyAsmTypeCheck::dumpTypeStack(Twine Msg) {
 }

 bool WebAssemblyAsmTypeCheck::typeError(SMLoc ErrorLoc, const Twine &Msg) {
-  // Once you get one type error in a function, it will likely trigger more
-  // which are mostly not helpful.
-  if (TypeErrorThisFunction)


This change looks reasonable to me, but I do wonder if we should maybe do as thomas suggested in the meeting and only report one error per block? So this would be TypeErrorThisBlock?

I don't feel too strongly though.

We can probably make it a follow-up or an improvement, but I feel that's more confusing, given that no other tools work that way. As I summarized in #109705 (comment),

Binaryen parser: Bail out at first error

Binaryen validator: Report all errors

Wabt parser: Report all errors

So no other tools do "report future errors but only one error per block" thing.

I guess Binaryen's validator can produce the same kind of false positive error messages if the parser hooks up the parents and children incorrectly due to the buggy input.

Reporting all the errors seems fine, but reporting just one error per block might be a nice way to generate fewer false positives. The other tools could benefit from that, too, so this idea isn't specific to LLVM in any way. OTOH, the number of users of this functionality is tiny, so it's probably not worth investing much in the quality of the error messages.

Yeah I agree that can be useful, but I also agree that it's not clear how much effort is worth spending on this. Will follow up if that functionality seems necessary later.

tlively · 2024-09-23T19:28:26Z

This is what Wabt and Binaryen wast parsers do as well.

Binaryen's text parser bails out at the first error, although it doesn't do any type checking.

aheejin · 2024-09-23T19:55:22Z

This is what Wabt and Binaryen wast parsers do as well.

Binaryen's text parser bails out at the first error, although it doesn't do any type checking.

Yeah sorry I should've been clearer.

Test parsing bails out at the first error. For example

  (func $test
    (drop
      (i32.add
        (i32.const 0)
      )
    )
    (drop
      (i32.sub
        (i32.const 0)
        (f32.const 0.0)
      )
    )
  )

This will error out at i32.add saying

Fatal: 4:7: error: popping from empty stack

It's mostly that Binaryen cannot continue in this case, not that it wouldn't by choice, because Binaryen IR is an AST and it cannot construct an AST from this.

But as long as the number of stack values are correct, parsing is done and the validator reports multiple errors. For example the program is

  (func $test
    (drop
      (i32.add
        (i32.const 0)
        (f32.const 0.0)
      )
    )
    (drop
      (i32.sub
        (i32.const 0)
        (f32.const 0.0)
      )
    )
  )

[wasm-validator error in function test] i32 != f32: binary child types must be equal, on 
(i32.add
 (i32.const 0)
 (f32.const 0)
)
[wasm-validator error in function test] i32 != f32: binary child types must be equal, on 
(i32.sub
 (i32.const 0)
 (f32.const 0)
)
Fatal: error validating input

In case of Wabt it continues in both cases.
Wat file:

(module
  (func
    i32.const 0
    i32.add
    drop
    i32.const 0
    f32.const 0.0
    i32.sub
    drop
   ))

Error:

test.txt:7:5: error: type mismatch in i32.add, expected [i32, i32] but got [i32]
    i32.add
    ^^^^^^^
test.txt:11:5: error: type mismatch in i32.sub, expected [i32, i32] but got [i32, f32]
    i32.sub
    ^^^^^^^

Wat file:

(module
  (func
    i32.const 0
    f32.const 0.0
    i32.add
    drop
    i32.const 0
    f32.const 0.0
    i32.sub
    drop
   ))

Error:

test.txt:7:5: error: type mismatch in i32.add, expected [i32, i32] but got [i32, f32]
    i32.add
    ^^^^^^^
test.txt:11:5: error: type mismatch in i32.sub, expected [i32, i32] but got [i32, f32]
    i32.sub
    ^^^^^^^

dschuff

LGTM; maybe also clarify the commit message to say this is the behavior of the validators (rather than the parser, as mentioned upthread)

aheejin · 2024-09-24T18:17:32Z

@sbc100 @tlively Other than the one-per-block error option for the potential follow-up, do you have any other comments?

tlively

No other comments 👍

This unifies the way we check types in various places in AsmTypeCheck. The objectives of this PR are: - We now use `checkTypes` for all type checking and `checkAndPopTypes` for type checking + popping. All other functions are helper functions to call these two functions. - We now support comparisons of types between vectors. This lets us printing error messages in more readable way. When an instruction takes [i32, i64] but the stack top is [f32, f64], now instead of ```console error: type mismatch, expected i64 but got f64 error: type mismatch, expected i32 but got f32 ``` we can print this ```console error: type mismatch, expected [i32, i64] but got [f32, f64] ``` which is also the format Wabt checker prints. This also helps printing more meaningful messages when there are superfluous values on the stack at the end of the function, such as: ```console error: type mismatch, expected [] but got [i32, exnref] ``` Actually, many instructions are not utilizing this batch printing now, which still causes multiple error messages to be printed for a single instruction. This will be improved in a follow-up. - The value stack now supports `Any` and `Ref`. There are instructions that requires the type to be anything. Also instructions like `ref.is_null` requires the type to be any reference types. Type comparison function will handle this types accordingly, meaning `match(I32, Any)` or `match(externref, Ref)` will succeed. The changes in `type-checker-errors.s` are mostly the message format changes. One downside of the new message format is that it doesn't have instruction names in it. I plan to improve that in a potential follow-up. This also made some modifications in the instructions in `type-checker-errors.s`. Currently, except for a few functions I've recently added at the end, each function tests for a single error, because the type checker used to bail out after the first error until llvm#109705. But many functions included multiple errors anyway, which I don't think was the intention of the original writer. So I added some instructions to remove the other errors which are not being tested. (In some cases I added more error checking lines instead, when I felt that could be relevant.) Thanks to the new `ExactMatch` option in `checkTypes` function family, we now can distinguish the cases when to check against only the top of the value stack and when to check against the whole stack (e.g. to check whether we have any superfluous values remaining at the end of the function). `return` or `return_call(_indirect)` can set `ExactMatch` to `false` because they don't care about the superfluous values. This makes `type-checker-return.s` succeed and I was able to remove the `FIXME`. This is the basis of the PR that fixes block parameter/return type handling in the checker, but does not yet include the actual block-related functionality, which will be submitted separately after this PR.

This unifies the way we check types in various places in AsmTypeCheck. The objectives of this PR are: - We now use `checkTypes` for all type checking and `checkAndPopTypes` for type checking + popping. All other functions are helper functions to call these two functions. - We now support comparisons of types between vectors. This lets us printing error messages in more readable way. When an instruction takes [i32, i64] but the stack top is [f32, f64], now instead of ```console error: type mismatch, expected i64 but got f64 error: type mismatch, expected i32 but got f32 ``` we can print this ```console error: type mismatch, expected [i32, i64] but got [f32, f64] ``` which is also the format Wabt checker prints. This also helps printing more meaningful messages when there are superfluous values on the stack at the end of the function, such as: ```console error: type mismatch, expected [] but got [i32, exnref] ``` Actually, many instructions are not utilizing this batch printing now, which still causes multiple error messages to be printed for a single instruction. This will be improved in a follow-up. - The value stack now supports `Any` and `Ref`. There are instructions that requires the type to be anything. Also instructions like `ref.is_null` requires the type to be any reference types. Type comparison function will handle this types accordingly, meaning `match(I32, Any)` or `match(externref, Ref)` will succeed. The changes in `type-checker-errors.s` are mostly the message format changes. One downside of the new message format is that it doesn't have instruction names in it. I plan to improve that in a potential follow-up. This also made some modifications in the instructions in `type-checker-errors.s`. Currently, except for a few functions I've recently added at the end, each function tests for a single error, because the type checker used to bail out after the first error until #109705. But many functions included multiple errors anyway, which I don't think was the intention of the original writer. So I added some instructions to remove the other errors which are not being tested. (In some cases I added more error checking lines instead, when I felt that could be relevant.) Thanks to the new `ExactMatch` option in `checkTypes` function family, we now can distinguish the cases when to check against only the top of the value stack and when to check against the whole stack (e.g. to check whether we have any superfluous values remaining at the end of the function). `return` or `return_call(_indirect)` can set `ExactMatch` to `false` because they don't care about the superfluous values. This makes `type-checker-return.s` succeed and I was able to remove the `FIXME`. This is the basis of the PR that fixes block parameter/return type handling in the checker, but does not yet include the actual block-related functionality, which will be submitted separately after this PR.

This unifies the way we check types in various places in AsmTypeCheck. The objectives of this PR are: - We now use `checkTypes` for all type checking and `checkAndPopTypes` for type checking + popping. All other functions are helper functions to call these two functions. - We now support comparisons of types between vectors. This lets us printing error messages in more readable way. When an instruction takes [i32, i64] but the stack top is [f32, f64], now instead of ```console error: type mismatch, expected i64 but got f64 error: type mismatch, expected i32 but got f32 ``` we can print this ```console error: type mismatch, expected [i32, i64] but got [f32, f64] ``` which is also the format Wabt checker prints. This also helps printing more meaningful messages when there are superfluous values on the stack at the end of the function, such as: ```console error: type mismatch, expected [] but got [i32, exnref] ``` Actually, many instructions are not utilizing this batch printing now, which still causes multiple error messages to be printed for a single instruction. This will be improved in a follow-up. - The value stack now supports `Any` and `Ref`. There are instructions that requires the type to be anything. Also instructions like `ref.is_null` requires the type to be any reference types. Type comparison function will handle this types accordingly, meaning `match(I32, Any)` or `match(externref, Ref)` will succeed. The changes in `type-checker-errors.s` are mostly the message format changes. One downside of the new message format is that it doesn't have instruction names in it. I plan to improve that in a potential follow-up. This also made some modifications in the instructions in `type-checker-errors.s`. Currently, except for a few functions I've recently added at the end, each function tests for a single error, because the type checker used to bail out after the first error until llvm#109705. But many functions included multiple errors anyway, which I don't think was the intention of the original writer. So I added some instructions to remove the other errors which are not being tested. (In some cases I added more error checking lines instead, when I felt that could be relevant.) Thanks to the new `ExactMatch` option in `checkTypes` function family, we now can distinguish the cases when to check against only the top of the value stack and when to check against the whole stack (e.g. to check whether we have any superfluous values remaining at the end of the function). `return` or `return_call(_indirect)` can set `ExactMatch` to `false` because they don't care about the superfluous values. This makes `type-checker-return.s` succeed and I was able to remove the `FIXME`. This is the basis of the PR that fixes block parameter/return type handling in the checker, but does not yet include the actual block-related functionality, which will be submitted separately after this PR.

aheejin requested review from sbc100, dschuff and tlively September 23, 2024 19:00

llvmbot added backend:WebAssembly mc Machine (object) code labels Sep 23, 2024

aheejin force-pushed the typecheck_error branch from 7ad6821 to 11d3c76 Compare September 23, 2024 19:03

sbc100 reviewed Sep 23, 2024

View reviewed changes

Merge branch 'main' into typecheck_error

77dabcf

dschuff approved these changes Sep 24, 2024

View reviewed changes

tlively approved these changes Sep 24, 2024

View reviewed changes

aheejin merged commit b62075e into llvm:main Sep 24, 2024
8 checks passed

aheejin deleted the typecheck_error branch September 24, 2024 23:56

aheejin mentioned this pull request Sep 26, 2024

[WebAssembly] Unify type checking in AsmTypeCheck #110094

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WebAssembly] Allow AsmTypeCheck detect multiple errors in function #109705

[WebAssembly] Allow AsmTypeCheck detect multiple errors in function #109705

Uh oh!

aheejin commented Sep 23, 2024 •

edited

Loading

Uh oh!

llvmbot commented Sep 23, 2024 •

edited

Loading

Uh oh!

sbc100 Sep 23, 2024 •

edited

Loading

Uh oh!

aheejin Sep 23, 2024 •

edited

Loading

Uh oh!

tlively Sep 23, 2024

Uh oh!

aheejin Sep 24, 2024

Uh oh!

tlively commented Sep 23, 2024

Uh oh!

aheejin commented Sep 23, 2024 •

edited

Loading

Uh oh!

dschuff left a comment

Uh oh!

aheejin commented Sep 24, 2024

Uh oh!

tlively left a comment

Uh oh!

Uh oh!

Uh oh!

[WebAssembly] Allow AsmTypeCheck detect multiple errors in function #109705

[WebAssembly] Allow AsmTypeCheck detect multiple errors in function #109705

Uh oh!

Conversation

aheejin commented Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sbc100 Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aheejin Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively Sep 23, 2024

Choose a reason for hiding this comment

Uh oh!

aheejin Sep 24, 2024

Choose a reason for hiding this comment

Uh oh!

tlively commented Sep 23, 2024

Uh oh!

aheejin commented Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dschuff left a comment

Choose a reason for hiding this comment

Uh oh!

aheejin commented Sep 24, 2024

Uh oh!

tlively left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aheejin commented Sep 23, 2024 •

edited

Loading

llvmbot commented Sep 23, 2024 •

edited

Loading

sbc100 Sep 23, 2024 •

edited

Loading

aheejin Sep 23, 2024 •

edited

Loading

aheejin commented Sep 23, 2024 •

edited

Loading