[mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations #105529

jorickert · 2024-08-21T13:56:19Z

This changes the function resetToken to not update lastToken.

The member lastToken is the last token that was consumed by the parser.
Resetting the lexer position to a different position does not cause any token to be consumed, so lastToken should not be updated.
Setting it to curToken can cause the scopeLoc.end location of OperationDefinition to be off-by-one, pointing to the
first token after the operation.

An example for an operation for which the scopeLoc.end location was wrong before is:

%0 = torch.vtensor.literal(dense_resource<__elided__> : tensor<768xbf16>) : !torch.vtensor<[768],bf16>

Here the scope end loc always pointed to the next token

This also adds a test for the Locations of OperationDefinitions. Without the change to resetToken the test failes, with the scope end location for llvm.mlir.undef pointing to the func.return in the next line

llvmbot · 2024-08-21T13:57:09Z

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Jonas Rickert (jorickert)

Changes

This changes the function resetToken to not update lastToken.

The member lastToken is the last token that was consumed by the parser.
Resetting the lexer position to a different position does not cause any token to be consumed, so lastToken should not be updated.
Setting it to curToken can cause the scopeLoc.end location of OperationDefinition to be off-by-one, pointing to the
first token after the operation.

An example for an operation for which the scopeLoc.end location was wrong before is:

%0 = torch.vtensor.literal(dense_resource&lt;__elided__&gt; : tensor&lt;768xbf16&gt;) : !torch.vtensor&lt;[768],bf16&gt; .

Here the scope end loc always pointed to the next token

This also adds a test for the Locations of OperationDefinitions. Without the change to resetToken the test failes, with the scope end location for llvm.mlir.undef pointing to the func.return in the next line

Full diff: https://github.com/llvm/llvm-project/pull/105529.diff

3 Files Affected:

(modified) mlir/lib/AsmParser/Parser.h (-1)
(modified) mlir/unittests/Parser/CMakeLists.txt (+2)
(modified) mlir/unittests/Parser/ParserTest.cpp (+81)

diff --git a/mlir/lib/AsmParser/Parser.h b/mlir/lib/AsmParser/Parser.h
index 4caab499e1a0e4..3c556b19870770 100644
--- a/mlir/lib/AsmParser/Parser.h
+++ b/mlir/lib/AsmParser/Parser.h
@@ -133,7 +133,6 @@ class Parser {
   /// Reset the parser to the given lexer position.
   void resetToken(const char *tokPos) {
     state.lex.resetPointer(tokPos);
-    state.lastToken = state.curToken;
     state.curToken = state.lex.lexToken();
   }
 
diff --git a/mlir/unittests/Parser/CMakeLists.txt b/mlir/unittests/Parser/CMakeLists.txt
index f5646ff3f2c345..a5e2da56ffb57e 100644
--- a/mlir/unittests/Parser/CMakeLists.txt
+++ b/mlir/unittests/Parser/CMakeLists.txt
@@ -8,6 +8,8 @@ add_mlir_unittest(MLIRParserTests
 target_include_directories(MLIRParserTests PRIVATE "${MLIR_BINARY_DIR}/test/lib/Dialect/Test")
 
 target_link_libraries(MLIRParserTests PRIVATE
+  MLIRFuncDialect
+  MLIRLLVMDialect
   MLIRIR
   MLIRParser
   MLIRTestDialect
diff --git a/mlir/unittests/Parser/ParserTest.cpp b/mlir/unittests/Parser/ParserTest.cpp
index 62f609ecf80492..52b965bcc1326b 100644
--- a/mlir/unittests/Parser/ParserTest.cpp
+++ b/mlir/unittests/Parser/ParserTest.cpp
@@ -8,8 +8,12 @@
 
 #include "mlir/Parser/Parser.h"
 #include "mlir/AsmParser/AsmParser.h"
+#include "mlir/AsmParser/AsmParserState.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
 #include "mlir/IR/BuiltinOps.h"
 #include "mlir/IR/Verifier.h"
+#include "llvm/Support/SourceMgr.h"
 
 #include "gmock/gmock.h"
 
@@ -101,4 +105,81 @@ TEST(MLIRParser, ParseAttr) {
     EXPECT_EQ(attr, b.getI64IntegerAttr(9));
   }
 }
+
+TEST(MLIRParser, AsmParserLocations) {
+  std::string moduleStr = R"mlir(
+func.func @foo() -> !llvm.array<2 x f32> {
+  %0 = llvm.mlir.undef : !llvm.array<2 x f32>
+  func.return %0 : !llvm.array<2 x f32>
+}
+  )mlir";
+
+  DialectRegistry registry;
+  registry.insert<func::FuncDialect, LLVM::LLVMDialect>();
+  MLIRContext context(registry);
+
+  auto memBuffer =
+      llvm::MemoryBuffer::getMemBuffer(moduleStr, "AsmParserTest.mlir",
+                                       /*RequiresNullTerminator=*/false);
+  ASSERT_TRUE(memBuffer);
+
+  llvm::SourceMgr sourceMgr;
+  sourceMgr.AddNewSourceBuffer(std::move(memBuffer), llvm::SMLoc());
+
+  Block block;
+  AsmParserState parseState;
+  const LogicalResult parseResult =
+      parseAsmSourceFile(sourceMgr, &block, &context, &parseState);
+  ASSERT_TRUE(parseResult.succeeded());
+
+  auto funcOp = *block.getOps<func::FuncOp>().begin();
+  const AsmParserState::OperationDefinition *funcOpDefinition =
+      parseState.getOpDef(funcOp);
+  ASSERT_TRUE(funcOpDefinition);
+
+  const std::pair expectedStartFunc{2u, 1u};
+  const std::pair expectedEndFunc{2u, 10u};
+  const std::pair expectedScopeEndFunc{5u, 2u};
+  ASSERT_EQ(sourceMgr.getLineAndColumn(funcOpDefinition->loc.Start),
+            expectedStartFunc);
+  ASSERT_EQ(sourceMgr.getLineAndColumn(funcOpDefinition->loc.End),
+            expectedEndFunc);
+  ASSERT_EQ(funcOpDefinition->loc.Start, funcOpDefinition->scopeLoc.Start);
+  ASSERT_EQ(sourceMgr.getLineAndColumn(funcOpDefinition->scopeLoc.End),
+            expectedScopeEndFunc);
+
+  auto llvmUndef = *funcOp.getOps<LLVM::UndefOp>().begin();
+  const AsmParserState::OperationDefinition *llvmUndefDefinition =
+      parseState.getOpDef(llvmUndef);
+  ASSERT_TRUE(llvmUndefDefinition);
+
+  const std::pair expectedStartUndef{3u, 8u};
+  const std::pair expectedEndUndef{3u, 23u};
+  const std::pair expectedScopeEndUndef{3u, 46u};
+  ASSERT_EQ(sourceMgr.getLineAndColumn(llvmUndefDefinition->loc.Start),
+            expectedStartUndef);
+  ASSERT_EQ(sourceMgr.getLineAndColumn(llvmUndefDefinition->loc.End),
+            expectedEndUndef);
+  ASSERT_EQ(llvmUndefDefinition->loc.Start,
+            llvmUndefDefinition->scopeLoc.Start);
+  ASSERT_EQ(sourceMgr.getLineAndColumn(llvmUndefDefinition->scopeLoc.End),
+            expectedScopeEndUndef);
+
+  auto funcReturn = *funcOp.getOps<func::ReturnOp>().begin();
+  const AsmParserState::OperationDefinition *funcReturnDefinition =
+      parseState.getOpDef(funcReturn);
+  ASSERT_TRUE(funcReturnDefinition);
+
+  const std::pair expectedStartReturn{4u, 3u};
+  const std::pair expectedEndReturn{4u, 14u};
+  const std::pair expectedScopeEndReturn{4u, 40u};
+  ASSERT_EQ(sourceMgr.getLineAndColumn(funcReturnDefinition->loc.Start),
+            expectedStartReturn);
+  ASSERT_EQ(sourceMgr.getLineAndColumn(funcReturnDefinition->loc.End),
+            expectedEndReturn);
+  ASSERT_EQ(funcReturnDefinition->loc.Start,
+            funcReturnDefinition->scopeLoc.Start);
+  ASSERT_EQ(sourceMgr.getLineAndColumn(funcReturnDefinition->scopeLoc.End),
+            expectedScopeEndReturn);
+}
 } // namespace

jorickert · 2024-08-21T14:19:18Z

@jpienaar

jorickert · 2024-08-26T07:34:06Z

@joker-eph @Mogball Could you please review this or assign a reviewer? I do not have write access, so can not do it myself. Thank you

joker-eph · 2024-09-02T14:54:56Z

LGTM, but since the invariants of the lexer/parser are tricky, this deserves more eyes on it! So let's wait for some of the other reviewers to check this as well.

River707 · 2024-09-03T23:21:04Z

With the current usage of resetToken, this change is fine, but it's a little spooky that there is an implicit contract between the last token and resetToken, that could be easily broken if you aren't careful (not that the current behavior is really correct either). This is relying on the various of resetToken to cancel out, i.e. the resetTokens used for parsing attributes/types to technically be correct (because it resets first to the start of the attribute/type, and then to the end). This behavior can be fine, but we should ensure that the connection to lastToken is documented in the doc for resetToken.

jorickert · 2024-09-04T08:42:32Z

I updated the comment to make the relationship between resetToken and state.lastToken clearer

github-actions · 2024-09-04T08:42:36Z

✅ With the latest revision this PR passed the C/C++ code formatter.

The member lastToken is the last token that was consumed by the parser. Resetting the lexer position to a different position does not cause any token to be consumed, so lastToken should not be updated. Setting it to curToken can cause the scopeLoc.end location of OperationDefinition to be off-by-one, pointing to the first token after the operation. An example for an operation for which the scopeLoc.end location was wrong before is: %0 = torch.vtensor.literal(dense_resource<__elided__> : tensor<768xbf16>) : !torch.vtensor<[768],bf16> . Here the scope end loc always pointed to the next token

This test checks the parsed ops' locations and scope locations. It uses the !llvm.array type in the test to check that locations are correct for ops that contain an extended-type. The parser for extended-types calls resetToken, which caused wrong locations in the past.

jorickert · 2024-09-09T08:43:23Z

It would be nice if someone could merge this, I do not have write access/merge rights. Thank you

llvmbot added mlir:core MLIR Core Infrastructure mlir labels Aug 21, 2024

mgehre-amd requested review from jpienaar, joker-eph and Mogball September 2, 2024 09:18

joker-eph requested a review from River707 September 2, 2024 14:54

jorickert force-pushed the jrickert.upstream_parse_end_loc branch from 504d646 to 15df89e Compare September 4, 2024 08:38

jorickert added 2 commits September 4, 2024 10:31

jorickert force-pushed the jrickert.upstream_parse_end_loc branch from 15df89e to 8e2f39e Compare September 4, 2024 09:31

River707 approved these changes Sep 4, 2024

View reviewed changes

joker-eph merged commit aa21ce4 into llvm:main Sep 9, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations #105529

[mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations #105529

Uh oh!

jorickert commented Aug 21, 2024 •

edited

Loading

Uh oh!

llvmbot commented Aug 21, 2024 •

edited

Loading

Uh oh!

jorickert commented Aug 21, 2024

Uh oh!

jorickert commented Aug 26, 2024

Uh oh!

joker-eph commented Sep 2, 2024

Uh oh!

River707 commented Sep 3, 2024

Uh oh!

jorickert commented Sep 4, 2024

Uh oh!

github-actions bot commented Sep 4, 2024 •

edited

Loading

Uh oh!

jorickert commented Sep 9, 2024

Uh oh!

Uh oh!

Uh oh!

[mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations #105529

[mlir] Do not set lastToken in AsmParser's resetToken function and add a unit test for AsmParsers's locations #105529

Uh oh!

Conversation

jorickert commented Aug 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorickert commented Aug 21, 2024

Uh oh!

jorickert commented Aug 26, 2024

Uh oh!

joker-eph commented Sep 2, 2024

Uh oh!

River707 commented Sep 3, 2024

Uh oh!

jorickert commented Sep 4, 2024

Uh oh!

github-actions bot commented Sep 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorickert commented Sep 9, 2024

Uh oh!

Uh oh!

Uh oh!

jorickert commented Aug 21, 2024 •

edited

Loading

llvmbot commented Aug 21, 2024 •

edited

Loading

github-actions bot commented Sep 4, 2024 •

edited

Loading