Skip to content

[z/OS] treat text files as text files so auto-conversion is done #90128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 1, 2024

Conversation

perry-ca
Copy link
Contributor

To support auto-conversion on z/OS text files need to be opened as text files. These changes will fix a number of LIT failures due to text files not being converted to the internal code page.

  • update a number of tools so they open the text files as text files
  • add support in the cat.py to open a text file as a text file (Windows will continue to treat all files as binary so new lines are handled correctly)
  • add env var definitions to enable auto-conversion in the lit config file.

@llvmbot
Copy link
Member

llvmbot commented Apr 25, 2024

@llvm/pr-subscribers-objectyaml

@llvm/pr-subscribers-clang-format

Author: Sean Perry (perry-ca)

Changes

To support auto-conversion on z/OS text files need to be opened as text files. These changes will fix a number of LIT failures due to text files not being converted to the internal code page.

  • update a number of tools so they open the text files as text files
  • add support in the cat.py to open a text file as a text file (Windows will continue to treat all files as binary so new lines are handled correctly)
  • add env var definitions to enable auto-conversion in the lit config file.

Full diff: https://github.com/llvm/llvm-project/pull/90128.diff

6 Files Affected:

  • (modified) clang/tools/clang-format/ClangFormat.cpp (+4-3)
  • (modified) llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp (+2-1)
  • (modified) llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp (+6-3)
  • (modified) llvm/tools/yaml2obj/yaml2obj.cpp (+1-1)
  • (modified) llvm/utils/lit/lit/builtin_commands/cat.py (+16-2)
  • (modified) llvm/utils/lit/lit/llvm/config.py (+7)
diff --git a/clang/tools/clang-format/ClangFormat.cpp b/clang/tools/clang-format/ClangFormat.cpp
index feb733fe3c9e0b..01f7c6047726e2 100644
--- a/clang/tools/clang-format/ClangFormat.cpp
+++ b/clang/tools/clang-format/ClangFormat.cpp
@@ -413,8 +413,9 @@ static bool format(StringRef FileName, bool ErrorOnIncompleteFormat = false) {
   // On Windows, overwriting a file with an open file mapping doesn't work,
   // so read the whole file into memory when formatting in-place.
   ErrorOr<std::unique_ptr<MemoryBuffer>> CodeOrErr =
-      !OutputXML && Inplace ? MemoryBuffer::getFileAsStream(FileName)
-                            : MemoryBuffer::getFileOrSTDIN(FileName);
+      !OutputXML && Inplace
+          ? MemoryBuffer::getFileAsStream(FileName)
+          : MemoryBuffer::getFileOrSTDIN(FileName, /*IsText=*/true);
   if (std::error_code EC = CodeOrErr.getError()) {
     errs() << EC.message() << "\n";
     return true;
@@ -558,7 +559,7 @@ static int dumpConfig() {
     // Read in the code in case the filename alone isn't enough to detect the
     // language.
     ErrorOr<std::unique_ptr<MemoryBuffer>> CodeOrErr =
-        MemoryBuffer::getFileOrSTDIN(FileNames[0]);
+        MemoryBuffer::getFileOrSTDIN(FileNames[0], /*IsText=*/true);
     if (std::error_code EC = CodeOrErr.getError()) {
       llvm::errs() << EC.message() << "\n";
       return 1;
diff --git a/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp b/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
index c3015d895230ea..40ee59c014b09f 100644
--- a/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
+++ b/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
@@ -95,7 +95,8 @@ static std::vector<StringRef> getSearchPaths(opt::InputArgList *Args,
 
 // Opens a file. Path has to be resolved already. (used for def file)
 std::unique_ptr<MemoryBuffer> openFile(const Twine &Path) {
-  ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MB = MemoryBuffer::getFile(Path);
+  ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MB =
+      MemoryBuffer::getFile(Path, /*IsText=*/true);
 
   if (std::error_code EC = MB.getError()) {
     llvm::errs() << "cannot open file " << Path << ": " << EC.message() << "\n";
diff --git a/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp b/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
index 6a5646965df2cf..c5ccd64f116539 100644
--- a/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
+++ b/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
@@ -144,15 +144,18 @@ int main(int argc, const char *argv[]) {
   cl::HideUnrelatedOptions({&CXXMapCategory, &getColorCategory()});
   cl::ParseCommandLineOptions(argc, argv, "LLVM C++ mangled name remapper\n");
 
-  auto OldSymbolBufOrError = MemoryBuffer::getFileOrSTDIN(OldSymbolFile);
+  auto OldSymbolBufOrError =
+      MemoryBuffer::getFileOrSTDIN(OldSymbolFile, /*IsText=*/true);
   if (!OldSymbolBufOrError)
     exitWithErrorCode(OldSymbolBufOrError.getError(), OldSymbolFile);
 
-  auto NewSymbolBufOrError = MemoryBuffer::getFileOrSTDIN(NewSymbolFile);
+  auto NewSymbolBufOrError =
+      MemoryBuffer::getFileOrSTDIN(NewSymbolFile, /*IsText=*/true);
   if (!NewSymbolBufOrError)
     exitWithErrorCode(NewSymbolBufOrError.getError(), NewSymbolFile);
 
-  auto RemappingBufOrError = MemoryBuffer::getFileOrSTDIN(RemappingFile);
+  auto RemappingBufOrError =
+      MemoryBuffer::getFileOrSTDIN(RemappingFile, /*IsText=*/true);
   if (!RemappingBufOrError)
     exitWithErrorCode(RemappingBufOrError.getError(), RemappingFile);
 
diff --git a/llvm/tools/yaml2obj/yaml2obj.cpp b/llvm/tools/yaml2obj/yaml2obj.cpp
index b7f5356e22a9e6..4a060e1aad427f 100644
--- a/llvm/tools/yaml2obj/yaml2obj.cpp
+++ b/llvm/tools/yaml2obj/yaml2obj.cpp
@@ -130,7 +130,7 @@ int main(int argc, char **argv) {
   }
 
   ErrorOr<std::unique_ptr<MemoryBuffer>> Buf =
-      MemoryBuffer::getFileOrSTDIN(Input);
+      MemoryBuffer::getFileOrSTDIN(Input, /*IsText=*/true);
   if (!Buf)
     return 1;
 
diff --git a/llvm/utils/lit/lit/builtin_commands/cat.py b/llvm/utils/lit/lit/builtin_commands/cat.py
index 37f55c0aef210b..6fb2152ef9332d 100644
--- a/llvm/utils/lit/lit/builtin_commands/cat.py
+++ b/llvm/utils/lit/lit/builtin_commands/cat.py
@@ -55,10 +55,24 @@ def main(argv):
             msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
     for filename in filenames:
         try:
-            fileToCat = open(filename, "rb")
-            contents = fileToCat.read()
+            contents = None
+            is_text = False
+            try:
+                if sys.platform != "win32":
+                    fileToCat = open(filename, "r")
+                    contents = fileToCat.read()
+                    is_text = True
+            except:
+                pass
+
+            if contents is None:
+                fileToCat = open(filename, "rb")
+                contents = fileToCat.read()
+
             if show_nonprinting:
                 contents = convertToCaretAndMNotation(contents)
+            elif is_text:
+                contents = contents.encode()
             writer.write(contents)
             sys.stdout.flush()
             fileToCat.close()
diff --git a/llvm/utils/lit/lit/llvm/config.py b/llvm/utils/lit/lit/llvm/config.py
index 96b4f7bc86772d..1d4babc99984bf 100644
--- a/llvm/utils/lit/lit/llvm/config.py
+++ b/llvm/utils/lit/lit/llvm/config.py
@@ -57,6 +57,13 @@ def __init__(self, lit_config, config):
                 self.lit_config.note("using lit tools: {}".format(path))
                 lit_path_displayed = True
 
+        if platform.system() == "OS/390":
+            self.with_environment("_BPXK_AUTOCVT", "ON")
+            self.with_environment("_TAG_REDIR_IN", "TXT")
+            self.with_environment("_TAG_REDIR_OUT", "TXT")
+            self.with_environment("_TAG_REDIR_ERR", "TXT")
+            self.with_environment("_CEE_RUNOPTS", "FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)")
+
         # Choose between lit's internal shell pipeline runner and a real shell.
         # If LIT_USE_INTERNAL_SHELL is in the environment, we use that as an
         # override.

@llvmbot
Copy link
Member

llvmbot commented Apr 25, 2024

@llvm/pr-subscribers-testing-tools

Author: Sean Perry (perry-ca)

Changes

To support auto-conversion on z/OS text files need to be opened as text files. These changes will fix a number of LIT failures due to text files not being converted to the internal code page.

  • update a number of tools so they open the text files as text files
  • add support in the cat.py to open a text file as a text file (Windows will continue to treat all files as binary so new lines are handled correctly)
  • add env var definitions to enable auto-conversion in the lit config file.

Full diff: https://github.com/llvm/llvm-project/pull/90128.diff

6 Files Affected:

  • (modified) clang/tools/clang-format/ClangFormat.cpp (+4-3)
  • (modified) llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp (+2-1)
  • (modified) llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp (+6-3)
  • (modified) llvm/tools/yaml2obj/yaml2obj.cpp (+1-1)
  • (modified) llvm/utils/lit/lit/builtin_commands/cat.py (+16-2)
  • (modified) llvm/utils/lit/lit/llvm/config.py (+7)
diff --git a/clang/tools/clang-format/ClangFormat.cpp b/clang/tools/clang-format/ClangFormat.cpp
index feb733fe3c9e0b..01f7c6047726e2 100644
--- a/clang/tools/clang-format/ClangFormat.cpp
+++ b/clang/tools/clang-format/ClangFormat.cpp
@@ -413,8 +413,9 @@ static bool format(StringRef FileName, bool ErrorOnIncompleteFormat = false) {
   // On Windows, overwriting a file with an open file mapping doesn't work,
   // so read the whole file into memory when formatting in-place.
   ErrorOr<std::unique_ptr<MemoryBuffer>> CodeOrErr =
-      !OutputXML && Inplace ? MemoryBuffer::getFileAsStream(FileName)
-                            : MemoryBuffer::getFileOrSTDIN(FileName);
+      !OutputXML && Inplace
+          ? MemoryBuffer::getFileAsStream(FileName)
+          : MemoryBuffer::getFileOrSTDIN(FileName, /*IsText=*/true);
   if (std::error_code EC = CodeOrErr.getError()) {
     errs() << EC.message() << "\n";
     return true;
@@ -558,7 +559,7 @@ static int dumpConfig() {
     // Read in the code in case the filename alone isn't enough to detect the
     // language.
     ErrorOr<std::unique_ptr<MemoryBuffer>> CodeOrErr =
-        MemoryBuffer::getFileOrSTDIN(FileNames[0]);
+        MemoryBuffer::getFileOrSTDIN(FileNames[0], /*IsText=*/true);
     if (std::error_code EC = CodeOrErr.getError()) {
       llvm::errs() << EC.message() << "\n";
       return 1;
diff --git a/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp b/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
index c3015d895230ea..40ee59c014b09f 100644
--- a/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
+++ b/llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
@@ -95,7 +95,8 @@ static std::vector<StringRef> getSearchPaths(opt::InputArgList *Args,
 
 // Opens a file. Path has to be resolved already. (used for def file)
 std::unique_ptr<MemoryBuffer> openFile(const Twine &Path) {
-  ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MB = MemoryBuffer::getFile(Path);
+  ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MB =
+      MemoryBuffer::getFile(Path, /*IsText=*/true);
 
   if (std::error_code EC = MB.getError()) {
     llvm::errs() << "cannot open file " << Path << ": " << EC.message() << "\n";
diff --git a/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp b/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
index 6a5646965df2cf..c5ccd64f116539 100644
--- a/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
+++ b/llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
@@ -144,15 +144,18 @@ int main(int argc, const char *argv[]) {
   cl::HideUnrelatedOptions({&CXXMapCategory, &getColorCategory()});
   cl::ParseCommandLineOptions(argc, argv, "LLVM C++ mangled name remapper\n");
 
-  auto OldSymbolBufOrError = MemoryBuffer::getFileOrSTDIN(OldSymbolFile);
+  auto OldSymbolBufOrError =
+      MemoryBuffer::getFileOrSTDIN(OldSymbolFile, /*IsText=*/true);
   if (!OldSymbolBufOrError)
     exitWithErrorCode(OldSymbolBufOrError.getError(), OldSymbolFile);
 
-  auto NewSymbolBufOrError = MemoryBuffer::getFileOrSTDIN(NewSymbolFile);
+  auto NewSymbolBufOrError =
+      MemoryBuffer::getFileOrSTDIN(NewSymbolFile, /*IsText=*/true);
   if (!NewSymbolBufOrError)
     exitWithErrorCode(NewSymbolBufOrError.getError(), NewSymbolFile);
 
-  auto RemappingBufOrError = MemoryBuffer::getFileOrSTDIN(RemappingFile);
+  auto RemappingBufOrError =
+      MemoryBuffer::getFileOrSTDIN(RemappingFile, /*IsText=*/true);
   if (!RemappingBufOrError)
     exitWithErrorCode(RemappingBufOrError.getError(), RemappingFile);
 
diff --git a/llvm/tools/yaml2obj/yaml2obj.cpp b/llvm/tools/yaml2obj/yaml2obj.cpp
index b7f5356e22a9e6..4a060e1aad427f 100644
--- a/llvm/tools/yaml2obj/yaml2obj.cpp
+++ b/llvm/tools/yaml2obj/yaml2obj.cpp
@@ -130,7 +130,7 @@ int main(int argc, char **argv) {
   }
 
   ErrorOr<std::unique_ptr<MemoryBuffer>> Buf =
-      MemoryBuffer::getFileOrSTDIN(Input);
+      MemoryBuffer::getFileOrSTDIN(Input, /*IsText=*/true);
   if (!Buf)
     return 1;
 
diff --git a/llvm/utils/lit/lit/builtin_commands/cat.py b/llvm/utils/lit/lit/builtin_commands/cat.py
index 37f55c0aef210b..6fb2152ef9332d 100644
--- a/llvm/utils/lit/lit/builtin_commands/cat.py
+++ b/llvm/utils/lit/lit/builtin_commands/cat.py
@@ -55,10 +55,24 @@ def main(argv):
             msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
     for filename in filenames:
         try:
-            fileToCat = open(filename, "rb")
-            contents = fileToCat.read()
+            contents = None
+            is_text = False
+            try:
+                if sys.platform != "win32":
+                    fileToCat = open(filename, "r")
+                    contents = fileToCat.read()
+                    is_text = True
+            except:
+                pass
+
+            if contents is None:
+                fileToCat = open(filename, "rb")
+                contents = fileToCat.read()
+
             if show_nonprinting:
                 contents = convertToCaretAndMNotation(contents)
+            elif is_text:
+                contents = contents.encode()
             writer.write(contents)
             sys.stdout.flush()
             fileToCat.close()
diff --git a/llvm/utils/lit/lit/llvm/config.py b/llvm/utils/lit/lit/llvm/config.py
index 96b4f7bc86772d..1d4babc99984bf 100644
--- a/llvm/utils/lit/lit/llvm/config.py
+++ b/llvm/utils/lit/lit/llvm/config.py
@@ -57,6 +57,13 @@ def __init__(self, lit_config, config):
                 self.lit_config.note("using lit tools: {}".format(path))
                 lit_path_displayed = True
 
+        if platform.system() == "OS/390":
+            self.with_environment("_BPXK_AUTOCVT", "ON")
+            self.with_environment("_TAG_REDIR_IN", "TXT")
+            self.with_environment("_TAG_REDIR_OUT", "TXT")
+            self.with_environment("_TAG_REDIR_ERR", "TXT")
+            self.with_environment("_CEE_RUNOPTS", "FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)")
+
         # Choose between lit's internal shell pipeline runner and a real shell.
         # If LIT_USE_INTERNAL_SHELL is in the environment, we use that as an
         # override.

@zibi2 zibi2 self-requested a review April 26, 2024 14:47
@zibi2
Copy link
Contributor

zibi2 commented Apr 26, 2024

LGTM, but check why the following test in checks failed:
FAIL: BOLT :: RISCV/unnamed-sym-no-entry.c

Copy link
Contributor

@zibi2 zibi2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with clean checks.

@perry-ca perry-ca force-pushed the perry/text-file-encoding branch from 3c43471 to 2d70343 Compare April 26, 2024 15:09
@perry-ca
Copy link
Contributor Author

LGTM, but check why the following test in checks failed: FAIL: BOLT :: RISCV/unnamed-sym-no-entry.c

Looks unrelated. I'm kicking off another build.

Copy link
Contributor

@abhina-sree abhina-sree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@perry-ca
Copy link
Contributor Author

Failed at FAIL: BOLT :: RISCV/fake-label-no-entry.c this time. I'll give it a day or so and try again.

@perry-ca perry-ca force-pushed the perry/text-file-encoding branch from 2d70343 to bf3ed68 Compare April 29, 2024 14:31
Copy link
Contributor

@mydeveloperday mydeveloperday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't normally commit in clang-format with a unit test

: MemoryBuffer::getFileOrSTDIN(FileName);
!OutputXML && Inplace
? MemoryBuffer::getFileAsStream(FileName)
: MemoryBuffer::getFileOrSTDIN(FileName, /*IsText=*/true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On windows won't this have an impact of potentially converting \n to \r\n? could a clang-format effectively become a unix2dos at the same time (for those using clang-format on cygwin is that what they want?) can we have a unit test to understand the impact?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be other impacts if the files that aren't utf-8 or contain internationalize characters in say comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this will be a problem for Windows. This is a common problem that we have fixed in a few places already. I don't have a Windows platform to confirm. Would you or someone else be able to verify?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it on Windows and didn't have a problem. I used LF only, CRLF only, and mixed line endings for the following:

int i;
int j;

Running clang-format through all values of the LineEnding option, with and without in-place, seemed ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I also was able to confirm there was no impact on windows i just wasn't sure so needed to prove it to myself, I'm ok if you want to commit this into clang-format, if you break stuff... we'll simply revert it ;-)

@mydeveloperday mydeveloperday requested a review from owenca April 30, 2024 16:11
@perry-ca
Copy link
Contributor Author

We don't normally commit in clang-format with a unit test

I assume you mean "without a unit test". In this case the unit test is the existing test cases. Some fail on z/OS because the files are not read in as text. It's not really possible to detect this issue on other platforms. I can't think of a new unit test that wouldn't duplicate what we already have.

@mydeveloperday
Copy link
Contributor

We don't normally commit in clang-format with a unit test

I assume you mean "without a unit test". In this case the unit test is the existing test cases. Some fail on z/OS because the files are not read in as text. It's not really possible to detect this issue on other platforms. I can't think of a new unit test that wouldn't duplicate what we already have.

No its ok.. we kind of have a rule witrh clang-format that all commits need some form of test, but again I can understand this is hard to test. Go ahead.

@perry-ca
Copy link
Contributor Author

perry-ca commented May 1, 2024

@owenca @mydeveloperday Thanks for testing on Windows and taking the time to review the PR.

@abhina-sree abhina-sree merged commit e22ce61 into llvm:main May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants