Skip to content

[SystemZ][z/OS] Propagate IsText parameter to open text files as text #107906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 19, 2024

Conversation

abhina-sree
Copy link
Contributor

This patch adds an IsText parameter to the following functions openFileForRead, getBufferForFile, getBufferForFileImpl and determines whether a file is text by querying the file tag on z/OS. The default is set to OF_Text instead of OF_None, this change in value does not affect any other platforms other than z/OS.

@abhina-sree abhina-sree self-assigned this Sep 9, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang-tools-extra clangd clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules llvm:support labels Sep 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 9, 2024

@llvm/pr-subscribers-lldb
@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clangd
@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Abhina Sree (abhina-sree)

Changes

This patch adds an IsText parameter to the following functions openFileForRead, getBufferForFile, getBufferForFileImpl and determines whether a file is text by querying the file tag on z/OS. The default is set to OF_Text instead of OF_None, this change in value does not affect any other platforms other than z/OS.


Patch is 22.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107906.diff

20 Files Affected:

  • (modified) clang-tools-extra/clangd/FS.cpp (+2-2)
  • (modified) clang-tools-extra/clangd/Preamble.cpp (+2-2)
  • (modified) clang-tools-extra/clangd/support/ThreadsafeFS.cpp (+1-1)
  • (modified) clang-tools-extra/clangd/unittests/ClangdTests.cpp (+1-1)
  • (modified) clang/include/clang/Basic/FileManager.h (+4-4)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+1-1)
  • (modified) clang/lib/Basic/FileManager.cpp (+6-6)
  • (modified) clang/lib/Basic/SourceManager.cpp (+12-2)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+2-1)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+3-2)
  • (modified) clang/unittests/Driver/DistroTest.cpp (+2-2)
  • (modified) clang/unittests/Driver/ToolChainTest.cpp (+1-1)
  • (modified) clang/unittests/Frontend/PCHPreambleTest.cpp (+3-3)
  • (modified) clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp (+2-2)
  • (modified) llvm/include/llvm/Support/AutoConvert.h (+6)
  • (modified) llvm/include/llvm/Support/VirtualFileSystem.h (+9-7)
  • (modified) llvm/lib/Support/AutoConvert.cpp (+26)
  • (modified) llvm/lib/Support/FileCollector.cpp (+2-2)
  • (modified) llvm/lib/Support/VirtualFileSystem.cpp (+12-9)
  • (modified) llvm/unittests/Support/VirtualFileSystemTest.cpp (+3-3)
diff --git a/clang-tools-extra/clangd/FS.cpp b/clang-tools-extra/clangd/FS.cpp
index 5729b9341d9d4b..bd3c6440c24b0f 100644
--- a/clang-tools-extra/clangd/FS.cpp
+++ b/clang-tools-extra/clangd/FS.cpp
@@ -64,8 +64,8 @@ PreambleFileStatusCache::getProducingFS(
         : ProxyFileSystem(std::move(FS)), StatCache(StatCache) {}
 
     llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-    openFileForRead(const llvm::Twine &Path) override {
-      auto File = getUnderlyingFS().openFileForRead(Path);
+    openFileForRead(const llvm::Twine &Path, bool IsText = true) override {
+      auto File = getUnderlyingFS().openFileForRead(Path, IsText);
       if (!File || !*File)
         return File;
       // Eagerly stat opened file, as the followup `status` call on the file
diff --git a/clang-tools-extra/clangd/Preamble.cpp b/clang-tools-extra/clangd/Preamble.cpp
index dd13b1a9e5613d..e970d01f3729f3 100644
--- a/clang-tools-extra/clangd/Preamble.cpp
+++ b/clang-tools-extra/clangd/Preamble.cpp
@@ -479,9 +479,9 @@ class TimerFS : public llvm::vfs::ProxyFileSystem {
       : ProxyFileSystem(std::move(FS)) {}
 
   llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-  openFileForRead(const llvm::Twine &Path) override {
+  openFileForRead(const llvm::Twine &Path, bool IsText = true) override {
     WallTimerRegion T(Timer);
-    auto FileOr = getUnderlyingFS().openFileForRead(Path);
+    auto FileOr = getUnderlyingFS().openFileForRead(Path, IsText);
     if (!FileOr)
       return FileOr;
     return std::make_unique<TimerFile>(Timer, std::move(FileOr.get()));
diff --git a/clang-tools-extra/clangd/support/ThreadsafeFS.cpp b/clang-tools-extra/clangd/support/ThreadsafeFS.cpp
index 7398e4258527ba..bc0b984e577cb8 100644
--- a/clang-tools-extra/clangd/support/ThreadsafeFS.cpp
+++ b/clang-tools-extra/clangd/support/ThreadsafeFS.cpp
@@ -29,7 +29,7 @@ class VolatileFileSystem : public llvm::vfs::ProxyFileSystem {
       : ProxyFileSystem(std::move(FS)) {}
 
   llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-  openFileForRead(const llvm::Twine &InPath) override {
+  openFileForRead(const llvm::Twine &InPath, bool IsText = true) override {
     llvm::SmallString<128> Path;
     InPath.toVector(Path);
 
diff --git a/clang-tools-extra/clangd/unittests/ClangdTests.cpp b/clang-tools-extra/clangd/unittests/ClangdTests.cpp
index 643b8e9f12d751..e86385c2072b34 100644
--- a/clang-tools-extra/clangd/unittests/ClangdTests.cpp
+++ b/clang-tools-extra/clangd/unittests/ClangdTests.cpp
@@ -1010,7 +1010,7 @@ TEST(ClangdTests, PreambleVFSStatCache) {
             : ProxyFileSystem(std::move(FS)), CountStats(CountStats) {}
 
         llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-        openFileForRead(const Twine &Path) override {
+        openFileForRead(const Twine &Path, bool IsText = true) override {
           ++CountStats[llvm::sys::path::filename(Path.str())];
           return ProxyFileSystem::openFileForRead(Path);
         }
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index 527bbef24793ee..67a69fb79ccefe 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -286,21 +286,21 @@ class FileManager : public RefCountedBase<FileManager> {
   /// MemoryBuffer if successful, otherwise returning null.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(FileEntryRef Entry, bool isVolatile = false,
-                   bool RequiresNullTerminator = true,
+                   bool RequiresNullTerminator = true, bool IsText = true,
                    std::optional<int64_t> MaybeLimit = std::nullopt);
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(StringRef Filename, bool isVolatile = false,
-                   bool RequiresNullTerminator = true,
+                   bool RequiresNullTerminator = true, bool IsText = true,
                    std::optional<int64_t> MaybeLimit = std::nullopt) const {
     return getBufferForFileImpl(Filename,
                                 /*FileSize=*/(MaybeLimit ? *MaybeLimit : -1),
-                                isVolatile, RequiresNullTerminator);
+                                isVolatile, RequiresNullTerminator, IsText);
   }
 
 private:
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFileImpl(StringRef Filename, int64_t FileSize, bool isVolatile,
-                       bool RequiresNullTerminator) const;
+                       bool RequiresNullTerminator, bool IsText) const;
 
   DirectoryEntry *&getRealDirEntry(const llvm::vfs::Status &Status);
 
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index d12814e7c9253e..635fdd0e00c433 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -346,7 +346,7 @@ class DependencyScanningWorkerFilesystem
 
   llvm::ErrorOr<llvm::vfs::Status> status(const Twine &Path) override;
   llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-  openFileForRead(const Twine &Path) override;
+  openFileForRead(const Twine &Path, bool IsText = true) override;
 
   std::error_code getRealPath(const Twine &Path,
                               SmallVectorImpl<char> &Output) override;
diff --git a/clang/lib/Basic/FileManager.cpp b/clang/lib/Basic/FileManager.cpp
index 4509cee1ca0fed..e9c32b24d7d95b 100644
--- a/clang/lib/Basic/FileManager.cpp
+++ b/clang/lib/Basic/FileManager.cpp
@@ -530,7 +530,7 @@ void FileManager::fillRealPathName(FileEntry *UFE, llvm::StringRef FileName) {
 
 llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
 FileManager::getBufferForFile(FileEntryRef FE, bool isVolatile,
-                              bool RequiresNullTerminator,
+                              bool RequiresNullTerminator, bool IsText,
                               std::optional<int64_t> MaybeLimit) {
   const FileEntry *Entry = &FE.getFileEntry();
   // If the content is living on the file entry, return a reference to it.
@@ -558,21 +558,21 @@ FileManager::getBufferForFile(FileEntryRef FE, bool isVolatile,
 
   // Otherwise, open the file.
   return getBufferForFileImpl(Filename, FileSize, isVolatile,
-                              RequiresNullTerminator);
+                              RequiresNullTerminator, IsText);
 }
 
 llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
 FileManager::getBufferForFileImpl(StringRef Filename, int64_t FileSize,
-                                  bool isVolatile,
-                                  bool RequiresNullTerminator) const {
+                                  bool isVolatile, bool RequiresNullTerminator,
+                                  bool IsText) const {
   if (FileSystemOpts.WorkingDir.empty())
     return FS->getBufferForFile(Filename, FileSize, RequiresNullTerminator,
-                                isVolatile);
+                                isVolatile, IsText);
 
   SmallString<128> FilePath(Filename);
   FixupRelativePath(FilePath);
   return FS->getBufferForFile(FilePath, FileSize, RequiresNullTerminator,
-                              isVolatile);
+                              isVolatile, IsText);
 }
 
 /// getStatValue - Get the 'stat' information for the specified path,
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index d6ec26af80aadd..9ad4996df6e13a 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -121,8 +121,18 @@ ContentCache::getBufferOrNone(DiagnosticsEngine &Diag, FileManager &FM,
   // Start with the assumption that the buffer is invalid to simplify early
   // return paths.
   IsBufferInvalid = true;
-
-  auto BufferOrError = FM.getBufferForFile(*ContentsEntry, IsFileVolatile);
+  bool IsText = false;
+
+#ifdef __MVS__
+  // If the file is tagged with a text ccsid, it may require autoconversion.
+  llvm::ErrorOr<bool> IsFileText =
+      llvm::iszOSTextFile(ContentsEntry->getName().data());
+  if (IsFileText)
+    IsText = *IsFileText;
+#endif
+
+  auto BufferOrError = FM.getBufferForFile(
+      *ContentsEntry, IsFileVolatile, /*RequiresNullTerminator=*/true, IsText);
 
   // If we were unable to open the file, then we are in an inconsistent
   // situation where the content cache referenced a file which no longer
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index e5a1e20a265616..9cfb65ccf1970e 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -5322,7 +5322,8 @@ std::string ASTReader::getOriginalSourceFile(
     const PCHContainerReader &PCHContainerRdr, DiagnosticsEngine &Diags) {
   // Open the AST file.
   auto Buffer = FileMgr.getBufferForFile(ASTFileName, /*IsVolatile=*/false,
-                                         /*RequiresNullTerminator=*/false);
+                                         /*RequiresNullTerminator=*/false,
+                                         /*IsText=*/true);
   if (!Buffer) {
     Diags.Report(diag::err_fe_unable_to_read_pch_file)
         << ASTFileName << Buffer.getError().message();
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 4d738e4bea41a6..7d6239a0732fe6 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -353,12 +353,13 @@ DepScanFile::create(EntryRef Entry) {
 }
 
 llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-DependencyScanningWorkerFilesystem::openFileForRead(const Twine &Path) {
+DependencyScanningWorkerFilesystem::openFileForRead(const Twine &Path,
+                                                    bool IsText) {
   SmallString<256> OwnedFilename;
   StringRef Filename = Path.toStringRef(OwnedFilename);
 
   if (shouldBypass(Filename))
-    return getUnderlyingFS().openFileForRead(Path);
+    return getUnderlyingFS().openFileForRead(Path, IsText);
 
   llvm::ErrorOr<EntryRef> Result = getOrCreateFileSystemEntry(Filename);
   if (!Result)
diff --git a/clang/unittests/Driver/DistroTest.cpp b/clang/unittests/Driver/DistroTest.cpp
index 43efc0dd8f51ec..4c13b8bbb94089 100644
--- a/clang/unittests/Driver/DistroTest.cpp
+++ b/clang/unittests/Driver/DistroTest.cpp
@@ -352,9 +352,9 @@ TEST(DistroTest, DetectWindowsAndCrossCompile) {
     }
 
     llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-    openFileForRead(const llvm::Twine &Path) override {
+    openFileForRead(const llvm::Twine &Path, bool IsText = true) override {
       ++Count;
-      return llvm::vfs::ProxyFileSystem::openFileForRead(Path);
+      return llvm::vfs::ProxyFileSystem::openFileForRead(Path, IsText);
     }
 
     unsigned Count{};
diff --git a/clang/unittests/Driver/ToolChainTest.cpp b/clang/unittests/Driver/ToolChainTest.cpp
index a9b5f3c700315c..6c8b32dec879d1 100644
--- a/clang/unittests/Driver/ToolChainTest.cpp
+++ b/clang/unittests/Driver/ToolChainTest.cpp
@@ -662,7 +662,7 @@ struct FileSystemWithError : public llvm::vfs::FileSystem {
     return std::make_error_code(std::errc::no_such_file_or_directory);
   }
   llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-  openFileForRead(const Twine &Path) override {
+  openFileForRead(const Twine &Path, bool IsText = true) override {
     return std::make_error_code(std::errc::permission_denied);
   }
   llvm::vfs::directory_iterator dir_begin(const Twine &Dir,
diff --git a/clang/unittests/Frontend/PCHPreambleTest.cpp b/clang/unittests/Frontend/PCHPreambleTest.cpp
index 2ce24c91ac0f13..137dcf2377be47 100644
--- a/clang/unittests/Frontend/PCHPreambleTest.cpp
+++ b/clang/unittests/Frontend/PCHPreambleTest.cpp
@@ -36,10 +36,10 @@ class ReadCountingInMemoryFileSystem : public vfs::InMemoryFileSystem
   std::map<std::string, unsigned> ReadCounts;
 
 public:
-  ErrorOr<std::unique_ptr<vfs::File>> openFileForRead(const Twine &Path) override
-  {
+  ErrorOr<std::unique_ptr<vfs::File>>
+  openFileForRead(const Twine &Path, bool IsText = true) override {
     ++ReadCounts[Canonicalize(Path)];
-    return InMemoryFileSystem::openFileForRead(Path);
+    return InMemoryFileSystem::openFileForRead(Path, IsText);
   }
 
   unsigned GetReadCount(const Twine &Path) const
diff --git a/clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp b/clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp
index ec0e143be4a209..1012016ed03c7d 100644
--- a/clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp
+++ b/clang/unittests/Tooling/DependencyScanning/DependencyScannerTest.cpp
@@ -277,9 +277,9 @@ TEST(DependencyScanner, ScanDepsWithModuleLookup) {
     }
 
     llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-    openFileForRead(const Twine &Path) override {
+    openFileForRead(const Twine &Path, bool IsText = true) override {
       ReadFiles.push_back(Path.str());
-      return ProxyFileSystem::openFileForRead(Path);
+      return ProxyFileSystem::openFileForRead(Path, IsText);
     }
   };
 
diff --git a/llvm/include/llvm/Support/AutoConvert.h b/llvm/include/llvm/Support/AutoConvert.h
index 6f45c4683f7775..8ef6d3e8cf973e 100644
--- a/llvm/include/llvm/Support/AutoConvert.h
+++ b/llvm/include/llvm/Support/AutoConvert.h
@@ -52,6 +52,12 @@ std::error_code restorezOSStdHandleAutoConversion(int FD);
 /// \brief Set the tag information for a file descriptor.
 std::error_code setzOSFileTag(int FD, int CCSID, bool Text);
 
+// Get the the tag ccsid for a file name or a file descriptor.
+ErrorOr<__ccsid_t> getzOSFileTag(const char *FileName, const int FD = -1);
+
+// Query the file tag to determine if the file is a text file.
+ErrorOr<bool> iszOSTextFile(const char *Filename, const int FD = -1);
+
 } // namespace llvm
 #endif // __cplusplus
 
diff --git a/llvm/include/llvm/Support/VirtualFileSystem.h b/llvm/include/llvm/Support/VirtualFileSystem.h
index 2531c075f262d7..80ebae6b3d7e49 100644
--- a/llvm/include/llvm/Support/VirtualFileSystem.h
+++ b/llvm/include/llvm/Support/VirtualFileSystem.h
@@ -273,13 +273,14 @@ class FileSystem : public llvm::ThreadSafeRefCountedBase<FileSystem>,
 
   /// Get a \p File object for the file at \p Path, if one exists.
   virtual llvm::ErrorOr<std::unique_ptr<File>>
-  openFileForRead(const Twine &Path) = 0;
+  openFileForRead(const Twine &Path, bool IsText = true) = 0;
 
   /// This is a convenience method that opens a file, gets its content and then
   /// closes the file.
   llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>
   getBufferForFile(const Twine &Name, int64_t FileSize = -1,
-                   bool RequiresNullTerminator = true, bool IsVolatile = false);
+                   bool RequiresNullTerminator = true, bool IsVolatile = false,
+                   bool IsText = true);
 
   /// Get a directory_iterator for \p Dir.
   /// \note The 'end' iterator is directory_iterator().
@@ -392,7 +393,7 @@ class OverlayFileSystem : public RTTIExtends<OverlayFileSystem, FileSystem> {
   llvm::ErrorOr<Status> status(const Twine &Path) override;
   bool exists(const Twine &Path) override;
   llvm::ErrorOr<std::unique_ptr<File>>
-  openFileForRead(const Twine &Path) override;
+  openFileForRead(const Twine &Path, bool IsText = true) override;
   directory_iterator dir_begin(const Twine &Dir, std::error_code &EC) override;
   llvm::ErrorOr<std::string> getCurrentWorkingDirectory() const override;
   std::error_code setCurrentWorkingDirectory(const Twine &Path) override;
@@ -446,8 +447,8 @@ class ProxyFileSystem : public RTTIExtends<ProxyFileSystem, FileSystem> {
   }
   bool exists(const Twine &Path) override { return FS->exists(Path); }
   llvm::ErrorOr<std::unique_ptr<File>>
-  openFileForRead(const Twine &Path) override {
-    return FS->openFileForRead(Path);
+  openFileForRead(const Twine &Path, bool IsText = true) override {
+    return FS->openFileForRead(Path, IsText);
   }
   directory_iterator dir_begin(const Twine &Dir, std::error_code &EC) override {
     return FS->dir_begin(Dir, EC);
@@ -615,7 +616,7 @@ class InMemoryFileSystem : public RTTIExtends<InMemoryFileSystem, FileSystem> {
 
   llvm::ErrorOr<Status> status(const Twine &Path) override;
   llvm::ErrorOr<std::unique_ptr<File>>
-  openFileForRead(const Twine &Path) override;
+  openFileForRead(const Twine &Path, bool IsText = true) override;
   directory_iterator dir_begin(const Twine &Dir, std::error_code &EC) override;
 
   llvm::ErrorOr<std::string> getCurrentWorkingDirectory() const override {
@@ -1051,7 +1052,8 @@ class RedirectingFileSystem
 
   ErrorOr<Status> status(const Twine &Path) override;
   bool exists(const Twine &Path) override;
-  ErrorOr<std::unique_ptr<File>> openFileForRead(const Twine &Path) override;
+  ErrorOr<std::unique_ptr<File>> openFileForRead(const Twine &Path,
+                                                 bool IsText = true) override;
 
   std::error_code getRealPath(const Twine &Path,
                               SmallVectorImpl<char> &Output) override;
diff --git a/llvm/lib/Support/AutoConvert.cpp b/llvm/lib/Support/AutoConvert.cpp
index 66570735f8fc88..06130876c67f05 100644
--- a/llvm/lib/Support/AutoConvert.cpp
+++ b/llvm/lib/Support/AutoConvert.cpp
@@ -116,4 +116,30 @@ std::error_code llvm::setzOSFileTag(int FD, int CCSID, bool Text) {
   return std::error_code();
 }
 
+ErrorOr<__ccsid_t> llvm::getzOSFileTag(const char *FileName, const int FD) {
+  // If we have a file descriptor, use it to find out file tagging. Otherwise we
+  // need to use stat() with the file path.
+  if (FD != -1) {
+    struct f_cnvrt Query = {
+        QUERYCVT, // cvtcmd
+        0,        // pccsid
+        0,        // fccsid
+    };
+    if (fcntl(FD, F_CONTROL_CVT, &Query) == -1)
+      return std::error_code(errno, std::generic_category());
+    return Query.fccsid;
+  }
+  struct stat Attr;
+  if (stat(FileName, &Attr) == -1)
+    return std::error_code(errno, std::generic_category());
+  return Attr.st_tag.ft_ccsid;
+}
+
+ErrorOr<bool> llvm::iszOSTextFile(const char *Filename, const int FD) {
+  ErrorOr<__ccsid_t> Ccsid = getzOSFileTag(Filename, FD);
+  if (std::error_code EC = Ccsid.getError())
+    return EC;
+  return *Ccsid != FT_BINARY;
+}
+
 #endif // __MVS__
diff --git a/llvm/lib/Support/FileCollector.cpp b/llvm/lib/Support/FileCollector.cpp
index 29436f85c2f23c..fd4350da7f66a6 100644
--- a/llvm/lib/Support/FileCollector.cpp
+++ b/llvm/lib/Support/FileCollector.cpp
@@ -268,8 +268,8 @@ class FileCollectorFileSystem : public vfs::FileSystem {
   }
 
   llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>
-  openFileForRead(const Twine &Path) override {
-    auto Result = FS->openFileForRead(Path);
+  openFileForRead(const Twine &Path, bool IsText) override {
+    auto Result = FS->openFileForRead(Path, IsText);
     if (Result && *Result)
       Collector->addFile(Path);
     return Result;
diff --git a/llvm/lib/Support/VirtualFileSystem.cpp b/llvm/lib/Support/VirtualFileSystem.cpp
index 928c0b5a24ed65..ceca65b46e486d 100644
--- a/llvm/lib/Support/VirtualFileSystem.cpp
+++ b/llvm/lib/Support/VirtualFileSystem.cpp
@@ -117,8 +117,9 @@ FileSystem::~FileSystem() = default;
 
 ErrorOr<std::unique_ptr<MemoryBuffer>>
 FileSystem::getBufferForFile(const llvm::Twine &Name, int64_t FileSize,
-                             bool RequiresNullTerminator, bool IsVolatile) {
-  auto F = openFileForRead(Name);
+                             bool RequiresNullTerminator, bool IsVolatile,
+                             bool IsText) {
+  auto F = openFileForRead(Name, IsText);
   if (!F)
     return F.getError();
 
@@ -278,7 +279,8 @@ class RealFileSystem : public FileSystem {
   }
 
   ErrorOr<Status> status(const Twine &Path) override;
-  ErrorOr<std::unique_ptr<File>> openFileForRead(const Twine &Path) override;
+  ErrorOr<std::unique_ptr<File>> openFileForRead(const Twine &Path,
+                                                 bool IsText = true) override;
   directory_iterator dir_begin(const Twine &Dir, std::error_code &EC) override;
 
   llvm::ErrorOr<std::string> getCurrentWorkingDirectory() const override;
@@ -323,10 +325,11 @@ ErrorOr<Status> RealFileSystem::status(const Twine &Path) {
 }
 
 ErrorOr<std::unique_ptr<File>>
-RealFileSystem::openFileForRead(const Twine &Name) {
+RealFileSystem::openFileForRead(const Twine &Name, bool...
[truncated]

@abhina-sree abhina-sree force-pushed the abhina/propagate_text_flag branch from 27a0638 to 758745c Compare September 9, 2024 20:14
Copy link
Contributor

@zibi2 zibi2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abhina-sree abhina-sree merged commit edf3b27 into llvm:main Sep 19, 2024
9 checks passed
@abhina-sree abhina-sree deleted the abhina/propagate_text_flag branch September 19, 2024 18:30
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
…llvm#107906)

This patch adds an IsText parameter to the following functions
openFileForRead, getBufferForFile, getBufferForFileImpl and determines
whether a file is text by querying the file tag on z/OS. The default is
set to OF_Text instead of OF_None, this change in value does not affect
any other platforms other than z/OS.
Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi sorry for missing this during the review time, i believe this change is changing some of the core support library interfaces in a way that's not justified.

this is an interface implemented by quite a lot of both upstream and downstream clients, but this change doesn't describe what the semantics around IsText is. so none of these implementations have a way to sustainably understand and maintain this new semantic.

Moreover reading the commit description, I also cannot see how callers can figure out what to pass into this parameter. You were probably fine with this during the implementation as they don't need to care, but this will create an unreasonable maintenance burden for the contributors in the future as no one can reason about semantics here.

Even further, the way this functionality is implemented today, relies on doing system calls to figure out IsTextness. This is against the very nature of a VFS, the behavior you're implementing here will not compose well with rest of the contractual guarantees we have via the VFS abstractions (surely there're cases where we violate that already today, but that isn't a reason to add more).


So in the light of all of these, I believe this change should've gotten approvals from an LLVM code-owner before submission. In that regard, I can see that you've been trying for the past 2 weeks to get attention, unfortunately community can be slow to respond at times, I'd like to ask you to a little bit more patient, or even start an RFC in discourse to get more visibility (as PRs are quite easy to miss).

I was also making some recommendations in RealFileSystem::openFileForRead about how this change might be contained. I think it'd be a lot better if you can keep the interfaces clean and introduce the change only into the parts in which these semantics are meaningful.


Up until these questions are settled (and I am more than happy to follow up on the discussion on my behalf), may I ask you to revert this change? As it's going to create some confusion both in the upstream and downstream users the more it stays around.

@@ -121,8 +121,18 @@ ContentCache::getBufferOrNone(DiagnosticsEngine &Diag, FileManager &FM,
// Start with the assumption that the buffer is invalid to simplify early
// return paths.
IsBufferInvalid = true;

auto BufferOrError = FM.getBufferForFile(*ContentsEntry, IsFileVolatile);
bool IsText = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in all other code paths, IsText always defaults to true, but you're making it default to false here, is this intentional ?

SmallString<256> RealName, Storage;
Expected<file_t> FDOrErr = sys::fs::openNativeFileForRead(
adjustPath(Name, Storage), sys::fs::OF_None, &RealName);
adjustPath(Name, Storage), IsText ? sys::fs::OF_Text : sys::fs::OF_None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like the only place that uses IsText. is there any reason why you can't deduce that signal from Name here directly? rather than propagating it all the way from the caller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into this and see if I hit any issues. I think if the file already exists, we can deduce it here, but in the case the file does not exist, we do not know whether the file is supposed to be text or binary without additional context

@kadircet
Copy link
Member

i've also just noticed 3b3accb, which seem to be pushed without review and any tests, changing behavior more in a non-obvious way. Please not that logic you have in:

  Expected<file_t> FDOrErr = sys::fs::openNativeFileForRead(
      adjustPath(Name, Storage), IsText ? sys::fs::OF_Text : sys::fs::OF_None,
      &RealName);

is affecting all, not just zOS, and by fiddling with the value passed by default, you're in turn changing the flags used by all the platforms.

@abhina-sree
Copy link
Contributor Author

3b3accb

Hi, sure I will revert the two changes so we can discuss further. The OF_Text flag only affects the z/OS platform behaviour, the OF_TextWithCRLF will affect z/OS and Windows. On other platforms, the flag doesn't show any difference in behaviour that I'm aware of

abhina-sree added a commit that referenced this pull request Sep 20, 2024
@kadircet
Copy link
Member

thanks a lot for the swift response!

I can see how none of these implementations are not using those flags today, but we're changing the observable behavior for them as well, and if some of those implementations decides to give meaning to these flags, it might be impossible to keep it working. Especially because we're not clearly defining the semantics around how callers are going to figure those bits out.

So I am wondering:

  • if we can make these changes in a very limited way, e.g. just in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L972-L1118. We already have lots of __MVS__ specific regions in there. that's also the "right" layer if you want to perform syscalls.
  • if we need to make those changes in the VFS interfaces indeed, can you provide the rationale explaining that need and what the new semantics mean both for implementors of the interface and the callers of it?

@abhina-sree
Copy link
Contributor Author

thanks a lot for the swift response!

I can see how none of these implementations are not using those flags today, but we're changing the observable behavior for them as well, and if some of those implementations decides to give meaning to these flags, it might be impossible to keep it working. Especially because we're not clearly defining the semantics around how callers are going to figure those bits out.

So I am wondering:

  • if we can make these changes in a very limited way, e.g. just in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L972-L1118. We already have lots of __MVS__ specific regions in there. that's also the "right" layer if you want to perform syscalls.
  • if we need to make those changes in the VFS interfaces indeed, can you provide the rationale explaining that need and what the new semantics mean both for implementors of the interface and the callers of it?

So when we call this function sys::fs::openNativeFileForRead, there is an assumption that we already know whether the file is text or binary based on the OpenFlags that are passed to it. This function RealFileSystem::openFileForRead also assumes the file is binary by passing the OF_None flag. I'm not sure if there is a better way to determine whether the file is text or binary in this function without querying the file tag using a z/OS syscall in this function.

I think I can limit it to the following change to avoid adding additional parameters to all the other functions which I agree is a much cleaner and less invasive solution. Please let me know your thoughts, thanks!

diff --git a/llvm/lib/Support/VirtualFileSystem.cpp b/llvm/lib/Support/VirtualFileSystem.cpp
index 928c0b5a24ed..7153560754d1 100644
--- a/llvm/lib/Support/VirtualFileSystem.cpp
+++ b/llvm/lib/Support/VirtualFileSystem.cpp
@@ -22,6 +22,7 @@
 #include "llvm/ADT/Twine.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Config/llvm-config.h"
+#include "llvm/Support/AutoConvert.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Chrono.h"
 #include "llvm/Support/Compiler.h"
@@ -325,8 +326,16 @@ ErrorOr<Status> RealFileSystem::status(const Twine &Path) {
 ErrorOr<std::unique_ptr<File>>
 RealFileSystem::openFileForRead(const Twine &Name) {
   SmallString<256> RealName, Storage;
+  bool IsText = true;
+#ifdef __MVS__
+  // If the file is tagged with a text ccsid, it may require autoconversion.
+  llvm::ErrorOr<bool> IsFileText =
+      llvm::iszOSTextFile(Name.str().c_str());
+  if (IsFileText)
+    IsText = *IsFileText;
+#endif
   Expected<file_t> FDOrErr = sys::fs::openNativeFileForRead(
-      adjustPath(Name, Storage), sys::fs::OF_None, &RealName);
+      adjustPath(Name, Storage), IsText ? sys::fs::OF_Text : sys::fs::OF_None, &RealName);
   if (!FDOrErr)
     return errorToErrorCode(FDOrErr.takeError());
   return std::unique_ptr<File>(

@kadircet
Copy link
Member

i think limiting this to RealFileSystem::openFileForRead LG, but can we make sure IsText defaults to false and we can only set it in #ifdef __MVS__ block to make sure this is really a no-op for other platforms. e.g.:

auto OpenFlags = sys::fs::OF_None;
#ifdef __MVS__
....
  OpenFlags |= sys::fs::OF_Text;
....
#endif
   Expected<file_t> FDOrErr = sys::fs::openNativeFileForRead(
      adjustPath(Name, Storage), OpenFlags, &RealName);

I'd still prefer doing this inside https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L972-L1118 as we have other utility functions that create file-descriptors and all of them end up using these base functions. By having your logic at a layer above these, you risk having subtle discrepancies. But I don't know much about zOS or why this is needed, so I'll leave management of that risk to you.

@abhina-sree
Copy link
Contributor Author

OpenFlags |= sys::fs::OF_Text;

Thanks, I've made your suggested change and I opened a new PR here #109664. A short explanation for why distinguishing text and binary files on z/OS is important is because the native encoding is not UTF-8, so we need to know whether the file is text in order to convert to a UTF-8 encoding so that it is readable.

cachemeifyoucan added a commit to swiftlang/llvm-project that referenced this pull request Sep 24, 2024
Fix build failure in unit test.

rdar://136591087
cachemeifyoucan added a commit to cachemeifyoucan/llvm-project that referenced this pull request Sep 24, 2024
Fix build failure in unit test.

rdar://136591087
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category clang-tools-extra clangd lldb llvm:support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants