-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SystemZ][z/OS] Query if a file is text and set the text flag accordingly #109664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-support Author: Abhina Sree (abhina-sree) ChangesThis patch sets the text flag correctly when opening files by querying the file tag on z/OS. For other platforms, there will be no change in behaviour. Full diff: https://github.com/llvm/llvm-project/pull/109664.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Support/AutoConvert.h b/llvm/include/llvm/Support/AutoConvert.h
index 6f45c4683f7775..8ef6d3e8cf973e 100644
--- a/llvm/include/llvm/Support/AutoConvert.h
+++ b/llvm/include/llvm/Support/AutoConvert.h
@@ -52,6 +52,12 @@ std::error_code restorezOSStdHandleAutoConversion(int FD);
/// \brief Set the tag information for a file descriptor.
std::error_code setzOSFileTag(int FD, int CCSID, bool Text);
+// Get the the tag ccsid for a file name or a file descriptor.
+ErrorOr<__ccsid_t> getzOSFileTag(const char *FileName, const int FD = -1);
+
+// Query the file tag to determine if the file is a text file.
+ErrorOr<bool> iszOSTextFile(const char *Filename, const int FD = -1);
+
} // namespace llvm
#endif // __cplusplus
diff --git a/llvm/lib/Support/AutoConvert.cpp b/llvm/lib/Support/AutoConvert.cpp
index 66570735f8fc88..06130876c67f05 100644
--- a/llvm/lib/Support/AutoConvert.cpp
+++ b/llvm/lib/Support/AutoConvert.cpp
@@ -116,4 +116,30 @@ std::error_code llvm::setzOSFileTag(int FD, int CCSID, bool Text) {
return std::error_code();
}
+ErrorOr<__ccsid_t> llvm::getzOSFileTag(const char *FileName, const int FD) {
+ // If we have a file descriptor, use it to find out file tagging. Otherwise we
+ // need to use stat() with the file path.
+ if (FD != -1) {
+ struct f_cnvrt Query = {
+ QUERYCVT, // cvtcmd
+ 0, // pccsid
+ 0, // fccsid
+ };
+ if (fcntl(FD, F_CONTROL_CVT, &Query) == -1)
+ return std::error_code(errno, std::generic_category());
+ return Query.fccsid;
+ }
+ struct stat Attr;
+ if (stat(FileName, &Attr) == -1)
+ return std::error_code(errno, std::generic_category());
+ return Attr.st_tag.ft_ccsid;
+}
+
+ErrorOr<bool> llvm::iszOSTextFile(const char *Filename, const int FD) {
+ ErrorOr<__ccsid_t> Ccsid = getzOSFileTag(Filename, FD);
+ if (std::error_code EC = Ccsid.getError())
+ return EC;
+ return *Ccsid != FT_BINARY;
+}
+
#endif // __MVS__
diff --git a/llvm/lib/Support/VirtualFileSystem.cpp b/llvm/lib/Support/VirtualFileSystem.cpp
index 928c0b5a24ed65..50aa60363db166 100644
--- a/llvm/lib/Support/VirtualFileSystem.cpp
+++ b/llvm/lib/Support/VirtualFileSystem.cpp
@@ -325,8 +325,16 @@ ErrorOr<Status> RealFileSystem::status(const Twine &Path) {
ErrorOr<std::unique_ptr<File>>
RealFileSystem::openFileForRead(const Twine &Name) {
SmallString<256> RealName, Storage;
+ auto OpenFlags = sys::fs::OF_None;
+#ifdef __MVS__
+ // If the file is tagged with a text ccsid, it may require autoconversion.
+ llvm::ErrorOr<bool> IsFileText =
+ llvm::iszOSTextFile(Name.str().c_str());
+ if (IsFileText && *IsFileText)
+ OpenFlags |= sys::fs::OF_Text;
+#endif
Expected<file_t> FDOrErr = sys::fs::openNativeFileForRead(
- adjustPath(Name, Storage), sys::fs::OF_None, &RealName);
+ adjustPath(Name, Storage), OpenFlags, &RealName);
if (!FDOrErr)
return errorToErrorCode(FDOrErr.takeError());
return std::unique_ptr<File>(
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
244ad16
to
d417ccc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -325,8 +325,15 @@ ErrorOr<Status> RealFileSystem::status(const Twine &Path) { | |||
ErrorOr<std::unique_ptr<File>> | |||
RealFileSystem::openFileForRead(const Twine &Name) { | |||
SmallString<256> RealName, Storage; | |||
auto OpenFlags = sys::fs::OF_None; | |||
#ifdef __MVS__ | |||
// If the file is tagged with a text ccsid, it may require autoconversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so as discussed in #107906 (mostly leaving it for context of future travelers); it's still unclear to me why we need to perform this check only at this point (and not other places that reads files, e.g. deep down the system helper stack).
Can you expand the comments here a little bit? At high level it's not obvious what an autoconversion
is and why it's needed. I think mentioning that in here would make it easier for maintenance going forward, since most people that touch this code, won't know about zOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So automatic conversion is a z/OS specific functionality to help ASCII applications work on z/OS. The native encoding on z/OS is EBCDIC, not ASCII-based, so without this tool enabled, we aren't able to read files correctly. This is some documentation on this feature: https://www.ibm.com/docs/en/zos/3.1.0?topic=pages-enhanced-ascii
There are also other places that do require distinguishing between text and binary files as well, there were many places that were modified in the past for example https://reviews.llvm.org/D99426.
But since the openFileForRead() function mainly deals with reading existing files, so querying the file tag seems sufficient to determining whether its binary or text based instead of passing an IsText parameter based on where it was called from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation!
sorry if I am being dense, but I still don't understand why it has to happen in this layer, rather than in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1035-L1114:
- it already has access to all of this information
- it has logic that's already dealing with autoconversions
- you can check if we're trying to open a file that exists, even better we've already stat'd the file, so you can check if it contains the tag, without performing any extra system calls on the way.
I assume you're already trying to make sure logic in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1098-L1105 triggers by inferring OF_TEXT
flag for all callers here.
What I am asking for is, why we're not performing that inference at the layer that's shared for all such filesystem operations, i.e. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1035-L1114. What breaks/doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because previously OF_None was being passed at this point unconditionally. In Path.inc in functions (like openNativeFileForRead, openFile, etc) we do not modify the OpenFlags and respect the flags that were passed. So it would seem wrong to pass OF_None here (which implies a binary file) and then decide it is a text file later on in the code path.
@@ -325,8 +325,15 @@ ErrorOr<Status> RealFileSystem::status(const Twine &Path) { | |||
ErrorOr<std::unique_ptr<File>> | |||
RealFileSystem::openFileForRead(const Twine &Name) { | |||
SmallString<256> RealName, Storage; | |||
auto OpenFlags = sys::fs::OF_None; | |||
#ifdef __MVS__ | |||
// If the file is tagged with a text ccsid, it may require autoconversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation!
sorry if I am being dense, but I still don't understand why it has to happen in this layer, rather than in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1035-L1114:
- it already has access to all of this information
- it has logic that's already dealing with autoconversions
- you can check if we're trying to open a file that exists, even better we've already stat'd the file, so you can check if it contains the tag, without performing any extra system calls on the way.
I assume you're already trying to make sure logic in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1098-L1105 triggers by inferring OF_TEXT
flag for all callers here.
What I am asking for is, why we're not performing that inference at the layer that's shared for all such filesystem operations, i.e. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1035-L1114. What breaks/doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use the actual file to determine if it should be read as text or binary. The context of where the file is being read determines if we need to treat it as a text file, not the file being read. I think one example is the #embed
directive. The #embed
is a binary file inclusion. If the file is a text file we need to embed the original bytes from the file (i.e. as binary). If we check the file, see it is text and then do a code page conversion when reading the file as text, we will include the wrong values into the program. The call to open the file has to say if it is text or binary.
Looks like this will also fail with header map files. These are binary files and don't have any file tags. The compiler will read them as EBCDIC instead as binary. |
I've also noticed that the lit tests that are testing |
Closing this PR as there were some bugs related with this implementation |
This patch sets the text flag correctly when opening files by querying the file tag on z/OS. For other platforms, there will be no change in behaviour.