Skip to content

[SystemZ][z/OS] Query if a file is text and set the text flag accordingly #109664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions llvm/include/llvm/Support/AutoConvert.h
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,9 @@ std::error_code restorezOSStdHandleAutoConversion(int FD);
/// \brief Set the tag information for a file descriptor.
std::error_code setzOSFileTag(int FD, int CCSID, bool Text);

// Query the file tag to determine if the file is a text file.
bool iszOSTextFile(const char *Filename, const int FD = -1);

} // namespace llvm
#endif // __cplusplus

Expand Down
26 changes: 26 additions & 0 deletions llvm/lib/Support/AutoConvert.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -116,4 +116,30 @@ std::error_code llvm::setzOSFileTag(int FD, int CCSID, bool Text) {
return std::error_code();
}

ErrorOr<__ccsid_t> getzOSFileTag(const char *FileName, const int FD) {
// If we have a file descriptor, use it to find out file tagging. Otherwise we
// need to use stat() with the file path.
if (FD != -1) {
struct f_cnvrt Query = {
QUERYCVT, // cvtcmd
0, // pccsid
0, // fccsid
};
if (fcntl(FD, F_CONTROL_CVT, &Query) == -1)
return std::error_code(errno, std::generic_category());
return Query.fccsid;
}
struct stat Attr;
if (stat(FileName, &Attr) == -1)
return std::error_code(errno, std::generic_category());
return Attr.st_tag.ft_ccsid;
}

bool llvm::iszOSTextFile(const char *Filename, const int FD) {
ErrorOr<__ccsid_t> Ccsid = getzOSFileTag(Filename, FD);
if (std::error_code EC = Ccsid.getError())
return false;
return *Ccsid != FT_BINARY;
}

#endif // __MVS__
8 changes: 7 additions & 1 deletion llvm/lib/Support/VirtualFileSystem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -325,8 +325,14 @@ ErrorOr<Status> RealFileSystem::status(const Twine &Path) {
ErrorOr<std::unique_ptr<File>>
RealFileSystem::openFileForRead(const Twine &Name) {
SmallString<256> RealName, Storage;
auto OpenFlags = sys::fs::OF_None;
#ifdef __MVS__
// If the file is tagged with a text ccsid, it may require autoconversion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so as discussed in #107906 (mostly leaving it for context of future travelers); it's still unclear to me why we need to perform this check only at this point (and not other places that reads files, e.g. deep down the system helper stack).

Can you expand the comments here a little bit? At high level it's not obvious what an autoconversion is and why it's needed. I think mentioning that in here would make it easier for maintenance going forward, since most people that touch this code, won't know about zOS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So automatic conversion is a z/OS specific functionality to help ASCII applications work on z/OS. The native encoding on z/OS is EBCDIC, not ASCII-based, so without this tool enabled, we aren't able to read files correctly. This is some documentation on this feature: https://www.ibm.com/docs/en/zos/3.1.0?topic=pages-enhanced-ascii

There are also other places that do require distinguishing between text and binary files as well, there were many places that were modified in the past for example https://reviews.llvm.org/D99426.
But since the openFileForRead() function mainly deals with reading existing files, so querying the file tag seems sufficient to determining whether its binary or text based instead of passing an IsText parameter based on where it was called from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation!

sorry if I am being dense, but I still don't understand why it has to happen in this layer, rather than in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1035-L1114:

  • it already has access to all of this information
  • it has logic that's already dealing with autoconversions
  • you can check if we're trying to open a file that exists, even better we've already stat'd the file, so you can check if it contains the tag, without performing any extra system calls on the way.

I assume you're already trying to make sure logic in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1098-L1105 triggers by inferring OF_TEXT flag for all callers here.

What I am asking for is, why we're not performing that inference at the layer that's shared for all such filesystem operations, i.e. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Unix/Path.inc#L1035-L1114. What breaks/doesn't work?

Copy link
Contributor Author

@abhina-sree abhina-sree Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because previously OF_None was being passed at this point unconditionally. In Path.inc in functions (like openNativeFileForRead, openFile, etc) we do not modify the OpenFlags and respect the flags that were passed. So it would seem wrong to pass OF_None here (which implies a binary file) and then decide it is a text file later on in the code path.

if (llvm::iszOSTextFile(Name.str().c_str()))
OpenFlags |= sys::fs::OF_Text;
#endif
Expected<file_t> FDOrErr = sys::fs::openNativeFileForRead(
adjustPath(Name, Storage), sys::fs::OF_None, &RealName);
adjustPath(Name, Storage), OpenFlags, &RealName);
if (!FDOrErr)
return errorToErrorCode(FDOrErr.takeError());
return std::unique_ptr<File>(
Expand Down
Loading