-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Unicode × Linux #2368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The head ref may contain hidden characters: "unicode\u2010linux"
Unicode × Linux #2368
Changes from all commits
b6eab71
986c333
c92803b
4c3a9db
12dba81
a20b64a
730c2be
081dd0a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
module Cπשּׁµ𝄞🇺🇳🇮🇱x̱̱̱̱̱̄̄̄̄̄ { | ||
header "Cπשּׁµ𝄞🇺🇳🇮🇱x̱̱̱̱̱̄̄̄̄̄.h" | ||
export * | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -184,6 +184,16 @@ public protocol FileSystem: class { | |
/// | ||
/// The method throws if the underlying stat call fails. | ||
func getFileInfo(_ path: AbsolutePath) throws -> FileInfo | ||
|
||
/// Returns the path in the normalization form (or lack thereof) actually present. | ||
/// Any path components without an extant match are left unaltered. | ||
/// | ||
/// On file systems that natively consider Unicode‐equivalent paths to be equal, | ||
/// the returned value need not represent the actual scalars present. | ||
/// It is sufficient if whatever scalars are returned | ||
/// *will be considered equal to the real ones by the file system* | ||
/// when future file system calls are made.* | ||
func resolveUnicode(_ path: AbsolutePath) -> AbsolutePath | ||
} | ||
|
||
/// Convenience implementations (default arguments aren't permitted in protocol | ||
|
@@ -224,41 +234,78 @@ public extension FileSystem { | |
func getFileInfo(_ path: AbsolutePath) throws -> FileInfo { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should the default implementations above assert that the path is resolved? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See the other comment about types and assertions. |
||
fatalError("This file system currently doesn't support this method") | ||
} | ||
|
||
func resolveUnicode(_ path: AbsolutePath) -> AbsolutePath { | ||
return path | ||
} | ||
} | ||
|
||
/// Concrete FileSystem implementation which communicates with the local file system. | ||
private class LocalFileSystem: FileSystem { | ||
|
||
func resolveUnicode(_ path: AbsolutePath) -> AbsolutePath { | ||
#if os(macOS) | ||
// The macOS file system enforces NFD. | ||
// It can find everything without any help. | ||
return path | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this contingent on APFS? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have no idea. File systems largely aren’t my area of expertise. Based on this note on the Uncyclopedia article for APFS, I suspect you are at least generally right that my |
||
#else | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, this will tank the performance on non-macOS platforms. Can you check how corelibs and llvm::path handle these? I wonder if there's a API in corelibs that we can leverage here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I don’t find anything relevant in corelibs when searching for “Unicode”, or when perusing the FileManager and URL sources. Maybe someone from that team would know better? @millenomi? I find nothing related in Those lower levels may simply not care. The package manager is in an unusual situation, in that it looks up paths that it neither received from the file system nor placed there itself (and also in that it cannot defer this responsibility to a higher‐level client). This is the gap in which lookup becomes necessary. SwiftPM’s unique case is compounded by the fact that the same string literal in a manifest may need to match differing byte representations when checked out on different devices, so even telling users to be explicit with their literals ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The intent for corelibs is to mimic Darwin. On Darwin, normalization occurs at the filesystem access level. These should likely be contributions to FileManager to supplement its .fileSystemRepresentation… method. FileManager should accept any path as input and normalize it internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i.e. this is the wrong place; we should make FileManager better. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I agree that this should be solved at the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is good to hear. Unfortunately, my Linux machine is not large enough to develop
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't checked but maybe it's possible to develop corelibs-foundation against as a standalone project if you install a toolchain on Linux? |
||
if _exists(path, followSymlink: true) { | ||
return path | ||
} else { | ||
// Search for any neighbours with Unicode‐equivalent names | ||
// in order to be resilient against Unicode‐naïveté in the file system. | ||
let parent = resolveUnicode(path.parentDirectory) | ||
let neighbours = (try? _getDirectoryContents(parent)) ?? [] | ||
if let equivalent = neighbours.first(where: { $0 == path.basename }) { | ||
// Return the extant equivalent. | ||
return parent.appending(component: equivalent) | ||
} else { | ||
// Nothing found; return unaltered (but still using resolved parent). | ||
return parent.appending(component: path.basename) | ||
} | ||
} | ||
#endif | ||
} | ||
|
||
func isExecutableFile(_ path: AbsolutePath) -> Bool { | ||
let path = resolveUnicode(path) | ||
// Our semantics doesn't consider directories. | ||
return (self.isFile(path) || self.isSymlink(path)) && FileManager.default.isExecutableFile(atPath: path.pathString) | ||
} | ||
|
||
func exists(_ path: AbsolutePath, followSymlink: Bool) -> Bool { | ||
/// This method is Unicode‐naïve. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good indication that an assert would be nice; at the very least it is documentation about what you mean by "naive", as in it assumes the caller handled such details. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See the other comment about types and assertions. It is |
||
private func _exists(_ path: AbsolutePath, followSymlink: Bool) -> Bool { | ||
if followSymlink { | ||
return FileManager.default.fileExists(atPath: path.pathString) | ||
} | ||
return (try? FileManager.default.attributesOfItem(atPath: path.pathString)) != nil | ||
} | ||
func exists(_ path: AbsolutePath, followSymlink: Bool) -> Bool { | ||
let path = resolveUnicode(path) | ||
return _exists(path, followSymlink: followSymlink) | ||
} | ||
|
||
func isDirectory(_ path: AbsolutePath) -> Bool { | ||
let path = resolveUnicode(path) | ||
var isDirectory: ObjCBool = false | ||
let exists: Bool = FileManager.default.fileExists(atPath: path.pathString, isDirectory: &isDirectory) | ||
return exists && isDirectory.boolValue | ||
} | ||
|
||
func isFile(_ path: AbsolutePath) -> Bool { | ||
let path = resolveSymlinks(path) | ||
let path = resolveUnicode(resolveSymlinks(resolveUnicode(path))) | ||
let attrs = try? FileManager.default.attributesOfItem(atPath: path.pathString) | ||
return attrs?[.type] as? FileAttributeType == .typeRegular | ||
} | ||
|
||
func isSymlink(_ path: AbsolutePath) -> Bool { | ||
let path = resolveUnicode(path) | ||
let attrs = try? FileManager.default.attributesOfItem(atPath: path.pathString) | ||
return attrs?[.type] as? FileAttributeType == .typeSymbolicLink | ||
} | ||
|
||
func getFileInfo(_ path: AbsolutePath) throws -> FileInfo { | ||
let path = resolveUnicode(path) | ||
let attrs = try FileManager.default.attributesOfItem(atPath: path.pathString) | ||
return FileInfo(attrs) | ||
} | ||
|
@@ -272,32 +319,38 @@ private class LocalFileSystem: FileSystem { | |
return AbsolutePath(NSHomeDirectory()) | ||
} | ||
|
||
/// This method is Unicode‐naïve. | ||
func _getDirectoryContents(_ path: AbsolutePath) throws -> [String] { | ||
#if canImport(Darwin) | ||
return try FileManager.default.contentsOfDirectory(atPath: path.pathString) | ||
#else | ||
do { | ||
return try FileManager.default.contentsOfDirectory(atPath: path.pathString) | ||
} catch let error as NSError { | ||
// Fixup error from corelibs-foundation. | ||
if error.code == CocoaError.fileReadNoSuchFile.rawValue, !error.userInfo.keys.contains(NSLocalizedDescriptionKey) { | ||
var userInfo = error.userInfo | ||
userInfo[NSLocalizedDescriptionKey] = "The folder “\(path.basename)” doesn’t exist." | ||
throw NSError(domain: error.domain, code: error.code, userInfo: userInfo) | ||
} | ||
throw error | ||
} | ||
#endif | ||
} | ||
func getDirectoryContents(_ path: AbsolutePath) throws -> [String] { | ||
#if canImport(Darwin) | ||
return try FileManager.default.contentsOfDirectory(atPath: path.pathString) | ||
#else | ||
do { | ||
return try FileManager.default.contentsOfDirectory(atPath: path.pathString) | ||
} catch let error as NSError { | ||
// Fixup error from corelibs-foundation. | ||
if error.code == CocoaError.fileReadNoSuchFile.rawValue, !error.userInfo.keys.contains(NSLocalizedDescriptionKey) { | ||
var userInfo = error.userInfo | ||
userInfo[NSLocalizedDescriptionKey] = "The folder “\(path.basename)” doesn’t exist." | ||
throw NSError(domain: error.domain, code: error.code, userInfo: userInfo) | ||
} | ||
throw error | ||
} | ||
#endif | ||
return try _getDirectoryContents(resolveUnicode(path)) | ||
} | ||
|
||
func createDirectory(_ path: AbsolutePath, recursive: Bool) throws { | ||
let path = resolveUnicode(path) | ||
// Don't fail if path is already a directory. | ||
if isDirectory(path) { return } | ||
|
||
try FileManager.default.createDirectory(atPath: path.pathString, withIntermediateDirectories: recursive, attributes: [:]) | ||
} | ||
|
||
func readFileContents(_ path: AbsolutePath) throws -> ByteString { | ||
let path = resolveUnicode(path) | ||
// Open the file. | ||
let fp = fopen(path.pathString, "rb") | ||
if fp == nil { | ||
|
@@ -327,6 +380,7 @@ private class LocalFileSystem: FileSystem { | |
} | ||
|
||
func writeFileContents(_ path: AbsolutePath, bytes: ByteString) throws { | ||
let path = resolveUnicode(path) | ||
// Open the file. | ||
let fp = fopen(path.pathString, "wb") | ||
if fp == nil { | ||
|
@@ -350,6 +404,7 @@ private class LocalFileSystem: FileSystem { | |
} | ||
|
||
func writeFileContents(_ path: AbsolutePath, bytes: ByteString, atomically: Bool) throws { | ||
let path = resolveUnicode(path) | ||
// Perform non-atomic writes using the fast path. | ||
if !atomically { | ||
return try writeFileContents(path, bytes: bytes) | ||
|
@@ -369,12 +424,14 @@ private class LocalFileSystem: FileSystem { | |
} | ||
|
||
func removeFileTree(_ path: AbsolutePath) throws { | ||
let path = resolveUnicode(path) | ||
if self.exists(path, followSymlink: false) { | ||
try FileManager.default.removeItem(atPath: path.pathString) | ||
} | ||
} | ||
|
||
func chmod(_ mode: FileMode, path: AbsolutePath, options: Set<FileMode.Option>) throws { | ||
let path = resolveUnicode(path) | ||
guard exists(path) else { return } | ||
func setMode(path: String) throws { | ||
let attrs = try FileManager.default.attributesOfItem(atPath: path) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -73,13 +73,45 @@ class FileSystemTests: XCTestCase { | |
_ = try fs.getDirectoryContents(AbsolutePath("/does-not-exist")) | ||
XCTFail("Unexpected success") | ||
} catch { | ||
XCTAssertEqual(error.localizedDescription, "The folder “does-not-exist” doesn’t exist.") | ||
XCTAssertEqual(error._domain, "NSCocoaErrorDomain") | ||
XCTAssertEqual(error._code, 260) | ||
XCTAssert(error.localizedDescription.contains("does-not-exist")) | ||
// English description: “The folder “does-not-exist” doesn’t exist.” | ||
} | ||
|
||
let thisDirectoryContents = try! fs.getDirectoryContents(AbsolutePath(#file).parentDirectory) | ||
XCTAssertTrue(!thisDirectoryContents.contains(where: { $0 == "." })) | ||
XCTAssertTrue(!thisDirectoryContents.contains(where: { $0 == ".." })) | ||
XCTAssertTrue(thisDirectoryContents.contains(where: { $0 == AbsolutePath(#file).basename })) | ||
|
||
// Unicode | ||
let willyNillyString = "e\u{301}\u{E9}x\u{304}\u{331}" // Neither NFC nor NFD. | ||
let nfdString = "e\u{301}e\u{301}x\u{331}\u{304}" | ||
let nfcString = "\u{E9}\u{E9}x\u{331}\u{304}" | ||
#if !os(Linux) // TODO: Linux cannot normalize?!? | ||
XCTAssertEqual( | ||
Array(willyNillyString.decomposedStringWithCanonicalMapping.unicodeScalars), | ||
Array(nfdString.unicodeScalars)) | ||
XCTAssertEqual( | ||
Array(willyNillyString.precomposedStringWithCanonicalMapping.unicodeScalars), | ||
Array(nfcString.unicodeScalars)) | ||
#endif | ||
let willyNilly = tempDirPath.appending(component: willyNillyString) | ||
let nfd = tempDirPath.appending(component: nfdString) | ||
let nfc = tempDirPath.appending(component: nfcString) | ||
let variants = [willyNilly, nfd, nfc] | ||
for directory in variants { | ||
do { | ||
try fs.createDirectory(directory) | ||
defer { try? fs.removeFileTree(directory) } | ||
for other in variants { | ||
_ = try fs.getDirectoryContents(other) | ||
} | ||
} catch { | ||
let directoryName = directory.basename.unicodeScalars.map({ "U+\($0.value)" }) | ||
XCTFail("Unable to handle directory named “\(directoryName)”: \(error)") | ||
} | ||
} | ||
} | ||
} | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you're looking for a more complete programmatic data set to drive this, the stdlib uses Unicode's published normalization tests at https://raw.githubusercontent.com/apple/swift/master/test/stdlib/Inputs/NormalizationTest.txt There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This lightweight test is really only there to check that Unicode equivalence is being considered at all. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can also try |
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, a "unicode resolved" path would be reflected in the type system as such. Have you thought about ways to present this distinction?
Otherwise, be sure to liberally assert anywhere assuming a resolved path is in fact resolved (if it comes up). I like as many checks as possible to guard against the omission of an API call, especially as code evolves over multiple contributors.
Example:
which can be sprinkled about as needed. If the idea is that all
AbsolutePath
-receiving APIs on this protocol have to be resolved, then a separate (or phantom) type becomes more appealing. Another option is to have the protocol requirements be something likeuncheckedGetFileInfo(_:)
and havegetFileInfo(_:)
be in an extension that runs this assert before callinguncheckedGetFileInfo(_:)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if this is about putting the path into the conformer's preferred normal form, then this might be better called
normalizeUnicode
or something, unless normalization is the only kind of resolution imaginable for file systems (it might be).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look, @milseman. All very good questions.
I’m not aiming at supporting old code pages. (If someone else wants to extend it for that, I leave it up to them.) The aim here is about resolving the discrepancy between path components entered in the manifest, which use Swift
String
’s Unicode equivalence, and the file paths stored by the platform, which outside of macOS do not attempt to handle Unicode equivalence. A manifest should be equally loadable on every platform. Right now you get “file not found” errors when you go back and forth between platforms.I have not thought about them and I have done nothing special to handle them. Whatever
String
does ought to be fine, at least for now. In contrast to what this aims to fix, unpaired surrogates would indicate that the string itself is already “broken” to begin with.Initially I wanted to do it that way. Unfortunately, I realized that is itself unsafe, because the “resolvedness” of a path can be invalidated over time as file operations occur, some of which are hard for SwiftPM to track (such as Git). Since resolution must be redone before each operation anyway, the type‐system doesn’t actually help us much.
The assertion would have to do all the same work to check whether the resolution is valid. (And we cannot just record a flag in
AbsolutePath
for the same reason a separate type would not work.) CallingresolveUnicode(_:)
at the start of eachLocalFileSystem
method, effectively accomplishes the same safety otherwise afforded byassert
.The two private underscored methods are basically what you mean by
unchecked
. They only exist because they need a variant callable from withinresolveUnicode(_:)
that doesn’t wind up circular. Nothing else should be using them.Normalization‐related stuff is the only resolution I am imagining here. However, it is actually not normalization, but the reverse. On platforms besides macOS, the file on disk may have a name that is neither NFD nor NFC, and we still need to be able to find it.