Skip to content

[Support] Use std::filesystem::remove_all() in remove_directories() #119146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

slydiman
Copy link
Contributor

@slydiman slydiman commented Dec 8, 2024

See #118677 for details.

@llvmbot
Copy link
Member

llvmbot commented Dec 8, 2024

@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-llvm-support

Author: Dmitry Vasilyev (slydiman)

Changes

See #118677 for details.


Full diff: https://github.com/llvm/llvm-project/pull/119146.diff

3 Files Affected:

  • (modified) llvm/lib/Support/Path.cpp (+8)
  • (modified) llvm/lib/Support/Unix/Path.inc (-40)
  • (modified) llvm/lib/Support/Windows/Path.inc (-26)
diff --git a/llvm/lib/Support/Path.cpp b/llvm/lib/Support/Path.cpp
index d7752851971030..4c0804f2dd494d 100644
--- a/llvm/lib/Support/Path.cpp
+++ b/llvm/lib/Support/Path.cpp
@@ -22,6 +22,7 @@
 #include "llvm/Support/Process.h"
 #include "llvm/Support/Signals.h"
 #include <cctype>
+#include <filesystem>
 
 #if !defined(_MSC_VER) && !defined(__MINGW32__)
 #include <unistd.h>
@@ -1206,6 +1207,13 @@ namespace llvm {
 namespace sys {
 namespace fs {
 
+std::error_code remove_directories(const Twine &path, bool IgnoreErrors) {
+  const std::filesystem::path Path(path.str());
+  std::error_code EC;
+  std::filesystem::remove_all(Path, EC);
+  return IgnoreErrors ? std::error_code() : EC;
+}
+
 TempFile::TempFile(StringRef Name, int FD)
     : TmpName(std::string(Name)), FD(FD) {}
 TempFile::TempFile(TempFile &&Other) { *this = std::move(Other); }
diff --git a/llvm/lib/Support/Unix/Path.inc b/llvm/lib/Support/Unix/Path.inc
index 44097bad7b46ed..b8cc6b8bdb464a 100644
--- a/llvm/lib/Support/Unix/Path.inc
+++ b/llvm/lib/Support/Unix/Path.inc
@@ -1272,46 +1272,6 @@ std::error_code closeFile(file_t &F) {
   return Process::SafelyCloseFileDescriptor(TmpF);
 }
 
-template <typename T>
-static std::error_code remove_directories_impl(const T &Entry,
-                                               bool IgnoreErrors) {
-  std::error_code EC;
-  directory_iterator Begin(Entry, EC, false);
-  directory_iterator End;
-  while (Begin != End) {
-    auto &Item = *Begin;
-    ErrorOr<basic_file_status> st = Item.status();
-    if (st) {
-      if (is_directory(*st)) {
-        EC = remove_directories_impl(Item, IgnoreErrors);
-        if (EC && !IgnoreErrors)
-          return EC;
-      }
-
-      EC = fs::remove(Item.path(), true);
-      if (EC && !IgnoreErrors)
-        return EC;
-    } else if (!IgnoreErrors) {
-      return st.getError();
-    }
-
-    Begin.increment(EC);
-    if (EC && !IgnoreErrors)
-      return EC;
-  }
-  return std::error_code();
-}
-
-std::error_code remove_directories(const Twine &path, bool IgnoreErrors) {
-  auto EC = remove_directories_impl(path, IgnoreErrors);
-  if (EC && !IgnoreErrors)
-    return EC;
-  EC = fs::remove(path, true);
-  if (EC && !IgnoreErrors)
-    return EC;
-  return std::error_code();
-}
-
 std::error_code real_path(const Twine &path, SmallVectorImpl<char> &dest,
                           bool expand_tilde) {
   dest.clear();
diff --git a/llvm/lib/Support/Windows/Path.inc b/llvm/lib/Support/Windows/Path.inc
index c4bd5e24723517..e70576540edaa1 100644
--- a/llvm/lib/Support/Windows/Path.inc
+++ b/llvm/lib/Support/Windows/Path.inc
@@ -1372,32 +1372,6 @@ std::error_code closeFile(file_t &F) {
   return std::error_code();
 }
 
-std::error_code remove_directories(const Twine &path, bool IgnoreErrors) {
-  // Convert to utf-16.
-  SmallVector<wchar_t, 128> Path16;
-  std::error_code EC = widenPath(path, Path16);
-  if (EC && !IgnoreErrors)
-    return EC;
-
-  // SHFileOperation() accepts a list of paths, and so must be double null-
-  // terminated to indicate the end of the list.  The buffer is already null
-  // terminated, but since that null character is not considered part of the
-  // vector's size, pushing another one will just consume that byte.  So we
-  // need to push 2 null terminators.
-  Path16.push_back(0);
-  Path16.push_back(0);
-
-  SHFILEOPSTRUCTW shfos = {};
-  shfos.wFunc = FO_DELETE;
-  shfos.pFrom = Path16.data();
-  shfos.fFlags = FOF_NO_UI;
-
-  int result = ::SHFileOperationW(&shfos);
-  if (result != 0 && !IgnoreErrors)
-    return mapWindowsError(result);
-  return std::error_code();
-}
-
 static void expandTildeExpr(SmallVectorImpl<char> &Path) {
   // Path does not begin with a tilde expression.
   if (Path.empty() || Path[0] != '~')

if (result != 0 && !IgnoreErrors)
return mapWindowsError(result);
return std::error_code();
const std::filesystem::path Path(path.str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, passing paths like this won't do the right thing.

Within LLVM, paths in the usual form with narrow chars is assumed to be UTF-8; for the existing Windows implementations here, it is converted to UTF-16 into wchar_t form with widenPath above.

However, in std::filesystem, a narrow char path is assumed to be in the system active code page (ACP). Unless the ACP is set to UTF-8, only plain ASCII path names would work correctly.

There is a std::filesystem::u8path() function which takes char input and interprets it as UTF-8, which we could use instead. That one is deprecated since C++20 though (because in C++20, there's a char8_t data type to natively signal UTF8 strings).

I guess it's simplest to just keep the existing widenPath call first and then pass the path from that to std::filesystem though, but I don't have a very strong opinion on it - u8path() might be ok.

(At least this is how MS STL and libc++ implements std::filesystem. I haven't observed libstdc++'s implementation much at all; I'm not sure how mature it is on Windows.)

Copy link
Contributor Author

@slydiman slydiman Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this patch to use widenPath() and tested it with non-ASCII symbols locally.

@RKSimon RKSimon requested a review from rnk December 9, 2024 09:39
@mstorsjo
Copy link
Member

I did some empirical testing with this patch in mingw environments; with libc++ it seems fine, all tests in check-llvm pass successfully like they did before.

However with libstdc++ (tested with the latest version in msys2, based on GCC 14), calling remove_all() simply hangs (a run on github actions timed out after 6 hours); we run into this in unittests\Support\SupportTests.exe e.g. in FileSystemTest.CreateDir.

I tried running it in a debugger, and it does seem to do ... something, with backtraces like this:

#0  0x00007ffa40b2ba94 in ntdll!RtlReAllocateHeap () from C:\Windows\SYSTEM32\ntdll.dll
#1  0x00007ffa40b4807c in ntdll!RtlGetFullPathName_UEx () from C:\Windows\SYSTEM32\ntdll.dll
#2  0x00007ffa3d0d61c7 in KERNELBASE!GetFullPathNameW () from C:\Windows\System32\KernelBase.dll
#3  0x00007ffa3db61e5d in ucrtbase!_wfullpath () from C:\Windows\System32\ucrtbase.dll
#4  0x00007ffa3dbe0527 in ucrtbase!_wsplitpath () from C:\Windows\System32\ucrtbase.dll
#5  0x00007ffa3dbe08a8 in ucrtbase!_wsplitpath () from C:\Windows\System32\ucrtbase.dll
#6  0x00007ffa3dbdf96e in ucrtbase!_wsplitpath () from C:\Windows\System32\ucrtbase.dll
#7  0x00007ffa3dbdf66d in ucrtbase!_wsplitpath () from C:\Windows\System32\ucrtbase.dll
#8  0x00007ff754fd6d36 in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&, std::error_code&) ()
#9  0x00007ff754fdbf41 in std::filesystem::remove(std::filesystem::__cxx11::path const&, std::error_code&) ()
#10 0x00007ff754fde08c in std::filesystem::__cxx11::recursive_directory_iterator::__erase(std::error_code*) ()
#11 0x00007ff754fd5681 in std::filesystem::remove_all(std::filesystem::__cxx11::path const&, std::error_code&) ()
#12 0x00007ff754d887a8 in llvm::sys::fs::remove_directories(llvm::Twine const&, bool) ()
#13 0x00007ff754aba3d7 in (anonymous namespace)::FileSystemTest_CreateDir_Test::TestBody() ()
#14 0x00007ff754e65f98 in testing::Test::Run() ()
#15 0x00007ff754e6a455 in testing::TestInfo::Run() ()
#16 0x00007ff754f0785b in testing::TestSuite::Run() [clone .part.0] ()
#17 0x00007ff754e8d65a in testing::internal::UnitTestImpl::RunAllTests() ()
#18 0x00007ff754e6b26f in testing::UnitTest::Run() ()
#19 0x00007ff7550ea640 in main ()

@RKSimon
Copy link
Collaborator

RKSimon commented Dec 10, 2024

@mstorsjo Does the log show what the dir paths looked like in the handing test?

@mstorsjo
Copy link
Member

@mstorsjo Does the log show what the dir paths looked like in the handing test?

The log didn't show anything about it, but I tried in a local build and added a printout before calling it, and the path is C:\msys64\tmp\file-system-test-e3791e/DirNameWith19Charss - so nothing very problematic (no UNC paths or anything like that).

@slydiman
Copy link
Contributor Author

The location and reason for the hang are not clear from this stack trace. It is clear that we cannot rely on libstdc++.
I don't like the idea of ​​a complete custom implementation with a bunch of heuristics.
So probably #118677 is the safest way.

@RKSimon
Copy link
Collaborator

RKSimon commented Dec 10, 2024

@mstorsjo Does the log show what the dir paths looked like in the handing test?

The log didn't show anything about it, but I tried in a local build and added a printout before calling it, and the path is C:\msys64\tmp\file-system-test-e3791e/DirNameWith19Charss - so nothing very problematic (no UNC paths or anything like that).

Forward slash - if you add an explicit path.make_preferred() - does it still hang?

@mstorsjo
Copy link
Member

@mstorsjo Does the log show what the dir paths looked like in the handing test?

The log didn't show anything about it, but I tried in a local build and added a printout before calling it, and the path is C:\msys64\tmp\file-system-test-e3791e/DirNameWith19Charss - so nothing very problematic (no UNC paths or anything like that).

Forward slash - if you add an explicit path.make_preferred() - does it still hang?

That doesn't make any difference, it still hangs.

@slydiman
Copy link
Contributor Author

std::filesystem::remove_all(std::filesystem::__cxx11::path const&, std::error_code&)

It seems __cxx11 means c++11, not c++17.

@mstorsjo
Copy link
Member

std::filesystem::remove_all(std::filesystem::__cxx11::path const&, std::error_code&)

It seems __cxx11 means c++11, not c++17.

I think that's unrelated - I would guess that it's some libstdc++ internal ABI grouping, where some classes had to break ABI betwen C++03 and C++11, and this belongs to the post-C++11 category.

@mstorsjo
Copy link
Member

I think the reason it hangs is because this test intentionally tries to create a deep nested structure of paths, like /tmp/file-system-test-ed4fa1/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss/DirNameWith19Charss, and this fails in the libstdc++ implementation. See https://github.com/llvm/llvm-project/blob/llvmorg-19.1.5/llvm/unittests/Support/Path.cpp#L1050-L1063 for reference for this test.

So that makes the issue a bit easier to understand; libstdc++'s implementation of std::filesystem is not safe for long paths on Windows.

@slydiman
Copy link
Contributor Author

Should we close this PR in favor of #118677 ?

@mstorsjo
Copy link
Member

Should we close this PR in favor of #118677 ?

Yes, I think this one can be closed.

@slydiman slydiman closed this Dec 13, 2024
@slydiman slydiman deleted the remove_dir-using-remove_all branch April 18, 2025 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants