Skip to content

[libc++] Fix regex_search to match $ alone with match_default flag #77256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

SanjayMarreddi
Copy link
Contributor

Using regex_search with the regex_constant match_default and a simple regex pattern $ is expected to match general strings such as "a", "ab", "abc"... at [last, last) positions. But, the current implementation fails to do so.

Fixes #75042

@SanjayMarreddi SanjayMarreddi requested a review from a team as a code owner January 7, 2024 21:15
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jan 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 7, 2024

@llvm/pr-subscribers-libcxx

Author: Sanjay Marreddi (SanjayMarreddi)

Changes

Using regex_search with the regex_constant match_default and a simple regex pattern $ is expected to match general strings such as "a", "ab", "abc"... at [last, last) positions. But, the current implementation fails to do so.

Fixes #75042


Full diff: https://github.com/llvm/llvm-project/pull/77256.diff

2 Files Affected:

  • (modified) libcxx/include/regex (+3)
  • (modified) libcxx/test/std/re/re.const/re.matchflag/match_not_eol.pass.cpp (+7)
diff --git a/libcxx/include/regex b/libcxx/include/regex
index b575a267583b5f..0761d9de54a95c 100644
--- a/libcxx/include/regex
+++ b/libcxx/include/regex
@@ -1889,6 +1889,9 @@ void __r_anchor_multiline<_CharT>::__exec(__state& __s) const {
   if (__s.__current_ == __s.__last_ && !(__s.__flags_ & regex_constants::match_not_eol)) {
     __s.__do_   = __state::__accept_but_not_consume;
     __s.__node_ = this->first();
+  } else if (__s.__current_ == __s.__first_ && !(__s.__flags_ & regex_constants::match_not_eol)) {
+    __s.__do_   = __state::__accept_but_not_consume;
+    __s.__node_ = this->first();
   } else if (__multiline_ && std::__is_eol(*__s.__current_)) {
     __s.__do_   = __state::__accept_but_not_consume;
     __s.__node_ = this->first();
diff --git a/libcxx/test/std/re/re.const/re.matchflag/match_not_eol.pass.cpp b/libcxx/test/std/re/re.const/re.matchflag/match_not_eol.pass.cpp
index edeea517d2537a..76f8986b2d539f 100644
--- a/libcxx/test/std/re/re.const/re.matchflag/match_not_eol.pass.cpp
+++ b/libcxx/test/std/re/re.const/re.matchflag/match_not_eol.pass.cpp
@@ -47,5 +47,12 @@ int main(int, char**)
     assert( std::regex_search(target, re, std::regex_constants::match_not_eol));
     }
 
+    {
+      std::string target = "foo";
+      std::regex re("$");
+      assert(std::regex_search(target, re));
+      assert(!std::regex_search(target, re, std::regex_constants::match_not_eol));
+    }
+
   return 0;
 }

Copy link
Member

@mordante mordante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks good, I'd like one more test before approval.

@mordante mordante self-assigned this Jan 8, 2024
Copy link
Member

@mordante mordante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM! Do you need assistance to land this patch?

@SanjayMarreddi
Copy link
Contributor Author

I'm sorry, I don't think I fully understood what you asked. If you have asked, if I need assistance to get this merged, yes!!

@mordante mordante merged commit 0804ef2 into llvm:main Jan 9, 2024
@mordante
Copy link
Member

mordante commented Jan 9, 2024

I'm sorry, I don't think I fully understood what you asked. If you have asked, if I need assistance to get this merged, yes!!

Yes it was about merging. Before we used Phabricator and used the word landing.

@SanjayMarreddi
Copy link
Contributor Author

I'm sorry, I don't think I fully understood what you asked. If you have asked, if I need assistance to get this merged, yes!!

Yes it was about merging. Before we used Phabricator and used the word landing.

Now, I get it :) Thanks!!

@alexfh
Copy link
Contributor

alexfh commented Jan 16, 2024

It looks like the patch changes the behavior of search for patterns anchored both to the start and to the end of the input, e.g. after this change std::regex_search finds a pattern like ^b*$ in strings that don't match it (like "a"). Is this intended?

The behavior is now different to that of libstdc++: https://gcc.godbolt.org/z/hcd4acrb6

#include <regex>
#include <string>

int main() {
  std::string t = "a";
  std::regex re {"^b*$"};
  std::smatch matches;
  return std::regex_search(t, matches, re);
}

returns 1 on libc++ and 0 on libstdc++ (and libc++ before this change).

@SanjayMarreddi
Copy link
Contributor Author

@alexfh Thanks much for letting us know about this. I will look into it.

@alexfh
Copy link
Contributor

alexfh commented Jan 16, 2024

As usual: if the behavior is not intended (which I suspect is true) and there's no obvious and reliable fix, I'd suggest to revert first.

@ldionne
Copy link
Member

ldionne commented Jan 16, 2024

This seems broken to me -- the pattern ^b*$ shouldn't match a. @SanjayMarreddi if you're unsure how to fix this forward, let's revert the commit and add this regression test case, and then work on re-landing the patch without breaking that case. Let me know if you need help with the revert.

@SanjayMarreddi
Copy link
Contributor Author

I have created the revert. Will add this test case and work on relanding the patch.

alexfh pushed a commit that referenced this pull request Jan 17, 2024
…fault` flag" (#78349)

The behavior of `std::regex_search` for patterns anchored both to the
start and to the end of the input went wrong after merging #77256 .
Patterns like `"^b*$"` started matching the strings such as `"a"`, which
is not expected.

Reverts the PR: #77256
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
…fault` flag" (llvm#78349)

The behavior of `std::regex_search` for patterns anchored both to the
start and to the end of the input went wrong after merging llvm#77256 .
Patterns like `"^b*$"` started matching the strings such as `"a"`, which
is not expected.

Reverts the PR: llvm#77256
@SanjayMarreddi
Copy link
Contributor Author

I have fixed the above test case along with the $ alone test case and created a PR: #78845

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…lag (llvm#77256)

Using `regex_search` with the regex_constant `match_default` and a
simple regex pattern `$` is expected to match general strings such as
_"a", "ab", "abc"..._ at `[last, last)` positions. But, the current
implementation fails to do so.

Fixes llvm#75042
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[libc++] std::regex $ alone with match_default doesn't match [last, last)
5 participants