-
Notifications
You must be signed in to change notification settings - Fork 468
Add RegexSet functionality to C API #288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These functions implement a C interface to the RegexSet api. Some notes: * These do not include start offsets as the standard regex functions do. The reason being is down to how these are implemented in the core regex crate. The RegexSet api does not expose a public is_match_at whilst the Regex api does. * This only tests a complete compile/match mainly for sanity. One or two more tests targetting the specific areas would be preferred. * Set matches take a mutuable array to fill with results. This is more C-like and allows the caller to manage the memory on the stack if they want.
@@ -165,7 +172,7 @@ rure *rure_compile(const uint8_t *pattern, size_t length, | |||
/* | |||
* rure_free frees the given compiled regular expression. | |||
* | |||
* This must be called at most once. | |||
* This must be called at most once for any rure. | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly more clear I think. rure_set_free
should match this.
|
||
/* | ||
* rure_set_matches compares each regex in the set against the haystack and | ||
* returns an array of bools which correspond to if a match was found for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "to if a" -> "to a"
* caring which, use rure_set_is_match. | ||
*/ | ||
bool rure_set_matches(rure_set *re, const uint8_t *haystack, size_t length, | ||
const bool *matches); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be bool *matches
instead of const bool *matches
? (Since the data pointed to by matches
will be modified.)
unsafe { Box::from_raw(re as *mut RegexSet); } | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want to expose RegexSet::len
here so that callers can use it to allocate matches
.
Thanks for doing this! At a first glance, this looks pretty good with a few nits. So one thing that we should consider before merging this is how to incorporate The core problem you'll run into is that neither of these things is actually exposed in the Maybe we can come up with a straw man proposal for the C API and convince ourselves that it can be added later, so that we can merge this sooner.
The inconsistency is in the The inconsistency exists for a reason. Getting the match offsets is hard.
This is probably OK. Mostly, I think the C tests should just make sure that FFI is working. Testing the semantics of RegexSet is better suited for the
Perfect. |
I had a bit more of a look into
New function names are just placeholders. And actually, just that seems like it would be sufficient for the lack of options to me. This can easily be exposed to the C API. This would be a backwards compatible API change, so would adjust the versioning however you are doing it. A corresponding
If that is the case and I can just omit the start argument from the match functions.
I'll just add a regex failure test for sanity. Thanks for the quick response, let me know if the |
I noted previously that we would need some extra functions in the regex crate. This seems like an incompatible change however after reading the regex 1.0 RFC (at least in the short-term as you previously said). It looks like enough of the interface is public that we can just construct the |
Sorry I'm dragging, but the past few days have been action packed (including a trip to the animal ER). Using These options do need to get exposed in the To be clear, I'm happy merging this PR without these options in the C API, as long as there's a reasonable path to adding them in the future. |
Not a problem. I'd like to get something thought out in that would minimize any breaking future api changes. I'd also like to avoid modifying the regex crate if possible right now. With these goals in mind, here is what I suggest.
This should be flexible enough for future improvements without any breaking api changes. Also, as an aside regarding the I'll be offline till Monday but that day (which is thankfully a public holiday for me) I'll implement these and see how it all fits. |
Sounds good. Two things:
In fact, I think both of these things means that you end up working around the use of |
Just a note that I've finished adding these features. |
@tiehuis Thank you! |
These functions implement a C interface to the RegexSet api.
Some notes:
do. The reason being is down to how these are implemented in the core
regex crate. The RegexSet api does not expose a public is_match_at
whilst the Regex api does. I would actually like to change this if possible
as the inconsistency this makes between
rure
andrure_set
types isnot great.
two more tests targeting the specific areas would be preferred.
C-like and allows the caller to manage the memory on the stack if
they want.
Wanted to make a request before targeting these changes to clarify.