Skip to content

TextMatcher should not match @null@ inside pattern #45

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

adamquaile
Copy link

The TextMatcher currently reports that it matches against all strings. During matching against strings with @null@ in them an UnknownTypeException will be thrown - after all a pattern like this should be @null@ makes no sense in a text context.

This prevents other matchers from being able to match, so I've changed it to not match in this case.

Reproduce code:

<?php

use Coduo\PHPMatcher\Factory\SimpleFactory;

require_once __DIR__ . '/vendor/autoload.php';

$expected = <<<JSON
{
    "a": @null@,
    "b": 4
}
JSON;


$actual = <<<JSON
{
    "a": null,
    "b": 5
}
JSON;


$matcher = (new SimpleFactory())->createMatcher();


echo $matcher->match($actual, $expected)
    ? 'Match found' :
    'Match not found: ' . $matcher->getError()
;

@norberttech
Copy link
Member

Indeed matching @null@ in TextMatcher have no sense.
Maybe we could just simply modify TextMatcher::PATTERN_REGEXP

const PATTERN_REGEXP = "/@(?!null)[a-zA-Z\\.]+@(\\.[a-zA-Z0-9_]+\\([a-zA-Z0-9{},:@\\.\"'\\(\\)]*\\))*/";

(typped from head) should fix issue you found. Could you check that?

@adamquaile
Copy link
Author

That stops the exception, but in a different way. See the output from your change:

Match not found: "{
    "a": null,
    "b": 5
}" does not match "{
    "a": @null@,
    "b": 4
}" pattern

Whereas when the TextMatcher abstains from trying to match, it looks like this:

Match not found: "5" does not match "4".

By matching with the most appropriate matcher an not attempting to match when it is guaranteed to fail, you get a more meaningful "non-match" message, and less work done to get there. I also think it's a bit more readable, but that's subjective.

Your thoughts?

@norberttech
Copy link
Member

I totally agree. We need make error messages as clean as possible I'm just not sure if I like the way how you exclude text patterns from being matched.
I was thinking about this issue yesterday and I might have an idea for another solution.
I guess we could expand canMatch in TextMatcher to check if passed text is not one of following formats:

  • json - we can extract json validation rules from JsonMatcher
  • xml - we can extract xml validation rules from XmlMatcher
  • yaml - I would like to add yaml files support in future but lets ignore it now

This should solve issue with not descriptive error messages.
I think we should also add "null" to RegexpConverter more or less like in #46 in order to prevent throwing UnknownTypeException or handle this exception and convert it into a error message that used patter contains invalid type matchers.

@adamquaile what do You think?
P.S. sorry for my long response time, I have a flu : <

@adamquaile
Copy link
Author

@norzechowicz What is it you don't like about the way it's excluded? Do you think it's too specific to this bug / use-case?

In my mind, the point of the canMatch($pattern) does seem like a sensible place to put the logic. I'm undecided whether we should exclude it if it happens to be json/xml/etc.. These are still text, and maybe we would want to use the text matcher on them - trying to think of an example of this still.

If we decide to take it out of canMatch, then I like the idea of handling the UnknownTypeException and setting $this->error, but (correct me if I'm wrong) this only makes sense when there's one matcher, i.e. when we exclude all other types.

@norberttech
Copy link
Member

@adamquaile yea I just dont like the fact that text can't be matched because there is a @null@ anywhere in this text.
About excluding xml/json/etc I cant find any good reason why to not exclude those formats from matching in TextMatcher. As long as passed json/xml have valid format we have XmlMatcher/JsonMatcher matchers that should be used in this case.
And yes, handling UnknownTypeException have sense only when we exclude other types.

@adamquaile
Copy link
Author

I agree with you about the excluding json/xml them; if there are any use-cases for this, I'm sure they're edge-cases. Although I would give thought to other formats you might support in the future. It's not really likely to create text which is accidentally JSON or XML, but maybe other formats for example YML maybe? I think --- and a: b are valid YML. This is text! is valid markdown. Even if you don't plan to support these, you might want to make it extensible and allow devs to make their own matchers. The logic should work regardless.

Anyway, that's probably a small case and getting a bit off topic. Just something to think about.

yea I just dont like the fact that text can't be matched because there is a @null@ anywhere in this text.

I see your point. How do you think matching this is a string with @null@ inside with this is a string with inside should work? Should it return false, or throw an exception to the user?

@norberttech
Copy link
Member

Pattern: "this is a string with @null@ inside with @number@"
Value: "this is a string with @null@ inside with 252"
Above example should pass.
Pattern: "this is a string with @null@ inside with @number@"
Value: "this is a string with inside with 252"
Above example should fail as there is no null in simple text.

About YAML support, IMO everything that can be parsed by YAML parser is a valid YAML file so it should be matched by YAML matcher.
For example if you will have:
Pattern: "a: b"
Value: "a: @integer@"
Then it would be even better to parse it with YamlMatcher instead of TextMatcher cuz error will message be like "b is not a valid integer" instead of "a: b does not match a: @integer@".
And if patter will be only Yaml similar (with wrong indention in one line for example) then it will still be matched by TextMatcher as validation Yaml::isValid($value) will fail for this example. What do You think?

@adamquaile
Copy link
Author

Pattern: "this is a string with @null@ inside with @Number@"
Value: "this is a string with inside with 252"
Above example should fail as there is no null in simple text.

Gotcha. Happy with this.

About YAML support, IMO everything that can be parsed by YAML parser is a valid YAML file so it should be matched by YAML matcher.

Agreed, although I wouldn't say 100% that it should be disallowed matching by another format.

I was thinking about this issue yesterday and I might have an idea for another solution.
I guess we could expand canMatch in TextMatcher to check if passed text is not one of following formats

All things considered, I think you're right, and we should exclude the text matcher if it's JSON or XML, but I'm not sure if that responsibility should lie with the text matcher or somewhere higher up. If and when you add a YamlMatcher, the TextMatcher shouldn't have to change. Any ideas?

@norberttech
Copy link
Member

Because there will be at least 2 poinst (JsonMatcher and TextMatcher) where we are going to check if string is a valid json/xml/yaml I think it would be nice to extract this logic somewhere else.
Maybe simple Json and Xml classes with static method isValid($value) method?
It will give us possibility to use it like in following example in canMatch methods:

public function canMatch($value)
{
    if (Json::isValid($value) || Xml::isValid($value) { 
         return false;
    }
}

But above code breaks open closed principle so I guess we could extract logic from canMatch method into something like SimpleTextValidator class, what do You think?

@adamquaile
Copy link
Author

I'm not sure I follow what the SimpleTextValidator would be doing in this case.

Yes, it's difficult to get a nice solution without over-engineering it. Better than the static methods is injecting two validators, but I think the main part I don't like is putting any knowledge or dependency on other formats into the TextMatcher.

Perhaps there is an interface to extract out, and we abstain if either of the validators implementing this interface matches. I just can't think what the name might be in that case. This would then take the logic of text being excluded if json matches back out into the builder class based on how the objects are built together.

@norberttech
Copy link
Member

If there will be more than one implementation of that extracted interface I'm ok with that. Anyway it would be greate if you could provide some code that we could discuss later.
Thanks for your engagement i really appreciate it :)

@adamquaile
Copy link
Author

I'll certainly play around and see if I can come up with something nice. I'm just looking quickly now, and we can't check the value is valid JSON without adding the $value parameter to canMatch globally, as we only have the pattern. I will look more into it.

I'm sure there's other ways we can look at it, too. For example, we could have TextMatcher delegate to one of JsonMatcher or XmlMatcher if they claim to be able to match. Or we could selectively choose which matchers are used based on the pattern/value. Again, just thinking out-loud; I'll get some code when I've had a chance to think about it properly.

@norberttech
Copy link
Member

I'm not sure if we need $value in canMatch. canMatch was created to check if specific matcher can match specific pattern against any value.

@adamquaile
Copy link
Author

No, I agree - we don't. But we can't test if the pattern alone is valid JSON for example, because some patterns aren't valid. We could only check if they're JSON-like.

Consider pattern from original description above:

$expected = <<<JSON
{
    "a": @null@,
    "b": 4
}
JSON;

@norberttech
Copy link
Member

Right, thats why in JsonMatcher there is a method that transform type patterns into strings so inside of canMatch there is a transformation and above example is turned into:

$expected = <<<JSON
{
    "a": "@null@",
    "b": 4
}
JSON;

@adamquaile
Copy link
Author

Right.. I see. And would that still match correctly? null and "null" in JSON are obviously 2 very different things.

@norberttech
Copy link
Member

It should only transform type patterns (strings between @ character) so it will ignore null but it will transform @null@ into "@null@"

@norberttech
Copy link
Member

@adamquaile thanks for reporting this issue 🍻 . #49 adds everything that we discussed here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants