-
-
Notifications
You must be signed in to change notification settings - Fork 735
New query condition to match all strings that starts with some other given strings #673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New query condition to match all strings that starts with some other given strings #673
Conversation
Codecov Report
@@ Coverage Diff @@
## master #673 +/- ##
===========================================
+ Coverage 52.62% 52.9% +0.28%
+ Complexity 1680 1679 -1
===========================================
Files 131 131
Lines 10120 10159 +39
Branches 1409 1409
===========================================
+ Hits 5326 5375 +49
+ Misses 4353 4341 -12
- Partials 441 443 +2
Continue to review full report at Codecov.
|
@@ -1699,6 +1701,31 @@ public void getInBackground(final String objectId, final GetCallback<T> callback | |||
* {@code ParseQuery}. | |||
* <p/> | |||
* This only works on keys whose values are {@link ParseObject}s or lists of {@link ParseObject}s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this javadoc is wrong? Add a constraint ... to match another ParseQuery
. Same for whereContainsAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so also when I copied the comment from containsAll
, but was not sure to change it.
Thanks to comment this 👍 Someone could write a good documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you can remove/reword both to their real meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments updated.
* The values that will match. | ||
* @return this, so you can chain this call. | ||
*/ | ||
public ParseQuery<T> whereContainsAllStartsWith(String key, Collection<String> values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestions for a different name? This seems obscure at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the beginning I was thinking about containsAllStartingWith
. Maybe this could be a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to understand what this does. Personally I think the cleanest and simplest wording is to keep whereStartsWith
:
whereStartsWith(key, String prefix):
check that value inkey
(must beString
) starts with prefixwhereStartsWith(key, Collection<String> prefixes)
: check that values inkey
(must beArray
) start with prefixes in the given collection (implicit: all of the constraints must be satisfied)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a good name, but I think it's good to have the All
word to know that all strings in collection must match. But I like your option too.
Let's see what other prefer.
@@ -120,6 +122,12 @@ public Object encode(Object object) { | |||
return ((ParseQuery.RelationConstraint) object).encode(this); | |||
} | |||
|
|||
if (object instanceof Pattern) { | |||
JSONObject json = new JSONObject(); | |||
json.put("$regex", object.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to avoid Pattern
s and move this logic to ParseQuery
just like all other $ constraints?
I mean using ParseQuery.addCondition(key, “$regex”, string)
. That way we keep consistencies and leave encoder untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that could be better to avoid update the ParseEncoder. I didn't researched a lot.
Now, I think could be better to update the constraintsAllStartsWith
to:
public ParseQuery<T> whereContainsAllStartsWith(String key, Collection<String> values) {
checkIfRunning();
ArrayList<KeyConstraints> startsWithConstraints = new ArrayList<>();
for (String value : values) {
KeyConstraints keyConstraints = new KeyConstraints();
keyConstraints.put("$regex", buildStartsWithRegex(value));
startsWithConstraints.add(keyConstraints);
}
return whereContainsAll(key, startsWithConstraints);
}
This might auto compile as expected. If people prefer to revert the ParseEncoder changes, I'll update the PR 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. Also since whereContainsAll is documented to work only with ParseObject
s, you could call directly builder.addCondition(key, "$all", startsWithConstraints)
and return this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the function to use KeyConstraints.
I keep using whereContsinaAll
because was not restricted to ParseObject
s. The docs were wrong.
Left some comments but other than this, nice work IMO. Let’s see how the other PRs are received as well, I guess |
I've updated this section with correct links. Other SDK contributions and necessary changesParse-SDK-JS: parse-community/Parse-SDK-JS#437 |
Reverts ParseEncoder changes
@natario1 I've added containsAllStartingWith compatibility for offline search in LDS, but obviously, it's terribly slow. I think should be here only for compatibility. The code could be optimized a little bit to avoid some regexp calls, but I think it will continue being so slow. I think that for offline search, the best approach is to use Full Text matching (whereEqualsTo) or at most, partial text matching with (whereContains). Even using this approach, LDS is really slow for searching in thousands of records as parse loads all the records into memory (#279) and performs the matches from Java code. I think that Parse it not really designed for offline search of efficient partial text matching 😞 |
It’s more an issue with mongodb than Parse. Regular expressions are very costly |
Well, maybe this could be improved by saving ParseObjects in LDS in a way that data could be queried without having to load all data into memory and perform the matches in plain Java. Is SQLite LIKE expression more efficient than performing the matches with plain Java? I don't know. Or maybe use another offline data management approach like Realm that it's really efficient in that. You know better Parse SDK than me, and how it's built, and I'm sure that will be really difficult to implement a more efficient LDS data management for a huge amount of data. But that will make working with thousands of ParseObjects in LDS great. Now is a little bit a pain, but it works 👍 Thanks for all your help as always!! 😃 |
Sorry, I thought we were talking about regular expressions on the server, (too many threads). So to go back to the client, the LDS is a blob store (on iOS) and all objects are just stored by ID, so when running a query, everything from the LDS is loaded in memory which is ineffient AF. |
Yeah, indexing will be great. Yes, I've been looking deeply at Parse LDS SQLite db file and Parse offline store to know how data is saved and queried in LDS. I've seen that is saved as plain JSON, so the only way to perform queries right now is by loading into memory and perform the matching as it's done. It's a pity that this makes a huge volume of data nearly unusable with some kind of queries in LDS. Maybe a good approach is to maintain another more efficient offline db that only saves the texts that could be searched with the associated ParseObject ids and its Class names so the search it's done outside of LDS and then we use the matching ParseObject ids to query into LDS. |
Yeah, or any list of keys that you wanna index for future searches. They could be pre-emptively set like: LDS.prepareStoreForQuery(storeName, query) In this method we’d create a new table (if needed) with json objects as well as the columns extracted from the JSON. This way, you could use your query on that store and benefit from the indexing |
Well, I think this approach will still be slow if there are thousands of records in One nice approach will be to use an offline db like realm as LDS, but as you said, you need to know the schema, but maybe the user can create the schema and then integrate it automatically. That will be great. Now I've tested the approach to save objects with |
Rhat’s Pretty much what I suggested, using a separate properly indexed storage for enduring fast queries and then resolving the objects either because they are stored in the optimized storage, or on the main pin. Wether you use SQLite or realm is an implementation detail at that point, I’m sure you could do the same with both |
And the prepareQuery would basically lookup the schema required for running the query the most efficiently because it knows the schema right? |
I don't understand your last question @flovilmart sorry 😅 Do you mean that |
Thst’s The idea! Then the storage need probably an abstraction later do we can plug realm or SQLite :) |
Codecov Report
@@ Coverage Diff @@
## master #673 +/- ##
============================================
+ Coverage 54.09% 54.24% +0.14%
- Complexity 1682 1697 +15
============================================
Files 123 123
Lines 9736 9783 +47
Branches 1361 1371 +10
============================================
+ Hits 5267 5307 +40
- Misses 4040 4042 +2
- Partials 429 434 +5
Continue to review full report at Codecov.
|
229f5d1
to
7c8a423
Compare
…tains-all-starting-with # Conflicts: # Parse/src/main/java/com/parse/OfflineQueryLogic.java
…tains-all-starting-with
…tains-all-starting-with
@rogerhu This was added in the latest version of parse-server. How does it look? |
Thanks for PR'ng. Merging.. |
rebased #839 |
Great! Thanks for merging @rogerhu and all the others that helped address this PR 😀 |
Hi to all! 🎉
This is my first contribution to Parse, and I hope it could be useful for the community.
The problem
I need a way to perform efficient searches in my database. I have researched, but I haven't found any way to perform efficient partial searches. The only way to perform efficient searches with Parse I think is by doing full text match or start with string matching.
I've read an article about performing keyword partial case insensitive. This solution is based on searching strings starting with another given strings which is really efficient with MongoDB. http://jam.sg/blog/efficient-partial-keyword-searches/
The solution
My solution was to split the string field that I want to search, in different keywords. Then, I take each of this keyword and I create different keyword versions by striping the firsts characters of each keyword with a minimum of 3 characters.
Suppose that we have this collection:
Then I create the keywords without diacritics (https://jsperf.com/diacritics/43) and in lowercase. To create the keywords, I split the original name field by white space, comas, dots, dashes, etc:
Finally I create all the keywords representations for partial search by matchings strings that starts with given strings:
Now, I create an index for the "nameSearch" field, and I can do partial matches really efficient with this query:
This increases the database size a little bit, but I think it's really worth it because it allows really good partial text searches. This is a good approach for searching short names, but not for searching in large amount of words. That will be better with full text matching and containsAll condition I think.
But the problem comes that parse does not allow calling ParseQuery.containsAll with regular expressions. So here it comes my feature.
The feature
I've created a new query condition to match all strings that starts with the given strings. I think this is really useful to make the partial keyword matching explained above.
This new condition is called containsAllStartsWith. Basically is a combination of containsAll and startsWith conditions. Given an array of strings, they are converted to regular expressions to match at the beginning of strings like in startsWith and delivered to containsAll condition.
The problem comes that the SDK sends the query as a JSON like this:
{ nameSearch: { $all: [ { $regex: "^\Qrem\E"}, {$regex: "^\Qpsum\E" ] }
This command is not supported by MongoDB, because $ is not allowed inside the $all condition. So I've updated the Parse Server to convert from regular expression JSON representation to a regular expression itself.
{ $regex: "^\Qrem\E"}
to/^\Qrem\E/
.So Parse Server converts from this:
{ nameSearch: { $all: [ { $regex: "^\Qrem\E"}, {$regex: "^\Qpsum\E" ] }
to this:
{ nameSearch: { $all: [ /^\Qrem\E/, /^\Qpsum\E/ ] }
which runs perfectly on MongoDB.
Application use
To use this kind of search in an application it's really easy.
Suppose the user enters a text in a search field
"lÒr Psû"
The application should take this string and follow the same process to remove the diacritics and split as when creating the keywords. That will result in this:
["lor", "psu"]
Finally, that array should be delivered to containsAllStartWith condition like this:
This call will end with a query in MongoDB like this:
{ nameSearch: { $all: [ /^\Qlor\E/, /^\Qpsu\E/ ] }
Thanks!
I'm happy to try to contribute to this library. I hope that this can be useful for some other people.
Please, let me know if you want to know anything else 😃
Other SDK contributions and necessary changes
Parse-SDK-JS: parse-community/Parse-SDK-JS#437
parse server: parse-community/parse-server#3864