Skip to content

Commit f9877d1

Browse files
spetruniaGunnar Kudrjavets
authored andcommitted
mysql#42: Range in form tbl.key > const doesn't work in reverse column family
Summary: Fix issue mysql#42. Also issue mysql#43: Ranges on t.key > const are not handled efficiently: Make ha_rocksdb::index_read_map() do the right thing when invoked with find_flag=HA_READ_KEY_OR_NEXT, both for forward and reverse column families. Detailed explanation of actions is provided in rocksdb-range-access.txt Test Plan: mtr t/rocksdb_range.test, used gcov to check the new code is covered Reviewers: maykov, hermanlee4, jonahcohen, jtolmer, yoshinorim Reviewed By: yoshinorim Differential Revision: https://reviews.facebook.net/D35103
1 parent 2933a57 commit f9877d1

File tree

2 files changed

+116
-60
lines changed

2 files changed

+116
-60
lines changed

storage/rocksdb/ha_rocksdb.cc

Lines changed: 41 additions & 60 deletions
Original file line numberDiff line numberDiff line change
@@ -2696,7 +2696,8 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
26962696
keypart_map);
26972697

26982698
if (find_flag == HA_READ_PREFIX_LAST_OR_PREV ||
2699-
find_flag == HA_READ_PREFIX_LAST)
2699+
find_flag == HA_READ_PREFIX_LAST ||
2700+
find_flag == HA_READ_AFTER_KEY)
27002701
{
27012702
/* See below */
27022703
kd->successor(sec_key_packed_tuple, packed_size);
@@ -2715,7 +2716,7 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
27152716
greater than the lookup tuple.
27162717
*/
27172718
setup_index_scan(kd, &slice, use_all_keys, is_ascending(kd, find_flag));
2718-
bool move_forward= true;
2719+
//bool move_forward= true;
27192720

27202721

27212722
switch (find_flag) {
@@ -2753,11 +2754,45 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
27532754
}
27542755
break;
27552756
}
2757+
case HA_READ_BEFORE_KEY:
2758+
{
2759+
//move_forward= false;
2760+
/* We want to read the record that's right *before* the given key. */
2761+
if (!scan_it->Valid())
2762+
{
2763+
/*
2764+
All the values in the database are smaller than our key. Two cases
2765+
- our index is the last in db. Its last value is a match
2766+
- our index has no records (in that case we will get a record from
2767+
our index and detect it below)
2768+
*/
2769+
scan_it->SeekToLast();
2770+
}
2771+
else
2772+
{
2773+
/*
2774+
RocksDB iterator is positioned at "the first key in the source that
2775+
at or past target".
2776+
We need to step one key back, so that we're at the last key that is
2777+
before the target.
2778+
If the passed key is greater than the max. value that is found in the
2779+
table, then iterator is pointing at the *first* record in subsequent
2780+
table/index.
2781+
*/
2782+
scan_it->Prev();
2783+
}
2784+
/* fall through */
2785+
}
2786+
case HA_READ_AFTER_KEY:
27562787
case HA_READ_KEY_OR_NEXT:
27572788
{
27582789
/*
27592790
We are looking for the first record such that
2760-
index_tuple >= lookup_tuple.
2791+
2792+
index_tuple $GT lookup_tuple
2793+
2794+
with HA_READ_AFTER_KEY, $GT = '>',
2795+
with HA_READ_KEY_OR_NEXT, $GT = '>='
27612796
*/
27622797
if (key_descr[active_index]->is_reverse_cf)
27632798
{
@@ -2768,8 +2803,9 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
27682803
else
27692804
{
27702805
/*
2771-
When using a full extended key tuple: we should not step back if
2772-
we've found an exact match.
2806+
We should step back
2807+
- when not using full extended key
2808+
- when using full extended key and when we've got an exact match
27732809
*/
27742810
rkey= scan_it->key();
27752811
if (!using_full_key || !kd->value_matches_prefix(rkey, slice))
@@ -2794,61 +2830,6 @@ int ha_rocksdb::index_read_map(uchar *buf, const uchar *key,
27942830
}
27952831
break;
27962832
}
2797-
case HA_READ_BEFORE_KEY:
2798-
{
2799-
move_forward= false;
2800-
/* We want to read the record that's right *before* the given key. */
2801-
if (!scan_it->Valid())
2802-
{
2803-
/*
2804-
All the values in the database are smaller than our key. Two cases
2805-
- our index is the last in db. Its last value is a match
2806-
- our index has no records (in that case we will get a record from
2807-
our index and detect it below)
2808-
*/
2809-
scan_it->SeekToLast();
2810-
}
2811-
else
2812-
{
2813-
/*
2814-
RocksDB iterator is positioned at "the first key in the source that
2815-
at or past target".
2816-
We need to step one key back, so that we're at the last key that is
2817-
before the target.
2818-
If the passed key is greater than the max. value that is found in the
2819-
table, then iterator is pointing at the *first* record in subsequent
2820-
table/index.
2821-
*/
2822-
scan_it->Prev();
2823-
}
2824-
/* fall through */
2825-
}
2826-
case HA_READ_AFTER_KEY:
2827-
{
2828-
bool in_key;
2829-
bool have_row;
2830-
/*
2831-
Walk forward until we've found a record that is not equal to the lookup
2832-
tuple, but still belongs to this index.
2833-
*/
2834-
while ((have_row= scan_it->Valid()))
2835-
{
2836-
rkey= scan_it->key();
2837-
if (!(in_key= kd->covers_key(rkey.data(), rkey.size())) ||
2838-
kd->cmp_full_keys(rkey.data(), rkey.size(),
2839-
slice.data(), slice.size(),
2840-
n_used_parts))
2841-
break;
2842-
2843-
if (move_forward)
2844-
scan_it->Next();
2845-
else
2846-
scan_it->Prev();
2847-
}
2848-
if (!have_row || !in_key)
2849-
rc= HA_ERR_END_OF_FILE;
2850-
break;
2851-
}
28522833
case HA_READ_KEY_OR_PREV:
28532834
{
28542835
if (!scan_it->Valid())

storage/rocksdb/rocksdb-range-access.txt

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -139,3 +139,78 @@ RocksDB calls:
139139

140140
if (it->Valid() && kd->covers_key(..))
141141
return record.
142+
143+
== HA_READ_AFTER_KEY, forward CF ==
144+
145+
This is finding min(key) such that key > lookup_key.
146+
147+
Suppose lookup_key = kv-bbb
148+
149+
( kv )-aaa-pk
150+
# ( kv )-bbb
151+
( kv )-bbb-pk1 <--- Seek("kv-bbb") will put us here. We need to
152+
( kv )-bbb-pk2 get to the value that is next after 'bbb'.
153+
( kv )-bbb-pk3
154+
( kv )-bbb-pk4
155+
( kv )-bbb-pk5
156+
( kv )-ccc-pkN <-- That is, we need to be here.
157+
158+
However, we don't know that the next value is kv-ccc. Instead, we seek to the
159+
first value that strictly greater than 'kv-bbb'. It is Successor(kv-bbb).
160+
161+
It doesn't matter if we're using a full extended key or not.
162+
163+
RocksDB calls:
164+
165+
Seek(Successor(kv-bbb));
166+
if (it->Valid() && kd->covers_key(it.key()))
167+
return record;
168+
169+
Note that the code is the same as with HA_READ_KEY_OR_NEXT, except that
170+
we seek to Successor($lookup_key) instead of $lookup_key itself.
171+
172+
== HA_READ_AFTER_KEY, backward CF ==
173+
174+
Suppose, the lookup key is 'kv-bbb':
175+
176+
(kv+1)-xxx-pk
177+
( kv )-ccc-pk7
178+
( kv )-ccc-pk6 <-- We need to be here.
179+
# Successor(kv-bbb) <-- We get here when we call Seek(Successor(kv-bbb))
180+
( kv )-bbb-pk5 and we will need to call Prev() (*)
181+
( kv )-bbb-pk4
182+
( kv )-bbb-pk3
183+
( kv )-bbb-pk2
184+
( kv )-bbb-pk1
185+
# ( kv )-bbb <-- We would get here if we called Seek(kv-bbb).
186+
( kv )-aaa-pk
187+
188+
(*) - unless Successor(kv-bbb)=(kv-ccc), and Seek(kv-ccc) hits the row. In
189+
that case, we won't need to call Prev().
190+
191+
RocksDB calls:
192+
193+
Seek(Successor(kv-bbb));
194+
if (!it->Valid())
195+
{
196+
/*
197+
We may get EOF if rows with 'kv-bbb' (below the Successor... line in the
198+
diagram) do not exist. This doesn't mean that rows with values kv-ccc
199+
do not exist.
200+
*/
201+
it->SeekToLast();
202+
}
203+
else
204+
{
205+
if (!using_full_key ||
206+
!kd->value_matches_prefix(...))
207+
{
208+
it->Prev();
209+
}
210+
}
211+
212+
if (it->Valid() && kd->covers_key(...))
213+
return record.
214+
215+
Note that the code is the same as with HA_READ_KEY_OR_NEXT, except that
216+
we seek to Successor($lookup_key) instead of $lookup_key itself.

0 commit comments

Comments
 (0)