Fix an issue with loading bundles with multi-byte character #2820

wu-hui · 2021-07-12T01:48:17Z

Both BundleReader and the testing BundlerBuilder should be counting UTF-8 bytes, instead of Java char count.

This is a fix for: #2805
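For readers unfamiliar with the distinction, here is a minimal sketch of how the two counts diverge for the emoji used in this PR's test. The class and method names are illustrative and do not come from the SDK.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: Utf8LengthDemo is a hypothetical class, not part of the SDK.
final class Utf8LengthDemo {
  /** Number of UTF-16 code units, which is what String.length() reports. */
  static int charCount(String s) {
    return s.length();
  }

  /** Number of bytes the string occupies once encoded as UTF-8. */
  static int utf8ByteCount(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String ascii = "b";
    String emoji = "\uD83D\uDE0A"; // U+1F60A: one code point, a surrogate pair in UTF-16
    System.out.println(charCount(ascii) + " chars, " + utf8ByteCount(ascii) + " bytes");
    System.out.println(charCount(emoji) + " chars, " + utf8ByteCount(emoji) + " bytes");
  }
}
```

For ASCII-only input the two counts agree, which is presumably why the mismatch went unnoticed until a bundle contained multi-byte characters.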

# Conflicts: # firebase-firestore/CHANGELOG.md

google-oss-bot · 2021-07-12T02:00:09Z

Coverage Report

Affected SDKs

`firebase-firestore`

SDK overall coverage changed from 47.18% (4f1cdc8) to 47.10% (559e5f5e) by -0.08%.

Filename	Base (`4f1cdc8`)	Head (559e5f5e)	Diff
AsyncQueue.java	78.39%	76.88%	-1.51%
BundleReader.java	94.87%	95.29%	+0.42%
ByteBufferInputStream.java	90.91%	83.33%	-7.58%
FirestoreClient.java	34.96%	30.08%	-4.88%
LruGarbageCollector.java	93.46%	84.11%	-9.35%

Test Logs

Notes

HTML coverage reports can be produced locally with ./gradlew <product>:checkCoverage.
Report files are located at <product-build-dir>/reports/jacoco/.

Head commit (559e5f5e) is created by Prow via merging commits: 4f1cdc8 7daf5cb.

google-oss-bot · 2021-07-12T02:03:16Z

Binary Size Report

Affected SDKs

firebase-firestore

Type	Base (4f1cdc8)	Head (559e5f5e)	Diff
aar	1.03 MB	1.03 MB	+237 B (+0.0%)
apk (release)	3.19 MB	3.19 MB	+388 B (+0.0%)

Test Logs

Notes

Head commit (559e5f5e) is created by Prow via merging commits: 4f1cdc8 7daf5cb.

dconeybe · 2021-07-12T14:01:49Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

@@ -153,29 +152,50 @@ private int indexOfOpenBracket() {
    }
  }

+  private class ReadJsonResult {


drive-by comment: this class should be static, and probably final.

dconeybe · 2021-07-12T14:04:30Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

-      buffer.position(buffer.position() + read);
+      byte[] bytes = new byte[read];
+      buffer.get(bytes);
+      json.append(new String(bytes, charset));


drive-by comment: This is dangerous because bytes could end in the middle of a multi-byte character. You could possibly use a class like java.nio.charset.CharsetDecoder to handle this properly.

https://docs.oracle.com/javase/7/docs/api/java/nio/charset/CharsetEncoder.html

The change doesn't fix the issue. For example, consider the situation where the last byte of bytes specified to new String(bytes, charset) is the first byte of a multi-byte character, and the second byte of that character would be the first byte of the bytes in the next iteration; this would probably either fail decoding with an exception or would produce garbage.

These are the bytes that the bundle tells the SDK form a UTF-8 string. If what you described happens, it means the bundle is invalid, and an exception is exactly what we want here.

IIUC the code reads the bundle's bytes in "chunks". Wouldn't there be arbitrary breaks in the "chunks" such that a multibyte character could span two chunks?

You are right. This is problematic. I pushed a new commit to fix it.

But now, what if bytesToRead > BUFFER_CAPACITY? IIUC, with the updated code, if the buffer fills up then pullMoreData() will return false and readJson() will incorrectly throw IllegalArgumentException.

I have another idea. What if you go back to the original implementation but change StringBuilder to ByteArrayOutputStream? You can then accumulate the bytes in the ByteArrayOutputStream and, once all bytes are read, decode the entire byte sequence, avoiding the multi-byte-character-spanning-a-chunk issue.

Yeah, that is better.
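The difference between the two approaches discussed in this thread can be sketched as follows. This is a standalone illustration, not the SDK code: the class name, method names, and chunk size are assumptions. Note that `new String(bytes, charset)` substitutes replacement characters for malformed input rather than throwing, so the per-chunk variant silently produces garbage.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: ChunkedDecodeDemo is a hypothetical class, not part of the SDK.
final class ChunkedDecodeDemo {
  /** Decodes every chunk on its own, so a character split across chunks is corrupted. */
  static String decodePerChunk(byte[] data, int chunkSize) {
    StringBuilder json = new StringBuilder();
    for (int start = 0; start < data.length; start += chunkSize) {
      int len = Math.min(chunkSize, data.length - start);
      json.append(new String(data, start, len, StandardCharsets.UTF_8));
    }
    return json.toString();
  }

  /** Accumulates the raw bytes of all chunks and decodes once at the end. */
  static String decodeAfterAccumulating(byte[] data, int chunkSize) {
    ByteArrayOutputStream jsonBytes = new ByteArrayOutputStream();
    for (int start = 0; start < data.length; start += chunkSize) {
      int len = Math.min(chunkSize, data.length - start);
      jsonBytes.write(data, start, len);
    }
    return new String(jsonBytes.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String original = "a\uD83D\uDE0A"; // 1 ASCII byte followed by a 4-byte character
    byte[] data = original.getBytes(StandardCharsets.UTF_8);
    // With 2-byte chunks, the emoji's bytes are split across three chunks.
    System.out.println(original.equals(decodePerChunk(data, 2)));
    System.out.println(original.equals(decodeAfterAccumulating(data, 2)));
  }
}
```

The accumulate-then-decode variant does not depend on where the chunk boundaries fall, which is the property the suggestion above is after.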

dconeybe · 2021-07-12T18:38:15Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

-      buffer.position(buffer.position() + read);
+      byte[] bytes = new byte[read];
+      buffer.get(bytes);
+      json.append(new String(bytes, charset));


The change doesn't fix the issue. For example, consider the situation where the last byte of bytes specified to new String(bytes, charset) is the first byte of a multi-byte character, and the second byte of that character would be the first byte of the bytes in the next iteration; this would probably either fail decoding with an exception or would produce garbage.

dconeybe · 2021-07-13T15:26:16Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

@@ -185,9 +186,12 @@ private String readJsonString(int length) throws IOException {
   */
  private boolean pullMoreData() throws IOException {
    buffer.compact();
-    int read = dataReader.read(buffer);
+    int bytesToRead = Math.min(bundleInputStream.available(), buffer.remaining());


available() is allowed to return 0 even before the end of the input stream. I think you should use buffer.remaining() if available() returns 0; otherwise, the read operation could erroneously fail as if it had prematurely reached the end of input.

We need a test that hits this issue.

To reply to Sebastian: I am not sure how to do this without mocking. available() is documented as an estimate, which is pretty accurate for most implementations.
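One possible way to exercise this case without a mocking framework is a small `FilterInputStream` subclass whose `available()` always returns 0. The sketch below is an assumption about how such a test input could look, not code from this PR; all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: ZeroAvailableDemo is a hypothetical class, not part of the SDK.
final class ZeroAvailableDemo {
  /** Wraps a stream and always reports 0 available bytes, which the contract permits. */
  static final class ZeroAvailableInputStream extends FilterInputStream {
    ZeroAvailableInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int available() {
      return 0;
    }
  }

  /** Sizes the read from available(), similar to the revision under review. */
  static int readSizedByAvailable(InputStream in, byte[] dst) {
    try {
      int bytesToRead = Math.min(in.available(), dst.length);
      return in.read(dst, 0, bytesToRead); // a zero-length read returns 0
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Sizes the read from the destination capacity and ignores available(). */
  static int readSizedByCapacity(InputStream in, byte[] dst) {
    try {
      return in.read(dst, 0, dst.length);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    byte[] dst = new byte[16];
    InputStream first = new ZeroAvailableInputStream(new ByteArrayInputStream(data));
    InputStream second = new ZeroAvailableInputStream(new ByteArrayInputStream(data));
    System.out.println(readSizedByAvailable(first, dst));
    System.out.println(readSizedByCapacity(second, dst));
  }
}
```

A caller that treats a non-positive read result as "no more data" would stop early with the first variant even though the stream still holds all of its bytes.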

dconeybe · 2021-07-13T15:31:18Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

@@ -185,9 +186,12 @@ private String readJsonString(int length) throws IOException {
   */
  private boolean pullMoreData() throws IOException {
    buffer.compact();
-    int read = dataReader.read(buffer);
+    int bytesToRead = Math.min(bundleInputStream.available(), buffer.remaining());
+    byte[] bytes = new byte[bytesToRead];


Consider reading directly into the ByteBuffer's backing array, instead of allocating a new byte array.

s/Consider reading/Please read

Similarly, I would rather avoid performance optimization without proof. The end result is some byte counting, which is bad for readability.

The other place we do this is acceptable, because there is no good alternative.

You are doubling the amount of memory consumed here. Gil pushed back heavily against this in the original PR.

I still think this is premature, but done anyway.
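A minimal sketch of what reading directly into the backing array could look like. It assumes a heap `ByteBuffer` (one for which `hasArray()` is true) that is kept in read mode between calls; the class name and the method signature are illustrative, not the merged SDK code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: DirectReadDemo is a hypothetical class, not part of the SDK.
final class DirectReadDemo {
  /**
   * Tops up a heap ByteBuffer from the stream without an intermediate byte[].
   * The buffer is expected in read mode on entry and is left in read mode on exit.
   * Returns true if at least one byte was added.
   */
  static boolean pullMoreData(InputStream in, ByteBuffer buffer) {
    buffer.compact(); // keep unread bytes and switch to write mode
    int read;
    try {
      read = in.read(
          buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (read > 0) {
      buffer.position(buffer.position() + read); // account for the bytes just written
    }
    buffer.flip(); // back to read mode
    return read > 0;
  }

  public static void main(String[] args) {
    InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    ByteBuffer buffer = ByteBuffer.allocate(8);
    buffer.flip(); // start empty, in read mode
    System.out.println(pullMoreData(in, buffer) + " " + buffer.remaining());
    System.out.println(pullMoreData(in, buffer) + " " + buffer.remaining());
  }
}
```

The manual position bookkeeping is the readability cost raised above; the gain is that no second array of the same size is allocated per read.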

dconeybe · 2021-07-13T15:34:15Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

    buffer.get(c);
-    return new String(c);
+    return charset.decode(ByteBuffer.wrap(c)).toString();


readLengthPrefix() could return the byte count as well as the String, to avoid forcing the caller to re-encode the String into bytes. As with the other suggestion, this could be deferred to a follow-up PR if you prefer.

Note that indexOfOpenBracket returns the char-index of the first bracket. Using a byte buffer here is not correct, but it likely won't make a difference since this part of the input should be ASCII.

dconeybe · 2021-07-13T15:35:54Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

      int read = Math.min(remaining, buffer.remaining());
-      json.append(buffer, 0, read);
+      jsonBytes.write(buffer.array(), buffer.position(), read);


You need to add buffer.arrayOffset() to buffer.position().

https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html#arrayOffset()
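To illustrate why the offset matters, here is a small sketch using a sliced buffer, whose `arrayOffset()` is non-zero. The class and helper names are hypothetical. For a buffer created with `ByteBuffer.allocate`, the offset is 0 in practice, so the omission would only show up with sliced or otherwise offset views.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: ArrayOffsetDemo is a hypothetical class, not part of the SDK.
final class ArrayOffsetDemo {
  /** Copies from the backing array using position() alone; wrong for offset views. */
  static String copyIgnoringOffset(ByteBuffer buffer, int length) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(buffer.array(), buffer.position(), length);
    return new String(out.toByteArray(), StandardCharsets.US_ASCII);
  }

  /** Copies from the backing array using arrayOffset() + position(). */
  static String copyWithOffset(ByteBuffer buffer, int length) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(buffer.array(), buffer.arrayOffset() + buffer.position(), length);
    return new String(out.toByteArray(), StandardCharsets.US_ASCII);
  }

  /** Builds a 5-byte view over "hello" whose backing array starts with two extra bytes. */
  static ByteBuffer slicedView() {
    byte[] backing = "xxhello".getBytes(StandardCharsets.US_ASCII);
    return ByteBuffer.wrap(backing, 2, 5).slice(); // position 0, arrayOffset 2
  }

  public static void main(String[] args) {
    System.out.println(copyIgnoringOffset(slicedView(), 5));
    System.out.println(copyWithOffset(slicedView(), 5));
  }
}
```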

Moreover, buffer is a CharBuffer. CharBuffer deals with UTF-8 directly:

    String unicode = "\uD83D\uDE0A";
    byte[] b = unicode.getBytes();
    CharBuffer buffer = CharBuffer.allocate(10);
    new InputStreamReader(new ByteArrayInputStream(b)).read(buffer);
    StringBuilder result = new StringBuilder();
    buffer.flip();
    System.out.println(buffer.remaining());
    result.append(buffer);
    System.out.println(result);

Prints:

    2
    😊

FYI buffer was changed from CharBuffer to ByteBuffer in one of the recent commits.

Got it, but the existing code uses it throughout and creates the initial input stream using UTF-8. It should work (and the test passes), so I am a bit concerned that the bug is somewhere else.

Yeah, the tests pass because the test bundle builder also counts characters, not bytes.

schmidt-sebastian · 2021-07-13T19:15:49Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

    buffer.get(c);
-    return new String(c);
+    return charset.decode(ByteBuffer.wrap(c)).toString();


Note that indexOfOpenBracket returns the char-index of the first bracket. Using a byte buffer here is not correct, but it likely won't make a difference since this part of the input should be ASCII.

schmidt-sebastian · 2021-07-13T19:27:12Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

      int read = Math.min(remaining, buffer.remaining());
-      json.append(buffer, 0, read);
+      jsonBytes.write(buffer.array(), buffer.position(), read);


Moreover, buffer is a CharBuffer. CharBuffer deals with UTF-8 directly:

    String unicode = "\uD83D\uDE0A";
    byte[] b = unicode.getBytes();
    CharBuffer buffer = CharBuffer.allocate(10);
    new InputStreamReader(new ByteArrayInputStream(b)).read(buffer);
    StringBuilder result = new StringBuilder();
    buffer.flip();
    System.out.println(buffer.remaining());
    result.append(buffer);
    System.out.println(result);

Prints:

    2
    😊

schmidt-sebastian · 2021-07-13T19:37:12Z

firebase-firestore/src/androidTest/java/com/google/firebase/firestore/BundleTest.java

@@ -71,7 +72,7 @@
            + "{\"seconds\":1000,\"nanos\":9999},\"exists\":true}}",
        "{\"document\":{\"name\":\"projects/{projectId}/databases/(default)/documents/coll-1/b\","
            + "\"createTime\":{\"seconds\":1,\"nanos\":9},\"updateTime\":{\"seconds\":1,"
-            + "\"nanos\":9},\"fields\":{\"k\":{\"stringValue\":\"b\"},\"bar\":"
+            + "\"nanos\":9},\"fields\":{\"k\":{\"stringValue\":\"\uD83D\uDE0A\"},\"bar\":"


Note that this test passes against master as well.

Because the bundles built here also count chars, not bytes.

See the other change in this test; it uses the byte count, which fails the tests.

Sorry, I just saw this flaw in my logic. So the bug is that we try to read more data than available.
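The mismatch described in this thread can be sketched as follows, assuming a bundle element is written as a decimal length prefix immediately followed by its JSON. The helper names are hypothetical and are not the ones used in BundleTest.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: LengthPrefixDemo is a hypothetical class, not the test's bundle builder.
final class LengthPrefixDemo {
  /** Prefixes the element with its UTF-16 char count (the flawed test-builder behavior). */
  static String prefixWithCharCount(String json) {
    return json.length() + json;
  }

  /** Prefixes the element with its UTF-8 byte count. */
  static String prefixWithUtf8ByteCount(String json) {
    return json.getBytes(StandardCharsets.UTF_8).length + json;
  }

  public static void main(String[] args) {
    String json = "{\"k\":\"\uD83D\uDE0A\"}";
    System.out.println(prefixWithCharCount(json));
    System.out.println(prefixWithUtf8ByteCount(json));
  }
}
```

When both the builder and the reader count chars, the two agree with each other and the test passes even though neither matches the byte length of the encoded element.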

schmidt-sebastian · 2021-07-13T19:37:50Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

@@ -185,9 +186,12 @@ private String readJsonString(int length) throws IOException {
   */
  private boolean pullMoreData() throws IOException {
    buffer.compact();
-    int read = dataReader.read(buffer);
+    int bytesToRead = Math.min(bundleInputStream.available(), buffer.remaining());


We need a test that hits this issue.

schmidt-sebastian · 2021-07-13T19:38:25Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

@@ -185,9 +186,12 @@ private String readJsonString(int length) throws IOException {
   */
  private boolean pullMoreData() throws IOException {
    buffer.compact();
-    int read = dataReader.read(buffer);
+    int bytesToRead = Math.min(bundleInputStream.available(), buffer.remaining());
+    byte[] bytes = new byte[bytesToRead];


s/Consider reading/Please read

dconeybe

I approve, but please ensure that Sebastian's comments are resolved.

schmidt-sebastian · 2021-07-13T20:15:40Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

  long bytesRead;

  public BundleReader(BundleSerializer serializer, InputStream data) {
    this.serializer = serializer;
-    dataReader = new InputStreamReader(data, charset);
-    buffer = CharBuffer.allocate(BUFFER_CAPACITY);
+    bundleInputStream = data;


Can you rename the argument to bundleInputStream as well? We could also drop the prefix similar to serializer.

Optional.

schmidt-sebastian · 2021-07-13T21:18:48Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

@@ -133,9 +135,9 @@ private BundleElement readNextElement() throws IOException, JSONException {
      throw abort("Reached the end of bundle when a length string is expected.");
    }

-    char[] c = new char[nextOpenBracket];
+    byte[] c = new byte[nextOpenBracket];


schmidt-sebastian · 2021-07-13T21:29:44Z

firebase-firestore/src/main/java/com/google/firebase/firestore/bundle/BundleReader.java

+    int available = bundleInputStream.available();
+    int bytesToRead = Math.min(available, buffer.remaining());
+    // `available` is an estimation, we still try to read if the estimation is 0, to move things
+    // forward.
+    if (available == 0) {
+      bytesToRead = buffer.remaining();
+    }


Can we always read buffer.remaining?

Yes, since we are reading into the buffer directly now!

google-oss-bot · 2021-07-13T21:54:12Z

@wu-hui: The following test failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
smoke-tests	`7daf5cb`	link	`/test smoke-tests`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

schmidt-sebastian

LGTM

wu-hui added 10 commits February 26, 2021 13:43

Change bundle protos package

113d5de

Merge branch 'master' of github.com:firebase/firebase-android-sdk

61da4fa

Merge branch 'master' of github.com:firebase/firebase-android-sdk

4cb7c31

Merge branch 'master' of github.com:firebase/firebase-android-sdk

df1b260

Merge branch 'master' of github.com:firebase/firebase-android-sdk

0b8f106

Merge branch 'master' of github.com:firebase/firebase-android-sdk

1d0d296

Merge branch 'master' of github.com:firebase/firebase-android-sdk

e94805e

Merge branch 'master' of github.com:firebase/firebase-android-sdk

d973ea3

Merge branch 'master' of github.com:firebase/firebase-android-sdk

2252f4e

Fix an issue with loading bundles with multi-byte unicodes

cd7f17b

google-oss-bot added the size/L label Jul 12, 2021

googlebot added the cla: yes Override cla label Jul 12, 2021

wu-hui added 2 commits July 11, 2021 21:50

Merge branch 'master' of github.com:firebase/firebase-android-sdk

0865332

Merge branch 'master' into wuandy/BundleUnicodeFix

654235c

# Conflicts: # firebase-firestore/CHANGELOG.md

wu-hui requested a review from schmidt-sebastian July 12, 2021 01:54

wu-hui assigned schmidt-sebastian Jul 12, 2021

dconeybe requested changes Jul 12, 2021

Decode byte array properly.

266ec18

dconeybe requested changes Jul 12, 2021

wu-hui added 3 commits July 12, 2021 17:19

Fix a potential bug.

c7e5cd6

More fixes

87d73f2

Remove unused class.

32fa86e

google-oss-bot added size/M and removed size/L labels Jul 13, 2021

Rename method.

d67e1f5

dconeybe reviewed Jul 13, 2021

wu-hui added 2 commits July 13, 2021 11:11

Fix one more bug.

3db6481

Changelog update.

33b1f4a

dconeybe requested changes Jul 13, 2021

schmidt-sebastian reviewed Jul 13, 2021

schmidt-sebastian assigned wu-hui and unassigned schmidt-sebastian Jul 13, 2021

More feedback.

ad600bd

google-oss-bot added size/L and removed size/M labels Jul 13, 2021

wu-hui assigned schmidt-sebastian and dconeybe and unassigned wu-hui Jul 13, 2021

dconeybe reviewed Jul 13, 2021

dconeybe approved these changes Jul 13, 2021

dconeybe assigned wu-hui and unassigned dconeybe Jul 13, 2021

schmidt-sebastian reviewed Jul 13, 2021

wu-hui added 2 commits July 13, 2021 16:30

More more feedback.

91ded51

Feedback++

91ad89a

schmidt-sebastian approved these changes Jul 13, 2021

Feedback++++

7daf5cb

schmidt-sebastian approved these changes Jul 13, 2021

wu-hui merged commit 76b029c into master Jul 14, 2021

wu-hui deleted the wuandy/BundleUnicodeFix branch July 14, 2021 01:52

firebase locked and limited conversation to collaborators Aug 14, 2021
