
Commit cb2e313

committed
---
yaml --- r: 162045 b: refs/heads/auto c: 4d1cb78 h: refs/heads/master i: 162043: 216eea7 v: v3
1 parent db8c92d commit cb2e313

File tree

2 files changed: +2 -2 lines changed


[refs]

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ refs/tags/release-0.3: b5f0d0f648d9a6153664837026ba1be43d3e2503
 refs/tags/release-0.3.1: 495bae036dfe5ec6ceafd3312b4dca48741e845b
 refs/tags/release-0.4: e828ea2080499553b97dfe33b3f4d472b4562ad7
 refs/tags/release-0.5: 7e3bcfbf21278251ee936ad53e92e9b719702d73
-refs/heads/auto: f33d879a7094bce7e16345dcc2efa85da6f05261
+refs/heads/auto: 4d1cb7820de50899f2009da20f83b639df2873f0
 refs/heads/servo: af82457af293e2a842ba6b7759b70288da276167
 refs/tags/release-0.6: b4ebcfa1812664df5e142f0134a5faea3918544c
 refs/tags/0.1: b19db808c2793fe2976759b85a355c3ad8c8b336

branches/auto/src/doc/complement-lang-faq.md

Lines changed: 1 addition & 1 deletion

@@ -108,7 +108,7 @@ The `str` type is UTF-8 because we observe more text in the wild in this encodin
 
 This does mean that indexed access to a Unicode codepoint inside a `str` value is an O(n) operation. On the one hand, this is clearly undesirable; on the other hand, this problem is full of trade-offs and we'd like to point a few important qualifications:
 
-* Scanning a `str` for ASCII-range codepoints can still be done safely octet-at-a-time, with each indexing operation pulling out a `u8` costing only O(1) and producing a value that can be cast and compared to an ASCII-range `char`. So if you're (say) line-breaking on `'\n'`, octet-based treatment still works. UTF8 was well-designed this way.
+* Scanning a `str` for ASCII-range codepoints can still be done safely octet-at-a-time. If you use `.as_bytes()`, pulling out a `u8` costs only O(1) and produces a value that can be cast and compared to an ASCII-range `char`. So if you're (say) line-breaking on `'\n'`, octet-based treatment still works. UTF8 was well-designed this way.
 * Most "character oriented" operations on text only work under very restricted language assumptions sets such as "ASCII-range codepoints only". Outside ASCII-range, you tend to have to use a complex (non-constant-time) algorithm for determining linguistic-unit (glyph, word, paragraph) boundaries anyways. We recommend using an "honest" linguistically-aware, Unicode-approved algorithm.
 * The `char` type is UCS4. If you honestly need to do a codepoint-at-a-time algorithm, it's trivial to write a `type wstr = [char]`, and unpack a `str` into it in a single pass, then work with the `wstr`. In other words: the fact that the language is not "decoding to UCS4 by default" shouldn't stop you from decoding (or re-encoding any other way) if you need to work with that encoding.
 
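As a side note on the bullet this commit rewords: below is a minimal sketch, in current Rust syntax rather than the 2014-era Rust this FAQ targets, of the two patterns the list describes. `.as_bytes()` comes straight from the new wording; the function names (`count_lines`, `to_wstr`) and the example string are illustrative, not part of the commit.

```rust
// Illustrative only; not part of this commit's diff.

/// Octet-at-a-time scan: `.as_bytes()` exposes the UTF-8 octets of a `&str`,
/// and indexing a `&[u8]` is O(1). UTF-8 never reuses ASCII byte values
/// inside a multi-byte sequence, so comparing each `u8` against `b'\n'`
/// is safe without decoding codepoints.
fn count_lines(text: &str) -> usize {
    text.as_bytes().iter().filter(|&&b| b == b'\n').count() + 1
}

/// Codepoint-at-a-time work: decode the whole `&str` once into a `Vec<char>`
/// (each `char` is a full Unicode scalar value, i.e. the "UCS4"/`wstr` idea
/// from the last bullet), after which indexed access is O(1).
fn to_wstr(text: &str) -> Vec<char> {
    text.chars().collect()
}

fn main() {
    let text = "héllo\nwörld";
    assert_eq!(count_lines(text), 2); // byte-level line break detection

    let wstr = to_wstr(text);
    assert_eq!(wstr[1], 'é'); // O(1) indexed access to a codepoint
}
```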
