---

steveklabnik · steveklabnik · commit fd14fdd3120a · 2014-11-28T10:06:08.000-05:00
yaml --- r: 162711 b: refs/heads/try c: 4d1cb78 h: refs/heads/master i: 162709: 81ec966 162707: 235e075 162703: c146997 v: v3
diff --git a/[refs] b/[refs]
@@ -2,7 +2,7 @@
 refs/heads/master: 9146a919b616e39e528e4d7100d16eef52f1f852
 refs/heads/snap-stage1: e33de59e47c5076a89eadeb38f4934f58a3618a6
 refs/heads/snap-stage3: cafe2966770ff377aad6dd9fd808e68055587c58
-refs/heads/try: f33d879a7094bce7e16345dcc2efa85da6f05261
+refs/heads/try: 4d1cb7820de50899f2009da20f83b639df2873f0
 refs/tags/release-0.1: 1f5c5126e96c79d22cb7862f75304136e204f105
 refs/heads/dist-snap: ba4081a5a8573875fed17545846f6f6902c8ba8d
 refs/tags/release-0.2: c870d2dffb391e14efb05aa27898f1f6333a9596
diff --git a/branches/try/src/doc/complement-lang-faq.md b/branches/try/src/doc/complement-lang-faq.md
@@ -108,7 +108,7 @@ The `str` type is UTF-8 because we observe more text in the wild in this encodin
 
 This does mean that indexed access to a Unicode codepoint inside a `str` value is an O(n) operation. On the one hand, this is clearly undesirable; on the other hand, this problem is full of trade-offs and we'd like to point out a few important qualifications:
 
-* Scanning a `str` for ASCII-range codepoints can still be done safely octet-at-a-time, with each indexing operation pulling out a `u8` costing only O(1) and producing a value that can be cast and compared to an ASCII-range `char`. So if you're (say) line-breaking on `'\n'`, octet-based treatment still works. UTF8 was well-designed this way.
+* Scanning a `str` for ASCII-range codepoints can still be done safely octet-at-a-time. If you use `.as_bytes()`, pulling out a `u8` costs only O(1) and produces a value that can be cast and compared to an ASCII-range `char`. So if you're (say) line-breaking on `'\n'`, octet-based treatment still works. UTF-8 was designed with this in mind: every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII byte.
 * Most "character oriented" operations on text only work under very restricted sets of language assumptions, such as "ASCII-range codepoints only". Outside the ASCII range, you tend to have to use a complex (non-constant-time) algorithm for determining linguistic-unit (glyph, word, paragraph) boundaries anyway. We recommend using an "honest", linguistically aware, Unicode-approved algorithm.
 * The `char` type is UCS4. If you honestly need to do a codepoint-at-a-time algorithm, it's trivial to write a `type wstr = [char]`, and unpack a `str` into it in a single pass, then work with the `wstr`. In other words: the fact that the language is not "decoding to UCS4 by default" shouldn't stop you from decoding (or re-encoding any other way) if you need to work with that encoding.
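A minimal sketch of the first and third points above, written against current stable Rust rather than the 2014 dialect in this commit. The helper names `count_newlines`, `split_lines_bytewise`, and `to_codepoints` are hypothetical, chosen only for illustration. The sketch also assumes `Vec<char>` as the owned codepoint buffer standing in for the FAQ's `wstr`, since a bare `[char]` is an unsized slice and cannot be built on its own.

```rust
/// Count line breaks by scanning the UTF-8 bytes one octet at a time.
/// Each byte from `as_bytes()` is read in O(1), cast to `char`, and
/// compared against the ASCII-range `'\n'`.
fn count_newlines(s: &str) -> usize {
    s.as_bytes().iter().filter(|&&b| b as char == '\n').count()
}

/// Split a string into lines using only byte-level scanning.
/// Slicing at a `'\n'` byte is always on a char boundary, because no
/// byte of a multi-byte UTF-8 sequence falls in the ASCII range.
fn split_lines_bytewise(s: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut start = 0;
    for (i, &b) in s.as_bytes().iter().enumerate() {
        if b as char == '\n' {
            lines.push(&s[start..i]);
            start = i + 1;
        }
    }
    // Whatever follows the last newline (possibly empty) is the final line.
    lines.push(&s[start..]);
    lines
}

/// Decode the whole string in a single pass into codepoints.
/// After this, indexing a codepoint is O(1) instead of O(n).
fn to_codepoints(s: &str) -> Vec<char> {
    s.chars().collect()
}
```

For example, with `let text = "héllo\nwörld\n日本";`, `count_newlines(text)` is 2, `split_lines_bytewise(text)` yields `["héllo", "wörld", "日本"]`, and `to_codepoints(text)` has 14 elements even though `text.len()` is 20 bytes, so `to_codepoints(text)[12]` is `'日'`.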