
Commit e677f34

Merge pull request #947 from jansol/master
Add subchapter about (byte)string literals
2 parents c973ef7 + f8bf8a5 commit e677f34


1 file changed (+105, -0 lines)


src/std/str.md

Lines changed: 105 additions & 0 deletions
@@ -57,5 +57,110 @@ More `str`/`String` methods can be found under the
[std::string][string]
modules

## Literals and escapes

There are multiple ways to write string literals with special characters in them.
All of them result in a `&str`, so it's best to use whichever form is most
convenient to write. Similarly, there are multiple ways to write byte string literals,
which all result in `&[u8; N]`.

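To make that claim concrete, here is a small sketch. The function names `plain`, `escaped`, `raw`, `bytes` and `raw_bytes` are made up for illustration. Each function's return type states the type of its literal, so the program only compiles if every string form really is a `&'static str` and every byte string form really is a `&'static [u8; 4]`.

```rust,editable
// Hypothetical helpers: each return type is the claim being checked.
fn plain() -> &'static str { "Rust" }
fn escaped() -> &'static str { "\x52\x75\x73\x74" } // hex escapes for R, u, s, t
fn raw() -> &'static str { r"Rust" }

// Byte string literals are references to fixed-size arrays; here N = 4.
fn bytes() -> &'static [u8; 4] { b"Rust" }
fn raw_bytes() -> &'static [u8; 4] { br"Rust" }

fn main() {
    // Different spellings, identical values.
    assert_eq!(plain(), escaped());
    assert_eq!(plain(), raw());
    assert_eq!(bytes(), raw_bytes());

    // A `&[u8; N]` coerces to a `&[u8]` slice wherever one is expected.
    let slice: &[u8] = bytes();
    println!("{} {:?}", plain(), slice);
}
```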
Generally, special characters are escaped with a backslash character: `\`.
This way you can add any character to your string, even unprintable ones
and ones that you don't know how to type. If you want a literal backslash,
escape it with another one: `\\`.

String or character literal delimiters occurring within a literal must be escaped: `"\""`, `'\''`.

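The sketch below exercises these escapes. The helper functions are hypothetical and exist only so that each literal can be checked on its own. It also shows that only the delimiter of the current literal needs escaping: `'` inside `"..."` and `"` inside `'...'` may be written with or without a backslash.

```rust,editable
// Hypothetical helpers, one escaped literal each.
fn backslash() -> &'static str {
    "\\" // a single backslash character
}

fn quoted() -> &'static str {
    "\"quoted\"" // a double quote must be escaped inside "..."
}

fn apostrophe() -> char {
    '\'' // a single quote must be escaped inside '...'
}

fn control() -> &'static str {
    "a\tb\n" // tab and newline, which are awkward to type directly
}

fn main() {
    println!("{} {} {} {:?}", backslash(), quoted(), apostrophe(), control());

    // The other kind of quote needs no escape, though one is allowed.
    assert_eq!("'", "\'");
    assert_eq!('"', '\"');
}
```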
```rust,editable
fn main() {
    // You can use escapes to write characters by their hexadecimal values
    // (0x00 to 0x7F in a `str`)...
    let byte_escape = "I'm writing \x52\x75\x73\x74!";
    println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);

    // ...or Unicode code points.
    let unicode_codepoint = "\u{211D}";
    let character_name = "\"DOUBLE-STRUCK CAPITAL R\"";

    println!("Unicode character {} (U+211D) is called {}",
                unicode_codepoint, character_name);

    let long_string = "String literals
                        can span multiple lines.
                        The linebreak and indentation here ->\
                        <- can be escaped too!";
    println!("{}", long_string);
}
```

Sometimes there are just too many characters that need to be escaped, or it's simply
more convenient to write a string out as-is. This is where raw string literals come into play:
inside `r"..."`, a backslash is not treated as the start of an escape.

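As a sketch of when raw strings pay off, compare a Windows-style path (a made-up example) written both ways. The hypothetical functions `escaped_path` and `raw_path` return the same `&str`. A single `#` on each side is enough for a raw literal that contains bare double quotes.

```rust,editable
// Hypothetical helpers returning the same made-up path, spelled two ways.
fn escaped_path() -> &'static str {
    "C:\\Users\\ferris\\notes.txt" // every backslash doubled
}

fn raw_path() -> &'static str {
    r"C:\Users\ferris\notes.txt" // backslashes taken literally
}

fn raw_pattern() -> &'static str {
    r#"\d+ "items""# // one # lets the literal contain bare double quotes
}

fn main() {
    assert_eq!(escaped_path(), raw_path());
    println!("{}", raw_path());
    println!("{}", raw_pattern());
}
```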
```rust,editable
fn main() {
    let raw_str = r"Escapes don't work here: \x3F \u{211D}";
    println!("{}", raw_str);

    // If you need quotes in a raw string, add a pair of #s
    let quotes = r#"And then I said: "There is no escape!""#;
    println!("{}", quotes);

    // If you need "# in your string, just use more #s in the delimiter.
    // You can use up to 255 #s.
    let longer_delimiter = r###"A string with "# in it. And even "##!"###;
    println!("{}", longer_delimiter);
}
```

Want a string that's not UTF-8? (Remember, `str` and `String` must be valid UTF-8.)
Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!

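Here is a minimal sketch of treating a byte string as data. The helper names `greeting` and `high_bytes` are hypothetical. Indexing gives a `u8`, which compares directly with a byte literal such as `b'h'`, and the array reference coerces to a `&[u8]` slice. In a `str` literal, `\x` escapes are limited to `\x00` through `\x7F`. Byte strings accept any value up to `\xFF`.

```rust,editable
// Hypothetical helpers returning byte string literals.
fn greeting() -> &'static [u8; 5] {
    b"hello"
}

fn high_bytes() -> &'static [u8; 2] {
    // Fine in a byte string; "\xFF" in a plain string literal would not compile.
    b"\xFF\x80"
}

fn main() {
    let g = greeting();
    // Indexing yields a `u8`, comparable with the byte literal b'h'.
    assert_eq!(g[0], b'h');

    // `&[u8; N]` coerces to `&[u8]`, so slice methods are available.
    let slice: &[u8] = g;
    assert!(slice.starts_with(b"he"));

    println!("{:?} {:?}", g, high_bytes());
}
```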
```rust,editable
use std::str;

fn main() {
    // Note that this is not actually a `&str`
    let bytestring: &[u8; 20] = b"this is a bytestring";

    // Byte arrays don't implement `Display`, so printing them is a bit limited
    println!("A bytestring: {:?}", bytestring);

    // Byte strings can have byte escapes...
    let escaped = b"\x52\x75\x73\x74 as bytes";
    // ...but no Unicode escapes
    // let escaped = b"\u{211D} is not allowed";
    println!("Some escaped bytes: {:?}", escaped);

    // Raw byte strings work just like raw strings
    let raw_bytestring = br"\u{211D} is not escaped here";
    println!("{:?}", raw_bytestring);

    // Converting a byte array to `str` can fail
    if let Ok(my_str) = str::from_utf8(raw_bytestring) {
        println!("And the same as text: '{}'", my_str);
    }

    let quotes = br#"You can also use "fancier" formatting, \
                    like with normal raw strings"#;

    // Byte strings don't have to be UTF-8
    let shift_jis = b"\x82\xe6\x82\xa4\x82\xb1\x82\xbb"; // "ようこそ" in Shift-JIS

    // But then they can't always be converted to `str`
    match str::from_utf8(shift_jis) {
        Ok(my_str) => println!("Conversion successful: '{}'", my_str),
        Err(e) => println!("Conversion failed: {:?}", e),
    };
}
```

For conversions between character encodings, check out the [encoding][encoding-crate] crate.

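The encoding crate's own API is not shown here. As a std-only sketch (the helper names are hypothetical), the standard library offers several tools:

- It can report where invalid UTF-8 begins via `Utf8Error::valid_up_to`.
- It can decode lossily with `String::from_utf8_lossy`, which substitutes U+FFFD for invalid sequences.
- It can convert between UTF-8 and UTF-16 with `str::encode_utf16` and `String::from_utf16`.

Legacy encodings such as Shift-JIS still need a crate.

```rust,editable
use std::string::FromUtf16Error;

/// Number of leading bytes that form valid UTF-8.
fn valid_prefix_len(bytes: &[u8]) -> usize {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.len(),
        Err(e) => e.valid_up_to(),
    }
}

/// Decodes bytes, replacing each invalid sequence with U+FFFD instead of failing.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Round-trips text through UTF-16, an encoding std converts to and from directly.
fn utf16_round_trip(text: &str) -> Result<String, FromUtf16Error> {
    let units: Vec<u16> = text.encode_utf16().collect();
    String::from_utf16(&units)
}

fn main() {
    let mixed = b"Rust \xFF!";
    println!("valid prefix: {} bytes", valid_prefix_len(mixed));
    println!("lossy: {}", decode_lossy(mixed));
    println!("{:?}", utf16_round_trip("ようこそ"));
}
```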
A more detailed listing of the ways to write string literals and escape characters
is given in the ['Tokens' chapter][tokens] of the Rust Reference.

[str]: https://doc.rust-lang.org/std/str/
[string]: https://doc.rust-lang.org/std/string/
[tokens]: https://doc.rust-lang.org/reference/tokens.html
[encoding-crate]: https://crates.io/crates/encoding
