5; // integer type
```

In the descriptions below, the _string representation_ of a token is the sequence of characters from the input which matched the token's production in a *Lexer* grammar snippet.

> **Note**: this string representation never includes a character `U+000D` (CR) immediately followed by `U+000A` (LF): this pair would have been previously transformed into a single `U+000A` (LF).

## Escapes

The descriptions of textual literal expressions below make use of several forms of _escape_.

Each form of escape is characterised by:
* an _escape sequence_: a sequence of characters, which always begins with `U+005C` (`\`)
* an _escaped value_: either a single character or an empty sequence of characters

In the definitions of escapes below:
* An _octal digit_ is any of the characters in the range \[`0`-`7`].
* A _hexadecimal digit_ is any of the characters in the ranges \[`0`-`9`], \[`a`-`f`], or \[`A`-`F`].

### Simple escapes

Each sequence of characters occurring in the first column of the following table is an escape sequence.

In each case, the escaped value is the character given in the corresponding entry in the second column.

| Escape sequence | Escaped value            |
|-----------------|--------------------------|
| `\0`            | U+0000 (NUL)             |
| `\t`            | U+0009 (HT)              |
| `\n`            | U+000A (LF)              |
| `\r`            | U+000D (CR)              |
| `\"`            | U+0022 (QUOTATION MARK)  |
| `\'`            | U+0027 (APOSTROPHE)      |
| `\\`            | U+005C (REVERSE SOLIDUS) |
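
As an illustration only, the table can be read as a lookup from the character that follows `\` to the escaped value. The helper `simple_escape_value` below is hypothetical (it is not part of this Reference or of the standard library); the assertions in `main` check it against the compiler's own reading of the same escapes.

```rust
/// Hypothetical helper: given the character that follows `\` in a simple
/// escape, return the escaped value from the table (None for anything else).
fn simple_escape_value(after_backslash: char) -> Option<char> {
    match after_backslash {
        '0' => Some('\u{0000}'),  // NUL
        't' => Some('\u{0009}'),  // HT
        'n' => Some('\u{000A}'),  // LF
        'r' => Some('\u{000D}'),  // CR
        '"' => Some('\u{0022}'),  // QUOTATION MARK
        '\'' => Some('\u{0027}'), // APOSTROPHE
        '\\' => Some('\u{005C}'), // REVERSE SOLIDUS
        _ => None,
    }
}

fn main() {
    // The compiler's own escapes agree with the table.
    assert_eq!(simple_escape_value('0'), Some('\0'));
    assert_eq!(simple_escape_value('t'), Some('\t'));
    assert_eq!(simple_escape_value('n'), Some('\n'));
    assert_eq!(simple_escape_value('r'), Some('\r'));
    assert_eq!(simple_escape_value('"'), Some('"'));
    assert_eq!(simple_escape_value('\''), Some('\''));
    assert_eq!(simple_escape_value('\\'), Some('\\'));
    // `\x` begins a different form of escape, so it is not in the table.
    assert_eq!(simple_escape_value('x'), None);
}
```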

### 8-bit escapes

The escape sequence consists of `\x` followed by two hexadecimal digits.

The escaped value is the character whose [Unicode scalar value] is the result of interpreting the final two characters in the escape sequence as a hexadecimal integer, as if by [`u8::from_str_radix`] with radix 16.

> **Note**: the escaped value therefore has a [Unicode scalar value] in the range of [`u8`][numeric types].
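
A minimal sketch of the 8-bit escape rule, assuming the input is the whole escape sequence as a `&str`. `eight_bit_escape_value` is a hypothetical helper; it returns the escaped value's scalar value directly as a `u8`, which the note above guarantees is possible. It checks the digit class before calling `u8::from_str_radix`, because that function on its own also accepts a leading `+` sign.

```rust
/// Hypothetical helper: decode an 8-bit escape such as `\xA0`.
/// Returns None unless `seq` is `\x` followed by exactly two hexadecimal digits.
fn eight_bit_escape_value(seq: &str) -> Option<u8> {
    let digits = seq.strip_prefix("\\x")?;
    // Check the digits explicitly: `from_str_radix` alone would accept "+A".
    if digits.len() != 2 || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    u8::from_str_radix(digits, 16).ok()
}

fn main() {
    assert_eq!(eight_bit_escape_value(r"\x52"), Some(b'\x52')); // 82, 'R'
    assert_eq!(eight_bit_escape_value(r"\xA0"), Some(160));
    assert_eq!(eight_bit_escape_value(r"\xff"), Some(255)); // lowercase digits are allowed
    assert_eq!(eight_bit_escape_value(r"\x+A"), None);
    assert_eq!(eight_bit_escape_value(r"\x5"), None);
}
```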

### 7-bit escapes

The escape sequence consists of `\x` followed by an octal digit then a hexadecimal digit.

The escaped value is the character whose [Unicode scalar value] is the result of interpreting the final two characters in the escape sequence as a hexadecimal integer, as if by [`u8::from_str_radix`] with radix 16.
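
The same decoding applies to 7-bit escapes, with a stricter first digit. In the hypothetical helper `seven_bit_escape_value` below, requiring an octal first digit is what keeps every result at or below `0x7F`, so the escaped value is always an ASCII character.

```rust
/// Hypothetical helper: decode a 7-bit escape such as `\x41`.
/// Returns None unless `seq` is `\x`, an octal digit, then a hexadecimal digit.
fn seven_bit_escape_value(seq: &str) -> Option<char> {
    let digits = seq.strip_prefix("\\x")?;
    let mut chars = digits.chars();
    let (high, low) = (chars.next()?, chars.next()?);
    if chars.next().is_some() || !high.is_digit(8) || !low.is_ascii_hexdigit() {
        return None;
    }
    // An octal first digit caps the value at 0x7F.
    let value = u8::from_str_radix(digits, 16).ok()?;
    Some(char::from(value))
}

fn main() {
    assert_eq!(seven_bit_escape_value(r"\x41"), Some('A'));
    assert_eq!(seven_bit_escape_value(r"\x7F"), Some('\x7F'));
    // `8` is not an octal digit, so `\x80` is not a 7-bit escape
    // (a char literal written '\x80' does not compile).
    assert_eq!(seven_bit_escape_value(r"\x80"), None);
}
```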
74
+
75
+
### Unicode escapes
76
+
77
+
The escape sequence consists of `\u{`, followed by a sequence of characters each of which is a hexadecimal digit or `_`, followed by `}`.
78
+
79
+
The escaped value is the character whose [Unicode scalar value] is the result of interpreting the hexadecimal digits contained in the escape sequence as a hexadecimal integer, as if by [`u8::from_str_radix`] with radix 16.
80
+
81
+
> **Note**: the permitted forms of a [CHAR_LITERAL] or [STRING_LITERAL] token ensure that there is such a character.
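
A sketch of the Unicode escape rule, using a hypothetical helper `unicode_escape_value`. Underscores are dropped before parsing because `u32::from_str_radix` does not accept them, and `char::from_u32` returns `None` for values that are not Unicode scalar values (surrogates, or anything above `0x10FFFF`). For brevity it does not enforce the lexer's six-digit limit or its ban on a leading `_`.

```rust
/// Hypothetical helper: decode a Unicode escape such as `\u{1_F600}`.
fn unicode_escape_value(seq: &str) -> Option<char> {
    let inner = seq.strip_prefix("\\u{")?.strip_suffix('}')?;
    if !inner.chars().all(|c| c.is_ascii_hexdigit() || c == '_') {
        return None;
    }
    // `_` only separates digits; `from_str_radix` would reject it, so drop it.
    let digits: String = inner.chars().filter(|&c| c != '_').collect();
    let value = u32::from_str_radix(&digits, 16).ok()?;
    // None for surrogates (U+D800..=U+DFFF) and for values above 0x10FFFF.
    char::from_u32(value)
}

fn main() {
    assert_eq!(unicode_escape_value(r"\u{00E6}"), Some('\u{E6}'));
    assert_eq!(unicode_escape_value(r"\u{1_F600}"), Some('\u{1F600}'));
    assert_eq!('\u{1_F600}', '\u{1F600}'); // the compiler also ignores `_`
    assert_eq!(unicode_escape_value(r"\u{D800}"), None); // a surrogate
}
```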

### String continuation escapes

The escape sequence consists of `\` followed immediately by `U+000A` (LF), and all following whitespace characters before the next non-whitespace character.
For this purpose, the whitespace characters are `U+0009` (HT), `U+000A` (LF), `U+000D` (CR), and `U+0020` (SPACE).

The escaped value is an empty sequence of characters.
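
For example (a sketch; the constant name `GREETING` is illustrative), a string continuation escape removes the line break and the indentation that follows it, while whitespace before the `\` is ordinary content and is kept:

```rust
/// Illustrative constant: the `\` + LF and the following indentation are
/// dropped; the space before the `\` is kept.
const GREETING: &str = "hello, \
                        world";

fn main() {
    assert_eq!(GREETING, "hello, world");
    println!("{}", GREETING);
}
```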

## Character literal expressions

A character literal expression consists of a single [CHAR_LITERAL] token.

The expression's type is the primitive [`char`][textual types] type.

The token must not have a suffix.

The token's _literal content_ is the sequence of characters following the first `U+0027` (`'`) and preceding the last `U+0027` (`'`) in the string representation of the token.

The literal expression's _represented character_ is derived from the literal content as follows:

* If the literal content is one of the following forms of escape sequence, the represented character is the escape sequence's escaped value:
    * [Simple escapes]
    * [7-bit escapes]
    * [Unicode escapes]

* Otherwise the represented character is the single character that makes up the literal content.

The expression's value is the [`char`][textual types] corresponding to the represented character's [Unicode scalar value].

> **Note**: the permitted forms of a [CHAR_LITERAL] token ensure that these rules always produce a single character.

Examples of character literal expressions:

```rust
'R'; // R
'\''; // '
'\x52'; // R
'\u{00E6}'; // LATIN SMALL LETTER AE (U+00E6)
```
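
To see the type and value rules together, the examples above can be gathered into a `[char; 4]` array (the constant name `EXAMPLES` is illustrative) and compared by scalar value:

```rust
/// The example literals above, gathered into one array; each has type `char`.
const EXAMPLES: [char; 4] = ['R', '\'', '\x52', '\u{00E6}'];

fn main() {
    // The value of each literal is the represented character's scalar value.
    let scalars: Vec<u32> = EXAMPLES.iter().map(|&c| u32::from(c)).collect();
    assert_eq!(scalars, [0x52, 0x27, 0x52, 0xE6]);

    // Different spellings of the same represented character are equal.
    assert_eq!('R', '\x52');
    assert_eq!('R', '\u{52}');
    println!("{:?}", scalars); // [82, 39, 82, 230]
}
```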

## String literal expressions

A string literal expression consists of a single [STRING_LITERAL] or [RAW_STRING_LITERAL] token.

The expression's type is a shared reference (with `'static` lifetime) to the primitive [`str`][textual types] type.
That is, the type is `&'static str`.

The token must not have a suffix.

The token's _literal content_ is the sequence of characters following the first `U+0022` (`"`) and preceding the last `U+0022` (`"`) in the string representation of the token.

The literal expression's _represented string_ is a sequence of characters derived from the literal content as follows:

* If the token is a [STRING_LITERAL], each escape sequence of any of the following forms occurring in the literal content is replaced by the escape sequence's escaped value.
    * [Simple escapes]
    * [7-bit escapes]
    * [Unicode escapes]
    * [String continuation escapes]

  These replacements take place in left-to-right order.
  For example, the token `"\\x41"` is converted to the characters `\` `x` `4` `1`.

* If the token is a [RAW_STRING_LITERAL], the represented string is identical to the literal content.

The expression's value is a reference to a statically allocated [`str`][textual types] containing the UTF-8 encoding of the represented string.
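
The derivation above can be sketched as a single left-to-right pass over the literal content. This is a hypothetical helper (`represented_string`), not rustc's implementation; it handles only non-raw string literals (raw content is used unchanged) and returns `None` wherever the token grammar would reject an escape.

```rust
/// Hypothetical sketch: the represented string of a non-raw string literal,
/// computed from its literal content in one left-to-right pass.
fn represented_string(content: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = content.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c); // not part of an escape: copied unchanged
            continue;
        }
        // `c` starts an escape sequence; the next character selects its form.
        match chars.next()? {
            // Simple escapes
            '0' => out.push('\0'),
            't' => out.push('\t'),
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            '"' => out.push('"'),
            '\'' => out.push('\''),
            '\\' => out.push('\\'),
            // 7-bit escape: an octal digit, then a hexadecimal digit
            'x' => {
                let high = chars.next()?.to_digit(8)?;
                let low = chars.next()?.to_digit(16)?;
                out.push(char::from_u32(high * 16 + low)?);
            }
            // Unicode escape: `{`, one to six hex digits (`_` allowed after the first), `}`
            'u' => {
                if chars.next()? != '{' {
                    return None;
                }
                let (mut value, mut digits) = (0u32, 0);
                loop {
                    match chars.next()? {
                        '}' => break,
                        '_' if digits == 0 => return None,
                        '_' => {}
                        d => {
                            value = value * 16 + d.to_digit(16)?;
                            digits += 1;
                            if digits > 6 {
                                return None;
                            }
                        }
                    }
                }
                if digits == 0 {
                    return None;
                }
                out.push(char::from_u32(value)?);
            }
            // String continuation escape: also drop the whitespace that follows
            '\n' => {
                while chars.next_if(|&w| matches!(w, ' ' | '\t' | '\n' | '\r')).is_some() {}
            }
            _ => return None,
        }
    }
    Some(out)
}

fn main() {
    // `\\` is replaced first, so `x41` is left as ordinary characters.
    assert_eq!(represented_string(r"\\x41").as_deref(), Some(r"\x41"));
    assert_eq!(represented_string(r"\x41").as_deref(), Some("A"));
    assert_eq!(represented_string(r"a\u{00_E6}b").as_deref(), Some("a\u{E6}b"));
    assert_eq!(represented_string("foo\\\n    bar").as_deref(), Some("foobar"));
    assert_eq!(represented_string(r"\x80"), None);
}
```

Processing strictly from left to right is what makes the `"\\x41"` example come out as four characters: the `\\` pair is consumed as one escape before `x41` is ever examined.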

Examples of string literal expressions:

```rust
"foo"; r"foo"; // foo
"\"foo\""; r#""foo""#; // "foo"

"foo #\"# bar";
r##"foo #"# bar"##; // foo #"# bar

"\x52"; "R"; r"R"; // R
"\\x52"; r"\x52"; // \x52
```
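
The type and value rules can also be checked directly. In this sketch `ae` is an illustrative function name; the point is that the literal can be returned as `&'static str`, and that its length counts UTF-8 bytes rather than characters:

```rust
/// Illustrative: a string literal can be returned as `&'static str` because
/// its value refers to statically allocated data.
fn ae() -> &'static str {
    "\u{00E6}" // LATIN SMALL LETTER AE
}

fn main() {
    // One represented character, but two bytes of UTF-8.
    assert_eq!(ae().chars().count(), 1);
    assert_eq!(ae().len(), 2);
    assert_eq!(ae().as_bytes(), b"\xC3\xA6");

    // Escaped and raw spellings of the same represented string are equal.
    assert_eq!("\"foo\"", r#""foo""#);
    assert_eq!("\\x52", r"\x52");
}
```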

## Byte literal expressions

A byte literal expression consists of a single [BYTE_LITERAL] token.

The expression's type is the primitive [`u8`][numeric types] type.

The token must not have a suffix.

The token's _literal content_ is the sequence of characters following the first `U+0027` (`'`) and preceding the last `U+0027` (`'`) in the string representation of the token.

The literal expression's _represented character_ is derived from the literal content as follows:

* If the literal content is one of the following forms of escape sequence, the represented character is the escape sequence's escaped value:
    * [Simple escapes]
    * [8-bit escapes]

* Otherwise the represented character is the single character that makes up the literal content.

The expression's value is the represented character's [Unicode scalar value].

> **Note**: the permitted forms of a [BYTE_LITERAL] token ensure that these rules always produce a single character, whose Unicode scalar value is in the range of [`u8`][numeric types].
182
+
183
+
Examples of byte literal expressions:
184
+
185
+
```rust
186
+
b'R'; // 82
187
+
b'\''; // 39
188
+
b'\x52'; // 82
189
+
b'\xA0'; // 160
190
+
```
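
Gathering the examples above into a `[u8; 4]` array (the constant name `BYTES` is illustrative) confirms both the type and the values given in the comments:

```rust
/// The example byte literals above, gathered into one array; each has type `u8`.
const BYTES: [u8; 4] = [b'R', b'\'', b'\x52', b'\xA0'];

fn main() {
    assert_eq!(BYTES, [82, 39, 82, 160]);
    // For an ASCII character the value matches the char's scalar value.
    assert_eq!(b'R', 'R' as u8);
    // 8-bit escapes may exceed 0x7F here; the same escape in a char literal
    // ('\xA0') would be rejected, since only 7-bit escapes are allowed there.
    assert_eq!(b'\xA0', 0xA0);
}
```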

## Byte string literal expressions

A byte string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_BYTE_STRING_LITERAL] token.

The expression's type is a shared reference (with `'static` lifetime) to an array whose element type is [`u8`][numeric types].
That is, the type is `&'static [u8; N]`, where `N` is the number of bytes in the represented string described below.

The token must not have a suffix.

The token's _literal content_ is the sequence of characters following the first `U+0022` (`"`) and preceding the last `U+0022` (`"`) in the string representation of the token.

The literal expression's _represented string_ is a sequence of characters derived from the literal content as follows:

* If the token is a [BYTE_STRING_LITERAL], each escape sequence of any of the following forms occurring in the literal content is replaced by the escape sequence's escaped value.
    * [Simple escapes]
    * [8-bit escapes]
    * [String continuation escapes]

  These replacements take place in left-to-right order.
  For example, the token `b"\\x41"` is converted to the characters `\` `x` `4` `1`.

* If the token is a [RAW_BYTE_STRING_LITERAL], the represented string is identical to the literal content.

The expression's value is a reference to a statically allocated array containing the [Unicode scalar values] of the characters in the represented string, in the same order.

> **Note**: the permitted forms of [BYTE_STRING_LITERAL] and [RAW_BYTE_STRING_LITERAL] tokens ensure that these rules always produce array element values in the range of [`u8`][numeric types].
218
+
219
+
Examples of byte string literal expressions:
220
+
221
+
```rust
222
+
b"foo"; br"foo"; // foo
223
+
b"\"foo\""; br#""foo""#; // "foo"
224
+
225
+
b"foo #\"# bar";
226
+
br##"foo #"# bar"##; // foo #"# bar
227
+
228
+
b"\x52"; b"R"; br"R"; // R
229
+
b"\\x52"; br"\x52"; // \x52
230
+
```
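
A short check of the type and value rules, using a function whose name `mixed` is illustrative. Returning the literal as `&'static [u8; 4]` compiles because `N` is the number of characters in the represented string, here four:

```rust
/// Illustrative: plain characters, an 8-bit escape, and a simple escape.
/// The type is `&'static [u8; 4]`, one element per represented character.
fn mixed() -> &'static [u8; 4] {
    b"a\xFFb\n"
}

fn main() {
    assert_eq!(mixed(), &[0x61, 0xFF, 0x62, 0x0A]);
    // In a raw byte string the backslash is an ordinary character.
    assert_eq!(br"\x52", b"\\x52");
    assert_eq!(br"\x52".len(), 4);
}
```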

## C string literal expressions