@@ -234,7 +234,7 @@ rule. A literal is a form of constant expression, so is evaluated (primarily)
at compile time.

~~~~ {.ebnf .gram}
- literal : string_lit | char_lit | num_lit ;
+ literal : string_lit | char_lit | byte_string_lit | byte_lit | num_lit ;
~~~~

#### Character and string literals
@@ -244,17 +244,17 @@ char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
- | '\x5c' [ '\x27' | common_escape ] ;
+ | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
- | '\x5c' [ '\x22' | common_escape ] ;
+ | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
| 'n' | 'r' | 't' | '0'
- | 'x' hex_digit 2
+ | 'x' hex_digit 2 ;
- | 'u' hex_digit 4
- | 'U' hex_digit 8 ;
+ unicode_escape : 'u' hex_digit 4
+ | 'U' hex_digit 8 ;

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
@@ -294,7 +294,7 @@ the following forms:
escaped in order to denote *itself*.

Raw string literals do not process any escapes. They start with the character
- `U+0072` (`r`), followed zero or more of the character `U+0023` (`#`) and a
+ `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
`U+0022` (double-quote) character. The _raw string body_ is not defined in the
EBNF grammar above: it can contain any sequence of Unicode characters and is
terminated only by another `U+0022` (double-quote) character, followed by the
@@ -319,6 +319,65 @@ r##"foo #"# bar"##; // foo #"# bar
"\\x52"; r"\x52"; // \x52
~~~~

+ #### Byte and byte string literals
+
+ ~~~~ {.ebnf .gram}
+ byte_lit : 'b' '\x27' byte_body '\x27' ;
+ byte_string_lit : 'b' '"' byte_string_body * '"' | 'b' 'r' raw_byte_string ;
+
+ byte_body : ascii_non_single_quote
+ | '\x5c' [ '\x27' | common_escape ] ;
+
+ byte_string_body : ascii_non_double_quote
+ | '\x5c' [ '\x22' | common_escape ] ;
+ raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
+
+ ~~~~
+
+ A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F` range)
+ or a single _escape_, enclosed within two `U+0027` (single-quote) characters
+ and preceded by the character `U+0062` (`b`).
+ The character `U+0027` itself cannot appear unescaped;
+ it must be _escaped_ by a preceding `U+005C` character (`\`).
+ It is equivalent to a `u8` unsigned 8-bit integer _number literal_.
+
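As a minimal sketch of the byte literal rules above, written for current Rust (whose byte literal syntax matches this grammar), each byte literal below is compared with the `u8` number literal it is equivalent to. The function name `byte_literal_values` is hypothetical, chosen only for this illustration.

```rust
// Hypothetical helper: collects a few byte literals into an array.
// Every byte literal has type `u8`.
fn byte_literal_values() -> [u8; 4] {
    [
        b'A',  // plain ASCII character, 0x41
        b'\'', // U+0027 must be escaped with a backslash, 0x27
        b'\\', // backslash escape, 0x5C
        b'\n', // whitespace escape, 0x0A
    ]
}

fn main() {
    let values = byte_literal_values();
    // Each byte literal equals the corresponding u8 number literal.
    assert_eq!(values, [0x41u8, 0x27, 0x5C, 0x0A]);
    println!("{:?}", values); // prints [65, 39, 92, 10]
}
```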
+ A _byte string literal_ is either a sequence of ASCII characters and _escapes_
+ preceded by the character `U+0062` (`b`) and enclosed within two `U+0022`
+ (double-quote) characters, or a _raw byte string literal_.
+ Within the non-raw form, `U+0022` itself must be _escaped_
+ by a preceding `U+005C` character (`\`).
+ It is equivalent to a `&'static [u8]` borrowed vector of unsigned 8-bit integers.
+
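The following sketch, again in current Rust, uses a byte string literal as a `&'static [u8]`. One assumption to flag: in today's Rust the literal's own type is `&'static [u8; N]` (a reference to a fixed-size array), and the return type below relies on its automatic coercion to the slice type named above. The function name `greeting` is hypothetical.

```rust
// Hypothetical helper returning a byte string literal as a static byte slice.
fn greeting() -> &'static [u8] {
    // The escaped `\"` contributes a single 0x22 byte.
    b"hi\"!"
}

fn main() {
    let bytes = greeting();
    // 'h' = 104, 'i' = 105, '"' = 34, '!' = 33
    assert_eq!(bytes, &[104u8, 105, 34, 33][..]);
    assert_eq!(bytes.len(), 4);
    println!("{:?}", bytes); // prints [104, 105, 34, 33]
}
```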
+ Some additional _escapes_ are available in either byte or non-raw byte string
+ literals. An escape starts with a `U+005C` (`\`) and continues with one of
+ the following forms:
+
+ * A _byte escape_ starts with `U+0078` (`x`) and is
+ followed by exactly two _hex digits_. It denotes the byte
+ equal to the provided hex value.
+ * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
+ (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
+ `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
+ * The _backslash escape_ is the character `U+005C` (`\`) which must be
+ escaped in order to denote its ASCII encoding `0x5C`.
+
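A short sketch of the escapes listed above, in current Rust: each escape contributes exactly one byte. The constant name `ESCAPED` is hypothetical. The second assertion reflects that the two hex digits of a byte escape can denote any value up to `0xFF`, even though unescaped characters in these literals must be ASCII.

```rust
// Hypothetical constant: one byte escape, three whitespace escapes,
// and one backslash escape.
const ESCAPED: &[u8] = b"\x41\n\r\t\\";

fn main() {
    // \x41 -> 0x41, \n -> LF, \r -> CR, \t -> HT, \\ -> 0x5C
    assert_eq!(ESCAPED, &[0x41u8, 0x0A, 0x0D, 0x09, 0x5C][..]);
    // Two hex digits can denote any byte value, including non-ASCII ones.
    assert_eq!(b'\xFF', 0xFFu8);
    println!("{:?}", ESCAPED); // prints [65, 10, 13, 9, 92]
}
```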
+ Raw byte string literals do not process any escapes.
+ They start with the character `U+0062` (`b`),
+ followed by `U+0072` (`r`),
+ followed by zero or more of the character `U+0023` (`#`),
+ and a `U+0022` (double-quote) character.
+ The _raw byte string body_ is not defined in the EBNF grammar above:
+ it can contain any sequence of ASCII characters and is
+ terminated only by another `U+0022` (double-quote) character, followed by the
+ same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
+ (double-quote) character.
+ A raw byte string literal cannot contain any non-ASCII byte.
+
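To illustrate the delimiter rules above, the sketch below (current Rust, where raw byte strings are likewise written with a `br` prefix) compares raw byte string literals with equivalent escaped byte string literals. The constant names `PLAIN`, `HASHED` and `DOUBLE` are hypothetical.

```rust
// Hypothetical constants. No escapes are processed inside a raw byte string.
const PLAIN: &[u8] = br"C:\temp";     // `\t` here is two bytes: 0x5C, 0x74
const HASHED: &[u8] = br#"say "hi""#; // one `#`, so a bare `"` does not end it
const DOUBLE: &[u8] = br##"a "# b"##; // `"#` is too short to end a `##` literal

fn main() {
    assert_eq!(PLAIN, b"C:\\temp");
    assert_eq!(HASHED, b"say \"hi\"");
    assert_eq!(DOUBLE, b"a \"# b");
}
```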
+ All characters contained in the raw byte string body represent their ASCII
+ encoding; the characters `U+0022` (double-quote) (except when followed by at
+ least as many `U+0023` (`#`) characters as were used to start the raw byte string
+ literal) and `U+005C` (`\`) have no special meaning.
+
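A last sketch (current Rust; the function name `raw_vs_escaped` is hypothetical) checks the point above that `\` and `"` carry no special meaning in a raw byte string body: the raw literal below equals an ordinary byte string literal in which each of those characters is escaped.

```rust
// Hypothetical helper returning the same bytes written two ways.
fn raw_vs_escaped() -> (&'static [u8], &'static [u8]) {
    // Raw form: `\n`, `"` and `\x41` are taken character for character.
    let raw: &'static [u8] = br#"\n"quoted"\x41"#;
    // Escaped form: every `\` and `"` must be escaped to denote itself.
    let escaped: &'static [u8] = b"\\n\"quoted\"\\x41";
    (raw, escaped)
}

fn main() {
    let (raw, escaped) = raw_vs_escaped();
    assert_eq!(raw, escaped);
    assert_eq!(raw[0], 0x5C); // a literal backslash, not the start of an escape
    assert_eq!(raw.len(), 14);
}
```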
#### Number literals

~~~~ {.ebnf .gram}