Commit 74f22eb

Add grammar for char and string literals
1 parent 768cb76 commit 74f22eb

1 file changed: +84 -2 lines changed

src/tokens.md

Lines changed: 84 additions & 2 deletions
@@ -21,13 +21,24 @@ evaluated (primarily) at compile time.
2121

2222
| | Example | `#` sets | Characters | Escapes |
2323
|----------------------------------------------|-----------------|------------|-------------|---------------------|
24-
| [Character](#character-literals) | `'H'` | `N/A` | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
25-
| [String](#string-literals) | `"hello"` | `N/A` | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
24+
| [Character](#character-literals) | `'H'` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
25+
| [String](#string-literals) | `"hello"` | `N/A` | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) |
2626
| [Raw](#raw-string-literals) | `r#"hello"#` | `0...` | All Unicode | `N/A` |
2727
| [Byte](#byte-literals) | `b'H'` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
2828
| [Byte string](#byte-string-literals) | `b"hello"` | `N/A` | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
2929
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | `0...` | All ASCII | `N/A` |
3030

31+
#### ASCII escapes
32+
33+
| | Name |
34+
|---|------|
35+
| `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
36+
| `\n` | Newline |
37+
| `\r` | Carriage return |
38+
| `\t` | Tab |
39+
| `\\` | Backslash |
40+
| `\0` | Null |
41+
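As a rough illustration of the ASCII escapes table, the sketch below pairs each escape with the code point it denotes. The function names are hypothetical and exist only for this example; note that a `\x` escape above `0x7F` (for instance `'\x80'`) is rejected by the compiler in a character or string literal.

```rust
/// Pairs each ASCII escape from the table with the code point it denotes.
/// (Hypothetical helper, not part of the reference.)
fn ascii_escape_code_points() -> [(char, u32); 6] {
    [
        ('\x41', 0x41), // 7-bit character code: 'A'
        ('\n', 0x0A),   // newline
        ('\r', 0x0D),   // carriage return
        ('\t', 0x09),   // tab
        ('\\', 0x5C),   // backslash
        ('\0', 0x00),   // null
    ]
}

/// Returns true if every escape evaluates to its expected code point.
fn all_ascii_escapes_match() -> bool {
    ascii_escape_code_points()
        .iter()
        .all(|&(c, code)| c as u32 == code)
}
```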
3142
#### Byte escapes
3243

3344
| | Name |
@@ -74,12 +85,45 @@ evaluated (primarily) at compile time.
7485

7586
#### Character literals
7687

88+
> **<sup>Lexer</sup>**
89+
> CHAR_LITERAL :
90+
> &nbsp;&nbsp; `'` ( ~[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`
91+
>
92+
> QUOTE_ESCAPE :
93+
> &nbsp;&nbsp; `\'` | `\"`
94+
>
95+
> ASCII_ESCAPE :
96+
> &nbsp;&nbsp; &nbsp;&nbsp; `\x` OCT_DIGIT HEX_DIGIT
97+
> &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`
98+
>
99+
> UNICODE_ESCAPE :
100+
> &nbsp;&nbsp; &nbsp;&nbsp; `\u{` HEX_DIGIT `}`
101+
> &nbsp;&nbsp; | `\u{` HEX_DIGIT HEX_DIGIT `}`
102+
> &nbsp;&nbsp; | `\u{` HEX_DIGIT HEX_DIGIT HEX_DIGIT `}`
103+
> &nbsp;&nbsp; | `\u{` HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT `}`
104+
> &nbsp;&nbsp; | `\u{` HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT `}`
105+
> &nbsp;&nbsp; | `\u{` HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT `}`
106+
77107
A _character literal_ is a single Unicode character enclosed within two
78108
`U+0027` (single-quote) characters, with the exception of `U+0027` itself,
79109
which must be _escaped_ by a preceding `U+005C` character (`\`).
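The CHAR_LITERAL alternatives can be illustrated with a few sample literals. This is a minimal sketch with hypothetical function names; it only shows that plain characters, quote escapes, ASCII escapes and Unicode escapes each denote a single code point.

```rust
/// Sample character literals paired with the code points they denote.
/// (Hypothetical helper for illustration only.)
fn char_literal_code_points() -> Vec<(char, u32)> {
    vec![
        ('H', 0x48),            // plain character
        ('\'', 0x27),           // QUOTE_ESCAPE: the single quote must be escaped
        ('"', 0x22),            // a double quote needs no escape here...
        ('\"', 0x22),           // ...but the escaped form is also accepted
        ('\\', 0x5C),           // ASCII_ESCAPE: backslash
        ('\x41', 0x41),         // ASCII_ESCAPE: 7-bit code
        ('\u{41}', 0x41),       // UNICODE_ESCAPE with two hex digits
        ('\u{00E9}', 0xE9),     // UNICODE_ESCAPE with four hex digits
        ('\u{1F600}', 0x1F600), // UNICODE_ESCAPE with five hex digits
    ]
}

/// Returns true if every literal evaluates to its expected code point.
fn all_char_literals_match() -> bool {
    char_literal_code_points()
        .iter()
        .all(|&(c, code)| c as u32 == code)
}
```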
80110

81111
#### String literals
82112

113+
> **<sup>Lexer</sup>**
114+
> STRING_LITERAL :
115+
> &nbsp;&nbsp; `"` (
116+
> &nbsp;&nbsp; &nbsp;&nbsp; ~[`"` `\` _IsolatedCR_]
117+
> &nbsp;&nbsp; &nbsp;&nbsp; | QUOTE_ESCAPE
118+
> &nbsp;&nbsp; &nbsp;&nbsp; | ASCII_ESCAPE
119+
> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE
120+
> &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE
121+
> &nbsp;&nbsp; )<sup>\*</sup> `"`
122+
>
123+
> STRING_CONTINUE :
124+
> &nbsp;&nbsp; `\` _followed by_ \\n
125+
126+
83127
A _string literal_ is a sequence of any Unicode characters enclosed within two
84128
`U+0022` (double-quote) characters, with the exception of `U+0022` itself,
85129
which must be _escaped_ by a preceding `U+005C` character (`\`).
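A short sketch of the STRING_LITERAL alternatives follows. The function names are hypothetical. The `continued` example relies on the STRING_CONTINUE behaviour of the Rust compiler: a backslash immediately before a line break drops the line break and the leading whitespace of the next line.

```rust
/// QUOTE_ESCAPE: double quotes inside a string must be escaped.
fn quoted() -> &'static str {
    "say \"hi\""
}

/// STRING_CONTINUE: the backslash, the newline, and the indentation
/// before `bar` do not appear in the resulting string.
fn continued() -> &'static str {
    "foo\
     bar"
}

/// ASCII_ESCAPE and UNICODE_ESCAPE mixed with plain characters.
fn escaped() -> &'static str {
    "tab\there\u{21}"
}
```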
@@ -120,6 +164,14 @@ following forms:
120164

121165
#### Raw string literals
122166

167+
> **<sup>Lexer</sup>**
168+
> RAW_STRING_LITERAL :
169+
> &nbsp;&nbsp; `r` RAW_STRING_CONTENT
170+
>
171+
> RAW_STRING_CONTENT :
172+
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`
173+
> &nbsp;&nbsp; | `#` RAW_STRING_CONTENT `#`
174+
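The recursive RAW_STRING_CONTENT rule amounts to: count the opening `#` characters, then stop at the first `"` that is followed by the same number of `#`. The sketch below is one possible way to express that; `raw_string_body` is a hypothetical helper, not how any real lexer is implemented, and it deliberately ignores the _IsolatedCR_ restriction.

```rust
/// Extracts the body of a raw string literal that starts at the
/// beginning of `src`. Returns `None` if `src` does not start with a
/// well-formed raw string. (Hypothetical helper; skips the IsolatedCR check.)
fn raw_string_body(src: &str) -> Option<&str> {
    // RAW_STRING_LITERAL : `r` RAW_STRING_CONTENT
    let rest = src.strip_prefix('r')?;

    // Count the leading `#` characters (zero or more).
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    let rest = rest[hashes..].strip_prefix('"')?;

    // The closing delimiter is `"` followed by the same number of `#`.
    // Taking the first occurrence gives the non-greedy match.
    let closing = format!("\"{}", "#".repeat(hashes));
    let end = rest.find(closing.as_str())?;
    Some(&rest[..end])
}
```

With this helper, the source text `r##"foo #"# bar"##` yields the body `foo #"# bar`, because the inner `"#` carries only one `#` and so does not close the literal.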
123175
Raw string literals do not process any escapes. They start with the character
124176
`U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
125177
`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
@@ -149,6 +201,17 @@ r##"foo #"# bar"##; // foo #"# bar
149201

150202
#### Byte literals
151203

204+
> **<sup>Lexer</sup>**
205+
> BYTE_LITERAL :
206+
> &nbsp;&nbsp; `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`
207+
>
208+
> ASCII_FOR_CHAR :
209+
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t
210+
>
211+
> BYTE_ESCAPE :
212+
> &nbsp;&nbsp; &nbsp;&nbsp; `\x` HEX_DIGIT HEX_DIGIT
213+
> &nbsp;&nbsp; | `\n` | `\r` | `\t` | `\\` | `\0`
214+
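To make the BYTE_LITERAL rule concrete, the sketch below lists a few byte literals with the `u8` values they produce. The function name is hypothetical. Unlike ASCII_ESCAPE, the `\x` form of BYTE_ESCAPE takes two full hex digits, so values up to `0xFF` are allowed.

```rust
/// Sample byte literals paired with the `u8` values they denote.
/// (Hypothetical helper for illustration only.)
fn byte_literal_values() -> [(u8, u8); 6] {
    [
        (b'H', 0x48),    // plain ASCII character
        (b'\'', 0x27),   // escaped single quote
        (b'\\', 0x5C),   // escaped backslash
        (b'\n', 0x0A),   // newline escape
        (b'\x7F', 0x7F), // highest 7-bit value
        (b'\xFF', 0xFF), // byte escapes may exceed 0x7F
    ]
}
```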
152215
A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
153216
range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
154217
`U+0027` (single-quote), and followed by the character `U+0027`. If the character
@@ -158,6 +221,13 @@ _number literal_.
158221

159222
#### Byte string literals
160223

224+
> **<sup>Lexer</sup>**
225+
> BYTE_STRING_LITERAL :
226+
> &nbsp;&nbsp; `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )<sup>\*</sup> `"`
227+
>
228+
> ASCII_FOR_STRING :
229+
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_
230+
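The BYTE_STRING_LITERAL alternatives can be sketched the same way. The function names are hypothetical; each literal has type `&'static [u8; N]` and is returned here as a slice for convenience.

```rust
/// ASCII_FOR_STRING only: five plain ASCII bytes.
fn plain_bytes() -> &'static [u8] {
    b"hello"
}

/// BYTE_ESCAPE: an escaped double quote, a byte above 0x7F, and a newline.
fn escaped_bytes() -> &'static [u8] {
    b"\"\xFF\n"
}

/// STRING_CONTINUE: the line break and the following indentation are dropped.
fn continued_bytes() -> &'static [u8] {
    b"foo\
      bar"
}
```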
161231
A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
162232
preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
163233
followed by the character `U+0022`. If the character `U+0022` is present within
@@ -183,6 +253,18 @@ following forms:
183253

184254
#### Raw byte string literals
185255

256+
> **<sup>Lexer</sup>**
257+
> RAW_BYTE_STRING_LITERAL :
258+
> &nbsp;&nbsp; `br` RAW_BYTE_STRING_CONTENT
259+
>
260+
> RAW_BYTE_STRING_CONTENT :
261+
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ASCII<sup>* (non-greedy)</sup> `"`
262+
> &nbsp;&nbsp; | `#` RAW_BYTE_STRING_CONTENT `#`
263+
>
264+
> ASCII :
265+
> &nbsp;&nbsp; _any ASCII (i.e. 0x00 to 0x7F)_
266+
267+
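As a small illustration of RAW_BYTE_STRING_LITERAL, the sketch below contrasts a raw byte string with an ordinary byte string. The function names are hypothetical. In the raw form, `\n` stays as two bytes and the inner `"` needs no escape because the literal is delimited by `"#`.

```rust
/// Raw byte string: backslash and `n` are kept as two separate bytes,
/// and the inner double quote is part of the body.
fn raw_bytes() -> &'static [u8] {
    br#"a\n"b"#
}

/// Ordinary byte string for comparison: `\n` is a single newline byte.
fn cooked_bytes() -> &'static [u8] {
    b"a\n"
}
```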
186268
Raw byte string literals do not process any escapes. They start with the
187269
character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
188270
of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
