-
Notifications
You must be signed in to change notification settings - Fork 13.5k
adding proof of context-sensitivy of raw string literals #16001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Awesome, one more item from my todo list gone! I spent a lot of time trying to use the pumping lemma and came to the same conclusion. |
Looks good to me, could someone with better chops than I look at this? @pnkfelix @nikomatsakis @zwarich ? |
@@ -1,7 +1,8 @@ | |||
Rust's lexical grammar is not context-free. Raw string literals are the source | |||
of the problem. Informally, a raw string literal is an `r`, followed by `N` | |||
hashes (where N can be zero), a quote, any characters, then a quote followed | |||
by `N` hashes. This grammar describes this as best possible: | |||
by `N` hashes. Critically, `N` consecutive hashes *cannot* occur between the quotes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this line mean? It sounds to me like it's saying you can't say r#"#"#
, but I just tested and that parses just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think this text contains a misunderstanding of the end-delimiter of a raw-string.
As @kballard said, r#"#"#
is fine.
I imagine the actual problematic string literal will be something like r####" "### "####
, where there is a leading "
before the series of repeated octothorpes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I reviewed the reasoning, and I think it all still holds together if you just change the definition of s
in the manner I suggested above, namely:
`s = r#^{p+2}""#^{p+1}"#^{p+2}`
e.g. for `p = 2`: `s = r####""###"####`
Then that will actually produce an invalid string literal after pumping.
Update: Well, maybe that's a little too optimistic on my part. The problem is that the input may decompose into some choice for v
and x
where v
or x
includes the "
that occurs within the string literal. I am still working through the case analysis here. There may be a different choice for s
that is easier to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, this problem holds for my original proof. Given r####"###"#### with the given marking, one can take v= "#
x=epsilon, z=epsilon, and the string becomes r####"#"###"#### which is valid even in my incorrect scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticing this flaw, it strikes me that Ogden's Lemma isn't powerful enough either. The high level of non-structure allowed in between the quotes makes this tricky.
Regexes to the rescue. New proof has landed. |
Shout-outs to http://cs.stackexchange.com/a/437/14863 for inspiration, and also letting me avoid even trying to understand Parikh's theorem. |
|
||
Neither `v` nor `x` can contain a `"` or `r`, as the number of these characters | ||
in any string in `R'` is fixed. So `v` and `x` contain only hashes. | ||
Consequently, Of the three sequences of hashes, `v` and `x` combined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lowercase "Of" to "of" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than happy to have that be the biggest complaint you have by the time you've reached that part of the proof :)
Fixed.
@pnkfelix all squashed and accounted for |
Stumbled across this and thought it would be cool to prove. I've never used Ogden's Lemma before, but I'm pretty sure I used it right. The pumping lemma definitely doesn't seem sufficient for the job. In particular, when using the pumping lemma, you can always just pump one of the quotes, and it's fine. Ogden's Lemma lets you effectively force the pumper to use certain characters in the string. @cmr
Stumbled across this and thought it would be cool to prove. I've never used Ogden's Lemma before, but I'm pretty sure I used it right. The pumping lemma definitely doesn't seem sufficient for the job. In particular, when using the pumping lemma, you can always just pump one of the quotes, and it's fine. Ogden's Lemma lets you effectively force the pumper to use certain characters in the string.
@cmr