-
Notifications
You must be signed in to change notification settings - Fork 144
Add line number support in the sabberworm CSS parser #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add line number support in the sabberworm CSS parser #105
Conversation
Wow, thanks for your work. I will review soon…
|
@@ -516,13 +519,18 @@ private function peek($iLength = 1, $iOffset = 0) { | |||
|
|||
private function consume($mValue = 1) { | |||
if (is_string($mValue)) { | |||
$noLines = substr_count($mValue, "\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call the variable $iLineCount
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
$this->iCurrentPosition += $this->strlen($mValue); | ||
return $mValue; | ||
} else { | ||
$substring = $this->substr($this->iCurrentPosition, $mValue); | ||
$noLines = substr_count($substring, "\n"); | ||
$this->iLineNum += $noLines; | ||
if ($this->iCurrentPosition + $mValue > $this->iLength) { | ||
throw new UnexpectedTokenException($mValue, $this->peek(5), 'count'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add line numbers to exceptions as well for easier debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -26,11 +24,19 @@ public function __construct($iLineNum = 0) { | |||
/** | |||
* @return int | |||
*/ | |||
public function getLineNum() | |||
public function getLineNo() | |||
{ | |||
return $this->iLineNum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename the variable, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thanks. I do have one minor nitpick: the |
@@ -6,4 +6,13 @@ | |||
* Thrown if the CSS parsers attempts to print something invalid | |||
*/ | |||
class OutputException extends \Exception { | |||
private $iLineNum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide a getter for the line number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Some notes:
Its a good idea to have setLineNo generally, mainly for testing purposes where you want to reset the stored line number to 0. What is your alternate suggestion? |
@@ -0,0 +1,28 @@ | |||
@charset "utf-8"; /* line 3 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment should say 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
|
||
body { | ||
background: #FFFFFF url("http://somesite.com/images/someimage.gif") repeat top center; /* line 25 */ | ||
color: rgb(233, 100, 450); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you maybe split up the color components into separate lines and check if they are identified correctly:
color: rgb( /* line 26 */
233, /* line 27 */
100, /* line 28 */
450 /* line 29 */
);
Also: What was I thinking writing 450, a value over 255, here…
@sidkshatriya With any luck we can drop PHP 5.3 somewhere in the next year… |
class SourceException extends \Exception { | ||
private $iLineNo; | ||
public function __construct($sMessage, $iLineNo = 0) { | ||
$this->$iLineNo = $iLineNo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the $
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sabberworm Why should we remove the $
-- sorry I didn't understand...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s just a typo. The line reads $this->$iLineNo = $iLineNo;
but should be $this->iLineNo = $iLineNo;
.
If $iLineNo is 1
, the former sets $this->1
to 1
, the latter sets $this->iLineNo
to 1
.
The newly-added test testUnexpectedTokenExceptionLineNo
should have caught this but didn’t because you didn’t use Sabberworm\CSS\Parsing\UnexpectedTokenException
in the file that contains the test class so the catch
was catching the wrong exception type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sabberworm This was a fantastic catch! Sorry I didn't notice the extraneous $
/**
* @expectedException Sabberworm\CSS\Parsing\UnexpectedTokenException
*/
public function testUnexpectedTokenExceptionLineNo() {
$oParser = new Parser("\ntest: 1;", Settings::create()->beStrict());
try {
$oParser->parse();
} catch(UnexpectedTokenException $e) {
$this->assertSame($e->getLineNo(), 2);
throw $e;
}
} Ideally there would be a test for every place in the parser where we throw an exception. |
@sabberworm Hopefully things are now looking good. Let me know if you have any other suggestions (thanks for your really responsive help on this!). BTW My contributions on this pull request are on behalf of Lullabot I'll pick up with this tomorrow (or later tonight if I get the chance) |
try { | ||
$oParser->parse(); | ||
} catch (UnexpectedTokenException $e) { | ||
$this->assertSame($e->getLineNo(), 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch the arguments: expected should come first, actual later. Sorry, my bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sabberworm done.
@sabberworm Ah you merged! Great! Thanks so much! |
Thank you for your great work! Now I only need someone like you to implement selector and media query parsing… How do you prefer to be listed under contributors, as individual or as Lullabot? |
as Lullabot re: selector and media query parsing -- I could look at it in my spare time. If you make a ticket chalking out the details on whats incomplete / what needs to be done etc. it would be awesome. What happens with media queries right now? Does the parser choke or simply not get the finer structure? |
The selector parsing is really rudimentary, it just searches for the next Media query parsing is even worse. It was only designed to handle simple queries like |
Fixing media queries should be a lot simpler than fixing selectors because there are fewer cases to consider and since the current selector parsing works well enough for real-world, non-contrived examples, it’s also the bigger pain point. |
@sabberworm I don't really need media query parsing or selector parsing for the AMP library at the moment (may need it in future) -- but I'm concerned if the parser will abort and issue an exception. Its OK for instance to not do an accurate job of parsing for things like selectors/media queries but aborting the whole parse is not so good... What happens in such cases? |
@sabberworm sorry for persisting but I just want to be aware of any parse related aborts due to syntax the parser does not understand. Like I mentioned, it's ok if you don't parse the complex selectors into its fine components or even get media query parsing correct. I just want the parser to skip over those things (or even do a bad job with them). What is problematic is the whole parse getting aborted due to unrecognised (but valid) css syntax. |
Some well-defined exception states are recoverable. But because the parser does not know how media queries are supposed to be structured, it can’t recover and throws a |
@sabberworm Thanks for responding and eagerly look forward to some basic media query parsing. Also to my previous question: is your parsing algorithm hand built or you have consulted something like https://drafts.csswg.org/css-syntax-3/ ? IIRC that there may be general strategies in that link where you can prevent yourself from aborting a parse when you encounter syntax that you don't know about. Basically you should simply skip over to the next at-rule or declaration block that you do understand. |
In the beginning the goal of the parser had always been to parse only valid CSS. Handling invalid CSS and being fault-tolerant was a feature added later on and does not exactly follow the way the specification demands of browser vendors. However, for valid CSS (for everything the parser supports, which, currently, does not include media queries) the goal was always to be compliant with the specs (only the parsing; the resulting data structure does not adhere to CSSOM). |
AMP HTML is a restricted subset of HTML (See https://www.ampproject.org/ for more details) and the AMP PHP Library provide validation and HTML to AMP HTML conversion utilities. We're using the
masterminds/html5-php
library to parse HTML5 and we're usingsabberworm/php-css-parser
to parse the CSS in the AMP PHP LibraryAfter parsing the CSS parse we report any problems in the CSS (for e.g.
!important
is disallowed in the AMP dialect of HTML, we use thesabberworm
parser to find all the uses of!important
for instance)Now the CSS supplied by the user could be hundreds / thousands of lines long. So its not useful to simply say that
!important
was used and is not allowed. We should be able to say which line the problem occurred.This pull request adds support for line numbers. Here is how we're using this Lullabot/amp-library#80 (comment)
Notice the line numbers reported in the validation errors which coming from code submitted in this PR.
Would greatly appreciate if you had a look and merge it if you believe this feature would be useful. Hopefully the code changes have not missed anything out.
Thanks for this useful library BTW.