Skip to content

Add line number support in the sabberworm CSS parser #105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

sidkshatriya
Copy link
Contributor

AMP HTML is a restricted subset of HTML (See https://www.ampproject.org/ for more details) and the AMP PHP Library provide validation and HTML to AMP HTML conversion utilities. We're using the masterminds/html5-php library to parse HTML5 and we're using sabberworm/php-css-parser to parse the CSS in the AMP PHP Library

After parsing the CSS parse we report any problems in the CSS (for e.g. !important is disallowed in the AMP dialect of HTML, we use the sabberworm parser to find all the uses of !important for instance)

Now the CSS supplied by the user could be hundreds / thousands of lines long. So its not useful to simply say that !important was used and is not allowed. We should be able to say which line the problem occurred.

This pull request adds support for line numbers. Here is how we're using this Lullabot/amp-library#80 (comment)
Notice the line numbers reported in the validation errors which coming from code submitted in this PR.

Would greatly appreciate if you had a look and merge it if you believe this feature would be useful. Hopefully the code changes have not missed anything out.

Thanks for this useful library BTW.

@sabberworm
Copy link
Collaborator

sabberworm commented Jun 29, 2016

Wow, thanks for your work. I will review soon…

  • Maybe you could add a unit test dealing specifically with line numbers and refactor the one you changed to ignore the line numbers so it doesn’t break when we change the source?

@@ -516,13 +519,18 @@ private function peek($iLength = 1, $iOffset = 0) {

private function consume($mValue = 1) {
if (is_string($mValue)) {
$noLines = substr_count($mValue, "\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call the variable $iLineCount for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sabberworm
Copy link
Collaborator

sabberworm commented Jun 29, 2016

  • Add the getLineNum method declaration to the Renderable interface. And name it either getLineNo or getLineNumber.
  • The CSSList class should also implement Renderable, I forgot to do that.

$this->iCurrentPosition += $this->strlen($mValue);
return $mValue;
} else {
$substring = $this->substr($this->iCurrentPosition, $mValue);
$noLines = substr_count($substring, "\n");
$this->iLineNum += $noLines;
if ($this->iCurrentPosition + $mValue > $this->iLength) {
throw new UnexpectedTokenException($mValue, $this->peek(5), 'count');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add line numbers to exceptions as well for easier debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -26,11 +24,19 @@ public function __construct($iLineNum = 0) {
/**
* @return int
*/
public function getLineNum()
public function getLineNo()
{
return $this->iLineNum;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the variable, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sabberworm
Copy link
Collaborator

sabberworm commented Jun 30, 2016

Thanks. I do have one minor nitpick: the Renderable interface should not declare the setter for the line number, only the getter. Because in order for something to be renderable, the line-number need not be state, it could also be computed upon calling.

@@ -6,4 +6,13 @@
* Thrown if the CSS parsers attempts to print something invalid
*/
class OutputException extends \Exception {
private $iLineNum;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide a getter for the line number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sidkshatriya
Copy link
Contributor Author

Some notes:

  • The OutputException and UnexpectedTokenException exception classes now have line number information (see the throws) but I think there are 4 places where you're just throwing new \Exception -- it would logical to add line numbers there too -- but I'm not sure if should create another custom exception class or re-use the UnexpectedTokenException (please let me know what params one must use)
  • CSSList implements Renderable now. In addition Rule also implements Renderable now

Thanks. I do have one minor nitpick: the Renderable interface should not declare the setter for the line number, only the getter because in order for something to be renderable, the line-number need not be state, it could also be computed upon calling.

Its a good idea to have setLineNo generally, mainly for testing purposes where you want to reset the stored line number to 0. What is your alternate suggestion?

@@ -0,0 +1,28 @@
@charset "utf-8"; /* line 3 */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should say 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


body {
background: #FFFFFF url("http://somesite.com/images/someimage.gif") repeat top center; /* line 25 */
color: rgb(233, 100, 450);
Copy link
Collaborator

@sabberworm sabberworm Jun 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you maybe split up the color components into separate lines and check if they are identified correctly:

color: rgb( /* line 26 */
       233, /* line 27 */
       100, /* line 28 */
       450 /* line 29 */
);

Also: What was I thinking writing 450, a value over 255, here…

@sabberworm
Copy link
Collaborator

@sidkshatriya With any luck we can drop PHP 5.3 somewhere in the next year…

class SourceException extends \Exception {
private $iLineNo;
public function __construct($sMessage, $iLineNo = 0) {
$this->$iLineNo = $iLineNo;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the $.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sabberworm Why should we remove the $ -- sorry I didn't understand...

Copy link
Collaborator

@sabberworm sabberworm Jun 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s just a typo. The line reads $this->$iLineNo = $iLineNo; but should be $this->iLineNo = $iLineNo;.

If $iLineNo is 1, the former sets $this->1 to 1, the latter sets $this->iLineNo to 1.

The newly-added test testUnexpectedTokenExceptionLineNo should have caught this but didn’t because you didn’t use Sabberworm\CSS\Parsing\UnexpectedTokenException in the file that contains the test class so the catch was catching the wrong exception type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sabberworm This was a fantastic catch! Sorry I didn't notice the extraneous $

@sabberworm
Copy link
Collaborator

sabberworm commented Jun 30, 2016

  • Add some tests for exceptions, e.g.:
/**
* @expectedException Sabberworm\CSS\Parsing\UnexpectedTokenException
*/
public function testUnexpectedTokenExceptionLineNo() {
    $oParser = new Parser("\ntest: 1;", Settings::create()->beStrict());
    try {
        $oParser->parse();
    } catch(UnexpectedTokenException $e) {
        $this->assertSame($e->getLineNo(), 2);
        throw $e;
    }
}

Ideally there would be a test for every place in the parser where we throw an exception.

@sidkshatriya
Copy link
Contributor Author

sidkshatriya commented Jun 30, 2016

@sabberworm Hopefully things are now looking good. Let me know if you have any other suggestions (thanks for your really responsive help on this!).

BTW My contributions on this pull request are on behalf of Lullabot

I'll pick up with this tomorrow (or later tonight if I get the chance)

try {
$oParser->parse();
} catch (UnexpectedTokenException $e) {
$this->assertSame($e->getLineNo(), 2);
Copy link
Collaborator

@sabberworm sabberworm Jun 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch the arguments: expected should come first, actual later. Sorry, my bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sabberworm done.

@sabberworm sabberworm merged commit bfca845 into MyIntervals:master Jun 30, 2016
@sidkshatriya
Copy link
Contributor Author

@sabberworm Ah you merged! Great! Thanks so much!

@sabberworm
Copy link
Collaborator

On 30.06.2016, at 16:41, Sidharth Kshatriya [email protected] wrote:

@sabberworm Ah you merged! Great! Thanks so much!

Thank you for your great work! Now I only need someone like you to implement selector and media query parsing…

How do you prefer to be listed under contributors, as individual or as Lullabot?

@sidkshatriya
Copy link
Contributor Author

as Lullabot

re: selector and media query parsing -- I could look at it in my spare time. If you make a ticket chalking out the details on whats incomplete / what needs to be done etc. it would be awesome. What happens with media queries right now? Does the parser choke or simply not get the finer structure?

@sabberworm
Copy link
Collaborator

sabberworm commented Jun 30, 2016

The selector parsing is really rudimentary, it just searches for the next { and splits by comma. So it has trouble parsing a selector like a[href*=","] .complicated\{selector.

Media query parsing is even worse. It was only designed to handle simple queries like @media print {} without expressions and chokes on more complex stuff. See #75 and its duplicates (#85, #104) for examples.

@sabberworm
Copy link
Collaborator

Fixing media queries should be a lot simpler than fixing selectors because there are fewer cases to consider and since the current selector parsing works well enough for real-world, non-contrived examples, it’s also the bigger pain point.

@sidkshatriya
Copy link
Contributor Author

@sabberworm
Is your parser hand built or have you consulted something like https://drafts.csswg.org/css-syntax-3/ for your tokenizing/parsing algorithm?

I don't really need media query parsing or selector parsing for the AMP library at the moment (may need it in future) -- but I'm concerned if the parser will abort and issue an exception. Its OK for instance to not do an accurate job of parsing for things like selectors/media queries but aborting the whole parse is not so good... What happens in such cases?

@sidkshatriya
Copy link
Contributor Author

@sabberworm sorry for persisting but I just want to be aware of any parse related aborts due to syntax the parser does not understand. Like I mentioned, it's ok if you don't parse the complex selectors into its fine components or even get media query parsing correct. I just want the parser to skip over those things (or even do a bad job with them). What is problematic is the whole parse getting aborted due to unrecognised (but valid) css syntax.

@sabberworm
Copy link
Collaborator

sabberworm commented Jul 1, 2016

Some well-defined exception states are recoverable. But because the parser does not know how media queries are supposed to be structured, it can’t recover and throws a SourceException. But I will try to implement correct media query parsing soon since it’s long been one of the major shortcomings of this library.

@sidkshatriya
Copy link
Contributor Author

@sabberworm Thanks for responding and eagerly look forward to some basic media query parsing.

Also to my previous question: is your parsing algorithm hand built or you have consulted something like https://drafts.csswg.org/css-syntax-3/ ? IIRC that there may be general strategies in that link where you can prevent yourself from aborting a parse when you encounter syntax that you don't know about. Basically you should simply skip over to the next at-rule or declaration block that you do understand.

@sabberworm
Copy link
Collaborator

sabberworm commented Jul 1, 2016

In the beginning the goal of the parser had always been to parse only valid CSS. Handling invalid CSS and being fault-tolerant was a feature added later on and does not exactly follow the way the specification demands of browser vendors.

However, for valid CSS (for everything the parser supports, which, currently, does not include media queries) the goal was always to be compliant with the specs (only the parsing; the resulting data structure does not adhere to CSSOM).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants