Skip to content

Improve the performance of the Kore parser #2296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 9, 2020
Merged

Improve the performance of the Kore parser #2296

merged 8 commits into from
Dec 9, 2020

Conversation

MirceaS
Copy link
Contributor

@MirceaS MirceaS commented Dec 2, 2020

Fixes #2189


Review checklist

The author performs the actions on the checklist. The reviewer evaluates the work and checks the boxes as they are completed.

  • Summary. Write a summary of the changes. Explain what you did to fix the issue, and why you did it. Present the changes in a logical order. Instead of writing a summary in the pull request, you may push a clean Git history.
  • Documentation. Write documentation for new functions. Update documentation for functions that changed, or complete documentation where it is missing.
  • Tests. Write unit tests for every change. Write the unit tests that were missing before the changes. Include any examples from the reported issue as integration tests.
  • Clean up. The changes are already clean. Clean up anything near the changes that you noticed while working. This does not mean only spatially near the changes, but logically near: any code that interacts with the changes!

@MirceaS MirceaS changed the title inlined space and stringParserToIdParser Lexer functions Improve parser efficiency Dec 2, 2020
@ttuegel ttuegel changed the title Improve parser efficiency Inline some low-level parsers Dec 2, 2020
Using Text instead of String reduces peak memory use by 43% and total allocation
by 24%. Never use String.
Comment on lines +520 to +522
Text.readFile fileName
& liftIO
& clockSomethingIO "Reading the input file"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttuegel Why are these the other way round now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it from ($) to (&) so that the action (Text.readFile) would appear before the wrappers.

@ttuegel ttuegel changed the title Inline some low-level parsers Improve the performance of the Kore parser Dec 8, 2020
@ttuegel
Copy link
Contributor

ttuegel commented Dec 9, 2020

I made some changes to your parser performance pull request which reduce the memory use significantly. (For some reason we were still using String. Never use String.) I would like you to try a few more things:

  1. Avoid using Text.cons in the parser. Use lookAhead or match instead. Text.cons causes copying, but the tokens we are parsing are already in contiguous blocks of memory.
  2. Refactor parseAnyId so that it only calls parseIntoId once. Right now, each branch calls parseIntoId separately. That incurs up to three calls to getSourcePos per token parsed.
  3. Investigate the performance of validation.

Copy link
Contributor

@ttuegel ttuegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the redundant look-ahead, and then we can merge this.

Comment on lines +520 to +522
Text.readFile fileName
& liftIO
& clockSomethingIO "Reading the input file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it from ($) to (&) so that the action (Text.readFile) would appear before the wrappers.

Comment on lines 228 to 230
(genericId, _) <- Parser.match
$ (Parser.satisfy isFirstChar <?> "first identifier character")
>> Parser.takeWhileP (Just "identifier character") isBodyChar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is easier to read:

Suggested change
(genericId, _) <- Parser.match
$ (Parser.satisfy isFirstChar <?> "first identifier character")
>> Parser.takeWhileP (Just "identifier character") isBodyChar
(genericId, _) <- Parser.match $ do
_ <- Parser.satisfy isFirstChar <?> "first identifier character"
_ <- Parser.takeWhileP (Just "identifier character") isBodyChar
pure ()

then do
skipChar '\\'
(c :) <$> parseIdRaw KeywordsPermitted
then fst <$> Parser.match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using match here makes the look-ahead (above) redundant.

@ttuegel ttuegel self-requested a review December 9, 2020 15:10
@ttuegel ttuegel merged commit 08b378f into master Dec 9, 2020
@ttuegel ttuegel deleted the 2189 branch December 9, 2020 17:33
ttuegel added a commit that referenced this pull request Dec 10, 2020
* Inline some primitive parsers

* Kore.Parser.Lexer: Remove unused primitive parsers

* Run Parser over Text, not String

Using Text instead of String reduces peak memory use by 43% and total allocation
by 24%. Never use String.

Co-authored-by: Thomas Tuegel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the performance of the Kore parser
3 participants