Skip to content

[WIP] Add an :ets table to cache the list of files that can be formatted #394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

benwilson512
Copy link

@benwilson512 benwilson512 commented Oct 28, 2020

Cached Input List

This PR improves FormatOnSave time by several orders of magnitude, at least when tested on my PC.

Before: 900ms
After: ~2-3ms

An O(n), file system bottlenecked operation is replaced by an O(1) in memory operation.

Related

Fixes #385.

Problem

As noted in #385, #381 and reflected in my experience with large projects at CargoSense, FormatOnSave could cause significant to massive delays when trying to save files. After some hunting, the vast majority of the time is spent here:

https://github.com/elixir-lsp/elixir-ls/blob/master/apps/language_server/lib/language_server/providers/formatting.ex#L50

The job of this function is to figure out if a specific file that is going to be saved should be formatted. The key factor here is the :inputs criterion defined in .formatter.exs which takes the shape of a list of path globs eg:

inputs: [
    "*.exs",
    "config/**/*.exs",
    "apps/*/{config,lib,test}/**/*.{ex,exs}",
    "apps/*/mix.exs"
  ]

The current approach simply calls Path.wildcard on each entry, expanding it into all files based on that glob. Then, the candidate path is tested for membership in that list.

Proposed Approach

This PR does NOT change the overall game plan. There is still a list of files which is produced by calling Path.wildcard on the input list, and a candidate path is still tested for membership. However, the main insight here is that this list very rarely changes. This PR populates that list when the LanguageServer boots and caches it in an :ets table set. Then, testing membership becomes an extremely fast process, just an :ets.lookup call. These :ets calls are serialized through the GenServer to avoid various race conditions, but they are such a small percentage of the overall format time that I'm not worried about it being a bottleneck.

TODO:

  • Invalidate the cache when .formatter.exs changes.
  • Block formatting until the cache has been initialized so efforts to format don't crash.
  • Block formatting while the cache is being rebuilt.
  • Replace IO.puts with whatever the proper logging thing is.
  • Handle deletion / creation of formattable files
  • Handle deletion / creation of .formatter.exs
  • Fix tests.
  • Fix "can't find dep" issues.
  • Validate that cding into an umbrella sub-dir works.

@benwilson512
Copy link
Author

Any ideas about the test failures? They're getting "Process not found" errors, but the process is in the app supervision tree. Is there some spot in the test setup I should be adding it to?

@lukaszsamson
Copy link
Collaborator

Nice @benwilson512 . I've got some points

  1. How about caching formatter opts along with formattable files?
    I guess it will also need another TODOs:
  2. Invalidate cache when .formatter.exs files are created/deleted
  3. Invalidate cache when formattable files are created/deleted/renamed

@benwilson512
Copy link
Author

benwilson512 commented Oct 28, 2020

How about caching formatter opts along with formattable files?

Already done!

As for the others, great point! Need to find the right hooks in the LanguageServer to identify these cases.

Copy link
Author

@benwilson512 benwilson512 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukaszsamson OK I believe those scenarios are all handled now.

@@ -320,6 +322,14 @@ defmodule ElixirLS.LanguageServer.Server do
|> Enum.uniq()
|> WorkspaceSymbols.notify_uris_modified()

formatter_changed? = Enum.any?(changes, fn %{"uri" => uri} ->
Path.basename(uri) == ".formatter.exs"
Copy link
Author

@benwilson512 benwilson512 Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more complicated than this? I realize this could in theory lead to some false positives but that isn't really a big issue IMHO, I don't see many people regularly creating / editing extra .formatter.exs files within their project. Are there files beyond this file that can be allowed to affect formatting?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about .formatter.exs in subdirectories? AFAIK it is supported in subapps in umbrella and there's an unfinished PR that tries to make it work in more cases #362
Though editing .formatter.exs is likely rare I can see that happen more often when switching between branches

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukaszsamson that will work just fine here. What this code is doing is saying "If any files are changed anywhere in the open project named ".formatter.exs" then rebuild the cache.

I do suppose the following would be an issue: If you cd into an elixir project's lib folder and code . there, it won't be watching the .formatter.exs in the upper directory. Let me see if that is honored today though, if not then this at least isn't a regression.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I didn't notice ``Path.basename` call.

Regarding the cd to lib - we don't support non mix projects.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cd into lib might be unsupported, but what about cd into apps/myapp within an umbrella? A single app within an umbrella is still considered a mix project afaik.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, that definitely simplifies things. There are still a few bugs I'm working out, hope to have this in a ready state by early next week.

@axelson
Copy link
Member

axelson commented Nov 16, 2020

Regarding the Fix "can't find dep" issues TODO item I have some local work that I believe can alleviate that pain point (it works by copying over some logic from Mix.Tasks.Formatter.formatter_opts_for_file into ElixirLS, but I hope in the future that could be upstreamed to Elixir). So maybe that should be removed as a goal for this PR and it can be implemented separately, that might be a good idea even if we don't end up going with my approach, to keep the size and feasibility of this PR down.

@benwilson512
Copy link
Author

@axelson I mean the issue is that at the moment in this PR dep importing is entirely broken so we really can't merge it in. Hopefully I'll get a chance to fix that in this branch soon.

@axelson
Copy link
Member

axelson commented Nov 16, 2020

Ah, I see. In that case it is important to handle within this PR 👍

@axelson
Copy link
Member

axelson commented Dec 17, 2020

@benwilson512 ping :)

Have you had a chance to look at this recently? It would be great to be get it in.

@benwilson512
Copy link
Author

Hey folks! Life at house Wilson has gotten a bit exciting with the introduction of my first child a week ago! Although sleep is a bit hard to find I have taken an extended period off of work and hope to have time to mess with this over the holidays.

@axelson
Copy link
Member

axelson commented Dec 22, 2020

Congratulations on your first child! I hope you can find some time to enjoy the holidays between everything :)

@@ -208,6 +208,7 @@ defmodule ElixirLS.LanguageServer.Server do

if reason == :normal do
WorkspaceSymbols.notify_build_complete()
Formatting.build_cache(state.project_dir)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so this solves the issue with formatting not being able to pull in deps initially. By re-building the the cache after every build we trivially catch updates to deps or other changes that effect formatting.

@benwilson512
Copy link
Author

I can confirm that this isn't really working at all in umbrella apps, so working on that.

@billylanchantin
Copy link
Contributor

@benwilson512 Some problems with the tests may have solved by #488. Try updating to see?

@dsnipe
Copy link

dsnipe commented Apr 12, 2021

@benwilson512 You made a great addition to ElixirLS and I hope you're able to finish it. But if you don't have time, is it fine if I pick it up?
I really want to have a fix of the formatting bug in umbrella applications (which is part of this PR). It will make my (and many others) life much easier :)

@benwilson512
Copy link
Author

Hey @dsnipe feel free to take a swing at it. What's the best way to do that? Would it work to make a PR into this branch? I'd definitely be responsive to that PR and its review, I just didn't have time to chase down the test failures or handle some of the various edge cases like "what if this vscode elixir instance has been opened not in a mix project?".

@dsnipe
Copy link

dsnipe commented Apr 12, 2021

@benwilson512 Great, then, I'll try to allocate some time and see what I can do.

@wingyplus
Copy link
Contributor

Hi, all. I've start resolving test error step by step. I start open PR to resolve the compiling error here benwilson512#1 (it's PR against this PR).

@martasd
Copy link

martasd commented Aug 11, 2021

Hi @benwilson512. What is the status on this one?

@benwilson512
Copy link
Author

I wasn't able to get tests to pass for umbrellas, and haven't had time to followup. If someone wants to take over from here that'd be very appreciated.

@wingyplus
Copy link
Contributor

wingyplus commented Aug 11, 2021

I has been sending a pr to start tackle the test suite here benwilson512#1. I’m not sure someone take a look?

btw, it seem pr got merge conflict. I think we resolve it first before do anything.

@wingyplus
Copy link
Contributor

I’m would like to take over it if you’re ok. :)

@benwilson512
Copy link
Author

Sure, your best bet is to simply fork my branch into your own repository and then open an independent PR. When you've done that I'll close this one.

@lukaszsamson
Copy link
Collaborator

@benwilson512 @wingyplus is it still relevant? How's the speed after recent path glob changes?

@tomjakubowski
Copy link

tomjakubowski commented Sep 22, 2022

The job of this function is to figure out if a specific file that is going to be saved should be formatted. The key factor here is the :inputs criterion defined in .formatter.exs which takes the shape of a list of path globs eg:

inputs: [
   "*.exs",
   "config/**/*.exs",
   "apps/*/{config,lib,test}/**/*.{ex,exs}",
   "apps/*/mix.exs"
 ]

The current approach simply calls Path.wildcard on each entry, expanding it into all files based on that glob. Then, the candidate path is tested for membership in that list.

Forgive my drive-by comment here: couldn't it work just as well to check the candidate path against the set of globs without expanding them? If so then there would be no need to cache the list of files at all.

@benwilson512
Copy link
Author

couldn't it work just as well to check the candidate path against the set of globs without expanding them?

There is no function in the Elixir standard library to do this that I know of. If there's a well defined spec or library to compare a candidate path against a glob then we could use that.

I think realistically there's no chance at this point that I'm going to finish this, so I'm gonna close the PR.

@lukaszsamson
Copy link
Collaborator

couldn't it work just as well to check the candidate path against the set of globs without expanding them?

There is no function in the Elixir standard library to do this that I know of. If there's a well defined spec or library to compare a candidate path against a glob then we could use that.

A basic glob match function should be fairly easy to write. At first look glob seems like a simplified regex. The complex part is dealing correctly with cwd, symlinks, special paths, case insensitive filesystems and other path oddities

@benwilson512
Copy link
Author

The complex part is dealing correctly with cwd, symlinks, special paths, case insensitive filesystems and other path oddities

Indeed.

@lukaszsamson
Copy link
Collaborator

Anyway, I forgot that we are already using a simple function like that provided by PathGlob package

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Eliixir LS slow since upgrading to elixir 1.11
9 participants