Skip to content

Fix semantic functionality not working for macro expansion reference documents due to URL encoding #1017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

lokesh-tr
Copy link
Contributor

@lokesh-tr lokesh-tr commented Aug 19, 2024

This PR fixes semantic functionality not working in macro expansion reference documents by carefully dealing with URIs received from / sent to sourcekit-lsp

Please have a look at: swiftlang/sourcekit-lsp#1634


Expansion of Swift Macros in Visual Studio Code - Google Summer Of Code 2024
@lokesh-tr @ahoppen @adam-fowler

@lokesh-tr
Copy link
Contributor Author

@adam-fowler I just sent you a detailed slack message regarding the issue which I am facing and I would like to have your help.

@lokesh-tr lokesh-tr marked this pull request as draft August 19, 2024 06:44
@lokesh-tr lokesh-tr changed the title Fix semantic functionality not working for nested macro expansions Fix semantic functionality not working for macro expansion reference documents due to URL encoding Aug 19, 2024
@lokesh-tr lokesh-tr force-pushed the fix-semantic-functionality branch 2 times, most recently from 970b215 to 999be97 Compare August 19, 2024 14:42
@lokesh-tr lokesh-tr force-pushed the fix-semantic-functionality branch from 999be97 to 74a6923 Compare August 19, 2024 14:45
@lokesh-tr lokesh-tr marked this pull request as ready for review August 19, 2024 14:46
@lokesh-tr
Copy link
Contributor Author

lokesh-tr commented Aug 19, 2024

@adam-fowler I had the logs of sourcekit-lsp test cases side by side and encoded the URI as required by sourcekit-lsp. Please let me know if I had missed something.

Everything works now :)

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this seems quite the VS Code bug if it adds more URL encoding than originally necessary.

As context for anyone else reading this, when expanding a macro, SourceKit-LSP will send a windows/peekDocument request to VS Code with the following URL

sourcekit-lsp://swift-macro-expansion/L5C5-L5C5.swift?fromLine=2&fromColumn=2&toLine=2&toColumn=2&bufferName=@__swiftmacro_17swift_macroClient5PointV1x17DictionaryStoragefMr_.swift&parent=file:///Users/lokesh/MyDesktop/MacroExperiments/mymacro2/swift-macro/Sources/swift-macroClient/main.swift

Then VS Code is expected to send the same URL to a window/getReferenceDocument request to get the contents of the macro expansion. But VS Code a level of URL encoding and requests the document for

sourcekit-lsp://swift-macro-expansion/L5C5-L5C5.swift?fromLine%3D2%26fromColumn%3D2%26toLine%3D2%26toColumn%3D2%26bufferName%3D%40__swiftmacro_17swift_macroClient5PointV1x17DictionaryStoragefMr_.swift%26parent%3Dfile%3A%2F%2F%2FUsers%2Flokesh%2FMyDesktop%2FMacroExperiments%2Fmymacro2%2Fswift-macro%2FSources%2Fswift-macroClient%2Fmain.swift

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very odd to me that VS Code’s URI type is not able to round-trip URIs that have both a = and a percent-encoded = sign (%3D) in the query. In my opinion such round-tripping should be possible. In my opinion, interchanging = for %3D is a semantics change because https://www.rfc-editor.org/rfc/rfc3986#section-2.1 says that

If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent.
Which seems to suggest that URIs are not equivalent if a character is percent-escaped. But the RFC isn’t explicit about it, so I suppose that VS Code’s behavior can be considered correct.

This manifests as follows, where = and %3D are indistinguishable after running it through vscode.Uri.

vscode.Uri.parse("x://a?b=xxxx%3Dyyyy").toString() -> 'x://a?b%3Dxxxx%3Dyyyy'
vscode.Uri.parse("x://a?b=xxxx%3Dyyyy").toString(/*skipEncoding=*/true) -> 'x://a?b=xxxx=yyyy'

Lacking the ability to make vscode.Uri round-trip these URIs, I would prefer to implement this in SourceKit-LSP because VS Code should not need to understand the structure of sourcekit-lsp URIs. If VS Code interprets the URI, it hinders us from changing the URL structure in the future without a corresponding VS Code update – at which point we’d need to keep the old implementation around as well, complicating the code.

I implemented a fix in SourceKit-LSP here: swiftlang/sourcekit-lsp#1636. I wasn’t able to test it with integration into VS Code yet but I hope that it works. I did test that we were able to deconstruct the URI correctly again after running it through vscode.Uri.parse(input).toString().

@lokesh-tr
Copy link
Contributor Author

@ahoppen Thanks for adding some context and implementing a fix in sourcekit-lsp. I will have to test it out. If your PR solves the issue by itself, then I will close this PR.

@lokesh-tr
Copy link
Contributor Author

lokesh-tr commented Aug 20, 2024

Closing this since Alex made the following fix in sourcekit-lsp: swiftlang/sourcekit-lsp#1636
We don't need to do anything in vscode-swift for this issue.

@lokesh-tr lokesh-tr closed this Aug 20, 2024
@ahoppen
Copy link
Member

ahoppen commented Aug 20, 2024

Reading some more specs, I am now convinced that this is a bug in VS Code’s Uri type. In RFC-3986, the query format is left open for applications to define and only hits that a percent-encoded character and its original character are not equivalent. Looking at HTML (which for all practical purposes is the primary user of URL query strings), things are a lot clearer: The algorithm to decode URL encoded form data clearly specifies that key-value pairs should be split on the & character and percent-decoding should only be applied to the key and the value itself, not the query itself. So, I’m pretty certain that we shouldn’t percent decode the entire query part of the URI here: https://github.com/microsoft/vscode/blame/a9daf53b72917a38a1ad25703b804e0b015bdcbc/src/vs/base/common/uri.ts#L276. Quite possibly, we also shouldn’t percent decode the fragment, I haven't checked the specs here.

I would propose that we go with swiftlang/sourcekit-lsp#1636 for now but I would like to explore defining a custom URI converter as done in this PR, that uses an implementation like the following to decode the URI but not percent-decode the query part.

Custom URI Decoding
// Code mostly copied from https://github.com/microsoft/vscode/blame/a9daf53b72917a38a1ad25703b804e0b015bdcbc/src/vs/base/common/uri.ts#L267-L280
const _regexp = /^(([^:/?#]+?):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

const _rEncodedAsHex = /(%[0-9A-Za-z][0-9A-Za-z])+/g;

function decodeURIComponentGraceful(str: string): string {
	try {
		return decodeURIComponent(str);
	} catch {
		if (str.length > 3) {
			return str.substr(0, 3) + decodeURIComponentGraceful(str.substr(3));
		} else {
			return str;
		}
	}
}

function percentDecode(str: string): string {
	if (!str.match(_rEncodedAsHex)) {
		return str;
	}
	return str.replace(_rEncodedAsHex, (match) => decodeURIComponentGraceful(match));
}

export function parseURI(value: string): vscode.Uri {
    const match = _regexp.exec(value);
    if (!match) {
        return vscode.Uri.parse(value);
    }
    return vscode.Uri.from(
        {
        scheme: match[2] || '',
        authority: percentDecode(match[4] || ''),
        path: percentDecode(match[5] || ''),
        query: match[7] || '',
        fragment: percentDecode(match[9] || ''),
        }
    );
}

And then use toString(/*strictEncoding=*/false) to encode the URI. We should be able to use this URI encoder for all URIs, not just the sourcekit-lsp scheme or just the parent part of the URI, so VS Code wouldn’t have any knowledge about the URI structure.

@adam-fowler
Copy link
Contributor

We should fix this in VSCode not SourceKit-LSP so we should re-open this PR.

@adam-fowler
Copy link
Contributor

adam-fowler commented Aug 20, 2024

VSCode.Uri is not fit for purpose really. It doesn't seem to specify whether internally query parameters are percent encoded or not. I think the best route is to assume that the query parameters are already encoded correctly (as you are receiving them from SKLSP), and then only encode the path and authority using encodeURI. So

uriString = scheme + "//" + encodeURI (authority + path) + "?" + query + "#" + fragment

EDIT: Actually we can't do that. The Uri.Parse removes all percent encoding.
OK the building of the query will be a little more complex. It'd go something like this

queryParameters = query.split("&")
let encodedParameters = queryParameters.map { parameter in
   let key = parameter.unti("=")
   let value = parameter.after("=") // remember value might not exist
   return encodeURIComponent(key) + "=" + encodeURIComponent(value)
}
let encodedQueryParameters = encodedParameters.joined("&")

Sorry for the mixture of TS and Swift but it easier to write in Swift.

@adam-fowler adam-fowler reopened this Aug 20, 2024
@ahoppen
Copy link
Member

ahoppen commented Aug 20, 2024

Yes, the underlying issue happens during Uri parsing where Uri decomposes the percent encoding. I think the correct design would be that Uri is a structured bag of bytes that can decode percent encoding when accessing the host/path/query/... Incidentally that is what Foundation’s URL type is doing, as I realized afterwards.

But most of all, I think that Uris should be able to round-trip.

Uri should definitely not do any splitting of the query based on & and = because the structure of the query is not specified in RFC 3986. The format with & and = is just what HTML forms happen to use but it’s perfectly valid to have any other structure in the query.

Now, I still don’t know how I want to fix this. The real fix would be to completely re-design the Uri type in VS Code itself but that is out of the question. Presumably, even if we wanted to this, that’s too big of an API behavior break to be acceptable to be accepted by VS Code. Maybe the best middle ground would be to not do any percent de/encoding in Uris for the sourcekit-lsp scheme. These URIs should be mostly opaque to VS Code anyway (I think all it does is extract the file name for display purposes, which would still work, especially because SourceKit-LSP doesn’t add any special characters to the file name of macro expansion (it’s always L#C#-L#C#.swift with # being a placeholder for a number). That way the VS Code extension still wouldn’t need to interpret any URLs except for knowing about the sourcekit-lsp scheme. But it already has knowledge of that, so I guess that’s acceptable.

I will need to think about this some more before committing to a solution but I’d welcome opinions.

@lokesh-tr
Copy link
Contributor Author

lokesh-tr commented Aug 20, 2024

Well, as I initially made this PR and after considering your inputs, am convinced that the best place to have the fix for the issue is in vscode-swift itself (since the issue exists in vscode). I would say that the fix for this in sourcekit-lsp should be a 'last resort'. However, am happy that a fix for this exists in sourcekit-lsp in the first place to just 'make things work'.

Its nice that semantic functionality works with either this fix or the fix in sourcekit-lsp. But, am not sure with how good my fix here handles its job. I am getting aware of the fact that an acceptable fix for this won't be as simple as what I have right now as Adam suggests.

While Alex says that the structure of the query is not specified in the RFC 3986, It now makes me wonder how to decode the query parameters effectively without knowing what the structure of the query will be since it can be anything.

I will have to see if we can come up with a good idea on how to fix this here knowing that we already have a fix ready in sourcekit-lsp just in case.

I will have to spend some time getting a mental model of what's going on by looking at the implementation of vscode.URI, RFC 3986 and how Foundation deals with URL. The current fix which I have right here doesn't take any of this into factor and only 'makes things work' (although now I assume based your inputs that it won't work for some special cases).

I don't have any opinions for now, since I haven't researched much on this. I would need some time.

@adam-fowler
Copy link
Contributor

Uri should definitely not do any splitting of the query based on & and = because the structure of the query is not specified in RFC 3986. The format with & and = is just what HTML forms happen to use but it’s perfectly valid to have any other structure in the query.

We own the sourcekit-lsp scheme, so we can define how the query parameters are formatted so this point is moot. We've already chosen the URL query format for this scheme, so lets stick with it.

Maybe the best middle ground would be to not do any percent de/encoding in Uris for the sourcekit-lsp scheme.

This doesn't work because

  1. vscode.Uri automatically percent decodes the query part of the Uri (we have no control over this)
  2. the value of one of the query parameters is a filename that can includes characters that should probably be percent encoded.

@ahoppen
Copy link
Member

ahoppen commented Aug 22, 2024

Let’s close this in favor of #1026.

@lokesh-tr lokesh-tr closed this Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants