Skip to content

CDRIVER-5580 support embedded URI in connection string options #1914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Mar 24, 2025

Conversation

eramongodb
Copy link
Contributor

@eramongodb eramongodb commented Mar 13, 2025

Summary

Partially resolves CDRIVER-5580 by updating mongoc_uri_parse to support embedded URI syntax in query options. More specifically, refactors URI parsing to more correctly support the presence of /, ?, :, and , within the connection string.

Additionally fixes mongoc_uri_parse_auth_mechanism_properties to correctly use ',' as the kvp delimiter and support the presence of ':' in property values.

Note

The changes in this PR use MONGODB-OIDC authentication mechanism properties in new regression tests, but it does not add any support for MONGODB-OIDC itself, taking advantage of the lack of thorough validation of auth-related options and properties. However, these changes should be forward-compatible with those currently in review in #1896 which introduce said thorough validation.

Definitions

Given the following connection string adapted from the Connection String spec:

mongodb://user:pass@host1:27017,host2:27018/db?key1=value1&key2=value2

the following terms will be used to describe its parts for purposes of this PR:

  • Scheme: mongodb
    • Scheme Delimiter: :// in mongodb://
  • Userinfo (aka "userpass"): user:pass
    • Username: user
    • Password: pass
    • Password Delimiter: : in user:pass
  • Userinfo Delimiter: @ in user:pass@
  • Hostinfo (aka "host list"): host1:27017,host2:27018
    • Host: host1:27017 in host1:27017,host2:27018
      • Hostname: host1 in host1:27017
      • Port: 27017 in host1:27017
      • Port Delimiter: : in host:27017
    • Host Delimiter: , in host1:27017,host2:27018
  • Auth Database (aka "database"): db
    • Database Delimiter: / in /db
  • Query: ?key1=value1&key2=value2
    • Query Delimiter: ?
    • Query Option: key1=value1
    • Query Option Delimiter: & in key1=value1&key2=value2

Problem

Note

For brevity, the scheme and its delimiter will be omitted from connection string examples below.

The mongoc_uri_parse function in mongoc-uri.c parses a connection string and produces a mongoc_uri_t object representing the described URI. Excluding UTF-8 validation, the algorithm currently used may be described by the following pseudocode:

# Omit scheme and its delimiter, including "mongodb+srv://".
str.remove_prefix("mongodb://")

# Find database delimiter.
db_delim = str.find('/')

# Split using database delimiter. (!!)
if db_delim:
    before_slash = str[:db_delim]
    options      = str[db_delim:]

# No database delimiter.
else:
    # Find userinfo delimiter.
    userpass = str[:str.find('@')]

    # Find query delimiter after userinfo.
    if userpass:
        # Userinfo may include '?'.
        # Exclude in search but include in result.
        rest = str.remove_prefix(userpass)

        # Split using query delimiter.
        before_slash = userpass + rest[:rest.find('?')]
        options      = hosts[rest.find('?'):]

    # No userinfo.
    else:
        # Split using query delimiter.
        before_slash = str[:str.find('?')]
        options      = str[str.find('?'):]

When options is not empty, it is subsequently parsed for the optional auth database (via mongoc_uri_parse_database) and query options (via mongoc_uri_parse_options).

A connection string which executes the "split using database delimiter" branch is:

localhost:27017/db?key=value
               ^ Find and split.

which is split into:

before_slash = "localhost:27017"
options      = "/db?key=value"

This branch assumes the only / that is present in the connection string (after the schema delimiter) MUST be the optional auth database delimiter. To my knowledge, this appears to be a reasonable assumption: no spec test, legacy or unified, contains a / in the connection string after the scheme delimiter... except for MONGODB-OIDC spec tests.

The MONGODB-OIDC authentication mechanism introduced a TOKEN_RESOURCE authentication mechanism property. The value of this property is expected to be a URI, e.g. mongodb://foo. This means that a slash may now be present in the query that is not the optional auth database delimiter:

localhost:27017?authMechanismProperties=TOKEN_RESOURCE:mongodb://foo,ENVIRONMENT=azure

This means the connection string above is incorrectly split into:

before_slash = "localhost:27017?authMechanismProperties=TOKEN_RESOURCE:mongodb:"
options      = "//foo,ENVIRONMENT=azure"

which results in parse failure.

Solution

The mongoc_uri_parse function is refactored to instead use a "cursor" to scan through the connection string and find the point at which to split into the "userhosts" (formerly before_slash) and "dbopts" (formerly options) sections. The new algorithm may be described by the following pseudocode:

# Skip the scheme.
cursor = str.remove_prefix("mongodb://")

# Skip past the optional userinfo.
userinfo_delim = cursor.find('@')
if userinfo_delim:
    cursor = cursor[userinfo_delim:]

db_delim = cursor.find('/')
query_delim = cursor.find('?')

# Skip to either the auth database delimiter '/' or the query delimiter '?'.
if db_delim:
    cursor = cursor[db_delim:]
elif query_delim:
    cursor = cursor[query_delim:]
else:
    cursor = str.end()

# Split on position of cursor at this point.
userhosts = str[:distance(str, cursor)]
dbopts    = str[distance(str, cursor):]

The actual code contains thorough comments explaining how each step is expected to modify the position of the cursor through the connection string.

An important detail not included in the pseudocode above is the handling of reserved characters. Per RFC 3986, only a certain set of characters are permitted within each component of the connection string. The exclusion of these reserved characters is not being correctly handled, leading to cases such as mongodb://us?r:pa?s@localhost being accepted as valid despite '?' not being permitted within the userinfo component. This test case has been updated accordingly.

The permitted characters in each component are summarized below:

pct-encoded = "%" HEXDIG HEXDIG
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
reserved    = gen-delims / sub-delims

reg-name    = *( unreserved / pct-encoded / sub-delims )
IPv4address = ... "." ...
IPv6address = ... "::" ... ":" ...
IPvFuture   = ... *( unreserved / sub-delims / ":" )
IP-literal  = "[" ( IPv6address / IPvFuture  ) "]"

userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
host        = IP-literal / IPv4address / reg-name
query       = *( pchar / "/" / "?" )

The important takeaway is the absence of gen-delims in the userinfo, host, and query components, where gen-delims is summarized as ":/?#[]@". These characters are now added to the list of excluded characters in calls to scan_to_unichar.

Note

Total conformance to RFC 3986 is not within the scope of this PR. Forbidding important gen-delims characters which have not been explicitly permitted within a given component to minimize the potential for incorrect parse failure is the focus of this PR.

Parsing ':' in authMechanismProperties

CDRIVER-5580 is originally concerned with ensuring ',' are correctly excluded from the value of key value pairs (kvp) in authMechanismProperties. However, this PR is instead more focused on correctly supporting ':' in kvps instead.

The authMechanismProperties query option has a value which are a list of kvps. Kvps are delimited by ',' first, and split into a key and a value by ':' second. For example, given:

authMechanismProperties=A:B,C:D:E,F:G

the resulting BSON document representing the mechanism properties should look as follows:

{
  'A': 'B',
  'C': 'D:E',
  'F': 'G'
}

This requirement that kvps are delimited by ',' is what motivated CDRIVER-5580 to forbid ',' in the value of the TOKEN_RESOURCE property (it can never be parsed in the intended manner). If a URI requires a comma, it must be percent-encoded instead.

However, the current implementation of mongoc_uri_parse_auth_mechanism_properties incorrectly scans for ':' first, then another scan for ',' second, with exclusions for ':' (if an excluded character is found before the terminator, then the result is "none found"). This leads to the above being incorrectly parsed as:

{
  'A': 'B',
  'C': 'D:E,F:G'
}

This PR fixes this behavior and adds regression tests (test_uri_uri_in_options) accordingly.

@eramongodb eramongodb requested a review from kevinAlbs March 13, 2025 14:34
@eramongodb eramongodb self-assigned this Mar 13, 2025
@eramongodb eramongodb requested a review from kevinAlbs March 14, 2025 21:36
@eramongodb
Copy link
Contributor Author

The NEWS entry for authentication-related credential validation has been summarized into a single line summarizing the related changes.

Copy link
Contributor

@vector-of-bool vector-of-bool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Aside: curious if the URI parsing is a good candidate for fuzz testing.

@eramongodb
Copy link
Contributor Author

Curious if the URI parsing is a good candidate for fuzz testing.

I expect yes, although it would probably need a dictionary to avoid wasting too many cycles on invalid URI keys.

@eramongodb eramongodb merged commit 9e3bbd3 into mongodb:master Mar 24, 2025
40 of 42 checks passed
@eramongodb eramongodb deleted the cdriver-5580 branch March 24, 2025 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants