Support for Service Principal Auth for Databricks Retrieve with DSPy #8293

willsmithDB
Contributor

This PR adds support for authenticating with Databricks Vector Search using service principals. Users can now provide client ID and secret credentials as an alternative to personal access tokens.

Changes

  • Added support for service principal authentication in the DatabricksRM class
  • Added two new optional parameters to the constructor:
    • databricks_client_id: Databricks service principal ID
    • databricks_client_secret: Databricks service principal secret
  • Updated authentication logic to use service principal credentials when provided
  • Added appropriate error handling and validation
  • Added clear logging statements to indicate which authentication method is being used
  • Updated documentation with new parameter descriptions

Implementation Details

The implementation prioritizes service principal authentication when both client credentials and tokens are provided. This follows the standard pattern where more specific authentication methods take precedence over more general ones.

Authentication now follows this logic:

  • If client ID and secret are provided, use service principal authentication
  • Otherwise, fall back to token-based authentication (existing behavior)
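The precedence described above can be sketched as a small standalone helper. This is only an illustration, not the code merged in this PR: the function name `select_auth_kwargs` is hypothetical, the returned keys are assumed to match what a Databricks SDK client accepts, and raising `ValueError` when no credentials are supplied is an assumption about the error handling.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def select_auth_kwargs(
    databricks_endpoint: Optional[str] = None,
    databricks_token: Optional[str] = None,
    databricks_client_id: Optional[str] = None,
    databricks_client_secret: Optional[str] = None,
) -> dict:
    """Hypothetical helper: pick the credentials to hand to a Databricks client."""
    # Service principal credentials win whenever both halves are present,
    # even if a token was supplied as well.
    if databricks_client_id and databricks_client_secret:
        logger.info("Using service principal authentication.")
        kwargs = {
            "client_id": databricks_client_id,
            "client_secret": databricks_client_secret,
        }
    elif databricks_token:
        # Existing behavior: personal access token (PAT) authentication.
        logger.info("Using token authentication.")
        kwargs = {"token": databricks_token}
    else:
        # Assumption: fail early rather than build a client without credentials.
        raise ValueError(
            "Provide databricks_client_id and databricks_client_secret, "
            "or databricks_token."
        )
    if databricks_endpoint:
        kwargs["host"] = databricks_endpoint
    return kwargs
```

In this sketch a half-supplied pair (only an ID or only a secret) falls through to the token branch, which mirrors the two-step logic listed above. A stricter variant could reject that case outright.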

Testing

Tested with:

  • Service principal authentication using client ID and secret
  • Token-based authentication (ensuring backward compatibility)
  • Various combinations of missing credentials to verify error handling

Documentation

Updated the docstring to include descriptions of the new parameters and their usage.

willsmithDB and others added 6 commits May 27, 2025 14:31
- Updated the base class to include the optional parameters
- Updated the logic in _query_via_databricks_sdk to use the SP credentials if they exist; otherwise it falls back to the PAT.
…ity' into dbx_service_principal_functionality

# Conflicts:
#	dspy/retrieve/databricks_rm.py
Collaborator

@chenmoneygithub chenmoneygithub left a comment

Thanks for the PR! The code LGTM. Could you include a screenshot or something similar that shows this new auth path works?

@willsmithDB
Contributor Author

Absolutely, thank you so much for reviewing!

Are there standards for contributing tests that you would recommend for third-party retrievers?

SP Flow

[Screenshot: SPFlow]

PAT Flow

[Screenshot: PATFlow]

Invalid PAT Flow

[Screenshot: InvalidPAT]

Cheers!

@chenmoneygithub
Collaborator

@willsmithDB We are actually going to remove most of the third-party retrievers from DSPy because they are tough to test and maintain. DatabricksRM will remain for a while longer, and will probably be moved to a Databricks internal repo later. Regarding testing, since it's hard to write unit tests and run them on CI, we just ask for proof that the change works.

Collaborator

@chenmoneygithub chenmoneygithub left a comment

LGTM!

@chenmoneygithub chenmoneygithub merged commit 3c1dc16 into stanfordnlp:main May 30, 2025
3 checks passed