Set UserProvider before discovery in Spark SQL integrations #1934

Merged (3 commits) on Mar 29, 2022

jbaiera (Member) commented on Mar 28, 2022

This PR adds the required configuration step that sets the UserProvider implementation before the Spark SQL integrations attempt to discover cluster information. It also extends the integration tests to cover both the write and read paths for Spark SQL.

Fixes #1933
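The ordering this PR enforces can be illustrated with a small, self-contained Scala model. This is a sketch under stated assumptions, not the connector's code: `DiscoveryOrderSketch`, `Settings` (a plain mutable map), `UserProviderKey`, and `prepareAndDiscover` are hypothetical names, and the "discovery" step only checks that a provider was configured rather than contacting a cluster.

```scala
import scala.collection.mutable

// Hypothetical model of the ordering fix. None of these names are the real
// elasticsearch-hadoop API; settings are modeled as a mutable key/value map.
object DiscoveryOrderSketch {
  type Settings = mutable.Map[String, String]

  // Hypothetical key holding the name of the UserProvider implementation.
  val UserProviderKey = "sketch.user.provider"

  // "If not set" semantics: a provider the caller already chose is kept.
  def setUserProviderIfNotSet(cfg: Settings, provider: String): Unit =
    if (!cfg.contains(UserProviderKey)) cfg(UserProviderKey) = provider

  // Stand-in for discovery. It has to authenticate (e.g. with Kerberos),
  // so it refuses to run when no UserProvider has been configured.
  def discoverClusterInfo(cfg: Settings): String =
    cfg.get(UserProviderKey) match {
      case Some(provider) => s"discovered cluster using $provider"
      case None =>
        throw new IllegalStateException("UserProvider must be set before discovery")
    }

  // The fixed order: copy the settings, set the provider, then discover.
  // The caller's settings stay untouched because only the copy is modified.
  def prepareAndDiscover(cfg: Settings): String = {
    val cfgCopy = cfg.clone()
    setUserProviderIfNotSet(cfgCopy, "HadoopUserProvider")
    discoverClusterInfo(cfgCopy)
  }
}
```

In this model, calling `discoverClusterInfo` directly on empty settings throws, which mirrors the failure the linked issue describes, while `prepareAndDiscover` succeeds and respects any provider the caller set beforehand.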

@@ -533,10 +534,10 @@ private[sql] case class ElasticsearchRelation(parameters: Map[String, String], @

// perform a scan-scroll delete
val cfgCopy = cfg.copy()
InitializationUtils.setUserProviderIfNotSet(cfgCopy, classOf[HadoopUserProvider], null)
A reviewer (Member) commented on this change:

Should we just roll setUserProviderIfNotSet into discoverClusterInfo to keep us from accidentally calling discoverClusterInfo without it in the future?
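The reviewer's suggestion could be sketched roughly as follows, again as a hypothetical, self-contained Scala model rather than the real `InitializationUtils` API. `GuardedDiscoverySketch`, `Settings`, `UserProviderKey`, and `DefaultProvider` are illustrative names; the point is only that discovery becomes the single entry point and performs the provider step itself.

```scala
import scala.collection.mutable

// Hypothetical sketch of folding the provider step into discovery itself, so
// callers cannot reach discovery without it. Not the elasticsearch-hadoop API.
object GuardedDiscoverySketch {
  type Settings = mutable.Map[String, String]

  // Hypothetical configuration key and default provider name.
  val UserProviderKey = "sketch.user.provider"
  val DefaultProvider = "HadoopUserProvider"

  // Kept private so the only way in is through discoverClusterInfo.
  private def setUserProviderIfNotSet(cfg: Settings, provider: String): Unit =
    if (!cfg.contains(UserProviderKey)) cfg(UserProviderKey) = provider

  // Single entry point: every call configures the provider before "discovery".
  // Note that this mutates the settings object it is given.
  def discoverClusterInfo(cfg: Settings, provider: String = DefaultProvider): String = {
    setUserProviderIfNotSet(cfg, provider)
    s"discovered cluster using ${cfg(UserProviderKey)}"
  }
}
```

One trade-off in this sketch is that discovery now writes to its input, whereas the diff above sets the provider on `cfg.copy()`; a real change would also have to decide which provider class each integration should default to.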

jbaiera (Member, Author) replied on Mar 28, 2022:

I don't disagree, but I think the problems are rooted deeper than just this initialization call. Honestly, I think InitializationUtils as a whole needs to be reworked.

masseyke (Member) left a comment:

I have one question about whether we could do something to prevent this kind of thing in the future, but LGTM.

jbaiera merged commit 5e9bcaa into elastic:master on Mar 29, 2022
jbaiera deleted the fix-spark-sql-kerberos branch on March 29, 2022 16:59
jbaiera added a commit to jbaiera/elasticsearch-hadoop that referenced this pull request Mar 29, 2022
…1934)

This PR adds the requisite configuration step for the UserProvider implementation before attempting
to discover the cluster information in the SparkSQL integrations. This additionally updates the
integration tests to add coverage for the write and read paths for SparkSQL.
jbaiera added a commit to jbaiera/elasticsearch-hadoop that referenced this pull request Mar 29, 2022
…1934)
jbaiera added a commit that referenced this pull request Mar 30, 2022
) (#1936)
jbaiera added a commit that referenced this pull request Mar 30, 2022
…1934) (#1935)
Successfully merging this pull request may close these issues:
SparkSQL breaks when saving data with Kerberos authentication
2 participants