Set UserProvider before discovery in Spark SQL integrations #1934

jbaiera · 2022-03-28T20:13:50Z

This PR sets the UserProvider implementation on the configuration before the Spark SQL integrations attempt to discover cluster information. It also updates the integration tests to cover the Spark SQL write and read paths.

Fixes #1933
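
For readers unfamiliar with the ordering problem, here is a minimal, self-contained sketch of it. Everything in it is a hypothetical stand-in: the `Settings` class, the `sketch.user.provider.class` key, and the simplified two-argument `setUserProviderIfNotSet` are invented for illustration and are not the real elasticsearch-hadoop classes or signatures. It assumes only that discovery needs a provider to be present in the settings it receives.

```scala
import scala.collection.mutable

// Hypothetical stand-ins; none of these are the real elasticsearch-hadoop classes.
object UserProviderOrderingSketch {
  // Assumed settings key, invented for this sketch.
  val UserProviderKey = "sketch.user.provider.class"

  final class Settings(private val values: mutable.Map[String, String]) {
    def this() = this(mutable.Map.empty[String, String])
    def get(key: String): Option[String] = values.get(key)
    def set(key: String, value: String): Unit = values.update(key, value)
    // Independent copy, so changes to the copy do not leak back to the original.
    def copy(): Settings = new Settings(values.clone())
  }

  // Writes the provider only when the caller has not configured one already.
  def setUserProviderIfNotSet(settings: Settings, providerClassName: String): Unit =
    if (settings.get(UserProviderKey).isEmpty) settings.set(UserProviderKey, providerClassName)

  // Assumed behavior: discovery cannot proceed without a configured provider.
  def discoverClusterInfo(settings: Settings): Either[String, String] =
    settings.get(UserProviderKey) match {
      case Some(provider) => Right(s"discovery ran with provider $provider")
      case None           => Left("no user provider configured")
    }
}
```

In this model, calling `discoverClusterInfo(cfg.copy())` on fresh settings yields a `Left`, while calling `setUserProviderIfNotSet` on the copy first yields a `Right`, which mirrors the order of calls this PR enforces at each call site.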

masseyke · 2022-03-28T20:54:47Z

spark/sql-13/src/main/scala/org/elasticsearch/spark/sql/DefaultSource.scala

@@ -533,10 +534,10 @@ private[sql] case class ElasticsearchRelation(parameters: Map[String, String], @

      // perform a scan-scroll delete
      val cfgCopy = cfg.copy()
+      InitializationUtils.setUserProviderIfNotSet(cfgCopy, classOf[HadoopUserProvider], null)


Should we just roll setUserProviderIfNotSet into discoverClusterInfo to keep us from accidentally calling discoverClusterInfo without it in the future?

jbaiera

I don't disagree, but I think the problems are rooted deeper than just this initialization call. Honestly, I think all of InitializationUtils needs to be reworked.
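
One possible shape for the suggestion above, as a rough sketch only: the discovery function takes the default provider as a parameter and applies it itself, so a caller cannot run discovery without one. The names `GuardedDiscoverySketch`, `Discovery`, and the `sketch.user.provider.class` key are hypothetical, settings are modeled as a plain immutable `Map`, and none of this reflects how InitializationUtils is actually written.

```scala
// Hypothetical sketch; not the real InitializationUtils API.
object GuardedDiscoverySketch {
  // Assumed settings key, invented for this sketch.
  val UserProviderKey = "sketch.user.provider.class"

  // Result of the pretend discovery: the settings actually used, plus a summary.
  final case class Discovery(settings: Map[String, String], summary: String)

  // Applies the default provider when none is configured, then "discovers".
  def discoverClusterInfo(settings: Map[String, String], defaultProvider: String): Discovery = {
    val effective =
      if (settings.contains(UserProviderKey)) settings
      else settings + (UserProviderKey -> defaultProvider)
    Discovery(effective, s"discovery ran with provider ${effective(UserProviderKey)}")
  }
}
```

An explicitly configured provider is preserved and the default is used only as a fallback. This does not address the broader rework of InitializationUtils mentioned in the reply.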

masseyke

I have the one question about whether we could do something to prevent this kind of thing in the future, but LGTM.

jbaiera added 3 commits March 28, 2022 16:05

Fix UserProvider calls in EsSparkSQL

a292a25

Set UserProvider before calls to discoverClusterInfo

1e0f7cb

Add additional smoke tests for data frame and data sources

749fb1f

jbaiera added bug :Spark v8.2.0 v7.17.2 labels Mar 28, 2022

jbaiera requested a review from masseyke March 28, 2022 20:13

masseyke reviewed Mar 28, 2022

masseyke approved these changes Mar 28, 2022

jbaiera merged commit 5e9bcaa into elastic:master Mar 29, 2022

jbaiera deleted the fix-spark-sql-kerberos branch March 29, 2022 16:59

jbaiera added backport pending v8.1.3 labels Mar 29, 2022

This was referenced Mar 29, 2022

[7.17] Set UserProvider before discovery in Spark SQL integrations (#1934) #1935

Merged

[8.1] Set UserProvider before discovery in Spark SQL integrations (#1934) #1936

Merged

jbaiera removed the backport pending label Mar 30, 2022
