Replies: 2 comments 2 replies
-
Hiya. We struggle to support all Databricks runtimes because we don't have access to Databricks ourselves. It looks like the change that has broken Databricks for you is here (this change fixed it for many users!). See also the various comments here. You might find it works if you upgrade or downgrade your runtime; otherwise you'll need to stick with Splink 3.9.12.
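If you do need to stay on the older release for now, a minimal sketch of pinning it in Databricks (assuming you install Splink from a notebook cell as a notebook-scoped library rather than as a cluster library) could look like this:

```python
# Databricks notebook cell: pin the last Splink release known to work on this
# runtime, then restart the Python process so the pinned version is picked up.
%pip install splink==3.9.12
dbutils.library.restartPython()
```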
-
I've noticed with the latest version that an error message is printed, but everything still works correctly (using the latest Splink and Databricks Runtime 13.3 LTS). I think I also tested with DBR 11.3 and it was fine. I believe this is because the fix involved a sequence of try/except statements covering different DBR versions, and the error is still printed if one of them fails. If everything after that point in your code works fine, I wouldn't worry about it.
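For illustration, a minimal sketch (not Splink's actual source) of the fallback pattern described above: try one constructor signature for `JavaJarId`, log the failure, then fall back to another. The specific signatures and helper name below are hypothetical placeholders.

```python
import logging

logger = logging.getLogger(__name__)


def register_splink_jar(spark, jar_uri):
    """Hypothetical helper: try constructor signatures for different DBR versions."""
    jvm = spark.sparkContext._jvm  # py4j gateway into the JVM

    try:
        # Signature accepted by some runtimes (hypothetical example)
        return jvm.com.databricks.libraries.JavaJarId(jar_uri)
    except Exception:
        # The error from a failed attempt is printed/logged even though a
        # later attempt may still succeed - which is why the traceback appears
        # while everything downstream keeps working.
        logger.exception("First JavaJarId signature failed, trying fallback")

    # Signature accepted by other runtimes (also hypothetical)
    return jvm.com.databricks.libraries.JavaJarId(jar_uri, "Jar", "v1")
```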
-
Hi all,
I am running the Splink package with the Spark backend on Databricks. My cluster configuration is 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12). With 3.9.12 there was no issue with SparkLinker, but since I started using 3.9.14 I have run into the error described in full below.
The settings are all defaults with SparkLinker:
from splink.spark.linker import SparkLinker

linker = SparkLinker(
    [dat1, dat2],
    settings,
    break_lineage_method="parquet",
    repartition_after_blocking=True,
    spark=spark,
)
Any help is greatly appreciated. Thank you.
--- Logging error ---
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/splink/databricks/enable_splink.py", line 38, in enable_splink
lib = JavaJarId(
File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1585, in call
return_value = get_return_value(
File "/databricks/spark/python/pyspark/sql/utils.py", line 196, in deco
return f(*a, **kw)
File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 330, in get_return_value
raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling None.com.databricks.libraries.JavaJarId. Trace:
py4j.Py4JException: Constructor com.databricks.libraries.JavaJarId([class java.net.URI, class java.lang.String, class java.lang.String, class scala.None$, class scala.None$, class scala.None$]) does not exist
at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:202)
at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:219)
at py4j.Gateway.invoke(Gateway.java:255)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)