Databricks Error SparkLinker #1461
---
Hiya, the error is being raised from splink/splink/databricks/enable_splink.py, line 15 (commit 31d4d0e), which is run automatically for you since you're in a Databricks environment. The Spark context is obtained from your input Spark DataFrames. So, things for you to check:

- What happens if you inspect the Spark session attached to your input DataFrame? (See the sketch below.)
- Another option is to pass Spark explicitly into the SparkLinker, which I think should avoid the problem.

I'm afraid I don't have access to Databricks so I'm unable to check this easily.
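Concretely, both checks could look something like the following sketch. Here `df` and `settings` are your existing objects; `df.sql_ctx.sparkSession` is one way to see which session (if any) a DataFrame is attached to, and the `spark=` keyword matches the `SparkLinker.__init__` signature visible in your traceback:

```python
from splink.spark.linker import SparkLinker

# Check which session (if any) the input DataFrame is attached to.
# On PySpark 3.3+ you can use df.sparkSession instead; older runtimes
# expose it via the (now-deprecated) sql_ctx attribute.
print(df.sql_ctx.sparkSession)  # should print a live SparkSession, not None

# Pass the notebook's `spark` session explicitly so Splink does not
# have to infer it from the input tables.
linker = SparkLinker(df, settings, spark=spark)
```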
---
Hi all,
I am new to Databricks and Spark/Python and very excited to use Splink. I am trying to run Splink in Databricks, but I am getting the error 'NoneType' object has no attribute 'sparkContext' when executing linker = SparkLinker(df, settings). I am using the default settings provided in the example repo. Any help is much appreciated. Thank you in advance!
Here are my cluster configurations:
When I did the pip install, I realized the version had been updated since the demo repo was last published, so I specified version 3.9.3.
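For reference, a pinned install in a Databricks notebook would look something like this (the actual install cell is not shown in the thread, so this is an assumption):

```python
# Notebook-scoped install pinning the version used with the demo repo.
%pip install splink==3.9.3
```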
Here is the full error message provided by Databricks:
AttributeError Traceback (most recent call last)
<command-...> in <module>
1 # from splink.spark.linker import SparkLinker
----> 2 linker = SparkLinker(df, settings)
3 deterministic_rules = [
4 "l.first_name = r.first_name and levenshtein(r.dob, l.dob) <= 1",
5 "l.surname = r.surname and levenshtein(r.dob, l.dob) <= 1",
/local_disk0/.ephemeral_nfs/envs/pythonEnv-xxx-xx-xxxxxx-xxxx/lib/python3.8/site-packages/splink/spark/linker.py in __init__(self, input_table_or_tables, settings_dict, break_lineage_method, set_up_basic_logging, input_table_aliases, spark, validate_settings, catalog, database, repartition_after_blocking, num_partitions_on_repartition)
182 self.in_databricks = "DATABRICKS_RUNTIME_VERSION" in os.environ
183 if self.in_databricks:
--> 184 enable_splink(spark)
185
186 self._set_default_break_lineage_method()
/local_disk0/.ephemeral_nfs/envs/pythonEnv-xxx-xx-xxxxxx-xxxx/lib/python3.8/site-packages/splink/databricks/enable_splink.py in enable_splink(spark)
13 None
14 """
---> 15 sc = spark.sparkContext
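The traceback shows `enable_splink` was called with `spark=None`, so line 15's `spark.sparkContext` raises the `AttributeError`. A quick, Splink-independent way to confirm the notebook has a live session is a diagnostic along these lines (hypothetical, not part of Splink):

```python
from pyspark.sql import SparkSession

# Databricks notebooks normally predefine `spark`; if that global is
# somehow missing, the active session can still be fetched directly.
active = SparkSession.getActiveSession()
print(active)               # should be a SparkSession, not None
print(active.sparkContext)  # the attribute the failing line accesses
```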