After I wrote the code for Getting Started with Apache Spark, Apache Kafka and Apache Iceberg, I couldn't get the Apache Iceberg maintenance tasks (like compaction) to work. Iceberg provides Apache Spark procedures that are supposed to run by using the "CALL" keyword with spark.sql:
spark.sql("""
CALL <catalog>.system.rewrite_data_files(
table => '<catalog>.getting_started_table',
strategy => 'binpack',
options => map('min-input-files','2')
)
""")
I kept getting this error:
pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'CALL'.(line 2, pos 0)
== SQL ==
CALL nessie.system.rewrite_data_files(
^^^
table => 'nessie.getting_started_table',
strategy => 'binpack',
options => map('min-input-files','2')
)
With help from the Apache Iceberg Slack channel, I found the problem: the way I was starting pyspark was wrong, because I was not including --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions" when launching. Without the Iceberg extensions loaded, Spark's SQL parser does not recognize the CALL statement, hence the syntax error. So I added that and, poof, it worked. We are good to go now.
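Here is a minimal sketch of that fix in Python, in case you want to set the extensions in code as well as on the command line. The helper names (pyspark_launch_args, build_session) and the app name are made up for illustration and are not from the repository. It also assumes the Iceberg and Nessie jars and your catalog settings are already configured the way your environment needs them, since only the spark.sql.extensions setting is covered here.

```python
import shlex

# The two extension classes from the fix, joined as one comma-separated value.
EXTENSIONS = ",".join([
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
])


def pyspark_launch_args(extra_conf=None):
    """Build the argument list for launching pyspark with the SQL extensions.

    Hypothetical helper: extra_conf is an optional dict of additional
    --conf key/value pairs (catalog settings, for example).
    """
    conf = {"spark.sql.extensions": EXTENSIONS}
    conf.update(extra_conf or {})
    args = ["pyspark"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def build_session(app_name="iceberg-maintenance"):
    """Create a SparkSession with the same setting applied in code.

    Hypothetical helper. pyspark is imported lazily so the launch-args
    helper above still works on a machine without pyspark installed.
    Catalog configuration and jars are assumed to be supplied elsewhere.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.sql.extensions", EXTENSIONS)
        .getOrCreate()
    )


if __name__ == "__main__":
    # Print a shell-ready command line for launching pyspark.
    print(shlex.join(pyspark_launch_args()))
```

The extensions have to be in place when the session is first created. Calling build_session() after a session already exists returns that existing session, so its parser may still reject CALL.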
I updated the source code repository so it works.
Thanx =) Joe Stein
https://www.twitter.com/charmalloc
https://www.linkedin.com/in/charmalloc