Apache Iceberg Table Maintenance and Procedures with Apache Spark

After I wrote the code for Getting Started with Apache Spark, Apache Kafka and Apache Iceberg, I couldn't get Apache Iceberg maintenance tasks (like compaction) to work. Iceberg provides Spark procedures for these tasks, and they are supposed to be invoked with the CALL keyword through spark.sql:

spark.sql("""
CALL <catalog>.system.rewrite_data_files(
table => '<catalog>.getting_started_table', 
strategy => 'binpack', 
options => map('min-input-files','2')
)
""")

I kept getting this error:

pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'CALL'.(line 2, pos 0)

== SQL ==

CALL nessie.system.rewrite_data_files(
^^^
table => 'nessie.getting_started_table',
strategy => 'binpack',
options => map('min-input-files','2')
)

With help from the Apache Iceberg Slack channel, I found the problem: the way I was starting pyspark was wrong. I was not including --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions" when launching. Without the Iceberg SQL extensions loaded, Spark's parser did not recognize the CALL statement, which is why it failed with a syntax error before the procedure ever ran. Once I added that option, poof, it worked, and we are good to go now.
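Here is a minimal sketch of that fix in Python, in case it helps to see it spelled out. The two extension class names are the ones quoted above; the helper names (pyspark_launch_command, build_session), the app name, and the idea of setting the option through SparkSession.builder instead of the command line are my own illustration, not code from the repository. It also assumes the Iceberg and Nessie jars and the catalog settings are already configured, as in the getting started setup.

```python
import shlex

# Extension classes quoted in the post. Both go into one comma-separated value.
EXTENSIONS = ",".join([
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
])


def pyspark_launch_command(extra_conf=None):
    """Build a pyspark command line that enables the SQL extensions.

    extra_conf is a hypothetical hook for any other --conf settings
    (catalog, warehouse, and so on) that your setup already uses.
    """
    conf = {"spark.sql.extensions": EXTENSIONS}
    conf.update(extra_conf or {})
    args = ["pyspark"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return shlex.join(args)


def build_session(app_name="iceberg-maintenance"):
    """Alternative for scripts: set the same option on the session builder.

    pyspark is imported lazily so the command helper above works without it.
    The option must be set before the first session is created.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.sql.extensions", EXTENSIONS)
        .getOrCreate()
    )


if __name__ == "__main__":
    # Print the command so it can be copied into a shell.
    print(pyspark_launch_command())
```

Either route should end up with the same spark.sql.extensions value on the session. If CALL still fails to parse, spark.conf.get("spark.sql.extensions") in the running session is a quick way to check whether the option actually took effect.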

I updated the source code repository so it works.

Thanx =) Joe Stein

https://www.twitter.com/charmalloc

https://www.linkedin.com/in/charmalloc
