Apache Iceberg Table Maintenance and Procedures with Apache Spark

After I wrote the code for Getting Started with Apache Spark, Apache Kafka and Apache Iceberg, I couldn't get Apache Iceberg maintenance tasks (like compaction) to work. Iceberg provides Apache Spark procedures for these tasks, and they are invoked with the SQL "CALL" keyword through spark.sql:

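# Compact the table's data files using the bin-pack strategy.
# min-input-files=2 lowers the number of files a group needs before it is rewritten.
spark.sql("""
CALL <catalog>.system.rewrite_data_files(
table => '<catalog>.getting_started_table',
strategy => 'binpack',
options => map('min-input-files','2')
)
""")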

I kept getting an error whenever I ran it.

With help from the Apache Iceberg Slack channel, I found the problem: I was starting pyspark the wrong way. Iceberg's Spark procedures (and the CALL syntax) are only available when the Iceberg SQL extensions are enabled, and I was not passing them when launching:

--conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions"

So I added that and, poof, it worked. We are good to go now.

I updated the source code repository so it works.

Thanx =) Joe Stein

https://www.twitter.com/charmalloc

https://www.linkedin.com/in/charmalloc


Discover more from Hello BitsNBytes World
