Apache Spark, Apache Kafka, Apache Iceberg, Nessie and Minio

Recently I posted Getting Started with Apache Spark, Apache Kafka and Apache Iceberg. The original version was missing two things. The first was the Nessie catalog, which I have since added. The second was that it depended on AWS S3, and I didn’t like that for a few reasons.

So many examples use AWS that you often never get to understand certain configurations and aspects of how the system you are using actually works. That leaves less room to learn outside the box.
Not everyone uses AWS. Some people are in the data center and like it that way.

So I needed an object store, and Minio fit the bill. Minio is an S3-compatible object store that can run on Kubernetes, and it also runs without it. It spins up nicely in a Docker container (not that I couldn’t have used localstack), so it works well for my example while still being something that can be used for real.
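To make the swap concrete, here is a minimal sketch of the Spark settings that point an Iceberg catalog backed by Nessie at a Minio endpoint instead of AWS S3. The property names are the standard Iceberg catalog and S3FileIO options; the helper name, the catalog name, the URLs, and the warehouse bucket are assumptions for illustration and may not match what the repo actually uses.

```python
# Sketch only: builds the Spark conf entries for Iceberg + Nessie + Minio.
# The function name and all default values are hypothetical placeholders.

def iceberg_minio_conf(
    catalog="nessie",                               # assumed catalog name
    nessie_uri="http://localhost:19120/api/v1",     # assumed Nessie endpoint
    minio_endpoint="http://localhost:9000",         # assumed Minio endpoint
    warehouse="s3://warehouse/",                    # assumed bucket in Minio
    ref="main",                                     # Nessie branch to use
):
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Register an Iceberg catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Back that catalog with Nessie.
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": nessie_uri,
        f"{prefix}.ref": ref,
        f"{prefix}.warehouse": warehouse,
        # Use Iceberg's S3 file IO, but send requests to Minio, not AWS.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": minio_endpoint,
        # Minio is normally addressed as host/bucket rather than bucket.host.
        f"{prefix}.s3.path-style-access": "true",
    }


if __name__ == "__main__":
    # Print the settings in a form that can be pasted after spark-submit.
    for key, value in sorted(iceberg_minio_conf().items()):
        print(f"--conf {key}={value}")
```

Each key and value can also be passed to SparkSession.builder.config(key, value). Credentials are deliberately left out here. One common approach is to supply the Minio access key and secret through the usual AWS environment variables, but check the repo for how it is actually done there.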

I updated the repo with how to get started.

Take a look at the code here.

Thanx =8^) Joe Stein

https://www.twitter.com/charmalloc

https://www.linkedin.com/in/charmalloc
