Apache Spark, Apache Kafka, Apache Iceberg, Nessie and Minio

I recently posted Getting Started with Apache Spark, Apache Kafka and Apache Iceberg. The original version was missing two things. The first was the Nessie Catalog, which I have since added. The second was that it used AWS S3, and I didn’t like depending on AWS S3 for a few reasons.

  • So many examples use AWS that it is easy to miss certain configurations and aspects of how the system you are using actually works. That leaves less room to learn outside the box.
  • Not everyone uses AWS. Some people run in their own data center and like it that way.

So I needed an object store, and Minio fit the bill. Minio is an S3-compatible object store that can run on Kubernetes or without it. It also spins up nicely in a Docker container (not that I couldn’t have used LocalStack), so it works well for my example and is also something that can be used for real.
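To make the "certain configurations" point concrete, here is a minimal sketch of the Spark settings that usually change when an Iceberg table managed by Nessie is stored in Minio instead of AWS S3. It is not the configuration from the repo. The endpoint, credentials, Nessie address, bucket name, and the catalog name `nessie` are all assumptions for a local Docker setup, and the function name `minio_spark_conf` is made up for illustration. It uses the Hadoop S3A connector settings, which is one of several ways to reach an S3-compatible store.

```python
def minio_spark_conf(
    endpoint="http://localhost:9000",            # assumed local Minio address
    access_key="minioadmin",                     # placeholder credentials
    secret_key="minioadmin",                     # placeholder credentials
    nessie_uri="http://localhost:19120/api/v1",  # assumed local Nessie address
    warehouse="s3a://warehouse/",                # hypothetical bucket name
):
    """Return Spark settings for an Iceberg catalog on Nessie, stored in Minio."""
    # Only turn on TLS for the S3A connector when the endpoint asks for it.
    use_ssl = str(endpoint.startswith("https://")).lower()
    return {
        # Iceberg catalog named "nessie", backed by the Nessie catalog implementation.
        "spark.sql.catalog.nessie": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.nessie.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        "spark.sql.catalog.nessie.uri": nessie_uri,
        "spark.sql.catalog.nessie.ref": "main",
        "spark.sql.catalog.nessie.warehouse": warehouse,
        # The settings AWS examples tend to hide: where the object store lives
        # and how buckets are addressed (path style instead of virtual hosts).
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": use_ssl,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


if __name__ == "__main__":
    # Print the settings in key=value form for inspection.
    for key, value in sorted(minio_spark_conf().items()):
        print(f"{key}={value}")
```

With PySpark installed and the Iceberg, Nessie and hadoop-aws jars on the classpath, each pair could be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. The explicit endpoint and path-style access are the two settings that AWS-only examples typically never need to set.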

I updated the repo with how to get started.

Take a look at the code here.

Thanx =8^) Joe Stein

https://www.twitter.com/charmalloc

https://www.linkedin.com/in/charmalloc
