---
title: "Apache Iceberg With Wasabi"
slug: "apache-iceberg-with-wasabi"
updated: 2026-06-01T15:27:05Z
published: 2026-06-01T15:27:05Z
---

# Apache Iceberg With Wasabi

This guide explains how to configure [Apache Spark](https://spark.apache.org/) with [Apache Iceberg](https://iceberg.apache.org/) to use Wasabi Hot Cloud Storage as an S3-compatible object storage backend. Apache Iceberg is an open table format for large analytic datasets.

## Requirements

- This solution was tested on Ubuntu Server 26.04 LTS with Java 21 (OpenJDK 21.0.11) and Apache Spark 4.1.2.
- A Wasabi account with a bucket created. See [Creating a Bucket](https://docs.wasabi.com/docs/working-with-buckets-and-objects#creating-a-bucket) for details on this procedure.
- Wasabi access key and secret key. See [3—Creating a User Account and Access Key](https://docs.wasabi.com/docs/creating-a-user-account-and-access-key) for further information.

## Find Your Bundled Versions

Spark ships with a bundled Hadoop client, and the version of the hadoop-aws package must match it exactly. Find the Hadoop version, along with the Scala and Spark versions, by checking the JAR filenames.

```bash
ls $SPARK_HOME/jars/hadoop-client-*.jar
ls $SPARK_HOME/jars/scala-library-*.jar
ls $SPARK_HOME/jars/spark-core_*.jar
```

The Hadoop filename (for example, hadoop-client-runtime-3.4.2.jar) gives the Hadoop version to use in the next step. Note the Scala and Spark versions from the other two filenames as well, because the next step also needs them.
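The versions can also be read off the filenames with a small shell helper. The sketch below is illustrative only: `jar_version` and `scala_binary` are hypothetical names, and it assumes the version is the text after the last hyphen in the filename. That holds for the three JARs listed above but not for a version string that itself contains a hyphen.

```bash
# Hypothetical helper: print the version embedded in a JAR filename.
# Assumes the version is everything after the last hyphen.
jar_version() {
  local name
  name=$(basename "$1" .jar)     # drop the directory and the .jar suffix
  printf '%s\n' "${name##*-}"    # keep the text after the last hyphen
}

# Hypothetical helper: reduce a full Scala version (2.13.16)
# to its binary version (2.13) by dropping the last component.
scala_binary() {
  printf '%s\n' "${1%.*}"
}
```

For example, `jar_version "$SPARK_HOME"/jars/hadoop-client-runtime-*.jar` prints the Hadoop version, provided the glob matches a single file.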

## Configuring spark-defaults.conf

Use a Linux text editor such as vi or nano to edit the Spark configuration file. The path below assumes Spark is installed under /opt/spark; adjust it if your installation differs.

```bash
vi /opt/spark/conf/spark-defaults.conf
```

Add the lines below, substituting the following values.

- Wasabi access and secret keys. Replace YOUR_WASABI_ACCESS_KEY and YOUR_WASABI_SECRET_KEY with your keys.
- Bucket name (replace “mt-iceberg” with your bucket name).
- Wasabi region endpoint.
- The Hadoop version found above. Replace 3.4.2 in the hadoop-aws entry on the last line with that version.
- The iceberg-spark-runtime version (4.1_2.13:1.11.0) in the last line must match your Spark version (4.1.x) and Scala version (2.13) found above.

```plaintext
spark.hadoop.fs.s3a.endpoint                    https://s3.us-east-2.wasabisys.com
spark.hadoop.fs.s3a.access.key                  YOUR_WASABI_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key                  YOUR_WASABI_SECRET_KEY
spark.hadoop.fs.s3a.aws.credentials.provider    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.path.style.access           true
spark.sql.extensions                            org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.iceberg_catalog               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg_catalog.type          hadoop
spark.sql.catalog.iceberg_catalog.warehouse     s3a://mt-iceberg/
spark.jars.packages                             org.apache.hadoop:hadoop-aws:3.4.2,org.apache.iceberg:iceberg-spark-runtime-4.1_2.13:1.11.0
```

> This example uses Wasabi’s us-east-2 storage region. Use the endpoint URL for the region where your bucket is located. For a list of regions and their endpoint URLs, see [Service URLs for Wasabi’s Storage Regions](https://docs.wasabi.com/docs/what-are-the-service-urls-for-wasabi-s-different-storage-regions).
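The last line of the configuration is assembled from the versions found earlier. As a rough sketch, the hypothetical `packages_value` function below builds that value from the Hadoop version, the Spark minor version, the Scala binary version, and the Iceberg release. The function name and argument order are illustrative and not part of Spark or Iceberg.

```bash
# Hypothetical helper: build the spark.jars.packages value.
# Arguments: Hadoop version, Spark minor version, Scala binary version, Iceberg version.
packages_value() {
  local hadoop spark_minor scala_bin iceberg
  hadoop=$1
  spark_minor=$2
  scala_bin=$3
  iceberg=$4
  printf 'org.apache.hadoop:hadoop-aws:%s,org.apache.iceberg:iceberg-spark-runtime-%s_%s:%s\n' \
    "$hadoop" "$spark_minor" "$scala_bin" "$iceberg"
}

# Values from this guide's example; prints the same coordinates used in the configuration above.
packages_value 3.4.2 4.1 2.13 1.11.0
```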

## Starting spark-shell

Start the Spark shell. The packages defined in spark-defaults.conf will be downloaded automatically on first launch.

```bash
spark-shell
```
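If the shell starts but cannot reach the bucket, a leftover placeholder in the configuration is one thing worth ruling out. The sketch below defines a hypothetical `check_placeholders` function that scans a configuration file for the `YOUR_WASABI_` placeholders used in this guide. The function name, messages, and return codes are assumptions for illustration, not part of Spark.

```bash
# Hypothetical helper: report whether a config file still contains
# the YOUR_WASABI_ placeholder keys from this guide.
check_placeholders() {
  local conf
  conf=$1
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  if grep -q 'YOUR_WASABI_' "$conf"; then
    echo "placeholder keys still present in $conf" >&2
    return 1
  fi
  echo "no placeholders found in $conf"
}
```

It could be run as `check_placeholders /opt/spark/conf/spark-defaults.conf`, using the path from the earlier step.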

## Creating a Database and Table

Once inside the Spark shell, create a database (namespace) and an Iceberg table.

```scala
spark.sql("CREATE DATABASE iceberg_catalog.db")

spark.sql("CREATE TABLE iceberg_catalog.db.test (id INT, name STRING) USING iceberg")
```

## Inserting and Querying Data

Insert some rows and query them back.

```scala
spark.sql("INSERT INTO iceberg_catalog.db.test VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')")

spark.sql("SELECT * FROM iceberg_catalog.db.test").show()
```

## Verifying Iceberg Metadata

Query the Iceberg snapshots table to confirm that metadata is being stored in your Wasabi bucket.

```scala
spark.sql("SELECT * FROM iceberg_catalog.db.test.snapshots").show()
```

The output will show a snapshot with a manifest_list path beginning with s3a://mt-iceberg/, confirming that Iceberg is managing the table and storing all data and metadata in Wasabi.

![](https://cdn.document360.io/bef0a1ea-7768-4d5a-b520-c4fe2f7fafad/Images/Documentation/apache-iceberg-with-wasabi-image-fly9uvij.png)

## Verifying Your Bucket Data

Verify that Iceberg has written data and metadata to your Wasabi bucket using the [Wasabi Console](https://console.wasabisys.com). Log in to the console and click **Buckets** in the left navigation, then click the name of your bucket (mt-iceberg in this example):

![](https://cdn.document360.io/bef0a1ea-7768-4d5a-b520-c4fe2f7fafad/Images/Documentation/apache-iceberg-with-wasabi-image-j5m8l8il.png)

Drill down through the folder structure to see your data in the bucket. Navigate to db/test to see the data and metadata folders created by Iceberg.

![](https://cdn.document360.io/bef0a1ea-7768-4d5a-b520-c4fe2f7fafad/Images/Documentation/apache-iceberg-with-wasabi-image-muq1vjp2.png)
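The same check can be made from a terminal. The sketch below assumes the AWS CLI is installed and configured with your Wasabi keys, neither of which this guide sets up. `list_table_files` is a hypothetical wrapper around `aws s3 ls`, and it expects the bucket name, the table prefix, and the region endpoint as arguments.

```bash
# Hypothetical wrapper: list every object under a table's prefix.
# Assumes the AWS CLI is installed and holds your Wasabi access keys.
list_table_files() {
  local bucket prefix endpoint
  bucket=$1
  prefix=$2
  endpoint=$3
  aws s3 ls "s3://${bucket}/${prefix}/" --recursive --endpoint-url "$endpoint"
}
```

With the example values from this guide, the call would be `list_table_files mt-iceberg db/test https://s3.us-east-2.wasabisys.com`, which should list the objects under the data and metadata folders.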

## Catalog Structure

The fully qualified name for any Iceberg table follows this hierarchy.

`catalog.database.table`

In the example above:

- `iceberg_catalog` is the catalog, defined in `spark-defaults.conf` and pointing to your Wasabi bucket.
- `db` is the database (also called a namespace).
- `test` is the table.

You can create multiple databases under the same catalog without any additional configuration.

```scala
spark.sql("CREATE DATABASE iceberg_catalog.project1")

spark.sql("CREATE DATABASE iceberg_catalog.project2")
```
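With the Hadoop catalog used here, the name hierarchy maps onto the bucket layout seen earlier: the table `db.test` lives under `db/test` in the warehouse. As a small illustration, the hypothetical `table_location` function below splits a fully qualified name and prints the expected path. It assumes a single-level database name and is not an Iceberg API.

```bash
# Hypothetical helper: map catalog.database.table to its path in the warehouse.
# Assumes a Hadoop catalog and a single-level database name.
table_location() {
  local warehouse rest database table
  warehouse=${1%/}        # drop a trailing slash, if any
  rest=${2#*.}            # drop the catalog name: db.test
  database=${rest%%.*}    # text before the first dot: db
  table=${rest#*.}        # text after the first dot: test
  printf '%s/%s/%s\n' "$warehouse" "$database" "$table"
}

# Example values from this guide; prints s3a://mt-iceberg/db/test
table_location s3a://mt-iceberg/ iceberg_catalog.db.test
```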

## Apache Polaris (Optional — Out of Scope)

The setup described in this article uses a Hadoop-based catalog, which stores Iceberg metadata directly in your Wasabi bucket. This is simple to configure and works well for single-engine environments where only Spark is reading and writing tables.

For more advanced use cases, Apache Polaris is an open-source REST catalog for Apache Iceberg that acts as a central metadata service. Rather than each engine connecting directly to object storage to find table metadata, all engines connect to Polaris over a standard REST API. This enables:

- Multi-engine access—Spark, Trino, Flink, and other engines can all read and write the same Iceberg tables simultaneously through a single catalog
- Centralized access control—Role-based permissions managed in one place rather than through storage bucket policies
- Credential vending—Polaris issues short-lived, scoped credentials to engines rather than requiring each engine to hold long-lived storage keys
- Audit and governance—All catalog operations are logged through a single service

Apache Polaris became a top-level Apache Software Foundation project in February 2026. It supports S3-compatible storage including Wasabi, and can use PostgreSQL as a persistent metastore backend.

Configuring Polaris is significantly more involved than the Hadoop catalog approach and is outside the scope of this article. For more information, see:

- Apache Polaris documentation—[https://polaris.apache.org](https://polaris.apache.org)
- Polaris quickstart guide—[https://polaris.apache.org/guides/quickstart/](https://polaris.apache.org/guides/quickstart/)
- Creating a catalog on S3-compatible storage—[https://polaris.apache.org/in-dev/unreleased/getting-started/creating-a-catalog/s3/](https://polaris.apache.org/in-dev/unreleased/getting-started/creating-a-catalog/s3/)
