DataLakeHouse.io With Wasabi
    • 02 Oct 2024

    How do I use DataLakeHouse.io with Wasabi?

    Wasabi has been validated with DataLakeHouse.io, a leading cloud data synchronization and business intelligence orchestration platform that enables data teams to build single-source-of-truth repositories and data models with fast time to value.

    1. Prerequisites

    2. Reference Architecture 

    DLH_Architecture_Black_Background.jpg

    3. Configuration

    3.1 Create an account at DataLakeHouse.io and make sure to verify your email address to activate your account.

    DLH_New_User_Black_Background.jpg

    3.2 Log in to your DataLakeHouse.io account from the portal login page.

    DLH_Portal_Login_Black_Background.jpg

    3.3 On the Dashboard, click "+ Add a Source".

    Dashboard_Add_Source_Black_Background.jpg

    3.4 Search for Wasabi, then click "Add New Source".

    DLH_Search_for_Wasabi_Black_Background.jpg

    3.5 Follow the on-screen instructions and enter the required information for your Wasabi connection:

    • Name - A unique alias for this connection, distinct from any other connection you have created or will create

    • Target Schema Prefix - Prefix for the schema at the target you will sync to

    • Bucket - Enter the bucket name where your files are stored

    • Region - Select the region where your bucket is stored

    • Folder Path - The path within the bucket from which the desired files will be retrieved

    • Access Key - Enter your Wasabi Access Key credentials

    • Secret Key - Enter your Wasabi Secret Key credentials

    Wasabi_Storage_Connection_Black.jpg
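    Before entering these values, it can help to sanity-check them locally. The following Python sketch is not part of DataLakeHouse.io; the `WasabiSource` class and its methods are hypothetical. It assumes Wasabi follows S3-compatible bucket naming rules and the regional service URL pattern `s3.<region>.wasabisys.com`, and it normalizes the Folder Path into an S3-style key prefix.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed S3-compatible bucket naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, beginning and ending with a letter or digit.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


@dataclass
class WasabiSource:
    """Hypothetical local model of the fields in the Wasabi connection form."""
    name: str
    target_schema_prefix: str
    bucket: str
    region: str
    folder_path: str
    access_key: str
    secret_key: str

    def endpoint(self) -> str:
        # Assumes Wasabi's regional service URL pattern s3.<region>.wasabisys.com
        return f"https://s3.{self.region}.wasabisys.com"

    def key_prefix(self) -> str:
        # Strip surrounding slashes; keep one trailing slash for non-root folders
        path = self.folder_path.strip("/")
        return f"{path}/" if path else ""

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the fields look usable."""
        errors: List[str] = []
        if not self.name.strip():
            errors.append("Name must not be empty")
        if not BUCKET_RE.fullmatch(self.bucket) or ".." in self.bucket:
            errors.append(f"Bucket {self.bucket!r} does not follow S3-style naming rules")
        if not self.region.strip():
            errors.append("Region must be selected")
        if not self.access_key or not self.secret_key:
            errors.append("Access Key and Secret Key are both required")
        return errors


if __name__ == "__main__":
    source = WasabiSource(
        name="wasabi_sales_files",
        target_schema_prefix="wasabi",
        bucket="my-sales-bucket",
        region="us-east-2",
        folder_path="/exports/daily/",
        access_key="EXAMPLEACCESSKEY",
        secret_key="example-secret",
    )
    print(source.endpoint())    # https://s3.us-east-2.wasabisys.com
    print(source.key_prefix())  # exports/daily/
    print(source.validate())    # []
```

    The validator collects every problem instead of stopping at the first one, so all issues can be fixed before clicking "Save & Test". The sample bucket, region, and key values are placeholders.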

    3.6 Click the "Save & Test" button once all the data has been entered. 

    3.7 Next, create a Cloud Data Warehouse Target by clicking "Targets" under "Connections" in the left-hand pane.

    Target_Selection_Black.jpg

    3.8 Click "+ Add New Target" under the desired data warehouse vendor.

    Snowflake_Target_Settings_Black_Background.jpg

    Note - Snowflake setup and configuration instructions are shown below:

    Snowflake_Instructions_Black.jpg

    3.9 Click on the "Save & Test" button after inputting all the required information.

    3.10 Click "Sync Bridges" in either the left-hand pane or the top ribbon.

    Sync_Bridges_Black.jpg

    3.11 Click "+ New Sync Bridge".

    New_Sync_Bridge_Black_Background.jpg

    3.12 Input the required information for your Sync Bridge Settings:

    • Sync Bridge Name - Enter a unique name for your Sync Bridge

    • Select Connections - Select the Source and Target connections you created from the respective dropdowns

    • Sync Time Zone - The time zone used to determine when your data loads. All times are ultimately converted to UTC.

    • Sync Frequency - Select how often your data will synchronize. The shortest interval available to non-Enterprise Plan customers is 15 minutes, depending on the amount of data to be replicated, which varies by source. If you select 12 or 24 hours, you will be prompted with the option to set a start time for the sync; this is optional.

      • If the optional "Apply Start Time?" checkbox appears, select it to display a Sync Start Time dropdown.

      • If a 'Manual Sync' option is available for your plan, the data will still be synchronized every 24 hours from the time you save the Sync Bridge.
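    To illustrate how a schedule defined in a local Sync Time Zone maps to UTC, here is a minimal Python sketch. It is an assumed scheduling model, not DataLakeHouse.io's actual scheduler: runs occur at a fixed interval, an optional start time is accepted only for 12- or 24-hour frequencies, and the 15-minute minimum for non-Enterprise plans is enforced. The function name `next_sync_utc` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional

# Shortest interval for non-Enterprise plans, per the Sync Frequency description
MIN_INTERVAL = timedelta(minutes=15)
START_TIME_INTERVALS = (timedelta(hours=12), timedelta(hours=24))


def next_sync_utc(now_utc: datetime, interval: timedelta, sync_tz: tzinfo,
                  start_time: Optional[time] = None) -> datetime:
    """Return the next (hypothetical) sync run as an aware UTC datetime.

    now_utc must be timezone-aware.
    """
    if interval < MIN_INTERVAL:
        raise ValueError("Sync frequency cannot be shorter than 15 minutes")
    if start_time is None:
        # Without a start time, assume the next run is one interval from now
        return (now_utc + interval).astimezone(timezone.utc)
    if interval not in START_TIME_INTERVALS:
        raise ValueError("A start time only applies to 12- or 24-hour frequencies")

    # Anchor the schedule at today's start time in the Sync Time Zone
    local_now = now_utc.astimezone(sync_tz)
    run = datetime.combine(local_now.date(), start_time, tzinfo=sync_tz)
    # Step back until the anchor is not in the future, then forward to the first future run
    while run > local_now:
        run -= interval
    while run <= local_now:
        run += interval
    # Express the result in UTC, matching "all times are converted to UTC"
    return run.astimezone(timezone.utc)


if __name__ == "__main__":
    utc_minus_4 = timezone(timedelta(hours=-4))  # fixed UTC-4 offset for the example
    now = datetime(2024, 10, 2, 12, 0, tzinfo=timezone.utc)  # 08:00 local time
    print(next_sync_utc(now, timedelta(hours=24), utc_minus_4, time(9, 0)))
    # 2024-10-02 13:00:00+00:00
```

    A fixed offset keeps the example deterministic; a named zone such as `zoneinfo.ZoneInfo("America/New_York")` can be passed instead, although this sketch does not treat daylight-saving transitions specially.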

    3.13 Click "Save Sync Bridge"; the sync will start at the next scheduled time.

    Sync_Bridge_Settings_Black_Background.jpg

    3.14 Your Sync Bridges page will now reflect your newly created Bridge and should look similar to the following:

    Sync_Bridge_Black_Background.jpg

    3.15 On the Sync Bridges page, open the "Actions" menu to the right of your Sync Bridge listing and click "Run Sync Bridge Right Now".

    Run_Sync_Bridge_Now_Black_Background.jpg

    Note - This starts synchronizing the files in your Wasabi bucket into your cloud data warehouse connection.

    3.16 Once the Sync Bridge process is complete, you can open your cloud data warehouse database to see your data in the respective tables, ready for consumption.