Getting Started with Google BigQuery


This guide gives a quick introduction to working with Google BigQuery.


Google Cloud Project Metadata

You must first create a Google Cloud Project Metadata, which contains the account and credentials used to connect to Google BigQuery.

This is mandatory: the Google BigQuery Metadata uses it to obtain the account and credentials.

Google Cloud Storage Metadata

Google Cloud Storage is used as a temporary location for the temporary files that optimize data loading into Google BigQuery.

It is recommended to create a Google Cloud Storage Metadata to define this temporary location.

Connect to your Data

Create the Metadata

To create a Google BigQuery Metadata, launch the Metadata creation wizard, select Google BigQuery Metadata in the list, and follow the wizard.

The wizard asks you to choose the credentials to use from a list of all credentials defined in your workspace. If the list is empty, make sure that you have completed the prerequisites described above.


Select the credentials, click Next, and then click Connect.


On the next page, click Refresh next to the Catalog Name, then select the Google Project from the list.

Click Refresh next to the Schema Name, then select the Google BigQuery dataset to reverse from the list.


Finally, click Next, refresh the list of tables, and choose the ones to reverse.

After you click Finish, the selected tables are reversed in the Metadata.
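In the wizard, the Catalog Name corresponds to the Google Project and the Schema Name to the BigQuery dataset. As a minimal sketch of how these three levels combine into a fully qualified BigQuery table reference, the Python helper below builds the backtick-quoted `project.dataset.table` form used in BigQuery standard SQL. The function name and the sample identifiers are hypothetical and are not part of the product.

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Build a backtick-quoted BigQuery table reference.

    Hypothetical helper for illustration only:
    project = Catalog Name, dataset = Schema Name, table = reversed table.
    """
    for part in (project, dataset, table):
        # Reject empty parts and stray backticks that would break the quoting.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"`{project}.{dataset}.{table}`"


# Sample identifiers are made up for the example.
print(qualified_table("my-project", "sales", "orders"))
```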

For performance purposes, Semarchy xDM Data Integration uses Cloud Storage to optimize data loading into Google BigQuery.

Drag and drop, or select, your previously created Google Cloud Storage Metadata into the related property.

You can choose a bucket or a folder, depending on your preferred organisation.


This bucket or folder will now be used as a temporary location whenever necessary to optimize data loading into Google BigQuery.
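To illustrate the difference between pointing at a bucket and pointing at a folder inside it, here is a small Python sketch that builds a `gs://` URI for a temporary object in either case. The function name, the `tmp_` prefix, and the `.csv` suffix are assumptions made for the example; the actual naming of temporary files is not described in this guide.

```python
import posixpath
import uuid


def temp_object_uri(bucket: str, folder: str = "", suffix: str = ".csv") -> str:
    """Return a gs:// URI for a uniquely named temporary object.

    Hypothetical naming scheme, for illustration only.
    """
    name = f"tmp_{uuid.uuid4().hex}{suffix}"
    # An empty folder means the object sits at the root of the bucket.
    path = posixpath.join(folder.strip("/"), name)
    return f"gs://{bucket}/{path}"


# Temporary location defined as a bucket only, then as a folder in that bucket.
print(temp_object_uri("my-bucket"))
print(temp_object_uri("my-bucket", "staging/bigquery"))
```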

Create your first Mappings

Below are some examples of how Google BigQuery can be used in Mappings and Processes.

Example of Mapping loading data from an HSQL database to a Google BigQuery table

getting started bigquery mapping example 1

Example of Mapping loading data from multiple BigQuery tables with joins to an HSQL table

getting started bigquery mapping example 2

Additional Notes

Cloud Storage Mode

When integrating data into Google BigQuery, data may pass through Google Cloud Storage for performance purposes.

Depending on factors such as the amount of data sent and the network quality, different methods are available in Templates to improve performance:

  • stream: Data is streamed directly in the Google Storage Bucket.

  • localfile: Data is first exported to a local temporary file, which is then sent to the defined Google Storage Bucket. This method should be preferred for large sets of data.
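The two methods can be contrasted with a short Python sketch. This is not how the Templates are implemented; it only illustrates the idea, using a generic writable binary `sink` as a hypothetical stand-in for an object in the Google Storage Bucket. All function names are made up for the example.

```python
import csv
import io
import shutil
import tempfile


def encode_rows(rows):
    """Yield each row as one UTF-8 encoded CSV line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
        yield buf.getvalue().encode("utf-8")
        buf.seek(0)
        buf.truncate(0)


def send_stream(rows, sink):
    """stream: write rows to the remote sink as they are produced."""
    for chunk in encode_rows(rows):
        sink.write(chunk)


def send_localfile(rows, sink):
    """localfile: export everything to a local temporary file, then send it."""
    with tempfile.TemporaryFile() as tmp:
        for chunk in encode_rows(rows):
            tmp.write(chunk)
        tmp.seek(0)
        shutil.copyfileobj(tmp, sink)  # one bulk transfer of the finished file


rows = [(1, "a"), (2, "b")]
streamed, staged = io.BytesIO(), io.BytesIO()
send_stream(rows, streamed)
send_localfile(rows, staged)
# Both methods deliver the same bytes; they differ in when data leaves the machine.
assert streamed.getvalue() == staged.getvalue()
```

In the sketch, `send_stream` keeps the connection to the sink busy for the whole export, while `send_localfile` finishes the export locally before any transfer starts.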

The storage method is defined on the Template:

getting started bigquery template storage method

Sample Project

The Google BigQuery Component ships with one or more sample projects that contain various examples and use cases.

You can browse these projects for examples of how to use the Component.

Refer to Install Components to learn how to import sample projects.