Run a BigQuery ML pipeline on CRMint

This tutorial shows CRMint users how to implement a BigQuery ML pipeline, from training to prediction. We will deploy a model that predicts the price of real estate in Brazil based on a few features of each apartment.

Before you begin

Prerequisites

About 30 minutes.
A Google Account, for use on the Google Cloud Platform.
A working instance of CRMint. If you do not have one, read how to deploy CRMint on GCP.

Costs

This tutorial uses billable components of Cloud Platform, including:

Google BigQuery
BigQuery ML

You incur charges for:

Storing your ML model and training data in BigQuery
Querying data in BigQuery
Running BigQuery ML SQL statements

Let’s predict the price of houses

The goal of this tutorial is to build a pipeline to predict the price of real estate. We will use data from Properati.

Create a dataset in BigQuery

Enter the name of your project
Open your BigQuery console.
Select your project name in the left sidebar.
Click the Create Dataset button on the right.
Name your dataset predict_realestate_brasil and choose US as the data location.
Be sure to use the US location for your dataset. This is required because the data source we will use is located in the US.
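If you would rather script this step than click through the console, the dataset can also be created with a DDL statement. The sketch below only builds that statement as a string, which you could then paste into the BigQuery query editor. It assumes BigQuery's CREATE SCHEMA statement with the location option; the helper name and the my-project project ID are placeholders made up for illustration.

```python
def create_dataset_ddl(project_id: str, dataset: str, location: str = "US") -> str:
    """Build a BigQuery DDL statement that creates a dataset in a given location."""
    return (
        f"CREATE SCHEMA IF NOT EXISTS `{project_id}.{dataset}`\n"
        f"OPTIONS (location = '{location}')"
    )


if __name__ == "__main__":
    # "my-project" is a placeholder; use your own Cloud Project ID.
    print(create_dataset_ddl("my-project", "predict_realestate_brasil"))
```

Keeping the location as a parameter with a US default makes the requirement above explicit while leaving room to reuse the helper for other datasets.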

Create the training pipeline in CRMint

Open your CRMint instance.
Download the pre-built pipeline train_evaluate_model.json.
Import the pipeline into your CRMint instance using the Import button on the right.
Look at the pipeline graph. It contains two nodes:
- Train model: Creates and trains a model based on the data selected by the SQL statement. Click on the node to inspect the content of the BQML query.
- Evaluate model: Runs an ML.EVALUATE query once the model is trained, to assess its performance.
The resulting model is stored as part of your BigQuery dataset and is ready for production use as soon as it is trained.
Configure the imported pipeline by clicking the Edit button. Fill in the BQ_PROJECT parameter with your Cloud Project ID.
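The actual SQL lives in the imported pipeline and may well differ from what follows. As a rough sketch of what the two nodes do, the snippet below builds a CREATE MODEL statement and an ML.EVALUATE query, with the project ID substituted the way BQ_PROJECT is. The model name price_model, the linear_reg model type, the price label column, the feature columns and the training_data table are all assumptions for illustration, not the contents of train_evaluate_model.json.

```python
DATASET = "predict_realestate_brasil"


def train_query(bq_project: str, model: str = "price_model") -> str:
    """CREATE MODEL statement: trains a regression model on the selected rows."""
    return f"""
CREATE OR REPLACE MODEL `{bq_project}.{DATASET}.{model}`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['price']) AS
SELECT price, rooms, surface_total_in_m2   -- hypothetical label and features
FROM `{bq_project}.{DATASET}.training_data`  -- hypothetical source table
WHERE price IS NOT NULL
""".strip()


def evaluate_query(bq_project: str, model: str = "price_model") -> str:
    """ML.EVALUATE query: returns one row of metrics for the trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{bq_project}.{DATASET}.{model}`)"


if __name__ == "__main__":
    # "my-project" stands in for the value you put in BQ_PROJECT.
    print(train_query("my-project"))
    print(evaluate_query("my-project"))
```

Called without an input table, ML.EVALUATE scores the model against the evaluation data BigQuery ML set aside during training, which is enough for a first look at model quality.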

Run and check evaluation metrics

Run the pipeline by clicking on the Start button.
Be patient: this should not take more than a minute or two.
Explore the results saved in the price_model_evaluation table in BigQuery.
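For a regression model, ML.EVALUATE returns error metrics such as mean_absolute_error, mean_squared_error and r2_score. The sketch below shows one way to read a row of that table, assuming the model is a regressor and the row has already been fetched as a dictionary. The numbers are made up for illustration and are not results from this pipeline.

```python
import math


def summarize(metrics: dict) -> dict:
    """Derive easier-to-read figures from one ML.EVALUATE row (regression)."""
    return {
        # RMSE is expressed in the same unit as the price, unlike the squared error.
        "rmse": math.sqrt(metrics["mean_squared_error"]),
        "mae": metrics["mean_absolute_error"],
        # Closer to 1.0 means the model explains more of the price variance.
        "r2": metrics["r2_score"],
    }


# Made-up example row, for illustration only.
row = {"mean_absolute_error": 300.0, "mean_squared_error": 250000.0, "r2_score": 0.8}
summary = summarize(row)
print(summary)
```

Comparing the RMSE and MAE against typical prices in the dataset gives a quick sense of whether the model is usable before moving on to predictions.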

Let’s predict some prices

Now that we have a model trained, evaluated and deployed to GCP, we are ready for some predictions!

Import the pre-built pipeline predict.json.
Look at the pipeline graph. It contains one node:
- Get Predictions: Runs an ML.PREDICT query to feed the input features to the model and get back one prediction per row.
Configure the imported pipeline as before, filling in the BQ_PROJECT parameter.
Run the pipeline.
Once the pipeline has finished (it should take a couple of minutes), you can explore the predicted values in the predict_realestate_brasil.predictions table.
Congratulations, you now have a new table in BigQuery containing all your predictions!
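To give an idea of what the Get Predictions node might run, here is a minimal sketch that builds an ML.PREDICT query and stores its output in the predictions table. The model name price_model and the new_listings input table are assumptions for illustration; the query shipped in predict.json may be written differently.

```python
DATASET = "predict_realestate_brasil"


def predict_query(bq_project: str) -> str:
    """Build a query that scores new rows and stores them in the predictions table."""
    model = f"`{bq_project}.{DATASET}.price_model`"    # assumed model name
    source = f"`{bq_project}.{DATASET}.new_listings`"  # hypothetical input table
    target = f"`{bq_project}.{DATASET}.predictions`"
    return (
        f"CREATE OR REPLACE TABLE {target} AS\n"
        f"SELECT * FROM ML.PREDICT(MODEL {model}, (SELECT * FROM {source}))"
    )


if __name__ == "__main__":
    # "my-project" stands in for the value you put in BQ_PROJECT.
    print(predict_query("my-project"))
```

ML.PREDICT returns the input columns plus a column named predicted_ followed by the label column name, so with a label called price (an assumption here) the predictions would appear as predicted_price.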

What’s next

Schedule this pipeline to run daily.
Read a full explanation of how CRMint works in What is CRMint? and CRMint Concepts.
CRMint pipeline concepts

Quick Start