BigQuery - Get Started

TABLE OF CONTENT

Use Standard SQL

At the beginning Google proposed a query language that called legacy SQL. This language was close to the classic SQL. Then Google decided to propose another language called standard SQL. This language is closer to SQL than legacy and allows generally more functionalities. Moreover this language is pushed by Google and Google won’t maintain 2 languages that’s why the first advice is use standard SQL

For going further I invite you to read this page : https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql

If you uncheck the box in yellow you activate standard SQL

Add a header in each planned query

When a query is launched inside a program (dataflow, Apps script, python….) please add a header to describe the query. That's really important for log analysis and because the cost is directly linked on the database use. So its better if we can detect queries that are not used anymore.

You can find the header description below :

Field	Description
project	The name of the project or program
program	Dataflow, apps script, python, datalab...
function	The name of the function
frequency	the frequency of the query
owner	owner of the program, could be a service account a user login into the tool
SQL	Legacy or standard
Link	filename or link if it’s a Google document

Example :

/============================================================/

/project : SolvIA - program : Apps script /

/function : load_daily_freshdesk_tickets                                                                /
/frequency : Daily                           - owner : david.tonda@solvay.com               /
/SQL : Standard                                                                                                     /

/* Link: https://script.google.com/a/solvay.com/d/1c8rvwEhtsI85u... */

/============================================================/

This list is not exhausted you can add what you want to be identified a query.

Name convention

Tables : the name of the main notion inside the table
Views : the characters V_ and the main source of the select. If already exists add a short label to precize the goal of the view.
DataSet : the goal of the dataset following by a short description and prefixed by restrictions if exists example : SAS_FRESHDESK_CGI

Dataset organisation

rights : the finest restriction level available in BQ is done on datasets, so if you need to restrict access to a part of the data you should create a copy of the specific data you want to share into another dataset.
set of relevant tables : in the dataset try to group a set of relevant tables by notion by project something with sense
data life time : in the dataset you can define the table life duration. It’s really usefull for dataset to do tests or for the datasets used to save data.
Saved Dataset : The service is managed and ensure the data can’t be loose by hardware issue. But it's not user issue proof. That’s why for the important table it’s good to provide a save of this table into a specific dataset where the expiration date is set in order to keep only one month of saves..
Logs Dataset : it’s just a note to remind that BigQuery is a good database to store logs.

Query

many steps, save tables : the cost of the storage is not expensive, so don’t hesitate to split the complexity of a query into many steps instead of writting a complicated query.
window functions : window functions are present in BigQuery. It’s very interesting in this kind of database especially if you want the last occurrence of a row. https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#window-clause
User Defined Functions (UDF) : User defined functions allow to BigQuery to execute javascript code. The restrictions are big, number of UDF launched in the same project in the same time. Query are longer and you may have a timeout. Try to avoid it if you can.
Explanation Plan and query information : you have tabs just above the result grid and you can see where BigQuery spent the time to display the result of your query and you can see the number of bytes billed for the execution of the query.
Saved queries : you can save queries for you or for the project. could be interesting when you are working on a complex query

Execution logs

fjhdjhfd

dfkj

rights : the finest restriction level available in BQ is done on datasets, so if you need to restrict access to the data you should create a copy of your data into another dataset.
set of relevant tables : in the dataset try to group a set of relevant tables by notion by project something with sense
Data life time : in the dataset you can define the table life duration. It’s really usefull for dataset to do tests.
Saved Dataset : The service is managed and ensure the data can’t be loose by hardware issue. But it doesn’t convert the user issue. That’s why for the important table it’s good to provide a save of this table into a specific dataset where the expiration date is set in order to keep only one month of saves..
Logs Dataset : it’s just a note to remind that BigQuery is a good database to store logs.

BigQuery

Page tree

BigQuery - Get Started

BigQuery - Get Started

TABLE OF CONTENT

Use Standard SQL

For going further I invite you to read this page : https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql

If you uncheck the box in yellow you activate standard SQL

Add a header in each planned query

When a query is launched inside a program (dataflow, Apps script, python….) please add a header to describe the query. That's really important for log analysis and because the cost is directly linked on the database use. So its better if we can detect queries that are not used anymore.

You can find the header description below :

Example :

/*============================================================*/

/*project : SolvIA - program : Apps script */

/*function : load_daily_freshdesk_tickets *//*frequency : Daily - owner : david.tonda@solvay.com *//*SQL : Standard */

/* Link: https://script.google.com/a/solvay.com/d/1c8rvwEhtsI85u... */

/*============================================================*/

This list is not exhausted you can add what you want to be identified a query.

Name convention

Tables : the name of the main notion inside the table

Views : the characters V_ and the main source of the select. If already exists add a short label to precize the goal of the view.

DataSet : the goal of the dataset following by a short description and prefixed by restrictions if exists example : SAS_FRESHDESK_CGI

Dataset organisation

rights : the finest restriction level available in BQ is done on datasets, so if you need to restrict access to a part of the data you should create a copy of the specific data you want to share into another dataset.

set of relevant tables : in the dataset try to group a set of relevant tables by notion by project something with sense

data life time : in the dataset you can define the table life duration. It’s really usefull for dataset to do tests or for the datasets used to save data.

Logs Dataset : it’s just a note to remind that BigQuery is a good database to store logs.

Query

many steps, save tables : the cost of the storage is not expensive, so don’t hesitate to split the complexity of a query into many steps instead of writting a complicated query.

window functions : window functions are present in BigQuery. It’s very interesting in this kind of database especially if you want the last occurrence of a row. https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#window-clause

User Defined Functions (UDF) : User defined functions allow to BigQuery to execute javascript code. The restrictions are big, number of UDF launched in the same project in the same time. Query are longer and you may have a timeout. Try to avoid it if you can.

Explanation Plan and query information : you have tabs just above the result grid and you can see where BigQuery spent the time to display the result of your query and you can see the number of bytes billed for the execution of the query.

Saved queries : you can save queries for you or for the project. could be interesting when you are working on a complex query

Execution logs

/============================================================/

/project : SolvIA - program : Apps script /

/function : load_daily_freshdesk_tickets /
/frequency : Daily - owner : david.tonda@solvay.com /
/SQL : Standard /

/============================================================/