Page tree

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

 

 

BigQuery - Get Started


 

 

TABLE OF CONTENT

Use Standard SQL

 At the beginning Google proposed a query language that called legacy SQL. This language was close to the classic SQL. Then Google decided to propose another language called standard SQL. This language is closer to SQL than legacy and allows generally more functionalities. Moreover this language is pushed by Google and Google won’t maintain 2 languages that’s why the first advice is use standard SQL

 For going further I invite you to read this page : https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql

If you uncheck the box in yellow you activate standard SQL

Add a header in each planned query

When a query is launched inside a program (dataflow, Apps script, python….) please add a header to describe the query. That's really important for log analysis and because the cost is directly linked on the database use. So its better if we can detect queries that are not used anymore.

You can find the header description below :


 

Field

Description

project

The name of the project or program

program

Dataflow, apps script, python, datalab...

function

The name of the function

frequency

the frequency of the query

owner

owner of the program, could be a service account a user login into the tool

SQL

Legacy or standard

Link

filename or link if it’s a Google document

 

 

Example :

 

/*============================================================*/

 

/*project : SolvIA                              - program : Apps script          */

 

/*function : load_daily_freshdesk_tickets                                                                */
/*frequency : Daily                           - owner : david.tonda@solvay.com               */
/*SQL : Standard                                                                                                     */

 

/* Link: https://script.google.com/a/solvay.com/d/1c8rvwEhtsI85u...                       */

 

/*============================================================*/

 

This list is not exhausted you can add what you want to be identified a query.

 


Name convention

  • Tables : the name of the main notion inside the table

  • Views : the characters V_ and the main source of the select. If already exists add a short label to precize the goal of the view.

  • DataSet : the goal of the dataset following by a short description and prefixed by restrictions if exists example : SAS_FRESHDESK_CGI

Dataset organisation

 

  • rights : the finest restriction level available in BQ is done on datasets, so if you need to restrict access to a part of the data you should create a copy of the specific data you want to share into another dataset.

  • set of relevant tables : in the dataset try to group a set of relevant tables by notion by project something with sense

  • data life time : in the dataset you can define the table life duration. It’s really usefull for dataset to do tests or for the datasets used to save data.

  • Saved Dataset : The service is managed and ensure the data can’t be loose by hardware issue. But it's not user issue proof. That’s why for the important table it’s good to provide a save of this table into a specific dataset where the expiration date is set in order to keep only one month of saves..

  • Logs Dataset : it’s just a note to remind that BigQuery is a good database to store logs.

Query

  • many steps, save tables : the cost of the storage is not expensive, so don’t hesitate to split the complexity of a query into many steps instead of writting a complicated query.

  • window functions : window functions are present in BigQuery. It’s very interesting in this kind of database especially if you want the last occurrence of a row. https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#window-clause

  • User Defined Functions (UDF) :  User defined functions allow to BigQuery to execute javascript code. The restrictions are big, number of UDF launched in the same project in the same time. Query are longer and you may have a timeout. Try to avoid it if you can.

  • Explanation Plan and query information : you have tabs just above the result grid and you can see where BigQuery spent the time to display the result of your query and you can see the number of bytes billed for the execution of the query.

  • Saved queries : you can save queries for you or for the project. could be interesting when you are working on a complex query

Execution logs




fjhdjhfd



dfkj

 

  • rights : the finest restriction level available in BQ is done on datasets, so if you need to restrict access to the data you should create a copy of your data into another dataset.

  • set of relevant tables : in the dataset try to group a set of relevant tables by notion by project something with sense

  • Data life time : in the dataset you can define the table life duration. It’s really usefull for dataset to do tests.

  • Saved Dataset : The service is managed and ensure the data can’t be loose by hardware issue. But it doesn’t convert the user issue. That’s why for the important table it’s good to provide a save of this table into a specific dataset where the expiration date is set in order to keep only one month of saves..

  • Logs Dataset : it’s just a note to remind that BigQuery is a good database to store logs.

BigQuery

The best way to get IT support is to use the new Service One Platform.