Access request:
First, you need to get access to:
- Dataiku, to easily work with the data (filter, add calculated fields, merge two files...)
- BigQuery, the self-service datalake used to store your data
- Qlik Sense, to create your own customized reports on your data
To get these services, just submit a request with SolvIA. You can find a pre-typed request below:
    Hello, I want to use the self-service datalake tool. Could you please grant me access to:
    - BigQuery self-service datalake
    - Dataiku
    - Qlik Sense
    Have a great day
How to use the self-service datalake?
Access BigQuery :
Cloud storage :
WINDOWSLOGIN must be replaced with your Windows login.
The idea is to propose a complete environment with:
- A place to manipulate data (join, merge, filter, formulas...) => DSS by Dataiku
- A place to store the results of the processing from the previous step, accessible by all Solvay reporting tools => BigQuery by Google
- A place to analyse, highlight and share your data => Qlik Sense by Qlik
Step 1: Open Dataiku
Step 2: Create your project
Click on the upper right corner to add a project.
Select blank project
Set a name for your project. Advice: use your name and the data context.
Step 3: Add your XLS file into DSS Dataiku
Click on the blue Import dataset button in the middle of the screen.
Select Files (first icon in the screenshot), then drag and drop your file.
In the top right corner, set a name for the dataset, keeping in mind that:
Special characters, spaces and accents are not allowed in table and field names.
Then click the green Create button in the top right corner.
Give a description to the dataset:
Click on Summary in the top menu
and use + Add a description.
For the needs of this example, step 3 was repeated so that a second file is available to join with the first one.
First click on Flow.
Then add a dataset using the Dataset menu in the top right corner.
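Since table and field names must not contain spaces, accents or special characters, it can save a failed run to clean up column names before importing. Below is a minimal sketch in Python; the function name and the exact replacement rules are illustrative assumptions, not something provided by Dataiku or the plugin:

```python
import re
import unicodedata

def sanitize_name(name: str) -> str:
    """Make a string safe for table/field names: strip accents,
    replace spaces and special characters with underscores."""
    # Decompose accented characters, then drop the non-ASCII marks
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of characters that is not a letter, digit
    # or underscore into a single underscore, and trim the edges
    return re.sub(r"[^0-9A-Za-z_]+", "_", ascii_name).strip("_")

print(sanitize_name("Quantité vendue"))          # Quantite_vendue
print(sanitize_name("Chiffre d'affaires (€)"))   # Chiffre_d_affaires
```

Applying such a function to every column header (and to the dataset name itself) before step 3 avoids the macro rejecting the table later on.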
Step 4: Join the two files
In the Flow view:
Click on the Join with icon (in yellow/orange) in the right panel.
Select the two tables.
Give a name to the result of the join now, because it cannot be renamed later.
Leave the other parameters as they are.
Then make the join.
When the join is set up, click on Run in the bottom left corner.
As you can see, the join step was added to your flow and a new dataset suffixed with _joined was created:
Set a description on the new dataset: this is important, because the plugin that sends data to BigQuery needs it.
At the bottom of the right panel, find Send to BigQuery.
Set the GDPR level and click RUN MACRO.
Be sure that table and field names contain no spaces or special characters (accents).
Be sure the dataset has a description.
Well done! You can click on the link to see the result in the BigQuery database.
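If you prefer checking from a script or notebook instead of the BigQuery console, a quick preview can be run with the official google-cloud-bigquery client. This is a sketch, not part of the plugin: the helper names are made up, only the project id (sol-self-service-datalake) and the login-based dataset naming come from this page, and valid Google credentials are required:

```python
def fq_table(project: str, dataset: str, table: str) -> str:
    # Fully-qualified BigQuery table id, e.g. `project.dataset.table`
    return f"`{project}.{dataset}.{table}`"

def preview_table(project: str, dataset: str, table: str, limit: int = 5):
    """Return the first rows of a BigQuery table as a list of dicts.
    Requires the google-cloud-bigquery package and valid credentials."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    query = f"SELECT * FROM {fq_table(project, dataset, table)} LIMIT {limit}"
    return [dict(row) for row in client.query(query).result()]

# Hypothetical usage -- the dataset is named after your Windows login and
# the table is the one suffixed _joined that the macro just created:
# rows = preview_table("sol-self-service-datalake", "windowslogin", "mydata_joined")
```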
TODO (Valentin): add a link to Eric's wiki page indicating when to use Qlik Sense and when to use Data Studio.
How to connect to Qlik Sense:
Open Qlik Sense :
https://qliksensedev.solvay.com/
Create a new app :
Give a name to your application and click on create.
Then Open app.
Click on “Add data from files and other sources”
Choose Google Big Query
Click on Sign In and copy the code after authentication.
Then Validate.
Select the sol-self-service-datalake-prod Catalog
Click on create.
On the left, click on the new data source you created (sol-self-service-datalake-prod_WINDOWS-ID).
Verify that the owner corresponds to your Windows ID.
Select your dataset
Click on Add data
Annex
Plugin: Send to BigQuery
(dataset plugin)
Who can use this plugin?
All designers can use it. It is licensed free of charge for Solvay users and was developed for Solvay only. All rights reserved to Solvay.
Description
This plugin gets data from the DSS dataset and sends it to the BigQuery database in the Google Solvay Self-Service Datalake project (sol-self-service-datalake).
Operation
Data is exported as CSV to a temporary folder on the local disk.
A Cloud Storage bucket named after the Windows login is created. This environment is shared with the user with owner access. If the bucket already exists, the plugin skips this step.
The CSV files are copied (or uploaded) to Google Cloud Storage.
A BigQuery dataset named after the bucket is created. This dataset is shared with the user with owner access. If the dataset already exists, the plugin skips this step.
The CSV in the Cloud Storage bucket is loaded into the BigQuery dataset.
The temporary folder is cleaned.
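The steps above can be sketched with the official Google clients. This is an illustrative replay of what the plugin does, not its actual source code: the helper names are assumptions, and only the project id (sol-self-service-datalake) and the login-based naming come from this page. Running it requires google-cloud-storage, google-cloud-bigquery and valid credentials:

```python
import os

# Project id taken from this page; everything else below is an assumption
PROJECT = "sol-self-service-datalake"

def bucket_name(windows_login: str) -> str:
    # Bucket and dataset are named after the Windows login; lower-casing
    # is an assumption here, since GCS bucket names must be lowercase
    return windows_login.lower()

def load_csv(windows_login: str, csv_path: str, table: str) -> None:
    """Replay the plugin's steps: create bucket, upload CSV, create
    dataset, load into BigQuery, clean up the local temporary file."""
    from google.cloud import bigquery, storage

    login = bucket_name(windows_login)
    gcs = storage.Client(project=PROJECT)
    bq = bigquery.Client(project=PROJECT)

    # 1. Create the bucket if it does not exist yet
    bucket = gcs.lookup_bucket(login) or gcs.create_bucket(login)
    # 2. Upload the CSV export to Cloud Storage
    blob = bucket.blob(os.path.basename(csv_path))
    blob.upload_from_filename(csv_path)
    # 3. Create the BigQuery dataset if it does not exist yet
    bq.create_dataset(f"{PROJECT}.{login}", exists_ok=True)
    # 4. Load the CSV from Cloud Storage into the BigQuery table
    job = bq.load_table_from_uri(
        f"gs://{login}/{os.path.basename(csv_path)}",
        f"{PROJECT}.{login}.{table}",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            autodetect=True,
            skip_leading_rows=1,
        ),
    )
    job.result()  # wait for the load job to finish
    # 5. Clean the local temporary file
    os.remove(csv_path)
```

Note that sharing the bucket and dataset back to the user with owner access, as the plugin does, would require additional IAM calls not shown here.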