Access request:

First, you need to get access to:

  • Dataiku, to manipulate the data easily (filter, add calculated fields, merge two files, ...)
  • The BigQuery self-service datalake, to store your data
  • Qlik Sense, to create your own customized reports on your data


To get these services, just submit a request via SolvIA. You can find a pre-typed request below:

Hello,

I would like to use the self-service datalake tools. Could you please grant me access to:

- BigQuery self-service datalake

- Dataiku

- Qlik Sense


Have a great day

How to use the self-service datalake?





The idea is to provide a complete environment with:

  • A place to manipulate data (join, merge, filter, formulas, ...) => DSS by Dataiku

  • A place to store the results of the previous step, accessible by all Solvay reporting tools => BigQuery by Google

  • A place to analyse, highlight and share your data => Qlik Sense by Qlik

Step 1: Open Dataiku

http://dss.solvay.com

Step 2: Create your project

Click on the upper-right corner to add a project.
Select Blank project.

Give your project a name. Tip: include your name and the data context.

Step 3: Add your XLS file to Dataiku DSS

Click on the blue Import dataset button in the middle of the screen.

Select Files (first icon in the screenshot), then drag and drop your file.

In the top-right corner, give the dataset a name, keeping in mind:

Special characters, spaces and accents are not allowed in table and field names.

Then click the green Create dataset button in the top-right corner.
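Since special characters, spaces and accents are rejected in table and field names, it can help to sanitize names before import. A minimal sketch in Python (the function name and rules are illustrative, not part of DSS):

```python
import re
import unicodedata

def sanitize_name(name: str) -> str:
    """Make a string safe for table/field names:
    strip accents, replace spaces and special characters with underscores."""
    # Decompose accented characters and drop the non-ASCII parts
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not a letter, digit or underscore
    safe = re.sub(r"[^A-Za-z0-9_]", "_", ascii_name)
    # Collapse repeated underscores and trim them from the ends
    return re.sub(r"_+", "_", safe).strip("_")

print(sanitize_name("Chiffre d'affaires 2023 (€)"))  # -> Chiffre_d_affaires_2023
```

Running every column header through such a helper before the import avoids errors later in the Send to BigQuery step.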



Give the dataset a description:

Click on Summary in the top menu

and click + Add a description.

For this example, step 3 was repeated to import a second file, so that a join can be made between the two files.

First, click on Flow.

Then add a dataset using the Dataset menu in the top-right corner.



Step 4: Join the two files

In the Flow view,

click on the Join with icon (in yellow/orange) in the right panel.



Select the two tables.

Give a name to the result of the join now, because it cannot be renamed later,

and leave the other parameters as they are.

Then create the join.

When the join is configured, click Run in the bottom-left corner.

As you can see, the join step has been added to your flow and a new dataset suffixed with _joined has been created:
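Conceptually, the join recipe behaves like a table merge. A minimal pandas sketch, with illustrative column and table names (not taken from the example files):

```python
import pandas as pd

# Two illustrative datasets sharing a "customer_id" key
orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [100, 250, 75]})
customers = pd.DataFrame({"customer_id": [1, 2], "country": ["FR", "BE"]})

# A left join keeps every row of "orders"; rows with no match
# get empty values, as in the DSS join recipe's left-join mode
orders_joined = orders.merge(customers, on="customer_id", how="left")
print(orders_joined)
```

The `_joined` dataset in DSS is the equivalent of `orders_joined` here: one row per left-hand row, with the matching columns of the second table appended.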

Set a description on the new dataset; this is important because the plugin that sends data to BigQuery requires it.


At the bottom of the right panel, find Send to BigQuery.


Set the GDPR level and click RUN MACRO.

Make sure the table and field names contain no spaces or special characters (accents).

Make sure the dataset has a description.

Well done! You can click on the link to see the result in the BigQuery database.
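Besides the BigQuery console, the new table can also be queried from code. A hedged sketch with the google-cloud-bigquery Python client; the dataset and table names are placeholders, and authentication (e.g. application-default credentials) is assumed to be set up:

```python
def build_query(dataset: str, table: str, limit: int = 10) -> str:
    """Build a simple preview query against the self-service datalake project."""
    return (
        f"SELECT * FROM `sol-self-service-datalake.{dataset}.{table}` "
        f"LIMIT {limit}"
    )

def preview(dataset: str, table: str):
    """Run the preview query (requires the google-cloud-bigquery package
    and valid credentials; imported lazily so the sketch reads standalone)."""
    from google.cloud import bigquery
    client = bigquery.Client(project="sol-self-service-datalake")
    return list(client.query(build_query(dataset, table)).result())

print(build_query("MYLOGIN", "orders_joined"))
```

Calling `preview("MYLOGIN", "orders_joined")` would return the first rows of the table created by the plugin, assuming your Windows login is the dataset name as described in the annex.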

TODO (Valentin): link to Eric's wiki page explaining when to use Qlik Sense and when to use Data Studio.


How to connect Qlik Sense:

Open Qlik Sense : 

https://qliksensedev.solvay.com/

Create a new app :


Give your application a name and click Create.

Then click Open app.

Click on “Add data from files and other sources”.

Choose Google BigQuery.

Click on Sign In and copy the code after authentication.

Then click Validate.


Select the sol-self-service-datalake-prod catalog.

Click on Create.

On the left, click on the new data source you created (sol-self-service-datalake-prod_WINDOWS-ID).


Verify that the owner corresponds to your Windows ID.

Select your dataset.

Click on Add data.



Annex


Plugin: Send to BigQuery

(Dataset plugin)

Who can use this plugin?

All designers can use it. There is no licensing cost for Solvay users; it was developed for Solvay only. All rights reserved to Solvay.

Description

This plugin gets data from a DSS dataset and sends it to the BigQuery database in the Google Solvay Self-Service Datalake project (sol-self-service-datalake).

Operation

  1. Data is exported as CSV to the local disk, in a temporary folder

  2. A Cloud Storage bucket named after the Windows login is created and shared with the user with owner access. If the bucket already exists, the plugin skips this step

  3. The CSV files are copied (uploaded) to Google Cloud Storage

  4. A BigQuery dataset named after the bucket is created and shared with the user with owner access. If the dataset already exists, the plugin skips this step

  5. The CSV files in the Cloud Storage bucket are loaded into the BigQuery dataset

  6. The temporary folder is cleaned up
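The six operations above can be sketched with the gsutil and bq command-line tools. This is an illustrative outline, not the plugin's actual code; the bucket/dataset naming follows the description above, and all paths and names are placeholders:

```python
import os
import shutil
import subprocess
import tempfile

def bucket_name(windows_login: str) -> str:
    # Steps 2 and 4: bucket and dataset are named after the Windows login
    return windows_login.lower()

def send_to_bigquery(windows_login: str, csv_path: str, table: str) -> None:
    """Outline of the plugin's pipeline using the gsutil/bq CLIs."""
    bucket = bucket_name(windows_login)
    tmp = tempfile.mkdtemp()                       # step 1: temporary folder
    local_csv = os.path.join(tmp, os.path.basename(csv_path))
    shutil.copy(csv_path, local_csv)               # step 1: export CSV locally
    try:
        # Step 2: create the bucket; ignore the error if it already exists
        subprocess.run(["gsutil", "mb", f"gs://{bucket}"], check=False)
        # Step 3: upload the CSV to Cloud Storage
        subprocess.run(["gsutil", "cp", local_csv, f"gs://{bucket}/"], check=True)
        # Step 4: create the BigQuery dataset; ignore the error if it exists
        subprocess.run(["bq", "mk", "--dataset", bucket], check=False)
        # Step 5: load the CSV from the bucket into the dataset
        subprocess.run(
            ["bq", "load", "--source_format=CSV", "--autodetect",
             f"{bucket}.{table}", f"gs://{bucket}/{os.path.basename(csv_path)}"],
            check=True,
        )
    finally:
        shutil.rmtree(tmp)                         # step 6: clean the temp folder

print(bucket_name("JDOE"))  # -> jdoe
```

The real plugin also sets owner-level sharing on the bucket and dataset (steps 2 and 4), which the Google Cloud SDK would handle through IAM bindings; that part is omitted here.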
