Connect to data

The first task in Data Science Studio is to define datasets that connect to your data sources.

A dataset is a series of records that share the same schema. It is analogous to a table in a SQL database.

For a broader explanation of the different kinds of datasets, see the Concepts page.

  • Supported connections
  • SQL databases
  • Cassandra
  • Amazon S3
  • Google Cloud Storage
  • Azure Blob Storage
  • ElasticSearch
  • FTP
  • SSH / SCP / SFTP (cached)
  • HTTP (cached)
  • FTP (cached)
  • “Files in folder” dataset
  • Dataset plugins
  • Data connectivity macros
  • Making relocatable managed datasets
  • Relocation of SQL datasets
  • Relocation of HDFS datasets