
Related issue: SYSM-354

  • Data retrieval strategy from SharePoint:
    • Files, Lists, custom applications, OneLake File Explorer
    • Security and best practices


| Version | Date | Description | Contributor |
| --- | --- | --- | --- |
| V0.1 |  | Initial document | COLOMBANI Théo |
| V0.2 |  | Added to the wiki | COLOMBANI Théo |
| V0.3 |  |  |  |


1. Axis — Load into Lakehouse Files

1.1 OneLake Shortcut (SharePoint / OneDrive)

Description
Logical link exposing SharePoint folders in OneLake without data duplication.

Functioning

  • Shortcut points to a SharePoint folder (folder-level only)
  • Data remains in SharePoint and is accessed virtually
  • Accessible across Fabric workloads

Key capabilities

  • Data virtualization (no physical copy)
  • Automatic synchronization with source changes
  • Unified access through OneLake

Advantages

  • No pipelines or ETL required
  • No data duplication
  • Fast implementation
  • Unified access layer

Limitations (decision drivers)

  • Folder-level granularity only
  • Performance dependent on SharePoint (latency, throttling)
  • No control over ingestion (no filtering, no incremental logic)
  • Runtime dependency on source availability
  • Not suitable when strong data isolation or historization is required


1.2 Custom ingestion — API (Notebook or Pipeline) → Files

Description
Extraction via Microsoft Graph or SharePoint REST API and storage in Lakehouse Files.

Execution models

  • Notebook (Spark / Python)
  • Data Pipeline:
    • Web Activity (REST calls)
    • Copy Activity with API source

Functioning

  • API calls to retrieve files or metadata
  • Data written into OneLake Files

Key capabilities

  • Supports full SharePoint surface (files, folders, metadata)
  • Custom ingestion logic (filtering, incremental, structuring)
  • Can be orchestrated via pipelines

Advantages

  • Full flexibility on ingestion logic
  • Ability to implement incremental loads (delta, watermark) at the bronze layer
  • Can handle complex folder structures and edge cases
  • Works even when no native connector exists

Limitations (decision drivers)

  • Requires handling:
    • authentication (OAuth / Service Principal)
    • pagination (@odata.nextLink)
    • API rate limits / throttling
  • More complex error handling and retry logic
  • Development and maintenance effort
  • Pipeline Web Activity is stateless (no built-in transformation)
  • Copy Activity / Web Activity require manual schema handling
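The pagination mechanics called out above (`@odata.nextLink`) can be sketched as follows. This is a minimal notebook sketch, not the full ingestion: the HTTP `get` callable is injected so the paging logic stays independent of the HTTP client (in practice it would wrap an authenticated `requests` call), and the site ID in the example URL is a placeholder.

```python
# Sketch: iterate over a paged Microsoft Graph collection by following
# @odata.nextLink until it disappears (last page).
def iter_graph_pages(url, get):
    """Yield every item from a paged Graph collection.

    `get` takes a URL and returns the parsed JSON dict of one page.
    """
    while url:
        page = get(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")  # absent on the final page

# Example: collect file names from a SharePoint drive folder
# (site ID is a placeholder; endpoint shape follows Microsoft Graph).
def list_file_names(get, site_id="{site-id}"):
    url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root/children"
    return [item["name"] for item in iter_graph_pages(url, get)]
```

The injected `get` also gives a natural place to add retry and throttling handling (HTTP 429 / `Retry-After`), which the limitations above identify as a cost of this approach.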

2. Axis — Load into Lakehouse Tables

2.1 Shortcut with transformation → Delta Tables

Description
Use of SharePoint shortcut with transformation to project files into Delta tables.

Functioning

  • Shortcut exposes files
  • Transformation step converts them into structured tables
  • Tables remain synchronized with source

Key capabilities

  • Automatic file-to-table conversion
  • Continuous synchronization
  • Direct consumption in SQL / BI

Advantages

  • No pipeline required
  • Direct analytical usability
  • Integrated with OneLake

Limitations (decision drivers)

  • Strong dependency on source file structure and quality
  • Limited transformation capabilities compared to ETL
  • Limited control over schema evolution
  • Debugging and lineage less explicit than pipeline-based ingestion
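A notebook variant of this projection, for when the automatic conversion is too limited: read the CSVs exposed by the shortcut and materialize them as a Delta table. The shortcut name (`sp_docs`), table name, and the `spark` session (predefined in Fabric notebooks) are assumptions. SharePoint headers often contain spaces, which Delta column names reject, hence the normalization step.

```python
# Sketch: make SharePoint column headers safe as Delta column names.
def normalize_columns(cols):
    """Trim, lowercase, and replace spaces with underscores."""
    return [c.strip().lower().replace(" ", "_") for c in cols]

# Sketch: materialize shortcut CSVs as a managed Delta table
# (runs only inside a Fabric notebook, where `spark` exists).
def shortcut_csv_to_delta(spark, shortcut="Files/sp_docs", table="silver_docs"):
    df = spark.read.option("header", True).csv(f"{shortcut}/*.csv")
    df = df.toDF(*normalize_columns(df.columns))  # rename all columns at once
    df.write.format("delta").mode("overwrite").saveAsTable(table)
```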

2.2 Mirroring (SharePoint Lists)

Description
Replication of SharePoint Lists into OneLake as Delta tables.

Functioning

  • Connection to SharePoint list
  • Continuous replication into Fabric tables
  • Automatic synchronization

Key capabilities

  • Near real-time data replication
  • Native Delta format
  • No ETL required

Advantages

  • Continuous synchronization
  • Simplified ingestion architecture
  • Direct usability for analytics

Limitations (decision drivers)

  • Limited to structured data (lists only)
  • Limited transformation capabilities during ingestion
  • Dependency on mirroring feature availability and scope
  • Limited control over ingestion logic (filters, enrichment)
  • Schema evolution handled automatically but with limited customization

2.3 Custom ingestion — API (Notebook or Pipeline) → Tables

Description
API-based extraction with transformation and direct load into Delta tables.

The same comments as in Section 1.2 (Custom ingestion — API (Notebook or Pipeline) → Files) apply.
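A minimal sketch of this path for SharePoint list data: flatten the items returned by the Graph `/sites/{site-id}/lists/{list-id}/items?expand=fields` endpoint into rows, then load them as a Delta table. The table name is illustrative, and the write step assumes the `spark` session predefined in Fabric notebooks.

```python
# Sketch: flatten Graph list items into plain dicts, one row per item.
# An item's user-defined columns live under its "fields" object.
def items_to_rows(items):
    return [dict(item.get("fields", {}), _item_id=item["id"]) for item in items]

# Sketch: load the rows as a managed Delta table (illustrative name);
# runs only where a Spark session is available (Fabric notebook).
def rows_to_delta(spark, rows, table="bronze_sharepoint_list"):
    df = spark.createDataFrame(rows)
    df.write.format("delta").mode("overwrite").saveAsTable(table)
```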


3. Considerations

API usage (Notebook vs Pipeline)

Notebook

  • Better suited for:
    • complex transformations
    • large data processing
    • advanced logic (joins, enrichment)
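For the incremental logic a notebook enables, a common pattern is a modification-time watermark. A sketch, assuming Graph driveItems: their `lastModifiedDateTime` is ISO 8601 UTC, so plain string comparison orders correctly. Persisting the watermark between runs (e.g. in a small Lakehouse file) is assumed and not shown.

```python
# Sketch: keep only items modified strictly after the stored watermark.
def filter_new_items(items, watermark):
    return [i for i in items if i["lastModifiedDateTime"] > watermark]

# Sketch: advance the watermark to the newest timestamp seen,
# keeping the current value if nothing newer arrived.
def next_watermark(items, current):
    return max([i["lastModifiedDateTime"] for i in items] + [current])
```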

Pipeline (Web / Copy Activity)

  • Better suited for:
    • orchestration
    • simple ingestion patterns
    • metadata-driven ingestion

Security

  • Authentication methods:
    • Organizational account
    • Workspace identity
    • Service principal (recommended)
  • API-based approaches require:
    • token management
    • permission configuration (e.g. Sites.Read.All)
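Token management for a service principal typically uses the OAuth2 client-credentials grant against Microsoft Entra ID. A standard-library sketch: tenant ID, client ID, and secret are placeholders, and the application permission behind the `.default` scope (e.g. Sites.Read.All) must be granted on the app registration beforehand.

```python
import json
import urllib.parse
import urllib.request

# Sketch: build the token endpoint URL and form-encoded request body
# for the OAuth2 client-credentials grant.
def token_request(tenant_id, client_id, client_secret):
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    }).encode()
    return url, body

# Sketch: POST the request and extract the bearer token
# (network call; only runs with real credentials).
def get_token(tenant_id, client_id, client_secret):
    url, body = token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.loads(resp.read())["access_token"]
```

In production the secret would come from a secure store (e.g. Key Vault), never from code; libraries such as MSAL also handle token caching and refresh.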

4. Matrices


Synthesis


| Data type | Load target | Options |
| --- | --- | --- |
| Files | Files | Shortcut / API (Notebook or Pipeline) |
| Files | Tables | Shortcut + transformation / API (Notebook or Pipeline) |
| SharePoint Lists | Tables | Mirroring / API (Notebook or Pipeline) |

Criteria


| Criteria | Shortcut (Files) | Shortcut + Transform (Tables) | Mirroring (Lists) | API via Notebook | API via Pipeline (Web / Copy) |
| --- | --- | --- | --- | --- | --- |
| Data movement | No copy (virtual access) | No copy (virtual + projection) | Physical copy (replication) | Physical copy | Physical copy |
| Latency / freshness | Near real-time (source-driven) | Near real-time | Near real-time sync (incremental) | Depends on orchestration | Depends on orchestration |
| Transformation capabilities | None | Limited | Limited | Full (Spark / code) | Limited (mapping / chaining) |
| Incremental / CDC logic | Not supported | Limited / implicit | Built-in incremental sync | Fully customizable | Manual implementation required |
| Handling complex structures | Limited (folder-based only) | Limited | Not applicable (structured only) | Strong capability | Moderate (complex via chaining) |
| Control over ingestion logic | None | Low | Low | Full | Medium |
| Operational complexity | Very low | Low | Low | High | Medium |
| Dependency on source availability | High | High | Low | Low (after ingestion) | Low (after ingestion) |
| Schema control / evolution | None | Limited | Limited | Full control | Medium control |
| Cost (compute / storage) | Low | Low | Free | Higher (compute + dev) | Medium (pipeline runs) |
| Supported data types | Files only | Files (JSON, CSV, Parquet, Excel; structured) | SharePoint Lists only | All (files + lists) | All (files + lists via API) |
Technical solutions (Fabric only recommended)

  • P1 : SharePoint Shortcuts
    • Directly to silver Lakehouse Tables with automatic transformation to Delta (see Reference); .xlsx-to-Delta support is pending (only CSV currently works)
    • Or to the Lakehouse Files zone (CSV shortcut), then transformation to silver tables
      • Triggers on OneLake events? Event triggers do not currently work for shortcuts.
  • P2 : Power Query code through notebooks
  • P3 : Dataflow Gen2
  • P4 : Pipelines via API (also doable in Azure)