• Data retrieval strategy from SharePoint:
    • Files, Lists, custom applications, OneLake File Explorer
    • Security, best practices


| Version | Date | Description | Contributor |
|---|---|---|---|
| V0.1 | | Initial document | COLOMBANI Théo |
| V0.2 | | Added to the wiki | COLOMBANI Théo |
| V0.3 | | | |

1. Axis — Load into Lakehouse Files

1.1 OneLake Shortcut (SharePoint / OneDrive)

Description
Logical link exposing SharePoint folders in OneLake without data duplication.
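
Because a shortcut shows up as an ordinary folder under the Lakehouse Files area, a notebook can read the SharePoint content with plain file operations. The sketch below is illustrative only: it assumes a notebook with a default Lakehouse attached (mounted at `/lakehouse/default/Files`), and `sharepoint_docs` is a hypothetical shortcut name.

```python
from pathlib import Path

# Assumption: default Lakehouse mount point inside a Fabric notebook.
LAKEHOUSE_FILES = Path("/lakehouse/default/Files")
# Hypothetical shortcut name, used only for illustration.
SHORTCUT_NAME = "sharepoint_docs"


def list_shortcut_files(files_root: Path, shortcut: str, suffix: str = ".csv") -> list[Path]:
    """Return files under the shortcut folder whose extension matches `suffix`."""
    shortcut_dir = files_root / shortcut
    if not shortcut_dir.is_dir():
        return []
    return sorted(
        p for p in shortcut_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


if __name__ == "__main__":
    for path in list_shortcut_files(LAKEHOUSE_FILES, SHORTCUT_NAME):
        print(path)
```

No data is copied by this code; each read goes through the shortcut to the source, which is why the shortcut option depends on source availability.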

Functioning

Key capabilities

Advantages

Limitations (decision drivers)


1.2 Custom ingestion — API (Notebook or Pipeline) → Files

Description
Extraction via Microsoft Graph or SharePoint REST API and storage in Lakehouse Files.
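
As an illustration of the Microsoft Graph route, the sketch below lists the files of one folder in a site's default document library, follows `@odata.nextLink` pagination, and saves each file to a target directory. It is a minimal sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the access token is obtained elsewhere, and error handling, retries and throttling are left out.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Iterator

GRAPH = "https://graph.microsoft.com/v1.0"


def children_url(site_id: str, folder_path: str) -> str:
    """Graph URL listing the children of a folder in the site's default library."""
    quoted = urllib.parse.quote(folder_path.strip("/"))
    return f"{GRAPH}/sites/{site_id}/drive/root:/{quoted}:/children"


def http_get_json(url: str, token: str) -> dict:
    """GET a Graph URL with a bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_files(first_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield file items page by page, following @odata.nextLink."""
    url = first_url
    while url:
        page = get_json(url)
        for item in page.get("value", []):
            if "file" in item:  # folders carry a "folder" facet instead
                yield item
        url = page.get("@odata.nextLink")


def save_item(item: dict, target_dir: Path) -> Path:
    """Download one file item via its pre-authenticated download URL."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / item["name"]
    with urllib.request.urlopen(item["@microsoft.graph.downloadUrl"]) as resp:
        target.write_bytes(resp.read())
    return target
```

In a notebook, this could be wired together as `iter_files(children_url(site_id, "Reports"), lambda u: http_get_json(u, token))`, with `save_item` writing under a folder such as `/lakehouse/default/Files/raw/sharepoint` (an assumed landing path). Passing `get_json` as a parameter keeps the pagination logic testable without network access.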

Execution models

Functioning

Key capabilities

Advantages

Limitations (decision drivers)


2. Axis — Load into Lakehouse Tables

2.1 Shortcut with transformation → Delta Tables

Description
Use of a SharePoint shortcut with a transformation to project files into Delta tables.

Functioning

Key capabilities

Advantages

Limitations (decision drivers)

2.2 Mirroring (SharePoint Lists)

Description
Replication of SharePoint Lists into OneLake as Delta tables.

Functioning

Key capabilities

Advantages

Limitations (decision drivers)

2.3 Custom ingestion — API (Notebook or Pipeline) → Tables

Description
API-based extraction with transformation and direct load into Delta tables.
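
To illustrate the transformation step, the sketch below flattens SharePoint list items, as returned by Microsoft Graph with `expand=fields`, into rows with a fixed set of columns. The function names, column names and target table name are hypothetical; the Spark write is shown only as a comment because it needs a running Fabric notebook session.

```python
from typing import Iterable


def list_items_url(site_id: str, list_id: str) -> str:
    """Graph URL returning list items together with their column values."""
    return (
        "https://graph.microsoft.com/v1.0"
        f"/sites/{site_id}/lists/{list_id}/items?expand=fields"
    )


def flatten_items(items: Iterable[dict], columns: list[str]) -> list[dict]:
    """Project each list item onto a fixed set of columns (missing values become None)."""
    rows = []
    for item in items:
        fields = item.get("fields", {})
        row = {
            "item_id": item.get("id"),
            "last_modified": item.get("lastModifiedDateTime"),
        }
        for col in columns:
            row[col] = fields.get(col)
        rows.append(row)
    return rows


# In a Fabric notebook (assumption: a `spark` session is available), the rows
# could then be appended to a Delta table, for example:
#   spark.createDataFrame(rows).write.format("delta") \
#        .mode("append").saveAsTable("sharepoint_list_items")
```

Fixing the column list up front keeps the table schema stable even when a list item lacks a value or the list gains a new column.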

The same comments as in Section 1.2 apply here.


3. Considerations

API usage (Notebook vs Pipeline)

Notebook

Pipeline (Web / Copy Activity)

Security

Service principal recommended
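
A service principal typically authenticates with the OAuth 2.0 client credentials flow. The sketch below builds the token request for Microsoft Graph; it is illustrative only. The helper names are hypothetical, and the tenant ID, client ID and secret are assumed to come from a secret store (for example Azure Key Vault) rather than being hard-coded.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials token request for Microsoft Graph."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" requests the application permissions granted to the app.
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_token(req: urllib.request.Request) -> str:
    """Send the request and return the bearer token (performs a network call)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

The app registration still needs an application permission on the target sites, granted by an administrator; scoping it to specific sites (for example with `Sites.Selected`) limits what the service principal can read.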


4. Matrices

Synthesis

| Data type | Load target | Options |
|---|---|---|
| Files | Files | Shortcut / API (Notebook or Pipeline) |
| Files | Tables | Shortcut + transformation / API (Notebook or Pipeline) |
| SharePoint Lists | Tables | Mirroring / API (Notebook or Pipeline) |

Criteria

| Criteria | Shortcut (Files) | Shortcut + Transform (Tables) | Mirroring (Lists) | API via Notebook | API via Pipeline (Web / Copy) |
|---|---|---|---|---|---|
| Data movement | No copy (virtual access) | No copy (virtual + projection) | Physical copy (replication) | Physical copy | Physical copy |
| Latency / freshness | Near real-time (source-driven) | Near real-time | Near real-time sync (incremental) | Depends on orchestration | Depends on orchestration |
| Transformation capabilities | None | Limited | Limited | Full (Spark / code) | Limited (mapping / chaining) |
| Incremental / CDC logic | Not supported | Limited / implicit | Built-in incremental sync | Fully customizable | Manual implementation required |
| Handling complex structures | Limited (folder-based only) | Limited | Not applicable (structured only) | Strong capability | Moderate (complex via chaining) |
| Control over ingestion logic | None | Low | Low | Full | Medium |
| Operational complexity | Very low | Low | Low | High | Medium |
| Dependency on source availability | High | High | Low | Low (after ingestion) | Low (after ingestion) |
| Schema control / evolution | None | Limited | Limited | Full control | Medium control |
| Cost (compute / storage) | Low | Low | Free | Higher (compute + dev) | Medium (pipeline runs) |
| Supported data types | Files only | Files (JSON, CSV, PARQUET, EXCEL) (structured) | SharePoint Lists only | All (files + lists) | All (files + lists via API) |
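
The "Incremental / CDC logic" criterion rates the notebook route as fully customizable. One simple way to implement it is a watermark on `lastModifiedDateTime`, sketched below. This is only one possible approach (Microsoft Graph also offers delta queries); the function names are hypothetical, and the code assumes timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form.

```python
from datetime import datetime, timezone


def parse_graph_ts(value: str) -> datetime:
    """Parse a timestamp such as 2024-05-01T12:00:00Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_changed(items: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Return items modified after `watermark` and the watermark for the next run."""
    changed = [
        item for item in items
        if parse_graph_ts(item["lastModifiedDateTime"]) > watermark
    ]
    new_watermark = max(
        (parse_graph_ts(item["lastModifiedDateTime"]) for item in changed),
        default=watermark,  # nothing changed: keep the previous watermark
    )
    return changed, new_watermark


if __name__ == "__main__":
    last_run = datetime(2024, 5, 1, tzinfo=timezone.utc)
    sample = [
        {"name": "old.csv", "lastModifiedDateTime": "2024-04-30T23:59:59Z"},
        {"name": "new.csv", "lastModifiedDateTime": "2024-05-02T08:00:00Z"},
    ]
    changed, next_watermark = select_changed(sample, last_run)
    print([item["name"] for item in changed], next_watermark.isoformat())
```

The watermark has to be persisted between runs (for example in a small control table), and this approach does not detect deletions at the source.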

Technical solutions (Fabric only recommended)