
- Data retrieval strategies from SharePoint:
- Files, Lists, custom applications, OneLake File Explorer
- Security and best practices
| Version | Date | Description | Contributor |
| V0.1 | | Initial document | COLOMBANI Théo |
| V0.2 | | Added to the wiki | COLOMBANI Théo |
| V0.3 | | | |

1. Axis — Load into Lakehouse Files
1.1 OneLake Shortcut (SharePoint / OneDrive)
Description
Logical link exposing SharePoint folders in OneLake without data duplication.
Functioning
- Shortcut points to a SharePoint folder (folder-level only)
- Data remains in SharePoint and is accessed virtually
- Accessible across Fabric workloads
Key capabilities
- Data virtualization (no physical copy)
- Automatic synchronization with source changes
- Unified access through OneLake
Advantages
- No pipelines or ETL required
- No data duplication
- Fast implementation
- Unified access layer
Limitations (decision drivers)
- Folder-level granularity only
- Performance dependent on SharePoint (latency, throttling)
- No control over ingestion (no filtering, no incremental logic)
- Runtime dependency on source availability
- Not suitable when strong data isolation or historization is required
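Once created (through the Fabric UI or REST API), a shortcut surfaces as an ordinary folder under the Lakehouse `Files` hierarchy and is addressable through the standard OneLake ABFS path. A minimal sketch of that path convention, assuming a hypothetical shortcut named `sharepoint_docs` in a lakehouse `LH_Bronze`:

```python
def onelake_shortcut_uri(workspace: str, lakehouse: str,
                         shortcut: str, relative_path: str) -> str:
    """Build the OneLake ABFS URI for a file exposed through a shortcut.

    OneLake serves every lakehouse at onelake.dfs.fabric.microsoft.com;
    a shortcut is just another folder under Files/, so no path changes
    are needed in consuming code when the source is virtualized.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files/{shortcut}/{relative_path}"
    )

# Workspace, lakehouse, and shortcut names here are illustrative only.
uri = onelake_shortcut_uri("Sales", "LH_Bronze", "sharepoint_docs", "2024/report.csv")
```

Inside a Fabric notebook the same data is typically reachable through the relative path `Files/sharepoint_docs/...`, with no copy ever being made.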
1.2 Custom ingestion — API (Notebook or Pipeline) → Files
Description
Extraction via Microsoft Graph or SharePoint REST API and storage in Lakehouse Files.
Execution models
- Notebook (Spark / Python)
- Data Pipeline:
- Web Activity (REST calls)
- Copy Activity with API source
Functioning
- API calls to retrieve files or metadata
- Data written into OneLake Files
Key capabilities
- Supports full SharePoint surface (files, folders, metadata)
- Custom ingestion logic (filtering, incremental, structuring)
- Can be orchestrated via pipelines
Advantages
- Full flexibility on ingestion logic
- Ability to implement incremental loads (delta, watermark) at bronze layer
- Can handle complex folder structures and edge cases
- Works even when no native connector exists
Limitations (decision drivers)
- Requires handling:
- authentication (OAuth / Service Principal)
- pagination (@odata.nextLink)
- API rate limits / throttling
- More complex error handling and retry logic
- Development and maintenance effort
- Pipeline Web Activity is stateless (no built-in transformation)
- Copy Activity / Web Activity require manual schema handling
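Microsoft Graph returns file listings page by page: each response carries a `value` array and, when more results remain, an `@odata.nextLink` URL. The pagination loop listed above as a limitation can be sketched as follows, with the HTTP call abstracted behind a hypothetical `get_page` function (in practice a `requests.get` carrying a bearer token, plus retry/back-off on HTTP 429 throttling):

```python
from typing import Callable

def fetch_all_items(get_page: Callable[[str], dict], first_url: str) -> list:
    """Follow @odata.nextLink until the Graph listing is exhausted."""
    items, url = [], first_url
    while url:
        page = get_page(url)                # one Graph API call per page
        items.extend(page.get("value", []))
        url = page.get("@odata.nextLink")   # absent on the last page
    return items
```

The same loop applies whether the caller is a Notebook or a Pipeline Web Activity chain; only the surrounding error handling and orchestration differ.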
2. Axis — Load into Lakehouse Tables
2.1 Shortcut with transformation → Delta Tables
Description
Use of SharePoint shortcut with transformation to project files into Delta tables.
Functioning
- Shortcut exposes files
- Transformation step converts them into structured tables
- Tables remain synchronized with source
Key capabilities
- Automatic file-to-table conversion
- Continuous synchronization
- Direct consumption in SQL / BI
Advantages
- No pipeline required
- Direct analytical usability
- Integrated with OneLake
Limitations (decision drivers)
- Strong dependency on source file structure and quality
- Limited transformation capabilities compared to ETL
- Limited control over schema evolution
- Debugging and lineage less explicit than pipeline-based ingestion
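Conceptually, the transformation step projects semi-structured files into typed rows. A toy stdlib sketch of that projection for a CSV file (column names and types are illustrative; in Fabric this step would normally be Spark writing a Delta table):

```python
import csv
import io

def csv_to_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into typed records — a stand-in for file-to-table projection."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        # Any schema drift in the source file (renamed or missing columns)
        # surfaces here as a KeyError/ValueError — hence the "dependency on
        # source file structure and quality" noted above.
        rows.append({
            "id": int(rec["id"]),
            "amount": float(rec["amount"]),
            "label": rec["label"],
        })
    return rows

sample = "id,amount,label\n1,9.5,alpha\n2,3.0,beta\n"
```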
2.2 Mirroring (SharePoint Lists)
Description
Replication of SharePoint Lists into OneLake as Delta tables.
Functioning
- Connection to SharePoint list
- Continuous replication into Fabric tables
- Automatic synchronization
Key capabilities
- Near real-time data replication
- Native Delta format
- No ETL required
Advantages
- Continuous synchronization
- Simplified ingestion architecture
- Direct usability for analytics
Limitations (decision drivers)
- Limited to structured data (lists only)
- Limited transformation capabilities during ingestion
- Dependency on mirroring feature availability and scope
- Limited control over ingestion logic (filters, enrichment)
- Schema evolution handled automatically but with limited customization
2.3 Custom ingestion — API (Notebook or Pipeline) → Tables
Description
API-based extraction with transformation and direct load into Delta tables.
The same considerations as in Section 1.2 (Custom ingestion — API → Files) apply; the difference is that the transformed data is written directly as Delta tables rather than as raw files.
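The incremental (watermark) pattern referenced in Section 1.2 can be sketched as below, assuming each Graph item carries a `lastModifiedDateTime` in UTC ISO-8601 form (such timestamps compare correctly as plain strings):

```python
def incremental_batch(items: list[dict], watermark: str) -> tuple[list[dict], str]:
    """Keep only items modified after the watermark, then advance it.

    `watermark` is the highest lastModifiedDateTime seen in the previous run,
    persisted between runs (e.g. in a control table in the bronze layer).
    """
    fresh = [i for i in items if i["lastModifiedDateTime"] > watermark]
    new_watermark = max(
        (i["lastModifiedDateTime"] for i in fresh),
        default=watermark,   # nothing new: watermark is unchanged
    )
    return fresh, new_watermark
```

Only `fresh` is transformed and merged into the Delta table, which keeps each run's volume proportional to the change rate rather than the full source.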
3. Considerations
API usage (Notebook vs Pipeline)
Notebook
- Better suited for:
- complex transformations
- large data processing
- advanced logic (joins, enrichment)
Pipeline (Web / Copy Activity)
- Better suited for:
- orchestration
- simple ingestion patterns
- metadata-driven ingestion
Security
- Authentication methods:
- Organizational account
- Workspace identity
- Service principal (recommended)
- API-based approaches require:
- token management
- permission configuration (e.g. Sites.Read.All)
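Token management for a service principal follows the standard OAuth 2.0 client-credentials flow against the Microsoft Entra token endpoint. A minimal sketch that builds the request (the actual POST, e.g. via `requests`, is omitted; tenant and credentials below are placeholders):

```python
def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> tuple[str, dict]:
    """Return the token endpoint URL and form payload for the client-credentials grant."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    data = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # .default requests all application permissions already granted to the
        # principal — e.g. Sites.Read.All for SharePoint access through Graph.
        "scope": "https://graph.microsoft.com/.default",
    }
    return url, data
```

The `access_token` in the response is then sent as `Authorization: Bearer <token>` on every Graph or SharePoint REST call.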
4. Comparison criteria
| Criteria | Shortcut (Files) | Shortcut + Transform (Tables) | Mirroring (Lists) | API via Notebook | API via Pipeline (Web / Copy) |
| Data movement | No copy (virtual access) | No copy (virtual + projection) | Physical copy (replication) | Physical copy | Physical copy |
| Latency / freshness | Near real-time (source-driven) | Near real-time | Near real-time sync (incremental) | Depends on orchestration | Depends on orchestration |
| Transformation capabilities | None | Limited | Limited | Full (Spark / code) | Limited (mapping / chaining) |
| Incremental / CDC logic | Not supported | Limited / implicit | Built-in incremental sync | Fully customizable | Manual implementation required |
| Handling complex structures | Limited (folder-based only) | Limited | Not applicable (structured only) | Strong capability | Moderate (complex via chaining) |
| Control over ingestion logic | None | Low | Low | Full | Medium |
| Operational complexity | Very low | Low | Low | High | Medium |
| Dependency on source availability | High | High | Low | Low (after ingestion) | Low (after ingestion) |
| Schema control / evolution | None | Limited | Limited | Full control | Medium control |
| Cost (compute / storage) | Low | Low | Free | Higher (compute + dev) | Medium (pipeline runs) |
| Supported data types | Files only | Structured files (JSON, CSV, Parquet, Excel) | SharePoint Lists only | All (files + lists) | All (files + lists via API) |
Technical solutions (Fabric only recommended)