A Data Architecture consists of the following layers:

  1. Business value (Reports, Exploratory Research, Insights, Actions): Data without direct or indirect business value is worthless. The cost of making data available, however small, must be offset by the business value it creates.
  2. Transformation (Rules): From raw data to information. In most cases raw data has little value. Only once it is distilled into key performance indicators (KPIs) and well-documented attributes can its true value be realized.
  3. Data model (Navigation and self service): Defines the relationships between buckets of data and helps the user navigate between areas. Customer master data alone has little business value, and neither does a list of sales orders. Looking at the relationship between the two helps to uncover new information, e.g. which region is prospering. A well-designed and documented data model is the enabler for creating business value.
  4. Processing (Transformation flexibility): How sources provide data and how it is processed on its way to the final consumer. The chosen architecture determines many qualities of data consumption. For example, an Event Driven Architecture fosters low-latency business decisions, whereas a classic ETL-based architecture is cheaper to build. The architecture also determines what kinds of data and processing capabilities can be used: a SQL-based architecture copes mainly with structured data, a big data architecture with any kind of data.
  5. Consumption (APIs and Interfaces): How and for what purposes the data can be consumed. A Data Warehouse is built for the sole purpose of enabling self-service analytics. With an Event Driven design, the data can also be used for integrating systems asynchronously, for alerting, for business workflows, etc.
  6. Metadata (Data Governance): Data without context has no value. Data can be used only if one knows what a field means, how a KPI is defined, where to find certain data, what its business meaning is, and so on. Metadata can be read directly (e.g. field name and data type), written down (e.g. documentation, business glossary), or created as a by-product (e.g. the list of transformation steps a record underwent, or how often data was used). Metadata can be technical or business in nature, and it can also describe data sensitivity and permissions.
  7. Incentives (Data Producers): Why would somebody provide data or use it? Although there are cases where somebody needs certain data, providing data is usually a chore. Using data to generate new insights cannot be enforced either, even though it would help to make the right decisions. Hence it is important to have incentives for both producing and consuming data. Doing so should be easy, and people should be self-motivated to take part in the process.
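
The data model layer can be illustrated with a minimal sketch. It assumes two hypothetical data sets, a customer master keyed by customer_id and a list of sales orders; all field names and values are invented for illustration. Neither data set answers the question "which region is prospering" on its own, but following the relationship between them does.

```python
from collections import defaultdict

# Hypothetical customer master data: on its own it says nothing about revenue.
customers = {
    "C1": {"name": "Acme", "region": "EMEA"},
    "C2": {"name": "Beta", "region": "APAC"},
    "C3": {"name": "Gamma", "region": "EMEA"},
}

# Hypothetical sales orders: on their own they say nothing about regions.
sales_orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 1200.0},
    {"order_id": 2, "customer_id": "C2", "amount": 800.0},
    {"order_id": 3, "customer_id": "C3", "amount": 500.0},
    {"order_id": 4, "customer_id": "C2", "amount": 300.0},
]


def revenue_by_region(customers, sales_orders):
    """Follow the order -> customer relationship and sum amounts per region."""
    totals = defaultdict(float)
    for order in sales_orders:
        customer = customers.get(order["customer_id"])
        # Orders without a matching customer are kept visible, not dropped.
        region = customer["region"] if customer else "UNKNOWN"
        totals[region] += order["amount"]
    return dict(totals)


region_totals = revenue_by_region(customers, sales_orders)
print(region_totals)
```

In a real platform this relationship would be declared once in the data model, so that users can navigate it without writing the join themselves.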


 

These layers build on each other, starting from the foundation.

  1. Company-wide, most of the effort goes into providing the data, but without data there is no business value. Access to data is the foundation of everything. At Syensqo, projects currently need data from other teams but fail to properly motivate them to provide it → Incentives.
  2. Once all data is available, the result is a huge data swamp. It is imperative to give users mechanisms for finding the desired data set. This is achieved via a searchable catalog, with the producers providing the descriptive information → Metadata.
  3. Even the best data does not help if it cannot be consumed as desired. The access method should support the desired style (push, where consumers are informed about changes, vs. pull, where consumers query all or part of the data) and be an open interface like SQL, to support as many tools as possible → Consumption.
  4. The same applies to the types of transformations that are possible. If customer master data should be cleansed and addresses standardized but there is no transformation option for that, the task cannot be achieved and the cleansed data cannot be provided. A query filtering on a city name then does not return all records because of different spellings, which falsifies the conclusions drawn from it. Common transformations must be simple, complex ones possible → Processing.
  5. Another way to navigate the pool of data sets is via relationships. A sales order has a relationship with a customer and hence with the buying location and other customer master data. In addition, the data should be prepared in a way that helps the query engine retrieve it faster, which lowers processing costs. Both are important for consumers to actually use the data. If the answer to a simple question takes 20 minutes, the question will probably not be asked again, to the disadvantage of the company. The same applies if the data can be navigated via searches only (Metadata) but not by moving along the relationships between data sets → Data Model.
  6. The current approach in many teams is that the business user describes what they want in a Jira ticket, a data architect translates it into a mapping document, and a data engineer implements it. These are disjointed processes with a high risk of failure due to miscommunication or misunderstandings. Furthermore, even the business user does not know all the details right from the start. On the other hand, the higher the data quality, the more insights can be derived for the benefit of the company. The goal should be to enable the business user to provide the transformation rules in a form that can be executed directly as code → Transformation.
  7. All of the above are steps to enable users to make fact-based decisions → Business value.
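
Steps 4 and 6 can be sketched together. The example below is only an illustration under stated assumptions: the rule format, the action names (standardize_city, trim) and the synonym table are hypothetical and not part of any existing Syensqo tooling. The idea is that a business user maintains the rules as plain data, and the platform executes them directly, without a mapping document in between.

```python
# Hypothetical synonym table a business user could maintain, e.g. in a form.
CITY_SYNONYMS = {
    "munich": "München",
    "muenchen": "München",
    "münchen": "München",
}


def standardize_city(value: str) -> str:
    """Map known spellings to one canonical city name; keep unknown ones."""
    cleaned = value.strip()
    return CITY_SYNONYMS.get(cleaned.lower(), cleaned)


# Actions the platform offers; the rule names are assumptions for this sketch.
ACTIONS = {
    "standardize_city": standardize_city,
    "trim": str.strip,
}

# Declarative rules: which action applies to which field.
RULES = [
    {"field": "name", "action": "trim"},
    {"field": "city", "action": "standardize_city"},
]


def apply_rules(record: dict, rules: list) -> dict:
    """Return a cleansed copy of the record; the input is left unchanged."""
    result = dict(record)
    for rule in rules:
        field = rule["field"]
        if result.get(field) is not None:
            result[field] = ACTIONS[rule["action"]](result[field])
    return result


customers = [
    {"name": " Acme GmbH ", "city": "Munich"},
    {"name": "Beta AG", "city": "MUENCHEN "},
    {"name": "Gamma SE", "city": "München"},
    {"name": "Delta Ltd", "city": "London"},
]

cleansed = [apply_rules(c, RULES) for c in customers]

# Filtering on the raw data misses records because of different spellings.
raw_hits = [c["name"] for c in customers if c["city"] == "München"]
# Filtering on the cleansed data returns all matching customers.
clean_hits = [c["name"] for c in cleansed if c["city"] == "München"]
print(raw_hits, clean_hits)
```

On the raw data the city filter finds one customer; on the cleansed data it finds three, which is the difference between a falsified and a correct conclusion.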



Most importantly, the Data Architecture must match the company culture and goals.

During the interviews, certain statements were heard repeatedly

This points towards an Event Driven architecture with a strong focus on Data Governance and a high degree of freedom for the users.

Why Data projects fail

There are typical pitfalls when it comes to data projects. Knowing them helps to avoid them and to make the project successful in both the short and the long term.

Project Charter

Hence the following project charter shall be defined.

Syensqo seeks to modernize its data platform and resolve the current issues, which center on data governance, finding data, and making more data available to all.

Guiding principles

  1. The platform should help, not create restrictions. It should be fun and easy to use.
  2. The platform should be useful for all scenarios, from reporting and analytics to application integration. One system for all patterns.
  3. The platform should offer incentives to provide data. At a minimum, it should show producers how often their data has been used by others.
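
The third principle, combined with the searchable catalog from the Metadata layer, could look roughly like the sketch below. The class names (DatasetEntry, Catalog), their methods and the sample data sets are assumptions made up for illustration; they do not describe an existing product or API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata a producer supplies when publishing a data set."""
    name: str
    owner: str
    description: str
    tags: set = field(default_factory=set)


class Catalog:
    """Searchable catalog that also counts how often each data set is used."""

    def __init__(self) -> None:
        self._entries = {}
        self._usage = Counter()

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list:
        """Case-insensitive match on name, description and tags."""
        needle = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        )

    def record_access(self, dataset: str) -> None:
        if dataset not in self._entries:
            raise KeyError(f"unknown data set: {dataset}")
        self._usage[dataset] += 1

    def usage_report(self, owner: str) -> dict:
        """Show a producer how often each of their data sets was used."""
        return {
            e.name: self._usage[e.name]
            for e in self._entries.values()
            if e.owner == owner
        }


catalog = Catalog()
catalog.register(DatasetEntry("customer_master", "sales-team",
                              "Cleansed customer master data", {"address"}))
catalog.register(DatasetEntry("sales_orders", "sales-team",
                              "Sales orders with customer reference", {"orders"}))
catalog.register(DatasetEntry("plant_sensors", "ops-team",
                              "Raw sensor readings per plant", {"iot"}))

for name in ["sales_orders", "sales_orders", "customer_master"]:
    catalog.record_access(name)

found = catalog.search("customer")
report = catalog.usage_report("sales-team")
print(found, report)
```

The usage report is the minimal incentive named in the principle: a producer sees that the effort of publishing a data set is paying off for others.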