An architectural quantum is an ‘independently deployable component with high functional cohesion, which includes all the structural elements required for the system to function properly’, as defined in the O’Reilly book Building Evolutionary Architectures. In layman’s terms, it’s something that can pretty much function on its own.
Think about a typical day in the life of a data engineer and it can make you wince a little. Like a brave explorer in a dense jungle of tangled vines and hidden traps, a data engineer navigates the treacherous landscape of data pipelines and infrastructure. Armed with their tools of coding and problem-solving, they venture into the heart of complex databases, battling erratic data flows, elusive bugs, and integration mishaps. Their mission is to unravel the mysteries of data, transforming it into valuable insights. Yet, with each step forward, they encounter new challenges that test their patience and expertise. From wrestling with data inconsistencies to deciphering cryptic error messages, the data engineer’s journey is a perpetual quest for order and efficiency in a chaotic realm.
What’s the problem?
Everything, everywhere, all at once. A good film, but also a fair description of a data platform like Snowflake. Imagine having all of your data dumped in one place: good data, bad data, useful data, and irrelevant data, all mixed up in a cocktail of confusion. Of course, you probably don’t have to imagine; it’s your working life.
In a recent post, Chad Sanderson, LinkedIn data influencer and data contract advocate, wrote:
The vast majority of data models, cloud spend, and dashboards are unnecessary.
The majority of businesses have a relatively small number of business-critical pipelines that power ROI-associated assets like ML/AI models, financial reporting pipelines, and other production-grade data products.
However, because these data products often sit downstream of a tangled mess of spaghetti SQL, the data assets which generate the most value are often completely under-served, lack data contracts, monitoring, CI/CD, alerting, and ownership.
Data products that should be incrementally improved are left to rot, because the cost of refactoring the entire upstream pipeline is far too high for the upside.
My advice: Forget about complete warehouse refactors (which will be out of date in 6 months anyway). Focus ONLY on the most valuable, high ROI data sets. Data consumers and producers should work together to create data contracts around the core schemas, layer in strong change management, and apply quality downstream.
Some great places to start with data contracts:
- Financial data
- sev0 ML models like pricing or offer relevance
- Data that is surfaced to a 3rd party customer or embedded in an app
- PII or other highly regulated datasets
This resonated with us and ties in closely with our recent blog on the business need to prioritize good data over big data.
Architectural quantum, I’ve heard that a lot recently
You may have heard the term architectural quantum used more frequently in recent years. That’s because Zhamak Dehghani adopted it when outlining her data mesh vision, in particular to describe data as a product.
It is described better than we could put it ourselves in the excellent data mesh principles post on Martin Fowler’s site:
Logical architecture: data product the architectural quantum
Architecturally, to support data as a product that domains can autonomously serve or consume, data mesh introduces the concept of data product as its architectural quantum. Architectural quantum, as defined by Evolutionary Architecture, is the smallest unit of architecture that can be independently deployed with high functional cohesion, and includes all the structural elements required for its function.
Data product is the node on the mesh that encapsulates three structural components (code, data and metadata, and infrastructure) required for its function, providing access to the domain's analytical data as a product.
Data mesh is too big a project to get buy-in
Data mesh might not be the right solution for you right now. It could be the long-term answer, but with investment tight and the global economy looking iffy, a major change to organization-wide architecture may not be suitable.
That doesn’t mean that seeking other ways to make better use of data and better use of the data team’s talent is out of the question.
Going back to Chad’s LinkedIn post, he suggests data consumers and producers should work together to create data contracts around the core schemas and layer in strong change management to apply quality downstream.
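To make that concrete, here is a minimal sketch of what a contract on a core schema might look like at the producer boundary. It uses Pydantic purely as an illustration; the Payment fields and the publish helper are hypothetical, not anything prescribed by Chad’s post.

```python
# A lightweight illustration of a data contract: the producer validates records
# against an agreed shape before they reach downstream consumers.
# The Payment fields below are hypothetical.
from datetime import datetime
from decimal import Decimal

from pydantic import BaseModel, ValidationError


class Payment(BaseModel):
    payment_id: str
    customer_id: str
    amount: Decimal          # agreed unit: whole currency, not cents
    currency: str
    processed_at: datetime


def publish(record: dict) -> Payment:
    """Reject records that break the contract instead of passing them downstream."""
    try:
        return Payment(**record)
    except ValidationError as err:
        # In a real pipeline this would alert the owning team,
        # not silently drop or coerce the record.
        raise ValueError(f"Contract violation: {err}") from err
```

The point isn’t the library; it’s that the schema, its ownership, and the failure mode are agreed up front between producer and consumer, with quality checks applied before anything lands downstream.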
Why not take this one step further with a solution that provides an architectural quantum?
Take TerminusDB or TerminusCMS, for example. They’re designed to provide everything a data team and its data consumers need. You can model your schema and give it meaning using metadata and documentation, change management is built in, data can be imported and accessed via an API, and you can even use the document UI SDK to build dashboards and interfaces. For business-critical use cases it’s ideal: smaller to manage; easier to access, share, and use; and it needs a single data engineer rather than a whole team. Even if the solution isn’t TerminusDB, there are plenty of other data management platforms that could provide a domain-based architectural quantum.
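As a rough sketch of what that looks like in practice, the snippet below uses the terminusdb_client Python package to define a documented schema class and insert a record through the document API. The Invoice class and its fields are made up for illustration, and the connection details will depend on your setup, so treat it as an outline rather than copy-paste-ready code.

```python
# A minimal sketch of treating one business-critical domain as a self-contained
# data product in TerminusDB, assuming the terminusdb_client Python package,
# a local TerminusDB instance, and an existing "finance" database.
# The "Invoice" class is illustrative only.
from terminusdb_client import WOQLClient

client = WOQLClient("http://localhost:6363")
client.connect(team="admin", user="admin", key="root", db="finance")

# Schema document: the class carries its own documentation, so the metadata
# travels with the data rather than living in a separate wiki.
invoice_class = {
    "@type": "Class",
    "@id": "Invoice",
    "@documentation": {
        "@comment": "A customer invoice used in financial reporting.",
        "@properties": {"amount": "Invoice total in the billing currency."},
    },
    "invoice_id": "xsd:string",
    "amount": "xsd:decimal",
    "issued": "xsd:dateTime",
}
client.insert_document(invoice_class, graph_type="schema",
                       commit_msg="Add Invoice class to the finance schema")

# Instance data goes through the same document API; every write is a commit,
# which is where the built-in change management and history come from.
client.insert_document(
    {
        "@type": "Invoice",
        "invoice_id": "INV-001",
        "amount": 120.50,
        "issued": "2023-06-01T00:00:00Z",
    },
    commit_msg="Add first invoice",
)
```

Schema, documentation, data, and change history all live in one place and are reachable through one API, which is what makes the data product an architectural quantum rather than a loose collection of pipelines.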
With so much company data going unused (the exact figure varies wildly depending on the study), taking Chad’s advice and homing in on the areas that actually provide value to the business will ease the burden on overworked data teams, bring domains closer to data ownership, and let businesses use their data faster, making them more agile.