Rethinking data infrastructure: A guide to AI-ready systems

From pipelines to policy: How to build a data flywheel for scalable, secure, and intelligent infrastructure.

Data is fashionable again with the advent of AI everywhere. For data veterans like myself, this is nothing new: every wave of innovation casts a fresh spotlight on the importance of good data management principles. “Data is the new oil” has been thrown around more than once by pioneers realizing how ephemeral the value of user experiences or augmented intelligence can be without it.

So, it is a good time to revisit some of the basic principles for ensuring that this abstract entity called “data” can be a true asset and not become a liability, as everyone from solopreneurs to the largest corporations rushes to capitalize on the AI gold rush.

In this blog I will highlight the need for a very intentional approach to investing in and managing your data infrastructure, so that marginal investments in newer technologies like blockchain or AI can deliver truly transformative outcomes.

What is data? Managed vs. unstructured data

Let’s start by defining “data” – it is everywhere, and yet most people take it for granted until something goes wrong. On one hand there is “managed” data – think of databases, data warehouses, mainframes, and the like. These are typically very thoughtfully designed, tightly controlled, and methodically managed. Then there is everything else – for example, customer interactions on social media, financial filings, contracts, video and audio feeds, and much more.

Drawing a clean line of ownership around all this data is hard – it might be easier to find the pot of gold at the end of a rainbow! Yet technology and AI capabilities are now so sophisticated that they can ingest all of the available data and, with increasing accuracy, make decisions from it.

That brings us to the point of this article – how should you think about your data infrastructure so that your business can leverage the rapid advances in data processing and not risk being left behind?

Rethinking data infrastructure: Your robust data pipeline

“Infrastructure” is typically associated with hardware – servers, disks, networks, databases, and so on. That is true, and since cloud computing introduced “infrastructure as code,” the term now also covers pipelines, repositories, workflows, logs, and many more components. Taking a holistic approach to data is the key transformational concept to start with. Leading-edge businesses with data intelligence at their core have adopted a data-centric approach to managing data from every part of their business. Let’s look at the key insights we can learn from these pioneers.
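
To make “infrastructure as code” concrete for data, here is a minimal sketch of a pipeline defined as a version-controlled artifact rather than a one-off script. All the names here (Pipeline, Step, extract_orders) are illustrative assumptions, not any specific tool’s API.

```python
# A minimal sketch of "pipeline as code": the pipeline itself lives in a
# repository and is reviewed like any other code. Names are illustrative.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]  # each step transforms a shared context

@dataclass
class Pipeline:
    name: str
    steps: list[Step] = field(default_factory=list)

    def execute(self) -> dict:
        context: dict = {}
        for step in self.steps:
            print(f"[{self.name}] running step: {step.name}")  # stands in for real logging
            context = step.run(context)
        return context

def extract_orders(ctx: dict) -> dict:
    ctx["orders"] = [{"id": 1, "amount": 42.0}]  # placeholder for a real source system
    return ctx

def publish_to_warehouse(ctx: dict) -> dict:
    print(f"publishing {len(ctx['orders'])} orders")  # placeholder for a real sink
    return ctx

daily_orders = Pipeline("daily_orders", [Step("extract", extract_orders),
                                         Step("publish", publish_to_warehouse)])
daily_orders.execute()
```

Because the pipeline is ordinary code, it gets the same review, testing, and rollback discipline as the rest of the stack.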

Data does not belong to a single application or a department – indeed, data knows no boundaries. The basic premise here is that data critical to the business – customers, products, producers, pricing and promotions, customer feedback, regulatory data – must be freely and readily shareable by every part of the business. Artificially locking critical data behind restricted application interfaces, or throwing everything into a data lake or warehouse and taking a free-for-all approach, are both antithetical to this principle. Rather, what is needed is a systemic approach not dissimilar to a manufacturing assembly line.

So that’s concept number one – does my data infrastructure have the rails to share critical data across the business in a frictionless manner: readily available to those who need it, in a format that works for their needs, and at a frequency that lets them make timely, useful decisions?
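
One way to picture those rails is a shared data contract: a schema that every producer and consumer agrees on, expressed in code rather than in someone’s head. The sketch below is illustrative only; the entity, fields, and validation rules are assumptions for the example.

```python
# A minimal sketch of a shared "data contract" for a critical entity.
# Field names and rules are illustrative assumptions, not a real standard.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    segment: str          # e.g., "retail" or "commercial"
    updated_at: datetime  # consumers can reason about freshness

    def validate(self) -> None:
        if not self.customer_id:
            raise ValueError("customer_id is required")
        if self.segment not in {"retail", "commercial"}:
            raise ValueError(f"unknown segment: {self.segment}")

record = CustomerRecord("C-1001", "retail", datetime.now())
record.validate()  # every producer runs the same checks before publishing
```

When every team publishes and consumes against the same contract, sharing stops depending on tribal knowledge about what the fields mean.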

In a subsequent blog we will explore this concept in more detail to understand what it looks like in practice. For now, let’s assume we have concept one down.

Driving adoption and enabling a data-driven culture

Now that you have built a robust data pipeline that enables sharing critical information in a democratic manner, will there be widespread usage of this resource? In other words, if you build it, will they come and get it?

To counter traditional tendencies toward information hoarding, access to the available information must be well understood and easily attainable. In other words, making it as easy as an online search to learn what data exists (e.g., metadata and a data dictionary) and how to access it (access rights and roles, supported tools) is table stakes for making data a core competency.
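
As a rough illustration, a searchable data dictionary can be as simple as metadata that answers “what exists, who owns it, and how do I get access” in one lookup. The catalog structure below is a hypothetical sketch, not any real product’s format.

```python
# A minimal sketch of making "what data exists" searchable like an online
# search. The catalog entries and fields are illustrative assumptions.
CATALOG = {
    "orders.daily": {"owner": "sales-ops",
                     "description": "Daily order totals by region",
                     "access": "self-service for analyst roles"},
    "customers.core": {"owner": "crm-team",
                       "description": "Golden customer records",
                       "access": "restricted: contains PII"},
}

def search_catalog(term: str) -> list[str]:
    term = term.lower()
    return [name for name, meta in CATALOG.items()
            if term in name.lower() or term in meta["description"].lower()]

for name in search_catalog("customer"):
    meta = CATALOG[name]
    print(f"{name}: {meta['description']} "
          f"(owner: {meta['owner']}, access: {meta['access']})")
```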

Fortunately, with the ever-improving landscape of tools and technologies, this is getting easier to solve every day. Modern implementations increasingly rely on cloud-based, scalable platforms that can reliably process high volumes of data quickly and provide sophisticated visualization capabilities. With generative AI and large language models (LLMs), even writing queries has become as easy as speaking.
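
Here is a minimal sketch of that idea, assuming a generic LLM endpoint: translate a natural-language question plus a schema hint into SQL. The call_llm function is a hypothetical stand-in for whichever model API you actually use, and the prompt shape is an assumption.

```python
# A sketch of "writing queries by speaking": natural language in, SQL out.
# call_llm is a hypothetical placeholder, not a real library call.
def call_llm(prompt: str) -> str:
    # A real implementation would invoke your chosen LLM here.
    return "SELECT region, SUM(amount) AS total FROM orders GROUP BY region;"

def question_to_sql(question: str, schema_hint: str) -> str:
    prompt = (
        "Given this schema:\n"
        f"{schema_hint}\n"
        f"Write a single SQL query that answers: {question}"
    )
    return call_llm(prompt)

sql = question_to_sql("What are total sales by region?",
                      "orders(id, region, amount)")
print(sql)
```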

That’s concept number two – does my data infrastructure actively promote the adoption of a data-driven culture across a broad part of my business, or are teams and individuals settling for local optimizations, or simply spinning their wheels trying to find what is available, where to find it, and how to access it?

Integrating security, policy, and enforcement

Now it’s getting interesting – if our data infrastructure enables friction-free access to and usage of critical information, what about security and risk? How do I know the information will not fall into the wrong hands, get misused, or get compromised?

This is a very important consideration and, for most businesses, a persistent existential threat. Here again, technological advancements in data security and cyber-crime prevention evolve rapidly: they can implement fine-grained security controls, provide traceability at every step of the information flow, and make it easy to monitor for and detect threats or security breaches.

Despite these technology advances, what creates data breaches – or lets rogue actors access inappropriate information – is human error, often exploited through a technology loophole. When security is implemented as an afterthought and relies predominantly on technology to identify and mitigate risk, there is bound to be a loophole that will be exploited. Rather, security policies, access rights, and information usage rules have to be codified and implemented before the first transaction is completed or the first user accesses the information.
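
A minimal sketch of what codifying policy up front might look like: access rules declared as data, checked on every request, and every decision logged for traceability. The roles, datasets, and rules here are illustrative assumptions, not a real policy engine.

```python
# A minimal sketch of "policy as code": rules exist before the first user
# touches the data, and every decision is auditable. Names are illustrative.
POLICY = {
    "customers.core": {"read": {"analyst", "crm-admin"}, "write": {"crm-admin"}},
    "orders.daily":   {"read": {"analyst", "sales-ops"}, "write": {"sales-ops"}},
}

AUDIT_LOG: list[str] = []

def check_access(role: str, dataset: str, action: str) -> bool:
    allowed = role in POLICY.get(dataset, {}).get(action, set())
    AUDIT_LOG.append(f"{role} {action} {dataset}: {'ALLOW' if allowed else 'DENY'}")
    return allowed

assert check_access("analyst", "orders.daily", "read")
assert not check_access("analyst", "customers.core", "write")
print("\n".join(AUDIT_LOG))  # traceability for every decision
```

The point is less the mechanism than the ordering: the policy table exists, in code and under review, before any data flows.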

So that brings up concept three – does my data infrastructure implement security and risk mitigation at its core, supported by intentional policies and procedures on information rights, and backed by an organizational structure to implement and enforce them? Or is my security plan a patchwork of buzzy technologies slapped together opportunistically, whose tools don’t play well with one another, without the organizational setup to do a comprehensive implementation and enforce the policies and procedures?

Building a data flywheel for real results

That brings us to the all-important question – if I do one, two, and three right, what do I get for it? Will the substantial investments in making data broadly available, in a secure yet friction-free manner, aided by sophisticated visualizations, translate into a positive ROI and, more importantly, outcomes that matter – such as increasing market share or improving unit profitability?

This is almost entirely an organizational and business imperative. Does my organization have a penchant for paper reports created for a specific process as a reactive control? Do key decision makers habitually rely on gut instinct for routine decisions instead of business facts? Are my KPIs or OKRs set up to measure what matters to the business now, or are they feel-good metrics that accentuate pet projects?

To counter the above, it helps to treat data as a product that can create a flywheel effect for other products. This might sound novel to some, but data-intelligent organizations are already adept at managing data with the best of product management – complete with a vision, a go-to-market strategy, and a plan to sustain market share post-launch. Productizing critical data positions the business to be very intentional about its approach to data, and about how data becomes an asset, not a liability, for the business.

So that rounds out concept four – does my data infrastructure consistently enable a platform-to-product-to-projects pipeline that lets my business grow and adapt its real products in the marketplace?

Conclusion

These concepts are just as applicable to an entire business as to a small part of one. If you are one of the aspiring unicorn solopreneurs, this might very well be the secret sauce that lets you scale up and scale out ahead of your competition.

About the Author

Ganesh is a semi-retired product veteran specializing in Data and Analytics with 30+ years of experience in financial services. Ganesh is available for Fractional Product Management or Fractional Data & Analytics Leader roles in a consulting or advisory capacity.
