Dataiku Ebook

Modern Data Architecture Fundamentals

3 Keys to Supporting the Move to Everyday AI

Data architecture is both complex and constantly changing. How can it support the scaling of AI across an organization? 

This ebook provides three key recommendations, including inspiration from the modern data stack, for IT teams looking to build for the future of data democratization.



Scale Data Architecture by Asking the Right Questions

Align Data Architecture With Business Needs

Let’s face it: architecture frameworks start to decay as soon as someone puts them on a slide. If there’s one thing we’ve learned at Dataiku after talking to thousands of prospects and customers about their data architecture, it’s that those frameworks also tend to be more aspirational than realistic. That’s because at the enterprise level, data architecture is both complex and constantly changing.

So when it comes to a modern data architecture strategy, the most important factor is not actually the what but the how, answering questions such as:

  • Is your architecture agile enough to be able to easily adapt to changing needs and technology requirements?
  • As data ambitions across the organization evolve over the next year, five years, or 10 years, will the data architecture be able to support them?
  • Is the team thinking about the business strategy around data and architecture, not just for downstream consumption, but also upstream (i.e., how data will be made available to other services like applications, APIs, website, etc.)?

Rethink the Role of IT

Support the Move to Data Democratization

Ultimately, the modern data stack is about providing a seamless experience for all users, no matter what their data needs are. It:

  1. Allows coders to do advanced data science on top of cloud data warehouses (including pushing down data processing tasks but also having the ability to operationalize data science projects quickly, to be leveraged by consumers on the business side) and
  2. Allows non-coders (like analysts) to do their own data transformation plus advanced data work (e.g., predictive use cases) and
  3. Automates and orchestrates the operationalization piece, including pushing the results of multi-tool analysis back to the SaaS tools business users are leveraging.

Even for organizations that have a much more complex existing, legacy setup and therefore can’t fully leverage the simplicity of the modern data stack, the goal of providing a seamless experience for all users to work with data is a valuable takeaway.

Scaling AI from a data architecture perspective requires rethinking the role of IT itself. Business objectives should inform data architecture — not the other way around. People across all lines of business, including those without formal data analysis training, need to be able to access and use data for their day-to-day work. That means providing people with the tools to access and use data is the core of IT’s role in the modern enterprise. AI platforms (like Dataiku) can ease that burden by taking on the data processing and integration jobs that would otherwise fall to IT teams.

The Role of Dataiku in Modern Data Architecture

Data Democratization With Dataiku

Dataiku was built from the ground up to be one central, controlled environment used by a range of profiles. This includes low-code analysts and no-code contributors on the business side. But it’s not just a low- and no-code solution. Dataiku also offers robust features to give IT teams maximum flexibility and control over architecture:

  • Dataiku can run on-premise or in the cloud — with supported instances on Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure — integrating with storage and various computational layers for each cloud.
  • Dataiku uses a pushdown architecture to allow organizations to take advantage of existing, elastic, and highly scalable computing systems, including SQL databases, Spark, Kubernetes, and more.
  • Dataiku provides a fully managed Kubernetes solution that is compatible with all of the major cloud container services — Amazon EKS, Google Kubernetes Engine (GKE), and Azure Kubernetes Service (AKS) — as well as with on-premises Kubernetes/Docker clusters.
  • Dataiku supports the use of both CPUs and GPUs for model training. If multiple GPUs are available, Dataiku can distribute model training workloads across the GPUs to dramatically decrease training time.
  • In the Dataiku project flow, all visual components are reusable and portable. Individual preparation steps or entire sections of a flow (datasets and recipes together) can also be shared externally to other projects, allowing users to rename and re-tag objects in the process.
  • Organizations can extend the power of Dataiku with custom plugins. The Dataiku plugin library includes over 100 plugins that enhance existing Dataiku instances, including access to new data sources, charts, programming languages, algorithms and modeling techniques, partner integrations, and more.
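To make the pushdown idea above concrete, here is a minimal, hypothetical sketch (not Dataiku’s actual API): instead of pulling raw rows into the application and aggregating them there, the aggregation is expressed as SQL so the database engine does the heavy lifting and only the small result set crosses the wire. The table and column names are invented for illustration, and an in-memory SQLite database stands in for an elastic SQL warehouse.

```python
import sqlite3

# Stand-in for a scalable SQL warehouse; table/data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 75.0)],
)

# Pushed-down computation: the GROUP BY runs inside the database engine,
# so the application only receives the aggregated result.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(rows)  # [('north', 150.0), ('south', 75.0)]
```

The same principle scales up: whether the compute layer is a cloud data warehouse, Spark, or a Kubernetes cluster, the work is delegated to the engine that already holds the data rather than duplicated in the tool on top.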

For those on the technical side — like data scientists, but also data engineers, architects, and more — Dataiku facilitates quick experimentation and operationalization for machine learning at scale.

Learn More