Building a solid data foundation: Bridging traditional BI and GenAI

  • Poor data quality undermines both BI and GenAI initiatives, making strong data foundations essential
  • GenAI enhances, not replaces, traditional BI
  • Combining traditional data engineering with AI agents enables more accurate, scalable and efficient outcomes for tasks such as data remediation

After working in Business Intelligence (BI) for over a decade, I’ve witnessed the data landscape evolve in surprising ways, especially over the last two years, as I’ve pivoted toward generative AI (GenAI). Despite GenAI’s growing emphasis on unstructured data, I’ve found that a solid data foundation remains essential. By blending the tried and tested principles of BI with cutting-edge GenAI techniques, organisations can uncover deeper insights that drive real change.

The data imperative

For years, I've worked with organisations to prep and load data into data warehouses and dashboards. In the last two years, I’ve pivoted to GenAI projects that unlock new possibilities. Yet whether I’m working on a classic BI report or with AI agents, one thing is non-negotiable: quality data.

  • Data quality and governance
    Whether handling spreadsheets or social media feeds, having data that is clean, relevant and unbiased is critical. Poor quality data can lead to misleading outputs that could cost the business dearly.
  • Structured meets unstructured
    Traditional BI focuses primarily on structured data, while GenAI is well suited to a wide range of unstructured inputs such as voice recordings and PDFs. But no matter how sophisticated the model, poor or disorganised data can lead to misleading insights. Fixing errors, filling gaps and labelling data properly enables more accurate AI outcomes.
  • BI and AI under one roof
    I’ve learned that bridging BI and GenAI demands a well-designed data ecosystem. Data must be consistent enough to drive reliable dashboards but also flexible enough to feed into large language models (LLMs).

Use case: Transforming data remediation with GenAI

I recently worked on a data remediation initiative at a retail enterprise that had grown through several acquisitions. Over the course of a decade, each merger brought in its own set of data standards, siloed systems and inconsistent naming conventions. Eventually, all of this landed in one centralised data warehouse (DW), a place that had become increasingly tangled with duplicate customer profiles, incomplete data and mismatched records.

The pain points

  1. Duplicates and inconsistencies
    The client confirmed they frequently encountered inconsistencies, such as customers with multiple IDs or the same product listed under different stock-keeping units (SKUs). Because the source systems – primarily operational in nature – didn't enforce uniform standards, the data warehouse ended up with redundant records.
  2. Manual cleansing
    The client had spent hours trying to reconcile, merge or delete these duplicates to ensure reports were somewhat accurate. This manual process was both time-consuming and error-prone.
  3. Incorrect analytics
    The client asked us to help with predictive models for targeted marketing. But every time we tried to train our algorithms, we ran into problems caused by conflicting or incomplete records. It was clear we needed a robust cleanup strategy first.

Traditional remediation approach – What we used to do

Before GenAI, we used to rely on a rule-based ETL (Extract, Transform, Load) process:

  • Business rules for deduplication: We would set up matching rules – comparing first and last names, emails, phone numbers and so on – to identify potential duplicates.
  • Batch cleansing cycles: We would schedule cleansing pipelines to run overnight, merging or purging records based on predefined logic to reduce operational impact.
  • Maintenance overhead: As new data sources came in, we would need to continually add or modify these rules. Even minor format changes would require us to update multiple scripts.
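To make the rule-based approach concrete, here is a minimal sketch of the kind of matching logic such pipelines typically encode. The field names and rules are illustrative assumptions, not the client's actual schema or business rules.

```python
def normalise(value: str) -> str:
    """Lower-case and strip whitespace so comparisons aren't format-sensitive."""
    return value.strip().lower()

def is_potential_duplicate(a: dict, b: dict) -> bool:
    """Flag two customer records as likely duplicates under fixed business rules:
    an exact email match, or a matching name plus phone number."""
    if normalise(a["email"]) == normalise(b["email"]):
        return True
    same_name = (normalise(a["first_name"]) == normalise(b["first_name"])
                 and normalise(a["last_name"]) == normalise(b["last_name"]))
    same_phone = a["phone"].replace(" ", "") == b["phone"].replace(" ", "")
    return same_name and same_phone

rec1 = {"first_name": "Jane", "last_name": "Doe",
        "email": "JANE.DOE@example.com", "phone": "0400 111 222"}
rec2 = {"first_name": "jane", "last_name": "doe",
        "email": "jane.doe@example.com", "phone": "0400111222"}
print(is_potential_duplicate(rec1, rec2))  # True
```

The maintenance burden shows up immediately: every new source format means another `normalise` variant or another hard-coded rule.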

Though this approach got the job done, we would still end up with inconsistencies, and the time and effort needed to maintain all these rules kept increasing.

Embracing GenAI for data remediation 

We replaced our traditional rule-based system with AI agents that use semantic understanding for deduplication, tap into data for enrichment and learn continuously through a feedback loop, significantly improving data remediation accuracy.
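The core idea behind semantic deduplication is to compare records by meaning rather than by exact field values. The sketch below illustrates this with a deliberately simple stand-in: character-trigram vectors and cosine similarity in place of a real embedding model, with an assumed similarity threshold. In practice an LLM or embedding service would produce the vectors.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: character-trigram counts."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantically_similar(text_a: str, text_b: str, threshold: float = 0.6) -> bool:
    """Flag two record descriptions as likely duplicates by meaning, not exact match."""
    return cosine(embed(text_a), embed(text_b)) >= threshold

# Two SKU descriptions a rule-based exact match would miss:
print(semantically_similar("Acme Widget 500ml Bottle", "ACME widget bottle, 500 ml"))  # True
```

Unlike the rule-based version, nothing here hard-codes field formats; a stronger embedding model improves the matching without rewriting the pipeline.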

Throughout this process, I noticed that no matter how advanced the GenAI solution, a clean and well-designed data ecosystem remains essential.

  • Providing anchors for AI: Clean, consistent tables, where fields like “Customer_Address” or “Product_SKU” followed strict formats, served as foundational anchors. These clear references helped the AI model cross-check unstructured text more confidently.
  • Reducing model confusion: Whenever data was too messy or lacked a standardised format, the GenAI’s inferences became less reliable. Consistent formatting, validated and consistent taxonomy, and canonical product references acted like guardrails, ensuring better AI-driven insights.
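These anchors can be enforced with simple validation before any data reaches the model. The canonical formats below are illustrative assumptions (the real client schemas differed), but they show how strict field formats act as guardrails.

```python
import re

# Illustrative canonical formats – assumptions for this sketch, not the client's real schema.
SKU_PATTERN = re.compile(r"^[A-Z]{3}-\d{4}$")   # e.g. "ABC-1234"
POSTCODE_PATTERN = re.compile(r"^\d{4}$")       # Australian 4-digit postcode

def validate_record(record: dict) -> list[str]:
    """Return a list of field-level problems; an empty list means
    the record can serve as a clean anchor for the AI model."""
    problems = []
    if not SKU_PATTERN.match(record.get("Product_SKU", "")):
        problems.append("Product_SKU not in canonical AAA-9999 form")
    if not POSTCODE_PATTERN.match(record.get("Postcode", "")):
        problems.append("Postcode not a 4-digit value")
    return problems

print(validate_record({"Product_SKU": "ABC-1234", "Postcode": "2000"}))  # []
print(validate_record({"Product_SKU": "abc1234", "Postcode": "NSW"}))
```

Records that fail validation are routed to remediation first, so the model only ever cross-checks unstructured text against trusted references.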

What I learned

  • A hybrid approach works best: GenAI didn’t replace the traditional rule-based scripts entirely; it complemented them. For straightforward duplicates, the rules still work. For ambiguous cases, the AI model shines.
  • Data governance is non-negotiable: Documenting ownership and transformation rules remains critical. AI models thrive in an environment where data lineage is clear and feedback loops are systematic.
  • Steady improvements over time: The magic of GenAI is that it can learn. Every user validation or correction improved the next round of suggestions – a continuous cycle that made data remediation more efficient over time.
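The feedback loop above can be sketched as a store of reviewer decisions that is consulted before the model is asked again. The class and identifiers here are hypothetical, intended only to show the shape of the mechanism.

```python
class FeedbackStore:
    """Minimal sketch of a human-in-the-loop cycle: decisions reviewers make
    on ambiguous record pairs are remembered and reused on future runs."""

    def __init__(self):
        # (id_a, id_b) -> bool: did the reviewer confirm these as duplicates?
        self._decisions = {}

    def _key(self, id_a: str, id_b: str) -> tuple:
        # Sort the pair so lookup order doesn't matter.
        return tuple(sorted((id_a, id_b)))

    def record(self, id_a: str, id_b: str, is_duplicate: bool) -> None:
        """Store a reviewer's verdict on a candidate pair."""
        self._decisions[self._key(id_a, id_b)] = is_duplicate

    def lookup(self, id_a: str, id_b: str):
        """Return the stored verdict, or None if this pair was never reviewed."""
        return self._decisions.get(self._key(id_a, id_b))

store = FeedbackStore()
store.record("CUST-001", "CUST-042", True)   # reviewer confirmed a merge
print(store.lookup("CUST-042", "CUST-001"))  # True – order-independent
```

In a production system the same validated pairs would also feed back into fine-tuning or few-shot prompts, so the model itself improves, not just the cache.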

In short, using GenAI for data remediation was a transformative experience. Traditional rule-based processes laid the groundwork, but the leap came when we used GenAI to interpret context and unstructured data. With a well-designed and structured data backbone, GenAI could deliver smarter, faster and more reliable remediation, making the business more data-driven in the process.

Looking ahead

The way we work with data is set to evolve in remarkable ways. As traditional BI and GenAI continue to blend, we can expect more immediate, practical insights delivered when they are needed most. Emerging trends such as real-time analytics will further dissolve the boundaries between structured and unstructured data. This will mean clearer, more accessible information for all, whether you’re overseeing a large enterprise or a small business. With a well-grounded data foundation and a willingness to embrace change, organisations can navigate this dynamic landscape with confidence.


Contact the author

Beena Rao

Senior Manager, Advisory, PwC Australia
