Diving into Synthetic Data
As defined by Strat7 in its blog, “Synthetic data in market research – an expert perspective,” synthetic data, at its core, “is artificially generated data that doesn’t originate from real human beings. It’s constructed either by predicting responses based on historical survey data or, increasingly, through advanced AI technologies like generative AI.” Strat7 further divides the category into two main outputs: synthetic conversation tools, such as AI chatbots and personas, and synthetic respondent data.
Hasdeep Sethi, Data Science Director and AI Lead at Strat7, points out that synthetic conversation tools can support concept testing, marketing and CR optimization, product positioning, and predicting future behavior. Synthetic respondent data, meanwhile, is mainly used to supplement real survey samples.
Other advantages of this type of data include:
- Reduced costs and accelerated research timelines: Synthetic data can significantly lower research costs and speed up timelines by eliminating the need to recruit and survey real people at scale.
- Strengthened data quality and fewer gaps: Synthetic data fills gaps in research samples, particularly for hard-to-reach demographics, and strengthens the quality of the information collected by augmenting existing datasets.
- Safe exploration of sensitive topics: Synthetic data allows researchers to explore sensitive topics ethically and without compromising individual privacy, as it is not derived from real individuals.
- Consumer privacy and compliance assurance: By not being linked to real individuals, synthetic data protects consumer privacy and helps businesses comply with data protection regulations.
- Agile testing and iteration: The speed and cost-effectiveness of AI synthetic data generation make it ideal for rapid testing, iterative research and experimentation.
- Improved predictive modeling: Synthetic data improves predictive modeling by providing large, diverse datasets for training and enhancing the accuracy of algorithms.
Of course, there are also some key drawbacks to using synthetic data, such as data security and compliance issues. Strat7 points out that one significant issue is the potential for bias, as synthetic data models trained on biased datasets can perpetuate and amplify existing stereotypes.
“Another challenge is the ‘black box’ nature of some synthetic data tools, where the data generation process is opaque. This makes it difficult to verify the sources for the data and thus to trust the accuracy of the synthetic outputs. This lack of transparency can lead to ‘hallucinations,’ where the model fabricates information,” observes Strat7.
Synthetic Data in Action
During the AI in Action Summit at the upcoming TMRE 2025 conference, a presentation will be held on “Putting Synthetic Data to Work.” Speakers are to be determined.
You know what synthetic data is—now it’s time to see it in action. This session dives into real-world examples of how leading insights teams are using synthetic data to simulate consumer behavior, test concepts, and fill data gaps. Walk away with practical strategies, lessons learned, and inspiration for how you can start applying synthetic data across your own organization.
Harnessing Synthetic Market Research
During TMRE @ Home 2025, Joshua Hawthorne, Research Operations Leader, LinkedIn, explored this topic in the session, “Beyond Real: Harnessing Synthetic Market Research Data to Accelerate Business Insights.”
Hawthorne expertly navigates the landscape of synthetic market research data, reviewing its creation, emphasizing rigorous validation, and advocating for its strategic role in augmenting traditional research to unlock faster and broader business insights. “Synthetic market research is not about replacing humans as our data sources. It is about extending the reach of our evidence to those decisions that have been previously eluding our data,” says Hawthorne.
Data Enhancement, Not Replacement
Given these benefits and disadvantages, will synthetic data ever truly replace real survey participants? “We’re not there yet,” says Sethi of Strat7. “It’s much faster to generate synthetic data than to gather human responses. But the trade-off is a decrease in the quality and precision of the data.”
However, synthetic data may prove beneficial in its ability to extend the reach of insights beyond the limitations of traditional research methodologies. This may include filling data voids, speeding up concept testing and evaluation, and deepening persona understanding.
Hawthorne of LinkedIn provides a few key takeaways from his presentation:
- Demand Transparency in Creation: When exploring synthetic data solutions, delve into the methodologies employed—from basic prompting to sophisticated fine-tuning and post-processing—to ensure alignment with your research objectives and desired accuracy.
- Scrutinize Validation Rigor: Prioritize vendors who can clearly articulate and demonstrate robust validation processes (A/B testing, statistical comparisons, backtesting), providing the necessary confidence in the reliability and applicability of the synthetic data.
- Strategically Extend Your Reach: Recognize synthetic data as a powerful tool to complement traditional research, enabling insights into areas previously inaccessible due to time, budget, or logistical constraints, rather than a complete replacement for human-derived understanding.
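To make the "statistical comparisons" in the takeaways above more concrete, here is a minimal sketch of one possible check: comparing how answers to a single rating question are distributed in a real sample versus a synthetic one. The metric (total variation distance), the function names, and the sample data are all illustrative assumptions, not a method described by Hawthorne or Strat7.

```python
from collections import Counter


def response_shares(responses, scale):
    """Return the share of responses at each point of the answer scale."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    return {point: counts.get(point, 0) / total for point in scale}


def total_variation_distance(real, synthetic, scale):
    """Half the sum of absolute differences between the two share distributions.

    0.0 means the distributions are identical; 1.0 means they do not overlap.
    """
    real_shares = response_shares(real, scale)
    synthetic_shares = response_shares(synthetic, scale)
    return 0.5 * sum(abs(real_shares[p] - synthetic_shares[p]) for p in scale)


# Hypothetical 5-point purchase-intent ratings (100 respondents each).
SCALE = [1, 2, 3, 4, 5]
real_sample = [1] * 5 + [2] * 10 + [3] * 20 + [4] * 40 + [5] * 25
synthetic_sample = [1] * 2 + [2] * 8 + [3] * 20 + [4] * 45 + [5] * 25

distance = total_variation_distance(real_sample, synthetic_sample, SCALE)
print(f"Total variation distance: {distance:.3f}")
```

A small distance suggests the synthetic answers track the real ones on this question, but what counts as "small enough" is a judgment call that depends on the decision at stake. A fuller validation would likely repeat the check across many questions and subgroups and add a formal significance test.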
Contributor
Matthew Kramer is the Digital Editor for All Things Insights & All Things Innovation. He has over 20 years of experience working in publishing and media companies, on a variety of business-to-business publications, websites and trade shows.