Overall, things seem to be looking up for the data science community. For example, respondents feel strongly that analytics and data science are becoming more integrated into both corporate and operational decision making. While recessionary winds keep blowing, and many feel that doing more with less is the status quo, the report indicates that the analytics and data science discipline remains positive about its growth prospects. Over 60% of the community feels that its influence has grown post-pandemic, and nearly half of the community is focused on growth.
Questioning Data Quality
Still, survey participants pointed to several challenges, and data quality stands out among them. When asked, “Which are your organization’s top challenges in using data to inform decision-making?,” the majority of respondents (55.9%) selected data quality, making it the top challenge. Other challenges were selected as well: 39.4% cited budgets/resources for data science, and 33.1% cited talent/hiring the right skill set.
So what is driving data quality to the top of the list? The rapid development of technology in the field is one possible answer. Whether or not you think the analytics and data science function is partly responsible for data quality, the arrival of generative AI and LLMs keeps expanding the pool of potentially usable data sets, so data quality must remain an area of focus. This could also partly explain why securing talent is such a challenge, since new technology brings its own issues in training, upskilling and recruitment.
For her part, June Dershewitz, Board Member, Digital Analytics Association, challenges the community to change its mindset regarding data quality. Dershewitz asserts, “I think for organizations to say their top challenge is data quality is a scapegoat. It’s always there. I’m actively involved in conversations and initiatives aimed at improving data quality. We can do better. It’s hard to solve because it has to be solved collaboratively across the business with different parts of the organization that have a hand in creating the problem.”
Dershewitz further notes, “Few respondents said that there was not enough influence on the business and yet they only said that they were somewhat integrated into business decision making. That’s why I think data quality is kind of the excuse that’s masking the true problem, because if you had that influence on the business, you might be able to more strongly advocate for putting resources where they need to be to improve data quality.”
Perhaps an additional cause of data quality issues is the current pace of technological change. For Neil Hoyne, Fellow, Wharton, the strategy playbook is rapidly changing and companies must be flexible and agile. “The persistent focus on data quality risks slowing things down. This prerequisite is an almost never-ending obstacle to any decision-making. Instead of striving for perfection on their data, organizations should focus on delivering incremental progress on the current programs. In other words, can leaders have the discipline to prioritize data-driven growth over perfection?” says Hoyne.
That’s not to say that data quality issues are irrelevant or nonexistent in the community. Data quality also bears directly on the eventual generation of insights.
Hoyne adds, “If you don’t have good data, you won’t get anywhere. You won’t be able to influence or understand anything because you won’t even get to the analysis stage. Data quality will continue to be a top concern until companies can address it. But again, I believe some companies are so obsessed with data quality and afraid of imperfect analysis that they’re slowing down their progress. We need to ask ourselves if we’re telling the right story. Data is the language that tells us what’s happening in the marketplace. It’s time for data and analytics to step up and fulfill their role.”
The Volume of Data
The sheer volume of data being generated these days seems to be a looming concern among data science professionals.
“Data quality and the ability to translate data into insights into the business is a top concern,” agrees Anu Sundaram, Vice President, Business Analytics, Rue Gilt Groupe. “Top challenges include using data to inform decision making and data quality. I always hear data quality being the issue. We just can’t get it. There’s so much more data now, and the quality needs to catch up with the volume. The inability to translate data to the business is also very close to lack of data in the survey.”
Sundaram raises an interesting point: the sheer volume of data is creating a logjam that makes the data harder to understand.
“Respondents are doing what they can do with what they’ve got and while they are not doing poorly, have concerns regarding data quality issues,” adds Sundaram. “They need better processes, better resources, better ways to synthesize those data elements. Only then will you get value out of it.”
Chuck Martin, Editorial Director, Informa Tech, also sees the volume of data playing a role here, particularly in how it affects quality. Human employees, it seems, have not been able to keep pace with the development of new technology.
Martin says, “Many organizations in the survey selected data quality as a challenge. At the Gartner data analytics conference, everything was about we have too much data. We can’t figure out which data we should get rid of. We can’t figure out how to get the data to work together with the other data. It’s just a mismatch they’ve created over the last few years.”
He continues, “The capability has risen to capture a lot of data and now they’re trying to figure out what we do with all this data because it’s not necessarily all useful. It’s a fire hose coming in. That’s a problem because they spent all these years figuring out how to capture all the data. Now they have to figure out which data they actually do not need.”
Still, the amount of data is in some ways a “good” problem to have for the data science community, which has shown itself to be resourceful and resilient. Getting a handle on all this data, to control what’s coming in, could in turn better support operations in the future.
“The data gets you there or doesn’t and there’s not much you can do with it if you can’t control what’s coming in,” says Matthew Mayo, Editor-in-Chief, KDnuggets. “It’s an ongoing struggle every day, no matter what the size of what you’re working with, even with ChatGPT and these huge models. Data quality could always be better. The data you need is probably out there somewhere. And if you can’t get it that could be a question of resources.”
The survey showed that data quality, lack of data, and the inability to translate data to the business were common causes of concern. Mayo notes, “That’s interesting, that inability to translate. If it’s your organization’s data then not being able to make it work for you, that seems like a different kind of problem. This might be a lack of data fluency and the data scientists having an issue integrating their data into the business. Stakeholders want results. It’s not a matter of the data itself. It’s a matter of the process and the people working on the processes with the data.”
Moving Forward
Clearly, there are some concerns about data quality among the community, and there is room for improvement in that regard. However, that could also prove to be an opportunity for the skilled data specialist or data science team.
As Michael Bagalman, Vice President, Business Intelligence and Data Science for STARZ, puts it, “The exact nature of these data quality challenges is unclear, but the implication is that there is a significant opportunity for improvement. Despite budget constraints, we can anticipate an escalated emphasis on data engineering and data hygiene in the future. Given that these elements form the bedrock of data management, it aligns perfectly with the prevalent theme of efficiency, the need to address and rectify ongoing issues.”
Bagalman adds, “Data quality inevitably assumes paramount importance, as it forms the foundation on which all subsequent data-related tasks are built. After investing considerable resources, it is imperative to ensure not only that your data is of high quality but also that it is structured correctly to support your specific requirements.”
Just who is responsible for data quality at an organization is another question the analytics community might raise.
For Michelle Ballen-Griffin, Head of Data Analytics, Future, “Data quality is everybody’s problem. The whole organization needs to be involved in data quality. It’s not just the data team’s role. You have to put controls in place and have data governance. Educate people on their decisions and how it influences quality.”
With technology advancing so rapidly, data quality is an issue that probably won’t be going away anytime soon. That might challenge the industry to improve its training and career development as well.
KDnuggets’ Mayo notes, “Some survey respondents did say they don’t know what to do with the data. That stands out, whether it’s good or bad. It makes sense but it seems like something that you’d want to take care of if you want to better influence and integrate with the business. There’s a lack of machine learning and predictive modeling preferences in the survey that is in line with what I thought. But with technology advancing, it’s a prime opportunity to learn about and incorporate new tools into the business.”
The All Things Insights analytics and data science community that participated in the H1 2023 Analytics & Data Science Spend & Trends Report has a solid core of top-level leaders along with a fair share of front-lines operators. The community is spread across several different industries with a fair share of large organizations represented along with some smaller companies.
Contributor
Matthew Kramer is the Digital Editor for All Things Insights & All Things Innovation. He has over 20 years of experience working in publishing and media companies, on a variety of business-to-business publications, websites and trade shows.