Over the last decade, I’ve led the data strategy for many companies in different stages of leveraging their data. Some (few) of them were taking their very first steps, while others were large, complex organizations that have worked with data for over a decade – but still struggled to derive the value they could expect it to deliver.
I’ve spoken about data strategy elsewhere, but today I’d like to address the elephant in the room: why data science teams often can’t seem to deliver data success. In other words, what makes or breaks the skill of designing data science solutions that create business and competitive value for the organization?
Investment in data science has considerably grown over the last few years, creating somewhat of a “gold rush”. The buzz has caused many companies to allocate significant funds and hire teams of data scientists, hoping they would become a growth engine. In most cases, these companies couldn’t say how that would happen but understood that there is a lot of underutilized data lying around, that there is immense value in it, and that maybe that data can translate into new business channels.
These companies weren’t wrong in this regard – it’s very likely that utilizing data science correctly would bring immense value to organizations that already have large, diversified pools of data. However, judging by the outcome, very few organizations have succeeded in turning data into gold using data science.
The segments of data science that are generating results are those with very clear, defined goals, in which the product is entirely AI-based. Examples include autonomous vehicles, autonomous aircraft, the defense industry (and its various products), medical and healthcare products and machines, and various communication bots.
What makes these examples special is that these products/companies are entirely data science-based, meaning the problems they solve are mostly technological – which in turn makes them solvable by elite data scientists, many of whom are professors. In these cases, the data scientist receives a technical issue and has the tools to solve it, thanks to being a brilliant algorithmist.
On the other hand, businesses whose products aren’t AI-based, yet try to establish data science teams to generate business value by building smart models, often fail to achieve meaningful ROI.
One of the reasons for this is that these organizations (which are often sales-led) contain tactical and strategic business processes that aren’t familiar to the “technical” data scientist. Data scientists find themselves needing to create something from scratch – that isn’t necessarily clearly defined. The organization knows how to articulate the business challenge but things get lost in translation on the way to an AI solution.
Bridging the gap between business needs and AI solutions is a real challenge.
The data scientists’ approach is very often isolated from the organizational needs and circumstances. Yes, they know their technology very well and are excellent with the statistical/mathematical elements, but without a deep understanding of the business and how it would make use of the model, there will always be a built-in gap between the model’s technological capabilities and the value it can bring.
Organizations that wanted to hire AI teams would usually choose the most qualified data scientist, one with far-reaching technological expertise and skills – but the question is, are they the best fit? For example, a data scientist whose background is in the defense industry and who has built systems meant to help navigate rockets would probably be overkill for an organization interested in predicting specific events in the customers’ lifecycle or optimizing some operational or marketing process.
A data scientist with the background described above will try to tackle these challenges in a highly technical way, meaning the models and solutions they develop may, on the one hand, require significant resources, while on the other hand not necessarily bring enough value.
Before we get to how I believe organizations should handle this challenge, let me give you a few examples from our customers, whom we guide in deriving gold from their data.
A Precise Model May Miss the Target
In a logistics-rich organization, Operations are interested in predicting the daily sales in order to make sure they’re logistically prepared and are able to allocate positions, provide customer service, and so forth, without overspending on staff or creating surplus inventory.
Let’s say this organization has pretty stable sales, meaning that the sales don’t vary too much from day to day.
In this case, if we build a simple predictive model, we can get to very high precision rates – around 90%.
The data scientist would be able to use typical, simple historical data and add features (parameters) that rely mostly on past sales statistics (which, as we said, are pretty stable and consistent). These features could be yesterday’s sales, last week’s sales, the daily average since the beginning of the month, etc.
The features are especially important in such a model, since the model will identify connections between what happened yesterday or last month and what is going to happen tomorrow (of course, this connection would hold most of the time, but not all of the time).
So it all seems great, right? The model is 90% precise and it’s working well.
Well, on deeper inspection, we’ll realize that if sales are stable and consistent – why do we even need a predictive model? The role of a model in this case is actually to identify and predict the “noise”: moments when something happens – externally or internally – that dramatically impacts performance in either direction. That’s where the actual added value is – in predicting these deviations in time.
But of course, if we designed a model whose features mostly rely on past performance, then even if we add parameters meant to identify “noise”, they’re not likely to gain significance.
If we understood the business challenge in advance, we would have designed the entire model differently, centering it around identifying changes in sales and not around identifying sales data based on past performance (which doesn’t necessitate a data science approach).
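To make the contrast concrete, here’s a minimal sketch in plain Python of the two framings – lag-style features that predict the sales level, versus a target that flags deviations from the baseline, which is what the business actually needs predicted. The numbers, feature names, and threshold are all invented for illustration:

```python
# Hypothetical daily sales series (stable, with one anomalous day).
sales = [100, 102, 98, 101, 99, 140, 100]

def lag_features(series, day):
    """Features a 'predict the level' model might use: yesterday's sales
    and the running average so far. Names and windows are illustrative."""
    return {
        "yesterday": series[day - 1],
        "avg_to_date": sum(series[:day]) / day,
    }

def deviation_target(series, day, threshold=0.2):
    """Alternative framing: label whether a day deviates materially from
    the baseline so far - where the business value actually lies."""
    baseline = sum(series[:day]) / day
    return abs(series[day] - baseline) / baseline > threshold

# Day 5 (sales=140) is the day worth predicting: the lag features barely
# move, but the deviation label flags it.
print(lag_features(sales, 5))      # {'yesterday': 99, 'avg_to_date': 100.0}
print(deviation_target(sales, 5))  # True
```

In the first framing, the model is rewarded for parroting yesterday; in the second, it only earns its keep on the days that matter to Operations.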
When the Target Function Is Wrong
In another case, a gaming company was interested in identifying players abandoning the game mid-session and offering them something of value that would keep them in the game. This means we need to predict – almost in real time – the chances that a specific player will abandon the game in the next X minutes.
So far – sounds like a pretty common model that can probably give us pretty high precision rates.
Since keeping the player in the game carries a cost, I can’t stop at the question of whether or not the player will quit and only then offer them the value proposition.
A deeper understanding of the business process would have revealed that a better number to figure out is the player’s lifetime value (LTV) – so I could decide whether or not to invest in them, as well as which value proposition is most likely to keep them in the session.
So we’re already talking about three models – not one. Of course, we can also use data analysis to check whether these models are feasible, and how much money they can save or generate, assuming they’ll perform well.
A different approach altogether could be letting go of the churn/abandonment question completely and focusing instead on which game is the best fit for each player to keep them playing. Or: what’s the future value of a player who has already played a few sessions, and what is the likelihood of them converting into a paying player – and deciding accordingly how much to invest in them?
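The economics of combining these models can be sketched as a simple expected-value rule. Everything below – the uplift assumption, the numbers, the function name – is illustrative, not the company’s actual logic:

```python
def should_intervene(p_churn, ltv, uplift, cost):
    """Decide whether an in-session offer is worth its cost.
    p_churn: predicted probability the player abandons (model 1)
    ltv:     predicted lifetime value of the player (model 2)
    uplift:  assumed reduction in churn probability if we make the offer
             (roughly what a best-offer model would estimate)
    All parameters and numbers are hypothetical."""
    expected_value_saved = p_churn * uplift * ltv
    return expected_value_saved > cost

# A likely-to-churn, high-value player justifies a $2 offer...
print(should_intervene(p_churn=0.8, ltv=50.0, uplift=0.3, cost=2.0))  # True
# ...while a low-value player does not, even at the same churn risk.
print(should_intervene(p_churn=0.8, ltv=5.0, uplift=0.3, cost=2.0))   # False
```

The point of the sketch is that the churn probability alone never answers the business question; it only becomes actionable once it’s multiplied against value and cost.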
When You Identify a Need But Don’t Have The Right Offer
Let’s look at another case, this time from the finance world. A finance company selling credit wants to predict its clients’ need for credit, to decide which clients to target. In this case, the company has all the required data in order to reach those predictions, but its value proposition isn’t in sync with the customer’s needs.
A data scientist approaching this issue by building a model that predicts need might overlook the fact that the company’s offering isn’t a good fit for many of its clients (due to factors like interest rate, payment plans, etc.). Merely predicting needs without customizing the offering so it’s dynamic and data-based might lead to wasting immense resources on trying to sell credit to clients who do need credit – but not this specific credit or these specific terms.
Can The Organization Support Its Models?
Let’s say an organization needs to predict events in the lifecycle of a customer, or customer value. These models can be churn prediction, new lead prediction, and customer segmentation (vegan, cares about sustainability, parent, etc).
A model like a customer LTV prediction (predicting the lifetime value of a customer) is very complicated and can be very valuable. A data scientist can spend months developing such a model. But without understanding the business use case for this model, or how it can be incorporated into business processes (and which processes those would be), the target can be severely missed.
One example is developing a customer LTV prediction model when the organization lacks a marketing automation process that can act on customers who have high potential, carry a high abandonment risk, or fall into a specific segment.
Predicting Risks vs Opportunities
In Finance, there is always the Risk department, and separately – the Marketing department. Data scientists are spread across these two departments, meaning some data scientists are predicting risk, while others are predicting opportunities for selling financial products.
In many of the organizations where I witnessed such processes, there was little intersection between those groups. The risk models erred on the conservative side and flagged many of the company’s clients as risky. This left the marketing operation with very few customers who were both not deemed risky and identified as potential opportunities.
If both data science teams had worked together, they could have tweaked the risk prediction model according to the results of the sales prediction models – for example, isolating small populations that are borderline risky but present many opportunities, using them to learn about the quality of the risk prediction model, and gradually opening it to more and more groups that can pose real potential for the organization.
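As a rough illustration of what that joint analysis might look like – scores, thresholds, and field names here are all invented:

```python
# Hypothetical per-customer scores from the two teams' separate models.
customers = [
    {"id": 1, "risk": 0.90, "opportunity": 0.80},
    {"id": 2, "risk": 0.55, "opportunity": 0.90},
    {"id": 3, "risk": 0.20, "opportunity": 0.30},
    {"id": 4, "risk": 0.60, "opportunity": 0.85},
]

def borderline_opportunities(customers, risk_band=(0.5, 0.7), min_opp=0.8):
    """Isolate the small population that is borderline on risk but scores
    high on opportunity - candidates for testing whether the risk model
    is too conservative. Thresholds are illustrative, not prescriptive."""
    lo, hi = risk_band
    return [c["id"] for c in customers
            if lo <= c["risk"] <= hi and c["opportunity"] >= min_opp]

print(borderline_opportunities(customers))  # [2, 4]
```

Tracking how this small segment actually performs would give the risk team evidence to recalibrate, instead of the two models silently cancelling each other out.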
This is just one of many, varied examples of gaps between the actual work of data scientists and organizations’ business needs.
Bridging The Gap Between Data Science Teams and Business Needs – Takeaways:
- The right data strategy helps organizations pre-identify the business areas in which data science models can deliver tangible value. Diving into a business area before examining all the other potentially relevant areas in the organization can lead to a real mess.
- There isn’t always a direct correlation between the expertise of the data scientist and the value they can deliver. It’s often the data analysts who spend a lot of time analyzing the business and gaining business understanding who can grow into more valuable data scientists.
- Every data science team needs to include a “data-business translator” – someone who has some understanding of the technological and methodological possibilities but also has an excellent understanding of the business, and is able to clearly define the target function and the central business features, as well as guide the model development process and interpret the results into business terms.
- Developing data science models is a converging process, and it may take a few rounds until the model performs as needed. Managing the convergence is critical; without proper convergence management, the model is likely to drift and diverge. That’s why the data-business translator has to work with the data scientist on each iteration of the model – examining the various metrics, the importance of the different features, the precision achieved in important business scenarios where accuracy is key, the soundness of the target function, the relevance of the study population, and so forth.
- Sometimes, a model we thought of as a data science model can turn into a model built on a set of business rules, which will better serve the organization. It’s important to be open-minded and allow the data science unit manager or the data-business translator to objectively examine whether there’s a real need to develop a data science model at all – whereas a data scientist would usually default to diving into complicated model development.
- The data unit manager needs to be highly involved in the organization’s strategy, have a seat at the table in management meetings, understand the organization’s strategic and tactical approach, and be able to identify opportunities for leveraging data in real-time. A data manager who does not live and breathe the business side won’t be able to deliver the immense value that data can provide.
I hope these tips have been helpful. Follow along for more, and check out the other articles we’ve published.