How to Keep Control When Scaling Data Science

Despite huge investments, many organizations fail to organize data science and analytics in a scalable way. The sky-high expectations are not realized because companies struggle to integrate their data activities into their day-to-day decision-making processes. In this article, I will discuss the three biggest stumbling blocks and explain which organizational changes are necessary to structurally extract value from data.

Everyone knows the examples of failed software implementations at government agencies and companies: millions of euros down the drain and years of work wasted on a final result that ultimately produces nothing. If we’re not careful, many data science and analytics projects risk meeting the same fate. Large organizations invest heavily in data science and artificial intelligence (AI), but typically do not get much further than one or two successful projects. According to a recent study carried out by McKinsey, only a handful of organizations actually succeed in making these data activities scalable and integrating them into their strategy and critical decision-making processes. Yet scalability and integration into daily practice are crucial if you are to create structural value with data and analytics.

In a previous article, I explained which path companies should take to develop a Proof of Concept (PoC) into a Minimum Viable Product (MVP) as efficiently as possible. The next step is to achieve scalable, integrated success throughout the organization with data-driven initiatives, so that data science and AI applications become a strategic asset. The goal is to run each subsequent project more quickly and efficiently, and to embed it in the organization more easily, simply by learning from previous successful projects. This allows organizations to apply data science faster and with greater impact, at relatively less effort. In what follows, I will explain the most common stumbling blocks, as well as the organizational changes needed to overcome them.

Stumbling block 1: willingness amidst an unorganized mess
According to McKinsey, only a very small portion of organizations has a truly sound approach in place. For the most part, organizations are still an unorganized mess when it comes to data science and AI. Several distinct teams enthusiastically organize all sorts of initiatives, develop initial trials, or venture to develop a PoC. There is, however, no standard approach, there are no fixed roles, and there is no clearly defined structure of responsibility for the various sub-areas of a project. Due to this fragmented approach, projects fail through lack of competence, objectives are overshot, and a lot of duplicate work is carried out. You will often see that AI applications become an end in themselves, rather than a means to an end.

Solution: repeatability
The solution to the situation sketched above is creating repeatability. This can be done by safeguarding knowledge and by developing, documenting, and sharing standard methods for implementing data science applications. One way to do this is to set up a center of excellence, which supports the business and gradually expands its reach within the organization until the business has reached a level of maturity that allows for some degree of independence: you accompany a particular business unit on its journey until it is capable of operating on its own. This approach ensures you do not have to keep reinventing the wheel and means there is always a central team to which everyone in the organization can turn. Within projects, it is best to work with cross-functional teams consisting of employees involved in the core business, IT specialists, data scientists, and business translators. The latter role, that of business translator, is crucial for safeguarding and integrating your data science applications within the organization and, ultimately, for ensuring their success.

Stumbling block 2: plenty of ideas, no bigger picture
Companies whose analytics efforts can best be described as an unorganized mess often have no one keeping an eye on the bigger picture, even though this can be as simple as mapping out all core business processes for which data science may have value. As a result, various teams or units work on their own data projects, which not only increases the likelihood of duplication but also causes many opportunities to be overlooked: perhaps your customer churn model can also be used to predict employee turnover or absenteeism. For multinationals in particular, which generally work with multiple divisions and business units, keeping an eye on the bigger picture is crucial to recoup investments and get real, continuous value from data science projects. After all, the bigger picture also helps you focus on the business processes that generate the most profit or contribute directly to the organization’s strategy. The application of data science is mainly about boosting effectiveness or efficiency, not necessarily about developing the most original or innovative application.

Solution: reusability
Again, the center of excellence, preferably with a Chief Data Officer at its helm to develop a strategy and keep control, has a crucial role to play here. The center of excellence is essential in orchestrating the overall process and linking existing solutions to new challenges. To make sure that no opportunities are overlooked and that the organization can approach them as efficiently as possible, you must be able to generalize problems and solutions, so that they can be reused in comparable situations. Reusability can apply to analytical models and technological architecture, as well as to the uniformity of data sources and the transformation processes used to unlock that data. In some cases, you may have to look beyond your own organization and muster the courage to adopt existing market solutions. Why build something yourself if major players such as Microsoft, IBM, or Google already have the perfect solution? In that case, it is crucial that your IT architecture allows these external solutions to be used and integrated.
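To make the idea concrete: the churn-to-turnover reuse mentioned under stumbling block 2 can literally be the same code. Below is a minimal sketch, assuming scikit-learn and entirely hypothetical column names and data files, of one generic training routine serving two business problems.

```python
# A minimal sketch of reusability: one generic "will this entity leave?"
# training routine, reused for customer churn and employee turnover.
# All file names and column names below are hypothetical illustrations.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_retention_model(df: pd.DataFrame, numeric: list[str],
                          categorical: list[str], target: str) -> Pipeline:
    """Train a generic binary classifier for 'will this entity leave?'."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    model = Pipeline([("prep", preprocess),
                      ("clf", LogisticRegression(max_iter=1000))])
    model.fit(df[numeric + categorical], df[target])
    return model


# The same routine serves two business problems without modification:
churn_model = build_retention_model(
    pd.read_csv("customers.csv"),
    numeric=["tenure_months", "monthly_spend"],
    categorical=["segment"],
    target="churned",
)
turnover_model = build_retention_model(
    pd.read_csv("employees.csv"),
    numeric=["tenure_months", "salary"],
    categorical=["department"],
    target="left_company",
)
```

The design choice is the point: by parameterizing the data sources and columns rather than hard-coding them, a center of excellence can hand the same building block to any business unit.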

Stumbling block 3: day-to-day reality creeps in after the project
Ultimately, the core aim is to let everyone in the organization benefit from and actually use what data science has to offer. In that process, McKinsey labels ‘the last mile’ – embedding an application into an organization’s daily practice – the most difficult step. This requires consideration at an early stage, but at that point many data scientists are mainly busy proving the power of the data science or AI application, rather than thinking about the future. This process starts as early as the intermediate phase between PoC and MVP (again, see this article), but it only grows in importance further down the road. Once a solution has been implemented, there are, after all, no guarantees that employees will actually use it in their day-to-day activities, nor that the solution will deliver, and continue to deliver, the desired results.

Solution: maintainability
In this case, maintainability is the starting point, which requires you to look beyond the technical aspects of the application to how people will use it in practice. Which stakeholders should you involve to make sure your solution lands well in your organization? Clear explanation, guidance, and frequent evaluations are paramount. After all, the success of a data science or AI application stands or falls with its adoption. Application maintenance is another key aspect that is often overlooked. Who, for instance, will be in charge of making sure that the model continues to work and analyzes the right data? Essentially, this is no different from maintaining software: it is simply necessary if you want your results to remain reliable and your users not to get annoyed. Maintainability, however, does not mean that organizations should seek to do everything themselves; in practice, that is usually impossible for various reasons. What it does mean is that you should stay in control, while working with other teams and external partners on matters that require a more specialist approach.
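What that maintenance question can look like in practice: a recurring check that the data flowing into the model still resembles the data it was trained on. The sketch below is illustrative and assumes pandas and scipy; the file paths and feature names are hypothetical.

```python
# A minimal sketch of one concrete maintenance task: detecting data drift.
# File paths and feature names are hypothetical illustrations.
import pandas as pd
from scipy.stats import ks_2samp


def check_feature_drift(train: pd.DataFrame, live: pd.DataFrame,
                        features: list[str], alpha: float = 0.01) -> list[str]:
    """Flag numeric features whose live distribution differs significantly
    from the training distribution (two-sample Kolmogorov-Smirnov test)."""
    drifted = []
    for col in features:
        _, p_value = ks_2samp(train[col].dropna(), live[col].dropna())
        if p_value < alpha:
            drifted.append(col)
    return drifted


training_df = pd.read_csv("training_snapshot.csv")      # frozen at training time
last_week_df = pd.read_csv("production_last_week.csv")  # fresh production data

drifted = check_feature_drift(training_df, last_week_df,
                              features=["tenure_months", "monthly_spend"])
if drifted:
    print(f"Data drift detected in: {drifted} - consider retraining")
```

Run on a schedule, a check like this turns the vague question “does the model still work?” into a concrete signal someone can own.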

Conclusion: Don’t lose sight of the bigger picture
There is a clear common thread running through my plea for repeatability, reusability, and maintainability: staying in control is crucial to making your data science and AI applications scalable. First of all, this means that a data-driven approach and the application of data science and AI must be embedded in your organization’s strategy. Secondly, because it is so easy to lose sight of the bigger picture in an organization with tens of thousands of employees, there should always be a team that knows exactly what is going on in all ongoing data initiatives. This team is also responsible for ensuring that newly developed data applications are actually used within the organization and produce the intended results. All data science and AI expertise, skills, and models should also be safeguarded in a central place, such as a center of excellence. Setting up a center of excellence and training managers and other employees will require short-term investments, of course, but there is nothing wrong with that: they will ultimately pay off by helping you stay in control of your data-driven organization.


Done with Failing Data Projects? Get Beyond that Pilot Phase!

Almost all top executives of the world’s biggest companies (99 percent) have data-driven ambitions, but only one third(!) say they succeed in realizing them. So why do so many data projects fail? Pilot projects spring up like mushrooms, but are rarely taken into full production. In my view, poorly organized data projects are the reason for not getting beyond the pilot phase. In this article, I will explain what you should focus on to make your data science projects succeed.

[Graph omitted: data projects started versus data projects successfully taken into production. Source: Forbes]

The massive discrepancy between the number of data projects companies start and the number that eventually succeed is perfectly illustrated in the graph above. In fact, the graph is even a bit too optimistic: according to figures from Gartner, up to 85 percent of data projects fail. And, even more disturbing, a recent study by PwC indicated that only four percent of the thousands of companies surveyed had successfully implemented an AI solution.

In my experience, when things go wrong, the cause is almost always a poorly organized data science project, and the failure usually occurs during the crucial phase between the pilot project and the implementation of a sustainable solution. The path from good idea to brilliant software solution (see illustration below) has several interim steps that simply can’t be skipped. Since every step requires different skills to complete successfully, you’ll have to distinguish between the various interim steps in the project: first the Proof of Concept (PoC), then the step to an initial implementation: the Minimum Viable Product, or MVP for short.

[Illustration omitted: the path from Proof of Concept to Minimum Viable Product]

This is what you want to achieve with a Proof of Concept
The Proof of Concept helps you answer the question: can we put this idea into practice? Is it realistic? Do we have all the data at hand to have, for example, a chatbot answer complex questions? To prove this, a small team – consisting of at least a data analyst and a data scientist – works for a limited period of time to verify the feasibility of the idea.

To be able to do that, the criteria for the PoC’s success need to be defined in advance. During this phase, you also shouldn’t have too many cooks spoiling the broth: involving too many stakeholders could push the PoC in the direction of an MVP, usually with disastrous consequences, since a PoC can also fail. The quality of the data could, for example, leave much to be desired, there might simply be too little data available, the hypothesized model might not fit the purpose, or the results may simply be too mediocre to justify continuing the project. If so, it’s better to find out during the PoC phase, and not at a later stage, so you know what to improve before you start implementing. After all, investments to improve the model are wasted money if the problem stems from inferior data quality. And vice versa: investments in data quality are pointless if the model doesn’t fit.
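Here is a hedged sketch of the kind of check this implies, run before any modeling effort is spent: does the data even support the PoC? The thresholds, file name, and target column below are illustrative assumptions, not fixed rules.

```python
# A minimal sketch of an upfront PoC feasibility check on the data,
# before any modeling effort is spent. All specifics are hypothetical.
import pandas as pd


def poc_data_report(df: pd.DataFrame, target: str,
                    min_rows: int = 10_000, max_missing: float = 0.20) -> None:
    """Print a quick data-quality report for a PoC dataset."""
    print(f"Rows: {len(df):,} (assumed minimum for a PoC: {min_rows:,})")
    print(f"Exact duplicate rows: {df.duplicated().sum()}")
    missing = df.isna().mean().sort_values(ascending=False)
    print(f"Columns missing more than {max_missing:.0%} of values:")
    print(missing[missing > max_missing])
    print("Target distribution:")
    print(df[target].value_counts(normalize=True))


poc_data_report(pd.read_csv("chatbot_questions.csv"), target="intent")
```

Half a day spent on a report like this can tell you whether a disappointing PoC result points at the data or at the model, which is exactly the distinction the paragraph above is about.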

How to move on to an MVP
The step towards the next phase is crucial: now that you’ve tested the concept, you want to see it in production. This changes the way you approach the project. In a PoC, you can afford to ignore reality to a certain extent: since you only want to know whether the idea is feasible, you don’t yet have to worry about secondary matters. But in pursuing a successful MVP, the situation is entirely reversed: now you have to put the concept into practice, which means considering the business processes, the GDPR, security, and the connection with existing systems and infrastructure. For example, how will the chatbot integrate with the website?

Just as important, the MVP phase requires a completely different set of skills than the PoC phase. As I wrote in a previous article with tips to increase the chance of a data science project’s success: you need more than just an understanding of data science. Business sense is at least as important, and only by adding a healthy dose of IT knowledge will you be able to move mountains. That’s precisely where many data science startups come up short, and it’s exactly why so many pilot projects never get beyond the PoC phase. And even if you try to save the project in this phase, for example by contracting one of the big consultancy firms, it’s going to be challenging: it won’t be long before you notice that they lack the data science skills needed.

These are the roles you’ll need

The phase that follows a successful PoC will introduce new roles for business translators, software engineers, data engineers, data architects, and data scientists (see illustration below). But other departments will also need to be drawn into the project, such as legal, compliance, and IT. The trick is to make sure that happens at exactly the right moment. That moment differs per organization and project, but choosing the wrong one always has consequences: bring these departments in too early and you slow the project down; bring them in too late and they may feel left out and try to block it. The length of the phase between PoC and MVP largely depends on how familiar the organization is with these kinds of projects.

[Illustration omitted: the roles involved in the phase after the PoC]

After a successful PoC, it is also essential to assign the role of business translator – a role that is becoming increasingly common. This jack-of-all-trades needs to understand what data science can do, but he or she must also be able to talk with the business and subsequently translate its needs to IT and software development.

Time for the real work
You’ve finally reached that point: the first version of the data project is live! The chatbot is on the website, even though it might not yet be able to answer all questions. But don’t worry; that will come later. The good news is that now the MVP is a success, you can get started with the real work, because the real challenge will be to make it a scalable solution in your organization – but that’s a topic for my next article.


Three Tips to Boost the Success of Your Data Science Project

Data project failures at government agencies regularly hit the headlines. But the business world is facing these challenges as well. At the end of 2016 Gartner estimated that 60 percent of big data projects in 2017 wouldn’t make it past the experimental phase. Late last year another analyst revised that estimate, tweeting that the failure rate was no less than 85 percent. For someone who works in the data sector, these figures are distressing. It’s something I often discuss with my peers – how is it possible that so many projects end in failure? The general consensus is that there’s huge room for improvement in the initial phase. These three tips will boost the success of your data science project, even before the project is up and running.

By Patrick Hennen, Managing Partner Data Science & Consulting at ORTEC

TIP 1: Don’t be dazzled by the hype

Applications that use artificial intelligence (AI) are currently hip, hot, and happening. This Google Trends graph shows that worldwide interest in AI has grown enormously over the past five years. Tech giants like Microsoft, Google, Facebook, and IBM now present AI applications as the panacea for the business world. Want to gain better insights and earn more revenue faster? Simply download the algorithms and unleash them on your data. After all, anyone can use AI… can’t they? The message seems to be that the hype express has left the station, and if your company’s not on board, you’ve missed the digital transformation boat.

Don’t be dazzled by the hype, though. I can’t deny that AI is a powerful technology that opens up numerous new possibilities for companies. But I’m increasingly hearing organizations say that what they want is an AI solution – and that’s the wrong way to go about things. AI applications are a means for solving a problem, not an end in themselves. At the end of the day, artificial intelligence may well be the most powerful tool for improving efficiency or effectiveness, but other methods may actually be better suited to you. For instance, are you looking to automatically count the number of vehicles in a car park? You could train a machine learning algorithm in image recognition to define objects as vehicles and then run this algorithm on camera images taken at the entrance. But it would probably be easier to just install road sensors in the ground.
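To make the contrast concrete, here is a rough sketch of what the machine-learning route could involve; it assumes torchvision’s pretrained object detector and a hypothetical camera frame, neither of which the example above prescribes. Compare this machinery with simply installing a road sensor.

```python
# A rough sketch of the machine-learning route: counting vehicles in a
# camera frame with a pretrained detector. The image path is hypothetical.
import torch
from PIL import Image
from torchvision.models.detection import (FasterRCNN_ResNet50_FPN_Weights,
                                          fasterrcnn_resnet50_fpn)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]  # COCO class names
VEHICLES = {"car", "truck", "bus", "motorcycle"}


def count_vehicles(image_path: str, threshold: float = 0.7) -> int:
    """Count detected vehicles in a single camera frame."""
    img = preprocess(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        detections = model([img])[0]
    return sum(
        1
        for label, score in zip(detections["labels"], detections["scores"])
        if score.item() >= threshold and categories[int(label)] in VEHICLES
    )


print(count_vehicles("parking_entrance.jpg"))
```

Even this “simple” version needs a deep-learning runtime, a camera feed, and a confidence threshold you would have to validate; the road sensor needs none of that.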

TIP 2: Test and evaluate expertise during the selection process

Implementing a data science application is specialist work, and every situation is different, so the expertise you require will be broad-ranging, covering both the technical and the business-management side. The first stage of a successful data science project is therefore selecting the right expertise. That’s easier said than done, particularly if you’re hiring a third party to do the work: many vendors claim to have knowledge of data science but lack the depth of knowledge or experience required. Are you dealing with someone who doesn’t have a demonstrable quantitative background? Then alarm bells should be ringing. Has someone switched to data science at a later stage of his or her career? Then do some extra work to verify his or her analytical and statistical skills. Make absolutely sure that you’re not dealing with someone who’s looking to make easy money. It’s important to note here that a good provider will always be willing and able to explain what they do in simple, everyday language. So keep asking until you’re sure that you understand. Is your prospective vendor telling you that it’s too complex for you to grasp? Then they’ve either got a hidden agenda or no idea what they’re talking about. You can find some examples of good questions to ask potential vendors to test their expertise in this article. So take the selection process very seriously – that way, you’ll reap the benefits later.

TIP 3: Make sure IT isn’t running the show

Your selection of external expertise isn’t the only factor that determines the success of a project – it’s also crucial to involve the right stakeholders. So even before the project begins, it’s advisable to think about the composition of your internal team. When you do, bear in mind that you’re aiming to solve a business problem; data and your data science solution are the building materials and the tools that will help you achieve this. It’s in nobody’s interest to end up with an ivory tower in which a few analysts and IT specialists are cloistered away, creating models without ever setting foot on the work floor. So put together a team in which internal representatives of the business take the lead and IT plays a supporting role. Internal or external data scientists can be deployed as a bridge between them, since they understand how both disciplines perceive things and can translate ideas between them. In larger projects, it’s even advisable to make someone in the project team specifically responsible for liaising between business, data, and IT: the ‘business translator’. Not only will this ensure that the process runs smoothly, it will also help you retain the data scientists’ expertise (see tip 2) in the long term. In addition, it’s advisable to get executive buy-in at the earliest possible stage – long before the project begins. Support from the top means you have a sponsor and that effective action can be taken when required. For instance, you could organize an executive bootcamp in which new technologies and data science subjects are introduced and their intended use in the project is explained.

A good beginning…

It may be a cliché, but it’s true nevertheless: a good beginning is half the battle. That’s certainly the case when it comes to data science projects. Are you capable of recognizing hypes and buzzwords for what they are? Are you well informed and supported by the right external experts? And have you put your multidisciplinary team together? If so, then I’m confident that your project will be a success, and that your data science business case will fulfill its promise of adding value.
