Data Science: Hunting for a Unicorn

The position of Data Scientist is rapidly becoming a highly desired role as financial institutions consider how to implement Artificial Intelligence (AI) and Machine Learning (ML) projects within their organisations. Identifying the need for a Data Scientist is the easy part of the process; the real difficulty lies in finding the right Data Scientist with the skill set and knowledge needed to create real business benefits.

The high value placed on Data Scientists is a direct result of the unique set of skills and expertise needed to implement effective AI strategies, which gives them a huge influence on the nature and direction of projects. Based on the data they are given, it is they who judge which tools to use and how to shape the investigation that will ultimately lead to identifying and delivering business value from AI and ML.

Any project of this type has three distinct phases:

Value identification: A high-level inspection of where the business value (gold) can be found
Experimentation: Proving that the business value can be delivered
Operationalisation: Implementing the system that enables the business value to be delivered

Throughout the various stages of the project, Data Scientists arguably have the most important role, working alongside Developers, Subject Matter Experts (SMEs) and Data Engineers to realise the value.

Data Engineers and Data Scientists: What is the difference?

The role of Data Scientist should not be confused with that of a Data Engineer – the two roles are quite distinct from each other.

The key difference between the two is most evident within the ‘experimentation phase’. Data Engineers are required to get data to the ‘experimentation area’ in a timely fashion, with the right quality profile in order to allow the Data Scientist to undertake their role.

The Data Scientist draws on a vast toolbox to investigate the data provided by the Data Engineer, test hypotheses and find what they can. These tight iterations of ‘experiment, prove and repeat’ build confidence in the model and demonstrate that the system is ready to be put into production.

The Data Engineer then returns to take the configured model and put it into production. The Data Engineer may understand the outputs of a data science activity, but it is the Data Scientist who is responsible for finding the ‘gold’ (Value Identification).

Selecting the right toolset

During the process of developing an AI use case, the Data Scientist must identify the appropriate toolset to use. Classifying tools by use case is difficult, because the same technique can serve very different business drivers. For example, Natural Language Processing (NLP) could power an email routing agent, which might be described as a cost-reduction play, or a chatbot that forms part of a cognitive or ‘more convenient’ play.

The toolset also includes some distinctly non-AI tools, such as Monte Carlo simulations and, even more so, robotic process automation. This illustrates how fragmented and fast-changing the landscape for these projects is, and how heavily tool usage depends on the business case.

It is clear that one or more tools will be needed for each use case. Part of the Data Scientist’s role is selecting the right tool or combination of tools at the right time to meet the right business driver. Judging which tools to select and how to combine them is arguably more of an ‘art’ than a ‘science’.

Finding the unicorn

Some people may argue that attempting to recruit the ideal Data Scientist is akin to tracking down the mythical unicorn. Data Scientists need great communication skills along with relevant business domain knowledge. Domain knowledge allows them to recognise and understand what the data means, so that they can ‘smell’ the gold that leads to where the business value lies. Communication skills are needed to explain insights and to build confidence and trust among many stakeholders, demonstrating that their statistical methods have in fact been applied correctly.

The demand for such a diverse range of skills means the pool of available talent for Data Scientist roles is fairly shallow. There is also a view that the world’s leading universities are not producing enough Data Scientists to meet current and future demand for their services.

An additional problem is that Data Scientists may have expertise in the wrong business domain or the wrong tool. This hyper-specialisation makes it even harder to find such a rare individual. The unicorn analogy does, however, point to a potential solution: deconstructing the needs of the project. Different parts of the project can be supplied by different people, each bringing their own specialist skill set to bear. In theory, this should make it easier to find suitably qualified individuals whose combined skill sets cover what the project needs.

Finding the right Data Scientist is a difficult task in itself, but it is only part of the battle. The next challenge is attracting and retaining them. There is growing evidence that Data Scientist salaries have plateaued in recent years. This may appear to be good news for employers, but it also suggests that salary is not the primary consideration in a Data Scientist’s Employee Value Proposition. The working environment is likely to be equally important.

Creating the right working environment

When considering how to attract and retain Data Scientists, organisations need to be honest in their self-appraisal. In particular, they should ask the following questions about their business:

Does the business have the agility required to find the problems that can deliver the best value? In other words, are the problems interesting and of high value?
Are the controls in place the lightest touch needed to run the business? Or will Data Scientists be tied up in the bureaucracy of a monolithic organisation?
Are the latest tools, libraries and vendors permitted, so that Data Scientists can use the best approach for the job? Will they have free rein to solve problems without restrictions on their creativity?

Conclusion

Attracting good Data Scientists is no easy task. The perception, rightly or wrongly, is that banks may have been innovative places to work in the 20th century, but nowadays they face increased regulation and have less appetite for risk. Far from being innovative, banks now look more like government-owned utilities. However firms approach the problem of identifying, attracting and retaining Data Scientists, they must recognise that a fierce war for this expert talent is taking place. Understanding how to respond, whether by making their culture more attractive, making the work more interesting or distributing the tasks involved, will be critical to finding the gold!

AI, Data Science, Business Change
Richard Miller, 8 September 2017, MV37 Ltd
Data Science, Artificial Intelligence, Machine Learning