Paris, France
+33 6 84 12 47 08
cyril.poulet@centraliens.net

How to recruit a data scientist in a context of scarcity?

Recruiting a data scientist (in the broadest sense, an AI specialist) is a hard task, particularly for senior profiles: candidates are few, salaries are high, and the complexity of the technology involved makes it difficult to assess their skills without a data scientist helping you.
This post points out a few things that will help you find the data scientist who is right for your team. The advice comes from my experience on both sides of the process.

Be responsive

Of course, this piece of advice applies to any recruitment process, but it is particularly important when few candidates are available. You have to be able to answer quickly any question your candidates may have, especially if you asked them to complete a test task. Your own data scientist should be on the front line of this communication; if that is not possible, make sure the person handling the process has been properly briefed on the test task.
As a candidate, it is not a good sign to hear “I don’t know, the data scientist is out of town and left us no instructions” when you’ve been asked to spend a few hours on a test project…

Don’t be pushy

This may also seem obvious, but not all companies respect it. It matters in two situations:

  • if you are searching for candidates yourself, make sure the job you are offering really matches the candidate you are contacting. Of course, a 100% fit won’t happen, but time spent assessing how well the job and the candidate fit is never wasted.
    As a candidate, offering me a job built around a technology I used 10 years ago and never since is not the best way to introduce me to your company. It only tells me that you spammed every profile returned by a keyword search on LinkedIn. Given how scarce my kind of profile currently is, this is not a good start for you.

  • if a candidate tells you that the job you are offering is not what he is looking for, even though you think it would be a great fit, do not push him to enter the recruitment process unless you can adapt the job to what he wants (either the kind of mission he would take on, or the level of salary).
    This will be a waste of time on both sides, and there is little to no chance that he will change his mind if you end up proposing the job to him unchanged. Worse, he will feel that he gave up some of his time because you led him to believe you would adapt the job to his wishes, and he will be greatly disappointed and angry to have lost time and effort for nothing. This will give him a very bad image of your company, which he may not hesitate to share with anyone he knows who would be a fit for your job.

Use a technical test suited to the job you offer

If the job you offer requires high-quality code written quickly and efficiently, then coding tests or challenges are the way to go. Several on-line platforms let you set up this kind of test easily.
However, for data scientists, it is more complicated to assess the abilities of the candidates, as they should be able to handle complex concepts and frameworks, and show understanding and problem-solving skills on problems that are often poorly defined. It also requires a (senior) data scientist to help you with the tests.
Here are 3 possible tests:

  1. the algorithmic test. It usually takes an hour, and can be done in person or on-line, with or without interaction on your part if on-line.
    The idea is to challenge the candidate with a well-defined problem (such as geometry, or counting objects under constraints) whose solution requires a good understanding of algorithms and data structures. The goal is to have a solution that is not only functional, but also optimized in terms of computational complexity. Several on-line platforms will let you set up this test (and add edge cases and high-volume cases to check complexity), but I highly recommend that you choose a setting where you can interact with the candidate. This will allow you to assess how the candidate works his way through the problem (What does he look at first? Does he have intuition? Can he quickly find out that his first attempt will not be fruitful?), and also to guide him if need be and see whether a small clue can get him back on his feet (stage fright can happen to anyone, and it would be a shame to lose a candidate who only needed a little encouragement to shine).

  2. the data science test (usually for machine learning jobs). This test asks for more time and investment from the candidate (several hours, usually), and the problem allows more than one possible approach. Several criteria are important:
    • the complexity of the problem: asking the candidate to work on a problem that is too easy will not motivate him to bring out his A game, if he feels that any candidate would succeed without real work.
      As a candidate, I once worked on a test which was so easy that it made me doubt the seniority of the job that was advertised (that is not a good incentive…).
      Also, it will not help you assess the qualities of the candidates.
    • the interest of the problem: you should send a project which is related to the subjects the candidate will work on should he get the job, so that he starts thinking about these questions.
      For obvious reasons, you should also make sure that the problem you are sending does not have a ready-to-use solution on the internet (or acknowledge the existence of this solution and ask for another approach from the candidate).

    Of course, as you ask for more investment from the candidate, you should be ready to give more time on your side. The data scientist in charge of the process should take enough time to answer questions from the candidate and to analyse the proposed solutions. It is usually more fruitful to go for a complex and interesting challenge, and to ask the candidate not to solve the problem, but to choose an approach, explore it, and come back with an analysis of his results and of the pros and cons of the approach he chose. This will give you an insight into how proficient the candidate is in the domain, and help you assess whether he can take a step back from his own technological choices.

    As I said earlier, this way of testing candidates is the most time-consuming, but you can keep the time spent under control in two ways:
    • choose an existing problem instead of creating one yourself. Various on-line platforms offer this kind of challenge (e.g. Kaggle). Choosing a problem on these platforms will ensure that the problem is complex enough and that relevant data is available, and you may ask the candidate to submit his solution to get a ranking (but that is usually not necessary)
    • use the same problem for all candidates: you will be able to compare approaches between candidates, and your own data scientist will quickly become proficient on the subject (and will need less time to understand and assess solutions)

  3. the scientific article test. This will help you determine whether the candidate is able to read and understand a scientific paper in the domain that he will be working in. This is essential for AI, which is a very dynamic domain both in the number of articles published each year and in the speed of technological evolution.
    This test is a discussion between your data scientist and the candidate. To avoid wasting time, you should send the article beforehand, so that the candidate can work on it and be ready to discuss it when the interview comes.
    Choosing the right article is crucial. It should contain lots of material (better to have to pick a topic than to have little to discuss), should depend on one or two other papers but not too many (it’s interesting to know that the candidate read the extra papers, but if 25 of them are needed to understand the one you sent, the interview might get complicated to follow), and of course be relevant to the job. Choosing an article known as fundamental to the field is usually a good choice: it offers plenty of ideas, it is well-known (so the candidate can see there is no trick), and it lets you assess the candidate’s proficiency in the fundamental notions of the field.
    Again, you should save your data scientist's time by giving the same article to all candidates.
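
To make the algorithmic test (the first test above) more concrete, here is a minimal sketch in Python of what such an exercise could look like. The problem (counting pairs of values that sum to a target), the function names and the test cases are all hypothetical examples chosen for illustration; they do not come from any particular platform. The sketch shows a naive solution, an optimized one, and the kind of edge cases and high-volume case mentioned above.

```python
from collections import Counter
from itertools import combinations


def count_pairs_naive(values, target):
    """Reference solution: checks every pair, O(n^2) time."""
    return sum(1 for a, b in combinations(values, 2) if a + b == target)


def count_pairs_fast(values, target):
    """Optimized solution: one pass with a counter of values seen so far, O(n) time."""
    seen = Counter()
    pairs = 0
    for v in values:
        pairs += seen[target - v]  # each earlier complement forms one pair with v
        seen[v] += 1
    return pairs


# Edge cases a candidate might overlook (hypothetical examples).
edge_cases = [
    ([], 10),              # empty input
    ([5], 10),             # single element, no pair possible
    ([5, 5, 5], 10),       # duplicates
    ([-2, 12, 3, 7], 10),  # negative numbers
]
for values, target in edge_cases:
    assert count_pairs_fast(values, target) == count_pairs_naive(values, target)

# High-volume case: the quadratic solution would be far too slow here,
# so only the optimized one is run.
big = list(range(200_000))
big_result = count_pairs_fast(big, 199_999)
```

In an interactive setting, a candidate who starts with the naive version is not a problem in itself: what matters is whether he notices that it will not scale to the high-volume case, and how he gets from there to the one-pass version.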

Conclusion

This concludes an already long blog post on data scientist recruitment. The important point is: give the candidate reasons to want to work with you. Invest time to support candidates during the testing phase, and save money by analysing which tests are suited to your needs.
A candidate won’t complain about being tested. What he will complain about is not understanding the relevance of the test to the job offered, or why he should invest his time and effort if the company does not show the same interest.

If you have remarks or relevant experience on the matter, please share in the comments.

I would also be happy to discuss how I could help you recruit your first in-house data scientist, for example, or a senior profile if you only have juniors at the moment.
