Using Artificial Intelligence to Obtain Test User Pool Quality at Scale

Guest article by our partner TestingTime

At a time when everybody is talking about big data, are artificial intelligence and machine learning really a gold mine? In this article we look at what lies behind these widely discussed subjects and how they can be used to improve a business's service to its customers. Everything depends on the problem you are trying to solve and on the quality of your data set. At our company, in line with our vision of "a world full of happy users", we use AI to provide real help to our customers and test users.

1. How to Be Successful With the Recruitment of Quality Test Users 

Test users are the foundation of our business, and our pool of high-quality test users is what most distinguishes us from other market players. Why? Because we make sure you get a representative and diverse set of study participants, so that you can obtain relevant insights from your user research. Put simply, we find test users who match the profile(s) our customers need, and above all, our customers can count on those test users being unbiased.

Something else is equally important: no-shows, cancellations, and late arrivals shouldn't be a concern for customers. There will always be the occasional test user who, for some reason, can't take part in a test, but we have the means to keep this to a minimum. For example, the day before a test, our test users receive a reminder by email and SMS. This is the practical reason our no-show rate is below the industry standard.


2. Manual Quality Checks Are Not the Best Method 

In the past we ran many experiments with individual (manual) quality control to weed out unsuitable candidates. Our customer success team used to contact every new test user to onboard them and explain the test and its process. This was extremely time-consuming and turned out to have no practical effect on test user quality. In another experiment, we asked test users to record a video of themselves before the actual test, explaining who they were and why they wanted to take part. This had a negative impact on test user behaviour: some test users tried to please the test moderators by praising the product instead of saying what they really thought.

2.1 How to Overcome the Limitations of Manual Quality Checks 

Manually checking every test user in a pool of more than 770,000 people is clearly not feasible. When customers ask us to find test users for their research, they need to reach those users promptly, so we have to search our pool quickly and find suitable study participants as soon as possible. With a growing pool, an automated process is the only way to achieve both speed and quality. This is why we started investing in data science in 2019, to gain scalability and access to thousands of datasets. Once our inference analyses proved successful, we took the leap and invested heavily in artificial intelligence in 2020.

2.2 Artificial Intelligence Brings Increased Efficiency and Effectiveness 

The only way to determine which test users are suitable and available for a test is to ask them screening questions, combined with predefined logic for including and excluding candidates. Every recruitment requires such a screening to check whether a potential study participant matches the profile our customer has requested. At the same time, we must take care not to spam our 770,000+ test users with screening questions. This is where artificial intelligence comes in: it improves the quality of our recruitment service and the experience our test users have with us. Specifically, we use two AI mechanisms: one to predict profiles (SmartInvite) and one to predict quality (Honeybadger).
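To make the idea of screening logic concrete, here is a minimal sketch, assuming a screener can be expressed as a list of include rules and a list of exclude rules evaluated over a candidate's answers. The `Screener` class, the question keys (`car_brand`, `works_in_market_research`) and the candidates are all hypothetical and do not describe TestingTime's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Answers = Dict[str, str]  # question key -> answer given by the candidate


@dataclass
class Screener:
    """Hypothetical screener: a candidate passes if no exclude rule fires
    and every include rule holds."""
    include: List[Callable[[Answers], bool]] = field(default_factory=list)
    exclude: List[Callable[[Answers], bool]] = field(default_factory=list)

    def passes(self, answers: Answers) -> bool:
        if any(rule(answers) for rule in self.exclude):
            return False
        return all(rule(answers) for rule in self.include)


# Hypothetical study: Tesla owners who do not work in market research.
screener = Screener(
    include=[lambda a: a.get("car_brand") == "Tesla"],
    exclude=[lambda a: a.get("works_in_market_research") == "yes"],
)

candidates = {
    "john": {"car_brand": "Tesla", "works_in_market_research": "no"},
    "anna": {"car_brand": "Mercedes", "works_in_market_research": "no"},
    "li": {"car_brand": "Tesla", "works_in_market_research": "yes"},
}

# Only candidates whose answers satisfy the logic are recruited.
qualified = [name for name, answers in candidates.items() if screener.passes(answers)]
print(qualified)
```

Note that every candidate evaluated this way must first answer the questions, which is exactly the burden on test users that predicting answers in advance tries to reduce.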


3. SmartInvite Results in Fewer Mismatches

Mismatches are a source of frustration for test users: they take the time to answer screening questions in the hope of being recruited, only to be turned down. Obviously, we should do everything possible to reduce test user anxiety and disappointment, and this is where SmartInvite comes in. For example, SmartInvite allows us to predict whether a potential study participant, say John Meyers, actually owns a Tesla, without John having to answer that question explicitly.

How does this actually work? 

Our profile prediction uses all the data we hold about each test user. We currently have more than 770,000 test users, with about 200 answers per test user. Our solution uses matrix factorization to predict answers we don't yet have, for example who owns a Tesla and a Mercedes but not a Ferrari. Across the whole pool, this means predicting millions of individual answers (or, if you prefer, features).
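The article names matrix factorization but does not describe SmartInvite's internals, so the following is only a minimal illustration of the general technique under our own assumptions: a tiny user × question matrix of yes/no answers (1 = yes, 0 = no, NaN = never asked), approximated by the product of two low-rank factor matrices fitted with stochastic gradient descent on the observed entries only. The question names, data, rank, learning rate and regularization strength are all hypothetical.

```python
import numpy as np

# Hypothetical yes/no screening answers: rows = test users, columns = questions.
# 1 = yes, 0 = no, NaN = the user was never asked.
questions = ["owns_tesla", "owns_mercedes", "owns_ferrari", "commutes_by_car"]
R = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, np.nan],
    [0, 0, 1, 1],
    [1, np.nan, 0, 1],
    [np.nan, 1, 0, 1],
], dtype=float)


def factorize(R, k=2, lr=0.05, reg=0.02, epochs=2000, seed=0):
    """Approximate R by P @ Q.T, fitting only the observed (non-NaN) entries with SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_questions = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))      # latent user factors
    Q = rng.normal(scale=0.1, size=(n_questions, k))  # latent question factors
    observed = np.argwhere(~np.isnan(R))              # (user, question) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                         # visit answers in random order
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()                         # pre-update user vector for the Q step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q


P, Q = factorize(R)
pred = np.clip(P @ Q.T, 0.0, 1.0)        # rough "yes" scores clipped to [0, 1]
filled = np.where(np.isnan(R), pred, R)  # keep real answers, fill in the gaps
print(dict(zip(questions, filled[4].round(2))))
```

Because users 1, 3 and 4 answered every question they were asked in the same way as user 0, their missing answers (for example, whether user 4 owns a Tesla) come out close to "yes". A real system would work with far more users and questions and would typically choose the rank and regularization by validating on held-out answers.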

The matrix below illustrates how this works. The grey numbers represent information we are not 100% sure of, but which our prediction algorithm can still use in its calculations.

This algorithm allows us to increase the probability of a test user matching a specific profile from 20% to 40%. 
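To see what the jump from 20% to 40% can mean in practice, assume (a simplification the article does not state) that each invited test user matches the profile independently with probability p. The expected number of test users N who must be screened to find k matches is then k/p, the mean of a negative binomial distribution. For a hypothetical study needing k = 5 participants:

```latex
\mathbb{E}[N] = \frac{k}{p}
\qquad\Rightarrow\qquad
\frac{5}{0.2} = 25
\quad\text{vs.}\quad
\frac{5}{0.4} = 12.5
```

Under this assumption, doubling the match probability halves the number of screening invitations needed, which is what keeps the pool from being spammed with questions.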


4. Predicting Reliability With Honeybadger

Conducting user research can be very frustrating, especially in the case of:

  • No-show: The test user doesn’t show up (unpunctual, forgot, not interested, unforeseen event, etc.). 
  • Misfit: The test user doesn’t fit the requirements (misunderstanding, insincerity or dishonesty, etc.). 

Our service is built on quality: our business is to provide reliable and suitable study participants. To do this, we have to avoid the situations above, and we achieve this with an artificial intelligence mechanism we call Honeybadger.

4.1 Data Clustering Helps Quality Prediction 

We determine who qualifies for a test by classifying test users into two groups, or clusters, based on all the data we have about our potential study participants. These two clusters are called "reliable" and "unreliable". As you can imagine, we only invite users from the reliable cluster. However, depending on a study's time frame, language, location, and so on, the same test user may fall into the reliable group for one study and into the unreliable group for another. Below is a visual representation, assuming only two dimensions.
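The article keeps Honeybadger's factors secret and does not say which clustering method it uses, so this is only a minimal two-dimensional sketch of the idea, using plain k-means with k = 2 as a stand-in technique. The two features (past show-up rate and average moderator rating divided by 5) and all values are hypothetical.

```python
import numpy as np


def kmeans(X, k=2, iters=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every user to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical features per test user, both scaled to [0, 1]:
# column 0 = past show-up rate, column 1 = average moderator rating / 5.
X = np.array([
    [0.95, 0.92], [0.90, 0.88], [0.98, 0.94], [0.92, 0.90],
    [0.40, 0.70], [0.35, 0.62], [0.50, 0.66],
])

labels, centroids = kmeans(X)
# Call the cluster with the higher average show-up rate "reliable".
reliable_cluster = centroids[:, 0].argmax()
is_reliable = labels == reliable_cluster
print(is_reliable)
```

To reflect the context dependence described above, one could compute the features separately for each study context (for example, show-up rate for studies in a given language or city), so that the same user can land in different clusters for different studies; this too is an assumption for illustration.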

4.2 How the Factors Impacting Test User Quality Are Established 

We have experimented with hundreds of factors that might influence quality, such as "the distance to the nearest train station" or "do young people go to the lake on sunny Wednesday afternoons?". Over time, we established that seven factors provide a really good inference. Which ones? That's the secret ingredient of our recipe! Recommended test users receive an average rating of 4.54 out of 5, whereas non-recommended test users average 4.17.

That said, artificial intelligence is far from hitting the mark every time. Its predictions are only indicative, but they have allowed us to build a successful system. We never settle for the status quo, and we constantly feed what we learn from errors back into the system.


5. Improving Artificial Intelligence Results 

We continuously improve the quality of our test users and of the recruitment service we provide. How? Through continuous learning based on new data, and by identifying further potential uses for artificial intelligence.

5.1 Continuous Learning Based on New Data 

We gain additional information throughout the working day, and every night this data is used to train our prediction algorithm. The longer we train the algorithm and the more data we collect about our test users, the more precise our invitations become, which benefits every single test user and customer. As a result, we only bother test users when we actually need new information from them.

5.2 Identification of Further Potential Uses for Artificial Intelligence 

Finding the right test users is one aspect of the process; finding out whether they are available is something else altogether. Artificial intelligence could help forecast test user availability as well as suitability. We could then help our customers schedule their user tests for times when their target group is generally available, taking into account, for instance, the often rigid work schedules of blue-collar workers. Another potential AI-supported feature would be forecasting test users' motivation and their ability to think aloud. And last but not least, suggesting incentives could be a game changer too: forecasting the incentive amount needed to make the offer attractive to your specific target group might increase their willingness to participate. For example, you can't pay a practicing surgeon the same amount as a medical student.


6. The Right Balance Between Human and Artificial Intelligence 

Automating customer support is a great way to improve efficiency. However, customers should still be able to reach a human when their queries cannot be fully resolved automatically. Delivering the right customer experience at exactly the right moment is fundamental. We work with an awesome team dedicated exclusively to our customers' success, because even though we rely on highly digitised and automated processes, our test users and customers are unique human beings with unique needs. Our customer success team is there to help our customers and will be happy to handle any situation that may arise. We want to deliver the best possible customer experience, which is why we want to invest further in this team. They can help you establish the best profile mix, write the most appropriate screening questions, choose the best study methodology, build your own pool of test users (Private Pool), and much more.


About the Author: Cosima Lefranc

Creating a great technological product from beginning to end is Cosima's job at TestingTime. With a dual background in Business Intelligence and Quantitative Finance, Cosima is on a constant quest to find new data-driven insights from our product usage. She enjoys talking to our users and discovering new ways in which we can help them become more successful.