Contributing Editor David J. Hand, Imperial College London, writes:
“Algorithm” seems to be the word of the moment. To those untrained in quantitative disciplines it can seem to be imbued with magical properties. Algorithms are what underlie speech recognition systems, self-driving cars, and optimal sat-nav route-finding systems. They are what lie at the heart of artificial intelligence systems and surfing the world wide web.
But those in the know understand that algorithms are not magical. They know that an algorithm is just “a process or set of rules to be followed in calculations or other problem-solving operations” (according to the Oxford English Dictionary, which then adds “especially by a computer”). They appreciate that algorithms of one kind or another pervade our lives. A baking recipe is an algorithm: you prepare the ingredients, add them to each other, mix and stir, and place them in the oven at a specified temperature for a given time, to produce a cake. The formula for a manufacturing process is an algorithm: the raw materials are extracted and subjected to various highly specified and tightly constrained transformations, combinations, and other changes. The calculation of a credit score is an algorithm, as is a medical diagnosis, in each case collecting and combining relevant information according to given rules.
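The rule-following character of something like a credit score can be made concrete in a few lines of code. The rules, weights, and field names below are invented purely for illustration; a real score combines far more information, but the principle is the same: given rules, applied mechanically.

```python
# A toy credit-score algorithm: fixed rules combining applicant
# information. All rules and weights here are made up for illustration.

def credit_score(income, years_at_address, missed_payments):
    """Combine relevant information according to given rules."""
    score = 500                               # arbitrary base score
    score += min(income // 1000, 200)         # cap the income contribution
    score += 10 * min(years_at_address, 10)   # reward residential stability
    score -= 50 * missed_payments             # penalise missed payments
    return max(0, min(score, 999))            # clamp to a 0-999 range

print(credit_score(income=35000, years_at_address=4, missed_payments=1))  # → 525
```

The same inputs always produce the same output, with no judgement involved at run time; all the judgement went into choosing the rules.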
Not all algorithms are apparent. You don’t see the algorithms controlling your printer, or transmitting the signals when you make a credit card transaction. They are working behind the scenes. But many others are very apparent: they interact at some level with humans, and they do this in a social context. And this can pose special challenges. Given a clearly specified and unambiguous objective, we can write an algorithm to achieve that objective. In many cases we might even be able to rigorously prove that the algorithm is optimal in some sense (to prove that it’s the quickest or most accurate possible, for example). But the human social world is not an abstract mathematical world which generally permits clear and unambiguous specification. Ambiguity, difference of opinion, and woolliness are the order of the day. The data which the algorithm processes might not be clearly defined. A six-foot man might be tall or short, according to whether he is a jockey or a basketball player. One right answer to a mathematics question may be better than another because of its novelty or the insight it provides.
Moreover, there might be other factors beyond ambiguity about what the algorithm is supposed to be doing, and beyond uncertainty and bias in the data. Consultant statisticians will be familiar with this challenge. Clients will generally have requirements which are more complicated than simply to optimise some well-specified objective. Certainly, they might want an algorithm that will predict most accurately which customers will be profitable, but they will want that algorithm as soon as possible. Better to have a slightly sub-optimal algorithm in time to do some good, than wait months for a wonderful algorithm which arrives too late to take advantage of. As has been said, the best is the enemy of the good.
Other examples of constraints apply in consumer banking. If the aim is to build a model to predict the probability of defaulting on a loan, one will obviously want to build the best model one can. And the best model will use all the information one can get about the applicants: more information can only help the predictive accuracy of the model. But considerations beyond accuracy, arising from the social and ethical context, also apply. For example, for sound reasons of fairness, such models are precluded from including certain “protected” characteristics, such as disability, sex, and religion.
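In practice, this constraint might amount to stripping protected fields from applicant records before they ever reach the model. The sketch below illustrates the idea; the field names and the set of protected characteristics shown are assumptions chosen for illustration, not a statement of what any particular lender or regulation requires.

```python
# Sketch: removing "protected" characteristics from an applicant
# record before it is passed to a scoring model. Field names and
# the PROTECTED set are illustrative assumptions.

PROTECTED = {"sex", "religion", "disability"}

def strip_protected(applicant: dict) -> dict:
    """Return a copy of the record without protected fields."""
    return {k: v for k, v in applicant.items() if k not in PROTECTED}

record = {"income": 35000, "sex": "F", "religion": "none", "age": 41}
print(strip_protected(record))  # → {'income': 35000, 'age': 41}
```

Of course, simply dropping the fields does not guarantee fairness, since other variables may act as proxies for them; but it illustrates how an accuracy-maximising design is deliberately constrained for ethical reasons.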
In the UK in 2020, the Covid-19 pandemic led to the cancellation of the “A-level” examinations. These are the final school examinations, the grades of which determine students’ admission to universities. A sophisticated algorithm was constructed, aimed at predicting most accurately, on the basis of the available data, how the students would have been likely to perform in these examinations. But the algorithm, sophisticated though it was, encountered a substantial backlash from the general public. This might have been foreseen: the predictions referred directly to individuals and, moreover, they all occurred at the same time, so that a coordinated reaction was inevitable. But the point is that public perception, public understanding of what the algorithm was trying to do and the difficulties of doing it, and public acceptance that the algorithm was a good attempt at meeting a difficult (some would say impossible) challenge were crucial to its adoption. In the end it wasn’t adopted: public reaction led to the algorithm being discarded and replaced by teacher predictions, which have substantial problems of their own.
In short, technical accuracy of a predictive model used in a social context is not enough. It is necessary to take into account the human and social conditions and constraints under which the model must function.
Without that, success is impossible.