As used in this article, bias is similar to the Wikipedia definition: a disproportionate weighting of factors in an unfair way, reliance on samples that underrepresent parts of a population, or an estimation process that does not give accurate results on average.
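To illustrate the statistical sense of that definition, here is a minimal Python sketch (the population values and sampling weights are invented purely for illustration) in which an average is estimated from samples that underrepresent part of the population; repeated many times, the estimates are off even on average.

```python
import random

random.seed(42)

# A toy population: 70 people with value 10 and 30 people with value 50.
# The true population average is (70*10 + 30*50) / 100 = 22.
population = [10] * 70 + [50] * 30
true_mean = sum(population) / len(population)

# A biased sampling process: members of the "50" group are rarely included,
# i.e., they are underrepresented in the samples we collect.
def unrepresentative_sample(pop, size=20):
    weights = [1.0 if x == 10 else 0.2 for x in pop]  # "50" group undersampled
    return random.choices(pop, weights=weights, k=size)

# Averaging the estimates over many repetitions shows the estimator does not
# give accurate results on average; it is biased.
estimates = [sum(unrepresentative_sample(population)) / 20 for _ in range(5_000)]
print(f"true mean:            {true_mean:.1f}")
print(f"average of estimates: {sum(estimates) / len(estimates):.1f}")
```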
What do you think the response would be if you asked 100 random people the following question: Which is more capable of analyzing data and providing an accurate and impartial answer, a human or a computer? I suspect an overwhelming majority would choose the computer. Why? Because most people think of computers as unemotional, neutral devices that respond with answers based on facts and mathematical computations. That expected answer is probably wrong, however, because humans create computers and design and train the systems that make them work. As these computer systems are created, the argument goes, they reflect the biases of their human creators. Wow!
The idea of AI bias is not a wild or fringe concept. The subject has been discussed extensively in scholarly writing, news articles, and practical “how-to” pieces, and the existence of AI bias has been documented within numerous AI applications.
First, I must caution readers: this article emphasizes early and sometimes worst-case scenarios of AI bias. Every year, however, there has been improvement in limiting the effects of bias in AI applications. As I discuss instances of AI bias below, understand that today’s concerns may not be tomorrow’s. Even as AI results improve continuously, AI creators and users must remain vigilant about the law of unintended consequences, in which the solution to one problem creates a new problem, or solving one problem reveals another that previously went unnoticed.
Well-documented instances of AI bias occurred at Amazon, a company that has been remarkably successful in using AI to analyze its customers’ purchases, predict their future needs, and create efficiencies through its prominent use of robots in its distribution centers. Even so, Amazon scrapped two separate AI personnel tools. One tool reviewed résumés in an effort to find the best job candidates. Amazon eventually abandoned the tool because it was biased against women: it had been trained on 10 years of the company’s past hiring data, in which women were underrepresented. Consequently, the AI application favored men over women.
Amazon made another attempt when it tried to develop an AI tool to autonomously search the internet for candidates deemed worthy of recruitment. The team created 500 computer models to recognize 50,000 terms in past candidates’ résumés. Eventually, this project was also abandoned. One example of the bias discovered in the computer models was that they assigned higher value to terms such as “executed” and “captured,” which were more commonly found in résumés submitted by male engineers. In addition to instances of gender bias, and apparently because of the data used to train the computer models, the AI application often recommended unqualified candidates for jobs and, in some instances, seemed to recommend candidates at random.
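The underlying mechanism is straightforward to demonstrate. The following toy Python sketch (my own invented résumés and vocabulary, not Amazon’s data or models) weights résumé terms by how often they appeared among past hires; because the historical hires skew male, terms that happen to appear more often on male résumés, such as “executed” and “captured,” earn higher weights, and candidates who describe the same work in different words score lower.

```python
from collections import Counter

# Toy historical data: past hires were overwhelmingly male, so the vocabulary
# of "successful" resumes skews toward terms those candidates happened to use.
past_hire_resumes = [
    "executed project captured requirements led team",            # male hire
    "executed migration captured metrics shipped features",       # male hire
    "executed rollout captured feedback built pipeline",          # male hire
    "collaborated on project mentored interns built pipeline",    # female hire
]

# "Train": weight each term by how often it appears among past hires.
term_weights = Counter()
for resume in past_hire_resumes:
    term_weights.update(resume.split())

def score(resume: str) -> int:
    """Score a new resume by summing the learned term weights."""
    return sum(term_weights[t] for t in resume.split())

# Two equally strong candidates described with different vocabulary.
candidate_a = "executed launch captured insights built pipeline"
candidate_b = "coordinated launch gathered insights built pipeline"

print("candidate A:", score(candidate_a))  # higher: rewards "executed"/"captured"
print("candidate B:", score(candidate_b))  # lower, despite equivalent substance
```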
Facial recognition is an example of an AI technology that drew severe criticism early on because of biased results but that has improved substantially since then, according to research by the National Institute of Standards and Technology (NIST). One researcher uncovered significant gender and racial bias in early facial recognition AI systems, which performed substantially better on male faces than on female faces. The systems had error rates of about 1 percent for lighter-skinned men but about 35 percent for darker-skinned women. Notably, several early systems also failed to correctly classify the faces of Oprah Winfrey, Michelle Obama, and Serena Williams, identifying them, for example, as a gentleman, a young boy, or a young man, and misinterpreting their hair as a cap or a headpiece.
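Audits of this kind work by computing error rates separately for each demographic group rather than reporting a single overall number. A minimal Python sketch of such a disaggregated evaluation might look like the following (the records and rates are invented to mimic the reported gap, not drawn from NIST’s or any researcher’s data):

```python
from collections import defaultdict

# Toy audit records: (demographic group, classifier_was_correct).
audit_results = (
    [("lighter-skinned men", True)] * 99 + [("lighter-skinned men", False)] * 1 +
    [("darker-skinned women", True)] * 65 + [("darker-skinned women", False)] * 35
)

# Tally outcomes per group instead of reporting only one overall accuracy.
totals, errors = defaultdict(int), defaultdict(int)
for group, correct in audit_results:
    totals[group] += 1
    if not correct:
        errors[group] += 1

overall_error = sum(errors.values()) / len(audit_results)
print(f"overall error rate: {overall_error:.1%}")          # looks tolerable
for group in totals:
    print(f"{group}: {errors[group] / totals[group]:.1%}")  # reveals the disparity
```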
Although facial recognition is now a feature of most smartphones and the software’s accuracy today is much improved, just a few years ago the technology was the subject of significant controversy because it misidentified Black people at higher rates than white people. In 2019, San Francisco banned law enforcement’s use of facial recognition technology. Within two years of the San Francisco ban, at least 16 municipalities enacted similar local bans, primarily because of perceived AI bias. Indeed, California enacted a three-year statewide ban on the use of the technology starting in January 2020. This year, in January 2024, bills were filed in the New York Assembly and Senate to ban law enforcement’s use of facial recognition and other biometric surveillance technology. Due to improvements in the technology, however, the opposition now centers more on individual privacy than on AI bias.
AI risk assessment tools for defendants facing criminal charges and predictive AI applications for police patrol practices have also been criticized. Algorithms used in risk assessments to predict the chances that an individual would commit another crime were later found to be biased against Black people. Unfortunately for the subjects of those risk assessments, the scores were used to determine bail, sentencing, and parole.
Predictive AI applications are used to identify specific areas as hot spots where officers should expect trouble while on patrol, and, based on the predictions, police allocate resources to meet the anticipated need. The early predictive AI applications were trained on historical data, which, not surprisingly, gave priority to the areas where more arrests had historically occurred. Unfortunately, practice indicates that additional police resources in those areas increased the likelihood that police would stop or arrest people in the same locations, thus reinforcing the historical pattern and whatever biases were baked into the training data.
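The feedback loop described above can be made concrete with a toy simulation. In the Python sketch below (all numbers invented), two neighborhoods have identical underlying offense rates but different historical arrest counts; patrols are allocated in proportion to recorded arrests, and patrol presence, not underlying crime, drives new recorded arrests, so the gap in the record widens every year and appears to confirm the original allocation.

```python
# Two neighborhoods with the SAME underlying offense rate but different
# historical arrest counts (e.g., because of past enforcement patterns).
recorded = {"Neighborhood A": 120, "Neighborhood B": 60}
TOTAL_PATROLS = 100
ARRESTS_PER_PATROL = 1.0   # identical in both areas: true offense rates are equal

for year in range(1, 6):
    total = sum(recorded.values())
    # Allocate this year's patrols in proportion to the historical record...
    patrols = {area: TOTAL_PATROLS * n / total for area, n in recorded.items()}
    # ...and let patrol presence, not underlying crime, drive new recorded arrests.
    for area in recorded:
        recorded[area] += patrols[area] * ARRESTS_PER_PATROL
    gap = recorded["Neighborhood A"] - recorded["Neighborhood B"]
    print(f"year {year}: A={recorded['Neighborhood A']:.0f}, "
          f"B={recorded['Neighborhood B']:.0f}, gap={gap:.0f}")
```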
An often-cited 2016 investigative article by ProPublica reviewed several instances of questionable risk assessment scores. One of the score discrepancies on which the article based its conclusions of AI bias concerned the cases of Brisha Borden and Vernon Prater, who were arrested on separate occasions in Florida. Borden, an 18-year-old Black female, was arrested when she and a friend grabbed an unlocked bicycle and scooter in their neighborhood and started to ride away. Before they got away, the two were arrested and charged with burglary and petty theft of items valued at $80. Borden had four juvenile misdemeanor offenses in her past. She received a risk assessment score of 8 out of 10, indicating high risk. Prater, a 41-year-old white male, was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store. Prater’s previous record included an attempted armed robbery offense and two armed robbery offenses, for which he had served five years in prison. Prater received a score of 3 out of 10, indicating low risk. Everyone agreed the AI risk assessment got it wrong: Borden had no subsequent offense, but Prater went on to commit grand theft.
Final Comments
Again, as expressed above, this article aims to raise readers’ awareness so they remain vigilant against unquestioningly accepting AI results. To the extent that AI bias exists, Sam Altman, a co-founder and the CEO of OpenAI, the creator of ChatGPT, has offered his opinion that AI systems will eventually fix themselves. He bases that conclusion on a technique called RLHF (reinforcement learning from human feedback). Altman accepts that early AI systems may have reinforced their creators’ biases.
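For context, RLHF works roughly as follows: the system generates candidate responses, human reviewers indicate which of two responses they prefer, a “reward model” is fit to those preferences, and the AI system is then tuned toward responses the reward model scores highly. The toy Python sketch below illustrates only the preference-learning step, using a simple Bradley–Terry-style fit and reranking in place of the reinforcement-learning fine-tuning used in production systems; the responses and preference data are invented.

```python
import math
import random

random.seed(0)

# Candidate responses the system could give (invented for illustration).
responses = ["response_a", "response_b", "response_c"]

# Human feedback: pairs of (preferred, rejected) responses from reviewers.
preferences = (
    [("response_b", "response_a")] * 8
    + [("response_c", "response_b")] * 7
    + [("response_c", "response_a")] * 9
)

# Fit one reward score per response with a Bradley-Terry model:
# P(winner preferred over loser) = sigmoid(reward[winner] - reward[loser]).
reward = {r: 0.0 for r in responses}
LEARNING_RATE = 0.05
for _ in range(2_000):
    winner, loser = random.choice(preferences)
    p = 1.0 / (1.0 + math.exp(-(reward[winner] - reward[loser])))
    # Gradient ascent on the log-likelihood of the observed preference.
    reward[winner] += LEARNING_RATE * (1.0 - p)
    reward[loser] -= LEARNING_RATE * (1.0 - p)

# "Steer" the system: pick the response the learned reward model scores highest
# (real RLHF instead fine-tunes the model itself toward high-reward outputs).
print({r: round(s, 2) for r, s in reward.items()})
print("chosen response:", max(responses, key=reward.get))
```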
Unfortunately, even if we reach the point at which AI systems correct their built-in biases, another type of bias could defeat that outcome even when the AI application provides an accurate prediction: human review bias. That occurs when the AI makes a correct prediction, but the human reviewer negates the result with their own bias. For example, the reviewer concludes the AI prediction is wrong because the reviewer knows the people in the designated neighborhood and is confident they would never act the way the AI system predicts. In such a situation, all the work done to produce an AI that predicts correctly would be undone by human review bias, the very problem that led us to look to computers for a fairer result in the first place.