The effect of preconceptions on the results of machine learning

In her book “Weapons of Math Destruction”, Cathy O’Neil argues that a “healthy” data model should be transparent, continuously updated, and statistically rigorous. Harmful models, by contrast, often lack data that is directly relevant to the outcome being predicted and substitute proxies for it instead.
One example of a model perpetuating inequality is an algorithm used in the United States to predict prisoner recidivism, called Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS [1], the product of a for-profit company, Northpointe. The idea was to make criminal sentencing fairer by removing human bias.
The COMPAS software used defendants’ answers to a questionnaire to predict how likely they were to reoffend. The questionnaire included questions such as:
“Was one of your parents ever sent to jail or prison?”
“How many of your friends/acquaintances are taking drugs illegally?”
“How many of your friends/acquaintances served time in jail or prison?”
“In your neighborhood, have some of your friends or family been crime victims?”
“Do some of the people in your neighborhood feel they need to carry a weapon for protection?”
One can see how these types of questions, when used to assess recidivism, would further marginalize people from less privileged backgrounds, causing them to receive harsher sentences than someone from a more privileged background who had committed the same crime. At the same time, someone who is likely to commit more violent offences might be treated more leniently because of their background, potentially leading to more victims of violent crime.
In 2009, Northpointe co-founder Tim Brennan and colleagues published a validation study, according to which their algorithm had an accuracy rate of 68 percent in a sample of 2,328 people. Brennan says “it is difficult to construct a score that doesn’t include items that can be correlated with race — such as poverty, joblessness and social marginalization”, and removing those factors reduced accuracy.
According to an article published by ProPublica [1], black defendants were nearly twice as likely as white defendants to receive a score indicating a high risk of reoffending and yet not go on to do so. ProPublica’s analysis of data from Broward County, Florida also showed the opposite mistake for white defendants: they were much more likely to be labeled lower risk yet go on to reoffend.
Prediction outcome | White | Black |
Labeled Higher Risk, But Didn’t Re-Offend | 23.5% | 44.9% |
Labeled Lower Risk, Yet Did Re-Offend | 47.7% | 28.0% |
Results from ProPublica’s analysis of data from Broward County, Florida
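To make the table’s two error rates concrete, here is a minimal Python sketch using made-up confusion-matrix counts (not ProPublica’s data or Northpointe’s model) of how a score can have identical overall accuracy for two groups while its false positive rate (labeled higher risk but didn’t reoffend) and false negative rate (labeled lower risk yet did reoffend) diverge sharply:

```python
# Illustrative sketch only: the confusion-matrix counts below are invented to
# show the calculation; they are not ProPublica's data or Northpointe's model.

def error_rates(counts):
    """Return (false positive rate, false negative rate, accuracy) for one group."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    fpr = fp / (fp + tn)                        # labeled higher risk, but didn't reoffend
    fnr = fn / (fn + tp)                        # labeled lower risk, yet did reoffend
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return fpr, fnr, accuracy

# Two hypothetical groups scored by the same model, each with 75% overall accuracy
groups = {
    "group A": {"tp": 300, "fp": 100, "tn": 450, "fn": 150},
    "group B": {"tp": 450, "fp": 200, "tn": 300, "fn": 50},
}

for name, counts in groups.items():
    fpr, fnr, acc = error_rates(counts)
    print(f"{name}: accuracy {acc:.0%}, "
          f"high risk but didn't reoffend {fpr:.0%}, "
          f"low risk yet did reoffend {fnr:.0%}")
```

A single overall accuracy figure, such as the 68 percent reported in Northpointe’s validation study, cannot reveal this kind of asymmetry on its own; exposing it requires the group-by-group comparison that ProPublica performed.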
Another example of a model where proxies were used to measure something immeasurable is the teacher evaluation program IMPACT, developed by education reformer Michelle Rhee in Washington, D.C. According to The Washington Post, “for some teachers, half of their appraisal is contingent on whether students meet predicted improvement targets on standardized tests.” [2] It is probably obvious to anyone that students’ scores are affected by far more than their teacher. Nevertheless, these evaluations led to the firing of 206 teachers in District of Columbia public schools at the end of the 2009-2010 school year. The algorithmic scores outweighed positive reviews from school administrators and students’ parents, likely leading to the dismissal of competent teachers.
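As a purely hypothetical illustration of how a heavily weighted test-score component can dominate an appraisal, the sketch below combines an invented value-added estimate with classroom observation scores at a 50/50 weighting. The scale, inputs, and formula are assumptions made here for illustration, not the actual IMPACT calculation.

```python
# Hypothetical illustration only: the 50/50 weighting follows the Washington Post
# description quoted above, but the scale, inputs, and formula are invented here;
# they are not the actual IMPACT calculation.

def overall_score(value_added, observations, w_value_added=0.5):
    """Weighted average of a test-score-based estimate and observation scores (1-4 scale)."""
    return w_value_added * value_added + (1 - w_value_added) * observations

# A teacher with excellent classroom observations but a poor value-added estimate
# (which can swing on factors outside the teacher's control) lands near a low cutoff.
score = overall_score(value_added=1.5, observations=3.8)
print(f"overall score: {score:.2f}")   # 0.5 * 1.5 + 0.5 * 3.8 = 2.65
```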
These are just a couple of examples of big data algorithms perpetuating inequality or otherwise leading to undesirable outcomes. Similar algorithms are used for many other purposes, such as hiring decisions by companies, university admissions, and granting people insurance or loans.
In conclusion, while the use of machine learning opens up many possibilities for advancing scientific understanding and improving society, we have to make sure that models are built on rigorous, relevant data rather than proxies. We should also keep in mind the assumptions and oversights built into models, especially when their use is widespread and can affect people in life-altering ways. This is only possible when the workings of a model are transparent and the data it uses is continuously updated.
Read more:
[1] https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[2] “206 low-performing D.C. teachers fired”, The Washington Post
About the author
Anna Lohikko is an expert in advanced data analytics.