Materializing Data Bias (2) | FMP
18/7/21–25/7/21
🤝 Teammates: Tiana Robison, Jinsong (Sylvester) Liu, Luchen Peng.
This week, our group mainly focused on learning how algorithms become biased, and we invited members of the public (our target audience) to a workshop to learn about their attitudes toward algorithms.
Literature Review and Interview with Experts
To figure out how bias arises in algorithms, we conducted a literature review and interviewed related experts online. Combining the literature with the interviews, we found two sources of bias. On the one hand, when programmers build algorithmic models, they can unconsciously build their own biases into them. On the other hand, data bias occurs when some groups lack representation in a data set, and an algorithm trained on that data becomes biased as well. Because the decision-making process of an algorithm is largely a black box, our group decided to focus on the algorithmic bias that is caused by data bias.
Mr. Choi: “There is no such thing as unbiased data at all.”
Hassan: “We are supposed to collect diverse, fair data, but these data often might not be available.”
Synthesizing discoveries
After deciding on our sub-area (data bias), I summarized our findings.
1. Bias often happens at the data-feeding stage (rubbish in, rubbish out)
When data is labeled or classified, data scientists often bring their own biases into it. For instance, most demographic data ends up labeled on the basis of simplistic, binary female-male categories. When gender classification collapses gender in this way, it reduces the potential for AI to reflect gender fluidity and self-held gender identity.
2. A problem within a problem
Data is itself biased because it reflects the biases of the society that produced it, and an algorithm trained on it learns those biased relationships. For example, the original data used by Amazon’s recruitment system was the company’s past employee data (Dastin, 2018). In the past, Amazon hired more men. The algorithm learned this feature of the data set, so it was more likely to overlook female job seekers when making decisions.
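A minimal sketch of this mechanism, written in Python. Everything in it is an assumption made for illustration: the records are made-up numbers, the group labels are placeholders, and `fit_hire_rates` and `recommend` are hypothetical helpers. It is not how Amazon’s system worked; it only shows that a model which learns from skewed past decisions will repeat the skew.

```python
from collections import defaultdict

def fit_hire_rates(records):
    """'Train' a toy model by memorising the past hire rate of each group."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for group, was_hired in records:
        seen[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / seen[group] for group in seen}

def recommend(rates, group, threshold=0.5):
    """Recommend a candidate only if their group was usually hired before."""
    return rates.get(group, 0.0) >= threshold

# Made-up historical records (group, hired?), skewed toward one group.
history = (
    [("men", True)] * 80 + [("men", False)] * 20
    + [("women", True)] * 10 + [("women", False)] * 30
)

rates = fit_hire_rates(history)
for group in ("men", "women"):
    print(group, round(rates[group], 2), recommend(rates, group))
```

The toy model never sees any rule about gender. It only sees who was hired before, and that is enough for the old imbalance to become the new decision.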
3. The data collection dilemma
Although many data scientists have recognized the dangers of lacking representative data, they still face a dilemma: the data is often not available. In the medical field, Cahan et al. (2019) point out that collecting data from mobile phones or wearable devices makes the data homogeneous and lacking in demographic diversity, because the poor, the elderly, and the disabled, who might benefit most from optimized medical interventions, are the least likely to use those platforms to generate data. Meanwhile, marginalized populations may refuse to consent to data collection because of historical mistreatment. Similar dilemmas arise in other areas. Moreover, as Hassan expressed, many companies are unwilling to fund data collection because of their commercial interests.
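One way to make a lack of representation visible is to compare each group’s share of a data set with its share of the population the data is meant to describe. The Python sketch below does this under loud assumptions: the age bands, the sample, and the reference shares are all invented for illustration (they are not taken from Cahan et al.), and `representation_gap` is a hypothetical helper.

```python
from collections import Counter

def representation_gap(samples, reference_shares):
    """Return (share in the data set) minus (share in the reference
    population) for every group listed in reference_shares."""
    counts = Counter(samples)
    total = len(samples)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }

# Made-up sample, e.g. users of a hypothetical wearable device by age band.
samples = ["18-39"] * 70 + ["40-64"] * 25 + ["65+"] * 5

# Made-up shares of the same age bands in the wider population.
reference_shares = {"18-39": 0.40, "40-64": 0.40, "65+": 0.20}

gaps = representation_gap(samples, reference_shares)
for group, gap in gaps.items():
    # A negative gap means the group is under-represented in the data.
    print(f"{group}: {gap:+.2f}")
```

A check like this only shows that a gap exists. It does not solve the dilemma described above, because closing the gap still depends on people being able and willing to provide data.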
References
Cahan, E. M. et al. (2019) “Putting the data before the algorithm in big data addressing personalized healthcare,” npj Digital Medicine, 2(1), pp. 1–6. doi: 10.1038/s41746-019-0157-2.
Crawford, K. and Paglen, T. (2019) Excavating AI: The Politics of Training Sets for Machine Learning. Available at: https://excavating.ai/ (Accessed: 14 July 2021).
Dastin, J. (2018) “Amazon scraps secret AI recruiting tool that showed bias against women,” Reuters, 10 October. Available at: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G (Accessed: 15 July 2021).
Kalluri, P. (2020) “Don’t ask if artificial intelligence is good or fair, ask how it shifts power,” Nature. Available at: https://www.nature.com/articles/d41586-020-02003-2.