To understand AI Bias, we need to understand Dataset Bias. Collecting, labelling, and organizing data is time-consuming and expensive. Many popular datasets in the artificial intelligence community can take years to produce and publish, and the resources this demands mean that creating a dataset is never a small or quick task. Since it’s impractical to build a dataset that covers every possible domain and variation, every dataset carries some form of bias. A model trained on that data performs worse, and generalizes poorly, on the domains the data leaves out.
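To make that last point concrete, here is a minimal toy sketch rather than a real benchmark. Everything in it is assumed for illustration: the one-number "feature", the two hand-written domains, and the helper names `fit_threshold` and `accuracy`. A classifier is fit on one domain and then scored on a shifted domain it never saw.

```python
from statistics import mean

def fit_threshold(samples):
    """Fit a toy classifier: the midpoint between the two class means."""
    class0 = [x for x, y in samples if y == 0]
    class1 = [x for x, y in samples if y == 1]
    return (mean(class0) + mean(class1)) / 2

def accuracy(threshold, samples):
    """Fraction of samples where 'x > threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in samples)
    return correct / len(samples)

# Hypothetical data: (feature value, label) pairs from the domain we collected.
seen_domain = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (7, 1)]
# The same classes in a domain the dataset never covered (every feature shifted by 3).
unseen_domain = [(x + 3, y) for x, y in seen_domain]

threshold = fit_threshold(seen_domain)
seen_accuracy = accuracy(threshold, seen_domain)
unseen_accuracy = accuracy(threshold, unseen_domain)

print(f"threshold learned from the seen domain: {threshold}")
print(f"accuracy on the seen domain:   {seen_accuracy:.2f}")
print(f"accuracy on the unseen domain: {unseen_accuracy:.2f}")
```

The classifier is perfect on the domain it was fit to and drops on the shifted one, even though the classes themselves are just as separable there. Nothing is wrong with the model; the data simply never showed it that domain.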

The simple answer is to create more data, but…

