25 Jun 2021

Controlled vocabularies should no longer be created and used because they are biased.

Blogpost by Alison Kindegran. MLIS student at UCD. Graduating in 2022.

"LCSH" by Travelin' Librarian / CC BY-NC 2.0.


Controlled vocabularies are organised arrangements of words and phrases used to retrieve items through navigation and searching. The purpose of this essay is to form an opinion on the statement:

“Controlled vocabularies should no longer be created and used because they are biased.”

To form a well-rounded opinion, extensive research was conducted. While it was easy to find many articles and journals agreeing that controlled vocabularies are biased and therefore should no longer be used, the argument for continuing to use controlled vocabularies was under-represented. Far more challenges were noted than benefits. However, the writer of this essay took all points into consideration and did not allow the majority view to sway their opinion from the outset. Instead, the writer debated both sides, using supporting articles for each, to understand both positions and ultimately to form a conclusion.

Benefits of Controlled Vocabularies:
The main benefits are:

•    Users with limited knowledge of a topic:
Even if a user has limited knowledge of the topic they wish to search, the main benefit of controlled vocabularies is that once they have the heading, all classifications or variants under it will be found.

•    Vocabulary Deficits:
Controlled vocabularies help with vocabulary deficits, where a user knows only a limited number of terms for a subject. If the user has just one term, the search will still retrieve results and show the additional terms available. This adds new terminology to the user’s vocabulary, which helps to close the deficit in future searches and is a further benefit to the user.

•    Topics/Concepts Covered:
Controlled vocabularies give assurance about coverage: if a subject heading is listed for an article, the topic or concept is covered in that article. Controlled vocabularies can also make a search more specific. The hierarchical structure runs from broad to specific, so the search is narrowed down as the user moves through the headings.
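The benefits above can be illustrated with a minimal sketch in Python. It assumes a tiny, hand-made vocabulary: the headings, variants and the function names find_heading and expand are hypothetical and are not taken from LCSH or any real thesaurus. It shows how a single known term can lead a user to the preferred heading, and how the hierarchy can then be followed from broad to specific.

```python
# Hypothetical vocabulary for illustration only; not taken from LCSH
# or any real controlled vocabulary.
VOCABULARY = {
    "Motor vehicles": {
        "variants": {"vehicles", "motor cars"},
        "narrower": ["Automobiles", "Motorcycles"],
    },
    "Automobiles": {
        "variants": {"cars", "autos"},
        "narrower": ["Electric automobiles"],
    },
    "Electric automobiles": {
        "variants": {"electric cars", "evs"},
        "narrower": [],
    },
    "Motorcycles": {
        "variants": {"motorbikes"},
        "narrower": [],
    },
}


def find_heading(term):
    """Return the preferred heading for a user's term, or None if unknown."""
    wanted = term.strip().lower()
    for heading, entry in VOCABULARY.items():
        if wanted == heading.lower() or wanted in entry["variants"]:
            return heading
    return None


def expand(heading):
    """Return the heading followed by every narrower heading beneath it."""
    result = [heading]
    for child in VOCABULARY[heading]["narrower"]:
        result.extend(expand(child))
    return result


if __name__ == "__main__":
    heading = find_heading("cars")
    print(heading)
    print(expand(heading))
```

A user who only knows the word "cars" is taken to the heading "Automobiles" and is also shown the narrower heading beneath it, which is the vocabulary-deficit benefit described above.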

Challenges of Controlled Vocabularies:
The main challenges are:

•    Synonymous Concepts:
The most commonly used examples of the synonym challenge are soda, pop, soda pop and coke. These words often represent the same idea or thing. However, they are used differently in different regions, and some regional dialects use different terms altogether. The author would not use these words at all. As the author is from Ireland, the words used would generally be fizzy drink or mineral. Also, in Ireland a particular brand or drink type is not used interchangeably to refer to a number of drinks; it refers only to the drink named. For example, in the list above coke refers to any fizzy drink, which seems to be acceptable in the USA. In Ireland, however, Coke refers only to Coca-Cola. It is considered completely different to Fanta Orange (an alternative brand and drink type), and the two names would not be used interchangeably.

•    Word Form:
Word form is also a challenge. An example is the word “online”. The writer would write it as a single word with no hyphen. However, other forms are also acceptable, such as “on line” and “on-line”.

•    Homographs:
Homographs are words that look the same but have different meanings; they may or may not be pronounced the same. The pronunciation is not the issue. The issue is that different meanings share the same spelling, which can return the wrong results. For example, if a user searched for bat meaning the animal, the results would also include bat meaning the sports equipment. One could argue that the use of qualifiers removes this issue, for example: bat (mammal). Similar issues arise with homophones, words that sound the same but are spelled differently, such as fowl and foul.
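The three challenges above can be sketched in Python as follows. This is only an illustration under stated assumptions: the headings, the variant lists and the function names normalise, build_index and lookup are hypothetical and do not come from any real controlled vocabulary. Synonyms are mapped to one preferred heading, word forms are collapsed by removing spaces and hyphens, and a homograph returns every qualified heading so the user can choose between them.

```python
import re

# Hypothetical headings and variants, for illustration only.
HEADINGS = {
    "Online": ["online", "on line", "on-line"],
    "Soft drinks": ["soda", "pop", "soda pop", "fizzy drink", "mineral"],
    "Bat (mammal)": ["bat"],
    "Bat (sports equipment)": ["bat"],
}


def normalise(term):
    """Lower-case a term and drop spaces and hyphens so word forms collapse."""
    return re.sub(r"[\s\-]+", "", term.strip().lower())


def build_index(headings):
    """Map each normalised variant to the list of headings it may refer to."""
    index = {}
    for heading, variants in headings.items():
        for variant in variants:
            matches = index.setdefault(normalise(variant), [])
            if heading not in matches:
                matches.append(heading)
    return index


INDEX = build_index(HEADINGS)


def lookup(term):
    """Return all headings matching the term; more than one means a homograph."""
    return INDEX.get(normalise(term), [])


if __name__ == "__main__":
    for term in ["on-line", "Fizzy drink", "bat"]:
        print(term, "->", lookup(term))
```

Removing spaces and hyphens is a deliberately simple choice. In a larger vocabulary it could merge terms that should stay separate, so a real system would need a more careful rule.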

The above is not an exhaustive list but an example of the main benefits and challenges with controlled vocabularies. So where is the discussion or argument for bias?

The writer would argue that the bias begins with ignoring that it exists. In the research carried out, those who overall advocate for the use of controlled vocabularies noted challenges but did not address bias at all. This is a concern, as failing to address the concerns of those who believe that controlled vocabularies are biased does not help their efforts to portray the benefits and advocate for their use. By and large, those who advocated for the use of controlled vocabularies were libraries, universities, or individuals associated with libraries or universities where controlled vocabularies are used and/or taught as a subject. The writer of this essay believes that, because they use and/or teach controlled vocabularies, they do not wish to portray them in any negative light. However, the writer believes that anyone can advocate for something while still addressing any issues it has, such as bias.

Ways in which controlled vocabularies are biased:
•    Outdated Terminology:
This is particularly the case in racial categories. They are not reflective of current terms or inclusive of all racial groups.

For those who identify as non-binary, the LCSH term “gender non-conforming people” is given as an exact match for “non-binary people”. Being gender non-conforming is not the same as being non-binary, although some people will identify with both terms. People do not have to be non-binary in order to be gender non-conforming.

•    Too simple in terms and non-representative:
An example of this is that not all First Nations groups are included in the LCSH or the Library of Congress Name Authority File.

•    Non-Proactive Approach:
Most would accept that historic terms for certain groups of people can be inaccurate, non-representative and offensive, and so would expect changes to be made. However, a major issue is the non-proactive approach to making these changes. LCSH, as an example, has been slow to update its terms. A number of freely available resources, including representative bodies or groups, could help those responsible make these changes if they are not fully educated on the correct terms.

•    Language Vocabularies: Not all languages are included.

With a number of clear biases found, as set out above, consideration was given to what the alternative would be if controlled vocabularies were no longer created or used. The alternative would be keywords. Using keywords, also known as non-controlled vocabulary or natural language, has advantages and disadvantages of its own, which are outside the scope of this essay. In summary, keyword searches would likely return broader results, which may include unrelated topics.

It is difficult to portray in general terms the effect that these biases have on the people they touch. To highlight that effect, the writer will include an individual focus piece as part of this essay. The piece will focus on one particular individual and their experiences.

Individual Focus: Safiya Umoja Noble

Safiya Umoja Noble is an Associate Professor at UCLA in the Departments of Information Studies and African American Studies. She is also the author of the book Algorithms of Oppression: How Search Engines Reinforce Racism (NYU Press). She has also given a number of TEDx talks, including one entitled “How biased are our algorithms”.

Noble talks about her mixed-race background and how she grew up during a time of cultural revolution. The 1970s were a time when the civil rights movement was highly active and the Black Power Movement was “creating change for how African Americans saw themselves” (Noble, 2014).

Her white mother was aware of a study done in the 1940s in which black children were given a black doll and a white doll and asked which doll was the best, which was the prettiest, and which they liked the most. The black children would all pick the white doll. Her mother tried to instil in her daughter pride in her own black heritage and did not want her daughter to have negative thoughts and feelings towards her own race.

This experience led Noble to research this further in 2009. One of the first searches she conducted, in 2011, was a simple entry into Google: “black girls”. The results were pornographic in nature and sexualised black girls. In contrast, a search for “white girls” returned general pictures of blonde-haired, blue-eyed girls, with no implied sexualisation. She worked to change this, and only months later the algorithm changed; a search for “black girls” no longer returns such results.

She advocates for the pursuit of socially responsible information and technology. She talks about the exploitation of children in the Democratic Republic of Congo, where they work in deplorable conditions to source parts required for technology. She wants everyone to consider the part they play in all of the above.

In her book, Algorithms of Oppression: How Search Engines Reinforce Racism (NYU Press), she talks about how the public generally views what they find on places like Google as credible and fair. However, her research found that the results were largely misrepresentative: racist and sexist algorithmic bias takes marginalised and oppressed people and marginalises and oppresses them further. She talks about how search is designed to make some voices heard and others silenced. She finishes the book by seeking alternatives. She argues that there need to be search tools that serve the public interest and are not driven by marketing and money-making schemes.

The writer of this essay admits that they too would have considered Google, or any other search engine, to be credible and fair. While the writer would not consider it the best resource for research, it is the most used way to source information in the modern digital age. The phrase “Google it” is often used as a response to a query raised by others, rather than a suggestion of an alternative search engine or other resource. It has become an easy and accepted source of information for mass use. The writer has used it for personal use, and to find that the results were most likely skewed in a way that further marginalised people is alarming and seriously problematic. It has truly opened the writer’s mind to the impact of this on those it affects and on the public at large.


There is no doubt that bias exists within controlled vocabularies. It has arisen from historical bias, intentional bias, unintentional bias and unconscious bias. The individual focus piece exposed the problems further. However, it is the writer’s belief that the use of controlled vocabularies remains beneficial. The writer believes that with a number of changes the bias can be removed and the benefits of controlled vocabularies will remain. Ending the use of controlled vocabularies does not address the biases but rather shifts them somewhere else. That is why the recommendations focus on removing bias and making better, more inclusive controlled vocabulary options. The writer offers a number of recommendations to address and remove forms of bias within controlled vocabularies. The benefits can be advocated for while the issues are being fixed. It does not have to be one or the other.


Education: While some bias will have been intentional, the writer would like to believe that this is historical and that no truly inclusive database owners, creators or users would want to continue with these biased terms. It is entirely possible that the desire to create change is there but the knowledge of the correct terms is not. This is where education around the terms will be required. There are a number of international and local groups that can assist with correct terminology, and they should be sought out and collaborated with to bring about these changes.

Support the change: Change is a difficult transition for most people and is usually met with resistance. In order to support those affected by this bias, first of all, support change. This is an opportunity for those affected by bias to be supported and, in turn, to support those making the changes with guidance and education.

Evolving terminology: Understand that this change will be ongoing. A term that was acceptable decades ago can be considered outdated or offensive today. Understand that the same will be true for words used right now. They will become obsolete, and new terms will inevitably develop over time.

Understanding: Just because a term does not offend or affect one individual does not mean it is appropriate for those who are part of a particular community or group. Don’t use outdated terminology and don’t accept it within your controlled vocabulary or indeed anywhere else. Advocate for your family, friends, colleagues and even strangers who are part of these marginalised communities.

Accept Feedback: Provide an option for users to give feedback on existing controlled vocabularies. Those who are affected are best placed to assist with change.

Show Progress: All terms which are under review for removal, and areas where new terms are to be created, could be highlighted. The changes will take time and will be an ongoing challenge. Let everyone know that the bias is being challenged and addressed from within.

Proactive: This is one of the most important steps. Without a proactive approach to making the required changes, the argument turns in favour of no longer using controlled vocabularies because of bias.


