AI for SDGs - It's All About the Data

I recently came across the following article: “Surveying Attitudinal Alignment between Large Language Models vs. Humans towards 17 Sustainable Development Goals.” For those who aren’t aware, in 2015 the United Nations endorsed the “2030 Agenda for Sustainable Development” that included 17 Sustainable Development Goals (SDGs), along with 169 specific targets, aimed at addressing a range of global development challenges. As the name suggests, the goal was to achieve the SDGs by the year 2030. Last year, however, the UN reported that because of the effects of COVID-19 we have fallen behind on our progress towards achieving the SDGs by 2030:

Of the approximately 140 targets that can be evaluated, half of them show moderate or severe deviation from the desired trajectory. Furthermore, more than 30 per cent of these targets have experienced no progress or, even worse, regression below the 2015 baseline.

According to the report, the impacts from the COVID-19 pandemic stalled three decades of steady progress in reducing extreme poverty, with the number of people living in extreme poverty increasing for the first time in a generation.

More optimistically, a paper published in Nature Communications entitled “The Role of Artificial Intelligence in Achieving the Sustainable Development Goals” found that AI could help achieve 79% of the SDG targets.

While AI could help achieve 79% of the SDG targets, the authors of the attitudinal alignment article ask whether it actually will. Their premise is straightforward: “LLMs have the potential to enhance productivity and innovation, promoting the achievement of SDG targets associated with decent work and economic growth. However, if the benefits are unevenly distributed or disproportionately reward those with higher skills, the deployment of LLMs may exacerbate inequalities, widening the gap between high-income and low-income individuals.”

It's All About the Data

What really stood out to me was the prominent role that the authors believe data and datasets will play in ensuring that AI is a net benefit towards achieving the UN’s SDGs by 2030. It is fair to say that the lack of actionable and unbiased datasets is the single biggest impediment to AI actually helping us achieve the SDGs, and it could even lead AI to make things worse. Here is what the authors say about data, goal by goal:

SDG 1 No Poverty – “there are biases in data collection and analysis. The datasets relied upon by LLMs often contain biases, leading to disparities between the understanding of poverty situations and the actual conditions. Data related to impoverished areas are often prone to being missing or incomplete.”
SDG 2 Zero Hunger – “Since LLMs are usually trained on data from different sources around the globe, they can not adequately take into account the cultural, social and traditional aspects of a particular region. This can lead to bias or incomplete consideration in proposing solutions.”
SDG 3 Good Health and Well-being – “LLMs are susceptible to social biases when collecting data from the internet, as viewpoints on the internet inherently carry biases.”
SDG 4 Quality Education – “LLMs can be constrained by language and cultural biases in training data, leading to outputs that favor mainstream cultures or languages while overlooking the needs of other cultures and languages, specially in extensive underdeveloped and island countries.”
SDG 5 Gender Equality – “Gender inequality in human society can also be reflected in data. For example, traditionally, men are more represented in the field of technology, leading to gender inequality in certain datasets.”
SDG 7 Affordable and Clean Energy – “LLMs can be constrained by the training data, potentially failing to capture specific circumstances and constraints in all regions. For instance, models trained on large-scale datasets can overlook situations where small communities cannot utilize clean energy due to geographical or cultural factors.”
SDG 8 Decent Work and Economic Growth – “If the training data for LLMs mainly come from mainstream groups or specific social classes’ economic data, their evaluations will reflect the government’s or organization’s stance and preferences, leading to bias towards a certain economic theory or political position when assessing and recommending strategies for decent work and economic growth.”
SDG 9 Industry, Innovation and Infrastructure – “the availability and quality of data constrain LLMs, leading to biases in understanding these issues.”
SDG 10 Reduced Inequality – “data bias poses a challenge. LLMs are constrained by the data they are trained on, which can lead to biases, resulting in a lack of full understanding of inequality situations for certain groups or regions.”
SDG 11 Sustainable Cities and Communities – “LLMs can be influenced by data biases, resulting in an inadequate understanding of the realities within cities and communities.”
SDG 12 Responsible Consumption and Production – “decisions made by LLMs can be influenced by biases or incomplete training data. For example, if training data has biases against specific communities or regions, the model can tend to recommend or evaluate consumption and production methods tailored to these groups while overlooking the needs and impacts of other groups.”
SDG 13 Climate Action – “while LLMs are trained on vast amounts of information and analyze complex data, outdated information within LLMs can compromise the integrity and accuracy of analyses and insights. Furthermore, reliance on large datasets by LLMs can lead to erroneous conclusions and recommendations due to the presence of outdated or inaccurate information, thereby undermining the reliability of the decision-making process.”
SDG 16 Peace, Justice, and Strong Institutions – “These tendencies stem from their training data, posing challenges to ensuring neutrality in social justice and legal contexts. Clearly, training data for LLMs can reflect biases of specific races, regions, or cultures. This can lead to biases in the models’ understanding of peace, justice, and institutional power across different cultural backgrounds.”
SDG 17 Partnerships for the Goals – “Ideally, ChatGPT’s attitude is neutral; it is neither personalized nor biased but relies on its training data to provide information and analysis. … However, in reality, its training data can contain biases, leading to skewed or incomplete representations of sustainable development issues. It is generally believed in the existing research, LLMs like ChatGPT are prone to embedding biases present in their training data.”