Technology

Responsible use of data

Reliable figures offer indispensable information for getting a clear idea of a country’s level of development and promoting progress. The conventional method of data collection is to actively request and gather data, but information technology (IT) is now making available large amounts of automatically generated data. Huge data sets are referred to as big data because of their volume and complexity. The questions are how – and even whether – to use them.

by Tobias Knobloch

by Julia Manske

11.01.2017

Koene/Lineair

Mobile phone and social media use generate large volumes of data: mobile phone user in northern Kenya.

Data play an important role in development. In agriculture, for instance, reliable sets of data can help to predict crop yields. The International Center for Tropical Agriculture (CIAT) has developed a computer programme in cooperation with the Colombian Rice Growers Association to predict dry periods. Based on weather data from the past 10 years, the system assesses how plants react to changing weather and soil conditions. It helps farmers to avoid unprofitable sowing. Thanks to such information, they were able to save almost $3.8 million in 2013.

In many countries, however, statistical services are weak. Data are poorly maintained and incomplete. They offer only limited information on development (in particular socio-economic needs) and on vital and population statistics, at both national and subnational levels. Today, the spread of digital technologies is raising hopes that new kinds of data, which are inadvertently generated when people use mobile phones and other digital devices, may not only fill the gap but even yield better insights than conventional data have made possible. In addition, data are actively created by users, for example by volunteers uploading information to internet platforms (see box).

In 2014, for example, the research arm of the American IT company IBM studied the spread of Ebola in West Africa. In cooperation with the Open Data Initiative in Sierra Leone and the University of Cambridge, IBM conducted a big data analysis. The idea was that user-generated data would help to track the spread of the virus. IBM also set up a system for residents of Sierra Leone to report new cases of Ebola by sending a free SMS or calling a mailbox (on the Ebola crisis in Sierra Leone see Anne Jung in e-Paper 2016/08, page 23). The data ultimately helped IBM mobilise life-saving health services and deliver important resources like medication and hygiene products.

Other examples

A well-known example of user-generated data is Kenya’s Ushahidi platform. It was used for the first time during the political unrest in the country in 2008. The platform allowed citizens to report violent clashes by SMS or online. The resulting data was then displayed on a digitised map. Since that time, the platform has been used in a variety of situations, for instance to improve the coordination of humanitarian aid after earthquakes. User-generated data offers new opportunities to give the unheard a voice.

In addition to user-generated data, we see another trend: using inadvertently generated data to gain new insights. So-called big data results from the automatic analysis of large – and often unstructured – data volumes. Technology firms like Facebook and Google use such data to get precise information about the personal preferences and behaviour of their customers. Data that is automatically generated when people use digital devices can provide rather solid information on the socio-economic patterns, gender and age of users. Unsurprisingly, these new opportunities make some people hope that new sources of data will deliver better results than traditional sources. They believe that existing information gaps might be closed in many world regions this way.

Inspired by such examples, many international agencies have begun initiating pilot projects and investing in research on data programmes. The Organisation for Economic Co-operation and Development (OECD), the World Bank and others established the Partnership in Statistics for Development in the 21st Century (Paris 21) in order to help partner countries to improve their statistical systems and use new data sources. The UN organisation Global Pulse makes development-related results of big data analyses available. The Global Partnership for Sustainable Development Data is a consortium of organisations dedicated to improving data in order to evaluate progress towards the SDGs. Often they call for a “data revolution”.

Limits and risks

Many of these data are not publicly available, however. During the Ebola crisis, civil-society organisations unsuccessfully tried to convince private-sector companies to give them access to mobile phone data, for example, in order to get a better idea of how the disease spreads and where help is most urgently needed. The companies denied such requests because of legal ambiguities and profit motives.

Mobile providers like Telefónica and Orange have begun to release some mobile phone data (so-called call detail records) for research purposes. But it is problematic for them to act as the rightful owners of these data, as opposed to the people who generated the data in the first place. As long as development agencies want to use such data, however, they depend on the good will of the corporations. The power imbalance between the corporate sector and development agencies keeps growing.

Protecting privacy is another huge challenge (see Nanjira Sambuli in e-Paper 2016/06, page 34). In many countries, data protection laws are inadequate or do not even exist, so data are analysed without the prior consent of users. Such analysis without consent is illegal in Germany. Moreover, the safe anonymisation of data can no longer be ensured when large data volumes are collected and collated. As data are often automatically cross-referenced, computerised tools allow individuals to be re-identified from insufficiently anonymised data sets.
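The re-identification risk can be illustrated with a minimal sketch. It assumes two entirely synthetic data sets: an "anonymised" one without names and a second, hypothetical register that still carries names. All field names, records and the helper functions `quasi_key` and `reidentify` are invented for illustration and do not describe any real system.

```python
from collections import Counter

# Synthetic, hypothetical records: no real people and no real data set.
# The "anonymised" set has no names but keeps three quasi-identifiers.
anonymised_records = [
    {"birth_year": 1984, "gender": "f", "district": "North", "diagnosis": "malaria"},
    {"birth_year": 1984, "gender": "m", "district": "North", "diagnosis": "none"},
    {"birth_year": 1991, "gender": "f", "district": "East", "diagnosis": "cholera"},
    {"birth_year": 1991, "gender": "f", "district": "East", "diagnosis": "none"},
]

# A hypothetical second source that still contains names.
public_register = [
    {"name": "Person A", "birth_year": 1984, "gender": "f", "district": "North"},
    {"name": "Person B", "birth_year": 1984, "gender": "m", "district": "North"},
    {"name": "Person C", "birth_year": 1991, "gender": "f", "district": "East"},
    {"name": "Person D", "birth_year": 1991, "gender": "f", "district": "East"},
]

QUASI_IDENTIFIERS = ("birth_year", "gender", "district")


def quasi_key(record):
    """Combine the quasi-identifiers of a record into one comparable key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, register):
    """Link names to anonymised records whose key is unique in both sources."""
    anon_counts = Counter(quasi_key(r) for r in anonymised)
    register_counts = Counter(quasi_key(p) for p in register)
    anon_by_key = {quasi_key(r): r for r in anonymised}
    matches = {}
    for person in register:
        k = quasi_key(person)
        # Only an unambiguous one-to-one match counts as a re-identification.
        if anon_counts[k] == 1 and register_counts[k] == 1:
            matches[person["name"]] = anon_by_key[k]
    return matches


matches = reidentify(anonymised_records, public_register)
for name, record in matches.items():
    print(name, "->", record["diagnosis"])
```

In this toy data, the two people whose combination of birth year, gender and district is unique can be linked to a diagnosis, while the two who share the same combination cannot. The more attributes are collated, the more combinations tend to become unique.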

Another challenge is the quality of data. Regardless of the volume, there is no guarantee that the data are informative and of good quality, nor that they meet statistical standards of representativeness. On the contrary, given how many and what kind of people around the world use digital technologies, it is obvious that the volume of data generated by these technologies says nothing about how representative the data are of any given population. Many studies have shown that big data analysis in particular often provides distorted glimpses of reality. Moreover, raw data can be manipulated, which leads to faulty insights that might influence decisions.
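A small numerical sketch may make the point about representativeness concrete. It assumes a hypothetical population that is 30 % urban and 70 % rural, and a synthetic "digital" sample in which urban users are heavily over-represented. The income figures, the group shares and the function names are all invented for illustration.

```python
# Hypothetical population shares, assumed to be known from a census.
population_share = {"urban": 0.3, "rural": 0.7}

# Synthetic digital sample of (group, monthly income) pairs.
# Urban users make up 80 % of the sample but only 30 % of the population.
sample = [("urban", 400)] * 80 + [("rural", 100)] * 20


def naive_mean(records):
    """Average over the sample as if it mirrored the population."""
    return sum(value for _, value in records) / len(records)


def reweighted_mean(records, shares):
    """Average the per-group means, weighted by each group's population share."""
    values_by_group = {}
    for group, value in records:
        values_by_group.setdefault(group, []).append(value)
    return sum(
        shares[group] * sum(values) / len(values)
        for group, values in values_by_group.items()
    )


naive = naive_mean(sample)
reweighted = reweighted_mean(sample, population_share)
print("naive estimate:     ", naive)
print("reweighted estimate:", reweighted)
```

With these invented numbers, the naive average comes out at 340, while weighting the groups by their population shares gives roughly 190. Adding more records from the same skewed source would not close that gap, and reweighting only helps when every group appears in the sample and the true shares are known.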

The opaqueness of data-processing algorithms poses additional risks. Simply put, algorithms are mathematical formulas that derive useful information from raw data. Algorithms reflect social relations, so they are not objective in any way. They must be designed and used responsibly, since it is well understood that bulk data may reflect or even exacerbate existing forms of discrimination.

In Florida, for example, the police wanted to assess the future threat posed by criminals in order to determine whether they should be released from prison. The computer programme relied on bulk data. It falsely and unfairly predicted that African-Americans were twice as likely to commit new offences as white Americans. The programme was praised for being especially neutral, but since the algorithms were fed data from past convictions that were marked by centuries of discrimination against African-Americans, it really only made matters worse. International development agencies must pay attention to problems of this kind.
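How biased records can turn into biased predictions can be sketched with purely synthetic numbers. The sketch below is not a model of the Florida programme. It assumes two hypothetical groups, A and B, with an identical true reoffending rate, and it assumes that offences in group B were historically recorded twice as often. The "model" is simply the recorded rate per group, and all names and figures are invented.

```python
def make_history():
    """Build synthetic records for two hypothetical groups of 100 people each."""
    history = []
    # Assumed share of true reoffences that end up in the records, per group.
    for group, recorded_pct in (("A", 30), ("B", 60)):
        for i in range(100):
            truly_reoffended = i < 40  # same true rate (40 %) in both groups
            recorded = i < 40 * recorded_pct // 100  # only part is recorded
            history.append(
                {"group": group, "true": truly_reoffended, "recorded": recorded}
            )
    return history


def rate_by_group(history, field):
    """Share of rows per group for which the given boolean field is set."""
    totals, hits = {}, {}
    for row in history:
        group = row["group"]
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(row[field])
    return {group: hits[group] / totals[group] for group in totals}


history = make_history()
true_rates = rate_by_group(history, "true")
# A naive risk score that only learns from what was recorded in the past.
learned_risk = rate_by_group(history, "recorded")
print("true rates:  ", true_rates)
print("learned risk:", learned_risk)
```

In this toy setting both groups behave identically, yet the score learned from the records rates group B as twice as risky as group A. The calculation itself is neutral; the distortion comes entirely from the data it was fed.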

While the use of data presents great opportunities, development agencies must heed the risks sketched out in this essay. Otherwise, the data revolution may turn out to be a Trojan horse. What is needed is an informed debate on the responsible use of data worldwide. Non-governmental organisations and citizens in developing countries must be actively involved.

Tobias Knobloch leads the project “Open Data and Privacy” at the Stiftung Neue Verantwortung (SNV), a Berlin-based think tank.
tknobloch@stiftung-nv.de

Julia Manske also leads the project “Open Data and Privacy” at the SNV.
jmanske@stiftung-nv.de

References

De Montjoye, Y.-A., C.A. Hidalgo, M. Verleysen and V.D. Blondel, 2013: Unique in the crowd: The privacy bounds of human mobility.
http://www.chidalgo.com/Papers/2013/Unique_in_the_Crowd_srep.pdf

Nyirenda-Jere, T. and T. Biru, 2015: Internet development and internet governance in Africa.
http://www.internetsociety.org/sites/default/files/Internet%20development%20and%20Internet%20governance%20in%20Africa.pdf

Open Definition 2.1:
http://opendefinition.org/od/

Oxfam, 2015: A rights based approach to responsible data.
http://policy-practice.oxfam.org.uk/blog/2015/08/a-rights-based-approach-to-treating-data-responsibly

Pasquale, F., 2015: Digital star chamber. Aeon Essays.
https://aeon.co/essays/judge-jury-and-executioner-the-unaccountable-algorithm

Taylor, L., 2015: In the name of development: power, profit and the datafication of the global South.
http://www.academia.edu/13226191/In_the_name_of_Development_power_profit_and_the_datafication_of_the_global_South

UN, 2014: A world that counts. Mobilising the data revolution for sustainable development. Prepared by The Independent Expert Advisory Group on a Data Revolution for Sustainable Development.
http://www.undatarevolution.org/report/

World Bank, 2014: Open data challenges and opportunities for national statistical offices.
https://openknowledge.worldbank.org/handle/10986/19984/