The Need for Access to European Microdata
by Franz Kraus
Microdata have been a major source for empirical social research since at least the 1960s. Most of this research focused on national aspects, however. The development of appropriate methodological tools and analysis techniques was a major issue. Applied social research very often meant applying modern tools to readily available survey data. In Europe, working with microdata for a long time meant either carrying out secondary analysis of data produced by commercial market research organisations or conducting and analysing small-scale academic surveys. In either case, time and space did not play a significant role. Cross-national analysis was mainly limited to the analysis of aggregate data from official statistics. Important cross-national data collections were produced at that time, such as Arthur Banks' Cross-National Time Series Data Archive, or Taylor and Hudson's World Handbook of Political and Social Indicators, to mention only a few.
Microdata from government agencies and academic research
Cross-national comparative analysis based on microdata gained importance with the availability of the European Communities' Eurobarometer survey data and their distribution within the ICPSR. The success of comparative academic research, e.g. Inglehart's work on post-materialism, clearly demonstrated the research potential of comparative microdata. In addition, official statistics in Europe have gradually shifted from using administrative data to using survey data since the late 1950s. Family budget surveys, level of living or quality of life surveys, and, above all, labour force surveys were carried out in most European countries (ILO 1992, 1990; Flora et al 1994). It was particularly the Labour Force Survey, with its comparatively high standardisation and large sample size as well as its wide and early diffusion in Europe, that offered enormous possibilities for comparative social research from the very beginning. Surprisingly enough, comparative research did not really welcome these new possibilities. There were many reasons for the disparity between research potential and actual reception within the scientific community, but access to data was certainly a major point. Keeping in mind the linguistic and cultural fragmentation of Europe, it seems reasonable to suppose that the scientific community simply could not manage the exploitation of these sources for comparative research without proper infrastructural services. And a service institution with a comparative orientation that could take care of these microdata did not exist. With the gradual diffusion of national social science data services in the 1980s, the academic community made new efforts to establish cross-national microdatabases of its own (such as the International Social Survey Programme or the World Values Survey). The utilisation of surveys from official statistics for comparative purposes actually began with the foundation of the Luxembourg Income Study (LIS) during the 1980s (cf. the contribution of Gaston Schaber in this newsletter). Earlier efforts at the University of Mannheim to establish a comparative microdata archive containing labour force surveys (W. Müller) could not be institutionalised. Again, the tremendous success of the European study group on social mobility and of LIS, also in terms of substantive research results, clearly demonstrates the great research potential inherent in microdata from official statistics.
Official microdata as a prerequisite for comparative research on Europe
Access to microdata is the prerequisite for an efficient use of the unprecedented wealth of official statistics in comparative research for at least two reasons:
There are many unexplored fields of research that require our attention. Many of the questions that have been studied so far within a national context must become the object of research at the European level. Research on Europe is evidently a task of much higher complexity. If we want to understand properly the difficulties and problems encountered in the process of integration, we have to arrive at a more integrated knowledge of the economic, social and political pathways in the history of the countries, in their present developments and their likely futures. The many complex interdependencies between existing structures, politics and social behaviour need to be researched at the European level, and often also within the world system. Labour markets and the welfare state are certainly crucial areas for basic as well as applied research. We need to clarify the structures and dynamics of both employment and unemployment across occupations, gender, industries and educational categories at the level of regions. We need a better understanding of the individual's labour market behaviour from the perspective of the family. We need more research on the behaviour of vulnerable groups. We need more comprehensive and systematic research on the interrelationship between economic growth and social differentiation. How do family types and forms, life courses and life styles differ from country to country, what will be their likely future courses, and what will be the consequences for social integration? These are just a few sketchy examples. The numerous working papers and research monographs produced by the LIS research network provide many additional examples, and show that comparative social research can be highly relevant for political decision makers as well. It is clear, however, that LIS and the academic microdatabases briefly mentioned above do not suffice to study the evolution of a European society in a thorough and comprehensive way.
The research agenda set out in the working programme of TSER (a research programme of the European Commission) confirms that in-depth analyses of social change and integration in Europe cannot be carried out without analyses of individual data. This certainly holds not only for policy-oriented analysis, but is generally valid. Given the relatively large sample size of official surveys, their high degree of comparability and the regular repetition of measurement, academic surveys cannot be a substitute. For all these reasons, it would be an incredible waste of public resources if there were no guarantee that data collected by statistical agencies could readily be made available in the form of microdata for research purposes.
Conditions of access to microdata in Europe
In many countries there is some way of gaining access to official microdata. Legislation (de Guchtenaire/Mochmann 1990) and procedures, however, vary greatly from country to country as well as between statistical sources. According to a recent survey of national statistical institutes in Western Europe, conducted by the Dutch Statistical Office (Cittuer and Willenborg 1991), only two countries, namely France and Great Britain, offer access via public use files. Meanwhile the Italian Statistical Office also offers public use files, and the Anglo-Saxon overseas countries offer them as well (Müller et al 1991).
In a number of European countries (such as Austria, Denmark, Finland, Germany, Hungary, Norway, Poland, Portugal, Spain and Switzerland), access to national microdata is granted for scientific purposes via individual, special contracts. There is great variation, however, with respect to the sources for which access is granted as well as the forms of access. Some countries, e.g. Denmark and, for some sources, Germany, only allow remote processing; others, e.g. Switzerland, disseminate data for local use. More important than differences between forms of access are differences with respect to the statistical sources that can be accessed. Only a few countries, including once more France and Britain, allow access to population censuses. Considering the extremely high value of census data for comparative research, it is obvious that comparative research cannot exploit the full potential of official statistics, not even in those countries where access to microdata is granted in principle. Nevertheless, these countries have, of course, taken the right path.
For Europe as a whole, however, the problem of how to gain access to official microdata has not even been solved in principle. It is true that the situation has improved greatly thanks to the European microdata archives established at CEPS/INSTEAD in Luxembourg (cf. the contribution of Gaston Schaber in this newsletter). However, the experience gained with LIS shows how time-consuming and tedious it is to organise a cross-national database based on official microdata via bilateral contracts. Although extraordinarily high security standards aimed at preventing the misuse of microdata were implemented from the very beginning, it took years to establish the current database. Apparently the Luxembourg Employment Study, a recent attempt to make labour force surveys available, is having the same experience. In a way, these academic institutions have to repeat what Eurostat has already done: to harmonise national microdata ex post. The waste of public money is obvious, and opportunity costs are high. Nevertheless, the impact on comparative research was extraordinary. Many researchers around the world used these data and published highly recognised work - which would not have been possible without this database.
Confidentiality and access to microdata*
(*This section draws on an unpublished paper by Müller and Wirth, 1994)
It is obvious that the current conditions for access to microdata are far from satisfactory. Access to microdata was gradually restricted in Europe in the early 1980s via national data protection measures. In most Western European countries, the possible misuse of microdata has become an issue of political debate, the crucial point being identity disclosure.
The right to freedom of information forms part of the European Convention on Human Rights. But the convention also contains another article which is highly relevant in this context: the right to respect for private life. The Convention on Data Protection, which was signed in the early 1980s by the Council of Europe, laid down a number of principles which influenced data protection legislation in many countries as well as within the European Communities. Shortly after that, the Council of Europe passed a recommendation on scientific research and statistics. The recommendation, accepted by the Council of Ministers, recognised scientific research and statistics as a special case and introduced de facto anonymity of individual data as a criterion for data protection. This means that microdata are to be considered safe if an individual cannot be identified without investing an unreasonable amount of time, cost and manpower (Hustinx 1994).
It is very likely that all of us accept the principle that providing data should not bring any person or institution disadvantages. Securing privacy and confidentiality is of crucial concern for all data-collecting institutions or persons. In this respect statisticians and scientists have identical interests. Breaches of confidentiality will have negative impacts on both of them, irrespective of whether official or academic surveys are concerned. The preservation of confidentiality and privacy is of mutual interest, and LIS again provides a good example in this context. However, in public debates the risk of intentional disclosure is often exaggerated. In the past, many studies examined how the interests of data subjects could be protected by procedures that ensure privacy and confidentiality and at the same time secure access to the data needed by users. A recent study carried out by the German Statistical Office (Müller et al 1991) reveals that it is extremely difficult to identify individuals once direct identifiers and detailed regional information are removed. At a recent conference organised by Eurostat (CEC 1993) several contributions supported these findings. The fact that all data include measurement errors or, for other reasons, are incompatible with the prior knowledge of an intruder shows that there is a strong natural barrier against disclosure. Theoretical studies on the risks of identity disclosure have often neglected this factor. Practical experience in countries which have been providing public use files for years shows that the scientific community can be trusted. Researchers have scientific interests and subscribe to high ethical standards. The continual reinforcement of these standards, for example via formal commitments of data users and their institutions to codes of conduct, and organisational safety measures are an additional safeguard against confidentiality breaches.
The need for a balanced decision
It is obvious that privacy and confidentiality must be balanced against the human right to information. The risk of disclosure exists even if access to microdata outside statistical offices is completely impossible. There is, of course, always the risk that somebody, be it in the statistical office itself or among the interviewers, misuses his or her position. However, if one mistrusted these professional groups in the same way one sometimes mistrusts scientists, one would really have to think about closing the offices down. One can imagine how high the costs of non-access are if one considers the enormous wealth of highly recognised findings produced by researchers working with the microdata provided by LIS.
Research on the evolution of a European society and on economic, social and political integration can only be done if comparable data on social structures and processes are available. Statistical offices make important contributions not only to information needs. Their contribution to substantive research is also indispensable and highly appreciated. However, given the enormous research agenda described above, the involvement of universities and research teams in other organisations outside the statistical offices is also necessary. A few countries, among which Great Britain and France occupy a prominent place, have already developed comprehensive national services to advance the use of national microdata. These may be important and pioneering efforts, but they do not suffice to create a modern European infrastructure for high-quality social research. It is evident that the current conditions of access to microdata are far from satisfactory.
A research team interested, for example, in doing a historical cross-national study on stratification in the European Union, not to speak of the whole of Europe, would encounter enormous problems in obtaining all the microdata it needs. Although the EU's Statistical Office (Eurostat) has meanwhile acquired a sizeable stock of microdata from national statistical offices, so far no possibilities exist to use these data for general scientific purposes outside Eurostat. According to a Council Regulation currently in force, microdata given to Eurostat by Member States may be used for statistical purposes only. The communication of data to third parties is not allowed. This regulation has been extended to include microdata communicated by the members of the European Economic Area. Therefore, in order to do a comparative study, a research team would have to approach each statistical office separately to get access to national microdata. The team would have to comply with quite different rules, and in some countries access could even be denied if no native researcher was involved. The amount of money, time and energy necessary to accomplish this task is so prohibitively high that as yet nobody has succeeded in preparing such a comparative study covering all European countries.
For various reasons the statistical office of the European Union is of crucial importance to comparative research. Eurostat's contribution to the harmonisation of statistics has been tremendous. The many methodological studies Eurostat has organised on the comparability of national statistics, governmental data services and meta-information systems are very helpful for many of us. Its cooperativeness towards outside researchers is highly recognised. With its limited resources Eurostat tries hard to meet the special data needs of individual researchers. The full exploitation of its potential contribution to comparative research is, however, significantly limited by two facts:
In practice, the lack of extra resources to meet such demands leads to a situation where special services, e.g. the extraction of aggregate tables from microdata, cannot develop in such a way that they would meet the needs of comparative researchers. As a result, the scientific community cannot benefit from the enormous efforts made by Eurostat, supported by its national partners, to achieve an international harmonisation of national surveys. The legislation currently in force even leads to a situation where those countries that do allow access to microdata in the national context cannot apply this principle to EU surveys. In Germany, for example, the combination of national and European regulations produces quite absurd results. Since the European Labour Force Survey is integrated into the national microcensus, the German Federal Bureau of Statistics sees no possibility of making the labour force survey available to the scientific community - although access to the microdata of the microcensus is granted.
In order to encourage economic and social research on Europe, some form of access to Eurostat's microdata must be guaranteed. What is possible within many individual countries must also be possible at the European level. The 1994 draft for a Council Regulation On Community Action in the Field of Statistics (CEC COM 78 final) is a step in the right direction. According to article 17, "Access to confidential data which do not allow direct identification may be granted to scientific research institutes, researchers and authorities responsible for the production of statistics other than Community statistics..." under certain conditions. It remains to be seen, however, whether this principle can be put into practice. It would be essential that Eurostat itself could act as a distributor. Otherwise, a comparative research team would still have to deal with language barriers and organisational imponderables. The draft is still under discussion, but we all hope that there will be a breakthrough at the European level. Certainly not only the research community, but also our political decision makers - both at the EU and at the national level - would benefit from it.
European Communities - Commission (1994): Draft Council Regulation on Community Action in the Field of Statistics. COM(94) 78 final. Luxembourg: Office for Official Publications of the EC.
European Communities - Commission (1993): International Seminar on Statistical Confidentiality. Proceedings. Luxembourg: Office for Official Publications of the EC.
de Guchtenaire, P. & E. Mochmann (1990): Data protection and data access. Reports from ten countries on data protection and data access in social research. Amsterdam.
Flora, P. et al (1994): Social Statistics and Social Reporting in and for Europe. Bonn: IZ.
Hustinx, P. (1994): Policy Observations on Privacy and Confidentiality. In: Eurostat, Strategic Issues in Statistical Policy. Luxembourg: Office for Official Publications of the EC.
ILO (1990): Economically active population, employment, unemployment and hours of work (household surveys), second edition, Statistical Sources and Methods, Volume 3. Geneva.
ILO (1992): Household and Income Expenditure Surveys, Statistical Sources and Methods, Vol. 6. Geneva.
Müller, W. et al (1991): Die faktische Anonymität von Mikrodaten [De facto anonymity of microdata]. Statistisches Bundesamt (Ed.), Schriftenreihe der Bundesstatistik, Vol. 19. Stuttgart: Metzler-Poeschel.
Müller, W. & H. Wirth (1994): Research Needs for European Microdata and Data Confidentiality. Unpublished paper. University of Mannheim.
Franz Kraus is an economist, Managing Director of the EURODATA Research Archive and Co-editor of this newsletter.
University of Mannheim
Phone: +49-621-292 1794
Fax: +49-621-292 1723
EURODATA Newsletter No.2, p.9-12