The Internet Democracy Project welcomes the consultation by the Committee of Experts under the chairmanship of Kris Gopalakrishnan, set up by the Ministry of Electronics and Information Technology, on the topic of non-personal data. We greatly appreciate the Committee’s effort to draft a report to understand the concepts and notions with respect to non-personal data before drafting legislation to regulate it. We would like to thank you for this opportunity to present our comments on this important policy document. In the interest of a transparent process, we hope that the current strategy changes and that all responses will be made public.
At the Internet Democracy Project (https://internetdemocracy.in/, http://genderingsurveillance.in) we work towards realising feminist visions of the digital in society, by researching and analysing power imbalances in the areas of norms, governance and infrastructure in India and beyond, and by providing proposals for alternatives, based on this research, that can lead to a more equal digital society for all. In our analysis of the report’s proposals and recommendations, we have therefore focused in particular on those that relate to data that is of direct or indirect relevance to people, and their potential impact on the data-related and other human rights of the Indian people in the digital age. While important issues are also to be discussed with regard to other types of non-personal data, those are, thus, not the subject of our submission.
Unfortunately, our analysis found the report to be based on a number of faulty assumptions which need to be addressed in order for any future regulation of non-personal data to serve the interests of the citizens of India from whom this data is taken. Our detailed comments are as follows:
Furthering surveillance capitalism
This report re-affirms the Indian government’s understanding of data as primarily an economic resource, which has earlier also been articulated in policy documents such as the draft Personal Data Protection Bill 2019 1, the Economic Surveys 2018-2019 2 and the draft National E-Commerce Policy 2019 3. Aligned with this understanding, and in the name of economic growth and the public good, the report lays out a broad agenda in favour of data-driven businesses and data-driven policymaking. In particular, to bring this agenda into effect, the report prescribes data sharing ‘to spur innovation and digital economic growth at an unprecedented scale in the country’ (paragraph 7). While paying lip service to the social and public value of data, the report thus emphasises the economic benefits of data accumulation, and continues to view and portray data as a key enabler for economic betterment of the nation without engaging much with criticisms of such an approach. For instance, paragraph 3.1 of the report attributes the current scorching pace of data generation and consumption to the internet, smartphones and cloud-driven apps: these technologies, it is claimed, enable us to unlock the real value of data. What the report ignores is that this is ‘not a naturally occurring phenomenon that is an inevitable factor of developments in technology, but in fact a result of a market where there is both demand for more data and a promise of development from this data.’ 4 Thus, the real reason why we are generating and consuming so much data is that we are living in the age of state-supported and enabled surveillance capitalism, wherein both state and non-state actors are driven to build ecosystems which enable, and are enabled by, behavioural surplus.
‘However, within Indian policy documents and proposals, the availability of increasing amounts of data is framed as an unquestionable state of affairs, and business models to monetise this data are then framed as an imperative.’ 5 If data collection, sharing and processing were regulated appropriately in the first place, data consumption and generation would also drop drastically.
Such state-supported and enabled surveillance capitalism is coming under increased criticism for its deep impact on our individual and collective well-being and human rights. For example, Couldry and Mejias have shown how, ‘by installing automated surveillance into the space of the self,’ surveillance capitalism puts us at risk of ‘losing the very thing - the open-ended space in which we continuously monitor and transform ourselves over time - that constitutes us as selves at all.’ 6 Similarly, Zuboff has identified surveillance capitalism as a fundamental threat to not only people’s sovereignty but to human nature itself. 7
But even though earlier draft policies and reports relating to data governance in India have been criticised for promoting a state-supported and enabled form of surveillance capitalism, this report yet again endorses the same approach. Instead of analysing why the world is awash with data and what harms, in addition to benefits, might accrue from this state of affairs to the Indian people, the report blindly furthers the idea that ‘unlocking’ the value of data through the governance of personal and non-personal data is imperative. Rather than evaluating or even fully acknowledging the range of harms inflicted upon individuals as a consequence of this intense datafication of our world, it seeks to further it, in the name of economic growth. Rather than bridging the socio-economic gaps between communities that exist in the country, the proposed operationalisation of some of the concepts and mechanisms prescribed in the report may in fact further existing disparities, as our analysis below will show.
Perceiving data as a resource
At the heart of the furthering of surveillance capitalism that the report entails is its conceptualisation of data as a resource, as a disembodied object that, moreover, can be easily and fairly unproblematically monetised. At the outset, the report delineates certain key takeaways that perceive data as a means to transform existing businesses and as a solution to various social and economic problems. The report acknowledges that data has social and public value as well, and recognises the concept of collective harm to a community. However, the report fails miserably in articulating these broader values while proposing a framework, instead emphasising again and again the economic gains to be had. Given its emphasis on data as a resource, this should not come as a surprise. After all, such an approach facilitates the exploitation of citizens as the raw material for wealth-creation.
According to surveillance capitalists, all behavioural surplus that they collect while we use their products or services can be used to create wealth for these businesses and not for the individuals from whom it is appropriated. 8 And so, paragraph 3.3 of the report, for example, delineates processes for value creation from data which include ‘converting information into insights that help in prediction and decision making for revenue/ profit generation.’ Paragraph 3.5 highlights frameworks that are being developed to understand the uses and benefits of data and its value; this includes treating data as an asset which can be monetised directly by trading or building services on top of it. While the report states that data can be viewed through two lenses, economic and informational, it tends to consistently privilege the former over the latter, in the ways here described, thus considerably circumscribing our understanding of the nature of data. 9
Such an approach to data finds its parallels in references to human beings in economic processes as ‘human resources,’ and suffers from the same shortcomings. The phrase ‘human resource’ reduces our value to our economic or even our broader instrumental worth, for the ‘social good’ or ‘public interest’. More fundamentally, it is central to establishing a relationship of subordination between those who constitute the ‘human resource’ provided and those who summon such resources. 10 The reference to our data as a resource achieves much the same goals.
In practice, however, the reference to data as something that is simply out there, up for grabs, ready to be mined, is not actually aligned with people’s experiences. Whether it is data that we provide, data that is inferred about us, or the impact of decisions about us based on this data or even data others provide - all of these are experienced as deeply embodied and as speaking to our understandings of our selves as much more complex and multifaceted than the concept of ‘resource’ can ever encapsulate.
Moreover, that data’s value lies precisely in its connection to our bodies or persons is further confirmed by how it is approached by those on drives to collect endless data about us. For example, when you have been found to be COVID-19 positive and are forced to download a quarantine app, the GPS data of your movements does not have value because it is a resource, but because it is a means for the government to ensure that your body remains in a physical space. Users of these apps experience the app’s functionality as their movements being watched, not simply their data. 11
Thus, it is apparent that data that is directly or indirectly relevant to us as people is not merely a resource, but is always embodied. The harms that need to be prevented or addressed therefore are also not merely informational harms, but include impacts on our autonomy, dignity, and bodily integrity. In other words, to ensure the rights of persons in the digital age, it is imperative to put our bodies back into the debate. Many of the shortcomings of the report’s proposals stem precisely from its failure to recognise this.
Problematic definition of Non-Personal Data
The report defines non-personal data as data that does not contain any personally identifiable information. It includes data that was never related to an identifiable or identified individual, and data that was earlier personal but is now anonymous.
There are a number of reasons why this broad definition is problematic. First, while the report does foresee a few additional protections where data that relates directly or indirectly to people is concerned, it attempts to provide a regulatory framework that covers data as varied as weather data, data about soil quality, and data about people’s most intimate habits and traits. The discussion around an appropriate governance framework for non-personal data would have benefitted if the governance of data that relates to people and the governance of other data had each been discussed separately and systematically, even if there would likely be overlap in terms of concerns and possible solutions.
Further, even where data relating to people is concerned, the definition of non-personal data put forward is problematic, because the report perceives personal and non-personal data as a binary. In reality, however, the line of separation between personal and non-personal data is often blurred. In fact, many scholars have noted that the data generated by human activities can never be truly non-personal, as it is always contextualised: it is collected for a purpose in a context, and this has implications within that context and beyond. Just as images captured by a photographer are always framed, selected out of the profilmic experience in which the photographer stands, points and shoots, ‘data too needs to be understood as framed and framing - understood, that is, according to the uses to which it is and can be put’. 12
Even if we were to believe that data can be segregated into categories of personal and non-personal data, it is very difficult to ascertain which data falls in which category, as it is subjective and contextual. 13 Purpose, context, time, and technology play a significant role in assessing whether data is personal or not. What may be considered non-personal data today may prove to be personal in some other context, with the help of some other technology and at another time. 14 Therefore, what may be non-personal for one data fiduciary at a particular time and in a particular context might contain personal data for another. For example, in the landmark Breyer case, the dynamic IP address of a website user was considered personal data for the website provider on the basis of information that was not at the disposal of the website provider but was available to the internet service provider. 15
This test of contextualisation also applies to another qualification that this report uses for segregating non-personal data, i.e. whether an individual is identifiable or not. Scholars state that the possibility of identification must also be judged on the basis of all means used by a data controller and not just on the basis of where the data is procured from. Two data sets that do not identify anyone on their own may, when combined, make certain individuals identifiable. 16
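The way in which combining data sets can produce identifiability may be illustrated with a minimal Python sketch. All records, field names and the choice of quasi-identifiers (pin code, birth year, gender) below are hypothetical and invented purely for illustration; they are not taken from the report or from any real data set.

```python
# Hypothetical "anonymised" data set: names removed, quasi-identifiers kept.
anonymised_health = [
    {"pincode": "560001", "birth_year": 1984, "gender": "F", "diagnosis": "diabetes"},
    {"pincode": "560001", "birth_year": 1990, "gender": "M", "diagnosis": "asthma"},
    {"pincode": "110017", "birth_year": 1984, "gender": "F", "diagnosis": "hypertension"},
]

# Hypothetical public data set (for example a membership list) that includes names.
public_list = [
    {"name": "Person A", "pincode": "560001", "birth_year": 1984, "gender": "F"},
    {"name": "Person B", "pincode": "110017", "birth_year": 1975, "gender": "M"},
]

# Assumed quasi-identifiers shared by both data sets.
QUASI_IDENTIFIERS = ("pincode", "birth_year", "gender")


def key(record):
    """Build the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, public):
    """Return (name, diagnosis) pairs for people whose quasi-identifier
    combination matches exactly one record in the anonymised data set."""
    index = {}
    for record in anonymised:
        index.setdefault(key(record), []).append(record)
    matches = []
    for person in public:
        candidates = index.get(key(person), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches.append((person["name"], candidates[0]["diagnosis"]))
    return matches


reidentified = link(anonymised_health, public_list)
print(reidentified)
```

In this toy example neither data set links a name to a diagnosis by itself, yet joining them on three innocuous-looking fields does so for one person.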
Moreover, while the report views anonymous data as non-personal data after the process of anonymisation, it is imperative to note that there is no such thing as bulletproof anonymisation. 17 For example, a journalist and a data scientist unveiled the ease with which they could re-identify individuals from the anonymised browsing history (URLs) of three million German citizens. They claimed that merely ten URLs were enough to uniquely identify an individual by drawing parallels with other easily available public data (such as social media accounts, public YouTube lists among others) 18. This implies that despite applying robust standards for anonymisation, data can easily be traced back to its originating body from an anonymised dataset. 19 20
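Why a handful of URLs can be enough to single someone out may be shown with a small Python sketch. It does not reproduce the method used in the study mentioned above; the pseudonyms, URLs and the function `matching_pseudonyms` are hypothetical, and the data is a toy example.

```python
# Hypothetical pseudonymised browsing histories: pseudonym -> set of visited URLs.
histories = {
    "user_001": {"news.example/a", "video.example/x", "shop.example/1", "forum.example/t9"},
    "user_002": {"news.example/a", "video.example/y", "shop.example/1"},
    "user_003": {"news.example/b", "video.example/x", "forum.example/t9"},
}


def matching_pseudonyms(known_urls, histories):
    """Return the pseudonyms whose history contains every URL in known_urls.

    known_urls stands for links a person is publicly known to have visited
    or shared (an assumption made for this illustration).
    """
    known = set(known_urls)
    return sorted(p for p, urls in histories.items() if known <= urls)


# One publicly known URL still leaves two candidates.
one_url = matching_pseudonyms(["news.example/a"], histories)

# A second known URL narrows the candidates down to a single pseudonym.
two_urls = matching_pseudonyms(["news.example/a", "video.example/x"], histories)

print(one_url, two_urls)
```

Each additional known URL shrinks the set of candidates, which is why removing names from such a data set does not by itself make it anonymous.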
The definition of non-personal data provided in the report fails to account for the fluid nature of data: the fact that data in different forms, with different controllers, or in different contexts gains a different meaning. Additionally, the definition continues to overlook the limitations of anonymisation techniques. If not revisited, this basis of data segregation may have serious privacy implications.
Inadequate addressing of the challenge of collective harms
There is, however, a further problem with the concept of non-personal data as understood in the report: while it recognises that even non-personal data as defined by the Committee might lead to harms to individuals, in particular in the form of collective harms, it fails to provide mechanisms to adequately address these harms. At the heart of this failure lies its conceptualisation of data as a resource and its assumption that data that is not personally identifiable requires fewer protections than data that is personally identifiable. It is these assumptions that drive the report’s recommendation to facilitate the sharing of non-personal data as widely as possible.
Limitations of the mechanisms proposed in the report
There are two ways in which the report seeks to protect individuals from possible harms related to non-personal data. The first is by recognising a distinct category of sensitive non-personal data. In particular, where data relating to people is concerned, the Committee recommends that ‘non-personal data inherits the sensitivity characteristics of the underlying personal data from which the non-personal data is derived’ (para 4.5.V). Further, in recognition of the risks of re-identification in particular (para 4.6.I), the Committee also recommends seeking consent from data principals for anonymising their data and putting it to use, at the same time as data principals provide their consent for the collection and use of their personal data.
The distinction between sensitive and non-sensitive data, while not without value, has its limitations in an age where data is increasingly inferred about us on the basis of proxies.
The recommendation to ask data principals for their consent for anonymised data also hardly provides relief. In fact, it is easy to read this as a move that simply absolves data processors of responsibility and puts the onus of any possible harm on data principals instead.
It is acknowledged that if data is embodied, autonomy over that data should also rest with the data subject. However, the consent envisioned in the report hardly enables such autonomy, as it explicitly excludes conditions such as for the consent to be ‘specific’ or ‘capable of being withdrawn’ (para 4.6). The report provides no clarity on what may or may not happen to data after anonymisation and in fact gives the impression that data accumulators are not required to seek consent for the purposes of use post anonymisation. The report merely states that consent should be asked for anonymisation. Once the data is anonymised it shall be treated as non-personal data. This approach is problematic because consent cannot be termed meaningful unless it is obtained for a specific purpose and time and constitutes a genuine choice (which means that not consenting needs to be a realistic option).
Moreover, even where personally identifiable information is concerned, the way consent mechanisms are currently designed is insufficient to obtain meaningful consent from individuals. Consent forms are approached as contracts which lay out standard boilerplate terms to seek consent for different purposes and services. Not just that, the contracts that are formulated to obtain consent are mostly adhesion contracts, requiring an individual to ‘agree’ to most or all of the terms of service to access a service. 21 Such contracts do not provide a means to negotiate the terms of consent, or alternative means to access services or products online in cases where individuals are not willing to provide consent. 22
Current consent mechanisms, thus, fail to put individuals in control of their data. They frequently force us to consent to, for example, third party data sharing, which is a mechanism central to the development of surveillance capitalism. Unfortunately it is also central to us losing control over what happens with our data. As provisions regarding the anonymisation of data are likely to become just another clause forced on users through adhesion contracts, this report lays the groundwork for only furthering such data sharing practices.
Further challenges and solutions needed
In addition to the above limitations of the mechanisms proposed, there are also a number of collective harms that the report simply does not address at all. What needs to be analysed and understood is how and to what extent harms can be inflicted on human bodies even if the data in its current form may not personally identify individuals. After all, as our world is more and more driven by algorithms that attempt to slot us into boxes, the data that affects us the most is not necessarily our own data but may be data about other people. Moreover, as so much of the value of data lies in its aggregation, demonstrating harm is often really hard to do for an individual, even more so where algorithms function as black boxes.
Thus, Tisné delineates three major ways in which people may suffer harm from non-personal data. 23 The first one, which the report recognises to some extent, concerns instances where an individual is harmed by data that directly relates to them. For example, Quividi, 24 a company that has designed and deployed facial detection billboards, claims that the billboards do not record personally identifiable data and that the data is recorded only for the time an individual is looking at the billboard. 25 Even if the data collected by billboards is not personally identifiable, the technology deployed by billboards has the ability to categorise individuals without their consent and serve advertisements on the basis of stereotypes around, for example, race or colour. The mechanisms that the report proposes to address these harms are, however, as analysed above, deeply insufficient.
A second category concerns harms that accrue because it is inferred that an individual is part of a category that is constructed on the basis of other people’s data. For example, machine learning algorithms are being trained to infer with considerable accuracy the sexual orientation of a person on the basis of other people’s physical facial features. 26
A third category concerns harms that result from ‘how machine learning systems are optimised’, 27 without considering the externalities associated with this particular form of optimisation. For example, YouTube’s algorithms are programmed in such a way as to keep people on the platform for as long as possible. Because its machine learning algorithm has learned that increasingly incendiary content generally serves that purpose, it serves its viewers precisely that, often leading them down a rabbit hole of extremism. The impact on non-users, minorities and society at large is not taken into account. 28
Especially in the latter two cases, neither the concept of sensitive non-personal data nor inserting the notion of consent will provide any form of relief. As Tisné writes:
Data protection, as currently framed, is premised on a relationship between data controllers and data subjects. As technology becomes increasingly sophisticated, that connection between data controllers and data subjects falters. It is not always clear who the controller is nor which subject has been harmed. A legal vacuum arises - and possibly already exists - and accountability falls away. 29
The proposals made in the report would ensure that such a legal vacuum is inscribed in Indian law as well, to the detriment of citizens’ rights.
What is required to fully address the possible collective harms that come with the wide sharing of non-personal data as proposed in the report is a focus on power relations. This includes, among other things, ‘1) clear transparency about where and when automated decisions take place and their impact on people and groups, 2) the right to give meaningful public input and call those in authority to justify their decisions, 30 and 3) the ability to enforce sanctions’. This also requires the ability to audit and inspect the decisions of algorithms and big data analytics on society as well as strong regulatory oversight of data-driven decision making. 31 Merely handing over some of these responsibilities to a data trustee, as the report proposes, is not sufficient.
The only partial understanding of the threat of new collective harms posed by algorithms and big data analytics is also reflected in the report’s introduction and deployment of a new concept of ‘community non-personal data’.
The concept of community data first appeared in the Indian context in a 2018 report titled ‘A Free and Fair Digital Economy: Protecting Privacy, Empowering Indians’ drafted by a committee of experts under the chairmanship of Justice B.N. Srikrishna. While ‘community’ is defined very broadly in that report, it is used to highlight the need for higher protections for group privacy than are currently available, including through class action remedies. The concept re-emerged in a more concerning form in later policy documents, such as the Draft National E-commerce Policy 2019. In this draft Policy, the concept of community has been introduced to deprive individuals of their autonomy over their data and, under the garb of a vacuum of ownership, to transfer ownership of community data to the government or community leaders. 32 Unfortunately, the current report seems to build on this later trend.
According to the NPD Report (para 4.3.I), a community is
any group of people that are bound by common interests and purposes, and involved in social and/or economic interactions. It could be a geographic community, a community by life, livelihood, economic interactions or other social interests and objectives, and/or an entirely virtual community.
This definition is at once too specific and too broad. It is too specific, in that it appears to presume a group consists of a set of stable elements whose properties, including rights, should be taken into account in discussions around non-personal data. While this approach might have value in particular cases, it is unhelpful as a general principle when it comes to data governance because it does not take into account the fluid nature of groups in automated decision making and big data analytics. Groups are not pre-constituted but are created as technologies and related practices focus on particular features for a particular purpose. 33 34 And thus, by presuming, for example, that Uber users are a group, this definition erases from view that within this ‘community’, there may well be a variety of groups, possibly with competing interests, depending on which properties one seeks to highlight. It also erases from view that the people who are included in any such group may well not be aware that this is so, or that even subgroups do not always fall neatly within a larger, more easily identifiable group such as Uber users.
The definition is also too broad, however, in that it allows any set of people with common interests to claim specifically the status of a community. In doing so, it undermines the potential value of the use of the concepts of community and community data in data governance as political tools intended to give greater sovereignty over their data to pre-identified communities that have historically been marginalised in particular. Initiatives such as the Maori Data Sovereignty Network and the First Nations Information Governance Centre, for example, do not only seek control over the use of indigenous people’s own data, but also over what data is collected in the first place, as a means to re-establish self-determination and self-governance. 35 Where pre-existing communities are concerned, the concept of community data loses its value if the power relations that exist among such communities are not taken into account and all communities are treated on an equal footing. If the concept of community data has value, it’s precisely because of its potential to attract attention to the specific needs of already marginalised groups and to help right such historical wrongs.
Community data and public participation
The lack of attention to power relations is also evident in the mechanism the report proposes to ensure participation of communities as defined in the report in data-related policy making: that of data trustees, who will exercise data rights on behalf of the community. ‘In principle,’ the report notes, the data trustee ‘should be the closest and most appropriate representative body for the community concerned’ (para 4.9.II). In practice, the report foresees that this would often be a corresponding government entity. With this, rather than giving communities greater negotiating power in data governance, the report actually exacerbates already existing power inequalities. Where groups that are created through technologies and related practices are concerned, such a move does nothing to increase the transparency necessary to bring the existence of many of these groups forth in the first place. Any resultant harms are thus unlikely to be addressed. Moreover, where both pre-existing communities and groups that are created through technology are concerned, this move would result in a further transfer of power to the state, which would do little to strengthen the rights to self-determination and self-governance, or even simply the voice, of marginalised groups in particular. Rather, it would only further strengthen the unequal power relations between the state and such groups, as policy makers, lobbyists and other powerful actors will firmly remain at the centre of decision making. 36
The concept of data trustees might have value to protect the rights of pre-existing marginalised communities. However, even in these cases, it will only do so if the data trustee is chosen, constituted and governed by members of the community themselves. If we are going to be serious about empowering marginalised communities with their data, these communities themselves need to be centrally involved in any policy touching their data and interests, including in decisions on whether certain data about their community can be collected in the first place, what decisions are to be made, and how. To change the social structures and prevent selective participation, it is imperative to move marginalised communities from the fringes to the centre in policy debates. 37 This is important because no person can empower another; people should engage in their own empowerment. 38 Furthermore, mechanisms should be designed to ensure that all sections of the community are adequately represented within the data trustee, as unequal power relations might structure marginalised communities as well.
Developing much stronger participatory structures is not impossible, as the experiment with panchayati raj in India over the past three decades has shown. It may not be perfect, but it has significantly increased the voice of women and SC/ST communities in local development. As experiments with indigenous data sovereignty, for example, have shown, the same can be done in the data area as well. Unfortunately, the proposals in this report fall short of doing so.
Where groups that are created by technologies and related practices are concerned, as well as pre-existing groups that are already more powerful, the concept of data trustees will be of little value at all. To ensure broader public accountability, what is needed are measures such as those already highlighted in the previous section: transparency and the ability to audit and inspect decisions of algorithms and big data analytics by a wide range of actors, the right to give meaningful public input, an obligation on the part of those designing algorithms and big data analytics to justify their decisions, the ability to impose sanctions, and in general, strong regulatory oversight of data-driven decision making. Moreover, such mechanisms are required not only where practices involving what the report defines as community data are concerned, but for public and private non-personal data as well. The report stops short, however, of such truly empowering measures.