This article is the first instalment of a two-part series by Mohit Chawdhry of Esya Centre on the new government-led regulatory framework proposed in the Centre’s Non-Personal Data Report. Click here to read part two.
A recent article on The Bastion spoke of India’s aspirations to emerge as a global leader in digital policy in light of the government’s initial ban of 59 Chinese apps. The state took a further step in this direction by appointing a committee to examine the regulation of non-personal data (NPD) in India.
Interestingly, the NPD of users, unlike their ‘identifiable’ personal data, hasn’t been the subject of much regulation in other countries. The EU is the only other jurisdiction that has specifically regulated NPD, and that too for the limited purpose of increasing cross-border flows of data across Europe’s data economy.
The Committee’s report, on the other hand, proposes a broad and holistic regulatory framework for NPD with the objective of protecting privacy. Why? One reason for this, as Shashank Mohan has previously noted, is that even NPD can be used by Big Tech companies to algorithmically profile groups of people, raising questions about users’ privacy.
How well do the Non-Personal Data Report’s recommendations meet up to the challenges of regulating non-personal #Data, amidst a regulatory lacuna for the same? @Shshk of @CCGNLUD explores:https://t.co/z901KnH2nH
— The Bastion (@_thebastion_) September 19, 2020
If you’ve glanced at the headlines emanating from the tech world over the last couple of months, you’ve probably seen Big Tech under scrutiny too, given the repeated use of the word ‘antitrust’. Whether in reference to the App Store’s policies being challenged by developers such as Epic, or the testimony of the leaders of the four ‘Big Tech’ firms before the US Congress, antitrust has emerged as one of the key concerns for digital and data-based businesses. Privacy, then, is not the only worry: control over data can also undermine the competitiveness and fairness of the overall market. How so? Take a search engine as an example.
Initially, the search engine is used by a limited number of people. As more people start to use it, it is provided with more data regarding what people search for, and what results they click on first. It can use this data to improve its performance, by suggesting better search queries or displaying more relevant results to the user. Due to this improved performance, more users will gradually start using the platform, providing it with even more data.
This shows us that data as a resource displays network effects; that is, its value increases as more people use a service. A substantial body of research also tells us that the performance of a search engine can improve markedly depending on the size and variety of the data it is trained on. Hence, data also displays economies of scale and scope.
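To make the feedback loop concrete, here is a minimal toy sketch in Python. It is an illustration only, not a model drawn from the Committee’s report or from any study: the function name, the growth parameter, and the saturation constant are all assumptions chosen for readability.

```python
def simulate_feedback_loop(initial_users, rounds,
                           growth_per_quality=0.5, saturation=1000.0):
    """Toy model of the loop: users -> data -> quality -> more users.

    All parameters are hypothetical and chosen only for illustration.
    Returns a list of (users, quality) pairs, one per round.
    """
    users = float(initial_users)
    data = 0.0
    history = []
    for _ in range(rounds):
        # Assumption: every user contributes one unit of search data per round.
        data += users
        # Assumption: quality rises with data, with diminishing returns (0 to 1).
        quality = data / (data + saturation)
        # Assumption: better quality attracts proportionally more users.
        users *= 1.0 + growth_per_quality * quality
        history.append((users, quality))
    return history


# Compare an established search engine with a new entrant.
incumbent = simulate_feedback_loop(initial_users=1000, rounds=10)
entrant = simulate_feedback_loop(initial_users=10, rounds=10)

for round_no, (big, small) in enumerate(zip(incumbent, entrant), start=1):
    print(f"round {round_no}: incumbent quality {big[1]:.2f}, "
          f"entrant quality {small[1]:.2f}")
```

Under these assumptions, the incumbent’s head start in data translates into better results in every round, and the gap in users between the two widens over time. Real markets are far messier, but the sketch captures why an early lead in data can be self-reinforcing.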
A combination of these characteristics can lead to situations where an entity that has cornered a large segment of the market begins to abuse its position by blocking new entrants and preventing access to essential datasets. This leads to less than desirable outcomes, as new companies are disincentivised from entering markets and developing new products and services. Eventually, the customer suffers, as they have very limited options to choose from. That’s why companies like Google, alongside the other three ‘Big Tech’ giants, are coming under criticism.
The Committee does well to recognise the concerns about monopolistic tech companies in India, a market failure that can be corrected by some regulatory intervention. It believes that this can be achieved by implementing a mandatory data-sharing framework wherein start-ups, Indian businesses, and research entities will have access to certain kinds of non-personal data used by bigger companies, both Indian and foreign.
Younger companies will be able to use this data to build new and innovative solutions for consumers to enjoy. So, not only will mandatory data sharing check the dominance of bigger firms, it will also, ideally, spur innovations and boost start-ups in India.
Fostering ‘Competition’ By Increasing Access?
The question now is: is fostering innovation through ‘easy access’ to data really that simple a solution?
Assume there’s a Company X, which provides a platform for booking cabs online. Company X invests heavily in setting up an app and data pipelines to gain insights on how its customers are travelling, when they are likely to book a cab, and which vehicle model is most preferred. It uses this data to continuously innovate and develop its service.
Now, if the Committee’s NPD framework were in place, the Government, a research institution, or even a cab aggregation start-up could request access to the raw data and metadata underlying the NPD collected by X, possibly without remuneration. What does this mean?
Raw data usually refers to data which has not been processed or labelled. Metadata is data about the actual data, which gives us information about it, much like a label. For example, it could tell us who the author of the data is, or when it was collected. However, scholars have also pointed out that ‘raw data’ is a misnomer, as all data is collected in a specific context with specific objectives in mind: it never really is ‘raw’ or ‘unstructured’.
“There is no such thing as ‘raw data,’ and hence, no pure…unbiased data. #Data is always historical and, as such, is the repository of historical #bias”. Some ways of understanding data “account for its origin and relation to history” and some don’t. #AI https://t.co/vvNRWqbgWL
— Dorothea Baur (@DorotheaBaur) August 31, 2020
By mandating the sharing of such data, the Committee seems to wrongly assume that NPD consisting of raw data and metadata is of little economic significance to X, and that X will continue to invest in data collection despite the mandatory sharing the Committee recommends. As illustrated above, X has made significant investments in order to collect this data and organise it, even if only into ‘raw’ datasets. If all of this data were to be made freely available to X’s competitors, it seems unlikely that X would continue to invest in such collection and processing.
The Committee also assumes that raw data and metadata are not protected by existing commercial rights, such as copyright and trade secrets. It is well established that datasets that satisfy requirements of ‘originality’ and ‘creativity’ are protected as literary works under copyright law.
So, it is conceivable that a dataset consisting of ‘raw’ data could qualify for copyright protection owing to its original or unique arrangement of data points, collected by a specific entity for a specific service or purpose. In this situation, mandatory sharing conflicts directly with the proprietary rights that the collecting entity has over the dataset. The Committee provides no clarity on how this conflict between ‘free access’ and commercial rights and interests would be resolved, which leaves companies with little incentive to share precious trade secrets across company lines.
Even if we were to ignore these contradictions inherent to the NPD framework, does making vast amounts of data available spur innovation by itself?
Making Use of NPD
First, as we highlighted above, data is contextual and purpose-specific. It is collected in a manner that helps the agency collecting it, and so it may not be of any significant value to another entity that is given access to it. Second, access to data is of little use if you lack the capability to do something meaningful with it. This means that Indian entities need to be equipped with the relevant skills and infrastructure, in the form of storage space and computational resources, to make effective use of the NPD they now have access to. While the Committee report does briefly discuss these issues, it could have used the NPD framework as an opportunity to focus on the need for capacity building and infrastructure augmentation.
Now, let’s suppose the Non-Personal Data sharing framework clarifies the overlap with commercial rights, and also that Indian entities possess the requisite skills and infrastructure to use shared data. In this situation, mandatory sharing should be easy to implement, right? Well, not exactly.
This is because it’s very hard to assign a value to data as an asset: the possible uses of any given dataset are continuously evolving, making it difficult to estimate what its value is at any given point in time. If a government agency is tasked with determining the value at which datasets are to be shared, it could potentially disrupt the entire market. We’ve recently seen how the approach of the Department of Telecommunications and the Supreme Court towards determining adjusted gross revenue (AGR) dues has roiled the telecom sector and almost pushed it towards monopolisation.
So, on closer examination, it appears that there isn’t a clear and convincing case for creating a new government-led regulatory framework. This assertion is further supported by research which has shown that regulating uncertain markets, such as the current data market, can in fact deter innovation. If a regulatory framework is unnecessary, then what else can be done to address the very real concerns emerging from monopolistic control over data in India?
Featured image: courtesy of Daniel Aleksandersen, Ctrl.blog. | Views expressed are personal.