In a previous blog post, we explored some common problems that data scientists encounter when collecting and analysing data. In the accompanying Red Flags Explainer, we drew on our experience of building and analysing datasets of government procurement over the past ten years to answer some Frequently Asked Questions about our work, explaining some of the challenges and what can be done to fix or work around them. Some of these challenges are made more explicit in our most recent report, “India’s Federal Procurement Data Infrastructure: Observations and Recommendations.”
In India, according to the Ministry of Finance General Financial Rules (2017), all procuring authorities are responsible and accountable for ensuring transparency, fairness, equality, competition, and appeal rights in contracting. The transparency principle is about making information easily accessible to the public: it prescribes that all procuring entities should ensure the publication of all relevant information on the Central Public Procurement Portal (CPPP).
Despite the General Financial Rules’ formal requirement for transparency, we found that the Indian federal public procurement data that we could collect from public sources was insufficient for robust analysis. Besides a number of technical difficulties, the key problem is that most contract awards are missing: their publication does not appear to be monitored or enforced. This makes rigorous analysis impossible, because our sample is likely to be biased and we cannot determine the nature of that bias.
Given the Indian government’s commitment to the transparency principle, the report seeks to inform future reforms by providing: (1) a description of our data collection efforts and our (incomplete) dataset; (2) our observations on the current data infrastructure; and (3) a set of recommendations for how to make the data more accessible and usable for analysis in the future.
These recommendations are as follows:
- Make the publication of contract awards mandatory throughout the federal public procurement system and communicate the requirement to all stakeholders.
- Monitor and enforce clear rules for procuring entities to collect and publish relevant public procurement data in a consistent and timely manner, including publication of contract awards.
- Publish all data in one place (ideally the CPPP website) in machine-readable format (e.g., CSV, JSON, XML) to improve usability. Users should also be able to download data in bulk either as CSV or through an API.
- Use unique standardised IDs for all tender announcements and contract-award notices to ensure that they can be linked.
- Use unique standardised IDs for organisations—both buyers and suppliers—in addition to their names.
- Collect more detailed information on the tender process, in standardised formats (e.g., detailed product codes and structured addresses).
- Publish information on amendments, modifications, and failed tenders in a structured and reliable format so that up-to-date information is available on all tenders.
- Facilitate matching with other public datasets (e.g., it should be possible to match procurement data with budgets or other public financial management data, company registry data, and court rulings).
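
To illustrate why machine-readable bulk data and shared identifiers matter, here is a minimal Python sketch of the kind of check they would make possible: linking contract-award notices to tender announcements and measuring how many tenders have a published award. The CSV contents and all column names (`tender_id`, `buyer_id`, `supplier_id`, `value_inr`) are hypothetical, chosen only for illustration; they are not the CPPP's actual schema.

```python
import csv
import io

# Hypothetical bulk exports. Column names and values are illustrative
# assumptions, not the CPPP's real data format.
TENDERS_CSV = """tender_id,buyer_id,title
T-001,B-10,Road repair
T-002,B-10,Office furniture
T-003,B-22,IT services
"""

AWARDS_CSV = """award_id,tender_id,supplier_id,value_inr
A-900,T-001,S-5,1200000
A-901,T-003,S-7,450000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_awards(tenders, awards):
    """Match awards to tenders on the shared tender_id.

    Returns (linked, missing): linked is a list of (tender, [awards]) pairs,
    missing is a list of tender IDs with no published award.
    """
    awards_by_tender = {}
    for award in awards:
        awards_by_tender.setdefault(award["tender_id"], []).append(award)

    linked, missing = [], []
    for tender in tenders:
        matches = awards_by_tender.get(tender["tender_id"], [])
        if matches:
            linked.append((tender, matches))
        else:
            missing.append(tender["tender_id"])
    return linked, missing


tenders = read_rows(TENDERS_CSV)
awards = read_rows(AWARDS_CSV)
linked, missing = link_awards(tenders, awards)

# Share of tenders for which at least one award notice was found.
award_coverage = len(linked) / len(tenders)
print("Tenders without a published award:", missing)
print("Award coverage:", round(award_coverage, 2))
```

A tender that has no matching award in such a check is not necessarily a publication failure: it may have failed, been cancelled, or still be open. This is one reason structured information on amendments and failed tenders is needed alongside the identifiers.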