Overview
This vignette documents the full production pipeline for constructing subnational PPPs in the testing countries: the data sources and their processing, validation, aggregation, and estimation. Each of these steps follows the international recommendations in ICP (2021).
The production pipeline covers country-level data validation, the aggregation and estimation of subnational purchasing power parities (sPPPs), and the harmonisation of these estimates to make sPPP indicators comparable across countries (Table 1). Only the raw data processing, which derives a standard structure across all testing countries, remains country- and dataset-specific. The individual stages of the production pipeline are discussed in the following sections.
| Step | Counterpart | Uses OECDsppps |
|---|---|---|
| Raw data processing | OECD or country | No |
| Raw data validation | OECD or country | Yes |
| Estimation at basic-heading level | OECD or country | Yes |
| Validation of estimation at basic-heading level | OECD or country | Yes |
| Estimation beyond the basic-heading level | OECD | Yes |
| Validation of estimation beyond the basic-heading level | OECD | Yes |
| Comparability across countries and regions | OECD | Yes |
1 Raw data processing
The objective of the raw data processing is to derive a standard structure across all testing countries. Data are sourced primarily from the official CPI programmes of National Statistical Offices (NSOs) and are country-specific. Consequently, data cleaning is country- and dataset-specific and is typically the most time-consuming part of the initial data work, because data can differ in granularity (spatial and product-related), content (available variables and information), and coverage (e.g., products or types of activity).
The data processing takes the raw (unprocessed) CPI microdata and ensures that product characteristics, as well as the observed quantities and measurement units of the observed price quotes, are harmonised, enabling a like-for-like comparison of products across regions. See Table 2 for a stylised example based on Weinand and Auer (2020).
| Region | Outlet | Quantity observed | Measurement unit of observed quantity | Product characteristics | Price observed |
|---|---|---|---|---|---|
| A | Supermarket | 1 | Kilograms | “Bens, basmati, bag” | 1.69 |
| C | Supermarket | 500 | Grams | “Ben’s, basmati, bulk” | 0.79 |
| B | Supermarket | 0.5 | Kilograms | “Ben, basm., bulk” | 0.69 |
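The quantity and unit harmonisation illustrated in Table 2 can be sketched as follows. This is a minimal illustration in Python, not part of OECDsppps: the conversion table `TO_KILOGRAMS` and the helper `unit_price()` are hypothetical names introduced here, and the sketch assumes that all quotes can be expressed in kilograms. Harmonising the product characteristics themselves is not covered.

```python
# Hypothetical conversion factors to a common base unit (kilograms).
TO_KILOGRAMS = {"Kilograms": 1.0, "Grams": 0.001}


def unit_price(price, quantity, unit):
    """Return the price per kilogram for one observed price quote."""
    if unit not in TO_KILOGRAMS:
        raise ValueError(f"Unknown measurement unit: {unit!r}")
    quantity_kg = quantity * TO_KILOGRAMS[unit]
    if quantity_kg <= 0:
        raise ValueError("Observed quantity must be positive")
    return price / quantity_kg


# Quotes from Table 2: (region, quantity observed, measurement unit, price observed)
quotes = [
    ("A", 1, "Kilograms", 1.69),
    ("C", 500, "Grams", 0.79),
    ("B", 0.5, "Kilograms", 0.69),
]

# Price per kilogram by region, rounded to two decimals for display
harmonised = {
    region: round(unit_price(price, quantity, unit), 2)
    for region, quantity, unit, price in quotes
}
print(harmonised)
```

Once every quote refers to the same quantity, the prices in regions A, B, and C can be compared directly; for example, the 500-gram quote in region C corresponds to 1.58 per kilogram.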
In addition to harmonising the individual price quotes, initial data processing also classifies the individual items or products according to their respective COICOP subclasses. Once a common structure is established, harmonised data processing using OECDsppps commences with the data validation.
2 Raw data validation
Data validation confirms the validity of price statistics at various levels of aggregation, from the initial item-level price quotes up to the basic-heading level and beyond, and compares household expenditure weights across regions.
Validation begins with the analysis of item-level prices within regions and involves outlier detection for single price quotes and average price aggregates. The two validation steps taken at this stage are described in the Validation vignette:
- Intra-regional validation analyses individual and aggregate price quotes within the same region and across regions of the same country.
- Inter-regional validation checks prices across all regions and countries, ensuring that average prices are based on comparable products in regions across countries and that products have been accurately priced.
The raw data validation of alternative data sources is also carried out at this stage.
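To make the idea of outlier detection for single price quotes concrete, the sketch below flags quotes whose log price lies far from the median of the other quotes for the same product in one region. The rule used here (a robust z-score based on the median absolute deviation, with a threshold of 3) is an assumption chosen for illustration; the rules actually applied are those described in the Validation vignette. The function `flag_outliers()` and the example prices are hypothetical.

```python
import math
from statistics import median


def flag_outliers(prices, threshold=3.0):
    """Flag quotes whose log price lies far from the median log price.

    Assumed rule (illustrative only): a quote is flagged when
    |log(p) - median| / (1.4826 * MAD) exceeds `threshold`.
    """
    logs = [math.log(p) for p in prices]
    centre = median(logs)
    mad = median([abs(x - centre) for x in logs])
    if mad == 0:
        # No spread around the median: flag every quote that differs from it.
        return [x != centre for x in logs]
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [abs(x - centre) / scale > threshold for x in logs]


# Hypothetical unit prices for one product in one region;
# the last quote mimics a misplaced decimal point.
prices = [1.38, 1.45, 1.52, 1.58, 1.69, 15.80]
flags = flag_outliers(prices)
print([p for p, flagged in zip(prices, flags) if flagged])
```

Working on log prices treats relative deviations symmetrically, and using the median and MAD keeps a single extreme quote from masking itself by inflating the spread estimate.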
3 Estimation at basic-heading level
Basic headings are estimated from item-level prices: price data are aggregated up to the level of basic headings, generally without the use of expenditure weights.
The estimation is carried out using estim_cpd() with the argument output = "Full", which summarises the key information of the estimated Country-Product-Dummy (CPD) model: it provides the ‘Regression output’ as well as the individual ‘Residuals’ of the CPD regression; see EXAMPLE 4 in the Estimation vignette.
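The logic of an unweighted CPD regression can be sketched in a few lines. The block below is not the implementation of estim_cpd() and does not reproduce its output format; it is a standalone Python illustration that assumes the textbook specification, in which the log price of each quote is the sum of a region effect and a product effect, estimated by ordinary least squares with the base region's effect fixed at zero. The names `cpd_sketch()` and `solve()` and the example quotes are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Singular system: regions must be linked by common products")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def cpd_sketch(quotes, base_region):
    """Unweighted CPD regression on (region, product, price) tuples.

    Returns the price level of each region relative to `base_region`
    and the log-price residual of each quote.
    """
    regions = sorted({r for r, _, _ in quotes} - {base_region})
    products = sorted({i for _, i, _ in quotes})
    # One dummy per non-base region, followed by one dummy per product
    col = {("region", r): k for k, r in enumerate(regions)}
    col.update({("product", i): len(regions) + k for k, i in enumerate(products)})
    n = len(col)

    def active(region, product):
        """Indices of the dummies equal to one for this quote."""
        idx = [col[("product", product)]]
        if region != base_region:
            idx.append(col[("region", region)])
        return idx

    # Accumulate the normal equations (X'X) b = X'y
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for region, product, price in quotes:
        idx = active(region, product)
        for a in idx:
            xty[a] += math.log(price)
            for b in idx:
                xtx[a][b] += 1.0
    coef = solve(xtx, xty)

    levels = {base_region: 1.0}
    levels.update({r: math.exp(coef[col[("region", r)]]) for r in regions})
    residuals = [
        math.log(p) - sum(coef[k] for k in active(r, i)) for r, i, p in quotes
    ]
    return levels, residuals


# Hypothetical quotes, constructed so that region B is 10% more expensive and
# region C 10% cheaper than region A; the bread price in region C is missing.
quotes = [
    ("A", "rice", 2.0), ("B", "rice", 2.2), ("C", "rice", 1.8),
    ("A", "bread", 3.0), ("B", "bread", 3.3),
]
levels, residuals = cpd_sketch(quotes, base_region="A")
print({region: round(level, 3) for region, level in levels.items()})
```

Because the example prices follow the multiplicative structure exactly, the residuals are zero up to rounding error; with real price quotes, large residuals point to quotes that deserve a closer look during validation. The regression also tolerates gaps in the price data, as long as the regions remain linked through commonly priced products.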
4 Validation of estimation at basic-heading level
5 Estimation beyond the basic-heading level
🚧 Work in progress.
6 Validation of estimation beyond the basic-heading level
🚧 Work in progress.
7 Comparability across countries and regions
🚧 Work in progress.
