Is Predicted Data a Viable Alternative to Real Data?

It is costly to collect the household- and individual-level data that underlie official estimates of poverty and health. For this reason, developing countries often do not have the budget to update estimates of poverty and health regularly, even though these estimates are most needed there. One way...

Bibliographic Details
Main Authors: Fujii, Tomoki, van der Weide, Roy
Format: Journal Article
Published: Oxford University Press on behalf of the World Bank, 2021
Subjects: PREDICTION; DOUBLE SAMPLING; SURVEY COSTS; POVERTY
Online Access: http://hdl.handle.net/10986/36720
id okr-10986-36720
recordtype oai_dc
spelling okr-10986-36720
2021-12-11T05:10:39Z
2021-12-10T18:03:30Z
2021-12-10T18:03:30Z
2020-06
Journal Article
World Bank Economic Review
1564-698X
http://hdl.handle.net/10986/36720
CC BY-NC-ND 3.0 IGO
http://creativecommons.org/licenses/by-nc-nd/3.0/igo
World Bank
Published by Oxford University Press on behalf of the World Bank
Publications & Research
Publications & Research :: Journal Article
repository_type Digital Repository
institution_category Foreign Institution
institution Digital Repositories
building World Bank Open Knowledge Repository
collection World Bank
topic PREDICTION
DOUBLE SAMPLING
SURVEY COSTS
POVERTY
description It is costly to collect the household- and individual-level data that underlie official estimates of poverty and health. For this reason, developing countries often do not have the budget to update estimates of poverty and health regularly, even though these estimates are most needed there. One way to reduce the financial burden is to substitute some of the real data with predicted data by means of double sampling, where the expensive outcome variable is collected for a subsample and its predictors for all. This study finds that double sampling yields only modest reductions in financial costs when imposing a statistical precision constraint in a wide range of realistic empirical settings. There are circumstances in which the gains can be more substantial, but these denote the exception rather than the rule. The recommendation is to rely on real data whenever there is a need for new data and to use prediction estimators to leverage existing data.
format Journal Article
author Fujii, Tomoki
van der Weide, Roy
title Is Predicted Data a Viable Alternative to Real Data?
publisher Published by Oxford University Press on behalf of the World Bank
publishDate 2021
url http://hdl.handle.net/10986/36720