Flow sampling

In statistics, flow sampling, as opposed to stock sampling, collects observations as they enter the state of interest during a particular interval.^[1] When dealing with duration data (such as employment spells or mortality outcomes), the sampling method has a direct impact on subsequent analyses and inference. An example in demography would be sampling the people who die within a given time frame (e.g. a specific calendar year); a popular example in economics would be sampling the people leaving unemployment within a given time frame (e.g. a specific quarter).^[2] Researchers who impose similar assumptions but use different sampling methods can reach fundamentally different conclusions if the joint distribution differs across the flow and stock samples.^[3]

Typically, flow samples suffer from right censoring. When the sampling interval ends, the individuals in the sample are no longer followed, outcomes are recorded and the data are analyzed. In the unemployment example outlined above, the exact duration is observed for individuals who leave unemployment within the time frame. For people who have not yet left unemployment, only a lower bound on the length of the unemployment spell is observed.^[4]

The difference between stock and flow sampling can also help explain why statistics that measure similar durations can differ in important ways. Consider, for instance, the Average Interrupted Duration (AID), the average period for which people who are currently unemployed have been unemployed, and the Average Completed Duration (ACD), the average duration of the complete unemployment spell for people who are now employed. Salant shows that heterogeneity in hazard rates between the stock and the flow distribution provides a key to understanding why these two statistics differ. For instance, if the probability of getting a job offer goes down with time spent unemployed, then E[T] < E[S], where S and T stand for observed and actual duration, respectively.^[2]
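The two effects described above, right censoring in a flow sample and the gap between AID and ACD, can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not in the sources: spells start uniformly over a long horizon, and the population consists of two hypothetical types with constant exit rates of 1.0 and 0.1, so that the aggregate hazard declines with elapsed duration. All function names and parameter values are illustrative.

```python
import random


def simulate_spells(n, horizon, rng):
    """Draw n spells as (start date, completed duration) pairs."""
    spells = []
    for _ in range(n):
        start = rng.uniform(0.0, horizon)
        # Assumed heterogeneity: half the entrants exit quickly, half slowly.
        rate = 1.0 if rng.random() < 0.5 else 0.1
        spells.append((start, rng.expovariate(rate)))
    return spells


def flow_sample(spells, w0, w1):
    """Spells that begin in [w0, w1), followed only until w1 (right censoring)."""
    sample = []
    for start, duration in spells:
        if w0 <= start < w1:
            follow_up = w1 - start
            observed = min(duration, follow_up)  # lower bound if censored
            sample.append((observed, duration > follow_up))
    return sample


def stock_sample(spells, date):
    """Elapsed durations of all spells in progress at the given date."""
    return [date - start for start, duration in spells
            if start <= date < start + duration]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
spells = simulate_spells(200_000, 200.0, rng)

# ACD: mean completed duration of entrants (known only because this is simulated).
acd = mean([duration for _, duration in spells])
# AID: mean elapsed duration among those in the state at date 150.
aid = mean(stock_sample(spells, 150.0))

flow = flow_sample(spells, 100.0, 104.0)
share_censored = mean([1.0 if censored else 0.0 for _, censored in flow])
naive_mean = mean([observed for observed, _ in flow])

print(f"ACD (flow, complete spells): {acd:.2f}")
print(f"AID (stock, elapsed):        {aid:.2f}")
print(f"Censored share in flow:      {share_censored:.2f}")
print(f"Naive mean of observed:      {naive_mean:.2f}")
```

Under these assumed parameters the mean completed duration is 0.5 × 1 + 0.5 × 10 = 5.5, while the stationary mean elapsed duration, E[T²]/(2E[T]), is about 9.2, so the stock-based average exceeds the flow-based one. The naive mean of the observed flow durations ignores censoring and therefore understates the mean completed duration.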

Renewal theory is the appropriate tool for handling these issues,^[1] and a wide range of estimators have been proposed. These estimators range from fully parametric models such as the Mixed Proportional Hazard model,^[5] to nonparametric and semiparametric methods.^[6]
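As an illustration of the fully parametric end of this range, the mixed proportional hazard specification and the log-likelihood of a right-censored flow sample can be written as follows. This is a sketch in notation chosen here rather than taken from the cited sources: x denotes covariates, v unobserved heterogeneity, t_i the observed duration, c_i the censoring time, d_i an indicator equal to 1 if spell i is completed, and f and F the density and distribution function of duration given covariates (with v integrated out). It assumes that censoring is independent of duration conditional on the covariates.

```latex
% Mixed proportional hazard: baseline hazard, covariate effect, unobserved heterogeneity v
\theta(t \mid x, v) = \lambda(t)\, \exp(x'\beta)\, v

% Log-likelihood for a right-censored flow sample (d_i = 1 if spell i is completed)
\ell(\gamma) = \sum_{i=1}^{n} \Big\{ d_i \log f(t_i \mid x_i; \gamma)
             + (1 - d_i) \log\big[ 1 - F(c_i \mid x_i; \gamma) \big] \Big\}
```

Completed spells contribute their density, while censored spells contribute only the probability that the duration exceeds the follow-up time, which is how the lower bound on a censored spell enters the estimation.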

References

[1] Cameron, A. C. and P. K. Trivedi (2005): Microeconometrics: Methods and Applications. Cambridge University Press, New York.
[2] Salant, S. (1977): Search Theory and Duration Data: A Theory of Sorts. The Quarterly Journal of Economics 91(1), pp. 39–57.
[3] Chesher, A. and T. Lancaster (1981): Stock and Flow Sampling. Economics Letters 8(1), pp. 63–65.
[4] Wooldridge, J. (2002): Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Mass.
[5] Lancaster, T. (1979): Econometric Methods for the Duration of Unemployment. Econometrica 47(4), pp. 939–956.
[6] Hausman, J. A. and T. Woutersen (2014): Estimating a semi-parametric duration model without specifying heterogeneity. Journal of Econometrics 178(1), pp. 114–131.
