Hurdle model

A hurdle model is a class of statistical models in which a random variable is modelled in two parts: the first gives the probability of attaining the value 0, and the second models the distribution of the non-zero values. The use of hurdle models is often motivated by an excess of zeroes in the data that is not sufficiently accounted for by more standard statistical models.

In a hurdle model, a random variable x is modelled as

\Pr(x=0)=\theta

\Pr(x=k)=(1-\theta )\,p_{x\neq 0}(k),\quad k\neq 0

where $p_{x\neq 0}$ is a probability distribution truncated at 0, so that it places no mass on 0 and the probabilities over all values of x sum to one.
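As an illustration, the two-part definition can be sketched for count data. This is a minimal sketch under stated assumptions: the non-zero part is taken to be a zero-truncated Poisson, weighted by $1-\theta$ so that the probabilities sum to one. The function names `poisson_pmf` and `hurdle_poisson_pmf`, the rate parameter `lam`, and the chosen parameter values are hypothetical and serve only as an example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability mass at k (assumed base distribution)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k: int, theta: float, lam: float) -> float:
    """Hurdle pmf: point mass theta at 0, zero-truncated Poisson elsewhere."""
    if k < 0:
        return 0.0
    if k == 0:
        # The zero probability is set directly by theta.
        return theta
    # Zero-truncated Poisson: renormalise so the positive counts sum to 1.
    truncated = poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))
    # Weight by the probability of clearing the hurdle.
    return (1.0 - theta) * truncated


# Illustrative values only: theta = 0.6, lam = 2.0.
probs = [hurdle_poisson_pmf(k, theta=0.6, lam=2.0) for k in range(60)]
total = sum(probs)  # numerically close to 1
```

In this sketch the zero probability depends only on `theta`, while `lam` affects only the distribution of the positive counts.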

Hurdle models were introduced by John G. Cragg in 1971,^[1] where the non-zero values of x were modelled using a normal model, and a probit model was used to model the zeros. The probit part of the model was said to model the presence of "hurdles" that must be overcome for x to attain non-zero values, hence the name hurdle model. Hurdle models were later developed for count data, with Poisson, geometric,^[2] and negative binomial^[3] models for the non-zero counts.

Relationship with zero-inflated models

Hurdle models differ from zero-inflated models, which describe the zeros using a two-component mixture model. With a mixture model, the probability of the variable being zero is determined by both the main distribution function $p(x=0)$ and the mixture weight $\pi$. Specifically, a zero-inflated model for a random variable x is

\Pr(x=0)=\pi +(1-\pi )\times p(x=0)

\Pr(x=h_{i})=(1-\pi )\times p(x=h_{i})

where $\pi$ is the mixture weight that determines the amount of zero-inflation and $h_{i}$ denotes a non-zero value. Because $\pi \geq 0$, a zero-inflated model can only increase $\Pr(x=0)$ relative to the main distribution $p$, but this is not a restriction in hurdle models.^[4]
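The difference can be checked numerically. The following is a minimal sketch that assumes a Poisson main distribution; the helper `zip_zero_prob`, the rate `lam`, and the value `theta_deflated` are hypothetical names and numbers chosen for illustration.

```python
import math


def zip_zero_prob(pi: float, lam: float) -> float:
    """Pr(x=0) under a zero-inflated Poisson with mixture weight pi."""
    return pi + (1.0 - pi) * math.exp(-lam)


lam = 2.0
base_zero = math.exp(-lam)  # Pr(x=0) under the plain Poisson

# For every mixture weight in [0, 1], the zero-inflated zero probability
# is at least as large as the plain Poisson zero probability.
zip_zeros = [zip_zero_prob(w / 10, lam) for w in range(11)]
zip_min = min(zip_zeros)  # attained at pi = 0, where it equals base_zero

# In a hurdle model the zero probability theta is a free parameter,
# so it can also lie below the plain Poisson value (fewer zeros).
theta_deflated = 0.05
```

Here the smallest zero probability the zero-inflated model can produce is the plain Poisson value itself, whereas the hurdle parameter can be set below it.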

References

[1] Cragg, John G. (1971). "Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods". Econometrica. 39 (5): 829–844. doi:10.2307/1909582. JSTOR 1909582.
[2] Mullahy, John (1986). "Specification and testing of some modified count data models". Journal of Econometrics. 33 (3): 341–365. doi:10.1016/0304-4076(86)90002-3.
[3] Welsh, A. H.; Cunningham, R. B.; Donnelly, C. F.; Lindenmayer, D. B. (1996). "Modelling the abundance of rare species: statistical models for counts with extra zeros". Ecological Modelling. 88 (1–3): 297–308. doi:10.1016/0304-3800(95)00113-1.
[4] Min, Yongyi; Agresti, Alan (2005). "Random effect models for repeated measures of zero-inflated count data". Statistical Modelling. 5 (1): 1–19. CiteSeerX 10.1.1.296.3503. doi:10.1191/1471082X05st084oa. S2CID 2400918.
