Estimators

The Basics

Regression analysis assumes a model for the data generation process of some outcome. For example, the Mincer earnings function assumes that earnings are a function of schooling and experience.

\[\begin{equation} \begin{aligned} \ln(\text{wage}) &= \beta_{0} + \beta_{1}\text{ schooling} + \beta_{2}\text{ experience} + \beta_{3}\text{ (experience)}^{2} + \varepsilon\\\\ \varepsilon &\sim \mathcal{N}\left(\mu, \sigma^{2}\right) \end{aligned} \end{equation}\]

The response variable, earnings, is usually log-normal distributed (Clementi and Gallegati 2005). In addition, earnings are non-negative and assuming the sample has positive earnings a useful transformation is to take the natural log-normal distributed response. Schooling is commonly measured in years of schooling and experience as the number of years in the labor market. The model assumes a polynomial relationship between years of experience and the log of earnings of degree two.

How could we use Econometrics.jl to analyze such a model?

We can use the RDatasets package for accessing various example datasets.

The following line loads packages for the tutorial.

using CSV, RDatasets, Econometrics

We can load the PSID dataset from the Ecdat R package.

data = RDatasets.dataset("Ecdat", "PSID")
data = data[data.Earnings .> 0 .&
            data.Kids .< 98,:] # Only those with earnings and valid number of kids

3652×8 DataFrame. Omitted printing of 1 columns
│ Row  │ IntNum │ PersNum │ Age   │ Educatn │ Earnings │ Hours │ Kids  │
│      │ Int32  │ Int32   │ Int32 │ Int32⍰  │ Int32    │ Int32 │ Int32 │
├──────┼────────┼─────────┼───────┼─────────┼──────────┼───────┼───────┤
│ 1    │ 4      │ 4       │ 39    │ 12      │ 77250    │ 2940  │ 2     │
│ 2    │ 4      │ 6       │ 35    │ 12      │ 12000    │ 2040  │ 2     │
│ 3    │ 4      │ 7       │ 33    │ 12      │ 8000     │ 693   │ 1     │
│ 4    │ 4      │ 173     │ 39    │ 10      │ 15000    │ 1904  │ 2     │
│ 5    │ 5      │ 2       │ 47    │ 9       │ 6500     │ 1683  │ 5     │
│ 6    │ 6      │ 4       │ 44    │ 12      │ 6500     │ 2024  │ 2     │
│ 7    │ 6      │ 172     │ 38    │ 16      │ 7000     │ 1144  │ 3     │
⋮
│ 3645 │ 9284   │ 1       │ 34    │ 12      │ 300      │ 357   │ 2     │
│ 3646 │ 9285   │ 4       │ 41    │ 8       │ 1800     │ 410   │ 98    │
│ 3647 │ 9286   │ 2       │ 45    │ 12      │ 20080    │ 2040  │ 2     │
│ 3648 │ 9292   │ 2       │ 41    │ 6       │ 13000    │ 2448  │ 5     │
│ 3649 │ 9297   │ 2       │ 42    │ 2       │ 3000     │ 1040  │ 4     │
│ 3650 │ 9302   │ 1       │ 37    │ 8       │ 22045    │ 2793  │ 98    │
│ 3651 │ 9305   │ 2       │ 40    │ 6       │ 134      │ 30    │ 3     │
│ 3652 │ 9306   │ 2       │ 37    │ 17      │ 33000    │ 2423  │ 4     │

Baseline model

model = fit(EconometricModel, # Indicates the default model
            @formula(log(Earnings) ~ Educatn + Age + Age^2), # formula
            data)

Continuous Response Model
Number of observations: 3652
Null Loglikelihood: -5696.23
Loglikelihood: -5671.17
R-squared: 0.0136
LR Test: 50.11 ∼ χ²(3) ⟹  Pr > χ² = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2)
Variance Covariance Estimator: OIM
────────────────────────────────────────────────────────────────────────────────────
                    PE           SE      t-value  Pr > |t|        2.50%       97.50%
────────────────────────────────────────────────────────────────────────────────────
(Intercept)   7.14834      0.957333      7.46693    <1e-12   5.27138     9.0253     
Educatn       0.00370285   0.00111874    3.30984    0.0009   0.00150944  0.00589627 
Age           0.0934537    0.0493985     1.89183    0.0586  -0.00339765  0.190305   
Age ^ 2      -0.000918383  0.000628258  -1.46179    0.1439  -0.00215015  0.000313389
────────────────────────────────────────────────────────────────────────────────────

Frequency weights

We can also specify observation weights,

model = fit(EconometricModel,
            @formula(log(Earnings) ~ Educatn + Age + Age^2),
            data,
            wts = :PersNum, # frequency weights
            )

Continuous Response Model
Number of observations: 223174
Null Loglikelihood: -6746726.41
Loglikelihood: -349853.81
R-squared: 0.0064
LR Test: 12793745.20 ∼ χ²(3) ⟹  Pr > χ² = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2)
Variance Covariance Estimator: OIM
────────────────────────────────────────────────────────────────────────────────────
                    PE           SE      t-value  Pr > |t|         2.50%      97.50%
────────────────────────────────────────────────────────────────────────────────────
(Intercept)   8.50901      0.131474     64.7201     <1e-99   8.25132      8.7667    
Educatn       0.00117042   0.000140001   8.36006    <1e-16   0.000896017  0.00144481
Age           0.0305687    0.00686287    4.45422    <1e-5    0.0171177    0.0440198 
Age ^ 2      -0.000167677  8.8448e-5    -1.89576    0.0580  -0.000341032  5.67929e-6
────────────────────────────────────────────────────────────────────────────────────

Categorical variables

Categorical variables can be passed through a contrast or if the feature is coded as a categorical variables it will be handled as categorical.

model = fit(EconometricModel,
            @formula(log(Earnings) ~ Educatn + Age + Age^2 + Kids),
            data,
            wts = :PersNum,
            contrasts = Dict(:Kids => DummyCoding()))

Continuous Response Model
Number of observations: 223174
Null Loglikelihood: -6746726.41
Loglikelihood: -345253.18
R-squared: 0.0466
LR Test: 12802946.46 ∼ χ²(15) ⟹  Pr > χ² = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2) + Kids
Variance Covariance Estimator: OIM
────────────────────────────────────────────────────────────────────────────────────────
                    PE           SE        t-value  Pr > |t|         2.50%        97.50%
────────────────────────────────────────────────────────────────────────────────────────
(Intercept)   8.55057      0.131389      65.0782      <1e-99   8.29306       8.80809    
Educatn       0.000538588  0.000138739    3.88203     0.0001   0.000266663   0.000810512
Age           0.0389003    0.0068679      5.66408     <1e-7    0.0254394     0.0523613  
Age ^ 2      -0.000191192  8.86411e-5    -2.15692     0.0310  -0.000364926  -1.74575e-5 
Kids: 1      -0.132069     0.00828577   -15.9392      <1e-56  -0.148309     -0.115829   
Kids: 2      -0.290339     0.00727643   -39.9013      <1e-99  -0.304601     -0.276078   
Kids: 3      -0.529931     0.00820589   -64.5793      <1e-99  -0.546014     -0.513848   
Kids: 4      -0.710859     0.011204     -63.447       <1e-99  -0.732818     -0.688899   
Kids: 5      -0.491404     0.0180217    -27.2673      <1e-99  -0.526726     -0.456082   
Kids: 6      -1.47718      0.0381049    -38.7663      <1e-99  -1.55187      -1.4025     
Kids: 7      -0.242236     0.0609217     -3.97619     <1e-4   -0.361641     -0.122831   
Kids: 8      -1.53682      0.328191      -4.68271     <1e-5   -2.18007      -0.893577   
Kids: 9      -0.209419     1.13675       -0.184226    0.8538  -2.43741       2.01858    
Kids: 10      0.62745      0.50838        1.23421     0.2171  -0.368963      1.62386    
Kids: 98     -0.71333      0.0186111    -38.3281      <1e-99  -0.749807     -0.676853   
Kids: 99     -1.02093      0.0252724    -40.3971      <1e-99  -1.07046      -0.971397   
────────────────────────────────────────────────────────────────────────────────────────

Absorbing features

If one only care about the estimates of a subset of features and controls such as number of kids (as categorical), one can absorb those features,

model = fit(EconometricModel,
            @formula(log(Earnings) ~ Educatn + Age + Age^2 + absorb(Kids)),
            data,
            wts = :PersNum)

Continuous Response Model
Number of observations: 223174
Null Loglikelihood: -6746726.41
Loglikelihood: -345253.18
R-squared: 0.0466
Wald: 860.46 ∼ F(3, 223158) ⟹ Pr > F = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2) + absorb(Kids)
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
                    PE           SE      t-value  Pr > |t|         2.50%        97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept)   8.23837      0.13135      62.721      <1e-99   7.98093       8.49581    
Educatn       0.000538588  0.000138739   3.88203    0.0001   0.000266663   0.000810512
Age           0.0389003    0.0068679     5.66408    <1e-7    0.0254394     0.0523613  
Age ^ 2      -0.000191192  8.86411e-5   -2.15692    0.0310  -0.000364926  -1.74575e-5 
──────────────────────────────────────────────────────────────────────────────────────

Variance Covariance Estimators

Various variance-covariance estimators are available for continous response models.

For example,

vcov(model, HC0)

4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
  0.000304618  -1.6099e-8    -1.60138e-5    2.07213e-7 
 -1.6099e-8     2.51577e-10   6.23515e-10  -8.35769e-12
 -1.60138e-5    6.23515e-10   8.46239e-7   -1.10009e-8 
  2.07213e-7   -8.35769e-12  -1.10009e-8    1.4367e-10

vcov(model, HC1)

4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
  0.00030464  -1.61001e-8   -1.6015e-5    2.07228e-7 
 -1.61001e-8   2.51595e-10   6.2356e-10  -8.35829e-12
 -1.6015e-5    6.2356e-10    8.463e-7    -1.10017e-8 
  2.07228e-7  -8.35829e-12  -1.10017e-8   1.4368e-10

vcov(model, HC2)

4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
  0.000305269  -1.62242e-8   -1.60486e-5    2.07673e-7 
 -1.62242e-8    2.53522e-10   6.28483e-10  -8.41935e-12
 -1.60486e-5    6.28483e-10   8.48106e-7   -1.10257e-8 
  2.07673e-7   -8.41935e-12  -1.10257e-8    1.43999e-10

vcov(model, HC3)

4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
  0.000305927  -1.63508e-8   -1.60838e-5   2.08138e-7 
 -1.63508e-8    2.55503e-10   6.3349e-10  -8.48145e-12
 -1.60838e-5    6.3349e-10    8.49996e-7  -1.10508e-8 
  2.08138e-7   -8.48145e-12  -1.10508e-8   1.44333e-10

vcov(model, HC4)

4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
  0.000307226  -1.66052e-8   -1.61533e-5    2.09058e-7 
 -1.66052e-8    2.59567e-10   6.43482e-10  -8.60518e-12
 -1.61533e-5    6.43482e-10   8.53731e-7   -1.11003e-8 
  2.09058e-7   -8.60518e-12  -1.11003e-8    1.44993e-10

Longitudinal Data

Longitudinal data is when a sample contains repeated measurements of the unit of observation. This kind of data allows for various estimators that make use of such a structure.

Example,

data = RDatasets.dataset("Ecdat", "Crime") |>
  (data -> select(data, [:County, :Year, :CRMRTE, :PrbConv, :AvgSen, :PrbPris]))

630×6 DataFrame
│ Row │ County │ Year  │ CRMRTE    │ PrbConv  │ AvgSen  │ PrbPris  │
│     │ Int32  │ Int32 │ Float64   │ Float64  │ Float64 │ Float64  │
├─────┼────────┼───────┼───────────┼──────────┼─────────┼──────────┤
│ 1   │ 1      │ 81    │ 0.0398849 │ 0.402062 │ 5.61    │ 0.472222 │
│ 2   │ 1      │ 82    │ 0.0383449 │ 0.433005 │ 5.59    │ 0.506993 │
│ 3   │ 1      │ 83    │ 0.0303048 │ 0.525703 │ 5.8     │ 0.479705 │
│ 4   │ 1      │ 84    │ 0.0347259 │ 0.604706 │ 6.89    │ 0.520104 │
│ 5   │ 1      │ 85    │ 0.036573  │ 0.578723 │ 6.55    │ 0.497059 │
│ 6   │ 1      │ 86    │ 0.0347524 │ 0.512324 │ 6.9     │ 0.439863 │
│ 7   │ 1      │ 87    │ 0.0356036 │ 0.527596 │ 6.71    │ 0.43617  │
⋮
│ 623 │ 195    │ 87    │ 0.0313973 │ 1.67052  │ 13.02   │ 0.470588 │
│ 624 │ 197    │ 81    │ 0.0178621 │ 1.06098  │ 11.35   │ 0.356322 │
│ 625 │ 197    │ 82    │ 0.0180711 │ 0.545455 │ 6.64    │ 0.363636 │
│ 626 │ 197    │ 83    │ 0.0155747 │ 0.480392 │ 7.77    │ 0.428571 │
│ 627 │ 197    │ 84    │ 0.0136619 │ 1.41026  │ 10.11   │ 0.372727 │
│ 628 │ 197    │ 85    │ 0.0130857 │ 0.830769 │ 5.96    │ 0.333333 │
│ 629 │ 197    │ 86    │ 0.012874  │ 2.25     │ 7.68    │ 0.244444 │
│ 630 │ 197    │ 87    │ 0.0141928 │ 1.18293  │ 12.23   │ 0.360825 │

The Between estimator

model = fit(BetweenEstimator, # Indicates the between estimator
            @formula(CRMRTE ~ PrbConv + AvgSen + PrbPris),
            data,
            panel = :County)

Between Estimator
County with 630 groups
Balanced groups with size 7
Number of observations: 90
Null Loglikelihood: 239.56
Loglikelihood: 244.40
R-squared: 0.1029
Wald: 3.29 ∼ F(3, 86) ⟹ Pr > F = 0.0246
Formula: CRMRTE ~ 1 + PrbConv + AvgSen + PrbPris
Variance Covariance Estimator: OIM
───────────────────────────────────────────────────────────────────────────────────
                    PE          SE       t-value  Pr > |t|        2.50%      97.50%
───────────────────────────────────────────────────────────────────────────────────
(Intercept)  -0.00464339   0.0186571   -0.24888     0.8040  -0.0417325   0.0324457 
PrbConv      -0.00354113   0.0018904   -1.87322     0.0644  -0.00729911  0.00021685
AvgSen        0.000848049  0.00116477   0.728086    0.4685  -0.00146743  0.00316353
PrbPris       0.0730299    0.0332057    2.19932     0.0305   0.0070191   0.139041  
───────────────────────────────────────────────────────────────────────────────────

The Within estimator (Fixed Effects)

model = fit(EconometricModel,
            @formula(CRMRTE ~ PrbConv + AvgSen + PrbPris + absorb(County)),
            data)

Continuous Response Model
Number of observations: 630
Null Loglikelihood: 1633.30
Loglikelihood: 2273.95
R-squared: 0.8707
Wald: 0.16 ∼ F(3, 537) ⟹ Pr > F = 0.9258
Formula: CRMRTE ~ 1 + PrbConv + AvgSen + PrbPris + absorb(County)
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
                   PE           SE        t-value  Pr > |t|         2.50%       97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept)   0.0314527   0.00204638   15.3699       <1e-43   0.0274328    0.0354726  
PrbConv       6.65981e-6  0.0001985     0.0335507    0.9732  -0.000383272  0.000396592
AvgSen        7.83181e-5  0.000127904   0.612318     0.5406  -0.000172936  0.000329572
PrbPris      -0.0013419   0.00405182   -0.331185     0.7406  -0.00930126   0.00661746 
──────────────────────────────────────────────────────────────────────────────────────

Absorbing the panel and temporal indicators

model = fit(EconometricModel,
            @formula(CRMRTE ~ PrbConv + AvgSen + PrbPris + absorb(County + Year)),
            data)

Continuous Response Model
Number of observations: 630
Null Loglikelihood: 1633.30
Loglikelihood: 2290.25
R-squared: 0.8775
Wald: 0.23 ∼ F(3, 531) ⟹ Pr > F = 0.8767
Formula: CRMRTE ~ 1 + PrbConv + AvgSen + PrbPris + absorb(County + Year)
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
                    PE           SE       t-value  Pr > |t|         2.50%       97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept)   0.0319651    0.00206378   15.4886      <1e-44   0.027911     0.0360193  
PrbConv       7.90362e-5   0.000196089   0.403062    0.6871  -0.00030617   0.000464242
AvgSen       -9.4788e-5    0.000134924  -0.702531    0.4827  -0.000359838  0.000170262
PrbPris       0.000979533  0.0040432     0.242267    0.8087  -0.00696309   0.00892216 
──────────────────────────────────────────────────────────────────────────────────────

The random effects model à la Swamy-Arora

model = fit(RandomEffectsEstimator, # Indicates the random effects estimator
            @formula(CRMRTE ~ PrbConv + AvgSen + PrbPris),
            data,
            panel = :County,
            time = :Year)

One-way Random Effect Model
Longitudinal dataset: County, Year
Balanced dataset with 90 panels of length 7
individual error component: 0.0162
idiosyncratic error component: 0.0071
ρ: 0.8399
Number of observations: 630
Null Loglikelihood: 2225.79
Loglikelihood: 2226.01
R-squared: 0.0007
Wald: 0.15 ∼ F(3, 626) ⟹ Pr > F = 0.9291
Formula: CRMRTE ~ PrbConv + AvgSen + PrbPris
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
                    PE           SE       t-value  Pr > |t|         2.50%       97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept)   0.0309293    0.00266706   11.5968      <1e-27   0.0256918    0.0361668  
PrbConv      -3.96634e-5   0.000198471  -0.199845    0.8417  -0.000429413  0.000350086
AvgSen        8.2428e-5    0.000127818   0.644887    0.5192  -0.000168575  0.000333431
PrbPris      -0.000123327  0.00404272   -0.030506    0.9757  -0.00806226   0.00781561 
──────────────────────────────────────────────────────────────────────────────────────

Instrumental Variables

Instrumental variables models are estimated using the 2SLS estimator.

model = fit(RandomEffectsEstimator,
            @formula(CRMRTE ~ PrbConv + (AvgSen ~ PrbPris)),
            data,
            panel = :County,
            time = :Year)

One-way Random Effect Model
Longitudinal dataset: County, Year
Balanced dataset with 90 panels of length 7
individual error component: 0.0413
idiosyncratic error component: 0.0074
ρ: 0.9691
Number of observations: 630
Null Loglikelihood: 2268.01
Loglikelihood: 2248.34
R-squared: NaN
Wald: 0.03 ∼ F(2, 626) ⟹ Pr > F = 0.9671
Formula: CRMRTE ~ PrbConv + (AvgSen ~ PrbPris)
Variance Covariance Estimator: OIM
───────────────────────────────────────────────────────────────────────────────────────
                    PE           SE        t-value  Pr > |t|         2.50%       97.50%
───────────────────────────────────────────────────────────────────────────────────────
(Intercept)   0.037747     0.0241684     1.56183      0.1188  -0.00971405   0.085208   
PrbConv       1.39581e-5   0.000200244   0.0697054    0.9445  -0.000379273  0.000407189
AvgSen       -0.000688924  0.00266514   -0.258494     0.7961  -0.00592262   0.00454478 
───────────────────────────────────────────────────────────────────────────────────────

Nominal Response Models

data = joinpath(dirname(pathof(Econometrics)), "..", "data", "insure.csv") |>
   CSV.read |>
   (data -> select(data, [:insure, :age, :male, :nonwhite, :site])) |>
   dropmissing |>
   (data -> categorical!(data, [:insure, :site]))

615×5 DataFrame
│ Row │ insure       │ age     │ male  │ nonwhite │ site         │
│     │ Categorical… │ Float64 │ Int64 │ Int64    │ Categorical… │
├─────┼──────────────┼─────────┼───────┼──────────┼──────────────┤
│ 1   │ Indemnity    │ 73.7221 │ 0     │ 0        │ 2            │
│ 2   │ Prepaid      │ 27.8959 │ 0     │ 0        │ 2            │
│ 3   │ Indemnity    │ 37.5414 │ 0     │ 0        │ 1            │
│ 4   │ Prepaid      │ 23.6413 │ 0     │ 1        │ 3            │
│ 5   │ Prepaid      │ 29.6838 │ 0     │ 0        │ 2            │
│ 6   │ Prepaid      │ 39.4689 │ 0     │ 0        │ 2            │
│ 7   │ Uninsure     │ 63.102  │ 0     │ 1        │ 3            │
⋮
│ 608 │ Prepaid      │ 64.4216 │ 0     │ 1        │ 3            │
│ 609 │ Prepaid      │ 69.9877 │ 0     │ 0        │ 1            │
│ 610 │ Prepaid      │ 71.5209 │ 0     │ 0        │ 3            │
│ 611 │ Prepaid      │ 63.4168 │ 0     │ 0        │ 2            │
│ 612 │ Prepaid      │ 40.3532 │ 0     │ 1        │ 1            │
│ 613 │ Prepaid      │ 55.3237 │ 1     │ 0        │ 3            │
│ 614 │ Prepaid      │ 27.9836 │ 1     │ 0        │ 3            │
│ 615 │ Prepaid      │ 30.59   │ 1     │ 0        │ 3            │

The model automatically detects that the response is nominal and uses the correct model.

model = fit(EconometricModel,
            @formula(insure ~ age + male + nonwhite + site),
            data,
            contrasts = Dict(:insure => DummyCoding(base = "Uninsure")))

Probability Model for Nominal Response
Categories: Uninsure, Indemnity, Prepaid
Number of observations: 615
Null Loglikelihood: -555.85
Loglikelihood: -534.36
R-squared: 0.0387
LR Test: 42.99 ∼ χ²(10) ⟹ Pr > χ² = 0.0000
Formula: insure ~ 1 + age + male + nonwhite + site
───────────────────────────────────────────────────────────────────────────────────────────────────
                                       PE         SE       t-value  Pr > |t|       2.50%     97.50%
───────────────────────────────────────────────────────────────────────────────────────────────────
insure: Indemnity ~ (Intercept)   1.28694     0.59232     2.17271     0.0302   0.123689   2.4502   
insure: Indemnity ~ age           0.00779612  0.0114418   0.681372    0.4959  -0.0146743  0.0302666
insure: Indemnity ~ male         -0.451848    0.367486   -1.22957     0.2193  -1.17355    0.269855 
insure: Indemnity ~ nonwhite     -0.217059    0.425636   -0.509965    0.6103  -1.05296    0.618843 
insure: Indemnity ~ site: 2       1.21152     0.470506    2.57493     0.0103   0.287497   2.13554  
insure: Indemnity ~ site: 3       0.207813    0.366293    0.56734     0.5707  -0.511547   0.927172 
insure: Prepaid ~ (Intercept)     1.55666     0.596327    2.61041     0.0093   0.385533   2.72778  
insure: Prepaid ~ age            -0.00394887  0.0115993  -0.340439    0.7336  -0.0267287  0.018831 
insure: Prepaid ~ male            0.109846    0.365187    0.300793    0.7637  -0.607343   0.827035 
insure: Prepaid ~ nonwhite        0.757718    0.419575    1.80592     0.0714  -0.0662835  1.58172  
insure: Prepaid ~ site: 2         1.32456     0.469789    2.81947     0.0050   0.401941   2.24717  
insure: Prepaid ~ site: 3        -0.380175    0.372819   -1.01973     0.3083  -1.11235    0.352001 
───────────────────────────────────────────────────────────────────────────────────────────────────

Ordinal Response Models

data = RDatasets.dataset("Ecdat", "Kakadu") |>
       (data -> select(data, [:RecParks, :Sex, :Age, :Schooling]))
data.RecParks = convert(Vector{Int}, data.RecParks)
data.RecParks = levels!(categorical(data.RecParks, ordered = true), collect(1:5))
data

1827×4 DataFrame
│ Row  │ RecParks     │ Sex          │ Age   │ Schooling │
│      │ Categorical… │ Categorical… │ Int32 │ Int32     │
├──────┼──────────────┼──────────────┼───────┼───────────┤
│ 1    │ 3            │ male         │ 27    │ 3         │
│ 2    │ 5            │ female       │ 32    │ 4         │
│ 3    │ 4            │ male         │ 32    │ 4         │
│ 4    │ 1            │ female       │ 70    │ 6         │
│ 5    │ 2            │ male         │ 32    │ 5         │
│ 6    │ 3            │ male         │ 47    │ 6         │
│ 7    │ 1            │ male         │ 42    │ 5         │
⋮
│ 1820 │ 3            │ female       │ 32    │ 4         │
│ 1821 │ 4            │ male         │ 21    │ 5         │
│ 1822 │ 3            │ male         │ 21    │ 3         │
│ 1823 │ 3            │ male         │ 52    │ 4         │
│ 1824 │ 2            │ female       │ 21    │ 3         │
│ 1825 │ 5            │ female       │ 21    │ 2         │
│ 1826 │ 1            │ female       │ 21    │ 4         │
│ 1827 │ 4            │ male         │ 21    │ 5         │

The model automatically detects that the response is ordinal and uses the correct model.

model = fit(EconometricModel,
            @formula(RecParks ~ Age + Sex + Schooling),
            data)

Probability Model for Ordinal Response
Categories: 1 < 2 < 3 < 4 < 5
Number of observations: 1827
Null Loglikelihood: -2677.60
Loglikelihood: -2657.40
R-squared: 0.0075
LR Test: 40.42 ∼ χ²(3) ⟹ Pr > χ² = 0.0000
Formula: RecParks ~ Age + Sex + Schooling
──────────────────────────────────────────────────────────────────────────────────────────
                          PE          SE        t-value  Pr > |t|        2.50%      97.50%
──────────────────────────────────────────────────────────────────────────────────────────
Age                  0.00943647  0.00254031    3.71469     0.0002   0.00445424   0.0144187
Sex: male           -0.0151659   0.0846365    -0.179188    0.8578  -0.181161     0.150829 
Schooling           -0.103902    0.0248742    -4.17711     <1e-4   -0.152687    -0.0551174
(Intercept): 1 | 2  -2.92405     0.191927    -15.2352      <1e-48  -3.30047     -2.54763  
(Intercept): 2 | 3  -1.54922     0.171444     -9.03632     <1e-18  -1.88547     -1.21297  
(Intercept): 3 | 4  -0.298938    0.166904     -1.79108     0.0734  -0.626281     0.0284051
(Intercept): 4 | 5   0.669835    0.167622      3.9961      <1e-4    0.341083     0.998587 
──────────────────────────────────────────────────────────────────────────────────────────