Estimators
The Basics
Regression analysis assumes a model for the data generation process of some outcome. For example, the Mincer earnings function assumes that earnings are a function of schooling and experience.
The response variable, earnings, is usually log-normal distributed (Clementi and Gallegati 2005). In addition, earnings are non-negative and assuming the sample has positive earnings a useful transformation is to take the natural log-normal distributed response. Schooling is commonly measured in years of schooling and experience as the number of years in the labor market. The model assumes a polynomial relationship between years of experience and the log of earnings of degree two.
How could we use Econometrics.jl to analyze such a model?
We can use the RDatasets package for accessing various example datasets.
The following line loads packages for the tutorial.
using CSV, RDatasets, Econometrics
We can load the PSID dataset from the Ecdat R package.
data = RDatasets.dataset("Ecdat", "PSID")
data = data[data.Earnings .> 0 .&
data.Kids .< 98,:] # Only those with earnings and valid number of kids
3652×8 DataFrame. Omitted printing of 1 columns
│ Row │ IntNum │ PersNum │ Age │ Educatn │ Earnings │ Hours │ Kids │
│ │ Int32 │ Int32 │ Int32 │ Int32⍰ │ Int32 │ Int32 │ Int32 │
├──────┼────────┼─────────┼───────┼─────────┼──────────┼───────┼───────┤
│ 1 │ 4 │ 4 │ 39 │ 12 │ 77250 │ 2940 │ 2 │
│ 2 │ 4 │ 6 │ 35 │ 12 │ 12000 │ 2040 │ 2 │
│ 3 │ 4 │ 7 │ 33 │ 12 │ 8000 │ 693 │ 1 │
│ 4 │ 4 │ 173 │ 39 │ 10 │ 15000 │ 1904 │ 2 │
│ 5 │ 5 │ 2 │ 47 │ 9 │ 6500 │ 1683 │ 5 │
│ 6 │ 6 │ 4 │ 44 │ 12 │ 6500 │ 2024 │ 2 │
│ 7 │ 6 │ 172 │ 38 │ 16 │ 7000 │ 1144 │ 3 │
⋮
│ 3645 │ 9284 │ 1 │ 34 │ 12 │ 300 │ 357 │ 2 │
│ 3646 │ 9285 │ 4 │ 41 │ 8 │ 1800 │ 410 │ 98 │
│ 3647 │ 9286 │ 2 │ 45 │ 12 │ 20080 │ 2040 │ 2 │
│ 3648 │ 9292 │ 2 │ 41 │ 6 │ 13000 │ 2448 │ 5 │
│ 3649 │ 9297 │ 2 │ 42 │ 2 │ 3000 │ 1040 │ 4 │
│ 3650 │ 9302 │ 1 │ 37 │ 8 │ 22045 │ 2793 │ 98 │
│ 3651 │ 9305 │ 2 │ 40 │ 6 │ 134 │ 30 │ 3 │
│ 3652 │ 9306 │ 2 │ 37 │ 17 │ 33000 │ 2423 │ 4 │
Baseline model
model = fit(EconometricModel, # Indicates the default model
@formula(log(Earnings) ~ Educatn + Age + Age^2), # formula
data)
Continuous Response Model
Number of observations: 3652
Null Loglikelihood: -5696.23
Loglikelihood: -5671.17
R-squared: 0.0136
LR Test: 50.11 ∼ χ²(3) ⟹ Pr > χ² = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2)
Variance Covariance Estimator: OIM
────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
────────────────────────────────────────────────────────────────────────────────────
(Intercept) 7.14834 0.957333 7.46693 <1e-12 5.27138 9.0253
Educatn 0.00370285 0.00111874 3.30984 0.0009 0.00150944 0.00589627
Age 0.0934537 0.0493985 1.89183 0.0586 -0.00339765 0.190305
Age ^ 2 -0.000918383 0.000628258 -1.46179 0.1439 -0.00215015 0.000313389
────────────────────────────────────────────────────────────────────────────────────
Frequency weights
We can also specify observation weights,
model = fit(EconometricModel,
@formula(log(Earnings) ~ Educatn + Age + Age^2),
data,
wts = :PersNum, # frequency weights
)
Continuous Response Model
Number of observations: 223174
Null Loglikelihood: -6746726.41
Loglikelihood: -349853.81
R-squared: 0.0064
LR Test: 12793745.20 ∼ χ²(3) ⟹ Pr > χ² = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2)
Variance Covariance Estimator: OIM
────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
────────────────────────────────────────────────────────────────────────────────────
(Intercept) 8.50901 0.131474 64.7201 <1e-99 8.25132 8.7667
Educatn 0.00117042 0.000140001 8.36006 <1e-16 0.000896017 0.00144481
Age 0.0305687 0.00686287 4.45422 <1e-5 0.0171177 0.0440198
Age ^ 2 -0.000167677 8.8448e-5 -1.89576 0.0580 -0.000341032 5.67929e-6
────────────────────────────────────────────────────────────────────────────────────
Categorical variables
Categorical variables can be passed through a contrast or if the feature is coded as a categorical variables it will be handled as categorical.
model = fit(EconometricModel,
@formula(log(Earnings) ~ Educatn + Age + Age^2 + Kids),
data,
wts = :PersNum,
contrasts = Dict(:Kids => DummyCoding()))
Continuous Response Model
Number of observations: 223174
Null Loglikelihood: -6746726.41
Loglikelihood: -345253.18
R-squared: 0.0466
LR Test: 12802946.46 ∼ χ²(15) ⟹ Pr > χ² = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2) + Kids
Variance Covariance Estimator: OIM
────────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
────────────────────────────────────────────────────────────────────────────────────────
(Intercept) 8.55057 0.131389 65.0782 <1e-99 8.29306 8.80809
Educatn 0.000538588 0.000138739 3.88203 0.0001 0.000266663 0.000810512
Age 0.0389003 0.0068679 5.66408 <1e-7 0.0254394 0.0523613
Age ^ 2 -0.000191192 8.86411e-5 -2.15692 0.0310 -0.000364926 -1.74575e-5
Kids: 1 -0.132069 0.00828577 -15.9392 <1e-56 -0.148309 -0.115829
Kids: 2 -0.290339 0.00727643 -39.9013 <1e-99 -0.304601 -0.276078
Kids: 3 -0.529931 0.00820589 -64.5793 <1e-99 -0.546014 -0.513848
Kids: 4 -0.710859 0.011204 -63.447 <1e-99 -0.732818 -0.688899
Kids: 5 -0.491404 0.0180217 -27.2673 <1e-99 -0.526726 -0.456082
Kids: 6 -1.47718 0.0381049 -38.7663 <1e-99 -1.55187 -1.4025
Kids: 7 -0.242236 0.0609217 -3.97619 <1e-4 -0.361641 -0.122831
Kids: 8 -1.53682 0.328191 -4.68271 <1e-5 -2.18007 -0.893577
Kids: 9 -0.209419 1.13675 -0.184226 0.8538 -2.43741 2.01858
Kids: 10 0.62745 0.50838 1.23421 0.2171 -0.368963 1.62386
Kids: 98 -0.71333 0.0186111 -38.3281 <1e-99 -0.749807 -0.676853
Kids: 99 -1.02093 0.0252724 -40.3971 <1e-99 -1.07046 -0.971397
────────────────────────────────────────────────────────────────────────────────────────
Absorbing features
If one only care about the estimates of a subset of features and controls such as number of kids (as categorical), one can absorb those features,
model = fit(EconometricModel,
@formula(log(Earnings) ~ Educatn + Age + Age^2 + absorb(Kids)),
data,
wts = :PersNum)
Continuous Response Model
Number of observations: 223174
Null Loglikelihood: -6746726.41
Loglikelihood: -345253.18
R-squared: 0.0466
Wald: 860.46 ∼ F(3, 223158) ⟹ Pr > F = 0.0000
Formula: log(Earnings) ~ 1 + Educatn + Age + (Age ^ 2) + absorb(Kids)
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept) 8.23837 0.13135 62.721 <1e-99 7.98093 8.49581
Educatn 0.000538588 0.000138739 3.88203 0.0001 0.000266663 0.000810512
Age 0.0389003 0.0068679 5.66408 <1e-7 0.0254394 0.0523613
Age ^ 2 -0.000191192 8.86411e-5 -2.15692 0.0310 -0.000364926 -1.74575e-5
──────────────────────────────────────────────────────────────────────────────────────
Variance Covariance Estimators
Various variance-covariance estimators are available for continous response models.
For example,
vcov(model, HC0)
4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
0.000304618 -1.6099e-8 -1.60138e-5 2.07213e-7
-1.6099e-8 2.51577e-10 6.23515e-10 -8.35769e-12
-1.60138e-5 6.23515e-10 8.46239e-7 -1.10009e-8
2.07213e-7 -8.35769e-12 -1.10009e-8 1.4367e-10
vcov(model, HC1)
4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
0.00030464 -1.61001e-8 -1.6015e-5 2.07228e-7
-1.61001e-8 2.51595e-10 6.2356e-10 -8.35829e-12
-1.6015e-5 6.2356e-10 8.463e-7 -1.10017e-8
2.07228e-7 -8.35829e-12 -1.10017e-8 1.4368e-10
vcov(model, HC2)
4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
0.000305269 -1.62242e-8 -1.60486e-5 2.07673e-7
-1.62242e-8 2.53522e-10 6.28483e-10 -8.41935e-12
-1.60486e-5 6.28483e-10 8.48106e-7 -1.10257e-8
2.07673e-7 -8.41935e-12 -1.10257e-8 1.43999e-10
vcov(model, HC3)
4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
0.000305927 -1.63508e-8 -1.60838e-5 2.08138e-7
-1.63508e-8 2.55503e-10 6.3349e-10 -8.48145e-12
-1.60838e-5 6.3349e-10 8.49996e-7 -1.10508e-8
2.08138e-7 -8.48145e-12 -1.10508e-8 1.44333e-10
vcov(model, HC4)
4×4 LinearAlgebra.Hermitian{Float64,Array{Float64,2}}:
0.000307226 -1.66052e-8 -1.61533e-5 2.09058e-7
-1.66052e-8 2.59567e-10 6.43482e-10 -8.60518e-12
-1.61533e-5 6.43482e-10 8.53731e-7 -1.11003e-8
2.09058e-7 -8.60518e-12 -1.11003e-8 1.44993e-10
Longitudinal Data
Longitudinal data is when a sample contains repeated measurements of the unit of observation. This kind of data allows for various estimators that make use of such a structure.
Example,
data = RDatasets.dataset("Ecdat", "Crime") |>
(data -> select(data, [:County, :Year, :CRMRTE, :PrbConv, :AvgSen, :PrbPris]))
630×6 DataFrame
│ Row │ County │ Year │ CRMRTE │ PrbConv │ AvgSen │ PrbPris │
│ │ Int32 │ Int32 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────┼───────┼───────────┼──────────┼─────────┼──────────┤
│ 1 │ 1 │ 81 │ 0.0398849 │ 0.402062 │ 5.61 │ 0.472222 │
│ 2 │ 1 │ 82 │ 0.0383449 │ 0.433005 │ 5.59 │ 0.506993 │
│ 3 │ 1 │ 83 │ 0.0303048 │ 0.525703 │ 5.8 │ 0.479705 │
│ 4 │ 1 │ 84 │ 0.0347259 │ 0.604706 │ 6.89 │ 0.520104 │
│ 5 │ 1 │ 85 │ 0.036573 │ 0.578723 │ 6.55 │ 0.497059 │
│ 6 │ 1 │ 86 │ 0.0347524 │ 0.512324 │ 6.9 │ 0.439863 │
│ 7 │ 1 │ 87 │ 0.0356036 │ 0.527596 │ 6.71 │ 0.43617 │
⋮
│ 623 │ 195 │ 87 │ 0.0313973 │ 1.67052 │ 13.02 │ 0.470588 │
│ 624 │ 197 │ 81 │ 0.0178621 │ 1.06098 │ 11.35 │ 0.356322 │
│ 625 │ 197 │ 82 │ 0.0180711 │ 0.545455 │ 6.64 │ 0.363636 │
│ 626 │ 197 │ 83 │ 0.0155747 │ 0.480392 │ 7.77 │ 0.428571 │
│ 627 │ 197 │ 84 │ 0.0136619 │ 1.41026 │ 10.11 │ 0.372727 │
│ 628 │ 197 │ 85 │ 0.0130857 │ 0.830769 │ 5.96 │ 0.333333 │
│ 629 │ 197 │ 86 │ 0.012874 │ 2.25 │ 7.68 │ 0.244444 │
│ 630 │ 197 │ 87 │ 0.0141928 │ 1.18293 │ 12.23 │ 0.360825 │
The Between estimator
model = fit(BetweenEstimator, # Indicates the between estimator
@formula(CRMRTE ~ PrbConv + AvgSen + PrbPris),
data,
panel = :County)
Between Estimator
County with 630 groups
Balanced groups with size 7
Number of observations: 90
Null Loglikelihood: 239.56
Loglikelihood: 244.40
R-squared: 0.1029
Wald: 3.29 ∼ F(3, 86) ⟹ Pr > F = 0.0246
Formula: CRMRTE ~ 1 + PrbConv + AvgSen + PrbPris
Variance Covariance Estimator: OIM
───────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
───────────────────────────────────────────────────────────────────────────────────
(Intercept) -0.00464339 0.0186571 -0.24888 0.8040 -0.0417325 0.0324457
PrbConv -0.00354113 0.0018904 -1.87322 0.0644 -0.00729911 0.00021685
AvgSen 0.000848049 0.00116477 0.728086 0.4685 -0.00146743 0.00316353
PrbPris 0.0730299 0.0332057 2.19932 0.0305 0.0070191 0.139041
───────────────────────────────────────────────────────────────────────────────────
The Within estimator (Fixed Effects)
model = fit(EconometricModel,
@formula(CRMRTE ~ PrbConv + AvgSen + PrbPris + absorb(County)),
data)
Continuous Response Model
Number of observations: 630
Null Loglikelihood: 1633.30
Loglikelihood: 2273.95
R-squared: 0.8707
Wald: 0.16 ∼ F(3, 537) ⟹ Pr > F = 0.9258
Formula: CRMRTE ~ 1 + PrbConv + AvgSen + PrbPris + absorb(County)
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.0314527 0.00204638 15.3699 <1e-43 0.0274328 0.0354726
PrbConv 6.65981e-6 0.0001985 0.0335507 0.9732 -0.000383272 0.000396592
AvgSen 7.83181e-5 0.000127904 0.612318 0.5406 -0.000172936 0.000329572
PrbPris -0.0013419 0.00405182 -0.331185 0.7406 -0.00930126 0.00661746
──────────────────────────────────────────────────────────────────────────────────────
Absorbing the panel and temporal indicators
model = fit(EconometricModel,
@formula(CRMRTE ~ PrbConv + AvgSen + PrbPris + absorb(County + Year)),
data)
Continuous Response Model
Number of observations: 630
Null Loglikelihood: 1633.30
Loglikelihood: 2290.25
R-squared: 0.8775
Wald: 0.23 ∼ F(3, 531) ⟹ Pr > F = 0.8767
Formula: CRMRTE ~ 1 + PrbConv + AvgSen + PrbPris + absorb(County + Year)
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.0319651 0.00206378 15.4886 <1e-44 0.027911 0.0360193
PrbConv 7.90362e-5 0.000196089 0.403062 0.6871 -0.00030617 0.000464242
AvgSen -9.4788e-5 0.000134924 -0.702531 0.4827 -0.000359838 0.000170262
PrbPris 0.000979533 0.0040432 0.242267 0.8087 -0.00696309 0.00892216
──────────────────────────────────────────────────────────────────────────────────────
The random effects model à la Swamy-Arora
model = fit(RandomEffectsEstimator, # Indicates the random effects estimator
@formula(CRMRTE ~ PrbConv + AvgSen + PrbPris),
data,
panel = :County,
time = :Year)
One-way Random Effect Model
Longitudinal dataset: County, Year
Balanced dataset with 90 panels of length 7
individual error component: 0.0162
idiosyncratic error component: 0.0071
ρ: 0.8399
Number of observations: 630
Null Loglikelihood: 2225.79
Loglikelihood: 2226.01
R-squared: 0.0007
Wald: 0.15 ∼ F(3, 626) ⟹ Pr > F = 0.9291
Formula: CRMRTE ~ PrbConv + AvgSen + PrbPris
Variance Covariance Estimator: OIM
──────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
──────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.0309293 0.00266706 11.5968 <1e-27 0.0256918 0.0361668
PrbConv -3.96634e-5 0.000198471 -0.199845 0.8417 -0.000429413 0.000350086
AvgSen 8.2428e-5 0.000127818 0.644887 0.5192 -0.000168575 0.000333431
PrbPris -0.000123327 0.00404272 -0.030506 0.9757 -0.00806226 0.00781561
──────────────────────────────────────────────────────────────────────────────────────
Instrumental Variables
Instrumental variables models are estimated using the 2SLS estimator.
model = fit(RandomEffectsEstimator,
@formula(CRMRTE ~ PrbConv + (AvgSen ~ PrbPris)),
data,
panel = :County,
time = :Year)
One-way Random Effect Model
Longitudinal dataset: County, Year
Balanced dataset with 90 panels of length 7
individual error component: 0.0413
idiosyncratic error component: 0.0074
ρ: 0.9691
Number of observations: 630
Null Loglikelihood: 2268.01
Loglikelihood: 2248.34
R-squared: NaN
Wald: 0.03 ∼ F(2, 626) ⟹ Pr > F = 0.9671
Formula: CRMRTE ~ PrbConv + (AvgSen ~ PrbPris)
Variance Covariance Estimator: OIM
───────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
───────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.037747 0.0241684 1.56183 0.1188 -0.00971405 0.085208
PrbConv 1.39581e-5 0.000200244 0.0697054 0.9445 -0.000379273 0.000407189
AvgSen -0.000688924 0.00266514 -0.258494 0.7961 -0.00592262 0.00454478
───────────────────────────────────────────────────────────────────────────────────────
Nominal Response Models
data = joinpath(dirname(pathof(Econometrics)), "..", "data", "insure.csv") |>
CSV.read |>
(data -> select(data, [:insure, :age, :male, :nonwhite, :site])) |>
dropmissing |>
(data -> categorical!(data, [:insure, :site]))
615×5 DataFrame
│ Row │ insure │ age │ male │ nonwhite │ site │
│ │ Categorical… │ Float64 │ Int64 │ Int64 │ Categorical… │
├─────┼──────────────┼─────────┼───────┼──────────┼──────────────┤
│ 1 │ Indemnity │ 73.7221 │ 0 │ 0 │ 2 │
│ 2 │ Prepaid │ 27.8959 │ 0 │ 0 │ 2 │
│ 3 │ Indemnity │ 37.5414 │ 0 │ 0 │ 1 │
│ 4 │ Prepaid │ 23.6413 │ 0 │ 1 │ 3 │
│ 5 │ Prepaid │ 29.6838 │ 0 │ 0 │ 2 │
│ 6 │ Prepaid │ 39.4689 │ 0 │ 0 │ 2 │
│ 7 │ Uninsure │ 63.102 │ 0 │ 1 │ 3 │
⋮
│ 608 │ Prepaid │ 64.4216 │ 0 │ 1 │ 3 │
│ 609 │ Prepaid │ 69.9877 │ 0 │ 0 │ 1 │
│ 610 │ Prepaid │ 71.5209 │ 0 │ 0 │ 3 │
│ 611 │ Prepaid │ 63.4168 │ 0 │ 0 │ 2 │
│ 612 │ Prepaid │ 40.3532 │ 0 │ 1 │ 1 │
│ 613 │ Prepaid │ 55.3237 │ 1 │ 0 │ 3 │
│ 614 │ Prepaid │ 27.9836 │ 1 │ 0 │ 3 │
│ 615 │ Prepaid │ 30.59 │ 1 │ 0 │ 3 │
The model automatically detects that the response is nominal and uses the correct model.
model = fit(EconometricModel,
@formula(insure ~ age + male + nonwhite + site),
data,
contrasts = Dict(:insure => DummyCoding(base = "Uninsure")))
Probability Model for Nominal Response
Categories: Uninsure, Indemnity, Prepaid
Number of observations: 615
Null Loglikelihood: -555.85
Loglikelihood: -534.36
R-squared: 0.0387
LR Test: 42.99 ∼ χ²(10) ⟹ Pr > χ² = 0.0000
Formula: insure ~ 1 + age + male + nonwhite + site
───────────────────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
───────────────────────────────────────────────────────────────────────────────────────────────────
insure: Indemnity ~ (Intercept) 1.28694 0.59232 2.17271 0.0302 0.123689 2.4502
insure: Indemnity ~ age 0.00779612 0.0114418 0.681372 0.4959 -0.0146743 0.0302666
insure: Indemnity ~ male -0.451848 0.367486 -1.22957 0.2193 -1.17355 0.269855
insure: Indemnity ~ nonwhite -0.217059 0.425636 -0.509965 0.6103 -1.05296 0.618843
insure: Indemnity ~ site: 2 1.21152 0.470506 2.57493 0.0103 0.287497 2.13554
insure: Indemnity ~ site: 3 0.207813 0.366293 0.56734 0.5707 -0.511547 0.927172
insure: Prepaid ~ (Intercept) 1.55666 0.596327 2.61041 0.0093 0.385533 2.72778
insure: Prepaid ~ age -0.00394887 0.0115993 -0.340439 0.7336 -0.0267287 0.018831
insure: Prepaid ~ male 0.109846 0.365187 0.300793 0.7637 -0.607343 0.827035
insure: Prepaid ~ nonwhite 0.757718 0.419575 1.80592 0.0714 -0.0662835 1.58172
insure: Prepaid ~ site: 2 1.32456 0.469789 2.81947 0.0050 0.401941 2.24717
insure: Prepaid ~ site: 3 -0.380175 0.372819 -1.01973 0.3083 -1.11235 0.352001
───────────────────────────────────────────────────────────────────────────────────────────────────
Ordinal Response Models
data = RDatasets.dataset("Ecdat", "Kakadu") |>
(data -> select(data, [:RecParks, :Sex, :Age, :Schooling]))
data.RecParks = convert(Vector{Int}, data.RecParks)
data.RecParks = levels!(categorical(data.RecParks, ordered = true), collect(1:5))
data
1827×4 DataFrame
│ Row │ RecParks │ Sex │ Age │ Schooling │
│ │ Categorical… │ Categorical… │ Int32 │ Int32 │
├──────┼──────────────┼──────────────┼───────┼───────────┤
│ 1 │ 3 │ male │ 27 │ 3 │
│ 2 │ 5 │ female │ 32 │ 4 │
│ 3 │ 4 │ male │ 32 │ 4 │
│ 4 │ 1 │ female │ 70 │ 6 │
│ 5 │ 2 │ male │ 32 │ 5 │
│ 6 │ 3 │ male │ 47 │ 6 │
│ 7 │ 1 │ male │ 42 │ 5 │
⋮
│ 1820 │ 3 │ female │ 32 │ 4 │
│ 1821 │ 4 │ male │ 21 │ 5 │
│ 1822 │ 3 │ male │ 21 │ 3 │
│ 1823 │ 3 │ male │ 52 │ 4 │
│ 1824 │ 2 │ female │ 21 │ 3 │
│ 1825 │ 5 │ female │ 21 │ 2 │
│ 1826 │ 1 │ female │ 21 │ 4 │
│ 1827 │ 4 │ male │ 21 │ 5 │
The model automatically detects that the response is ordinal and uses the correct model.
model = fit(EconometricModel,
@formula(RecParks ~ Age + Sex + Schooling),
data)
Probability Model for Ordinal Response
Categories: 1 < 2 < 3 < 4 < 5
Number of observations: 1827
Null Loglikelihood: -2677.60
Loglikelihood: -2657.40
R-squared: 0.0075
LR Test: 40.42 ∼ χ²(3) ⟹ Pr > χ² = 0.0000
Formula: RecParks ~ Age + Sex + Schooling
──────────────────────────────────────────────────────────────────────────────────────────
PE SE t-value Pr > |t| 2.50% 97.50%
──────────────────────────────────────────────────────────────────────────────────────────
Age 0.00943647 0.00254031 3.71469 0.0002 0.00445424 0.0144187
Sex: male -0.0151659 0.0846365 -0.179188 0.8578 -0.181161 0.150829
Schooling -0.103902 0.0248742 -4.17711 <1e-4 -0.152687 -0.0551174
(Intercept): 1 | 2 -2.92405 0.191927 -15.2352 <1e-48 -3.30047 -2.54763
(Intercept): 2 | 3 -1.54922 0.171444 -9.03632 <1e-18 -1.88547 -1.21297
(Intercept): 3 | 4 -0.298938 0.166904 -1.79108 0.0734 -0.626281 0.0284051
(Intercept): 4 | 5 0.669835 0.167622 3.9961 <1e-4 0.341083 0.998587
──────────────────────────────────────────────────────────────────────────────────────────