The Effect of Schooling and Ability on Achievement Test Scores

Karsten T. Hansen, James J. Heckman and Kathleen J. Mullen

Forthcoming in Journal of Econometrics

 

Contents of Web Appendix

 

Supplemental Tables

 

Note that AFQT component scores are standardized to have within-sample mean 0, variance 1.
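As an illustration of what this standardization means, the following is a minimal sketch rather than the authors' code. The function name, the made-up scores, and the use of the population divisor (ddof=0) are all assumptions; the note above does not say which variance divisor was used.

```python
import numpy as np

def standardize(scores):
    """Rescale a 1-D array to within-sample mean 0 and variance 1."""
    scores = np.asarray(scores, dtype=float)
    # Assumption: population standard deviation (ddof=0); the source does not
    # specify the divisor.
    return (scores - scores.mean()) / scores.std(ddof=0)

# Made-up raw scores, for illustration only.
raw = np.array([12.0, 15.0, 9.0, 20.0, 14.0])
z = standardize(raw)
```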

 


 

Note on Imputation Rules for Missing Variables

 

Rather than drop observations without information on parents’ education and family income, which would result in a loss of almost a quarter of our sample, we decided to impute values for the missing data.  The imputation procedure was straightforward. For each variable with missing values (mother’s education, father’s education and family income):

 

(1) We ran an OLS regression of the nonmissing values on the following variables: dummy for southern residence at age 14, dummy for urban residence at age 14, dummy for broken home status at age 14, number of siblings, and year of birth dummies.

 

(2) We then used estimates from step 1 to compute predicted values for the missing data.
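The two steps above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `impute_by_ols`, the toy data, and the reduced regressor set (southern residence, urban residence, number of siblings) are assumptions chosen to keep the example small. The broken-home dummy and the year-of-birth dummies would enter as additional columns in the same way.

```python
import numpy as np

def impute_by_ols(y, X):
    """Fill NaN entries of y with OLS predictions from the columns of X."""
    y = np.asarray(y, dtype=float)
    # Prepend an intercept column to the regressors.
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    observed = ~np.isnan(y)
    # Step 1: OLS regression of the nonmissing values on the regressors.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    # Step 2: predicted values replace the missing entries only.
    filled = y.copy()
    filled[~observed] = X[~observed] @ beta
    return filled

# Toy data (made up): columns are south, urban, number of siblings.
X = np.array([
    [0, 0, 1], [1, 0, 2], [0, 1, 3], [1, 1, 0],
    [0, 0, 4], [1, 0, 1], [0, 1, 2], [1, 1, 3],
])
# Hypothetical "mother's education", exactly linear in X so the result is easy to check.
true_y = 12.0 - 1.0 * X[:, 0] + 0.5 * X[:, 1] - 0.25 * X[:, 2]
y = true_y.copy()
y[6:] = np.nan  # the last two observations are treated as missing
filled = impute_by_ols(y, X)
```

Because the toy outcome is exactly linear in the regressors, the imputed values coincide with the hidden true values; with real data they are only fitted values.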

 


 

Program Files to Estimate Joint Factor Model

 

The program samples parameter values from an iterative Markov chain (see Appendix C in the paper) whose stationary distribution is the joint posterior distribution of the model parameters.  The Fortran source code for the program can be downloaded here.

 

  1. Download the lfm_v4.exe file and note the directory (e.g. C:\EXE_DIR\).

 

  2. Modify the input file to match the desired model specification.  (See the posted input file for a template; the comments to the right of the ! sign give formatting instructions.)  Save the input file as a text file, noting the filename and location (e.g. C:\INPUT_DIR\input_file.txt).

 

Note that the data must be in a tab- or space-delimited text file containing numerical values only.

 

  3. At the DOS command prompt, type:

 

C:\EXE_DIR\lfm_v4 C:\INPUT_DIR\input_file.txt n_draws n_burn n_skips

 

where

 

n_draws is the number of draws from the Markov chain that will be recorded,

n_burn is the number of initial (burn-in) draws the program will discard, and

n_skips is the thinning interval: after burn-in, the program records one draw out of every n_skips.

 

For example:

 

C:\EXE_DIR\lfm_v4 C:\INPUT_DIR\input_file.txt 5000 10000 2

 

will instruct the program to sample 2*5000+10000 = 20,000 times from the Markov chain, discarding the first 10,000 draws and recording every other draw of the 10,000 sampled thereafter, for 5,000 recorded draws in total.
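The arithmetic in this example can be checked with a short sketch. The function name `chain_schedule` is hypothetical, and the choice of which iteration within each block of n_skips is recorded (here, the last one, with 1-based iteration numbers) is an assumption; only the total count and the number of recorded draws follow from the text above.

```python
def chain_schedule(n_draws, n_burn, n_skips):
    """Return the total number of iterations and the recorded iteration numbers."""
    # Total iterations, following the example: n_skips * n_draws + n_burn.
    total = n_skips * n_draws + n_burn
    # Assumption: after burn-in, the last iteration of each block of n_skips
    # is the one recorded (1-based numbering).
    recorded = range(n_burn + n_skips, total + 1, n_skips)
    return total, recorded

total, recorded = chain_schedule(n_draws=5000, n_burn=10000, n_skips=2)
print(total, len(recorded))
```

For the command line shown above, this gives 20,000 iterations in total and 5,000 recorded draws.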

 

This program will estimate the joint posterior distribution of the parameters of the model.  The estimates can then be used to produce, for example, the predicted AFQT distribution or conditional choice probabilities.  Fortran and Matlab programs to produce the estimated posterior quantities discussed in the paper are available from the authors on request.

 

Program files:

 
