Stata is easy to grow with

Consistent command syntax

Stata's commands are intuitive and easy to learn. Even better, everything you learn about performing a task can be applied to other tasks.

Need to limit your analysis to females? Add if female==1 to any command.

Need standard errors that are robust to violations of many common assumptions? Add vce(robust) to almost any estimation command.

Need to account for sampling weights, clusters, and stratification? Add svy: to the beginning of the command.
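
As a minimal sketch of the svy: prefix, the commands below simulate a small dataset so that the example runs on its own. The design variables (strata, psu, finalwgt) and every simulated value are assumptions made for illustration. The sketch also uses subpop(), which Stata's survey documentation recommends over if when estimating for a subpopulation.

```stata
// Simulated data for illustration only (all values are made up)
clear
set seed 12345
set obs 500
generate strata   = ceil(_n/100)            // 5 strata of 100 observations
generate psu      = ceil(_n/10)             // 10 sampling units per stratum
generate finalwgt = 1 + 9*runiform()        // hypothetical sampling weights
generate age      = 20 + floor(60*runiform())
generate region   = 1 + floor(4*runiform()) // regions 1 to 4
generate female   = runiform() < 0.5
generate bmi      = 22 + 0.05*age + 0.5*region + rnormal(0, 4)

// Declare the survey design once
svyset psu [pweight=finalwgt], strata(strata)

// Prefix the estimation command with svy:
svy: regress bmi age i.region

// Restrict estimation to the female subpopulation
svy, subpop(female): regress bmi age i.region
```

Once svyset has declared the design, the same svy: prefix can be placed in front of other estimation commands that support it.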

The consistency goes even deeper. What you learn about data management commands often applies to estimation commands, and vice versa. There is a full suite of postestimation commands to perform hypothesis tests, form linear and nonlinear combinations, make predictions, form contrasts, and even perform marginal analysis with interaction plots. These commands work the same way after virtually every estimator.
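
To make the linear combinations, nonlinear combinations, and contrasts mentioned above concrete, here is a short sketch using lincom, nlcom, and contrast after regress. The data are simulated so that the example is self-contained; the values and the particular combinations tested are assumptions chosen for illustration.

```stata
// Simulated data for illustration only (all values are made up)
clear
set seed 12345
set obs 500
generate age    = 20 + floor(60*runiform())
generate region = 1 + floor(4*runiform())   // regions 1 to 4
generate female = runiform() < 0.5
generate bmi    = 22 + 0.05*age + 0.5*region + female + rnormal(0, 4)

regress bmi age i.region i.female

// Linear combination: difference between the region 2 and region 3 effects
lincom 2.region - 3.region

// Nonlinear combination: ratio of the region 3 effect to the region 2 effect
nlcom (ratio: _b[3.region] / _b[2.region])

// Contrasts of each region with the base region (region 1)
contrast r.region
```

The same three commands should work unchanged if regress is replaced with another estimator such as logistic or poisson, given a suitable outcome variable.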

See how it works

Let's start with linear regression. We fit a variety of models and explore results using the postestimation commands for testing, prediction, and marginal analysis.

// Regression of body mass index (BMI) on age and region indicators
regress bmi age i.region 

// Fit the model for females only 
regress bmi age i.region if female==1 

// Obtain robust standard errors 
regress bmi age i.region, vce(robust) 

// Include a female indicator and its interaction with age 
regress bmi age i.region i.female c.age#i.female 

// Perform a joint test of significance for the region indicators 
testparm i.region 

// Compute the predicted BMI for each person 
predict bmi_hat 

// Obtain the average prediction (potential outcome), treating
// all individuals as if they live in region 1 
margins 1.region 

// Obtain average predictions for all regions 
margins region 

// Obtain average predictions by sex across a range of ages 
margins female, at(age=(20 40 60 80)) 

// Plot this interaction 
marginsplot 

(See the graph)

What if we instead have a binary outcome variable, an indicator of whether an individual has high blood pressure? We could fit a logistic regression model. We replace regress in the commands above with logistic, and we use highbp instead of bmi as the dependent variable. Otherwise, the model specification, options, and postestimation commands are almost identical.

// Logistic regression of high blood pressure on age and region indicators 
logistic highbp age i.region 

// Fit the model for females only 
logistic highbp age i.region if female==1 

// Obtain robust standard errors 
logistic highbp age i.region, vce(robust) 

// Include a female indicator and its interaction with age 
logistic highbp age i.region i.female c.age#i.female 

// Perform a joint test of significance for the region indicators 
testparm i.region 

// Compute the predicted probability of high blood pressure
// for each person 
predict prob_hbp 

// Obtain the average predicted probability (potential outcome),
// treating all individuals as if they live in region 1 
margins 1.region 

// Obtain average predicted probability for all regions 
margins region 

// Obtain average predicted probabilities by sex across a range of ages 
margins female, at(age=(20 40 60 80)) 

// Plot this interaction 
marginsplot 

(See the graph)

If we have a count outcome such as the number of individuals in the household, we might want to fit a Poisson model. We use the poisson command and housesize as the dependent variable, but again, the rest of the command syntax is the same.

// Poisson regression of household size on age and region indicators 
poisson housesize age i.region 

// Fit the model for females only 
poisson housesize age i.region if female==1 

// Obtain robust standard errors 
poisson housesize age i.region, vce(robust) 

// Include a rural location indicator and its interaction with age 
poisson housesize age i.region i.rural c.age#i.rural 

// Perform a joint test of significance for the region indicators 
testparm i.region 

// Compute the predicted number of individuals in each household 
predict size 

// Obtain the average predicted household size (potential outcome),
// treating all individuals as if they live in region 1 
margins 1.region 

// Obtain average predicted household size for all regions 
margins region 

// Obtain average predicted household size by rural location
// across a range of ages 
margins rural, at(age=(20 40 60 80)) 

// Plot this interaction 
marginsplot 

(See the graph)

We could fit many other models: models for ordered and unordered categorical outcomes; multilevel models; models for time-series, panel, or survival data; and models accounting for endogeneity and sample selection. Regardless of the model, we can use the same command structure, the same options, and the same postestimation commands that we used above.
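
As a brief sketch of how this carries over, the commands below fit an ordered and a multinomial logistic model with the same specification, option, and postestimation commands used earlier. The three-level outcome health is a hypothetical variable, and all data are simulated for illustration.

```stata
// Simulated data for illustration only (all values are made up)
clear
set seed 12345
set obs 500
generate age    = 20 + floor(60*runiform())
generate region = 1 + floor(4*runiform())   // regions 1 to 4
generate health = 1 + floor(3*runiform())   // hypothetical 3-level outcome

// Ordered logistic regression with robust standard errors
ologit health age i.region, vce(robust)
testparm i.region
margins region

// Multinomial logistic regression, same specification
mlogit health age i.region, vce(robust)
testparm i.region
margins region
```

After these models, margins reports predicted probabilities rather than a predicted mean, but the command itself is typed the same way.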
Timberlake Consultants