2020 London Stata Online Conference - Proceedings

26th UK Stata Conference - Online, 10 & 11 September 2020

From datasets to metadatasets in Stata

Roger Newson
Department of Primary Care and Public Health, Imperial College London

Metadatasets are Stata datasets, in files or in frames, which may have one observation per file, per dataset, per variable, or per variable value. Metadatasets can be used to modify a Stata database, or to make a Stata database self-documenting, especially if converted to non-Stata formats, such as HTML or even Microsoft Excel. We present some user-written packages, updated to Stata version 16, for creating and using metadatasets. The xdir package creates a resultsset with one observation per file in a folder conforming to a user-specified pattern. The descgen package inputs an xdir resultsset, and generates a new variable indicating whether each file is a Stata dataset, and other new variables containing dataset attributes, such as the dataset label and characteristics, the sort key of variables, and the numbers of observations and variables. The vallabdef package inputs a dataset with one observation per label name per value per value label, and generates Stata value labels. The vallabsave package loads and saves value labels from and to label-only datasets, and transfers value labels between data frames. The descsave package creates a metadataset with one observation per variable in a dataset, and data on variable attributes (including characteristics). The invdesc package modifies the variable attributes of the dataset in the current frame, inputting a descsave resultsset in a second data frame to set the variable attributes, and inputting value labels from a dataset in a third data frame. The datasets containing the variable attributes and value labels may be produced as resultssets by Stata packages, or produced manually in a spreadsheet using LibreOffice Calc or Microsoft Excel, and input into Stata datasets using import delimited or import excel.
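
To make the idea of a metadataset with one observation per variable concrete, the following is a minimal sketch that uses only official Stata commands (Stata 16 or later, for frames). It does not use the descsave package or any of the other packages named above; the built-in describe, replace is swapped in as a rough analogue, and the frame name meta and the example dataset auto.dta are assumptions chosen for illustration.

```stata
* Sketch using official Stata commands only; not the descsave package.
* The frame name "meta" is a hypothetical choice for this example.
sysuse auto, clear

* Copy the data into a second frame, then replace that copy with a
* metadataset holding one observation per variable of the original data.
frame copy default meta, replace
frame meta: describe, replace clear

* Each observation now describes one variable of auto.dta:
* its name, storage type, display format, value label, and variable label.
frame meta: list name type format vallab varlab, noobs
```

The original data stay untouched in the default frame, while the metadataset in the second frame can be listed, edited, or exported like any other Stata dataset.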

From datasets to metadatasets in Stata

Second Generation P-Values (SGPV) for common estimation commands in Stata

xthst: Testing for slope homogeneity in Stata

Unit root tests for explosive behaviour

A gmm recipe to get standard errors for control function and other two-step estimators

randregret: A command for fitting Random Regret Minimization Models

Agent based models in Mata: Modelling aggregate processes, like the spread of a disease

New Bayesian features: multiple chains, predictions, and more

Non-parametric estimation in multi-state survival models: An update to msaj

kinkyreg: Instrument-free inference for linear regression models with endogenous regressors

Sample size calculation for an ordered categorical outcome

Fancy graphics: Force-directed diagrams

f_able: Estimation of marginal effects for models with alternative variable transformations

Socioeconomic Factors influencing the Spatial Spread of COVID-19 in the United States

Correlated random effects methods for panel data models with heterogeneous time effects
