## Exploring Python within Stata - Part 1 with Dr George Naufal

Written by **Dr. George Naufal** (Texas A&M University).

Stata 16 has brought the ability to use Python and its all capabilities right in Stata. Now you can code in your own dofile and switch between Stata and Python language. Stata Corp. has excellent blog posts on how to integrate your Stata with Python. Check these out at this** link**. This blog post replicates **part 5** from that series using a different data set and creates a 3D plot of marginal predictions of getting married in relations to age and number of hours worked per week.

We use the NLS data which is based on the National Longitudinal Survey of Young Women 14-26 years of age in 1968. This data is one of the many **data sets **used for training purposes. We keep three variables:

msp = 1 if married, spouse present

hours = weekly hours worked

age = age in years

We use hours and age to estimate the marginal predictions of getting married. The code is below:

`use nls, clear`

keep msp age hours

svyset _n

svy: logistic msp age hour c.age#c.hours

quietly margins, at(age=(18(5)50) hours=(10(5)60)) vce(unconditional) saving(marginal_predictions, replace)

use marginal_predictions, clear

rename _at1 age

rename _at2 hours

rename _margin pr_married

save marginal_predictions, replace

In the code above, we open the NLS file, keep the three variables, and then estimate a `logit`

model to calculate the marginal predictions using the `margins`

command. We fixe covariate value for age from 18 to 50 varying by 5 years, and `hours`

from 10 to 60 varying by 5 hours (these values were guided by the distribution of both variables). After saving the marginal predictions in `marginal_predictions`

we rename these predictions to match the names of the variables themselves.

Here we are done with the Stata side and launch the Python side, without having to get out of the `dofile.`

Stata 16 allows you access to all of Python’s capabilities include the thousands of its libraries and packages such as `pandas, numpy,`

and `matplotlib`

which we download (to learn how to integrate Python in Stata 16 check this **link,** this **one**, and this **one**). Remember that to move to Python in the `dofile`

just type `python`

and then code in the new language and close it with an end to either finish the `dofile`

or go back to coding in Stata language.

`python:`

import pandas as pd

import numpy as np

import matplotlib

import matplotlib.pyplot as plt

matplotlib.use('TkAgg')

The `matplotlib.use('TkAgg')`

is to identify which rendering engine is being used and in this case, it is the `TkAgg`

. This may not be the same on your machine; to find this out you type:

`python:`

import matplotlib

matplotlib.get_backend()

end

We then let Python know to use the data set `marginal_predictions`

, define a 3D plot, and import the three marginal predictions variables.

`data = pd.read_stata("marginal_predictions.dta")`

ax = plt.axes(projection='3d')

ax.plot_trisurf(data['age'], data['hours'], data['pr_married'], cmap=plt.cm.Spectral_r)

The `Spectral_r`

defines patterns of colors that will reflect the marginal prediction values. For more information on choosing colormaps in `matplotlib`

check this **link.**

The next step is to specify the values of the ticks on all three axes: X (age), Y (hours), and Z (marginal predictions). We again use the values from the distribution of each of the variables in the original `nls`

data set.

ax.set_xticks(np.arange(18, 50, step=5))

ax.set_yticks(np.arange(10, 70, step=10))

ax.set_zticks(np.arange( 0, 1.2, step=0.2))

We then define the titles for all three axes with the below:

`ax.set_title("Probability of Getting Married by Age and Hours Worked")`

ax.set_xlabel("Age (Years)")

ax.set_ylabel("Hours Worked per Week")

The last thing to do is to decide on the rotation and elevation of the figure and we do that below before saving the graph in .png (the file is saved in the Stata working directory) and don’t forget to end the Python code with an `end command`

.

`ax.zaxis.set_rotate_label(False)`

ax.set_zlabel("Probability of Marriage", rotation=90)

ax.view_init(elev=30, azim=240)

plt.savefig("Margins3d.png", dpi=1200)

end.

The outcome is the below picture:

This figures tells us that older women are more likely to be married (`Spectral_r`

colors higher values of the marginal predictions in red and lower values in blue) and more work hours per week is reflected in less likelihood of being married.

**Download **supporting Stata files to investigate yourself.