How to install Python packages into Stata
Written by Chuck Huber (director of statistical outreach - StataCorp).
Using pip to install Python packages
Let’s begin by typing python query to verify that Python is installed on our system and that Stata is set up to use Python.
The results indicate that Stata is set up to use Python 3.8, so we are ready to install packages.
NumPy is a popular package that is described as “the fundamental package for scientific computing with Python”. Many other packages rely on NumPy’s mathematical features, so let’s begin by installing it. NumPy may already be installed on my system, and I can check by typing python which numpy in Stata.
NumPy is not found on my system, so I am going to install it. I am using Windows 10, so I will type shell in Stata to open a Windows Command Prompt.
Figure 1: Windows Command Prompt
shell also opens a terminal on Mac and Linux operating systems. Experienced Stata users often type ! rather than the word shell.
Next, I will use a program named pip to install NumPy. You can type pip -V in the Windows Command Prompt or terminal in Mac or Linux to see the version and location of your pip program.
Figure 2: pip version and location
The path for pip matches the path returned by python query above. You should verify this on your own system if you have multiple versions of Python installed, because pip installs packages only for the Python installation it belongs to.
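If several Python installations are present, one way to avoid a mismatch is to run pip through a specific interpreter rather than relying on whichever pip comes first on the system path. The following is a minimal sketch in Python, not part of the workflow described here; the helper name pip_command is hypothetical. It is meant to be run with the Python executable reported by python query, from the Command Prompt or terminal, and it only prints a command without installing anything.

```python
import sys


def pip_command(*args):
    """Build a pip command that is tied to the running interpreter.

    Running pip as "python -m pip" uses the pip that belongs to that
    interpreter, rather than whichever pip is first on the PATH.
    """
    return [sys.executable, "-m", "pip", *args]


# The interpreter that is executing this code.
print("Interpreter:", sys.executable)
print("Version:", ".".join(str(n) for n in sys.version_info[:3]))

# The command below is only printed, not run.
print("Command:", " ".join(pip_command("--version")))
```

If the interpreter path printed here differs from the one shown by python query, packages installed with that command would likely not be visible to Stata.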
Next, type pip install numpy in the Command Prompt or terminal, and pip will download and install NumPy in the appropriate location on your system.
Figure 3: pip install numpy
The output tells us that NumPy was installed successfully.
We can confirm this in Stata by typing python which numpy again.
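A similar check can be made from Python itself. The sketch below uses the standard library function importlib.util.find_spec to report where a module would be loaded from; the helper name module_location is hypothetical, and this is an illustration rather than a description of how python which is implemented.

```python
import importlib.util


def module_location(name):
    """Return the file path of an importable top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


for name in ("numpy",):
    location = module_location(name)
    print(name, "->", location if location else "not found")
```

The check does not import the module, so it is quick even for large packages.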
Let’s install three more packages that we will use in the future. Pandas is a popular Python package used for importing, exporting, and manipulating data. We can install it by typing pip install pandas in the Command Prompt.
Figure 4: pip install pandas
You can watch a video that demonstrates how to use pip to install Pandas on the Stata YouTube channel.
Matplotlib is a popular package that “is a comprehensive library for creating static, animated, and interactive visualizations in Python”. We can install it by typing pip install matplotlib in the Command Prompt.
Figure 5: pip install matplotlib
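Scikit-learn is a popular package for machine learning. We can install it by typing pip install scikit-learn in the Command Prompt. Note that the package is installed under the name scikit-learn but is imported in Python under the name sklearn.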
Figure 6: pip install scikit-learn
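Let’s use python which to verify that pandas, matplotlib, and scikit-learn are installed. python which expects the module name used in import statements, so scikit-learn is checked by typing python which sklearn.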
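To see which versions were installed, the standard library module importlib.metadata (available in Python 3.8 and later) can be queried by the names used with pip install. This is a minimal sketch; the helper name installed_version and the list PACKAGES are hypothetical, and the output depends on what is installed on your system.

```python
from importlib.metadata import PackageNotFoundError, version

# Names as used with "pip install" (distribution names), so scikit-learn
# appears here rather than its import name, sklearn.
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn"]


def installed_version(distribution):
    """Return the installed version string, or None if no such distribution is found."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


for name in PACKAGES:
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

Any package reported as not installed can be added with pip as described above.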
We did it! We successfully installed four of the most popular Python packages using pip. You can use your Internet search engine to find hundreds of other Python packages and install them with pip.