
Data Scraping and Analysis in Python and Stata: Techniques and Applications

Online 2 days (16th May 2024 - 17th May 2024) Stata Intermediate, Introductory
Delivered by: Francesco Saverio Stentella Lopes, University of Roma Tre, Rome, Italy
Data Management, Statistics



Course Context

Recent years have seen unprecedented growth in the types and amount of information available online. Through the internet, analysts and practitioners can now access a growing volume of detailed, high-frequency information. In some cases, however, the required information is not readily available to download. The ability to build tools that retrieve and parse information stored on the internet is therefore a valuable skill in many areas of data science. This course is a primer on data scraping using Stata and Python. Stata can now interact with Python: the two programs are well integrated, and Python code can be run from within Stata. This integration opens up a wide range of possibilities.

Course Overview

In this course, participants will learn how information published on websites can be organized and written in a Stata-readable format. The teaching approach is based on learning by doing: every line of code presented in the course will be discussed and can be run by all attendees. The discussion will draw on real-world examples and will be oriented toward practical applications rather than theory. After the course, participants are expected to have an improved understanding of the integration between Stata and Python, and to be able to understand in detail, and potentially write themselves, a web scraping script in Python within Stata. Participants will therefore be able to carry out research tasks including, but not limited to:

  • Adding a Python script within a Stata do file.
  • Retrieving public information from the web.
  • Parsing and organizing the retrieved information.
  • Creating a dataset containing the data retrieved online.

Who is the course for

This course is designed for analysts, researchers, and data professionals interested in enhancing their skills in data scraping and analysis using Python and Stata. It is suitable for those who want to leverage the integration between Python and Stata for retrieving, parsing, and organizing information from the web for practical applications in data science and research.

 



Course Timetable

  • Morning Session: 10am-12pm (London time)
  • Afternoon Session: 2pm-4pm (London time)
  • Q&A with Instructor: 4pm-4:30pm (London time)

Course Agenda

Day 1

Session 1: Data Scraping and PyStata basics

  • Data Scraping: definition, rationale, usefulness

  • Data scraping projects

  • Data scraping vs. API

  • Pros and Cons of data scraping

  • Python within Stata: Python basics

    • Python Installation

    • Variables, lists, and loops

    • Defining functions using Python

    • Writing and reading CSV files

    • Stata implementation
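
As an illustration of the Python basics listed above, the following is a minimal sketch that combines a list, a loop, a function, and CSV writing and reading. The sample data, the function name `to_row`, and the file name `items.csv` are assumptions made for this example and are not taken from the course material.

```python
import csv
import os
import tempfile

# Hypothetical sample data: a list of (name, price) pairs stored as strings.
raw_items = [("notebook", "2.50"), ("pen", "1.20"), ("folder", "3.00")]


def to_row(name, price_text):
    """Turn one raw pair into a dictionary with a numeric price."""
    return {"name": name, "price": float(price_text)}


# Loop over the list and build one row per item.
rows = [to_row(name, price) for name, price in raw_items]

# Write the rows to a CSV file in a temporary folder.
csv_path = os.path.join(tempfile.mkdtemp(), "items.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check what was written.
with open(csv_path, newline="", encoding="utf-8") as handle:
    rows_read = list(csv.DictReader(handle))

print(rows_read)
```

In Stata 16 or later, lines like these can be placed between `python:` and `end` inside a do-file, and the resulting file can then be loaded with `import delimited`.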

Session 2: Opening and parsing webpages using Python within Stata

  • The requests library in Python

    • Installing requests in Python

    • Opening a webpage with Python using requests

    • Understanding the basic structure of an HTML webpage

    • Stata implementation

  • The Beautiful Soup library within Python

    • Installing Beautiful Soup in Python

    • Parsing a webpage using requests and Beautiful Soup

    • Reaching specific information within a webpage

    • Stata implementation
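
To give a flavour of what reaching specific information within a webpage involves, here is a small sketch that extracts table cells by their `class` attribute. So that it runs without any installation or network access, it uses Python's standard-library `html.parser` as a stand-in for requests and Beautiful Soup, and it parses a hypothetical HTML string rather than a downloaded page. The class `CellCollector` and the page content are assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this string would come from an HTTP response.
PAGE = """
<html><body>
  <h1>Course list</h1>
  <table>
    <tr><td class="title">Data Scraping</td><td class="days">2</td></tr>
    <tr><td class="title">Panel Data</td><td class="days">3</td></tr>
  </table>
</body></html>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.inside_target = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # Start a new cell when a matching <td> opens.
        if tag == "td" and dict(attrs).get("class") == self.target_class:
            self.inside_target = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside_target = False

    def handle_data(self, data):
        # Accumulate text only while inside a matching cell.
        if self.inside_target:
            self.cells[-1] += data


collector = CellCollector("title")
collector.feed(PAGE)
titles = [cell.strip() for cell in collector.cells]
print(titles)
```

With the libraries named in this session, a comparable result would typically be obtained by passing `requests.get(url).text` to `BeautifulSoup(html, "html.parser")` and calling `soup.find_all("td", class_="title")`.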


Day 2

Session 1: Writing your first data scraping project

  • Writing your first data scraping project

    • Retrieving specific information from a website

    • Cleaning the retrieved information

    • Writing the cleaned data to a CSV file

    • Stata implementation
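
The cleaning step can be sketched as follows, assuming the retrieved values arrive as strings with stray whitespace, currency symbols, and thousands separators. The sample values and the helper `clean_price` are hypothetical, and the output is written to an in-memory buffer so that the example has no side effects.

```python
import csv
import io
import re

# Hypothetical scraped values: stray whitespace, currency symbols, thousands separators.
scraped = [
    ("  Laptop \n", "£1,250.00"),
    ("Monitor", " £199.99 "),
    ("Cable\t", "n/a"),
]


def clean_price(text):
    """Return the price as a float, or None when no number is present."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


# Strip whitespace from names and convert prices to numbers.
cleaned = [(name.strip(), clean_price(price)) for name, price in scraped]

# Write to an in-memory buffer; a real project would open a file on disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["product", "price"])
for name, price in cleaned:
    # Missing prices are written as empty cells.
    writer.writerow([name, "" if price is None else price])

csv_text = buffer.getvalue()
print(csv_text)
```

Writing missing prices as empty cells is one possible design choice; when the file is imported into Stata, empty cells in a numeric column are read as missing values.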

Session 2: Crawl across webpages

  • Letting your first data scraping project run across different webpages

    • Creating loops to scrape data across different pages

    • Avoiding overloading a website with too many requests

    • Stata implementation
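
One simple way to loop across pages without sending requests too quickly is to pause between consecutive downloads. The sketch below assumes a hypothetical paginated address on `example.com` and a stand-in `fake_fetch` function in place of a real HTTP call; the function `crawl` and its parameters are illustrative names, not part of the course material.

```python
import time


def crawl(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages


# Hypothetical paginated address; the page number is filled in by the loop.
BASE_URL = "https://example.com/listing?page={}"
urls = [BASE_URL.format(page) for page in range(1, 4)]


def fake_fetch(url):
    """Stand-in for a real download so the example runs offline."""
    return "<html>content of " + url + "</html>"


# Record the requested pauses instead of actually sleeping.
pauses = []
pages = crawl(urls, fake_fetch, delay_seconds=2.0, sleep=pauses.append)
print(len(pages), pauses)
```

In a real run, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, with the default `sleep` left in place so that the pauses actually occur.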


Q&A Session

At the end of the course, there will be a dedicated informal session for Q&A relevant to the content of the course.



Prerequisites

This course requires you to use Python. Before the course starts, please either check that Python is already installed on your computer or install it.

Installing Python is generally easy, and nowadays many Linux and UNIX distributions include a recent Python. Even some Windows computers (notably those from HP) now come with Python already installed.

Python is cross-platform and will work on Windows, macOS, and Linux.

If you do need to install Python, you can download it from the official Python website (python.org).

To work with Python, you will need a text editor or IDE. This course does not require a specific one: we focus on the integration of Stata and Python, so the Python scripts we discuss will be embedded inside Stata .do files and executed directly from within Stata.

Terms & Conditions

  • Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
  • Additional discounts are available for multiple registrations. Contact us for more information.
  • Temporary, time-limited licences for the software used in the course will be provided. You are required to install the software provided prior to the start of the course.
  • Full payment of course fees is required prior to the course start date to guarantee your place.
  • Registration closes 1 calendar day prior to the start of the course.

Cancellations or changes to your registration

  • 100% fee returned for cancellations made more than 28 calendar days prior to the start of the course.
  • 50% fee returned for cancellations made 14 calendar days prior to the start of the course.
  • No fee returned for cancellations made less than 14 calendar days prior to the start of the course.

The number of attendees is restricted. Please register early to guarantee your place.


All prices exclude VAT or local taxes where applicable.


Timberlake Consultants