Congressional Record for the 43rd-114th Congresses: Parsed Speeches and Phrase Counts

This dataset contains processed text from the bound and daily editions of the United States Congressional Record, as provided by HeinOnline. The bound edition covers the 43rd through the 111th Congresses, and the daily edition covers the 97th through the 114th. Each edition includes all text spoken on the floor of both chambers of Congress: the United States House of Representatives and the United States Senate. An automated script parses the text of each session to produce full-text speeches, metadata on the speeches and their speakers, and counts of two-word phrases (bigrams) by speaker and participant. Text is aggregated across sessions to flag bigrams that relate to congressional procedure or are extremely common or rare. Also included are the results of a manual audit of the script and statistics on how often speeches are successfully matched to members of Congress.
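The bigram counting and cross-session flagging described above can be illustrated with a minimal sketch. This is not the dataset's actual parsing script. The lowercase regex tokenization, the hypothetical procedural phrase list, and the hypothetical thresholds `min_sessions` (below which a bigram counts as rare) and `max_share` (the fraction of sessions above which a bigram counts as extremely common) are all assumptions chosen for illustration. The real script's rules are not described on this page.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase alphabetic runs only (not the dataset's actual rule).
TOKEN_RE = re.compile(r"[a-z]+")


def bigrams(text):
    """Return the list of adjacent two-word phrases in a speech."""
    tokens = TOKEN_RE.findall(text.lower())
    return list(zip(tokens, tokens[1:]))


def count_by_speaker(speeches):
    """Map each speaker to a Counter of bigrams from (speaker, text) pairs."""
    counts = defaultdict(Counter)
    for speaker, text in speeches:
        counts[speaker].update(bigrams(text))
    return counts


def flag_bigrams(session_counts, procedural, min_sessions=2, max_share=0.9):
    """Flag bigrams as procedural, rare, or common across sessions.

    session_counts: one {speaker: Counter} dict per session.
    procedural: hypothetical set of procedural bigrams.
    min_sessions, max_share: hypothetical thresholds on how many
    sessions a bigram appears in.
    """
    n_sessions = len(session_counts)
    presence = Counter()  # number of sessions each bigram appears in
    for counts in session_counts:
        seen = set()
        for speaker_counts in counts.values():
            seen.update(speaker_counts)  # iterating a Counter yields its keys
        presence.update(seen)

    flags = {}
    for bigram, n_present in presence.items():
        if bigram in procedural:
            flags[bigram] = "procedural"
        elif n_present < min_sessions:
            flags[bigram] = "rare"
        elif n_present / n_sessions > max_share:
            flags[bigram] = "common"
    return flags


# Toy example with three hypothetical sessions.
sessions = [
    [("Smith", "I yield the floor"), ("Jones", "tax cuts for families")],
    [("Smith", "I yield the floor"), ("Lee", "tax cuts for workers")],
    [("Lee", "I yield back"), ("Jones", "health care costs")],
]
session_counts = [count_by_speaker(s) for s in sessions]
procedural = {("yield", "the"), ("the", "floor"), ("yield", "back")}
flags = flag_bigrams(session_counts, procedural)

if __name__ == "__main__":
    for bigram, reason in sorted(flags.items()):
        print(" ".join(bigram), "->", reason)
```

In this toy run, "i yield" appears in every session and is flagged as common, phrases seen in only one session are flagged as rare, and "tax cuts" survives as a substantive bigram. Counting session presence rather than raw frequency is one simple way to separate boilerplate from rare phrasing; the dataset itself may use a different criterion.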

Organization

Stanford University

Temporal coverage

1873 - 2017


© 2025 Data Basis
