Stuart J.H. Graham, Alan C. Marco, and Richard Miller
This article describes the “USPTO Patent Examination Research Dataset” (PatEx) and explores possible selection issues and the representativeness of the nearly 9.2 million US patent application records it contains. We find that data are sparse for years before 1981, and that serious selection issues affect records on applications filed prior to 2001 due to nonpublication in the United States. Following implementation of a policy change in November 2000, both the coverage and the representativeness of the PatEx data improve substantially. We identify specific areas that are prone to selection issues by generating statistical evidence across application characteristics such as application type, age, ownership type, domestic or foreign origin, patent family status, and technology class, among others. Although our exploration points researchers to several categories of specific concern, our findings overall show that the PatEx data are generally representative, across observable characteristics, of the population of patent applications filed in the United States after November 2000.