Digital.gov Guide

Handbook for the Site Scanning program

This program automatically generates data about the health of federal websites and how well they follow best practices.

Understand the data

Learn about the various types of data collected from scanned websites.


The Site Scanning engine runs against the full list of federal government websites and analyzes various aspects of them.

The scans run over the public internet without authentication. Using a headless browser (a browser without a graphical interface), they load each target URL and inspect what a user visiting that page with a web browser would normally receive. The results of these inspections form the data that Site Scanning makes available.
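To make the idea of an unauthenticated scan concrete, here is a minimal sketch in Python. It is not the Site Scanning engine's code: it swaps the headless browser for a plain HTTP client from the standard library, so it sees only the raw server response and not anything rendered by JavaScript. The names FetchResult and fetch_public, and the User-Agent string, are made up for this illustration.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class FetchResult:
    """Hypothetical record of one unauthenticated request (illustrative only)."""
    target_url: str
    final_url: str
    status_code: int
    redirected: bool
    body: str


def fetch_public(target_url: str, timeout: float = 10.0) -> FetchResult:
    """Fetch target_url with no credentials, following any redirects."""
    request = urllib.request.Request(
        target_url,
        headers={"User-Agent": "example-scanner/0.1"},  # assumed value
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            final_url = response.geturl()
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses still carry a status code worth recording.
        status = error.code
        final_url = error.geturl()
        body = error.read().decode("utf-8", errors="replace")
    return FetchResult(
        target_url=target_url,
        final_url=final_url,
        status_code=status,
        redirected=(final_url != target_url),
        body=body,
    )
```

Calling fetch_public("https://example.gov/") would return the final URL after redirects, the response code, and the page body, which roughly correspond to the kinds of facts a scan records before it inspects the page itself.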

The scans currently collect the following data about each target URL. A complete data dictionary with much more detail can be found in the program’s documentation hub.

General: Server Response Code; Redirects; Domain; Agency; Bureau; 404 Configuration; IPv6 Compliance; Underlying Technology
USWDS: Presence of USWDS components; USWDS Version; Degree of Implementation
DAP: Presence of DAP snippet; Customizations of the Snippet
SEO: Meta Description Tags; Presence of Robots.txt; Elements of the Robots.txt; Presence of Sitemap.xml; Elements of Sitemap.xml; Canonical URL
Third Party Services: Presence of Third Party Services; Number of Third Party Services
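
As a rough illustration of how a few of these data points could be read from a page's HTML, the sketch below uses Python's standard html.parser module. It is an assumption-laden example, not the program's actual method: the field names are invented here and do not come from the data dictionary, USWDS presence is guessed from class names that start with "usa-", and the DAP snippet is guessed from a script source that contains "dap.digitalgov.gov".

```python
from html.parser import HTMLParser


class PageInspector(HTMLParser):
    """Collects a few page-level signals (hypothetical field names)."""

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.canonical_url = None
        self.uswds_class_count = 0
        self.has_dap_snippet = False

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (such as "async") arrive as None.
        attributes = {name: (value or "") for name, value in attrs}

        if tag == "meta" and attributes.get("name", "").lower() == "description":
            self.meta_description = attributes.get("content", "")

        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical_url = attributes.get("href", "")

        # Heuristic (assumption): USWDS components use "usa-" class names.
        self.uswds_class_count += sum(
            1 for name in attributes.get("class", "").split()
            if name.startswith("usa-")
        )

        # Heuristic (assumption): the DAP script is served from this host.
        if tag == "script" and "dap.digitalgov.gov" in attributes.get("src", ""):
            self.has_dap_snippet = True


def inspect_html(html: str) -> dict:
    """Return the collected signals for one HTML document."""
    inspector = PageInspector()
    inspector.feed(html)
    inspector.close()
    return {
        "meta_description": inspector.meta_description,
        "canonical_url": inspector.canonical_url,
        "uswds_class_count": inspector.uswds_class_count,
        "has_dap_snippet": inspector.has_dap_snippet,
    }
```

Passing a page's HTML to inspect_html returns a small dictionary of signals. A real scan would likely look at far more evidence than this; the program's data dictionary describes what is actually collected.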

Have ideas for what else we should be scanning for? Please file an issue or add your idea to the list of proposed future scans!