Understand the data

Learn about the various types of data collected from scanned websites.


The Site Scanning engine runs against the full list of federal government websites and analyzes each one.

The scans run without authentication over the public internet. Using a headless browser (a browser without a graphical interface), they load each target URL and inspect what a user visiting that page in a web browser would normally receive. The results of these inspections form the data that Site Scanning makes available.
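To make the inspection step concrete, here is a minimal Python sketch of how a few fields could be derived from the HTML returned for a target URL. It is not the Site Scanning engine's implementation. It skips the headless browser and takes already-loaded HTML as input, and its detection rules are assumptions chosen for illustration: U.S. Web Design System (USWDS) components are recognized by CSS classes starting with `usa-`, the Digital Analytics Program (DAP) snippet is recognized by a script URL containing `Universal-Federated-Analytics`, and any script served from a host other than the page's own counts as a third-party service. The names `PageInspector` and `inspect_page` and the `www.example.gov` URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: the DAP snippet is recognized by its hosted script filename.
DAP_SCRIPT_MARKER = "Universal-Federated-Analytics"
# Assumption: USWDS components carry CSS classes prefixed with "usa-".
USWDS_CLASS_PREFIX = "usa-"


class PageInspector(HTMLParser):
    """Collects script sources and USWDS-style class names from a page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.script_urls = []
        self.uswds_classes = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Record every class token that looks like a USWDS component class.
        for token in (attributes.get("class") or "").split():
            if token.startswith(USWDS_CLASS_PREFIX):
                self.uswds_classes.add(token)
        # Resolve script sources against the page URL so relative paths work.
        src = attributes.get("src")
        if tag == "script" and src:
            self.script_urls.append(urljoin(self.base_url, src))


def inspect_page(target_url, html):
    """Derive a few illustrative fields from the HTML returned for target_url."""
    inspector = PageInspector(target_url)
    inspector.feed(html)
    inspector.close()
    page_host = urlparse(target_url).hostname
    # Simplification: any script host other than the page's own host counts
    # as a third-party service (even another subdomain of the same agency).
    script_hosts = {urlparse(url).hostname for url in inspector.script_urls}
    third_party_hosts = sorted(h for h in script_hosts if h and h != page_host)
    return {
        "uswds_present": bool(inspector.uswds_classes),
        "dap_present": any(DAP_SCRIPT_MARKER in url for url in inspector.script_urls),
        "third_party_services": third_party_hosts,
        "third_party_count": len(third_party_hosts),
    }


sample_html = """
<html><body>
  <section class="usa-banner">An official website of the United States government</section>
  <script async src="https://dap.digitalgov.gov/Universal-Federated-Analytics-Min.js?agency=GSA"></script>
  <script src="/assets/js/app.js"></script>
</body></html>
"""
print(inspect_page("https://www.example.gov/", sample_html))
# {'uswds_present': True, 'dap_present': True, 'third_party_services': ['dap.digitalgov.gov'], 'third_party_count': 1}
```

The relative `/assets/js/app.js` resolves to the page's own host, so only the DAP script host is reported as third-party. A real scanner would inspect the DOM after scripts run, which is why the program uses a headless browser rather than raw HTML.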

The scans currently collect the following data about each target URL, grouped by category. The program's documentation hub contains a complete data dictionary with much more detail.

General: Server Response Code, Redirects, Domain, Agency, Bureau, 404 Configuration, IPv6 Compliance, Underlying Technology
USWDS (U.S. Web Design System): Presence of USWDS Components, USWDS Version, Degree of Implementation
DAP (Digital Analytics Program): Presence of DAP Snippet, Customizations of the Snippet
SEO (Search Engine Optimization): Meta Description Tags, Presence of robots.txt, Elements of robots.txt, Presence of sitemap.xml, Elements of sitemap.xml, Canonical URL
Third-Party Services: Presence of Third-Party Services, Number of Third-Party Services
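For the robots.txt fields in the SEO category, a minimal Python sketch of what "presence" and "elements" might look like is below. This page does not define which elements the scan records, so counting directives by name and collecting `Sitemap` URLs is an assumption for illustration, as is passing `None` to mean the file was not found. The function name `summarize_robots_txt` and the sample URLs are hypothetical.

```python
from collections import Counter
from typing import Optional


def summarize_robots_txt(body: Optional[str]) -> dict:
    """Summarize a robots.txt body; pass None when the request returned no file."""
    if body is None:
        return {"robots_txt_present": False, "directives": {}, "sitemaps": []}
    directives = Counter()
    sitemaps = []
    for raw_line in body.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split only on the first colon so URLs in values stay intact.
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        directives[field] += 1
        # Sitemap lines point crawlers at sitemap.xml files.
        if field == "sitemap" and value:
            sitemaps.append(value)
    return {
        "robots_txt_present": True,
        "directives": dict(directives),
        "sitemaps": sitemaps,
    }


sample_robots = """User-agent: *
Disallow: /admin/
Disallow: /tmp/  # scratch space
Sitemap: https://www.example.gov/sitemap.xml
"""
print(summarize_robots_txt(sample_robots))
# {'robots_txt_present': True, 'directives': {'user-agent': 1, 'disallow': 2, 'sitemap': 1}, 'sitemaps': ['https://www.example.gov/sitemap.xml']}
```

The sketch parses lines by hand rather than using Python's `urllib.robotparser`, because that module answers allow/deny questions for a crawler but does not expose per-directive counts.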

Have ideas for what else we should be scanning for? Please file an issue or add your idea to the list of proposed future scans!

Handbook for the Site Scanning program
