Understanding the Site Scanning program
Technical details
Learn about the automated processes behind the Site Scanning program.
The Site Scanning program maintains a number of automated processes that, together, constitute the entire project and seek to deliver useful data. The basic flow of these events is as follows:
- Each week, a comprehensive list of public federal .gov websites is assembled as the Federal Website Index. (A simplified sketch of this assembly step follows the list.)
  - Direct download of the current Federal Website Index.
  - Process description, including details about the sources used, how the list is combined, and which criteria are used to remove entries.
  - Snapshots from each step in the assembly process, including which URLs are removed at each step and which remain.
  - Data dictionary for the Federal Website Index.
  - Summary report for the assembly process.
  - Summary report for the completed Federal Website Index.
  - Task repository.
- Every day, the Federal Website Index is then scanned. This is done by loading each Target URL in a virtual browser and noting the results; this information is the Site Scanning data. (A browser-based sketch of this step also follows the list.)
  - Scanning process description, including what criteria are used to create each field of data.
  - Data dictionary for the Site Scanning data.
- The resulting information is stored in a database that is queryable via API, and each week a series of static snapshots of the data is generated and made available for download. (See the API and snapshot sketches after the list.)
  - API Documentation.
  - The ‘All’ snapshot (CSV) includes every URL in the Federal Website Index.
  - The ‘Primary’ snapshot (CSV) is a subset of the ‘All’ snapshot and includes only live, human-readable URLs. This is likely the best starting point for most users.
  - The ‘Unique Final URL’ snapshot (CSV) further trims the Primary snapshot by removing duplicative Final URLs (details).
  - The ‘Unique Final Website’ snapshot (CSV) finally trims the Unique Final URL snapshot by removing duplicative Final URL base websites (details). This is arguably the best count of public federal .gov websites.
- After these snapshots are generated, a series of reports is run to analyze them and pull information out of them.
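To make the weekly assembly step concrete, here is a minimal sketch of the shape of that process: union several source lists, then filter by removal criteria. The file names, column name, and filter rules below are hypothetical; the authoritative sources and criteria are in the process description linked above.

```python
import csv

# Hypothetical source files; the real index draws on several
# authoritative lists (see the process description linked above).
SOURCE_FILES = ["dotgov_registry.csv", "agency_submitted.csv"]


def load_urls(path: str) -> set[str]:
    """Read a hypothetical 'target_url' column from one CSV source list."""
    with open(path, newline="") as f:
        return {row["target_url"].strip().lower() for row in csv.DictReader(f)}


def assemble_index() -> list[str]:
    # Step 1: union all source lists into one candidate set.
    candidates: set[str] = set()
    for path in SOURCE_FILES:
        candidates |= load_urls(path)

    # Step 2: apply removal criteria. These two rules are invented
    # for illustration; the program's actual criteria differ.
    kept = [
        url for url in candidates
        if url.endswith(".gov")            # keep .gov hosts only
        and not url.startswith("test.")    # drop obvious test hosts
    ]
    return sorted(kept)


if __name__ == "__main__":
    for url in assemble_index():
        print(url)
```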
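The daily scan amounts to loading each Target URL in a headless browser and recording what comes back. The following sketch uses Playwright as a stand-in for whatever tooling the program actually uses, and the recorded fields are illustrative rather than the program's real schema (see the data dictionary linked above).

```python
from playwright.sync_api import sync_playwright


def scan(target_url: str) -> dict:
    """Load one Target URL in a headless browser and note basic results.

    The fields below are illustrative, not the program's actual schema.
    """
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        try:
            response = page.goto(target_url, timeout=30_000)
            result = {
                "target_url": target_url,
                "final_url": page.url,  # URL after any redirects
                "status_code": response.status if response else None,
                "page_title": page.title(),
            }
        except Exception as exc:
            # Unreachable sites are still noted, just with the error.
            result = {"target_url": target_url, "error": str(exc)}
        finally:
            browser.close()
    return result


if __name__ == "__main__":
    print(scan("https://www.gsa.gov"))
```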
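Querying the scan database over the API might look roughly like the following. The base URL, endpoint path, parameter name, and key handling here are assumptions for illustration; the API documentation linked above is authoritative.

```python
import requests

# Assumed base URL and endpoint; verify against the API documentation.
API_BASE = "https://api.gsa.gov/technology/site-scanning/v1"
API_KEY = "DEMO_KEY"  # placeholder; request a real key from api.data.gov

resp = requests.get(
    f"{API_BASE}/websites",
    params={"target_url": "gsa.gov"},  # assumed filter parameter
    headers={"x-api-key": API_KEY},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```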
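Finally, the snapshot hierarchy can be read as successive de-duplication passes over the same data. The sketch below trims a local copy of the Primary snapshot the way the ‘Unique Final URL’ and ‘Unique Final Website’ snapshots are described above; the file and column names are assumptions, so check the data dictionary for the real ones.

```python
import pandas as pd

# Assumed file and column names; consult the data dictionary above.
primary = pd.read_csv("primary_snapshot.csv")

# 'Unique Final URL': keep one row per final (post-redirect) URL.
unique_final_url = primary.drop_duplicates(subset="final_url")

# 'Unique Final Website': keep one row per final base website,
# ignoring paths, which yields the website-level count.
unique_final_website = unique_final_url.drop_duplicates(
    subset="final_url_base_website"
)

print(len(primary), len(unique_final_url), len(unique_final_website))
```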
Other useful information
- Schedule for the above automated processes.
- Description of how the list of websites is filtered down at each step.
- Sample dataset that represents different edge cases.
- List of proposed but not yet built scans.
- Archive of historical snapshots.
- Description of the federal web presence.
- Program issue tracker.
- Program website.
Project Repositories