{
    "version" : "https://jsonfeed.org/version/1",
    "content" : "guides",
    "type" : "single",
    "title" : "Understand the data | Digital.gov",
    "description": "Understand the data",
    "home_page_url" : "/preview/gsa/digitalgov.gov/bc-archive-content-3/","feed_url" : "/preview/gsa/digitalgov.gov/bc-archive-content-3/guides/site-scanning/understand-the-data/index.json","item" : [
    {"title" :"Understand the data","deck" : "","summary" : "Learn about the various types of data collected from scanned websites.","date" : "2020-07-28T09:00:00-05:00","date_modified" : "2025-01-27T19:42:55-05:00","primary_image" : { "uid" : "guide-site-scanning", "alt" :
  "A person works in front of a computer with many internet symbols on it", "width" :
  "1200", "height" :
  "630", "credit" :
  "agny_illustration/iStock via Getty Images", "caption" :
  "", "format" :
  "png" },"branch" : "bc-archive-content-3",
      "filename" :"understand-the-data.md",
      
      "filepath" :"guides/site-scanning/understand-the-data.md",
      "filepathURL" :"https://github.com/GSA/digitalgov.gov/blob/bc-archive-content-3/content/guides/site-scanning/understand-the-data.md",
      "editpathURL" :"https://github.com/GSA/digitalgov.gov/edit/bc-archive-content-3/content/guides/site-scanning/understand-the-data.md","url" : "/preview/gsa/digitalgov.gov/bc-archive-content-3/guides/site-scanning/understand-the-data/","aliases" : {"0" : "/guide/site-scanning/understand-the-data/","1" : "/guide/site-scanning/understanding-the-data/"},"content" :"\u003cp\u003eThe Site Scanning engine runs against the \u003ca href=\"https://github.com/GSA/federal-website-index\"\u003efull list of federal government websites\u003c/a\u003e and analyzes various aspects of them.\u003c/p\u003e\n\u003cp\u003eThe scans operate without authentication over the public internet. Using a headless browser (a browser without a graphical interface), they load each target URL and inspect what a user visiting that page with a web browser would normally receive. The results of these inspections form the data that Site Scanning makes available.\u003c/p\u003e\n\u003cp\u003eThe scans currently collect the following data about each target URL. 
A \u003ca href=\"https://github.com/GSA/site-scanning-documentation/blob/main/data/Site_Scanning_Data_Dictionary.csv\"\u003ecomplete data dictionary with much more detail\u003c/a\u003e can be found in the program\u0026rsquo;s \u003ca href=\"https://github.com/GSA/site-scanning-documentation\"\u003edocumentation hub\u003c/a\u003e.\u003c/p\u003e\n\u003ctable class=\"usa-table usa-table--stacked\"\u003e\n    \u003cthead\u003e\n      \u003ctr\u003e\n        \u003cth\u003eGeneral\u003c/th\u003e\n        \u003cth\u003eUSWDS\u003c/th\u003e\n        \u003cth\u003eDAP\u003c/th\u003e\n        \u003cth\u003eSEO\u003c/th\u003e\n        \u003cth\u003eThird Party Services\u003c/th\u003e\n      \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eServer Response Code\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003ePresence of USWDS components\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003ePresence of DAP snippet\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003eMeta Description Tags\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003ePresence of Third Party Services\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eRedirects\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003eUSWDS Version\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003eCustomizations of the Snippet\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003ePresence of Robots.txt\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003eNumber of Third Party Services\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eDomain\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003eDegree of Implementation\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003eElements of the Robots.txt\u003c/td\u003e\n      \u003ctd data-label=\"Third Party 
Services\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eAgency\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003ePresence of Sitemap.xml\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eBureau\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003eElements of Sitemap.xml\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003e404 Configuration\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003eCanonical URL\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eIPv6 Compliance\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd data-label=\"General\"\u003eUnderlying Technology\u003c/td\u003e\n      \u003ctd data-label=\"USWDS\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"DAP\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"SEO\"\u003e\u003c/td\u003e\n      \u003ctd data-label=\"Third Party Services\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cem\u003eHave ideas for what else we should be scanning for? 
Please \u003ca href=\"https://github.com/gsa/site-scanning/issues\"\u003efile an issue\u003c/a\u003e or add your idea \u003ca href=\"https://github.com/GSA/site-scanning-documentation/blob/main/pages/candidate-scans.md\"\u003eto the list of proposed future scans\u003c/a\u003e!\u003c/em\u003e\u003c/p\u003e\n"}
  ]
}
