{
    "version" : "https://jsonfeed.org/version/1",
    "content" : "guides",
    "type" : "single",
    "title" : "Access the data | Digital.gov",
    "description": "Access the data",
    "home_page_url" : "/preview/gsa/digitalgov.gov/bc-archive-content-3/","feed_url" : "/preview/gsa/digitalgov.gov/bc-archive-content-3/guides/site-scanning/data/index.json","item" : [
    {"title" :"Access the data","summary" : "Learn how to get started and download data from the Site Scanning program.","date" : "2020-07-28T09:00:00-05:00","date_modified" : "2025-01-27T19:42:55-05:00","primary_image" : { "uid" : "guide-site-scanning", "alt" : "A person works in front of a computer with many internet symbols on it", "width" : "1200", "height" : "630", "credit" : "agny_illustration/iStock via Getty Images", "caption" : "", "format" : "png" },"branch" : "bc-archive-content-3",
      "filename" :"data.md",
      
      "filepath" :"guides/site-scanning/data.md",
      "filepathURL" :"https://github.com/GSA/digitalgov.gov/blob/bc-archive-content-3/content/guides/site-scanning/data.md",
      "editpathURL" :"https://github.com/GSA/digitalgov.gov/edit/bc-archive-content-3/content/guides/site-scanning/data.md","url" : "/preview/gsa/digitalgov.gov/bc-archive-content-3/guides/site-scanning/data/","aliases" : {"0" : "/guide/site-scanning/data/","1" : "/guides/site-scanning/download-the-data/"},"content" :"\u003ch2 id=\"get-started\"\u003eGet started\u003c/h2\u003e\n\u003cp\u003eThe easiest way to begin accessing and using data from the Site Scanning program is to:\u003c/p\u003e\n\u003col\u003e\n\u003cli\u003eDownload the \u003ca href=\"https://api.gsa.gov/technology/site-scanning/data/weekly-snapshot.csv\"\u003eprimary CSV dataset\u003c/a\u003e.\u003c/li\u003e\n\u003cli\u003eOpen the data in a spreadsheet program.\u003c/li\u003e\n\u003cli\u003eApply filters to view only the websites for your agency, bureau, or domain.\u003c/li\u003e\n\u003cli\u003eHide or delete any unused columns to make the spreadsheet faster and more responsive.\u003c/li\u003e\n\u003cli\u003eReview the \u003ca href=\"https://github.com/GSA/site-scanning-documentation/blob/main/data/Site_Scanning_Data_Dictionary.csv\"\u003edata dictionary\u003c/a\u003e to understand the fields for each website.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThen, explore the data and use it to generate insights and make data-informed decisions.\u003c/p\u003e\n\u003cp\u003eMost data fields are straightforward. Review the \u003ca href=\"https://digital.gov/guides/site-scanning/technical-details/\"\u003etechnical details\u003c/a\u003e if you want to dig deeper or take on more advanced analysis.\u003c/p\u003e\n\u003ch2 id=\"download-the-data\"\u003eDownload the data\u003c/h2\u003e\n\u003cp\u003eThe scan data is exported weekly. You can download either the primary dataset, which includes live URLs only, or the full dataset, which includes all scanned URLs.\u003c/p\u003e\n\u003ch3 id=\"primary-dataset-with-live-urls-only\"\u003ePrimary dataset with live URLs only\u003c/h3\u003e\n\u003cp\u003eThe primary dataset includes scan data for live URLs only. In other words, it includes only the websites for which \u003ccode\u003eFinal URL - Live\u003c/code\u003e has a value of \u003ccode\u003eTRUE\u003c/code\u003e. The dataset also excludes machine-readable data files, such as XML and JSON files.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"https://api.gsa.gov/technology/site-scanning/data/weekly-snapshot.csv\"\u003eDownload the primary CSV\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://api.gsa.gov/technology/site-scanning/data/weekly-snapshot.json\"\u003eDownload the primary JSON\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"full-dataset-with-all-urls\"\u003eFull dataset with all URLs\u003c/h3\u003e\n\u003cp\u003eThe full dataset includes scan data for every URL that was scanned, whether or not it is live. Some of these URLs may be inaccessible over the public internet, no longer live, or experiencing downtime.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"https://api.gsa.gov/technology/site-scanning/data/weekly-snapshot-all.csv\"\u003eDownload the full CSV\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://api.gsa.gov/technology/site-scanning/data/weekly-snapshot-all.json\"\u003eDownload the full JSON\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2 id=\"access-the-api\"\u003eAccess the API\u003c/h2\u003e\n\u003cp\u003eThe Site Scanning program provides an API that you can use to access all of the scan data. Visit the \u003ca href=\"https://open.gsa.gov/api/site-scanning-api/\"\u003eSite Scanning API\u003c/a\u003e page for documentation, including how to register for an API key.\u003c/p\u003e\n\u003ch2 id=\"contact-the-site-scanning-team\"\u003eContact the Site Scanning team\u003c/h2\u003e\n\u003cp\u003eIf you have any questions, please email the Site Scanning team at \u003ca href=\"mailto:site-scanning@gsa.gov\"\u003esite-scanning@gsa.gov\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThe team welcomes your feedback, including suggestions for federal websites to add to or remove from the \u003ca href=\"https://github.com/GSA/federal-website-index\"\u003eFederal Website Index\u003c/a\u003e.\u003c/p\u003e\n"}
  ]
}
