{
    "version" : "https://jsonfeed.org/version/1",
    "content" : "news",
    "type" : "single",
    "title" : "Web Metadata Publishing Using XML | Digital.gov",
    "description": "Web Metadata Publishing Using XML",
    "home_page_url" : "/preview/gsa/digitalgov.gov/cm-topics-button-component/","feed_url" : "/preview/gsa/digitalgov.gov/cm-topics-button-component/2015/03/23/web-metadata-publishing-using-xml/index.json","item" : [
    {"title" :"Web Metadata Publishing Using XML","summary" : "Metadata for website content is usually managed as part of the editorial process when documents are created and published with content management systems. There may be another source for this metadata, especially in regulatory agencies: internal databases that reference Web content in support of record keeping processes. These databases may contain public and non-public information","date" : "2015-03-23T11:00:18-04:00","date_modified" : "2024-04-02T09:45:13-04:00","authors" : {"bob-rand" : "Bob Rand"},"topics" : {
        
            "content-strategy" : "Content Strategy",
            "mobile" : "Mobile",
            "open-data" : "Open Data",
            "search-engine-optimization" : "Search Engine Optimization"
            },"branch" : "cm-topics-button-component",
      "filename" :"2015-03-23-web-metadata-publishing-using-xml.md",
      
      "filepath" :"news/2015/03/2015-03-23-web-metadata-publishing-using-xml.md",
      "filepathURL" :"https://github.com/GSA/digitalgov.gov/blob/cm-topics-button-component/content/news/2015/03/2015-03-23-web-metadata-publishing-using-xml.md",
      "editpathURL" :"https://github.com/GSA/digitalgov.gov/edit/cm-topics-button-component/content/news/2015/03/2015-03-23-web-metadata-publishing-using-xml.md","slug" : "web-metadata-publishing-using-xml","url" : "/preview/gsa/digitalgov.gov/cm-topics-button-component/2015/03/23/web-metadata-publishing-using-xml/","content" :"\u003cdiv class=\"image\"\u003e\n  \u003cimg\n    src=\"https://s3.amazonaws.com/digitalgov/_legacy-img/2015/03/600-x-408-Technology-Concept-Business-Chart-alexaldo-iStock-Thinkstock-497231209.jpg\"\n    alt=\"Technology Concept Business Chart\"/\u003e\u003c/div\u003e\n\n\n\u003cp\u003eMetadata for website content is usually managed as part of the editorial process when documents are created and published with \u003ca href=\"/preview/gsa/digitalgov.gov/cm-topics-button-component/2013/10/30/content-management-systems-toolkit/\"\u003econtent management systems\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eThere may be another source for this metadata, especially in regulatory agencies: internal databases that reference Web content in support of recordkeeping processes. These databases may contain both public and non-public information that was never meant for public consumption. This content is not typically described as “metadata.”\u003c/p\u003e\n\u003cp\u003eThe Securities and Exchange Commission (SEC) has recently developed two online \u003ca href=\"http://en.wikipedia.org/wiki/XML\" title=\"XML - Wikipedia, the free encyclopedia\"\u003eXML\u003c/a\u003e products using a database sourcing approach. 
The content is published using an older Web technology, \u003ca href=\"http://en.wikipedia.org/wiki/XSLT\" title=\"XSLT - Wikipedia, the free encyclopedia\"\u003eExtensible Stylesheet Language Transformations (XSLT)\u003c/a\u003e, which renders the XML as HTML in the user’s browser.\u003c/p\u003e\n\u003ch2 id=\"sec-online-docket\"\u003eSEC Online Docket\u003c/h2\u003e\n\u003cp\u003eThe SEC used data from an internal database that tracks official agency releases to create an online XML version of the previously print-only \u003ca href=\"http://www.sec.gov/about/sec-docket.shtml\"\u003eSEC Docket\u003c/a\u003e, a compilation of all materials submitted by the SEC to the Federal Register. Staff had talked for years about saving the significant time and expense involved in preparing layout pages and printing the Docket, given that all the materials referenced by each edition were already available online.\u003c/p\u003e\n\u003cp\u003eThere was considerable discussion about how much of the information from the database to publish, the quality of the data, and how to remove non-public information. Some staff lobbied for a minimal approach, while others advocated using all the public metadata available from the database. 
We eventually decided to publish all available public data, including links to the versions of the documents published in the Federal Register.\u003c/p\u003e\n\u003cp\u003eA proof of concept demonstrated that we could create an XML file containing all the metadata we needed from an Excel export of the release log database, and that we could use XSLT to render the XML as HTML in a user’s browser with our normal HTML template.\u003c/p\u003e\n\u003cp\u003eThe key to making the project sustainable, with weekly updates, was a C# program, developed in Visual Studio by one of our in-house staff programmers, that automates the regular creation of XML files from the database.\u003c/p\u003e\n\u003ch2 id=\"administrative-proceedings-files-by-case-project\"\u003eAdministrative Proceedings Files by Case Project\u003c/h2\u003e\n\u003cp\u003eShortly after creating the Docket, our group was asked by staff to develop a new Web section to aggregate \u003ca href=\"http://www.sec.gov/litigation/apdocuments.shtml\"\u003eadministrative proceedings cases\u003c/a\u003e around a unique identifier: the file number (for example, “\u003ca href=\"http://www.sec.gov/litigation/apdocuments/ap-3-15116.xml\"\u003e3-15116\u003c/a\u003e”) that is assigned to all documents relating to a particular case.\u003c/p\u003e\n\u003cp\u003eThe basic information for this project comes from another internal staff recordkeeping database, this one used to track SEC enforcement cases. Some data also comes from the SEC Docket. We decided again on an XML/XSLT approach for this project. As with the online Docket, non-public information is removed before the XML is generated. 
The structure is simpler than that of the Docket, but it uses the same element names.\u003c/p\u003e\n\u003ch2 id=\"notes-and-lessons-learned\"\u003eNotes and Lessons Learned\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eQuality control\u003c/strong\u003e to correct human errors has turned out to be a significant challenge. The C# program that initially processes the data into XML also runs error-checking routines. Before posting each Docket issue, staff use the error reports to correct problems with the database.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRemoving non-public information\u003c/strong\u003e is a major and valid concern. Staff who manage internal databases may be surprised and alarmed by the idea of extracting selected data in order to publish the “safe” portions on agency public websites. In the case of the SEC Docket and AP Case projects, there was significant incentive on both sides to find a solution.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eClient-side XSLT\u003c/strong\u003e is supported by the major browsers, though not by some mobile browsers. There is considerable discussion about the future of XSLT browser support. For us, it was a quick open source solution that seems to be working. We may move to server-side XSLT in the near future so that the XML is rendered as HTML on the server rather than in the client.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eXML design.\u003c/strong\u003e Our data design could have been better, and we are not validating against a schema, even though validation is a best practice. However, getting our feet wet with XML has changed how we look at Web publishing and opened up new possibilities for data publishing.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cem\u003eFor more on the benefits of structured data, see the \u003ca href=\"http://gsa.github.io/Open-And-Structured-Content-Models\"\u003eOpen and Structured Content Models Project\u003c/a\u003e. 
An upcoming DigitalGov University webinar, \u003ca href=\"\n\"\u003eWhere to Start with Structured Data and Content\u003c/a\u003e, will be held on March 31st at 2 p.m. EST.\u003c/em\u003e \u003cem\u003e\u003cstrong\u003eBob Rand\u003c/strong\u003e is a Web developer for the U.S. Securities and Exchange Commission.\u003c/em\u003e\u003c/p\u003e\n"}
  ]
}