
Closing Tags When Extracting HTML From XML

I am transforming a mixed HTML and XML document with an XSLT stylesheet and extracting only the HTML elements. Source file:

Solution 1:

I am afraid you have misunderstood the syntax rules of SGML-based HTML, which is what HTML 4 and 4.01 are: the correct markup for an empty element is <input>, not <input></input>, <input/> or <input />.

So because you request the html output method and an HTML version, you get correct HTML syntax when the result tree of your XSLT transformation is serialized.
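A minimal sketch of such a request, assuming an XSLT 1.0 processor (the p, input and br markup is made up for illustration and is not taken from the question):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Serialize the result tree as HTML 4.0 -->
  <xsl:output method="html" version="4.0"/>

  <xsl:template match="/">
    <!-- The stylesheet itself is XML, so empty elements must be written as <input/> and <br/> here -->
    <p>Name: <input type="text"/><br/></p>
  </xsl:template>
</xsl:stylesheet>

A conforming processor should serialize this as <p>Name: <input type="text"><br></p>, give or take whitespace: the html output method drops the end tags of the empty elements, which is the HTML 4 syntax described above.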

For instance, the validator check at http://validator.w3.org/check?uri=http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fxslt%2Ftest2013040901Result.html&charset=%28detect+automatically%29&doctype=Inline&group=0 reports no errors or warnings about elements not being closed properly.

With http://validator.w3.org/check?uri=http%3A%2F%2Fhome.arcor.de%2Fmartin.honnen%2Fxslt%2Ftest2013040902Result.html&charset=%28detect+automatically%29&doctype=Inline&group=0 however, you do get warnings that elements are incorrectly closed.

So the html output method does the right thing; see also http://www.w3.org/TR/xslt#section-HTML-Output-Method which says:

The html output method should not output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in the stylesheet should be output as <br>.


Solution 2:

The <meta>, <img> and <input> elements don't need to be closed; the output is still valid HTML.

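If you want them closed, you could use xml as the output method (with XSLT 2.0 you could also use xhtml) and add the <meta> element yourself if you need it: the html output method inserts a <meta> element with the character encoding right after the <head> start tag, but the xml output method does not. For example: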

Stylesheet

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:TLA="http://www.TLA.com" exclude-result-prefixes="TLA">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*" />

  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

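  <!-- Rebuild head with xsl:element (like the element template below) so the
       TLA namespace is not copied, and add the meta element that the html
       output method would otherwise have generated. Attributes are processed
       before the meta child, because attributes cannot be added to an
       element after its children. -->
  <xsl:template match="head">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*"/>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>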

  <!-- This element-only identity template prevents the 
       TLA namespace declaration from being copied to the output -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@* | node()" />
    </xsl:element>
  </xsl:template>

  <!-- Pass processing on to child elements of TLA elements -->
  <xsl:template match="TLA:*">
    <xsl:apply-templates select="*" />
  </xsl:template>
</xsl:stylesheet>

Output

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Simplified Example Form</title>
  </head>
  <body>
    <table id="table_logo" style="display:inline">
      <tr>
        <td height="20" align="middle">Big Title Goes Here</td>
      </tr>
      <tr>
        <td align="center">
          <img src="logo.jpg" border="0"/>
        </td>
      </tr>
    </table>
    <table id="table_id_1">
      <tr>
        <td>Label text goes here</td>
        <td>
          <input id="input_id_1" type="text"/>
        </td>
      </tr>
    </table>
  </body>
</html>
