<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for tweaking HTML Purifier's Tidy-like behavior." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Tidy - HTML Purifier</title>
</head><body>
<h1>Tidy</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>You've probably heard of HTML Tidy, Dave Raggett's little piece
of software that cleans up poorly written HTML. Let me say it straight
out:</p>
<p class="emphasis">This ain't HTML Tidy!</p>
<p>Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
that allow users to submit deprecated elements and attributes and get
valid strict markup back. For example:</p>
<pre>&lt;center&gt;Centered&lt;/center&gt;</pre>
<p>...becomes:</p>
<pre>&lt;div style=&quot;text-align:center;&quot;&gt;Centered&lt;/div&gt;</pre>
<p>...when this particular fix is run on the HTML. This tutorial will give
you the lowdown on what exactly HTML Purifier will do when Tidy
is on, and how to fine-tune this behavior. Once again, <strong>you do
not need the Tidy extension installed in PHP to use these features!</strong></p>
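<p>As a minimal sketch of that example in code (the include path is an
assumption about where the library is installed, and the Strict doctype is
chosen here only because <code>center</code> is not allowed in it, so the
fix has to run):</p>
<pre>require_once 'HTMLPurifier.auto.php'; // assumed path to the library

$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Strict'); // center is not valid here
$purifier = new HTMLPurifier($config);

// Per the example above, the center element should come back as a styled div.
echo $purifier-&gt;purify('&lt;center&gt;Centered&lt;/center&gt;');</pre>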
<h2>What does it do?</h2>
<p>Tidy will do several things to your HTML:</p>
<ul>
<li>Convert deprecated elements and attributes to standards-compliant
alternatives</li>
<li>Enforce XHTML compatibility guidelines and other best practices</li>
<li>Preserve data that would normally be removed as per W3C</li>
</ul>
<h2>What are levels?</h2>
<p>Levels describe how aggressive the Tidy module should be when
cleaning up HTML. There are four levels to pick from: none, light, medium
and heavy. None turns every fix off; each of the other levels has a
well-defined set of behaviors associated with it, although that set may
change depending on your doctype.</p>
<dl>
<dt>light</dt>
<dd>This is the <strong>lenient</strong> level. If a tag or attribute
is about to be removed because it isn't supported by the
doctype, Tidy will step in and change it into an alternative that
is supported.</dd>
<dt>medium</dt>
<dd>This is the <strong>correctional</strong> level. At this level,
all the functions of light are performed, as well as some extra,
non-essential best practices enforcement. Changes made on this
level are very benign and are unlikely to cause problems.</dd>
<dt>heavy</dt>
<dd>This is the <strong>aggressive</strong> level. If a tag or
attribute is deprecated, it will be converted into a non-deprecated
version, no ifs, ands or buts.</dd>
</dl>
<p>By default, Tidy operates at the <strong>medium</strong> level. You can
change the level of cleaning by setting the %HTML.TidyLevel configuration
directive:</p>
<pre>$config-&gt;set('HTML.TidyLevel', 'heavy'); // burn baby burn!</pre>
<h2>Is the light level really light?</h2>
<p>It depends on what doctype you're using. If your documents are HTML
4.01 <em>Transitional</em>, HTML Purifier will be lazy
and won't clean up your <code>center</code>
or <code>font</code> tags. But if you're using HTML 4.01 <em>Strict</em>,
HTML Purifier has no choice: it has to convert them, or they will
be nuked out of existence. So while light on Transitional will result
in little to no changes, light on Strict will still result in quite
a lot of fixes.</p>
<p>This is different behavior from version 1.6 and earlier, where deprecated
tags in transitional documents would
always be cleaned up regardless. This is also better behavior.</p>
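<p>One way to see the difference for yourself is to run the same input at
the light level under both doctypes. This is only a sketch: the include
path and the sample markup are assumptions, and the exact output may vary
between versions.</p>
<pre>require_once 'HTMLPurifier.auto.php'; // assumed path to the library

$html = '&lt;center&gt;&lt;font color="red"&gt;Hello&lt;/font&gt;&lt;/center&gt;';

foreach (array('HTML 4.01 Transitional', 'HTML 4.01 Strict') as $doctype) {
    $config = HTMLPurifier_Config::createDefault();
    $config-&gt;set('HTML.Doctype', $doctype);
    $config-&gt;set('HTML.TidyLevel', 'light');
    $purifier = new HTMLPurifier($config);
    // Transitional should leave the deprecated tags alone;
    // Strict should convert them rather than drop them.
    echo $doctype, ': ', $purifier-&gt;purify($html), "\n";
}</pre>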
<h2>My pages look different!</h2>
<p>HTML Purifier is tasked with converting deprecated tags and
attributes to standards-compliant alternatives, which usually
need copious amounts of CSS. It's also not foolproof: sometimes
things do get lost in the translation. This is why when HTML Purifier
can get away with not doing cleaning, it won't; this is why
the default value is <strong>medium</strong> and not heavy.</p>
<p>Fortunately, only a few attributes have problems with the switch
over. They are described below:</p>
<table class="table">
<thead><tr>
<th>Element@Attr</th>
<th>Changes</th>
</tr></thead>
<tbody>
<tr>
<td>caption@align</td>
<td>Firefox supports stuffing the caption on the
left or right side of the table, a feature that
Internet Explorer, understandably, does not have.
When align equals left or right, the text will simply
be aligned to that side.</td>
</tr>
<tr>
<td>img@align</td>
<td>The implementation for align bottom is good, but not
perfect. There are a few pixel differences.</td>
</tr>
<tr>
<td>br@clear</td>
<td>Clear both gets a little wonky in Internet Explorer. We haven't
really been able to figure out why.</td>
</tr>
<tr>
<td>hr@noshade</td>
<td>All browsers implement this slightly differently: we've
chosen to make noshade horizontal rules gray.</td>
</tr>
</tbody>
</table>
<p>There are a few more minor, although irritating, bugs.
Some older browsers support deprecated attributes,
but not CSS. Transformed elements and attributes will look unstyled
to said browsers. Also, CSS precedence is slightly different for
inline styles versus presentational markup. In increasing precedence:</p>
<ol>
<li>Presentational attributes</li>
<li>External style sheets</li>
<li>Inline styling</li>
</ol>
<p>This means that styling that may have been masked by external CSS
declarations will start showing up (a good thing, perhaps). Finally,
if you've turned off the style attribute, almost all of
these transformations will not work. Sorry, mates.</p>
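<p>To make the precedence point concrete, here is a rough sketch (the
include path, the sample markup and the style sheet rule mentioned in the
comments are all assumptions made for illustration):</p>
<pre>require_once 'HTMLPurifier.auto.php'; // assumed path to the library

$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.TidyLevel', 'heavy'); // convert deprecated attributes too
$purifier = new HTMLPurifier($config);

// Suppose an external style sheet contains:  p { text-align: left; }
// Before: align="right" is a presentational attribute, so that rule wins.
// After: the alignment is carried by an inline style, which beats that rule.
echo $purifier-&gt;purify('&lt;p align="right"&gt;Aligned&lt;/p&gt;');</pre>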
<p>You can review the rendering before and after these transformations
by consulting the <a
href="http://htmlpurifier.org/live/smoketests/attrTransform.php">attrTransform.php
smoketest</a>.</p>
<h2>I like the general idea, but the specifics bug me!</h2>
<p>So you want HTML Purifier to clean up your HTML, but you're not
so happy about the br@clear implementation. That's perfectly fine!
HTML Purifier will make accommodations:</p>
<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config-&gt;set('HTML.TidyLevel', 'heavy'); // all changes, minus...
<strong>$config-&gt;set('HTML.TidyRemove', 'br@clear');</strong></pre>
<p>That third line does the magic, removing the br@clear fix
from the module, ensuring that <code>&lt;br clear="all" /&gt;</code>
will pass through unharmed. The reverse is possible too:</p>
<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config-&gt;set('HTML.TidyLevel', 'none'); // no changes, plus...
<strong>$config-&gt;set('HTML.TidyAdd', 'p@align');</strong></pre>
<p>In this case, all transformations are shut off, except for the p@align
one, which you found handy.</p>
<p>To find the names of the fixes you want to turn on or off,
you'll have to consult the source code, specifically the files in
<code>HTMLPurifier/HTMLModule/Tidy/</code>. There is, however, a
general syntax:</p>
<table class="table">
<thead>
<tr>
<th>Name</th>
<th>Example</th>
<th>Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>element</td>
<td>font</td>
<td>Tag transform for <em>element</em></td>
</tr>
<tr>
<td>element@attr</td>
<td>br@clear</td>
<td>Attribute transform for <em>attr</em> on <em>element</em></td>
</tr>
<tr>
<td>@attr</td>
<td>@lang</td>
<td>Global attribute transform for <em>attr</em></td>
</tr>
<tr>
<td>e#content_model_type</td>
<td>blockquote#content_model_type</td>
<td>Change of child processing implementation for <em>e</em></td>
</tr>
</tbody>
</table>
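<p>As a hedged sketch of how these names fit together, the snippet below
lists the fix names one module defines and then removes two of them. The
include path is an assumption, and so are the class name
<code>HTMLPurifier_HTMLModule_Tidy_XHTMLAndHTML4</code> and its public
<code>makeFixes()</code> method: they come from the source directory
mentioned above and may differ in your version, so check the files there
first.</p>
<pre>require_once 'HTMLPurifier.auto.php'; // assumed path to the library

// Assumed class and method; verify them against HTMLPurifier/HTMLModule/Tidy/.
$module = new HTMLPurifier_HTMLModule_Tidy_XHTMLAndHTML4();
print_r(array_keys($module-&gt;makeFixes())); // names in the syntax shown above

$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.TidyLevel', 'heavy');
// One tag transform (element) and one attribute transform (element@attr).
$config-&gt;set('HTML.TidyRemove', array('font', 'br@clear'));</pre>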
<h2>So... what's the lowdown?</h2>
<p>The lowdown is, quite frankly, that HTML Purifier's default settings are
probably good enough. The next step is to bump the level up to heavy,
and if that still doesn't satisfy your appetite, do some fine-tuning.
Other than that, don't worry about it: this all works silently and
effectively in the background.</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->