  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  3. "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  4. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  5. <head>
  6. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  7. <meta name="description" content="Describes config schema framework in HTML Purifier." />
  8. <link rel="stylesheet" type="text/css" href="./style.css" />
  9. <title>Config Schema - HTML Purifier</title>
  10. </head>
  11. <body>
  12. <h1>Config Schema</h1>
  13. <div id="filing">Filed under Development</div>
  14. <div id="index">Return to the <a href="index.html">index</a>.</div>
  15. <div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
  16. <p>
  17. HTML Purifier has a fairly complex system for configuration. Users
  18. interact with a <code>HTMLPurifier_Config</code> object to
  19. set configuration directives. The values they set are validated according
  20. to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.
  21. </p>
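<p>
As a minimal sketch of that interaction, the snippet below sets one directive
and reads it back. It assumes the bundled autoloader is reachable at
<code>library/HTMLPurifier.auto.php</code> relative to your script, and it uses
the stock <code>HTML.Doctype</code> directive purely as an illustration:
</p>
<pre>&lt;?php
require_once 'library/HTMLPurifier.auto.php'; // path is an assumption

$config = HTMLPurifier_Config::createDefault();
// The value is checked against the schema entry for HTML.Doctype
$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Strict');
var_dump($config-&gt;get('HTML.Doctype'));</pre>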
  22. <p>
  23. The schema is mostly transparent to end-users, but if you're doing development
  24. work for HTML Purifier and need to define a new configuration directive,
  25. you'll need to interact with it. We'll also talk about how to define
  26. userspace configuration directives at the very end.
  27. </p>
  28. <h2>Write a directive file</h2>
  29. <p>
  30. Directive files define configuration directives to be used by
  31. HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
  32. in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
  33. couldn't think of a more descriptive file extension.)
  34. Directive files are actually what we call <code>StringHash</code>es,
  35. i.e. associative arrays represented in a string form reminiscent of
  36. <a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
  37. sample directive file, <code>Test.Sample.txt</code>:
  38. </p>
  39. <pre>Test.Sample
  40. TYPE: string/null
  41. DEFAULT: NULL
  42. ALLOWED: 'foo', 'bar'
  43. VALUE-ALIASES: 'baz' => 'bar'
  44. VERSION: 3.1.0
  45. --DESCRIPTION--
  46. This is a sample configuration directive for the purposes of the
  47. &lt;code&gt;dev-config-schema.html&lt;/code&gt; documentation.
  48. --ALIASES--
  49. Test.Example</pre>
  50. <p>
  51. Each of these segments has a specific meaning:
  52. </p>
  53. <table class="table">
  54. <thead>
  55. <tr>
  56. <th>Key</th>
  57. <th>Example</th>
  58. <th>Description</th>
  59. </tr>
  60. </thead>
  61. <tbody>
  62. <tr>
  63. <td>ID</td>
  64. <td>Test.Sample</td>
  65. <td>The name of the directive, in the form Namespace.Directive
  66. (implicitly the first line)</td>
  67. </tr>
  68. <tr>
  69. <td>TYPE</td>
  70. <td>string/null</td>
  71. <td>The type of variable this directive accepts. See below for
  72. details. You can also add <code>/null</code> to the end of
  73. any basic type to allow null values too.</td>
  74. </tr>
  75. <tr>
  76. <td>DEFAULT</td>
  77. <td>NULL</td>
  78. <td>A parseable PHP expression of the default value.</td>
  79. </tr>
  80. <tr>
  81. <td>DESCRIPTION</td>
  82. <td>This is a...</td>
  83. <td>An HTML description of what this directive does.</td>
  84. </tr>
  85. <tr>
  86. <td>VERSION</td>
  87. <td>3.1.0</td>
  88. <td><em>Recommended</em>. The version of HTML Purifier in which this directive was added.
  89. Directives that have been around since 1.0.0 don't have this,
  90. but any new ones should.</td>
  91. </tr>
  92. <tr>
  93. <td>ALIASES</td>
  94. <td>Test.Example</td>
  95. <td><em>Optional</em>. A comma-separated list of aliases for this directive.
  96. This is most useful for backwards compatibility and should
  97. not be used otherwise.</td>
  98. </tr>
  99. <tr>
  100. <td>ALLOWED</td>
  101. <td>'foo', 'bar'</td>
  102. <td><em>Optional</em>. Set of allowed values for a directive,
  103. as a comma-separated list of parseable PHP expressions. This
  104. is only allowed for string, istring, text and itext TYPEs.</td>
  105. </tr>
  106. <tr>
  107. <td>VALUE-ALIASES</td>
  108. <td>'baz' =&gt; 'bar'</td>
  109. <td><em>Optional</em>. Mapping of one value to another, given
  110. as a comma-separated list of key/value pairs. This
  111. is only allowed for string, istring, text and itext TYPEs.</td>
  112. </tr>
  113. <tr>
  114. <td>DEPRECATED-VERSION</td>
  115. <td>3.1.0</td>
  116. <td><em>Not shown</em>. Indicates the version in which the
  117. directive was deprecated.</td>
  118. </tr>
  119. <tr>
  120. <td>DEPRECATED-USE</td>
  121. <td>Test.NewDirective</td>
  122. <td><em>Not shown</em>. Indicates what new directive should be
  123. used instead. Note that the two directives will not behave
  124. identically, although they should offer equivalent functionality.
  125. If they are identical, use an alias instead.</td>
  126. </tr>
  127. <tr>
  128. <td>EXTERNAL</td>
  129. <td>CSSTidy</td>
  130. <td><em>Not shown</em>. Indicates if there is an external library
  131. the user will need to download and install to use this configuration
  132. directive. As of right now, this is merely a Google-able name; future
  133. versions may also provide links and instructions.</td>
  134. </tr>
  135. </tbody>
  136. </table>
  137. <p>
  138. Some notes on format and style:
  139. </p>
  140. <ul>
  141. <li>
  142. Each of these keys can be expressed in the short format
  143. (<code>KEY: Value</code>) or the long format
  144. (<code>--KEY--</code> with value beneath). You must use the
  145. long format if multiple lines are needed, or if a long format
  146. has been used already (that's why <code>ALIASES</code> in our
  147. example is in the long format); otherwise, it's user preference.
  148. </li>
  149. <li>
  150. The HTML descriptions should be wrapped at about 80 columns; do
  151. not rely on editor word-wrapping.
  152. </li>
  153. </ul>
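<p>
To illustrate the two formats, here is a sketch of the earlier
<code>Test.Sample.txt</code> with every key written in the long format. It
assumes, as the note above states, that the parser treats the two forms
interchangeably; the ID remains implicit on the first line:
</p>
<pre>Test.Sample
--TYPE--
string/null
--DEFAULT--
NULL
--ALLOWED--
'foo', 'bar'
--VALUE-ALIASES--
'baz' =&gt; 'bar'
--VERSION--
3.1.0
--DESCRIPTION--
This is a sample configuration directive for the purposes of the
&lt;code&gt;dev-config-schema.html&lt;/code&gt; documentation.
--ALIASES--
Test.Example</pre>
<p>
The mixed style of the original sample is usually easier to read; the long
format is only required for multi-line values and for any key that follows
another long-format key.
</p>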
  154. <p>
  155. Also, as promised, here is the set of possible types:
  156. </p>
  157. <table class="table">
  158. <thead>
  159. <tr>
  160. <th>Type</th>
  161. <th>Example</th>
  162. <th>Description</th>
  163. </tr>
  164. </thead>
  165. <tbody>
  166. <tr>
  167. <td>string</td>
  168. <td>'Foo'</td>
  169. <td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
  170. </tr>
  171. <tr>
  172. <td>istring</td>
  173. <td>'foo'</td>
  174. <td>Case-insensitive ASCII string without newlines</td>
  175. </tr>
  176. <tr>
  177. <td>text</td>
  178. <td>"A<em>\n</em>b"</td>
  179. <td>String with newlines</td>
  180. </tr>
  181. <tr>
  182. <td>itext</td>
  183. <td>"a<em>\n</em>b"</td>
  184. <td>Case-insensitive ASCII string with newlines</td>
  185. </tr>
  186. <tr>
  187. <td>int</td>
  188. <td>23</td>
  189. <td>Integer</td>
  190. </tr>
  191. <tr>
  192. <td>float</td>
  193. <td>3.0</td>
  194. <td>Floating point number</td>
  195. </tr>
  196. <tr>
  197. <td>bool</td>
  198. <td>true</td>
  199. <td>Boolean</td>
  200. </tr>
  201. <tr>
  202. <td>lookup</td>
  203. <td>array('key' =&gt; true)</td>
  204. <td>Lookup array, used with <code>isset($var[$key])</code></td>
  205. </tr>
  206. <tr>
  207. <td>list</td>
  208. <td>array('f', 'b')</td>
  209. <td>List array, with ordered numerical indexes</td>
  210. </tr>
  211. <tr>
  212. <td>hash</td>
  213. <td>array('key' =&gt; 'val')</td>
  214. <td>Associative array of keys to values</td>
  215. </tr>
  216. <tr>
  217. <td>mixed</td>
  218. <td>new stdclass</td>
  219. <td>Any PHP variable is fine</td>
  220. </tr>
  221. </tbody>
  222. </table>
  223. <p>
  224. The examples represent what will be returned from the configuration
  225. object; users have a little bit of leeway when setting configuration
  226. values (for example, a lookup value can be specified as a list;
  227. HTML Purifier will flip it as necessary). These types are defined
  228. in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
  229. library/HTMLPurifier/VarParser.php</a>.
  230. </p>
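<p>
The list-to-lookup leeway can be sketched as follows. The example assumes the
autoloader path shown and relies on the stock <code>HTML.AllowedElements</code>
directive being of a lookup type; treat it as an illustration, not a
specification of the parser:
</p>
<pre>&lt;?php
require_once 'library/HTMLPurifier.auto.php'; // path is an assumption

$config = HTMLPurifier_Config::createDefault();
// A plain list is passed in, even though the directive is a lookup
$config-&gt;set('HTML.AllowedElements', array('a', 'em'));
// The value read back is expected to be keyed by element name,
// so it can be tested with isset() as described in the table
$allowed = $config-&gt;get('HTML.AllowedElements');
var_dump(isset($allowed['em']));</pre>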
  231. <p>
  232. For more information on what values are allowed, and how they are parsed,
  233. consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
  234. library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
  235. as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
  236. library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
  237. the semantics of the parsed values.
  238. </p>
  239. <h2>Refreshing the cache</h2>
  240. <p>
  241. You may have noticed that your directive file isn't doing anything
  242. yet. That's because it hasn't been added to the runtime
  243. <code>HTMLPurifier_ConfigSchema</code> instance. Run
  244. <code>maintenance/generate-schema-cache.php</code> to fix this.
  245. If there were no errors, you're good to go! Don't forget to add
  246. some unit tests for your functionality!
  247. </p>
  248. <p>
  249. If you ever make changes to your configuration directives, you
  250. will need to run this script again.
  251. </p>
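<p>
A quick sanity check after regenerating the cache might look like the sketch
below. It assumes the <code>Test.Sample.txt</code> file from the first section
was placed in the schema directory and that the autoloader path is correct for
your checkout. According to that file's <code>VALUE-ALIASES</code> entry, the
value <code>'baz'</code> should be read back as <code>'bar'</code>:
</p>
<pre>&lt;?php
require_once 'library/HTMLPurifier.auto.php'; // path is an assumption

$config = HTMLPurifier_Config::createDefault();
// Fails with an error if the directive never made it into the schema cache
$config-&gt;set('Test.Sample', 'baz');
var_dump($config-&gt;get('Test.Sample'));</pre>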
  252. <h2>Adding in-house schema definitions</h2>
  253. <p>
  254. Placing stuff directly in HTML Purifier's source tree is generally not a
  255. good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your
  256. life easier.
  257. </p>
  258. <p>
  259. The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code>
  260. with the location of your directory (relative or absolute path will do). For example,
  261. if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run:
  262. <code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>.
  263. </p>
  264. <p>
  265. Alternatively, you can create a small loader PHP file in the HTML Purifier base
  266. directory named <code>config-schema.php</code> (this is the same directory
  267. in which you would place a <code>test-settings.php</code> file). In this file, add
  268. the following line for each directory you want to load:
  269. </p>
  270. <pre>$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');</pre>
  271. <p>You can even load a single file using:</p>
  272. <pre>$builder-&gt;buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre>
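<p>
Put together, a complete <code>config-schema.php</code> could look like the
sketch below. It assumes that the schema cache script includes this file with
<code>$builder</code> and <code>$interchange</code> already in scope, which is
what the lines above imply; the second path is hypothetical:
</p>
<pre>&lt;?php
// config-schema.php, placed in the HTML Purifier base directory.
// $builder and $interchange are assumed to be provided by the including script.

// Load every directive file in a custom directory
$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');

// Load one additional directive file from elsewhere (hypothetical path)
$builder-&gt;buildFile($interchange, '/var/htmlpurifier/extra/MyApp.Other.txt');</pre>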
  273. <p>Storing custom definitions that you don't plan on sending back upstream in
  274. a separate directory is <em>definitely</em> a good idea! Additionally, picking
  275. a good namespace can go a long way to saving you grief if you want to use
  276. someone else's change, but they picked the same name, or if HTML Purifier
  277. decides to add support for a configuration directive that has the same name.</p>
  278. <!-- TODO: how to name directives that rely on naming conventions -->
  279. <h2>Errors</h2>
  280. <p>
  281. All directive files go through a rigorous validation process
  282. through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
  283. library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
  284. as some basic checks during building. While
  285. listing every error out here is out-of-scope for this document, we
  286. can give some general tips for interpreting error messages.
  287. There are two types of errors: builder errors and validation errors.
  288. </p>
  289. <h3>Builder errors</h3>
  290. <blockquote>
  291. <p>
  292. <strong>Exception:</strong> Expected type string, got
  293. integer in DEFAULT in directive hash 'Ns.Dir'
  294. </p>
  295. </blockquote>
  296. <p>
  297. You can identify a builder error by the keyword "directive hash."
  298. These are the easiest to deal with, because they directly correspond
  299. with your directive file. Find the offending directive file (which
  300. is the directive hash plus the .txt extension), find the
  301. offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
  302. This particular error would occur if your default value is not the same
  303. type as TYPE.
  304. </p>
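<p>
For illustration, a hypothetical <code>Ns.Dir.txt</code> along these lines
would be expected to trigger the error above, because the default is an
integer while the declared type is a string:
</p>
<pre>Ns.Dir
TYPE: string
DEFAULT: 3
--DESCRIPTION--
Hypothetical directive whose default does not match its type.</pre>
<p>
Quoting the default (<code>DEFAULT: '3'</code>) or changing the type to
<code>int</code> would resolve the mismatch.
</p>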
  305. <h3>Validation errors</h3>
  306. <blockquote>
  307. <p>
  308. <strong>Exception:</strong> Alias 3 in valueAliases in directive
  309. 'Ns.Dir' must be a string
  310. </p>
  311. </blockquote>
  312. <p>
  313. These are a little trickier, because we're not actually validating
  314. your directive file, or even the direct string hash representation.
  315. We're validating an Interchange object, and the error messages do
  316. not mention any string hash keys.
  317. </p>
  318. <p>
  319. Nevertheless, it's not difficult to figure out what went wrong.
  320. Read the "context" statements in reverse:
  321. </p>
  322. <dl>
  323. <dt>in directive 'Ns.Dir'</dt>
  324. <dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
  325. <dt>in valueAliases</dt>
  326. <dd>There's no key actually called this, but there's one that's close:
  327. VALUE-ALIASES. Indeed, that's where to look.</dd>
  328. <dt>Alias 3</dt>
  329. <dd>The value alias that is equal to 3 is the culprit.</dd>
  330. </dl>
  331. <p>
  332. In this particular case, you're not allowed to alias integer values to
  333. string values.
  334. </p>
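<p>
A hypothetical <code>Ns.Dir.txt</code> that would be expected to produce this
message is sketched below; the offending entry is the integer key in
<code>VALUE-ALIASES</code>:
</p>
<pre>Ns.Dir
TYPE: string
DEFAULT: 'bar'
ALLOWED: 'foo', 'bar'
VALUE-ALIASES: 3 =&gt; 'bar'
--DESCRIPTION--
Hypothetical directive with a non-string value alias.</pre>
<p>
Writing the alias as a string (<code>'3' =&gt; 'bar'</code>) should satisfy
the validator.
</p>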
  335. <p>
  336. The most difficult part is translating the Interchange member variable (valueAliases)
  337. into a directive file key (VALUE-ALIASES), but there's a one-to-one
  338. correspondence currently. If the two formats diverge, any discrepancies
  339. will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
  340. library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
  341. </p>
  342. <h2>Internals</h2>
  343. <p>
  344. Much of the configuration schema framework's codebase deals with
  345. shuffling data from one format to another, and doing validation on this
  346. data.
  347. The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
  348. class, which represents the purest, parsed representation of the schema.
  349. </p>
  350. <p>
  351. Hand-writing this data is unwieldy, however, so we write directive files.
  352. These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
  353. into <code>HTMLPurifier_StringHash</code>es, which then
  354. are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
  355. to construct the interchange object.
  356. </p>
  357. <p>
  358. From the interchange object, the data can be siphoned into other forms
  359. using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
  360. For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
  361. generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
  362. which <code>HTMLPurifier_Config</code> uses to validate its incoming
  363. data. There is also an XML serializer, which is used to build documentation.
  364. </p>
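<p>
The flow described above can be sketched in a few lines. The class names are
the ones mentioned in this section; the method names (<code>buildDir</code>,
<code>validate</code>, <code>build</code>), the autoloader path and the schema
directory are assumptions that should be checked against the source before
relying on them:
</p>
<pre>&lt;?php
require_once 'library/HTMLPurifier.auto.php'; // path is an assumption

// 1. Directive files -&gt; string hashes -&gt; interchange object
$interchange = new HTMLPurifier_ConfigSchema_Interchange();
$builder = new HTMLPurifier_ConfigSchema_InterchangeBuilder();
$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');

// 2. Validate the interchange object (source of the "validation errors")
$interchange-&gt;validate();

// 3. Interchange object -&gt; runtime HTMLPurifier_ConfigSchema
$schema_builder = new HTMLPurifier_ConfigSchema_Builder_ConfigSchema();
$schema = $schema_builder-&gt;build($interchange);</pre>
<p>
Keeping the interchange object as the single intermediate form is what lets
the same parsed data feed both the runtime schema and the XML serializer.
</p>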
  365. </body>
  366. </html>
  367. <!-- vim: et sw=4 sts=4
  368. -->