TODO List

= KEY ====================
    # Flagship
    - Regular
    ? Maybe I'll Do It
==========================

If no interest is expressed for a feature that may require a considerable
amount of effort to implement, it may get endlessly delayed. Do not be
afraid to cast your vote for the next feature to be implemented!

Things to do as soon as possible:

 - http://htmlpurifier.org/phorum/read.php?3,5560,6307#msg-6307
 - Think about allowing explicit order of operations hooks for transforms
 - Fix "<.<" bug (trailing < is removed if not EOD)
 - Build in better internal state dumps and debugging tools for remote
   debugging
 - Allowed/Allowed* have strange interactions when both set
 ? Transform lone embeds into object tags
 - Deprecated config options that emit warnings when you set them (with
   a way of muting the warning if you really want to)
 - Make HTML.Trusted work with Output.FlashCompat
 - HTML.Trusted and HTML.SafeObject have funny interaction; general
   problem is what to do when a module "supersedes" another
   (see also tables and basic tables). This is a little dicier
   because HTML.SafeObject has some extra functionality that
   trusted might find useful. See http://htmlpurifier.org/phorum/read.php?3,5762,6100

FUTURE VERSIONS
---------------

4.9 release [OMG CONFIG PONIES]
 ! Fix Printer. It's from the old days when we didn't have decent XML classes
 ! Factor demo.php into a set of Printer classes, and then create a stub
   file for users here (inside the actual HTML Purifier library)
 - Fix error handling with form construction
 - Do encoding validation in Printers, or at least where user data comes in
 - Config: Add examples to everything (make built-in which also automatically
   gives output)
 - Add "register" field to config schemas to eliminate dependence on
   naming conventions (try to remember why we ultimately decided on this)

5.0 release [HTML 5]
 # Swap out code to use html5lib tokenizer and tree-builder
 ! Allow turning off of FixNesting and required attribute insertion

5.1 release [It's All About Trust] (floating)
 # Implement untrusted, dangerous elements/attributes
 # Implement IDREF support (harder than it seems, since you cannot have
   IDREFs to non-existent IDs)
     - Implement <area> (client and server side image maps are blocking
       on IDREF support)
 # Frameset XHTML 1.0 and HTML 4.01 doctypes
 - Figure out how to simultaneously set %CSS.Trusted and %HTML.Trusted (?)

5.2 release [Error'ed]
 # Error logging for filtering/cleanup procedures
 # Additional support for poorly written HTML
    - Microsoft Word HTML cleaning (e.g. MsoNormal, but research essential!)
    - Friendly strict handling of <address> (block -> <br>)
 - XSS-attempt detection--certain errors are flagged XSS-like
 - Append something to duplicate IDs so they're still usable (impl. note: the
   dupe detector would also need to detect the suffix)

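A rough sketch of the duplicate-ID idea above, in Python rather than in
library code. The function name dedupe_ids, the "-" separator and the numeric
suffix are all assumptions made for illustration; nothing here reflects an
existing HTML Purifier API. Generated names are recorded alongside original
ones, which is the "detect the suffix" part of the implementation note.

```python
def dedupe_ids(ids, sep="-"):
    """Return the IDs in document order, with repeats made unique.

    A repeated ID gets a numeric suffix ("name-2", "name-3", ...). Generated
    names go into the same `seen` set as original ones, so a later literal
    "name-2" is itself treated as a duplicate and renamed.
    """
    seen = set()
    result = []
    for ident in ids:
        candidate = ident
        n = 1
        # Bump the suffix until the candidate collides with nothing so far.
        while candidate in seen:
            n += 1
            candidate = f"{ident}{sep}{n}"
        seen.add(candidate)
        result.append(candidate)
    return result
```

This is a single forward pass, so an ID that appears earlier in the document
always keeps its original name.
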
6.0 release [Beyond HTML]
 # Legit token based CSS parsing (will require revamping almost every
   AttrDef class). Probably will use CSSTidy
 # More control over allowed CSS properties using a modularization
 # IRI support (this includes IDN)
 - Standardize token armor for all areas of processing

7.0 release [To XML and Beyond]
 - Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
    - Hooks for adding custom processors to custom namespaced tags and
      attributes, offer default implementation
    - Lots of documentation and samples

Ongoing
 - More refactoring to take advantage of PHP5's facilities
 - Refactor unit tests into lots of test methods
 - Plugins for major CMSes (COMPLEX)
    - phpBB
    - Also, a FAQ for extension writers with HTML Purifier

AutoFormat
 - Smileys
 - Syntax highlighting (with GeSHi) with <pre> and possibly <?php
 - Look at http://drupal.org/project/Modules/category/63 for ideas

Neat feature related
 ! Support exporting configuration, so users can easily tweak settings
   in the demo, and then copy-paste into their own setup
 - Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
 - Allow scoped="scoped" attribute in <style> tags; may be troublesome
   because regular CSS has no way of uniquely identifying nodes, so we'd
   have to generate IDs
 - Explain how to use HTML Purifier in non-PHP languages / create
   a simple command line stub (or complicated?)
 - Fixes for Firefox's inability to handle COL alignment props (Bug 915)
 - Automatically add non-breaking spaces to empty table cells when
   empty-cells:show is applied to have compatibility with Internet Explorer
 - Table of Contents generation (XHTML Compiler might be reusable). May also
   be out-of-band information.
 - Full set of color keywords. Also, a way to add onto them without
   finalizing the configuration object.
 - Write a var_export and memcached DefinitionCache - Denis
 - Built-in support for target="_blank" on all external links
 - Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
   Also, enable disabling of directionality
 ? Externalize inline CSS to promote clean HTML, proposed by Sander Tekelenburg
 ? Remove redundant tags, e.g. <u><u>Underlined</u></u>. Implementation notes:
    1. Analyze which tags should have duplicates removed
    2. Ensure attributes are merged into the parent tag
    3. Extend the tag exclusion system to specify whether the contents
       should be dropped (currently, there's code that could do something
       like this if it didn't drop the inner text too)
 ? Make AutoParagraph also support paragraph-izing double <br> tags, and not
   just double newlines. This is kind of tough to do in the current framework,
   though, and might be reasonably approximated by search replacing double <br>s
   with newlines before running it through HTML Purifier.

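The search-and-replace approximation for double <br> tags could look roughly
like the following Python sketch. The names DOUBLE_BR and
double_br_to_newlines are made up for this example, and the step is assumed
to run on the raw input before it reaches the purifier. It is a plain regex
pass, so it does not special-case <pre> content or <br> tags that carry
attributes.

```python
import re

# Two or more consecutive <br> tags (any case, optional self-closing slash),
# optionally separated by whitespace.
DOUBLE_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def double_br_to_newlines(html):
    """Replace each run of two or more <br> tags with a blank line."""
    return DOUBLE_BR.sub("\n\n", html)
```

A single <br> is left untouched, so ordinary line breaks inside a paragraph
survive the preprocessing step.
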
Maintenance related (slightly boring)
 # CHMOD install script for PEAR installs
 ! Factor out command line parser into its own class, and unit test it
 - Reduce size of internal data-structures (esp. HTMLDefinition)
 - Allow merging configurations. Thus,
        a -> b -> default
        c -> d -> default
   becomes
        a -> b -> c -> d -> default
   Maybe allow more fine-grained tuning of this behavior. Alternatively,
   encourage people to use short plist depths before building them up.
 - Time PHPT tests

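To make the configuration-merging item above concrete, here is a small
Python sketch of joining two lookup chains that share a default layer. It
only models the lookup order (a -> b -> c -> d -> default) using the
standard library's ChainMap; the function name merge_chains and the
list-of-dicts representation are assumptions for illustration, not how the
library stores configuration.

```python
from collections import ChainMap


def merge_chains(first, second):
    """Join two lookup chains that end in the same default layer.

    Each chain is a list of dicts ordered from most to least specific, with
    the shared defaults as the last element.
    """
    if first[-1] is not second[-1]:
        raise ValueError("chains must share the same default layer")
    # Drop the first chain's default so it appears once, at the very end.
    return ChainMap(*first[:-1], *second)
```

Lookups on the result consult a, then b, then c, then d, and fall back to
the defaults last, so every layer of the first chain takes precedence over
every layer of the second.
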
ChildDef related (very boring)
 - Abstract ChildDef_BlockQuote to work with all elements that only
   allow blocks in them, required or optional
 - Implement lenient <ruby> child validation

Wontfix
 - Non-lossy smart alternate character encoding transformations (unless
   patch provided)
 - Pretty-printing HTML: users can use Tidy on the output on entire page
 - Native content compression, whitespace stripping: use gzip if this is
   really important

    vim: et sw=4 sts=4