THE UNIVERSAL DESIGN PATTERN: PROPERTIES
Steve Yegge

Implementation:
    get(name)
    put(name, value)
    has(name)
    remove(name)
    iteration, with filtering [this will be our namespaces]
    parent
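
As a rough illustration of the interface listed above, here is a minimal sketch of a plist with a parent pointer. The class name PlistSketch and all method bodies are assumptions made for this example, not HTML Purifier code; filtered iteration is left out.

```php
<?php
// Hypothetical sketch: PlistSketch is not an HTML Purifier class.
class PlistSketch
{
    /** @var array Property name => value, for this plist only. */
    private $data = array();

    /** @var PlistSketch|null Plist consulted when a key is missing here. */
    private $parent;

    public function __construct(?PlistSketch $parent = null)
    {
        $this->parent = $parent;
    }

    /** Returns the nearest value in the parent chain, or null if none. */
    public function get($name)
    {
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            if (array_key_exists($name, $plist->data)) {
                return $plist->data[$name];
            }
        }
        return null;
    }

    /** Writes always go to this plist, never to a parent. */
    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    /** True if this plist or any ancestor defines the key. */
    public function has($name)
    {
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            if (array_key_exists($name, $plist->data)) {
                return true;
            }
        }
        return false;
    }

    /** Removes only the local entry; an inherited value shows through again. */
    public function remove($name)
    {
        unset($this->data[$name]);
    }
}

$defaults = new PlistSketch();
$defaults->put('Core.Encoding', 'utf-8');
$config = new PlistSketch($defaults);
$config->put('Core.Encoding', 'iso-8859-1'); // shadows the default locally
```

Reads walk up the chain while writes stay local, so changing $config never touches $defaults.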

Representations:
    - Keys are strings
    - It's nice to not need to quote keys (if we formulate our own language,
      consider this)
    - Property not present representation (key missing)
    - If properties are frequently removed and re-added, marking them null
      instead of deleting the key may help. If null is a valid value, use
      another sentinel value. (PHP semantics are weird here)

Data structures:
    - LinkedHashMap is wonderful (O(1) access and maintains order)
    - Using a special property that points to the parent is usual
    - Multiple inheritance possible, need rules for which to look up first
    - Iterative inheritance is best
    - Consider performance!

Deletion
    - Tricky problem with inheritance
    - Distinguish between "not found" and "look in my parent for the property"
      [Maybe HTML Purifier won't allow deletion]
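
One common way to draw the "not found" versus "look in my parent" distinction is a tombstone: a private sentinel stored in the child that marks the key as deleted and stops the parent lookup. The sketch below assumes that approach; TombstonePlist and its methods are hypothetical names for this example.

```php
<?php
// Hypothetical sketch of deletion under inheritance using a tombstone.
class TombstonePlist
{
    private $data = array();
    private $parent;

    /** @var stdClass|null Shared marker meaning "deleted here". */
    private static $deleted = null;

    public function __construct(?TombstonePlist $parent = null)
    {
        $this->parent = $parent;
    }

    private static function tombstone()
    {
        if (self::$deleted === null) {
            self::$deleted = new stdClass();
        }
        return self::$deleted;
    }

    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    /** Records the deletion locally instead of unsetting the key. */
    public function remove($name)
    {
        $this->data[$name] = self::tombstone();
    }

    /** A tombstone ends the search; a missing key continues to the parent. */
    public function has($name)
    {
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            if (array_key_exists($name, $plist->data)) {
                return $plist->data[$name] !== self::tombstone();
            }
        }
        return false;
    }
}
```

With a plain unset() in the child, the parent's value would reappear; the tombstone is what makes the deletion stick for the child only.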

Read/write asymmetry (it's correct!)

Read-only plists
    - Allow ability to freeze [this is what we have already]
    - Don't overuse it

Performance:
    - Intern strings (PHP does this already)
    - Don't be case-insensitive
    - If all properties in a plist are known a priori, you can use a "perfect"
      hash function. Often overkill.
    - Copy-on-read caching "plundering" reduces lookup, but uses memory and can
      grow stale. Use as last resort.
    - Refactoring to fields. Watch for API compatibility, system complexity,
      and lack of flexibility.
    - Refrigerator: external data-structure to hold plists
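
To make the copy-on-read ("plundering") trade-off concrete, here is a small sketch under the assumption that a child simply copies any value it finds in its parent chain into its own storage. PlunderingPlist is a hypothetical name; the staleness shown in the usage lines is the drawback the list above warns about.

```php
<?php
// Hypothetical sketch of copy-on-read caching between a child and its parent.
class PlunderingPlist
{
    private $data = array();
    private $parent;

    public function __construct(?PlunderingPlist $parent = null)
    {
        $this->parent = $parent;
    }

    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function get($name)
    {
        if (array_key_exists($name, $this->data)) {
            return $this->data[$name];
        }
        if ($this->parent === null) {
            return null;
        }
        $value = $this->parent->get($name);
        if ($value !== null) {
            // Copy-on-read: later lookups stop here, at the cost of memory
            // and of never noticing later changes in the parent.
            $this->data[$name] = $value;
        }
        return $value;
    }
}

$parent = new PlunderingPlist();
$parent->put('x', 1);
$child = new PlunderingPlist($parent);
$first = $child->get('x');   // fetched from the parent, then cached
$parent->put('x', 2);
$second = $child->get('x');  // still the cached copy, now stale
```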

Transient properties:
    [Don't need to worry about this]
    - Use a separate plist for transient properties
    - Non-numeric override; numeric should ADD
    - Deletion: removeTransientProperty() and transientlyRemoveProperty()

Persistence:
    - XML/JSON are good
    - Text-based is good for readability, maintainability and bootstrapping
    - Compressed binary format for network transport [not necessary]
    - RDBMS or XML database

Querying: [not relevant]
    - XML database is nice for XPath/XQuery
    - jQuery for JSON
    - Just load it all into a program

Backfills/Data integrity:
    - Use usual methods
    - Lazy backfill is a nice hack

Type systems:
    - Flags: ReadOnly, Permanent, DontEnum
    - Typed properties aren't that useful [It's also Not-PHP]
    - Separate meta-list of directive properties IS useful
    - Duck typing is useful for systems designed fully around properties pattern

Trade-off:
    + Flexibility
    + Extensibility
    + Unit-testing/prototype-speed
    - Performance
    - Data integrity
    - Navigability/Query-ability
    - Reversibility (hard to go back)

HTML Purifier

We are not happy with our current system of defining configuration directives,
because it has become clear that things will get a lot nicer if we allow
multiple namespaces, and there are some features that naturally lend themselves
to inheritance, which we do not really support well.

One of the considered implementation changes would be to go from a structure
like:

    array(
        'Namespace' => array(
            'Directive' => 'val1',
            'Directive2' => 'val2',
        )
    )

to:

    array(
        'Namespace.Directive' => 'val1',
        'Namespace.Directive2' => 'val2',
    )

The latter implementation takes more memory, however, and it makes it a bit
complicated to grab all values from a namespace.

The alternate implementation choice is to allow nested plists. This keeps
iteration easy, but is problematic for inheritance (it would be difficult
to distinguish a plist from an array) and retrieval (when specifying multiple
namespaces we would need some multiple de-referencing).
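
To compare the two layouts side by side, the sketch below converts the nested form into the flat dotted form and then collects one namespace back out of the flat form with a prefix scan. The function names flatten_directives() and get_namespace() are made up for this example, and the directive values are only sample data.

```php
<?php
// Illustrative only: function names and sample values are assumptions.

// Nested layout: a namespace is one array lookup away.
$nested = array(
    'HTML' => array('Doctype' => 'XHTML 1.0 Strict', 'TidyLevel' => 'medium'),
    'URI'  => array('Disable' => false),
);

// Convert the nested layout into the flat 'Namespace.Directive' layout.
function flatten_directives(array $nested)
{
    $flat = array();
    foreach ($nested as $namespace => $directives) {
        foreach ($directives as $name => $value) {
            $flat[$namespace . '.' . $name] = $value;
        }
    }
    return $flat;
}

// Flat layout: collecting a namespace means checking every key's prefix.
function get_namespace(array $flat, $namespace)
{
    $prefix = $namespace . '.';
    $length = strlen($prefix);
    $batch = array();
    foreach ($flat as $key => $value) {
        if (strncmp($key, $prefix, $length) === 0) {
            $batch[substr($key, $length)] = $value;
        }
    }
    return $batch;
}

$flat = flatten_directives($nested);
$html = get_namespace($flat, 'HTML');
```

In the nested form the same result is just $nested['HTML']; the flat form pays for a full pass over the keys each time.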

----

We can take the performance hit, and just do iteration with filter
(the strncmp call should be relatively cheap). Then, users should be able
to optimize by doing something like:

    $config = HTMLPurifier_Config::createDefault();
    if (!file_exists('config.php')) {
        // set up $config
        $config->save('config.php');
    } else {
        $config->load('config.php');
    }

Or maybe memcache, or something. This means that "// set up $config" must
not have any dynamic parts, or the user has to invalidate the cache when
they do update it. We have to think about this a little more carefully; the
file call might be more expensive.
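
The save() and load() calls above are proposed, not existing, methods. One possible shape for them, assuming the configuration is a plain array of scalars and arrays, is to write it out as a PHP file with var_export() and read it back with include. The helper names save_plist() and load_plist() are hypothetical.

```php
<?php
// Hypothetical helpers; neither function is part of HTML Purifier.

// Writes the plist as a PHP file that returns the array when included.
function save_plist(array $plist, $file)
{
    $code = '<?php return ' . var_export($plist, true) . ';';
    return file_put_contents($file, $code, LOCK_EX) !== false;
}

// Reads a file written by save_plist() and returns the stored array.
function load_plist($file)
{
    return include $file;
}
```

This only round-trips scalars and nested arrays; objects would need a different serialization, and the cache invalidation concern in the paragraph above still applies.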

----

This might get expensive, however, when we actually care about iterating
over the configuration and want the actual values. So what about nesting the
lists?

    "ns.sub.directive" => values['ns']['sub']['directive']

We can distinguish between plists and arrays by using ArrayObjects for the
plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
for the arrays, and regular arrays for the plists.
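
Here is a sketch of the first option, with ArrayObject marking plist levels and plain arrays kept as values. The function nested_get() is a hypothetical helper; it splits the dotted key and descends one level per segment, refusing to descend into anything that is not an ArrayObject.

```php
<?php
// Hypothetical helper: resolves 'ns.sub.directive' against nested ArrayObjects.
function nested_get(ArrayObject $root, $key)
{
    $node = $root;
    foreach (explode('.', $key) as $part) {
        // Only ArrayObject levels count as plists; plain arrays are values.
        if (!($node instanceof ArrayObject) || !$node->offsetExists($part)) {
            return null;
        }
        $node = $node[$part];
    }
    return $node;
}

$values = new ArrayObject(array(
    'ns' => new ArrayObject(array(
        'sub' => new ArrayObject(array(
            'directive' => array('a', 'b'), // a plain array is a value
        )),
    )),
));

$directive = nested_get($values, 'ns.sub.directive');
```

Because the value array is not an ArrayObject, a key such as 'ns.sub.directive.0' resolves to null rather than reaching into the value.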

----

Implementation demands, and what has caused them:

1. DefinitionCache, the HTML, CSS and URI namespaces have caches attached to them
   Results:
    - getBatchSerial()
    - getBatch() : in general, the ability to traverse just a namespace

2. AutoFormat/Filter, this is a plugin architecture, directives not hard-coded
    - getBatch()

3. Configuration form
    - Namespaces used to organize directives

Other than that, we have a pure plist. PERHAPS we should maintain separate things
for these different demands.

Issue 2: Directives for configuring the plugins are regular plists, but
when enabling them, while it's "plist-ish", what you're really doing is adding
them to an array of "autoformatters"/"filters" to enable. We can set up
magic BC as well as in the new interface, but there should also be an
add('AutoFormat', 'AutoParagraph'); which does the right thing.

One thing to consider is whether or not inheritance rules will apply to these.
I'd say yes. That means that they're still plisty; in fact, the underlying
implementation will probably be a plist. However, they will get their OWN
plists, and will NOT support nesting.

Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
is pretty expensive. So, I don't think there will be any problems if it
gets "less" efficient, as long as we give users a properly fast alternative;
DefinitionRev gives us a way to do this, by simply telling the user they must
update it whenever they update Configuration directives as well. (There are
obvious BC concerns here.)

In such a case, we simply iterate over our plist (performing full retrievals
for each value), grab the entries we care about, and then serialize and hash.
It's going to be slow either way, due to the ability of plists to inherit.
If we ksort(), we don't have to traverse the entire array; however, the
cost of a ksort() call may not be worth it.
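
The iterate, filter, serialize and hash steps could look roughly like the sketch below, which works on an already-resolved flat array. The name batch_serial() is hypothetical. Note that ksort() here is applied to the filtered batch so the hash does not depend on insertion order, which is a different use from the early-exit idea mentioned above.

```php
<?php
// Hypothetical sketch of hashing one namespace of a flat plist.
function batch_serial(array $plist, $namespace)
{
    $prefix = $namespace . '.';
    $length = strlen($prefix);
    $batch = array();
    foreach ($plist as $key => $value) {
        if (strncmp($key, $prefix, $length) === 0) {
            $batch[substr($key, $length)] = $value;
        }
    }
    // Sort so that the same directives always serialize identically.
    ksort($batch);
    return md5(serialize($batch));
}

$a = array('HTML.Doctype' => 'XHTML 1.0 Strict', 'HTML.TidyLevel' => 'medium', 'URI.Disable' => false);
$b = array('URI.Disable' => true, 'HTML.TidyLevel' => 'medium', 'HTML.Doctype' => 'XHTML 1.0 Strict');
$same = batch_serial($a, 'HTML') === batch_serial($b, 'HTML');
```

Only the HTML entries feed the hash, so the differing URI.Disable values and key order in $a and $b do not affect it.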

At this point, last time, I started worrying about the performance implications
of allowing inheritance, and wondering whether or not I wanted to squash
the plist. At first blush, our code might be under the assumption that
accessing properties is cheap; but actually we prefer to copy out the value
into a member variable if it's going to be used many times. With this in mind,
I don't think CPU consumption from a few nested function calls is going to
be a problem. We *are* going to enforce a function-only interface.

The next issue at hand is how we're going to manage the "special" plists,
which should still be able to be inherited. Basically, it means that multiple
plists would be attached to the configuration object, which is not the
best for memory performance. The alternative is to keep them all in one
big plist, and then eat the one-time cost of traversing the entire plist
to grab the appropriate values.

I think at this point we can write the generic interface, and then set up separate
plists if that ends up being necessary for performance (it probably won't). Now
let's code our generic plist implementation.

----

Iterating over the plist presents some problems. The way we've chosen to solve
this is to squash all of the parents.
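
A minimal sketch of what squashing might mean, assuming it merges the whole parent chain into one array with nearer plists overriding more distant ones. SquashablePlist and squash() are names chosen for this example only.

```php
<?php
// Hypothetical sketch: flatten a plist and all its ancestors for iteration.
class SquashablePlist
{
    private $data = array();
    private $parent;

    public function __construct(?SquashablePlist $parent = null)
    {
        $this->parent = $parent;
    }

    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    /** Returns one array holding every visible property. */
    public function squash()
    {
        $chain = array();
        for ($plist = $this; $plist !== null; $plist = $plist->parent) {
            $chain[] = $plist->data;
        }
        $flat = array();
        // Apply the most distant ancestor first so nearer plists override it.
        foreach (array_reverse($chain) as $data) {
            $flat = array_replace($flat, $data);
        }
        return $flat;
    }
}

$base = new SquashablePlist();
$base->put('a', 1);
$base->put('b', 2);
$child = new SquashablePlist($base);
$child->put('b', 3);
$child->put('c', 4);
$all = $child->squash(); // array('a' => 1, 'b' => 3, 'c' => 4)
```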

----

But I don't need iteration.

    vim: et sw=4 sts=4