|
- THE UNIVERSAL DESIGN PATTERN: PROPERTIES
- Steve Yegge
-
- Implementation:
- get(name)
- put(name, value)
- has(name)
- remove(name)
- iteration, with filtering [this will be our namespaces]
- parent
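
A rough sketch of the interface listed above, to make the moving parts concrete. SketchPlist is a hypothetical name (not an existing HTML Purifier class), and the sketch assumes null is never a valid value, so get() can use it for "not present". Iteration is left out.

```php
<?php
// Hypothetical sketch: SketchPlist is an illustrative name, not an HTML Purifier class.
class SketchPlist
{
    private $data = array(); // local properties only (PHP arrays keep insertion order)
    private $parent;         // another SketchPlist, or null at the root

    public function __construct($parent = null)
    {
        $this->parent = $parent;
    }

    // Reads walk the parent chain iteratively until some plist defines the key.
    public function get($name)
    {
        for ($p = $this; $p !== null; $p = $p->parent) {
            if (array_key_exists($name, $p->data)) {
                return $p->data[$name];
            }
        }
        return null; // assumption: null is never a valid value
    }

    // Writes always land in the local plist, never in a parent.
    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function has($name)
    {
        for ($p = $this; $p !== null; $p = $p->parent) {
            if (array_key_exists($name, $p->data)) {
                return true;
            }
        }
        return false;
    }

    // Removes the local entry only; an inherited value becomes visible again.
    public function remove($name)
    {
        unset($this->data[$name]);
    }
}
```

Note that remove() here cannot hide a value that comes from a parent; that is the deletion problem discussed further down.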
-
- Representations:
- - Keys are strings
- - It's nice to not need to quote keys (if we formulate our own language,
- consider this)
- - Property not present representation (key missing)
- - If properties are frequently removed and re-added, marking them with null
- instead of deleting the key may help. If null is a valid value, use another
- marker value. (PHP semantics are weird here)
-
- Data structures:
- - LinkedHashMap is wonderful (O(1) access and maintains order)
- - Using a special property that points to the parent is usual
- - Multiple inheritance possible, need rules for which to lookup first
- - Iterative inheritance is best
- - Consider performance!
-
- Deletion
- - Tricky problem with inheritance
- - Distinguish between "not found" and "look in my parent for the property"
- [Maybe HTML Purifier won't allow deletion]
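
One way to make the "not found" vs. "look in my parent" distinction, sketched under assumptions: a private sentinel object (a tombstone) is stored locally when a property is removed, so lookups stop there instead of falling through to the parent. TombstonePlist and its methods are hypothetical names.

```php
<?php
// Hypothetical sketch: a removed key is recorded with a sentinel ("tombstone")
// so that it masks any value the parent chain would otherwise supply.
class TombstonePlist
{
    private $data = array();
    private $parent;
    private static $deleted = null; // shared sentinel object, created on demand

    public function __construct($parent = null)
    {
        $this->parent = $parent;
    }

    private static function tombstone()
    {
        if (self::$deleted === null) {
            self::$deleted = new stdClass();
        }
        return self::$deleted;
    }

    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    // "Deleted here" is stored explicitly; a missing key means "ask my parent".
    public function remove($name)
    {
        $this->data[$name] = self::tombstone();
    }

    public function has($name)
    {
        for ($p = $this; $p !== null; $p = $p->parent) {
            if (array_key_exists($name, $p->data)) {
                return $p->data[$name] !== self::tombstone();
            }
        }
        return false;
    }
}
```

The cost is that removed keys keep occupying a slot, which is one more reason to consider not allowing deletion at all.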
-
- Read/write asymmetry (it's correct!)
-
- Read-only plists
- - Allow ability to freeze [this is what we have already]
- - Don't overuse it
-
- Performance:
- - Intern strings (PHP does this already)
- - Don't be case-insensitive
- - If all properties in a plist are known a priori, you can use a "perfect"
- hash function. Often overkill.
- - Copy-on-read caching "plundering" reduces lookup, but uses memory and can
- grow stale. Use as last resort.
- - Refactoring to fields. Watch for API compatibility, system complexity,
- and lack of flexibility.
- - Refrigerator: external data-structure to hold plists
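
To illustrate the copy-on-read ("plundering") item and why it can grow stale, here is a small sketch. PlunderPlist is a hypothetical name, and null is again assumed to mean "not present".

```php
<?php
// Hypothetical sketch of copy-on-read caching: the first read of an inherited
// property copies it into the local plist, so later reads skip the parent.
class PlunderPlist
{
    private $data = array();
    private $parent;

    public function __construct($parent = null)
    {
        $this->parent = $parent;
    }

    public function put($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function get($name)
    {
        if (array_key_exists($name, $this->data)) {
            return $this->data[$name];
        }
        if ($this->parent === null) {
            return null; // assumption: null means "not present"
        }
        $value = $this->parent->get($name);
        if ($value !== null) {
            $this->data[$name] = $value; // the local copy; never refreshed
        }
        return $value;
    }
}
```

After a child has read a property once, a later put() on the parent is no longer seen by that child. That is the staleness the notes warn about, and the copies are where the extra memory goes.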
-
- Transient properties:
- [Don't need to worry about this]
- - Use a separate plist for transient properties
- - Non-numeric override; numeric should ADD
- - Deletion: removeTransientProperty() and transientlyRemoveProperty()
-
- Persistence:
- - XML/JSON are good
- - Text-based is good for readability, maintainability and bootstrapping
- - Compressed binary format for network transport [not necessary]
- - RDBMS or XML database
-
- Querying: [not relevant]
- - XML database is nice for XPath/XQuery
- - jQuery for JSON
- - Just load it all into a program
-
- Backfills/Data integrity:
- - Use usual methods
- - Lazy backfill is a nice hack
-
- Type systems:
- - Flags: ReadOnly, Permanent, DontEnum
- - Typed properties aren't that useful [It's also Not-PHP]
- - Separate meta-list of directive properties IS useful
- - Duck typing is useful for systems designed fully around properties pattern
-
- Trade-off:
- + Flexibility
- + Extensibility
- + Unit-testing/prototype-speed
- - Performance
- - Data integrity
- - Navigability/Query-ability
- - Reversibility (hard to go back)
-
- HTML Purifier
-
- We are not happy with our current system of defining configuration directives,
- because it has become clear that things will get a lot nicer if we allow
- multiple namespaces, and there are some features that naturally lend themselves
- to inheritance, which we do not really support well.
-
- One of the considered implementation changes would be to go from a structure
- like:
-
- array(
-     'Namespace' => array(
-         'Directive' => 'val1',
-         'Directive2' => 'val2',
-     )
- )
-
- to:
-
- array(
-     'Namespace.Directive' => 'val1',
-     'Namespace.Directive2' => 'val2',
- )
-
- The latter (flat) implementation takes more memory, however, and it makes it a bit
- complicated to grab all values from a namespace.
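
For reference, converting from the nested structure to the flat one is a two-level loop. The function name sketch_flatten is made up for illustration; it assumes exactly one level of namespaces, as in the example above.

```php
<?php
// Hypothetical helper: turn array('Namespace' => array('Directive' => ...))
// into array('Namespace.Directive' => ...). Assumes one level of nesting.
function sketch_flatten(array $nested)
{
    $flat = array();
    foreach ($nested as $namespace => $directives) {
        foreach ($directives as $directive => $value) {
            $flat[$namespace . '.' . $directive] = $value;
        }
    }
    return $flat;
}
```

Every flat key repeats its namespace prefix, which is where the extra memory comes from.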
-
- The alternate implementation choice is to allow nested plists. This keeps
- iteration easy, but is problematic for inheritance (it would be difficult
- to distinguish a plist from an array) and retrieval (when specifying multiple
- namespaces we would need some multiple de-referencing).
-
- ----
-
- We can take the performance hit, and just do iteration with a filter
- (the strncmp call should be relatively cheap). Then, users should be able
- to optimize by doing something like:
-
- $config = HTMLPurifier_Config::createDefault();
- if (!file_exists('config.php')) {
-     // set up $config
-     $config->save('config.php');
- } else {
-     $config->load('config.php');
- }
-
- Or maybe memcache, or something. This means that "// set up $config" must
- not have any dynamic parts, or the user has to invalidate the cache when
- they do update it. We have to think about this a little more carefully; the
- file call might be more expensive.
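
The iteration-with-filter idea from the top of this section might look roughly like this on a flat array with dotted keys. sketch_get_batch is a hypothetical name; it is not the real getBatch(), and it ignores inheritance.

```php
<?php
// Hypothetical sketch: collect every directive of one namespace from a flat
// plist by testing the key prefix with strncmp().
function sketch_get_batch(array $flat, $namespace)
{
    $prefix = $namespace . '.';
    $len = strlen($prefix);
    $batch = array();
    foreach ($flat as $key => $value) {
        // strncmp() compares only the first $len bytes of the two strings
        if (strncmp($key, $prefix, $len) === 0) {
            $batch[substr($key, $len)] = $value; // strip the "Namespace." part
        }
    }
    return $batch;
}
```

This is a full pass over the plist for every call, which is the performance hit being accepted here.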
-
- ----
-
- This might get expensive, however, when we actually care about iterating
- over the configuration and want the actual values. So what about nesting the
- lists?
-
- "ns.sub.directive" => values['ns']['sub']['directive']
-
- Could we distinguish between plists and arrays by using ArrayObjects for the
- plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
- for the arrays, and regular arrays for the plists.
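
A sketch of the first variant (ArrayObject for plists, plain arrays for values), showing the multiple de-referencing a dotted key needs. sketch_lookup is a hypothetical name, and returning null for "not found" is an assumption.

```php
<?php
// Hypothetical sketch: resolve "ns.sub.directive" against nested ArrayObjects.
// Only ArrayObject nodes count as plists; a plain array is an ordinary value.
function sketch_lookup(ArrayObject $root, $key)
{
    $node = $root;
    foreach (explode('.', $key) as $part) {
        if (!($node instanceof ArrayObject) || !$node->offsetExists($part)) {
            return null; // assumption: null means "not found"
        }
        $node = $node[$part];
    }
    return $node;
}
```

Because a plain array is never descended into, a directive whose value happens to be an array cannot be mistaken for a nested plist.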
-
- ----
-
- Implementation demands, and what has caused them:
-
- 1. DefinitionCache: the HTML, CSS and URI namespaces have caches attached to them
- Results:
- - getBatchSerial()
- - getBatch() : in general, the ability to traverse just a namespace
-
- 2. AutoFormat/Filter: this is a plugin architecture, so directives are not hard-coded
- - getBatch()
-
- 3. Configuration form
- - Namespaces used to organize directives
-
- Other than that, we have a pure plist. PERHAPS we should maintain separate things
- for these different demands.
-
- Issue 2: Directives for configuring the plugins are regular plists, but
- when enabling them, while it's "plist-ish", what you're really doing is adding
- them to an array of "autoformatters"/"filters" to enable. We can set up
- magic BC as well as in the new interface, but there should also be an
- add('AutoFormat', 'AutoParagraph'); which does the right thing.
-
- One thing to consider is whether or not inheritance rules will apply to these.
- I'd say yes. That means that they're still plisty; in fact, the underlying
- implementation will probably be a plist. However, they will get their OWN
- plists, and will NOT support nesting.
-
- Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
- is pretty expensive. So, I don't think there will be any problems if it
- gets "less" efficient, as long as we give users a properly fast alternative;
- DefinitionRev gives us a way to do this, by simply telling the user they must
- update it whenever they update Configuration directives as well. (There are
- obvious BC concerns here).
-
- In such a case, we simply iterate over our plist (performing full retrievals
- for each value), grab the entries we care about, and then serialize and hash.
- It's going to be slow either way, due to the ability of plists to inherit.
- If we ksort(), we don't have to traverse the entire array; however, the
- cost of a ksort() call may not be worth it.
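
A sketch of the iterate, grab, serialize and hash sequence, including the ksort() early exit. sketch_batch_serial is a hypothetical name (not the real getBatchSerial()), and it assumes inheritance has already been resolved into one flat array with dotted keys.

```php
<?php
// Hypothetical sketch: hash the entries of one namespace in a flat plist.
function sketch_batch_serial(array $flat, $namespace)
{
    $prefix = $namespace . '.';
    $len = strlen($prefix);

    // Sort keys as strings so all "Namespace.*" entries sit next to each other.
    // $flat was passed by value, so the caller's array is left untouched.
    ksort($flat, SORT_STRING);

    $batch = array();
    $inside = false;
    foreach ($flat as $key => $value) {
        if (strncmp($key, $prefix, $len) === 0) {
            $batch[$key] = $value;
            $inside = true;
        } elseif ($inside) {
            break; // past the end of the namespace; skip the rest
        }
    }
    return md5(serialize($batch));
}
```

A side effect of sorting is that the hash no longer depends on the order in which directives were set. Whether that is worth the ksort() cost is the open question above.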
-
- At this point, last time, I started worrying about the performance implications
- of allowing inheritance, and wondering whether or not I wanted to squash
- the plist. At first blush, our code might be under the assumption that
- accessing properties is cheap; but actually we prefer to copy out the value
- into a member variable if it's going to be used many times. With this in mind,
- I don't think CPU consumption from a few nested function calls is going to
- be a problem. We *are* going to enforce a function-only interface.
-
- The next issue at hand is how we're going to manage the "special" plists,
- which should still be able to be inherited. Basically, it means that multiple
- plists would be attached to the configuration object, which is not the
- best for memory performance. The alternative is to keep them all in one
- big plist, and then eat the one-time cost of traversing the entire plist
- to grab the appropriate values.
-
- I think at this point we can write the generic interface, and then set up separate
- plists if that ends up being necessary for performance (it probably won't). Now
- let's code our generic plist implementation.
-
- ----
-
- Iterating over the plist presents some problems. The way we've chosen to solve
- this is to squash all of the parents.
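
Squashing could be sketched like this: walk up the parent chain, then replay the layers from the root down so that children override their parents. SquashPlist and squash() are hypothetical names used only for illustration.

```php
<?php
// Hypothetical sketch: flatten a plist and all of its parents into one array.
class SquashPlist
{
    private $data;
    private $parent;

    public function __construct(array $data = array(), $parent = null)
    {
        $this->data = $data;
        $this->parent = $parent;
    }

    public function squash()
    {
        // Collect layers from this plist up to the root (iteratively).
        $chain = array();
        for ($p = $this; $p !== null; $p = $p->parent) {
            $chain[] = $p->data;
        }
        // Replay from the root down; later (child) layers overwrite earlier ones.
        $flat = array();
        foreach (array_reverse($chain) as $layer) {
            foreach ($layer as $key => $value) {
                $flat[$key] = $value;
            }
        }
        return $flat;
    }
}
```

The result is a plain array that can be iterated without caring about inheritance, at the price of one full copy.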
-
- ----
-
- But I don't need iteration.
-
- vim: et sw=4 sts=4
|