` and ``. Block elements normally spread over several lines and are separated by blank lines. The most basic block element is a paragraph (`<p>`). Inline elements are added inside of block elements, i.e. inside of text.

This markdown parser allows you to extend the markdown language by changing the behavior of existing elements and by adding new block and inline elements. You do this by extending the parser class and adding or overriding class methods and properties. Each element type is extended in a different way, as you will see in the following sections.

### Adding block elements

The markdown is parsed line by line to identify each non-empty line as one of the block element types. To identify a line as the beginning of a block element, the parser calls all protected class methods whose names begin with `identify`. An identify function returns `true` if it has identified the block element it is responsible for and `false` if not.

In the following example we will implement support for [fenced code blocks][], which are part of GitHub flavored markdown.

[fenced code blocks]: https://help.github.com/articles/github-flavored-markdown#fenced-code-blocks "Fenced code block feature of github flavored markdown"

Once a block element has been identified, parsing it takes two steps:

1. "consuming" all the lines belonging to it. This is done by a `consume{blockName}()` method that uses the same name as the identify function. It returns an array representing the block element in the abstract syntax tree together with a line number that tells the parser where to continue:

```php
<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
    protected function identifyFencedCode($line, $lines, $current)
    {
        // a line starting with at least 3 backticks begins a fenced code block
        return strncmp($line, '```', 3) === 0;
    }

    protected function consumeFencedCode($lines, $current)
    {
        // create block array, the first element is the name of the element
        $block = [
            'fencedCode',
            'content' => [],
        ];
        $line = rtrim($lines[$current]);

        // detect language and fence length (can be more than 3 backticks)
        $fence = substr($line, 0, $pos = strrpos($line, '`') + 1);
        $language = substr($line, $pos);
        if (!empty($language)) {
            $block['language'] = $language;
        }

        // consume all lines until ```
        for ($i = $current + 1, $count = count($lines); $i < $count; $i++) {
            if (rtrim($line = $lines[$i]) !== $fence) {
                $block['content'][] = $line;
            } else {
                // stop consuming when code block is over
                break;
            }
        }
        return [$block, $i];
    }

    // ...
}
```

2. "rendering" the element. After all blocks have been consumed, they are rendered using the `render{elementName}()` method, where `elementName` refers to the name of the element in the abstract syntax tree:

```php
protected function renderFencedCode($block)
{
    $class = isset($block['language']) ? ' class="language-' . $block['language'] . '"' : '';
    return "<pre><code$class>" . htmlspecialchars(implode("\n", $block['content']) . "\n", ENT_NOQUOTES, 'UTF-8') . '</code></pre>';
}
```

You may also add code highlighting here. In general it would also be possible to render output in a different language than HTML, for example LaTeX.

### Adding inline elements

Adding inline elements is different from block elements as they are parsed using markers in the text. An inline element is identified by a marker that marks the beginning of an inline element (e.g. `[` will mark a possible beginning of a link or `` ` `` will mark inline code).

Parsing methods for inline elements are also protected and identified by the prefix `parse`. Additionally, a `@marker` annotation in PHPDoc is needed to register the parse function for one or multiple markers. The method will then be called when a marker is found in the text. As an argument it takes the text starting at the position of the marker. The parser method returns an array containing the element of the abstract syntax tree and the offset of the text it has parsed from the input markdown. All text up to this offset is removed from the markdown before the next marker is searched.

As an example, we will add support for the [strikethrough][] feature of GitHub flavored markdown:

[strikethrough]: https://help.github.com/articles/github-flavored-markdown#strikethrough "Strikethrough feature of github flavored markdown"

```php
<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
    /**
     * @marker ~~
     */
    protected function parseStrike($markdown)
    {
        // check whether the marker really represents a strikethrough (i.e. there is a closing ~~)
        if (preg_match('/^~~(.+?)~~/', $markdown, $matches)) {
            return [
                // return the parsed tag as an element of the abstract syntax tree and
                // call `parseInline()` to allow other inline elements inside this tag
                ['strike', $this->parseInline($matches[1])],
                // return the offset of the parsed text
                strlen($matches[0])
            ];
        }
        // in case we did not find a closing ~~ we just return the marker and skip 2 characters
        return [['text', '~~'], 2];
    }

    // rendering is the same as for block elements, we turn the abstract syntax array into a string.
    protected function renderStrike($element)
    {
        return '<del>' . $this->renderAbsy($element[1]) . '</del>';
    }
}
```

### Composing your own Markdown flavor

This markdown library is composed of traits, so it is very easy to create your own markdown flavor by adding and/or removing single feature traits.

Designing your Markdown flavor consists of four steps:

1. Select a base class
2. Select language feature traits
3. Define escapeable characters
4. Optionally add custom rendering behavior

#### Select a base class

If you want to extend a flavor and only add features, you can use one of the existing classes (`Markdown`, `GithubMarkdown` or `MarkdownExtra`) as your flavor's base class.

If you want to define a subset of the markdown language, i.e. remove some of the features, you have to extend your class from `Parser`.

#### Select language feature traits

The following shows the trait selection for traditional Markdown.

```php
class MyMarkdown extends Parser
{
    // include block element parsing using traits
    use block\CodeTrait;
    use block\HeadlineTrait;
    use block\HtmlTrait {
        parseInlineHtml as private;
    }
    use block\ListTrait {
        // Check Ul List before headline
        identifyUl as protected identifyBUl;
        consumeUl as protected consumeBUl;
    }
    use block\QuoteTrait;
    use block\RuleTrait {
        // Check Hr before checking lists
        identifyHr as protected identifyAHr;
        consumeHr as protected consumeAHr;
    }

    // include inline element parsing using traits
    use inline\CodeTrait;
    use inline\EmphStrongTrait;
    use inline\LinkTrait;

    /**
     * @var boolean whether to format markup according to HTML5 spec.
     * Defaults to `false` which means that markup is formatted as HTML4.
     */
    public $html5 = false;

    protected function prepare()
    {
        // reset references
        $this->references = [];
    }

    // ...
}
```

In general, just adding the trait with `use` is enough. In some cases, however, some fine tuning is needed to get the expected parsing results. Elements are detected in alphabetical order of their identification function. This means that if a line starting with `-` could be a list or a horizontal rule, the preference has to be set by renaming the identification function. This is what is done by renaming `identifyHr` to `identifyAHr` and `identifyUl` to `identifyBUl`. The consume function always has to have the same name as the identification function, so it has to be renamed too.

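
The effect of the alphabetical ordering can be illustrated with a small self-contained sketch. It does not use the library: `TinyDetector`, `ListFirst`, `RuleFirst`, `isRule()` and `isListItem()` are hypothetical names, and the matching rules are deliberately simplified assumptions. It only shows how the name of an identify method can decide which element wins for an ambiguous line such as `- - -`.

```php
<?php

// Hypothetical stand-in for the parser's block detection, not the library's code.
abstract class TinyDetector
{
    // Tries every identify* method in alphabetical order and returns
    // the name of the first one that matches, or null if none does.
    public function detect(string $line): ?string
    {
        $methods = [];
        foreach ((new ReflectionClass($this))->getMethods() as $method) {
            if (strncmp($method->getName(), 'identify', 8) === 0) {
                $methods[] = $method->getName();
            }
        }
        sort($methods); // alphabetical order decides the priority

        foreach ($methods as $name) {
            if ($this->$name($line)) {
                return $name;
            }
        }
        return null;
    }

    // Simplified assumption: a rule is three or more dashes, optionally space separated.
    protected function isRule(string $line): bool
    {
        return preg_match('/^(- ?){3,}$/', rtrim($line)) === 1;
    }

    // Simplified assumption: a list item starts with a dash followed by a space.
    protected function isListItem(string $line): bool
    {
        return strncmp($line, '- ', 2) === 0;
    }
}

// "identifyBUl" sorts before "identifyHr", so the list check runs first.
class ListFirst extends TinyDetector
{
    protected function identifyBUl(string $line): bool { return $this->isListItem($line); }
    protected function identifyHr(string $line): bool { return $this->isRule($line); }
}

// Renaming the rule check to "identifyAHr" moves it in front of the list check.
class RuleFirst extends TinyDetector
{
    protected function identifyAHr(string $line): bool { return $this->isRule($line); }
    protected function identifyBUl(string $line): bool { return $this->isListItem($line); }
}

echo (new ListFirst())->detect('- - -'), "\n"; // prints "identifyBUl"
echo (new RuleFirst())->detect('- - -'), "\n"; // prints "identifyAHr"
```

In this sketch a plain list item such as `- item` is still detected by `identifyBUl` in both classes, because the rule check rejects it. Only the ambiguous line is affected by the renaming.
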
There is also a conflict for parsing of the `<` character. This could either be a link/email enclosed in `<` and `>` or an inline HTML tag. In order to resolve this conflict when adding the `LinkTrait`, we need to hide the `parseInlineHtml` method of the `HtmlTrait`.

If you use any trait that uses the `$html5` property to adjust its output, you also need to define this property.

If you use the link trait, it may be useful to implement `prepare()` as shown above to reset references before parsing, to ensure you get a reusable object.

#### Define escapeable characters

Depending on the language features you have chosen, there is a different set of characters that can be escaped using `\`. The following is the set of escapeable characters for traditional markdown; you can copy it to your class as is.

```php
/**
 * @var array these are "escapeable" characters. When using one of these prefixed with a
 * backslash, the character will be outputted without the backslash and is not interpreted
 * as markdown.
 */
protected $escapeCharacters = [
    '\\', // backslash
    '`', // backtick
    '*', // asterisk
    '_', // underscore
    '{', '}', // curly braces
    '[', ']', // square brackets
    '(', ')', // parentheses
    '#', // hash mark
    '+', // plus sign
    '-', // minus sign (hyphen)
    '.', // dot
    '!', // exclamation mark
    '<', '>',
];
```

#### Add custom rendering behavior

Optionally you may also want to adjust rendering behavior by overriding some methods. For inspiration, refer to the `consumeParagraph()` method of the `Markdown` and `GithubMarkdown` classes, which define different rules for which elements are allowed to interrupt a paragraph.

Acknowledgements
----------------

I'd like to thank [@erusev][] for creating [Parsedown][], which heavily influenced this work and provided the idea of the line based parsing approach.

[@erusev]: https://github.com/erusev "Emanuil Rusev"
[Parsedown]: http://parsedown.org/ "The Parsedown PHP Markdown parser"

FAQ
---

### Why another markdown parser?

While reviewing PHP markdown parsers to choose one to bundle with the [Yii framework 2.0][], I found that most of the implementations use regex to replace patterns instead of doing real parsing. This makes extending them with new language elements quite hard, as you have to come up with a complex regex that matches your addition but does not interfere with other elements. Such additions are very common, as you can see on GitHub, which supports referencing issues, users and commits in comments.

A [real parser][] should use context aware methods that walk through the text and parse the tokens as they find them. The only implementation I have found that uses this approach is [Parsedown][], which also shows that this approach is [much faster][benchmark] than the regex way. Parsedown, however, focuses on speed and implements its own flavor (mainly GitHub flavored markdown) in one class, and at the time of this writing it was not easily extensible.

Given this situation, I decided to start my own implementation, using the parsing approach from Parsedown and making it extensible by creating a class for each markdown flavor. These classes extend each other in the same way that the markdown languages extend each other. This allows you to choose between markdown language flavors and also provides a way to compose your own flavor, picking the best things from all of them. I chose this approach because it is easier to implement and more intuitive than using callbacks to inject functionality into the parser.

[real parser]: http://en.wikipedia.org/wiki/Parsing#Types_of_parser
[Parsedown]: http://parsedown.org/ "The Parsedown PHP Markdown parser"

### Where do I report bugs or rendering issues?

Just [open an issue][] on GitHub, post your markdown code and describe the problem. You may also attach screenshots of the rendered HTML result to describe your problem.

[open an issue]: https://github.com/cebe/markdown/issues/new

### How can I contribute to this library?

Check the [CONTRIBUTING.md](CONTRIBUTING.md) file for more info.

### Am I free to use this?

This library is open source and licensed under the [MIT License][]. This means that you can do whatever you want with it as long as you mention my name and include the [license file][license]. Check the [license][] for details.

[MIT License]: http://opensource.org/licenses/MIT
[license]: https://github.com/cebe/markdown/blob/master/LICENSE

Contact
-------

Feel free to contact me using [email](mailto:mail@cebe.cc) or [twitter](https://twitter.com/cebe_cc).