A super fast, highly extensible markdown parser for PHP
=======================================================

[![Latest Stable Version](https://poser.pugx.org/cebe/markdown/v/stable.png)](https://packagist.org/packages/cebe/markdown)
[![Total Downloads](https://poser.pugx.org/cebe/markdown/downloads.png)](https://packagist.org/packages/cebe/markdown)
[![Build Status](https://secure.travis-ci.org/cebe/markdown.png)](http://travis-ci.org/cebe/markdown)
[![Tested against HHVM](http://hhvm.h4cc.de/badge/cebe/markdown.png)](http://hhvm.h4cc.de/package/cebe/markdown)
[![Code Coverage](https://scrutinizer-ci.com/g/cebe/markdown/badges/coverage.png?s=db6af342d55bea649307ef311fbd536abb9bab76)](https://scrutinizer-ci.com/g/cebe/markdown/)
[![Scrutinizer Quality Score](https://scrutinizer-ci.com/g/cebe/markdown/badges/quality-score.png?s=17448ca4d140429fd687c58ff747baeb6568d528)](https://scrutinizer-ci.com/g/cebe/markdown/)


What is this? <a name="what"></a>
-------------

A set of [PHP][] classes, each representing a [Markdown][] flavor, and a command line tool
for converting markdown files to HTML files.

The implementation focuses on being **fast** (see [benchmark][]) and **extensible**.
Parsing Markdown to HTML is as simple as calling a single method (see [Usage](#usage)), backed by a solid implementation
that gives the expected results in most cases, including non-trivial edge cases.

Extending the Markdown language with new elements is as simple as adding a new method to the class that converts the
markdown text to the expected output in HTML. This is possible without dealing with complex and error-prone regular expressions.
It is also possible to hook into the markdown structure and add elements or read meta information using the internal representation
of the Markdown text as an abstract syntax tree (see [Extending the language](#extend)).

Currently the following markdown flavors are supported:

- **Traditional Markdown** according to <http://daringfireball.net/projects/markdown/syntax> ([try it!](http://markdown.cebe.cc/try?flavor=default)).
- **Github flavored Markdown** according to <https://help.github.com/articles/github-flavored-markdown> ([try it!](http://markdown.cebe.cc/try?flavor=gfm)).
- **Markdown Extra** according to <http://michelf.ca/projects/php-markdown/extra/> (work in progress and not yet fully supported, see [#25][]; [try it!](http://markdown.cebe.cc/try?flavor=extra)).
- Any mixed Markdown flavor you like, thanks to the highly extensible structure (see documentation below).

[#25]: https://github.com/cebe/markdown/issues/25 "issue #25"

Future plans are to support:

- Smarty Pants <http://daringfireball.net/projects/smartypants/>
- ... (Feel free to [suggest](https://github.com/cebe/markdown/issues/new) further additions!)


### Who is using it?

- It powers the [API-docs and the definitive guide](http://www.yiiframework.com/doc-2.0/) for the [Yii Framework][] [2.0](https://github.com/yiisoft/yii2).

[Yii Framework]: http://www.yiiframework.com/ "The Yii PHP Framework"


Installation <a name="installation"></a>
------------

[PHP 5.4 or higher](http://www.php.net/downloads.php) is required to use it.
It will also run on Facebook's [HHVM](http://hhvm.com/).

The recommended way to install it is via [composer][] by running:

    composer require cebe/markdown "~1.0.1"

Alternatively you can add the following to the `require` section in your `composer.json` manually:

```json
"cebe/markdown": "~1.0.1"
```

Run `composer update` afterwards.


Usage <a name="usage"></a>
-----

### In your PHP project

To parse your markdown you need only two lines of code. The first one chooses the markdown flavor by
creating one of the following parsers:

- Traditional Markdown: `$parser = new \cebe\markdown\Markdown();`
- Github Flavored Markdown: `$parser = new \cebe\markdown\GithubMarkdown();`
- Markdown Extra: `$parser = new \cebe\markdown\MarkdownExtra();`

The second one calls either the `parse()` method, which parses the text using the full markdown language,
or the `parseParagraph()` method, which parses only inline elements.

Here are some examples:

```php
// traditional markdown and parse full text
$parser = new \cebe\markdown\Markdown();
$parser->parse($markdown);

// use github markdown
$parser = new \cebe\markdown\GithubMarkdown();
$parser->parse($markdown);

// use markdown extra
$parser = new \cebe\markdown\MarkdownExtra();
$parser->parse($markdown);

// parse only inline elements (useful for one-line descriptions)
$parser = new \cebe\markdown\GithubMarkdown();
$parser->parseParagraph($markdown);
```

You may optionally set one of the following options on the parser object:

For all Markdown Flavors:

- `$parser->html5 = true` to enable HTML5 output instead of HTML4.
- `$parser->keepListStartNumber = true` to enable keeping the numbers of ordered lists as specified in the markdown.
  The default behavior is to always start from 1 and increment by one regardless of the number in markdown.

For GithubMarkdown:

- `$parser->enableNewlines = true` to convert all newlines to `<br/>`-tags. By default only newlines with two preceding spaces are converted to `<br/>`-tags.

It is recommended to use UTF-8 encoding for the input strings. Other encodings are currently not tested.


### The command line script

You can use it to render this readme:

    bin/markdown README.md > README.html

Using github flavored markdown:

    bin/markdown --flavor=gfm README.md > README.html

or convert the original markdown description to html using the unix pipe:

    curl http://daringfireball.net/projects/markdown/syntax.text | bin/markdown > md.html

Here is the full Help output you will see when running `bin/markdown --help`:

    PHP Markdown to HTML converter
    ------------------------------

    by Carsten Brandt <mail@cebe.cc>

    Usage:
        bin/markdown [--flavor=<flavor>] [--full] [file.md]

        --flavor  specifies the markdown flavor to use. If omitted the original markdown by John Gruber [1] will be used.
                  Available flavors:

                  gfm   - Github flavored markdown [2]
                  extra - Markdown Extra [3]

        --full    ouput a full HTML page with head and body. If not given, only the parsed markdown will be output.

        --help    shows this usage information.

        If no file is specified input will be read from STDIN.

    Examples:

        Render a file with original markdown:

            bin/markdown README.md > README.html

        Render a file using gihtub flavored markdown:

            bin/markdown --flavor=gfm README.md > README.html

        Convert the original markdown description to html using STDIN:

            curl http://daringfireball.net/projects/markdown/syntax.text | bin/markdown > md.html

    [1] http://daringfireball.net/projects/markdown/syntax
    [2] https://help.github.com/articles/github-flavored-markdown
    [3] http://michelf.ca/projects/php-markdown/extra/


Extensions
----------

Here are some extensions to this library:

- [Bogardo/markdown-codepen](https://github.com/Bogardo/markdown-codepen) - shortcode to embed codepens from http://codepen.io/ in markdown.
- [kartik-v/yii2-markdown](https://github.com/kartik-v/yii2-markdown) - Advanced Markdown editing and conversion utilities for Yii Framework 2.0.
- [cebe/markdown-latex](https://github.com/cebe/markdown-latex) - Convert Markdown to LaTeX and PDF
- ... [add yours!](https://github.com/cebe/markdown/edit/master/README.md#L98)


Extending the language <a name="extend"></a>
----------------------

Markdown consists of two types of language elements. I'll call them block and inline elements, similar to what you have in
HTML with `<div>` and `<span>`. Block elements normally spread over several lines and are separated by blank lines.
The most basic block element is a paragraph (`<p>`).
Inline elements are elements that are added inside of block elements, i.e. inside of text.

This markdown parser allows you to extend the markdown language by changing the behavior of existing elements and by adding
new block and inline elements. You do this by extending the parser class and adding or overriding class methods and
properties. The different element types are extended in different ways, as you will see in the following sections.


### Adding block elements

The markdown is parsed line by line to identify each non-empty line as one of the block element types.
To identify a line as the beginning of a block element, the parser calls all protected class methods whose names begin with `identify`.
An identify function returns true if it has identified the block element it is responsible for or false if not.
In the following example we will implement support for [fenced code blocks][] which are part of the github flavored markdown.

[fenced code blocks]: https://help.github.com/articles/github-flavored-markdown#fenced-code-blocks
                      "Fenced code block feature of github flavored markdown"

```php
<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
    protected function identifyLine($line, $lines, $current)
    {
        // if a line starts with at least 3 backticks it is identified as a fenced code block
        if (strncmp($line, '```', 3) === 0) {
            return 'fencedCode';
        }
        // pass on the same three arguments this method received
        return parent::identifyLine($line, $lines, $current);
    }

    // ...
}
```

In the above, `$line` is a string containing the content of the current line and is equal to `$lines[$current]`.
You may use `$lines` and `$current` to check lines other than the current one. In most cases you can ignore these parameters.

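As a hedged illustration of when `$lines` and `$current` become useful, the following standalone sketch identifies a line by peeking at the line after it. It does not use the library: the function name `identifyLineSketch()` and the element name `underlinedTitle` are made up for this example. Inside a parser subclass the same check would live in the identify method shown above.

```php
<?php

/**
 * Standalone sketch (not part of the library): identify a block by
 * looking at the line that follows the current one.
 *
 * @param string   $line    content of the current line, same as $lines[$current]
 * @param string[] $lines   all lines of the document
 * @param int      $current index of the current line
 * @return string name of the identified block element
 */
function identifyLineSketch($line, $lines, $current)
{
    // a non-empty line followed by a line made only of "=" is treated as a title
    if (trim($line) !== ''
        && isset($lines[$current + 1])
        && preg_match('/^=+\s*$/', $lines[$current + 1])
    ) {
        return 'underlinedTitle';
    }
    // everything else falls back to a plain paragraph in this sketch
    return 'paragraph';
}

$lines = ['My title', '========', 'Some text.'];
echo identifyLineSketch($lines[0], $lines, 0), "\n"; // underlinedTitle
echo identifyLineSketch($lines[2], $lines, 2), "\n"; // paragraph
```

The `isset()` guard matters for the last line of a document, where there is no following line to inspect.
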
Parsing of a block element is done in two steps:

1. "consuming" all the lines belonging to it. In most cases this is iterating over the lines starting from the identified
   line until a blank line occurs. This step is implemented by a method named `consume{blockName}()` where `{blockName}`
   is the same name as used for the identify function above. The consume method also takes the lines array
   and the number of the current line. It returns two values: an array representing the block element in the abstract syntax tree
   of the markdown document and the line number to parse next. In the abstract syntax array the first element refers to the name of
   the element; all other array elements can be freely defined by yourself.

   In our example we will implement it like this:

   ```php
   protected function consumeFencedCode($lines, $current)
   {
       // create block array
       $block = [
           'fencedCode',
           'content' => [],
       ];
       $line = rtrim($lines[$current]);

       // detect language and fence length (can be more than 3 backticks)
       $fence = substr($line, 0, $pos = strrpos($line, '`') + 1);
       $language = substr($line, $pos);
       if (!empty($language)) {
           $block['language'] = $language;
       }

       // consume all lines until ```
       for ($i = $current + 1, $count = count($lines); $i < $count; $i++) {
           if (rtrim($line = $lines[$i]) !== $fence) {
               $block['content'][] = $line;
           } else {
               // stop consuming when code block is over
               break;
           }
       }
       return [$block, $i];
   }
   ```

2. "rendering" the element. After all blocks have been consumed, they are rendered using the
   `render{elementName}()`-method where `elementName` refers to the name of the element in the abstract syntax tree:

   ```php
   protected function renderFencedCode($block)
   {
       $class = isset($block['language']) ? ' class="language-' . $block['language'] . '"' : '';
       return "<pre><code$class>" . htmlspecialchars(implode("\n", $block['content']) . "\n", ENT_NOQUOTES, 'UTF-8') . '</code></pre>';
   }
   ```

   You may also add code highlighting here. In general it would also be possible to render output in a language other than
   HTML, for example LaTeX.

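To show how the identify, consume and render steps fit together, here is a minimal standalone sketch of a line-based block parser. It is not the library's code and does not depend on it: the class name `TinyBlockParser` and its dispatch loop are assumptions made for illustration, and it only knows paragraphs and three-backtick fences. In this sketch a consume method returns the index of the last line it used, and the loop increment then moves on to the following line.

```php
<?php

/**
 * Standalone sketch of the identify / consume / render flow.
 * Class and method names are illustrative, not the library's API.
 */
class TinyBlockParser
{
    public function parse($text)
    {
        $lines = explode("\n", $text);
        $blocks = [];
        for ($i = 0, $count = count($lines); $i < $count; $i++) {
            if (trim($lines[$i]) === '') {
                continue; // blank lines only separate blocks
            }
            // identify the block type, then let the matching consume method collect its lines
            $name = $this->identifyLine($lines[$i], $lines, $i);
            list($block, $i) = $this->{'consume' . ucfirst($name)}($lines, $i);
            $blocks[] = $block;
        }
        // render each block using the name stored in its first array element
        $html = '';
        foreach ($blocks as $block) {
            $html .= $this->{'render' . ucfirst($block[0])}($block) . "\n";
        }
        return $html;
    }

    protected function identifyLine($line, $lines, $current)
    {
        return strncmp($line, '```', 3) === 0 ? 'fencedCode' : 'paragraph';
    }

    protected function consumeParagraph($lines, $current)
    {
        $block = ['paragraph', 'content' => []];
        // collect lines until a blank line or the end of the input
        for ($i = $current, $count = count($lines); $i < $count; $i++) {
            if (trim($lines[$i]) === '') {
                break;
            }
            $block['content'][] = $lines[$i];
        }
        return [$block, $i - 1];
    }

    protected function consumeFencedCode($lines, $current)
    {
        $block = ['fencedCode', 'content' => []];
        // collect lines until the closing fence or the end of the input
        for ($i = $current + 1, $count = count($lines); $i < $count; $i++) {
            if (rtrim($lines[$i]) === '```') {
                break;
            }
            $block['content'][] = $lines[$i];
        }
        return [$block, $i];
    }

    protected function renderParagraph($block)
    {
        return '<p>' . htmlspecialchars(implode("\n", $block['content']), ENT_NOQUOTES, 'UTF-8') . '</p>';
    }

    protected function renderFencedCode($block)
    {
        return '<pre><code>' . htmlspecialchars(implode("\n", $block['content']) . "\n", ENT_NOQUOTES, 'UTF-8') . '</code></pre>';
    }
}

$parser = new TinyBlockParser();
echo $parser->parse("Hello\n\n```\n<b>code</b>\n```");
```

Dispatching by method name is what makes this style easy to extend: supporting a new block type means adding one consume method and one render method, plus a check in the identify step.
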

### Adding inline elements

Adding inline elements is different from block elements as they are parsed using markers in the text.
An inline element is identified by a marker that marks the beginning of an inline element (e.g. `[` will mark a possible
beginning of a link or `` ` `` will mark inline code).

Parsing methods for inline elements are also protected and identified by the prefix `parse`. Additionally a `@marker` annotation
in PHPDoc is needed to register the parse function for one or multiple markers.
The method will then be called when a marker is found in the text. As an argument it takes the text starting at the position of the marker.
The parser method returns an array containing the element of the abstract syntax tree and an offset of the text it has
parsed from the input markdown. All text up to this offset is removed from the markdown before the next marker is searched for.

As an example, we will add support for the [strikethrough][] feature of github flavored markdown:

[strikethrough]: https://help.github.com/articles/github-flavored-markdown#strikethrough "Strikethrough feature of github flavored markdown"

```php
<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
    /**
     * @marker ~~
     */
    protected function parseStrike($markdown)
    {
        // check whether the marker really represents a strikethrough (i.e. there is a closing ~~)
        if (preg_match('/^~~(.+?)~~/', $markdown, $matches)) {
            return [
                // return the parsed tag as an element of the abstract syntax tree and call `parseInline()` to allow
                // other inline markdown elements inside this tag
                ['strike', $this->parseInline($matches[1])],
                // return the offset of the parsed text
                strlen($matches[0])
            ];
        }
        // in case we did not find a closing ~~ we just return the marker and skip 2 characters
        return [['text', '~~'], 2];
    }

    // rendering is the same as for block elements, we turn the abstract syntax array into a string.
    protected function renderStrike($element)
    {
        return '<del>' . $this->renderAbsy($element[1]) . '</del>';
    }
}
```

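The marker and offset mechanics can be sketched without the library. The two functions below are hypothetical helpers written for this illustration (`parseStrikeSketch()` and `parseInlineSketch()` are not library functions): the first follows the `[element, offset]` return convention described above, the second shows how a caller might use the offset to cut the parsed text away before searching for the next marker.

```php
<?php

/**
 * Standalone sketch (not the library's code): receives the text starting
 * at the marker and returns [element, offset].
 */
function parseStrikeSketch($text)
{
    if (preg_match('/^~~(.+?)~~/', $text, $matches)) {
        return [['strike', $matches[1]], strlen($matches[0])];
    }
    // no closing marker: emit the marker as plain text and skip its 2 characters
    return [['text', '~~'], 2];
}

/**
 * Scans the text for the "~~" marker and builds a flat list of elements.
 */
function parseInlineSketch($text)
{
    $marker = '~~';
    $elements = [];
    while (($pos = strpos($text, $marker)) !== false) {
        if ($pos > 0) {
            // text before the marker is kept as a plain text element
            $elements[] = ['text', substr($text, 0, $pos)];
        }
        list($element, $offset) = parseStrikeSketch(substr($text, $pos));
        $elements[] = $element;
        // drop everything up to the offset before searching for the next marker
        $text = (string) substr($text, $pos + $offset);
    }
    if ($text !== '') {
        $elements[] = ['text', $text];
    }
    return $elements;
}

print_r(parseInlineSketch('This is ~~wrong~~ right.'));
```

Returning the unmatched marker as a text element with an offset of 2 is what keeps the scan moving forward. Without it, an unclosed `~~` would be found again on every iteration.
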

### Composing your own Markdown flavor

TBD


Acknowledgements <a name="ack"></a>
----------------

I'd like to thank [@erusev][] for creating [Parsedown][] which heavily influenced this work and provided
the idea of the line based parsing approach.

[@erusev]: https://github.com/erusev "Emanuil Rusev"


FAQ <a name="faq"></a>
---

### Why another markdown parser?

While reviewing PHP markdown parsers to choose one to bundle with the [Yii framework 2.0][],
I found that most of the implementations use regex to replace patterns instead
of doing real parsing. This makes extending them with new language elements quite hard,
as you have to come up with a complex regex that matches your addition but does not mess
with other elements. Such additions are very common, as you can see on github, which supports referencing
issues, users and commits in the comments.
A [real parser][] should use context-aware methods that walk through the text and
parse the tokens as they find them. The only implementation I have found that uses
this approach is [Parsedown][], which also shows that this approach is [much faster][benchmark]
than the regex way. Parsedown, however, is an implementation that focuses on speed and implements
its own flavor (mainly github flavored markdown) in one class and at the time of this writing was
not easily extensible.

Given the situation above, I decided to start my own implementation using the parsing approach
from Parsedown and to make it extensible by creating a class for each markdown flavor. These classes extend each
other in the same way that the markdown languages extend each other.
This allows you to choose between markdown language flavors and also provides a way to compose your
own flavor, picking the best things from all of them.
I chose this approach as it is easier to implement and more intuitive than
using callbacks to inject functionality into the parser.


### Where do I report bugs or rendering issues?

Just [open an issue][] on github, post your markdown code and describe the problem. You may also attach screenshots of the rendered HTML result to describe your problem.

### How can I contribute to this library?

Check the [CONTRIBUTING.md](CONTRIBUTING.md) file for more info.

### Am I free to use this?

This library is open source and licensed under the [MIT License][]. This means that you can do whatever you want
with it as long as you mention my name and include the [license file][license]. Check the [license][] for details.

[MIT License]: http://opensource.org/licenses/MIT


Contact
-------

Feel free to contact me using [email](mailto:mail@cebe.cc) or [twitter](https://twitter.com/cebe_cc).

[PHP]: http://php.net/ "PHP is a popular general-purpose scripting language that is especially suited to web development."
[Markdown]: http://en.wikipedia.org/wiki/Markdown "Markdown on Wikipedia"
[composer]: https://getcomposer.org/ "The PHP package manager"
[Parsedown]: http://parsedown.org/ "The Parsedown PHP Markdown parser"
[benchmark]: https://github.com/kzykhys/Markbench#readme "kzykhys/Markbench on github"
[Yii framework 2.0]: https://github.com/yiisoft/yii2
[real parser]: http://en.wikipedia.org/wiki/Parsing#Types_of_parser
[open an issue]: https://github.com/cebe/markdown/issues/new
[license]: https://github.com/cebe/markdown/blob/master/LICENSE