Pandoc User’s Guide
John MacFarlane
January 27, 2012
Synopsis
pandoc [options] [input-file]…
Description
Pandoc is a Haskell library for converting from one markup format to
another, and a command-line tool that uses this library. It can read
markdown and (subsets of) Textile, reStructuredText, HTML, LaTeX, and
DocBook XML; and it can write plain text, markdown, reStructuredText,
XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF,
DocBook XML, OpenDocument XML, ODT, Word docx, GNU Texinfo, MediaWiki
markup, EPUB, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, and
Slidy, DZSlides, or S5 HTML slide shows. It can also produce PDF
output on systems where LaTeX is installed.
Pandoc’s enhanced version of markdown includes syntax for footnotes,
tables, flexible ordered lists, definition lists, delimited code
blocks, superscript, subscript, strikeout, title blocks, automatic
tables of contents, embedded LaTeX math, citations, and markdown
inside HTML block elements. (These enhancements, described below
under Pandoc’s markdown, can
be disabled using the --strict option.)
In contrast to most existing tools for converting markdown to HTML,
which use regex substitutions, Pandoc has a modular design: it
consists of a set of readers, which parse text in a given format and
produce a native representation of the document, and a set of
writers, which convert this native representation into a target
format. Thus, adding an input or output format requires only adding
a reader or writer.
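The reader/writer design can be sketched in a few lines. The following Python toy is not pandoc's Haskell implementation: the READERS and WRITERS tables, the list-of-pairs document representation, and the convert function are hypothetical names chosen for illustration, and no escaping of special characters is attempted. It only shows why supporting a new format amounts to registering one more reader or writer.

```python
# Toy model of a reader/writer pipeline (illustrative only, not pandoc's code).
# A "document" is a list of (kind, text) pairs, standing in for pandoc's
# native representation.

def read_toy(text):
    """Hypothetical reader: '# ' starts a header, other lines are paragraphs."""
    doc = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        if line.startswith("# "):
            doc.append(("header", line[2:].strip()))
        else:
            doc.append(("para", line.strip()))
    return doc

def write_html(doc):
    """Hypothetical writer: render each block as an HTML element."""
    tags = {"header": "h1", "para": "p"}
    return "\n".join("<%s>%s</%s>" % (tags[k], t, tags[k]) for k, t in doc)

def write_latex(doc):
    """Hypothetical writer: headers become sections, paragraphs stay as text."""
    out = []
    for kind, text in doc:
        out.append("\\section{%s}" % text if kind == "header" else text)
    return "\n\n".join(out)

# Adding a format means adding one entry to one of these tables.
READERS = {"toy": read_toy}
WRITERS = {"html": write_html, "latex": write_latex}

def convert(text, source, target):
    """Parse with the chosen reader, then render with the chosen writer."""
    return WRITERS[target](READERS[source](text))
```

With one reader and two writers registered, convert("# Hi\n\nHello", "toy", "latex") goes through the same intermediate document as the HTML conversion; only the last step differs.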
Using pandoc
If no input-file is specified, input is read
from stdin. Otherwise, the
input-files are concatenated (with a blank
line between each) and used as input. Output goes to
stdout by default (though output to
stdout is disabled for the
odt, docx, and
epub output formats). For output to a file, use
the -o option:
pandoc -o output.html input.txt
Instead of a file, an absolute URI may be given. In this case
pandoc will fetch the content using HTTP:
pandoc -f html -t markdown http://www.fsf.org
If multiple input files are given, pandoc will
concatenate them all (with blank lines between them) before
parsing.
The format of the input and output can be specified explicitly
using command-line options. The input format can be specified
using the -r/--read or
-f/--from options, the output format using the
-w/--write or -t/--to
options. Thus, to convert hello.txt from
markdown to LaTeX, you could type:
pandoc -f markdown -t latex hello.txt
To convert hello.html from html to markdown:
pandoc -f html -t markdown hello.html
Supported output formats are listed below under the
-t/--to option. Supported input formats are
listed below under the -f/--from option. Note
that the rst, textile,
latex, and html readers are
not complete; there are some constructs that they do not parse.
If the input or output format is not specified explicitly,
pandoc will attempt to guess it from the
extensions of the input and output filenames. Thus, for example,
pandoc -o hello.tex hello.txt
will convert hello.txt from markdown to LaTeX.
If no output file is specified (so that output goes to
stdout), or if the output file’s extension is
unknown, the output format will default to HTML. If no input file
is specified (so that input comes from
stdin), or if the input files’ extensions are
unknown, the input format will be assumed to be markdown unless
explicitly specified.
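The guessing rules above can be summarized as a lookup with a fallback. The sketch below is an assumption-laden illustration in Python: the two extension tables are hypothetical and far smaller than whatever pandoc uses internally; only the fallbacks (markdown for input, HTML for output) and the .tex example come from the text above.

```python
import os

# Hypothetical extension tables, for illustration only.
INPUT_FORMATS = {".html": "html", ".tex": "latex"}
OUTPUT_FORMATS = {".html": "html", ".tex": "latex"}

def guess_format(filename, table, default):
    """Look up the file extension; fall back to the default when the
    filename is missing (stdin/stdout) or the extension is unknown."""
    if filename is None:
        return default
    ext = os.path.splitext(filename)[1]
    return table.get(ext, default)

def guess_input(filename=None):
    # Unknown or missing input names are treated as markdown.
    return guess_format(filename, INPUT_FORMATS, "markdown")

def guess_output(filename=None):
    # Unknown or missing output names are treated as HTML.
    return guess_format(filename, OUTPUT_FORMATS, "html")
```

Under these assumptions, guess_input("hello.txt") yields markdown and guess_output("hello.tex") yields latex, which matches the hello.txt to hello.tex example.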
Pandoc uses the UTF-8 character encoding for both input and
output. If your local character encoding is not UTF-8, you should
pipe input and output through iconv:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
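For readers who drive pandoc from a script rather than a shell pipeline, the same two re-encoding steps can be sketched in Python. This is only an illustration of what the iconv calls do: the function names are hypothetical, and latin-1 is assumed as the local encoding purely for the example.

```python
def to_utf8(raw, local_encoding="latin-1"):
    """Decode bytes in the local encoding and re-encode them as UTF-8
    (the step performed before handing input to pandoc)."""
    return raw.decode(local_encoding).encode("utf-8")

def from_utf8(raw, local_encoding="latin-1"):
    """Decode pandoc's UTF-8 output and re-encode it in the local encoding."""
    return raw.decode("utf-8").encode(local_encoding)
```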
Creating a PDF
Earlier versions of pandoc came with a program,
markdown2pdf, that used pandoc and pdflatex to
produce a PDF. This is no longer needed, since
pandoc can now produce pdf
output itself. To produce a PDF, simply specify an output file
with a .pdf extension. Pandoc will create a
latex file and use pdflatex (or another engine, see
--latex-engine) to convert it to PDF:
pandoc test.txt -o test.pdf
Production of a PDF requires that a LaTeX engine be installed (see
--latex-engine, below), and assumes that the
following LaTeX packages are available:
amssymb, amsmath,
ifxetex, ifluatex,
listings (if the --listings
option is used), fancyvrb,
enumerate, ctable,
url, graphicx,
hyperref, ulem,
babel (if the lang variable
is set), fontspec (if
xelatex or lualatex is used
as the LaTeX engine), xltxtra and
xunicode (if xelatex is
used).
hsmarkdown
A user who wants a drop-in replacement for
Markdown.pl may create a symbolic link to the
pandoc executable called
hsmarkdown. When invoked under the name
hsmarkdown, pandoc will
behave as if the --strict flag had been
selected, and no command-line options will be recognized. However,
this approach does not work under Cygwin, due to problems with its
simulation of symbolic links.
Options
General options
-f FORMAT, -r FORMAT, --from=FORMAT, --read=FORMAT
Specify input format. FORMAT can be
native (native Haskell),
json (JSON version of native AST),
markdown (markdown),
textile (Textile), rst
(reStructuredText), html (HTML),
docbook (DocBook XML), or
latex (LaTeX). If +lhs
is appended to markdown,
rst, or latex, the input
will be treated as literate Haskell source: see
Literate Haskell
support, below.
-t FORMAT, -w FORMAT, --to=FORMAT, --write=FORMAT
Specify output format. FORMAT can be
native (native Haskell),
json (JSON version of native AST),
plain (plain text),
markdown (markdown),
rst (reStructuredText),
html (XHTML 1), html5
(HTML 5), latex (LaTeX),
beamer (LaTeX beamer slide show),
context (ConTeXt), man
(groff man), mediawiki (MediaWiki
markup), textile (Textile),
org (Emacs Org-Mode),
texinfo (GNU Texinfo),
docbook (DocBook XML),
opendocument (OpenDocument XML),
odt (OpenOffice text document),
docx (Word docx), epub
(EPUB book), asciidoc (AsciiDoc),
slidy (Slidy HTML and javascript slide
show), dzslides (HTML5 + javascript slide
show), s5 (S5 HTML and javascript slide
show), or rtf (rich text format). Note
that odt, docx, and epub
output will not be directed to stdout;
an output filename must be specified using the
-o/--output option. If
+lhs is appended to
markdown, rst,
latex, beamer,
html, or html5, the
output will be rendered as literate Haskell source: see
Literate Haskell
support, below.
-o FILE, --output=FILE
Write output to FILE instead of
stdout. If FILE is
-, output will go to
stdout. (Exception: if the output
format is odt, docx,
or epub, output to stdout is disabled.)
--data-dir=DIRECTORY
Specify the user data directory to search for pandoc data
files. If this option is not specified, the default user
data directory will be used:
$HOME/.pandoc
in unix and
C:\Documents And Settings\USERNAME\Application Data\pandoc
in Windows. A reference.odt,
reference.docx,
default.csl, epub.css,
templates, slidy, or
s5 directory placed in this directory
will override pandoc’s normal defaults.
-v, --version
Print version.
-h, --help
Show usage message.
Reader options
--strict
Use strict markdown syntax, with no pandoc extensions or
variants. When the input format is HTML, this means that
constructs that have no equivalents in standard markdown
(e.g. definition lists or strikeout text) will be parsed as
raw HTML.
-R, --parse-raw
Parse untranslatable HTML codes and LaTeX environments as
raw HTML or LaTeX, instead of ignoring them. Affects only
HTML and LaTeX input. Raw HTML can be printed in markdown,
reStructuredText, HTML, Slidy, DZSlides, and S5 output; raw
LaTeX can be printed in markdown, reStructuredText, LaTeX,
and ConTeXt output. The default is for the readers to omit
untranslatable HTML codes and LaTeX environments. (The LaTeX
reader does pass through untranslatable LaTeX
commands, even if -R
is not specified.)
-S, --smart
Produce typographically correct output, converting straight
quotes to curly quotes, --- to em-dashes,
-- to en-dashes, and
... to ellipses. Nonbreaking spaces are
inserted after certain abbreviations, such as
Mr. (Note: This option is significant only
when the input format is markdown or
textile. It is selected automatically
when the input format is textile or the
output format is latex or
context, unless
--no-tex-ligatures is used.)
--old-dashes
Selects the pandoc <= 1.8.2.1 behavior for parsing smart
dashes: - before a numeral is an en-dash,
and -- is an em-dash. This option is
selected automatically for textile input.
--base-header-level=NUMBER
Specify the base level for headers (defaults to 1).
--indented-code-classes=CLASSES
Specify classes to use for indented code blocks, for example
perl,numberLines or
haskell. Multiple classes may be
separated by spaces or commas.
--normalize
Normalize the document after reading: merge adjacent
Str or Emph elements,
for example, and remove repeated Spaces.
-p, --preserve-tabs
Preserve tabs instead of converting them to spaces (the
default).
--tab-stop=NUMBER
Specify the number of spaces per tab (default is 4).
General writer options
-s, --standalone
Produce output with an appropriate header and footer (e.g. a
standalone HTML, LaTeX, or RTF file, not a fragment). This
option is set automatically for pdf,
epub, docx, and
odt output.
--template=FILE
Use FILE as a custom template for the
generated document. Implies --standalone.
See Templates below for a
description of template syntax. If no extension is
specified, an extension corresponding to the writer will be
added, so that --template=special looks
for special.html for HTML output. If the
template is not found, pandoc will search for it in the user
data directory (see --data-dir). If this
option is not used, a default template appropriate for the
output format will be used (see
-D/--print-default-template).
-V KEY[=VAL], --variable=KEY[:VAL]
Set the template variable KEY to the
value VAL when rendering the document
in standalone mode. This is generally only useful when the
--template option is used to specify a
custom template, since pandoc automatically sets the
variables used in the default templates. If no
VAL is specified, the key will be given
the value true.
-D FORMAT, --print-default-template=FORMAT
Print the default template for an output
FORMAT. (See -t for
a list of possible FORMATs.)
--no-wrap
Disable text wrapping in output. By default, text is wrapped
appropriately for the output format.
--columns=NUMBER
Specify length of lines in characters (for text wrapping).
--toc,
--table-of-contents
Include an automatically generated table of contents (or, in
the case of latex,
context, and rst, an
instruction to create one) in the output document. This
option has no effect on man,
docbook, slidy, or
s5 output.
--no-highlight
Disables syntax highlighting for code blocks and inlines,
even when a language attribute is given.
--highlight-style=STYLE
Specifies the coloring style to be used in highlighted
source code. Options are pygments (the
default), kate,
monochrome, espresso,
haddock, and tango.
-H FILE, --include-in-header=FILE
Include contents of FILE, verbatim, at
the end of the header. This can be used, for example, to
include special CSS or javascript in HTML documents. This
option can be used repeatedly to include multiple files in
the header. They will be included in the order specified.
Implies --standalone.
-B FILE, --include-before-body=FILE
Include contents of FILE, verbatim, at
the beginning of the document body (e.g. after the
<body> tag in HTML, or the
\begin{document} command in LaTeX). This
can be used to include navigation bars or banners in HTML
documents. This option can be used repeatedly to include
multiple files. They will be included in the order
specified. Implies --standalone.
-A FILE, --include-after-body=FILE
Include contents of FILE, verbatim, at
the end of the document body (before the
</body> tag in HTML, or the
\end{document} command in LaTeX). This
option can be used repeatedly to include multiple files.
They will be included in the order specified. Implies
--standalone.
Options affecting specific writers
--self-contained
Produce a standalone HTML file with no external
dependencies, using data: URIs to
incorporate the contents of linked scripts, stylesheets,
images, and videos. The resulting file should be
self-contained, in the sense that it needs no
external files and no net access to be displayed properly by
a browser. This option works only with HTML output formats,
including html, html5,
html+lhs, html5+lhs,
s5, slidy, and
dzslides. Scripts, images, and
stylesheets at absolute URLs will be downloaded; those at
relative URLs will be sought first relative to the working
directory, then relative to the user data directory (see
--data-dir), and finally relative to
pandoc’s default data directory.
--offline
Deprecated synonym for --self-contained.
-5, --html5
Produce HTML5 instead of HTML4. This option has no effect
for writers other than html.
(Deprecated: Use the
html5 output format instead.)
--ascii
Use only ascii characters in output. Currently supported
only for HTML output (which uses numerical entities instead
of UTF-8 when this option is selected).
--reference-links
Use reference-style links, rather than inline links, in
writing markdown or reStructuredText. By default inline
links are used.
--atx-headers
Use ATX style headers in markdown output. The default is to
use setext-style headers for levels 1-2, and then ATX
headers.
--chapters
Treat top-level headers as chapters in LaTeX, ConTeXt, and
DocBook output. When the LaTeX template uses the report,
book, or memoir class, this option is implied. If
beamer is the output format, top-level headers will
become \part{..}.
-N, --number-sections
Number section headings in LaTeX, ConTeXt, or HTML output.
By default, sections are not numbered.
--no-tex-ligatures
Do not convert quotation marks, apostrophes, and dashes to
the TeX ligatures when writing LaTeX or ConTeXt. Instead,
just use literal unicode characters. This is needed for
using advanced OpenType features with XeLaTeX and LuaLaTeX.
Note: normally --smart is selected
automatically for LaTeX and ConTeXt output, but it must be
specified explicitly if
--no-tex-ligatures is selected. If you
use literal curly quotes, dashes, and ellipses in your
source, then you may want to use
--no-tex-ligatures without
--smart.
--listings
Use the listings package for LaTeX code blocks.
-i, --incremental
Make list items in slide shows display incrementally (one by
one). The default is for lists to be displayed all at once.
--slide-level=NUMBER
Specifies that headers with the specified level create
slides (for beamer,
s5, slidy,
dzslides). Headers above this level in
the hierarchy are used to divide the slide show into
sections; headers below this level create subheads within a
slide. The default is to set the slide level based on the
contents of the document; see
Structuring the
slide show, below.
--section-divs
Wrap sections in <div> tags (or
<section> tags in HTML5), and
attach identifiers to the enclosing
<div> (or
<section>) rather than the header
itself. See
Section
identifiers, below.
--email-obfuscation=none|javascript|references
Specify a method for obfuscating mailto:
links in HTML documents. none leaves
mailto: links as they are.
javascript obfuscates them using
javascript. references obfuscates them
by printing their letters as decimal or hexadecimal
character references. If --strict is
specified, references is used
regardless of the presence of this option.
--id-prefix=STRING
Specify a prefix to be added to all automatically generated
identifiers in HTML output. This is useful for preventing
duplicate identifiers when generating fragments to be
included in other pages.
-T STRING, --title-prefix=STRING
Specify STRING as a prefix at the
beginning of the title that appears in the HTML header (but
not in the title as it appears at the beginning of the HTML
body). Implies --standalone.
-c URL, --css=URL
Link to a CSS style sheet.
--reference-odt=FILE
Use the specified file as a style reference in producing an
ODT. For best results, the reference ODT should be a
modified version of an ODT produced using pandoc. The
contents of the reference ODT are ignored, but its
stylesheets are used in the new ODT. If no reference ODT is
specified on the command line, pandoc will look for a file
reference.odt in the user data directory
(see --data-dir). If this is not found
either, sensible defaults will be used.
--reference-docx=FILE
Use the specified file as a style reference in producing a
docx file. For best results, the reference docx should be a
modified version of a docx file produced using pandoc. The
contents of the reference docx are ignored, but its
stylesheets are used in the new docx. If no reference docx
is specified on the command line, pandoc will look for a
file reference.docx in the user data
directory (see --data-dir). If this is
not found either, sensible defaults will be used.
--epub-stylesheet=FILE
Use the specified CSS file to style the EPUB. If no
stylesheet is specified, pandoc will look for a file
epub.css in the user data directory (see
--data-dir). If it is not found there,
sensible defaults will be used.
--epub-cover-image=FILE
Use the specified image as the EPUB cover. It is recommended
that the image be less than 1000px in width and height.
--epub-metadata=FILE
Look in the specified XML file for metadata for the EPUB.
The file should contain a series of Dublin Core elements, as
documented at
http://dublincore.org/documents/dces/.
For example:
<dc:rights>Creative Commons</dc:rights>
<dc:language>es-AR</dc:language>
By default, pandoc will include the following metadata
elements: <dc:title> (from the
document title), <dc:creator> (from
the document authors), <dc:date>
(from the document date, which should be in
ISO 8601
format), <dc:language>
(from the lang variable, or, if it is not
set, the locale), and
<dc:identifier id="BookId">
(a randomly generated UUID). Any of these may be overridden
by elements in the metadata file.
--epub-embed-font=FILE
Embed the specified font in the EPUB. This option can be
repeated to embed multiple fonts. To use embedded fonts, you
will need to add declarations like the following to your CSS
(see --epub-stylesheet):
@font-face {
  font-family: DejaVuSans;
  font-style: normal;
  font-weight: normal;
  src: url("DejaVuSans-Regular.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: normal;
  font-weight: bold;
  src: url("DejaVuSans-Bold.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: italic;
  font-weight: normal;
  src: url("DejaVuSans-Oblique.ttf");
}
@font-face {
  font-family: DejaVuSans;
  font-style: italic;
  font-weight: bold;
  src: url("DejaVuSans-BoldOblique.ttf");
}
body { font-family: "DejaVuSans"; }
--latex-engine=pdflatex|lualatex|xelatex
Use the specified LaTeX engine when producing PDF output.
The default is pdflatex. If the engine is
not in your PATH, the full path of the engine may be
specified here.
Citations
--bibliography=FILE
Specify bibliography database to be used in resolving
citations. The database type will be determined from the
extension of FILE, which may be
.mods (MODS format),
.bib (BibTeX/BibLaTeX format),
.ris (RIS format),
.enl (EndNote format),
.xml (EndNote XML format),
.wos (ISI format),
.medline (MEDLINE format),
.copac (Copac format), or
.json (citeproc JSON). If you want to use
multiple bibliographies, just use this option repeatedly.
--csl=FILE
Specify CSL
style to be used in formatting citations and the
bibliography. If FILE is not found,
pandoc will look for it in
$HOME/.csl
in unix and
C:\Documents And Settings\USERNAME\Application Data\csl
in Windows. If the --csl option is not
specified, pandoc will use a default style: either
default.csl in the user data directory
(see --data-dir), or, if that is not
present, the Chicago author-date style.
--citation-abbreviations=FILE
Specify a file containing abbreviations for journal titles
and other bibliographic fields (indicated by setting
form="short" in the CSL node
for the field). The format is described at
http://citationstylist.org/2011/10/19/abbreviations-for-zotero-test-release/.
Here is a short example:
{ "default": {
    "container-title": {
      "Lloyd's Law Reports": "Lloyd's Rep",
      "Estates Gazette": "EG",
      "Scots Law Times": "SLT"
    }
  }
}
--natbib
Use natbib for citations in LaTeX output.
--biblatex
Use biblatex for citations in LaTeX output.
Math rendering in HTML
-m [URL], --latexmathml[=URL]
Use the
LaTeXMathML
script to display embedded TeX math in HTML output. To
insert a link to a local copy of the
LaTeXMathML.js script, provide a
URL. If no URL is
provided, the contents of the script will be inserted
directly into the HTML header, preserving portability at the
price of efficiency. If you plan to use math on several
pages, it is much better to link to a copy of the script, so
it can be cached.
--mathml[=URL]
Convert TeX math to MathML (in docbook as
well as html and
html5). In standalone
html output, a small javascript (or a
link to such a script if a URL is
supplied) will be inserted that allows the MathML to be
viewed on some browsers.
--jsmath[=URL]
Use
jsMath
to display embedded TeX math in HTML output. The
URL should point to the jsMath load
script (e.g. jsMath/easy/load.js); if
provided, it will be linked to in the header of standalone
HTML documents. If a URL is not
provided, no link to the jsMath load script will be
inserted; it is then up to the author to provide such a link
in the HTML template.
--mathjax[=URL]
Use MathJax to
display embedded TeX math in HTML output. The
URL should point to the
MathJax.js load script. If a
URL is not provided, a link to the
MathJax CDN will be inserted.
--gladtex
Enclose TeX math in <eq> tags in
HTML output. These can then be processed by
gladTeX
to produce links to images of the typeset formulas.
--mimetex[=URL]
Render TeX math using the
mimeTeX
CGI script. If URL is not specified, it
is assumed that the script is at
/cgi-bin/mimetex.cgi.
--webtex[=URL]
Render TeX formulas using an external script that converts
TeX formulas to images. The formula will be concatenated
with the URL provided. If URL is not
specified, the Google Chart API will be used.
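The concatenation described for --webtex can be illustrated as follows. This Python sketch is a guess at the general idea, not pandoc's code: the function name is hypothetical, the base URL used in the usage note is a placeholder, and percent-encoding the formula before appending it is an assumption of this sketch.

```python
from urllib.parse import quote

def webtex_image_url(base_url, formula):
    """Append the TeX formula to the base URL to form an image URL.
    Percent-encoding the formula is an assumption of this sketch."""
    return base_url + quote(formula, safe="")
```

For instance, with the placeholder base URL http://example.com/tex? and the formula x^2, the resulting image URL would be http://example.com/tex?x%5E2.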
Options for wrapper scripts
--dump-args
Print information about command-line arguments to
stdout, then exit. This option is
intended primarily for use in wrapper scripts. The first
line of output contains the name of the output file
specified with the -o option, or
- (for stdout) if no
output file was specified. The remaining lines contain the
command-line arguments, one per line, in the order they
appear. These do not include regular Pandoc options and
their arguments, but do include any options appearing after
a -- separator at the end of the line.
--ignore-args
Ignore command-line arguments (for use in wrapper scripts).
Regular Pandoc options are not ignored. Thus, for example,
pandoc --ignore-args -o foo.html -s foo.txt -- -e latin1
is equivalent to
pandoc -o foo.html -s
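A wrapper script would typically capture the --dump-args output and split it as described above: the first line names the output file, and the remaining lines are the arguments. The Python helper below is a minimal sketch under that reading; parse_dump_args is a hypothetical name, and the sample text in the usage note is written by hand from the description, not captured from a real pandoc run.

```python
def parse_dump_args(output):
    """Split --dump-args output into (output_file, arguments).
    The first line is the output file name ("-" means stdout);
    every following line is one command-line argument."""
    lines = output.splitlines()
    if not lines:
        raise ValueError("empty --dump-args output")
    return lines[0], lines[1:]
```

Given the hand-written text "foo.html\nfoo.txt\n-e\nlatin1\n", the helper returns the output file foo.html and the argument list foo.txt, -e, latin1, which the wrapper can then process itself.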
Templates
When the -s/--standalone option is used, pandoc
uses a template to add header and footer material that is needed for
a self-standing document. To see the default template that is used,
just type
pandoc -D FORMAT
where FORMAT is the name of the output format. A
custom template can be specified using the
--template option. You can also override the
system default templates for a given output format
FORMAT by putting a file
templates/default.FORMAT in the user data
directory (see --data-dir, above).
Exceptions: For odt output,
customize the default.opendocument template. For
pdf output, customize the
default.latex template. For
epub output, customize the
epub-page.html,
epub-coverimage.html, and
epub-titlepage.html templates.
Templates may contain variables. Variable names
are sequences of alphanumerics, -, and
_, starting with a letter. A variable name
surrounded by $ signs will be replaced by its
value. For example, the string $title$ in
<title>$title$</title>
will be replaced by the document title.
To write a literal $ in a template, use
$$.
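The substitution rule can be modeled with a single regular expression. This Python sketch is an illustration of the rule as stated here, not pandoc's template engine: the substitute function is a hypothetical name, and replacing an unset variable with the empty string is an assumption.

```python
import re

# Either a literal "$$", or a variable name between two "$" signs.
# Names are alphanumerics, "-", and "_", starting with a letter.
TOKEN = re.compile(r"\$\$|\$([A-Za-z][A-Za-z0-9_-]*)\$")

def substitute(template, variables):
    """Replace $name$ with its value and $$ with a literal dollar sign.
    Unset variables become the empty string (an assumption of this sketch)."""
    def repl(match):
        if match.group(0) == "$$":
            return "$"
        return str(variables.get(match.group(1), ""))
    return TOKEN.sub(repl, template)
```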
Some variables are set automatically by pandoc. These vary somewhat
depending on the output format, but include:
header-includes
contents specified by
-H/--include-in-header (may have multiple
values)
toc
non-null value if --toc/--table-of-contents
was specified
include-before
contents specified by
-B/--include-before-body (may have multiple
values)
include-after
contents specified by
-A/--include-after-body (may have multiple
values)
body
body of document
title
title of document, as specified in title block
author
author of document, as specified in title block (may have
multiple values)
date
date of document, as specified in title block
lang
language code for HTML or LaTeX documents
slidy-url
base URL for Slidy documents (defaults to
http://www.w3.org/Talks/Tools/Slidy2)
s5-url
base URL for S5 documents (defaults to
ui/default)
fontsize
font size (10pt, 11pt, 12pt) for LaTeX documents
documentclass
document class for LaTeX documents
geometry
options for the LaTeX geometry package, e.g.
margin=1in; may be repeated for multiple
options
mainfont, sansfont,
monofont, mathfont
fonts for LaTeX documents (works only with xelatex and
lualatex)
theme
theme for LaTeX beamer documents
colortheme
colortheme for LaTeX beamer documents
Variables may be set at the command line using the
-V/--variable option. This allows users to
include custom variables in their templates.
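The KEY[=VAL] form accepted by -V/--variable can be sketched as a small parser. The Python function below is hypothetical: it accepts either = or : as the separator because the option synopsis shows both, and it gives a bare KEY the value true, as the option description states. How pandoc itself treats edge cases is not covered here.

```python
def parse_variable(arg):
    """Split KEY=VAL or KEY:VAL at the first separator.
    A bare KEY is given the value "true"."""
    for i, ch in enumerate(arg):
        if ch in "=:":
            return arg[:i], arg[i + 1:]
    return arg, "true"
```

Splitting at the first separator keeps values such as margin=1in intact when they follow a colon, as in geometry:margin=1in.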
Templates may contain conditionals. The syntax is as follows:
$if(variable)$
X
$else$
Y
$endif$
This will include X in the template if
variable has a non-null value; otherwise it will
include Y. X and
Y are placeholders for any valid template text,
and may include interpolated variables or other conditionals. The
$else$ section may be omitted.
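The conditional syntax can be approximated with a regular expression. The Python sketch below is deliberately limited: render_conditionals is a hypothetical name, nested conditionals are not handled, and a variable counts as non-null when it is present with a non-empty value, which is an assumption of this sketch rather than a statement about pandoc's engine.

```python
import re

# $if(name)$ ... [$else$ ...] $endif$, without support for nesting.
COND = re.compile(
    r"\$if\((\w[\w-]*)\)\$(.*?)(?:\$else\$(.*?))?\$endif\$",
    re.DOTALL,
)

def render_conditionals(template, variables):
    """Keep the first branch when the variable is set and non-empty,
    otherwise keep the $else$ branch (or nothing if it is omitted)."""
    def repl(match):
        name = match.group(1)
        then_part = match.group(2)
        else_part = match.group(3) or ""
        return then_part if variables.get(name) else else_part
    return COND.sub(repl, template)
```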
When variables can have multiple values (for example,
author in a multi-author document), you can use
the $for$ keyword:
$for(author)$
<meta name="author" content="$author$" />
$endfor$
You can optionally specify a separator to be used between
consecutive items:
$for(author)$$author$$sep$, $endfor$
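The loop and separator behavior can be sketched in the same spirit. This Python function is a hypothetical illustration, not pandoc's template engine: it handles a single, non-nested $for$ block per match and only substitutes the loop variable itself inside the body.

```python
import re

# $for(name)$ body [$sep$ separator] $endfor$, without support for nesting.
FOR = re.compile(r"\$for\((\w[\w-]*)\)\$(.*?)\$endfor\$", re.DOTALL)

def render_loops(template, variables):
    """Repeat the loop body once per value, joined by the optional separator."""
    def repl(match):
        name = match.group(1)
        # Everything after $sep$ (if present) is the separator text.
        body, _, sep = match.group(2).partition("$sep$")
        values = variables.get(name, [])
        if isinstance(values, str):
            values = [values]  # a single value behaves like a one-item list
        return sep.join(body.replace("$%s$" % name, v) for v in values)
    return FOR.sub(repl, template)
```

With two authors, Ann and Bob, the separator example above renders as "Ann, Bob": the separator appears between items only, not after the last one.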
If you use custom templates, you may need to revise them as pandoc
changes. We recommend tracking the changes in the default templates,
and modifying your custom templates accordingly. An easy way to do
this is to fork the pandoc-templates repository
(http://github.com/jgm/pandoc-templates)
and merge in changes after each pandoc release.
Pandoc’s markdown
Pandoc understands an extended and slightly revised version of John
Gruber’s
markdown
syntax. This document explains the syntax, noting differences from
standard markdown. Except where noted, these differences can be
suppressed by specifying the --strict
command-line option.
Philosophy
Markdown is designed to be easy to write, and, even more
importantly, easy to read:
A Markdown-formatted document should be publishable as-is, as
plain text, without looking like it’s been marked up with tags
or formatting instructions. – John Gruber