Create a static blog with Pandoc and PHP

Sometimes it is simply better to get rid of all the dynamic parts of a site and keep only the static ones, for better performance and simplicity. This is especially true for a simple blog like this one. Programmers need things to be reproducible, scriptable, maintainable and versionable, so what could be better than Markdown for writing a blog? In this post I describe how I write this blog and the glue that ties everything together: I use pandoc and PHP locally to compile the static files, and then I upload them to the server.

So let's start. Here is the content of the project folder. It contains several files and folders: build.php builds all the files, default.html5.html is the pandoc template, the pages folder holds the dynamic PHP pages, the posts folder holds the Markdown files, and the public folder holds the compiled output together with the static files.

.
├── build.php
├── default.html5.html
├── pages
│   ├── sitemap.xml.php
│   └── index.html.php
├── posts
│   ├── jail-sftp-user-in-ubuntu-the-simple-way.md
│   └── reduce-and-control-network-bandwidth-in-linux.md
└── public
    ├── .htaccess
    └── static
        ├── css
        │   ├── bootstrap.min.css
        │   └── main.css
        └── images

The Markdown files are plain Markdown with pandoc metadata (such as the title and the date that build.php reads) at the beginning of the file.
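As an illustration, assume the metadata is a YAML metadata block. This is one of the metadata formats pandoc understands, and it carries the title and date fields that build.php reads. Under that assumption, a hypothetical helper to scaffold a new post could look like this. Both function names and the slug rule are made up for this sketch.

```php
<?php
// Hypothetical helper: builds the text of a new post that starts with a
// YAML metadata block. "title" and "date" are the fields build.php reads
// back through pandoc; using YAML for them is an assumption.
function new_post_markdown(string $title, DateTimeInterface $date, string $body = ''): string
{
    // Double-quote the title (escaping " and \) so a ':' cannot break the YAML.
    $quoted = '"' . addcslashes($title, '"\\') . '"';

    return "---\n"
        . "title: {$quoted}\n"
        . "date: " . $date->format('Y-m-d') . "\n"
        . "---\n\n"
        . $body . "\n";
}

// Hypothetical slug rule: lowercase, every run of non-alphanumerics becomes '-'.
function post_slug(string $title): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
}
```

With these helpers, file_put_contents("posts/" . post_slug($title) . ".md", new_post_markdown($title, new DateTime())) would create a file whose name has the same shape as the existing posts.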

The index page will look roughly like this:

<html>
...
<?php foreach ($POSTS as $post): ?>
<div class="col-md-4">
  <div class="p-3 m-3">
    <h3><a href="<?php echo $post['name']; ?>/"><?php echo $post['title']; ?></a></h3>
    <div><?php echo $post['date']; ?></div>
  </div>
</div>
<?php endforeach; ?>
...
</html>
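The page only depends on the $POSTS array that build.php fills with 'title', 'date' and 'name' for each post. The sketch below produces the same markup from a function so it can be tried with hand-written data, without running pandoc. The function name and the sample entries are made up. Escaping with htmlspecialchars() is an addition of this sketch and is not in the template above.

```php
<?php
// Same markup as the index loop, as a function. Each entry mirrors what
// build.php stores in $POSTS: 'title', 'date' (d/m/Y), 'name' (no extension).
function render_post_list(array $posts): string
{
    $html = '';
    foreach ($posts as $post) {
        // Escape values so a title containing '&' or '<' stays valid HTML.
        $name  = htmlspecialchars($post['name'], ENT_QUOTES, 'UTF-8');
        $title = htmlspecialchars($post['title'], ENT_QUOTES, 'UTF-8');
        $date  = htmlspecialchars($post['date'], ENT_QUOTES, 'UTF-8');
        $html .= "<div class=\"col-md-4\">\n"
            . "  <div class=\"p-3 m-3\">\n"
            . "    <h3><a href=\"{$name}/\">{$title}</a></h3>\n"
            . "    <div>{$date}</div>\n"
            . "  </div>\n"
            . "</div>\n";
    }
    return $html;
}

// Made-up sample data, newest first, as krsort() leaves it in build.php.
$sample = [
    ['title' => 'Second post & more', 'date' => '02/03/2021', 'name' => 'second-post'],
    ['title' => 'First post', 'date' => '01/02/2021', 'name' => 'first-post'],
];
echo render_post_list($sample);
```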

The default.html5.html file is the template we pass to pandoc to generate the HTML from Markdown. It is a customized version of pandoc's html5 template:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$lang$" xml:lang="$lang$"$if(dir)$ dir="$dir$"$endif$>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
$for(author-meta)$
  <meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
  <meta name="dcterms.date" content="$date-meta$" />
$endif$
$if(keywords)$
  <meta name="keywords" content="$for(keywords)$$keywords$$sep$, $endfor$" />
$endif$
  <title>$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$</title>
  <style>
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
$if(quotes)$
      q { quotes: "“" "”" "‘" "’"; }
$endif$
  </style>
$if(highlighting-css)$
  <style>
$highlighting-css$
  </style>
$endif$
$for(css)$
  <link rel="stylesheet" href="$css$" />
$endfor$
$if(math)$
  $math$
$endif$
$for(header-includes)$
  $header-includes$
$endfor$
</head>
<body>
$for(include-before)$
$include-before$
$endfor$
$if(title)$
<header id="title-block-header">
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
$if(date)$
<p class="date">$date$</p>
$endif$
</header>
$endif$
$if(toc)$
<nav id="$idprefix$TOC" role="doc-toc">
$table-of-contents$
</nav>
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
</body>
</html>

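These are the main pandoc commands. To convert Markdown to HTML using a given template and print the result to stdout:

pandoc --standalone --from=markdown --to=html --template=default.html5.html posts/jail-sftp-user-in-ubuntu-the-simple-way.md

To obtain an XML (DocBook) representation of the file, including its metadata, and print it to stdout:

pandoc --standalone --from=markdown --to=docbook posts/jail-sftp-user-in-ubuntu-the-simple-way.md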
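build.php reads the title and date out of that XML with SimpleXML. The sketch below isolates that step in a hypothetical read_post_meta() function. The sample XML is hand-written and only imitates the shape of pandoc's DocBook output. As far as I know, DocBook 5 output keeps the metadata in an info element and DocBook 4 output uses articleinfo, so the function tries both.

```php
<?php
// Hypothetical helper: extracts title and date from standalone DocBook XML.
function read_post_meta(string $docbookXml): array
{
    $xml = new SimpleXMLElement($docbookXml);

    if (isset($xml->info)) {
        $info = $xml->info;             // DocBook 5 layout
    } elseif (isset($xml->articleinfo)) {
        $info = $xml->articleinfo;      // DocBook 4 layout
    } else {
        throw new RuntimeException('No <info> or <articleinfo> element found');
    }

    return [
        'title' => (string) $info->title,
        'date'  => isset($info->date) ? (string) $info->date : null,
    ];
}

// Hand-written sample in the shape of DocBook 5 (not real pandoc output).
$sample = '<article xmlns="http://docbook.org/ns/docbook" version="5.0">'
    . '<info><title>Hello</title><date>2021-02-01</date></info></article>';
print_r(read_post_meta($sample));
```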

The index.html.php and sitemap.xml.php files are the dynamic pages. index.html.php contains:

  <div class="container">
    <?php foreach ($POSTS as $post): ?>
      <h3><a href="<?php echo $post['name']; ?>/"><?php echo $post['title']; ?></a></h3>
      <div><?php echo $post['date']; ?></div>
    <?php endforeach; ?>
  </div><!-- /.container -->

and sitemap.xml.php contains:

<?php echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n"; ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php foreach ($POSTS as $post): ?>
<url>
      <loc>https://www.robbiecode.com/<?php echo $post['name']; ?>/</loc>
      <lastmod><?php echo $post['date_eng']; ?></lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.5</priority>
</url>
<?php endforeach; ?>
</urlset>
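A PHP template works, but escaping and the XML declaration are left to you. As an alternative to the template approach (a different technique from the one this blog uses), the same sitemap could be generated with PHP's XMLWriter extension. The function name is hypothetical. The base URL and the 'name' and 'date_eng' keys are the ones the template uses, and the sample entry is made up.

```php
<?php
// Alternative to pages/sitemap.xml.php: build the sitemap with XMLWriter,
// which escapes text content and writes the XML declaration itself.
function sitemap_xml(array $posts, string $baseUrl): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('urlset');
    $w->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($posts as $post) {
        $w->startElement('url');
        // Posts are linked as directory-style URLs, as in the template.
        $w->writeElement('loc', $baseUrl . rawurlencode($post['name']) . '/');
        $w->writeElement('lastmod', $post['date_eng']);
        $w->writeElement('changefreq', 'monthly');
        $w->writeElement('priority', '0.5');
        $w->endElement(); // url
    }
    $w->endElement(); // urlset
    $w->endDocument();
    return $w->outputMemory();
}

// Made-up entry; in build.php this would be $POSTS.
echo sitemap_xml(
    [['name' => 'hello-world', 'date_eng' => '2021-02-01']],
    'https://www.robbiecode.com/'
);
```

In build.php this would replace the sitemap template with a single call such as file_put_contents('public/sitemap.xml', sitemap_xml($POSTS, 'https://www.robbiecode.com/')).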

So let's compile the pages by running the build.php script from the root folder of the project:

php build.php

build.php has the following contents:

<?php

// remove all previously rendered pages (-f: no error if there are none)
shell_exec("rm -f public/*.html");

// global variables available in pages
$POSTS = array();

// convert all posts
foreach (glob("posts/*.md") as $index => $post){
  $post_basename_noext = basename($post, ".md");
  $post_basename_html = "{$post_basename_noext}.html";
  // quote the path before passing it to the shell
  $post_arg = escapeshellarg($post);

  // convert the post from md to html
  $post_content_converted = shell_exec("pandoc --standalone --from=markdown --to=html --template=default.html5.html {$post_arg}");

  // load the meta information using the docbook format
  $post_docbook_xml_str = shell_exec("pandoc --standalone --from=markdown --to=docbook {$post_arg}");

  // and read it in php from xml
  $post_docbook_xml = new SimpleXMLElement($post_docbook_xml_str);

  // docbook 5 keeps the metadata in <info>, docbook 4 in <articleinfo>
  $info = isset($post_docbook_xml->info) ? $post_docbook_xml->info : $post_docbook_xml->articleinfo;

  // read the title of the post
  $title = (string) $info->title;

  // and the date
  $date = (string) $info->date;
  $date_time = (new DateTime($date))->getTimestamp();

  // populate global variable
  // with the timestamp and post index as the key
  $POSTS[$date_time+$index] = array(
    "title" => $title,
    "date" => date("d/m/Y", $date_time),
    // W3C date format used by <lastmod> in sitemap.xml.php
    "date_eng" => date("Y-m-d", $date_time),
    // we will output the links to posts without html extension
    "name" => $post_basename_noext,
  );

  // write html contents to file
  file_put_contents("public/{$post_basename_html}", $post_content_converted);
}

// order posts by date, newer first
krsort($POSTS);

// now render php pages
foreach (glob("pages/*.php") as $page){
  // index.html.php -> index.html, sitemap.xml.php -> sitemap.xml
  $page_basename_new = basename($page, ".php");

  // render with php output buffering
  ob_start();
  require $page;
  $page_contents = ob_get_clean();

  // and write to file
  file_put_contents("public/{$page_basename_new}", $page_contents);
}
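build.php requires each page directly inside its loop, so every variable of the build script ($post, $index, $page and so on) is visible to the page, and a page can overwrite them. One possible refinement is to render each page inside a function, so it sees only the variables passed to it. The sketch below assumes pages need nothing but $POSTS, and render_page() is a hypothetical name.

```php
<?php
// Hypothetical helper: renders a PHP template with only the given variables
// in scope and returns its output as a string.
function render_page(string $file, array $vars): string
{
    // Turn ['POSTS' => [...]] into a local $POSTS; EXTR_SKIP keeps $file/$vars.
    extract($vars, EXTR_SKIP);
    ob_start();
    try {
        require $file;
        return ob_get_clean();
    } catch (Throwable $e) {
        // Discard partial output before re-throwing.
        ob_end_clean();
        throw $e;
    }
}
```

Inside the pages loop, the body would then shrink to file_put_contents("public/" . basename($page, ".php"), render_page($page, ['POSTS' => $POSTS])). Any variable a page defines stays local to the function.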

The result looks like this, with the public folder now populated with index.html, sitemap.xml and the HTML files of the posts:

.
├── build.php
├── default.html5.html
├── pages
│   ├── sitemap.xml.php
│   └── index.html.php
├── posts
│   ├── jail-sftp-user-in-ubuntu-the-simple-way.md
│   └── reduce-and-control-network-bandwidth-in-linux.md
└── public
    ├── .htaccess
    ├── sitemap.xml
    ├── index.html
    ├── jail-sftp-user-in-ubuntu-the-simple-way.html
    ├── reduce-and-control-network-bandwidth-in-linux.html
    └── static
        ├── css
        │   ├── bootstrap.min.css
        │   └── main.css
        └── images

Lastly, add this .htaccess to the public folder to serve URLs like domain.com/post-title/ while the pages are stored as files named post-title.html:

RewriteEngine On
# if the file or directory exists
RewriteCond %{DOCUMENT_ROOT}/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -d
# do as usual
RewriteRule (.*) - [L]
# else try to find the request as html file
RewriteRule ^([^\.]+)/$ $1.html
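To reason about which file answers a given URL, here is a rough PHP model of the two rules. It is only an approximation for illustration, because Apache's real behaviour also involves per-directory prefix handling and a new rewrite pass after the internal redirect. resolve_request() and its $exists callback are hypothetical.

```php
<?php
// Rough model of the .htaccess rules. $path is the request path without the
// leading slash; $exists says whether that path is a file or directory
// under the document root.
function resolve_request(string $path, callable $exists): ?string
{
    // Rule 1: an existing file or directory is served as usual.
    if ($exists($path)) {
        return $path;
    }
    // Rule 2: "name/" without dots is rewritten to "name.html"
    // (that file may still be missing, in which case Apache answers 404).
    if (preg_match('#^([^.]+)/$#', $path, $m)) {
        return $m[1] . '.html';
    }
    // Anything else is not rewritten and ends up as a 404.
    return null;
}
```

For example, with public/ containing my-post.html, a request for my-post/ maps to my-post.html, while static/css/main.css is served unchanged by the first rule.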

So that's all.