Hugo's Processing Model and URL Management
The Hugo static site generator takes some plain-text content, marries it to a bunch of HTML templates, and produces a set of complete, static HTML pages that can be served by any generic, stand-alone web server. Because the site generated by Hugo is entirely static, all URLs in the public site must correspond directly to objects in the filesystem.
Part 3: Processing Model, Input/Output Mapping, URL Management
In principle, Hugo takes a hierarchy of directories and files underneath the source directory, and recreates the same hierarchy in the destination directory: it couldn’t be simpler. But there are two circumstances that conspire to turn the whole topic of input/output mapping into the most confusing aspect of working with Hugo:
- The path names of the generated files will be the public URLs of the finished site. Any amount of URL management, rewriting, or cleaning therefore amounts to changes in the mapping of source to destination files.
- For each directory, Hugo automatically creates a page, showing all the items in that directory. This page is not based on user-provided content; it is created synthetically by Hugo. But users may want to add to or modify the content of these created pages. Hugo provides a mechanism for doing so that sometimes creates additional confusion. (It does not help that the Hugo documentation of this mechanism is not noted for its clarity.)
Clean URLs
The first source of complexity is the desire to have “clean URLs” that end with a directory name, not a filename and extension:
www.example.com/news/what-happened-today/ Clean
www.example.com/news/what-happened-today.html Ugly
Because any public URL in a static site must correspond to an object in the filesystem, the generated file must be:
public/news/what-happened-today/index.html
Most web servers are configured to silently serve the index.html file when the request URL points to the parent directory.
To create output at this URL, Hugo allows two different input styles:
content/news/what-happened-today.md File
content/news/what-happened-today/index.md Directory with index.md
Either of these alternatives will map to the public URL stated earlier. (Of course, you shouldn’t have both of them in your input directory; otherwise, the results will clobber each other).
Here is the problem: remember that Hugo will automatically create a synthetic page for all directories in the input source tree? Clearly, for the directory what-happened-today in the second alternative, this is not appropriate, because this directory contains only a single item, which is itself a page. Hence Hugo has the special rule:
If a directory contains a file called index.md, then process this directory as if it were a file!
Why, then, allow directories that don’t contain items, but that map to single pages at all? Because they prevent cluttering the namespace if there are auxiliary files (such as images)!
Imagine that the page in question was referring to an image, say img.png. Hugo copies files that are not Markdown directly from their location in the source tree to exactly the same position in the destination directory. Hence a file at content/news/img.png would be copied to public/news/img.png, cluttering the namespace in that directory. (Alternatively, you could keep all image files in the site's static/ directory, again cluttering the global namespace.)
By contrast, if the input file resides in its own directory, then the image file can also be placed into that directory:
content/news/what-happened-today/index.md
content/news/what-happened-today/img.png
Both files will be mapped to the directory public/news/what-happened-today/ in the output directory. The image file will be local to this directory, and not clutter the wider namespace.
To summarize:
- Input can either be a Markdown file with an arbitrary name, or a directory containing a Markdown file named index.md.
- Either will be mapped to a directory, containing an index.html file, with the content placed into that file.
- Directories containing an index.md file will not be treated as directories, but will be processed as if they were a file.
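The mapping summarized above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Hugo's actual implementation; the function name output_path is made up for this example, and the sketch covers only the two input styles discussed so far.

```python
from pathlib import PurePosixPath


def output_path(source: str) -> str:
    """Map a Markdown source path under content/ to its HTML path under public/.

    Hypothetical helper: it mirrors the rule described in the text,
    not Hugo's real code.
    """
    rel = PurePosixPath(source).relative_to("content")
    if rel.suffix != ".md":
        raise ValueError(f"not a Markdown file: {source}")
    if rel.name == "index.md":
        # Directory style: the enclosing directory itself is the page.
        page_dir = rel.parent
    else:
        # File style: the basename (without extension) becomes a directory.
        page_dir = rel.with_suffix("")
    return str(PurePosixPath("public") / page_dir / "index.html")


print(output_path("content/news/what-happened-today.md"))
print(output_path("content/news/what-happened-today/index.md"))
```

Both calls yield the same destination, which is exactly why the two input styles must not coexist for the same page.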
Customizing Directory Listings
For each directory, Hugo creates a synthetic page, typically showing the items in the directory. It uses the “list” template for the layout of the resulting page, and in general, there is no user-provided “content” for that page.
But what if the user would like to provide some content, after all? Or possibly just some processing instructions in the frontmatter?
To allow for this, Hugo allows for a special file to be placed into a directory. This file must be called _index.md. If such a file is found, then its contents will be made available to the list template that is used to generate the directory listing page. (It is up to the template to make use of the content; the template may ignore it. A typical use is for the _index.md file to contain only processing instructions in its frontmatter.)
To summarize:
- If a file called _index.md is found in a directory, then its contents will be made available to the list template that is used to generate the directory listing page for this directory.
- The directory will be processed as a directory, not as a file.
Overriding Filenames
In everything so far, I assumed that the filesystem name of an object in the source tree was going to become part of the public URL for the generated page. (In the example above, either the file basename or the directory name what-happened-today became part of the public URL.)
But Hugo also allows overriding the filename of the input file through frontmatter parameters! In this case, the generated HTML file can be at an arbitrary position in the destination directory, no matter where its corresponding input file resides in the source tree.
There are three frontmatter parameters that matter in this context:
title
- The title parameter is generally important, because many themes use its value for visible headlines. It can also supply the page-specific part of the visible URL: in permalink patterns, :slug falls back to the title when no slug is set.
slug
- The last part of a URL, identifying the specific page or piece of content. (In www.example.com/news/what-happened-today/, the slug is what-happened-today.)
url
- The full path part of a URL (the part following the domain).
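As a rough illustration of how slug and url change the public URL, consider the following Python sketch. The function public_url and the dictionary layout are hypothetical, and the precedence shown (url wins over slug) is my assumption about Hugo's behavior rather than something taken from its source.

```python
def public_url(default_path: str, front_matter: dict) -> str:
    """Return the public URL path for a page.

    default_path is the path derived from the filesystem name,
    e.g. "/news/what-happened-today/". Hypothetical helper.
    """
    if "url" in front_matter:
        # url replaces the whole path part (assumed to take precedence).
        path = front_matter["url"].strip("/")
        return f"/{path}/" if path else "/"
    parts = default_path.strip("/").split("/")
    if "slug" in front_matter:
        # slug replaces only the last path segment.
        parts[-1] = front_matter["slug"]
    return "/" + "/".join(parts) + "/"


print(public_url("/news/what-happened-today/", {}))
print(public_url("/news/what-happened-today/", {"slug": "today"}))
print(public_url("/news/what-happened-today/", {"url": "/archive/today/"}))
```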
Yet another way to override the default output location is to configure “permalinks” in the global config.toml file. This option is only available for “sections” (that is, for the top-level directories directly underneath content/). For each such “section” a URL pattern can be specified in the site configuration file. For all content in this section, the corresponding output will be generated at the location pointed to by that pattern. The pattern can include fixed strings, as well as certain variables populated by Hugo. For example, it is possible to interject the year into the URL for blog posts:
[permalinks]
posts = "/blog/:year/:slug/"
This will render all content underneath content/posts/ at URLs whose path starts with the fixed string “blog”, followed by the year and the slug of the piece (which falls back to the title if no slug is set in the frontmatter).
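To make the pattern expansion concrete, here is a small Python sketch of how a pattern such as "/blog/:year/:slug/" could be filled in. It is not Hugo's code: expand_permalink and slugify are hypothetical helpers, only the two tokens used in the example are supported, and the slugify rule (lowercase, hyphens for everything else) is a simplification of whatever Hugo really does.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Very rough stand-in for Hugo's title-to-URL conversion (assumption)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def expand_permalink(pattern: str, page: dict) -> str:
    """Replace :year and :slug tokens in a permalink pattern.

    page is a hypothetical dict with "title", "date" and optionally "slug".
    Any other token raises a KeyError in this sketch.
    """
    values = {
        "year": f"{page['date'].year:04d}",
        # Fall back to the title if no slug is given.
        "slug": page.get("slug") or slugify(page["title"]),
    }
    return re.sub(r":(\w+)", lambda m: values[m.group(1)], pattern)


page = {"title": "What Happened Today", "date": date(2020, 5, 17)}
print(expand_permalink("/blog/:year/:slug/", page))
```

With the hypothetical page above, the pattern expands to /blog/2020/what-happened-today/.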
The Home Page
The home page is a special case: one may think of it as a “content” page. But because it sits at the top of the directory hierarchy, it must be a “list” page. Furthermore, any user-provided content must be in a file called _index.md to ensure that processing does not stop at the root of the document directory! (Many themes provide a special template, called index.html, that is only going to be used to render the home page.)
A Worked Example
The following example shows the contents of a source directory, and the directories and files that Hugo will typically map them to (assuming nothing is overridden in any of the files' frontmatter). (Two dashes -- indicate a missing file!)
content/                  public/
  --                      index.html                      LIST page
  stuff.md                stuff/index.html
  about/
    index.md              about/index.html
  posts/
    --                    posts/index.html                LIST page
    first.md              posts/first/index.html
    other/
      post.md             posts/other/post/index.html
      fedex.md            posts/other/fedex/index.html
    second.md             posts/second/index.html
    final/
      index.md            posts/final/index.html
  guides/
    _index.md             guides/index.html               LIST page
    victor.md             guides/victor/index.html
    hugo.md               guides/hugo/index.html
  bundle/
    index.md              bundle/index.html
    img.png               bundle/img.png                  direct
  problem/
    index.md              problem/index.html              SINGLE page
    topic.md              --                              LOST
    text.md               --                              LOST
    img.png               problem/img.png                 direct
  nested/
    index.md              nested/index.html               SINGLE page
    img1.png              nested/img1.png                 direct
    deeper/               --                              LOST
      index.md            --                              LOST
    img/
      img2.png            nested/img/img2.png             direct
    mixed/
      index.md            --                              LOST
      img3.png            nested/mixed/img3.png           direct
It is worth studying this example in some detail.
- Although there is no user-provided content for it, Hugo does create a home page! Remember that the home page uses a list template. To provide custom content for the home page, it must be in a file called _index.md at the root of the source directory.
- The next two pages demonstrate the two possible types of input: either as a named file (stuff.md) or as a named directory (about/) containing an index.md file.
- The posts/ directory shows that directories can be nested. The directory listing page for the posts/ directory does not have user-provided content; it is synthetically generated by Hugo.
- By contrast, the guides/ directory contains an _index.md file that is used by Hugo to supplement the directory listing page. Hugo treats the guides/ directory as a directory, generating pages for the content items (victor.md and hugo.md).
- The bundle/ directory shows how to bundle an image with a page.
- The next two directories show some commonly encountered problems. The problem/ directory contains an index.md file, which means that Hugo treats this directory as a “page” and will not process any input (Markdown) files in this directory or any directory below. By contrast, non-input files (such as images) are faithfully copied to the destination directory.
- The nested/ directory demonstrates the same problem with nested directories.
Hugo’s Processing Model
Hugo’s processing model for input files can be summarized like this (this may not be exactly correct, but it seems good enough for now):
1. Recursively visit each directory.
2. For each directory, create a public destination directory of the same name.
3. If the current directory contains index.md, the directory is considered a “leaf directory”:
   - Use the single page template to transform index.md into index.html in the destination directory.
   - STOP processing any Markdown files in this directory or any of its children.
   - Do copy any non-Markdown resources (images, also those in subdirectories) to the destination directory (see step 5).
4. If the current directory does not contain index.md, then the directory is considered a “branch directory”:
   - Use the list page template to create index.html in the destination directory, showing items in the current directory.
   - If there is an _index.md in the current directory, include its contents when generating index.html.
5. For all items in the current directory:
   - If Markdown, create a public directory, and use the single page template to create index.html in that directory.
   - Otherwise, copy over directly, without processing, to the target directory.
6. Do not create a public destination directory if it would be empty (because the source directory is empty, or because it contains only materials that would be discarded).
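The model above can be written down as a short Python sketch. It operates on a nested dictionary instead of a real filesystem (directories are dicts, files map to None), and returns the output paths it would create. The function names and the data layout are my own invention for illustration; like the list above, this approximates Hugo's observable behavior and is not its implementation.

```python
def copy_resources(tree: dict, src: str, dst: str, out: dict) -> None:
    """Copy non-Markdown files of a leaf directory, including subdirectories."""
    for name, child in tree.items():
        if isinstance(child, dict):
            copy_resources(child, f"{src}/{name}", f"{dst}/{name}", out)
        elif not name.endswith(".md"):
            out[f"{dst}/{name}"] = f"copy of {src}/{name}"


def process(tree: dict, src: str = "content", dst: str = "public") -> dict:
    """Return {output path: description} for a nested-dict source tree."""
    out = {}
    if not tree:
        # Empty source directory: no destination directory is created.
        return out
    if "index.md" in tree:
        # Leaf directory: a single page; all other Markdown below is dropped.
        out[f"{dst}/index.html"] = f"single page from {src}/index.md"
        copy_resources(tree, src, dst, out)
        return out
    # Branch directory: synthetic list page, optionally fed by _index.md.
    extra = f" with content from {src}/_index.md" if "_index.md" in tree else ""
    out[f"{dst}/index.html"] = "list page" + extra
    for name, child in tree.items():
        if isinstance(child, dict):
            out.update(process(child, f"{src}/{name}", f"{dst}/{name}"))
        elif name == "_index.md":
            continue  # already consumed by the list page
        elif name.endswith(".md"):
            out[f"{dst}/{name[:-3]}/index.html"] = f"single page from {src}/{name}"
        else:
            out[f"{dst}/{name}"] = f"copy of {src}/{name}"
    return out


example = {
    "stuff.md": None,
    "guides": {"_index.md": None, "hugo.md": None},
    "problem": {"index.md": None, "topic.md": None, "img.png": None},
}
for path, what in sorted(process(example).items()):
    print(f"{path:34} {what}")
```

Running it on the small example tree shows problem/topic.md disappearing from the output while problem/img.png is copied, which is the behavior described for the problem/ directory in the worked example.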