In this blog I will explain how to generate static HTML pages from your project's Pydoc (docstring) comments with Sphinx. We will then host them in an Azure Web App so that everyone on your team can access them. Because we use a Storage Mount, you only have to replace the HTML files in the storage account when new ones are generated, and the change is reflected on the endpoint.
This way you always have a hosted version of the latest documentation. See figure 1 for the architecture.
Why?
For big data engineering projects we use Azure Databricks a lot. We created many Jupyter notebooks that live inside Databricks, but there is no easy way to reuse code across notebooks, and plain Jupyter notebooks are hard to test. With a library we can unit test all of our functionality before deploying it, which gives our team a lot of confidence while developing.
Because of these issues, we decided to create a library with all functionality separated into packages. Once it is installed on the cluster, we can easily call that functionality from the notebooks.
When we find a bug in a function, we no longer have to fix it in every notebook where the same code was duplicated. We only have to fix the library and install it on our cluster (with CI/CD).
While programming in notebooks we can instantly see the definition, parameters and an explanation of every function because we use Pydoc. But when we want to search for a certain function, or get an overview of all functionality, we use the hosted version of our documentation.
Read on to see how to achieve this.
Repository
All code and settings you need for this blog are located in this repository: https://github.com/samvruggink/hosting-sphinx-docs-in-azure-webapp-blog
Step 1: Pydoc (docstring)
First of all we need to document our functions. We use Pydoc docstrings for this, which let us document our code in an easy way. See the code block below for an example.
def plusOne(number: int) -> int:
    """[summary]

    Args:
        number (int): [description]

    Returns:
        int: [description]
    """
The first part is the summary, where you can give a short description of what the function actually does. Afterwards you can add a description to each of the args. If you specify a return value, add a description to it as well. Below is an example of a function where this is implemented.
import logging

from pyspark.sql import DataFrame, SparkSession

logger = logging.getLogger(__name__)


def read_parquet(spark: SparkSession, path: str) -> DataFrame:
    """Reads a parquet file or directory and returns a DataFrame

    Args:
        spark (SparkSession): SparkSession
        path (str): path of the input file/dir

    Returns:
        DataFrame: A DataFrame with parquet data
    """
    df = spark.read.format("parquet").load(path, inferSchema=True)
    # Log the type that was read so unexpected results are visible in the logs.
    if isinstance(df, DataFrame):
        logger.info(f"Read parquet : {type(df).__name__}")
    else:
        logger.error(
            f"Is an instance of : {type(df).__name__}, not a DataFrame, exiting now !"
        )
    return df
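To make the placeholder template concrete, here is a minimal sketch of a filled-in docstring. The function name plus_one and the wording of its docstring are my own illustration, not code from the repository. The snippet also reads the docstring back at runtime with inspect.getdoc from the standard library, which is a quick way to check that a function is actually documented.

```python
import inspect


def plus_one(number: int) -> int:
    """Add one to a number.

    Args:
        number (int): The value to increment.

    Returns:
        int: The input value plus one.
    """
    return number + 1


# inspect.getdoc returns the cleaned-up docstring; the first line is the summary.
summary = inspect.getdoc(plus_one).splitlines()[0]
print(summary)  # prints "Add one to a number."
```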
When you have all your functions documented it’s time to generate Sphinx documentation.
Step 2: Generate Sphinx static HTML from your Pydoc definitions
Sphinx is a great library for generating static HTML files from Pydoc docstrings. It is highly customizable, which also makes it a bit more complex. The guide below explains how to generate static HTML files from your src folder using a standard template.
This is our project structure:
Demo-project-sphinx-doc
|-- src
|   |-- __init__.py
|   |-- foo.py
|   |-- bar.py
|-- test
|   |-- __init__.py
|   |-- test_foo.py
|   |-- test_bar.py
|-- source
|   |-- index.rst
|   |-- conf.py
|   |-- _templates
|   |   |-- custom-module-template.rst
|   |   |-- custom-class-template.rst
|-- Makefile
|-- make.bat
We want to generate documentation for the functions in foo.py and bar.py. First you need to install Sphinx on your computer, for which you also need pip. pip is a package manager for Python (comparable to Maven, npm or NuGet). Download instructions and more information are available in the official pip documentation.
In the project root execute the following commands:
pip install sphinx
sphinx-quickstart
sphinx-quickstart will generate basic configuration files. We keep the default name for the source directory, but you can change it. When it asks whether to separate source and build directories, type “y”.
This will give you the following structure (see figure 2)
Now we are going to change some Sphinx settings in order to generate our static HTML files. Change your conf.py to the following:
import os
import sys
sys.path.insert(0, os.path.abspath(".."))
# -- Project information -----------------------------------------------------
project = "demo-project-sphinx-doc"
copyright = "2021, sam"
author = "sam"
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
]
autosummary_generate = True # Turn on sphinx.ext.autosummary
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
We add two extensions for the automatic generation of static HTML files. We also set the template path and, through sys.path, make sure Sphinx knows where to find the src directory.
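One optional addition, which is not part of the conf.py above: the docstrings in Step 1 use "Args:" and "Returns:" sections, and Sphinx ships an extension called sphinx.ext.napoleon that parses this Google-style layout. Assuming you want those sections rendered as structured parameter lists, the relevant part of conf.py could look roughly like the sketch below. Treat it as a suggestion to try, not as the configuration used in the repository.

```python
# conf.py fragment (assumption: docstrings use Google-style "Args:"/"Returns:" sections)
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",  # parses Google- and NumPy-style docstring sections
]
autosummary_generate = True  # Turn on sphinx.ext.autosummary

# Both napoleon options default to True; NumPy style is switched off here
# only because the examples in this blog do not use it.
napoleon_google_docstring = True
napoleon_numpy_docstring = False
```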
Now open index.rst and replace its contents with the following:
Welcome to demo-project-sphinx-doc's documentation!
===================================================

.. autosummary::
   :toctree: _autosummary
   :template: custom-module-template.rst
   :recursive:

   src

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
The :recursive: option makes sure that we can have a nested structure in our src folder, which is then discovered automatically. For each module, autosummary then summarises every attribute, function, class and exception in that module.
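To get a feel for what recursive discovery means, the sketch below walks a package with pkgutil.walk_packages from the standard library and lists every submodule it finds. This is only an illustration of the idea, not how autosummary is implemented. The helper name list_modules is hypothetical, and the stdlib json package stands in for src so the snippet runs anywhere.

```python
import importlib
import pkgutil


def list_modules(package_name: str) -> list[str]:
    """Return the package name plus the dotted names of all nested modules."""
    package = importlib.import_module(package_name)
    names = [package.__name__]
    # walk_packages descends into subpackages, so nested folders are found too.
    for info in pkgutil.walk_packages(package.__path__, prefix=package.__name__ + "."):
        names.append(info.name)
    return sorted(names)


# Stand-in example; in the demo project you would pass "src" instead.
for name in list_modules("json"):
    print(name)
```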
Now we need templates in order to parse our data from autosummary. Add 2 files to the _templates folder:
custom-module-template.rst
{{ fullname | escape | underline}}

.. automodule:: {{ fullname }}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: Module Attributes

   .. autosummary::
      :toctree:
   {% for item in attributes %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block functions %}
   {% if functions %}
   .. rubric:: {{ _('Functions') }}

   .. autosummary::
      :toctree:
   {% for item in functions %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block classes %}
   {% if classes %}
   .. rubric:: {{ _('Classes') }}

   .. autosummary::
      :toctree:
      :template: custom-class-template.rst
   {% for item in classes %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block exceptions %}
   {% if exceptions %}
   .. rubric:: {{ _('Exceptions') }}

   .. autosummary::
      :toctree:
   {% for item in exceptions %}
      {{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

{% block modules %}
{% if modules %}
.. rubric:: Modules

.. autosummary::
   :toctree:
   :template: custom-module-template.rst
   :recursive:
{% for item in modules %}
   {{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
custom-class-template.rst
{{ fullname | escape | underline}}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}
   :members:
   :show-inheritance:
   :inherited-members:

   {% block methods %}
   .. automethod:: __init__

   {% if methods %}
   .. rubric:: {{ _('Methods') }}

   .. autosummary::
   {% for item in methods %}
      ~{{ name }}.{{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}

   {% block attributes %}
   {% if attributes %}
   .. rubric:: {{ _('Attributes') }}

   .. autosummary::
   {% for item in attributes %}
      ~{{ name }}.{{ item }}
   {%- endfor %}
   {% endif %}
   {% endblock %}
Now we can use the command make clean html to generate our documentation. Go into build/html/ and open index.html to view the generated documentation.
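If you would rather trigger the build from a script, for example as a step in a pipeline, the same HTML build can be started through python -m sphinx. The sketch below is one possible wrapper under that assumption. The function names are hypothetical, and the source and build/html paths match the quickstart layout used in this blog.

```python
import subprocess
import sys
from pathlib import Path


def sphinx_html_command(source_dir: Path, build_dir: Path) -> list[str]:
    """Return the argument list for an HTML build using the current interpreter."""
    return [sys.executable, "-m", "sphinx", "-b", "html", str(source_dir), str(build_dir)]


def build_html(source_dir: Path, build_dir: Path) -> None:
    """Run the build; check=True raises CalledProcessError if Sphinx fails."""
    subprocess.run(sphinx_html_command(source_dir, build_dir), check=True)


# Usage from the project root (requires Sphinx to be installed):
# build_html(Path("source"), Path("build/html"))
```

Unlike make clean html, this does not remove the previous build output first, so delete the build directory yourself if you want a clean build.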
Step 3: Host the static HTML in an Azure Web App
In our own environment we deploy everything using CI/CD: we provision resources with Terraform and pipelines.
Because this is a blog, I will show you how to host the documentation while provisioning everything by hand.
What we need to provision:
- Storage Account
  - StorageV2 (general purpose v2)
  - Standard/Hot data
- Linux App Service
  - A cheap tier to test with is B1
- Web App
  - Docker Container running on Linux
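Once these resources exist, updating the documentation comes down to replacing the files from build/html in the storage account. As a rough sketch, the snippet below only collects the relative paths of the files that would need to be copied. The upload itself is left to whatever tool you use (portal, CLI or pipeline). The function name files_to_replace is my own, hypothetical helper.

```python
from pathlib import Path


def files_to_replace(html_dir: Path) -> list[str]:
    """Return the POSIX-style paths of all files under html_dir, relative to it."""
    return sorted(
        path.relative_to(html_dir).as_posix()
        for path in html_dir.rglob("*")
        if path.is_file()
    )


# Usage from the project root after a build:
# for name in files_to_replace(Path("build/html")):
#     print(name)
```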
I created a video that walks through the provisioning steps. Please view the video below.
Thanks for reading my blog. Leave a comment if you have any questions.