In the current DevOps industry we strive to meet certain best practices that not only make our work easier, but also make our systems and applications more reliable. Two of those practices are to automate as much as possible and to document what we do. And sometimes we can mix them and automate the generation of documentation. Here’s an example that made our lives better by letting us automate how our documentation is generated.
Documentation is one of the key resources in a systems engineering or development team. There are two aspects that pretty much everyone agrees on: it’s very important to have good, useful documentation, and it’s very difficult to keep it updated and sane. Many variables influence this, and each one generates its own set of alternatives:
- Tools: there are a lot of systems to keep documentation, and most teams have gone through phases with several of these tools, trying them out until they find something that works reasonably well for them. Some examples of these systems are: documents in shared folders, websites, wikis, tickets, specific tools, etc.
- Format: there are also many formats in which you can write documentation: plain text, WYSIWYG text, HTML (generated from wiki markup, Markdown, reStructuredText, LaTeX, …). Usability and readability are at odds here: writing plain text documents is easier, but they are tiresome to read and follow; good-looking web pages are easier to follow and access, but usually take more time to generate.
- Team organization: everyone thinks differently and organizes ideas in their heads differently, and this shows when writing those ideas down. For instance, one person would put the procedure to create a database dump inside the “MySQL” category, while another would put it in the “Backups” category. Some teams manage this by designating an ‘editor’: a person who takes the documentation bits generated by their peers and organizes them in a particular way. However, this adds an extra step and a bottleneck, which makes keeping the documentation updated even harder.
At CAPSiDE, we’re one of those teams that has tried several methods. We have used, and still use, several tools, because not all documentation is the same and technology keeps evolving. Here’s the story of how we automated the generation of HTML documentation from reStructuredText with git, GitLab CI/CD and Docker.
Clouddeploy is an open-source tool developed at CAPSiDE that helps us create and maintain infrastructure in AWS through code. For the documentation on this tool, we started with a wiki, but as the documentation grew, so did the idea of using another system. While looking at different alternatives, we came upon Sphinx, which turns reStructuredText (RST) into several formats, such as HTML or EPUB.
Everything at CAPSiDE is under version control, so the RST files had to be managed by git. We also have a GitLab server where our repos live. So, with the first version of this documentation, any team member did the following to contribute:
- Configure a virtualenv in their working area and install Sphinx and the Read the Docs theme.
- Pull the documentation repo from the company Gitlab server.
- Create a branch and start working on the documentation.
- Once content with the work done, generate the HTML with Sphinx’s make html command and make sure that everything looks as expected.
- Commit the changes and push to the server.
- Copy the generated HTML to a web server.
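The steps above can be sketched as a shell session. The repo URL, branch name and documentation server are placeholders, not the real CAPSiDE ones, but the package names match what the Dockerfile later installs:

```shell
# One-time environment setup
virtualenv docs-env
. docs-env/bin/activate
pip install sphinx sphinx_rtd_theme

# Per-change workflow (repo and server names are hypothetical)
git clone git@gitlab.example.com:capside/clouddeploy-docs.git
cd clouddeploy-docs
git checkout -b document-backups

# ... edit the .rst files ...

make html                          # builds the site into _build/html
git commit -am "Document the backup procedure"
git push origin document-backups

# Manual deploy step, later replaced by the pipeline
scp -r _build/html/* user@docs.example.com:/var/www/docs/
```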
This early process was cumbersome compared to just editing a wiki, so it needed some refinement. And since automation is at the core of systems engineering, we looked at ways to do just that.
CI to the rescue
If you add a .gitlab-ci.yml file to the root directory of your repository and configure your GitLab project to use a runner, then each commit or push triggers your CI pipeline.
In the previous workflow there were several opportunities to reduce work for the team:
- Avoid making everyone set up the environment
- Compile the code into HTML and make sure it looks good
- Publish the generated HTML to the web server
All this could be done with two jobs in GitLab:
- One to generate the HTML and provide it to the user to check
- One to publish it
GitLab provides several ways to run CI/CD jobs. For a simple job like this one, where we mainly needed a preconfigured environment, we decided to use Docker.
GitLab Runners and Docker
The Docker image in this runner only needed to set up the environment so we could compile the code. A simple Dockerfile like this was enough for the job:
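The original Dockerfile isn’t reproduced here, but given the description that follows (official Python 2.7 base image, Sphinx and the Read the Docs theme installed with pip), a minimal version would look like this:

```dockerfile
# Official Python 2.7 image as the base
FROM python:2.7

# Install Sphinx and the Read the Docs theme
RUN pip install sphinx sphinx_rtd_theme
```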
With this, we get the official Python 2.7 image and use pip to install Sphinx and the RTD theme. The next step is to build the image that the runner will use on the GitLab server:
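Assuming the image is published to a registry the runner can reach (the registry host and image name here are placeholders), building and publishing it could look like:

```shell
# Build the image from the Dockerfile in the current directory
docker build -t registry.example.com/devops/sphinx-builder:latest .

# Push it so the runner on the GitLab server can pull it
docker push registry.example.com/devops/sphinx-builder:latest
```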
We can now create the GitLab runner. To do this, you need sudo privileges on the GitLab server and admin rights in GitLab to obtain the registration token:
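With the gitlab-runner package installed on the server, registration is started with a single interactive command:

```shell
# Starts an interactive registration; it will prompt for the
# coordinator URL, the registration token, a description and tags
sudo gitlab-runner register
```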
Then answer the questions it prompts (more information in the GitLab Runner documentation). Some of the configuration set by these answers can be changed later. The last question is about choosing the executor type:
In this case, we choose docker, which leads to one final question, where we specify the Docker image we created earlier:
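Paraphrasing the prompts (their exact wording varies between gitlab-runner versions, and the image name is the placeholder used above), that exchange looks roughly like:

```
Please enter the executor: shell, docker, ssh, ...:
docker
Please enter the default Docker image (e.g. ruby:2.6):
registry.example.com/devops/sphinx-builder:latest
```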
We now have the runner set up. The next step is to configure the pipeline so this runner is used when we push to the repository.
Creating the pipeline
We now need to create the .gitlab-ci.yml file in our repository, which will set everything in motion. The pipeline will have two jobs:
- Job 1: create the html
- Job 2: publish
Here’s the .gitlab-ci.yml we generated:
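The original file isn’t reproduced here, but a .gitlab-ci.yml matching the description that follows would look roughly like this. The image name, the documentation server and the deploy script path are placeholders; the before_script lines are the common SSH-agent setup pattern for GitLab CI, not necessarily the exact commands we used:

```yaml
# Docker image built earlier (name is a placeholder)
image: registry.example.com/devops/sphinx-builder:latest

stages:
  - build
  - deploy

build:
  stage: build
  script:
    - make html
  artifacts:
    paths:
      - _build/html

publish:
  stage: deploy
  only:
    - master
  before_script:
    # Load the deploy key stored in the secret variable
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
  script:
    # Copy the artifact and trigger the deploy script on the server
    - scp -r _build/html user@docs.example.com:/tmp/docs-html
    - ssh user@docs.example.com '/usr/local/bin/deploy_docs.sh /tmp/docs-html'
```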
Here’s an overview of what this all means:
- We indicate which Docker image we will use for this pipeline (the one we built before and configured the runner with)
- We indicate the stages of the pipeline. In this case, we have two stages: build and deploy
- We define the jobs: build and publish
- In the build job we indicate what we want to execute (make html) and which artifacts we want to keep from this execution. The artifact will then be available after the job is done, and anyone will be able to download it from the GitLab page and check that the generated files look as expected.
- In the publish job we copy the generated artifact to the remote documentation server. A few details about this:
- The only: master line indicates that this job will only be executed if the push was made to the master branch. Therefore, everyone can create branches with new documentation, and when they push, the pipeline will only execute the build job, allowing them to generate the HTML and review it without deploying it to production.
- The before_script section sets up the environment needed to copy the artifact from the container to a remote server through SSH. Note the $SSH_PRIVATE_KEY variable: it is set in the GitLab interface of the project under Settings -> CI/CD -> Secret variables, where we give the variable a name and a value and decide whether it is protected (that is, whether it is available in jobs triggered from all branches or only from protected branches).
- Finally, the script section copies the artifact and triggers a script that deploys the code on the remote server accordingly.
The final situation
We talked at the beginning about the different steps someone on the team needed to follow to update the documentation. The situation has changed, and now it looks like this:
- Pull the documentation repo from the company Gitlab server
- Create a branch and start working on the documentation
- Once content with the work done, push the branch to GitLab. This triggers the pipeline and executes the build job, which provides an artifact: the HTML in a zip file that the team member can download to make sure everything looks as expected. The status of the job, its output and the artifacts can all be seen from the project page in GitLab, in the CI/CD section
- If satisfied, open a merge request to the master branch
- When the merge request is accepted, the pipeline is triggered again; it generates the HTML and copies it to the web server
We have thus removed the need for every team member to set up the environment, to compile the code, and to copy it manually to the server once it is approved. This makes updating the documentation way easier than before, which is key to keeping it up to date.