In this article, we will set up an on-premises Storage Gateway virtual machine that caches and buffers backup data from Bacula.
The Storage Gateway virtual machine then uploads the data to the VTL service, giving us a cost-effective backup solution that also stores our backups offsite.
What is Bacula?
Bacula is an open source suite for managing your backups. It is highly customizable and allows very flexible storage configurations. You can store your backups on tape (with a single drive or a media changer) or on a local disk.
Bacula's architecture is made of decoupled programs, which makes it easy to combine different storage systems. You can use multiple storage backends from the same Bacula director.
What is Virtual Tape Library?
The Virtual Tape Library (or VTL) is the tape-based storage solution of the Amazon Web Services Storage Gateway. You can use the VTL service to archive backup data in Amazon Glacier, providing seamless integration between your on-premises environment and the AWS cloud.
The architecture will look like this, except that we will use a single Data Center (source):
By deploying the Storage Gateway virtual machine on-premises, we benefit from a local cache of our data. This gives us both quick access to the most recently backed-up data and a buffer between our data center and the S3 storage where the virtual tapes live.
It is highly recommended that you use dedicated disk devices for the upload buffer and the cache storage in your Storage Gateway. Keep in mind that the sizes of both have a direct impact on performance, because their main purpose is to hide network latency between your Storage Gateway and AWS. You can use raw iSCSI devices if you happen to have some kind of SAN.
Connections from Bacula to VTL
You will need to install the iSCSI initiator tools for your Linux distribution (open-iscsi in Debian and derivatives, iscsi-initiator-utils if you are using a Red Hat flavor). In our particular case, we are going to use Debian.
To discover the targets that the VTL appliance exposes, you can follow the instructions in the VTL documentation. In a nutshell:
Note that by default your system may detect the devices as generic SCSI devices, but for Bacula to operate correctly we need to load the ‘st’ driver. You can force that by adding the module name to
You can see the newly added devices using
As you can see, we have 10 drives and a media changer, and we can use any combination of the drives if we have the storage and bandwidth capacity. In this post, we will only be using one of them, but keep in mind that you can use them all if you want to.
Since Linux nowadays uses udev to autodiscover devices, device names may not be preserved between reboots. The Bacula configuration needs a static device name, so we will write a udev rule that matches the media changer and gives it a predefined name. Add the following lines to
This will create a link under /dev/tape/by-id/ pointing to the detected changer device. For example:
MTX: the changer helper
Bacula is agnostic about its storage backends. It does not know how to manage a media changer or write to a tape. Instead, it follows the underlying UNIX philosophy of “everything is a file” and relies on helper tools to manage tapes and drives.
One of the backend tools is mtx, a program used to control media changers. You can use mtx manually to operate the changer, but if you make any permanent change (e.g. load/unload tapes), always remember to execute the update slots command in the Bacula console.
List Data Transfer Elements (drives) as well as Storage Elements (slots) with their contents. Example:
Note that there are 3200 slots: 1600 regular slots and 1600 Import/Export slots, which are used when archiving and retrieving tapes to/from the Virtual Tape Shelf.
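With this many slots, it can help to process the listing programmatically. The following is a minimal Python sketch that splits mtx status text into drives, regular slots and Import/Export slots. The regular expressions are modeled on the usual layout of mtx status lines, and the sample text (device name and barcodes included) is invented for illustration, so check both against the output of your own changer.

```python
import re

# Patterns modeled on the usual `mtx status` line layout (an assumption;
# verify them against the output of your own changer).
DRIVE_RE = re.compile(r"^\s*Data Transfer Element (\d+):(Full|Empty)")
SLOT_RE = re.compile(r"^\s*Storage Element (\d+)( IMPORT/EXPORT)?:(Full|Empty)")
TAG_RE = re.compile(r"VolumeTag\s*=\s*(\S+)")


def parse_mtx_status(text):
    """Map element numbers to the barcode they hold (None when empty)."""
    result = {"drives": {}, "slots": {}, "import_export": {}}
    for line in text.splitlines():
        tag = TAG_RE.search(line)
        volume = tag.group(1) if tag else None
        drive = DRIVE_RE.match(line)
        if drive:
            result["drives"][int(drive.group(1))] = volume
            continue
        slot = SLOT_RE.match(line)
        if slot:
            kind = "import_export" if slot.group(2) else "slots"
            result[kind][int(slot.group(1))] = volume
    return result


# Hypothetical, shortened sample: the device name and barcodes are made up.
SAMPLE = """\
  Storage Changer /dev/sg12:2 Drives, 4 Slots ( 2 Import/Export )
Data Transfer Element 0:Empty
Data Transfer Element 1:Full (Storage Element 1 Loaded):VolumeTag = TESTA00001
      Storage Element 1:Empty
      Storage Element 2:Full :VolumeTag=TESTA00002
      Storage Element 3 IMPORT/EXPORT:Empty
      Storage Element 4 IMPORT/EXPORT:Full :VolumeTag=TESTA00003
"""

status = parse_mtx_status(SAMPLE)
print("drives:", status["drives"])
print("regular slots:", status["slots"])
print("import/export slots:", status["import_export"])
```

Keeping the Import/Export slots in their own dictionary mirrors the split described above, so tapes arriving from the Virtual Tape Shelf are easy to spot.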
You can manually load or unload tapes from the slots (Storage Element) to the drives (Data Transfer Element).
Be careful, though, not to interfere with Bacula's normal operation if it is using the drives.
For example, to load the tape located at slot 3 to the drive number 9:
The same for unloading (the argument order is the same):
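As a rough sketch, the load and unload invocations can also be assembled from Python, which makes the shared argument order (slot first, then drive) explicit. The helper names and the changer path below are hypothetical placeholders, not part of Bacula or mtx; substitute the link that your own udev rule creates. By default the helper only returns the command line instead of executing it.

```python
import shlex
import subprocess

# Hypothetical placeholder: use the symlink created by your own udev rule.
CHANGER = "/dev/tape/by-id/EXAMPLE-CHANGER"


def mtx_command(changer, action, slot, drive):
    """Build the argument list for `mtx -f <changer> load|unload <slot> <drive>`."""
    if action not in ("load", "unload"):
        raise ValueError("action must be 'load' or 'unload'")
    return ["mtx", "-f", changer, action, str(slot), str(drive)]


def run_mtx(changer, action, slot, drive, dry_run=True):
    """Return the command line; execute it only when dry_run is False."""
    cmd = mtx_command(changer, action, slot, drive)
    if not dry_run:
        # Requires mtx to be installed and the changer device to exist.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


# Load the tape in slot 3 into drive 9, then put it back (dry run only).
print(run_mtx(CHANGER, "load", 3, 9))
print(run_mtx(CHANGER, "unload", 3, 9))
```

If you run it with dry_run=False while Bacula is active, remember to execute update slots in the Bacula console afterwards.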
The mtx-changer wrapper
The mtx tool is handy for managing the devices manually, but Bacula comes with a nice wrapper called mtx-changer (on Debian systems, you can find it in /etc/bacula/scripts/mtx-changer), which is the one that Bacula will actually use.
The wrapper is intended to be modified to suit your particular needs, although it should work out of the box for most changers. In the particular case of the VTL, it filters out the 1600 Import/Export slots. If you want to use them directly from the Bacula console, you will have to tweak it a bit (for example, when you create a new tape or retrieve an archived tape from the Virtual Tape Shelf, the VTL inserts it in one of the available I/E slots). In our case, we are happy to interact with the Import/Export slots manually through the mtx interface.
Next, we need to modify the Bacula configuration to use the new devices. We will change two files:
bacula-sd.conf: Here we will configure the actual storage backends (how Bacula will actually write to the tapes and use the media changer).
bacula-dir.conf: We will define a Storage resource and make it available to the tape pools.
To configure the Storage Daemon, you just have to create the resources in your Storage Daemon configuration (bacula-sd.conf file). Let’s see the resource definition and then we will break down the meaning of the directives:
Autochanger: The definition of the properties of the medium changer:
- Name: Arbitrary name (identifier) for the Autochanger device
- Device: Comma-separated list of Device resources that work with this autochanger
- Changer Command: Program used when Bacula needs to load/unload tapes to/from the drives.
- Changer Device: The SCSI device to drive the autochanger. Here is where we will put the name of the symlink that we create with the udev rule.
Device: The definition of each of the tape drives:
- Name: Arbitrary name (identifier) for this drive
- Drive Index: Index of the drive (used for
- Media Type: Arbitrary identifier of the tapes that this device can use. You can use the same Media Type for multiple Device resources if they are compatible (and they are if you use VTL!). That way Bacula will not tie a particular tape to a single drive; any drive can load any virtual tape.
- Archive Device: The actual device that Bacula will use to write to the drive
- RemovableMedia: Indicates that the media can be extracted (in contrast to a directory in the hard disk)
- RandomAccess: Tape drives can’t read or write at arbitrary positions (you need to rewind and then wind forward to the desired portion of the tape), so random access is disabled for them.
- AutoChanger: This device belongs to an automatic changer
We will add new Storage and Pool resources to the bacula-dir.conf configuration file:
The two important resources are:
1. Storage: The definition of the storage for our jobs.
- Name: Arbitrary identifier that we will use in the
- Address: IP or hostname resolvable by the clients. This string is passed verbatim to the clients so they know what Storage Daemon they must use.
- Password: Used to authenticate with the Storage Daemon
- Device: Name that we gave to the Autochanger resource in the Storage Daemon configuration.
- Media Type: Must match the Media Type directive of the Device resources that belong to the Autochanger in the Storage Daemon configuration.
- Autochanger: Indicates that this storage is an autochanger
2. Pool: Definition of a pool of tapes that Bacula can use to make backups.
- Name: Arbitrary identifier that we will use in the
- Pool Type: Just use
- Storage: the Name we gave to the
Then, we are all set. Now we just need to use the Pool resource that uses the VTL storage backend in our backup jobs.
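Since the resources live in two files and reference each other by name, it is easy to end up with a mismatch. The following Python sketch encodes the cross-references described above as a simple check. It works on plain dictionaries rather than parsing the real configuration files, and every resource name and Media Type in it is an invented example.

```python
def check_vtl_config(autochanger, devices, dir_storage):
    """Return a list of problems; an empty list means the names line up."""
    problems = []
    known_devices = {d["Name"] for d in devices}
    # Every drive listed in the Autochanger must be a defined Device resource.
    for name in autochanger["Device"]:
        if name not in known_devices:
            problems.append(f"Autochanger lists unknown Device {name!r}")
    # The director's Storage resource points at the Autochanger by name.
    if dir_storage["Device"] != autochanger["Name"]:
        problems.append("Storage 'Device' must be the Autochanger name")
    # Media Type must be identical on both sides.
    for d in devices:
        if d["Media Type"] != dir_storage["Media Type"]:
            problems.append(f"Media Type mismatch on Device {d['Name']!r}")
    return problems


# Hypothetical values, for illustration only.
autochanger = {"Name": "ExampleChanger", "Device": ["ExampleDrive0", "ExampleDrive1"]}
devices = [
    {"Name": "ExampleDrive0", "Media Type": "ExampleTape"},
    {"Name": "ExampleDrive1", "Media Type": "ExampleTape"},
]
dir_storage = {"Name": "ExampleStorage", "Device": "ExampleChanger", "Media Type": "ExampleTape"}

print(check_vtl_config(autochanger, devices, dir_storage))
```

The check returns readable messages rather than raising on the first problem, so one pass can report every mismatch between the two files.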
As we have seen, Bacula's configuration is very powerful, but it can be quite complex to set up (What? Do I need to add four resources in two different files just to use a tape drive?). That complexity, however, is where its flexibility lies.
AWS Virtual Tape Library is a perfect match for Bacula. We get local, easy-to-access data (resting in the cache of the on-premises VTL appliance) as well as safe, cloud-based offsite backups. We can also leverage Glacier to reduce costs.