Getting started
Erlang 19.x from 'jessie-backports' on Debian Jessie
On Debian Jessie hosts, the role will configure an APT preference for backported Erlang 19.x packages from Debian Stretch. They provide better Elliptic Curve Cryptography (ECC) support and allow deactivation of TLS client-initiated protocol renegotiation, which mitigates potential DoS attacks.
Encrypted client connections
The role will check if the debops.pki and debops.dhparam Ansible roles configured their environment on a host, and will automatically enable or disable support for encrypted AMQP connections. Plaintext connections will be available if encryption is disabled.
RabbitMQ clustering
By default the debops.rabbitmq_server role configures the RabbitMQ service in
standalone mode, without external access through the firewall. To allow
clustering, you need to define the IP addresses and/or CIDR subnets that will be
allowed to connect to the epmd (Erlang Port Mapper Daemon) and einc
(Erlang Inter-Process Communication) TCP ports. To do that, set the variable
below in the Ansible inventory:
---
# Allow for cluster communication
rabbitmq_server__cluster_allow: [ '192.0.2.0/24' ]
After that, re-run the role to apply changes to the firewall configuration.
At the moment the role does not create clusters automatically. To create a cluster
manually from three hosts (host1, host2, host3) with host1
as the main cluster node, log in to the other hosts and, using the root
account, run the following commands:
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@host1
rabbitmqctl start_app
You can check the RabbitMQ cluster status by running the command:
rabbitmqctl cluster_status
See the RabbitMQ Clustering Guide for more details.
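The manual join procedure above can be written down as a small dry-run helper. This is only a sketch: the function name print_join_plan is hypothetical and not part of the role, and it prints the commands for review instead of executing them on any host.

```shell
#!/bin/sh
# Print (do not execute) the commands needed to join each remaining host
# to the cluster whose main node is given as the first argument.
# 'print_join_plan' is a hypothetical helper name used for illustration.
print_join_plan() {
    main_node="$1"
    shift
    for host in "$@"; do
        printf '# on %s, as root:\n' "$host"
        printf 'rabbitmqctl stop_app\n'
        printf 'rabbitmqctl join_cluster rabbit@%s\n' "$main_node"
        printf 'rabbitmqctl start_app\n'
    done
}
```

Calling print_join_plan host1 host2 host3 lists the three rabbitmqctl commands once for host2 and once for host3, each joining rabbit@host1.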
Rolling restart and cluster bootstrap
Starting with RabbitMQ 4.2 the broker uses Khepri (Raft) for metadata
storage by default (in 4.0 and 4.1 it is available as an opt-in feature
flag), which means that restarting a majority of cluster nodes
simultaneously causes a timeout_waiting_for_leader boot deadlock. To
prevent this, the service playbook uses serial: 1 together with
any_errors_fatal: true and max_fail_percentage: 0, and runs a
post-task health check: rabbitmqctl await_startup, then
rabbitmqctl cluster_status, then an assertion that the current node is
listed in running_nodes. Nodes are restarted one at a time and the play
stops on the first failure.
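The node-name comparison performed by the health check can be illustrated in plain shell. This is a minimal sketch, assuming node names of the form rabbit@host or rabbit@host.domain; the function names short_name and node_is_running are hypothetical and only mirror the normalization described here (strip the rabbit@ prefix, strip the domain part, then test membership). It does not call rabbitmqctl itself.

```shell
#!/bin/sh
# Reduce an Erlang node name such as 'rabbit@host1.example.org' to 'host1'.
# 'short_name' and 'node_is_running' are hypothetical helper names.
short_name() {
    name="${1#rabbit@}"          # drop the 'rabbit@' prefix, if present
    printf '%s\n' "${name%%.*}"  # drop everything from the first dot onwards
}

# Succeed if the node given as $1 appears among the remaining arguments,
# which represent the list of running nodes.
node_is_running() {
    wanted="$(short_name "$1")"
    shift
    for node in "$@"; do
        if [ "$(short_name "$node")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, node_is_running host2 rabbit@host1 rabbit@host2.example.org succeeds, while node_is_running host3 rabbit@host1 rabbit@host2 fails. In practice the list of running nodes would come from the output of rabbitmqctl -q --formatter json cluster_status.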
On top of that, the Restart rabbitmq-server handler first calls
rabbitmqctl stop_app (to avoid duplicate_node_name races with EPMD),
restarts the systemd unit and waits for rabbitmqctl await_startup to
return. The handler also carries throttle: 1 as a second line of
defense in case the role is used outside of the service playbook.
Both invocation modes are supported out of the box:
Running the playbook against the whole group at once:
debops run service/rabbitmq_server
serial: 1 forces sequential processing, so nodes are configured and restarted one after another even if the inventory targets the whole cluster.
Running the playbook per host via --limit (useful when the role is not configured to form the cluster automatically and each node needs a manual rabbitmqctl join_cluster in between):
debops run service/rabbitmq_server --limit host1
debops run service/rabbitmq_server --limit host2
debops run service/rabbitmq_server --limit host3
The post-task assertion only checks that the current node itself rejoined the cluster, so it does not trip up either scenario; peer availability is guaranteed by the sequential execution model.
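The per-host invocation mode can also be scripted. The sketch below is an illustration, not part of DebOps: the function name rolling_run and the DEBOPS_CMD override are hypothetical, the latter existing only so the loop can be dry-run with a substitute command such as echo. It runs the service playbook for one host at a time and stops at the first failure, in the same spirit as the sequential play settings.

```shell
#!/bin/sh
# Run the service playbook against one host at a time and stop as soon
# as one run fails. 'rolling_run' is a hypothetical helper name, and
# DEBOPS_CMD is a hypothetical override for dry runs (defaults to 'debops').
rolling_run() {
    for host in "$@"; do
        ${DEBOPS_CMD:-debops} run service/rabbitmq_server --limit "$host" || return 1
    done
}
```

With DEBOPS_CMD=echo set, rolling_run host1 host2 host3 only prints the command arguments it would pass for each host. Any manual rabbitmqctl join_cluster step between hosts still has to be performed separately.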
Inter-node communication is not encrypted
Erlang supports encrypting communication between nodes (processes on the same
or other hosts) using TLS, which RabbitMQ can use to
secure traffic between hosts.
However, one downside is that when inter-node traffic is encrypted,
Erlang uses dynamic random ports
for communication, which might interfere with the host's firewall. Therefore, by
default the debops.rabbitmq_server role does not configure encrypted inter-node
communication. You should consider alternative means of securing the traffic
between hosts, for example a separate VLAN or a VPN connection.
Example inventory
To configure RabbitMQ on a host, add the host to the
[debops_service_rabbitmq_server] Ansible inventory group:
[debops_service_rabbitmq_server]
hostname
Example playbook
If you are using this role without DebOps, here's an example Ansible playbook
that uses the debops.rabbitmq_server role:
---
- name: Manage RabbitMQ service
  collections: [ 'debops.debops' ]
  hosts: [ 'debops_service_rabbitmq_server' ]
  become: True

  # RabbitMQ 4.x with Khepri (Raft) metadata store deadlocks with
  # 'timeout_waiting_for_leader' when a majority of cluster nodes restart
  # in parallel. The three play-level options below force strictly
  # sequential, one-node-at-a-time execution and abort on the first
  # failure so that the cluster stays healthy.
  # DO NOT REMOVE without reading the "Rolling restart and cluster
  # bootstrap" section in
  # docs/ansible/roles/rabbitmq_server/getting-started.rst
  serial: 1
  max_fail_percentage: 0
  any_errors_fatal: true

  environment: '{{ inventory__environment | d({})
                   | combine(inventory__group_environment | d({}))
                   | combine(inventory__host_environment | d({})) }}'

  pre_tasks:

    - name: Prepare rabbitmq_server environment
      ansible.builtin.import_role:
        name: 'rabbitmq_server'
        tasks_from: 'main_env'
      tags: [ 'role::rabbitmq_server', 'role::secret', 'role::rabbitmq_server:config' ]

  roles:

    - role: secret
      tags: [ 'role::secret', 'role::rabbitmq_server', 'role::rabbitmq_server:config' ]
      secret__directories:
        - '{{ rabbitmq_server__secret__directories }}'

    - role: etc_services
      tags: [ 'role::etc_services', 'skip::etc_services' ]
      etc_services__dependent_list:
        - '{{ rabbitmq_server__etc_services__dependent_list }}'

    - role: ferm
      tags: [ 'role::ferm', 'skip::ferm' ]
      ferm__dependent_rules:
        - '{{ rabbitmq_server__ferm__dependent_rules }}'

    - role: rabbitmq_server
      tags: [ 'role::rabbitmq_server', 'skip::rabbitmq_server' ]

  post_tasks:

    - name: Wait for RabbitMQ node to become available
      ansible.builtin.command:
        cmd: 'rabbitmqctl -q await_startup --timeout 120'
      changed_when: false
      check_mode: false

    - name: Get RabbitMQ cluster status
      ansible.builtin.command:
        cmd: 'rabbitmqctl -q --formatter json cluster_status'
      register: rabbitmq_server__register_cluster_status
      changed_when: false
      check_mode: false

    - name: Assert this node rejoined the cluster
      ansible.builtin.assert:
        that:
          - _my_short in _running_short
        fail_msg: |
          Node rabbit@{{ ansible_hostname }} did not rejoin the cluster
          cleanly. running_nodes={{ _running }}.
      vars:
        _running: "{{ (rabbitmq_server__register_cluster_status.stdout
                       | from_json).running_nodes | default([]) }}"
        _running_short: "{{ _running
                            | map('regex_replace', '^rabbit@', '')
                            | map('regex_replace', '\\..*$', '')
                            | list }}"
        _my_short: "{{ ansible_hostname | regex_replace('\\..*$', '') }}"