Commit b371c92

MXS-6032 Update and clean up MaxScale failover tutorial
1 parent 5433195 commit b371c92

1 file changed

+90
-84
lines changed

maxscale/mariadb-maxscale-tutorials/automatic-failover-with-mariadb-monitor.md

Lines changed: 90 additions & 84 deletions
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,18 @@
11
# Automatic Failover With MariaDB Monitor
22

3-
The [MariaDB Monitor](../reference/maxscale-monitors/mariadb-monitor.md) is not only capable of monitoring the state of a MariaDB primary-replica cluster but is also capable of performing _failover_ and _switchover_. In addition, in some circumstances it is capable of _rejoining_ a primary that has gone down and later reappears.
4-
5-
Note that the failover (and switchover and rejoin) functionality is only supported in conjunction with GTID-based replication and initially only for simple topologies, that is, 1 primary and several replicas.
6-
7-
The failover, switchover and rejoin functionality are inherent parts of the _MariaDB Monitor_, but neither automatic failover nor automatic rejoin are enabled by default.
8-
9-
The following examples have been written with the assumption that there are four servers - `server1`, `server2`, `server3` and `server4` - of which `server1` is the initial primary and the other servers are replicas. In addition there is a monitor called _TheMonitor_ that monitors those servers.
10-
11-
Somewhat simplified, the MaxScale configuration file would look like:
3+
[MariaDB Monitor](../reference/maxscale-monitors/mariadb-monitor.md) can do more than just monitor
4+
the state of a MariaDB replication cluster. The monitor can perform cluster manipulation operations
5+
such as *failover*, *switchover* and *rejoin*. By default, these operations are launched manually,
6+
but they can be configured to also trigger automatically. All replication modifying operations
7+
assume GTID-based replication, and will refuse to launch or may work incorrectly when using
8+
file-and-position-based replication. Also, the operations are mainly designed to work with simple
9+
topologies, i.e. 1 primary and one to multiple replicas. More complicated setups (multilayered
10+
replication, multimaster rings etc.) may work, but should be tested separately to ensure the results
11+
are predictable.
12+
13+
The following examples have four servers: *server1* is the initial primary and *server2* to
14+
*server4* are replicas. The servers are monitored by *TheMonitor*. The MaxScale configuration file
15+
would look as follows:
1216

1317
```ini
1418
[server1]
@@ -34,8 +38,7 @@ servers=server1,server2,server3,server4
3438

3539
## Manual Failover
3640

37-
If everything is in order, the state of the cluster will look something
38-
like this:
41+
If everything is in order, the state of the cluster looks like this:
3942

4043
```bash
4144
$ maxctrl list servers
@@ -52,8 +55,7 @@ $ maxctrl list servers
5255
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
5356
```
5457

55-
If the primary now for any reason goes down, then the cluster state will
56-
look like this:
58+
If the primary server goes down, the cluster looks like this:
5759

5860
```bash
5961
$ maxctrl list servers
@@ -70,27 +72,31 @@ $ maxctrl list servers
7072
└─────────┴─────────────────┴──────┴─────────────┴────────────────┘
7173
```
7274

73-
Note that the status for `server1` is _Down_.
74-
75-
Since failover is by default _not_ enabled, the failover mechanism must be
76-
invoked manually:
75+
Since automatic failover is _not_ enabled, failover needs to be invoked manually:
7776

7877
```bash
7978
$ maxctrl call command mariadbmon failover TheMonitor
8079
OK
8180
```
8281

83-
There are quite a few arguments, so let's look at each one separately_`call command` indicates that it is a module command that is to be_
84-
\&#xNAN;_invoked,_ `mariadbmon` indicates the module whose command we want to invoke (that
85-
is the MariaDB Monitor),_`failover` is the command we want to invoke, and_ `TheMonitor` is the first and only argument to that command, the name of
86-
the monitor as specified in the configuration file.
82+
The MaxCtrl command invocation is composed of the following parts:
83+
1. `call command` launches a module command
84+
2. `mariadbmon` is the module which implements the command
85+
3. `failover` is the command to invoke
86+
4. `TheMonitor` is the first and only argument to the command, specifying the target monitor
8787

88-
The MariaDB Monitor will now autonomously deduce which replica is the most
89-
appropriate one to be promoted to primary, promote it to primary and modify
90-
the other replicas accordingly.
88+
In MaxScale 25.10 and later, the configured monitor name can be used as the module name. The above
89+
command invocation can thus be shortened to `maxctrl call command TheMonitor failover`. This
90+
alternate form works for module commands in general.
9191

92-
If we now check the cluster state we will see that one of the remaining
93-
replicas has been made into primary.
92+
During failover, *TheMonitor* selects the best replica, promotes it to primary and modifies the
93+
other replicas to replicate from the new primary. The main criteria for *best replica* is being most
94+
up-to-date. If the best replica has unprocessed events in its relay log, meaning it has received
95+
binary log events from the old primary but not processed them, then failover will stall until the
96+
processing is complete. If multiple replicas are equally good, then the monitor prefers to promote
97+
servers in the order stated in the *servers*-setting.
98+
99+
After failover completes, the cluster should look like:
94100

95101
```bash
96102
$ maxctrl list servers
@@ -107,8 +113,7 @@ $ maxctrl list servers
107113
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
108114
```
109115

110-
If `server1` now reappears, it will not be rejoined to the cluster, as
111-
shown by the following output:
116+
If *server1* comes back online, it will not be rejoined to the cluster:
112117

113118
```bash
114119
$ maxctrl list servers
@@ -125,16 +130,13 @@ $ maxctrl list servers
125130
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
126131
```
127132

128-
Had `auto_rejoin=true` been specified in the monitor section, then an
129-
attempt to rejoin `server1` would have been made.
130-
131-
In MaxScale 2.2.1, rejoining cannot be initiated manually, but in a
132-
subsequent version a command to that effect will be provided.
133+
This case can be handled by the [rejoin-command](#rejoin). For more details on what exactly failover
134+
does, see [MariaDB Monitor documentation](../reference/maxscale-monitors/mariadb-monitor.md#operation-details).
133135

134136
## Automatic Failover
135137

136-
To enable automatic failover, simply add `auto_failover=true` to the
137-
monitor section in the configuration file.
138+
To enable automatic failover, simply add `auto_failover=true` to the monitor section in the
139+
configuration file.
138140

139141
```ini
140142
[TheMonitor]
@@ -145,7 +147,7 @@ auto_failover=true
145147
...
146148
```
147149

148-
When everything is running fine, the cluster state looks like follows:
150+
When everything is running fine, the cluster state is as follows:
149151

150152
```bash
151153
$ maxctrl list servers
@@ -162,8 +164,8 @@ $ maxctrl list servers
162164
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
163165
```
164166

165-
If `server1` now goes down, failover will automatically be performed and
166-
an existing replica promoted to new primary.
167+
If *server1* goes down, the monitor performs failover automatically and promotes an existing replica
168+
to primary.
167169

168170
```bash
169171
$ maxctrl list servers
@@ -180,46 +182,44 @@ $ maxctrl list servers
180182
└─────────┴─────────────────┴──────┴─────────────┴────────────────────────┘
181183
```
182184

183-
If you are continuously monitoring the server states, you may notice for a
184-
brief period that the state of `server1` is _Down_ and the state of`server2` is still _Slave, Running_.
185+
If you are continuously monitoring the server states, you may notice for a brief period that the
186+
state of *server1* is _Down_ and the state of *server2* is still _Slave, Running_. This is because
187+
the monitor does not begin failover immediately as the server goes down, as the problem could be
188+
temporary. The monitor waits for *server1* to come back for
189+
[failcount](../reference/maxscale-monitors/mariadb-monitor.md#failcount) monitor intervals. The
190+
recommended value for *failcount* depends on *monitor_interval* and the stability of the network.
191+
192+
```ini
193+
[TheMonitor]
194+
type=monitor
195+
module=mariadbmon
196+
servers=server1,server2,server3,server4
197+
auto_failover=true
198+
monitor_interval=2s
199+
failcount=5
200+
...
201+
```
185202

186203
## Rejoin
187204

188-
To enable automatic rejoin, simply add `auto_rejoin=true` to the
189-
monitor section in the configuration file.
205+
To enable automatic rejoin, simply add `auto_rejoin=true` to the monitor section in the
206+
configuration file.
190207

191208
```
192209
[TheMonitor]
193210
type=monitor
194211
module=mariadbmon
195212
servers=server1,server2,server3,server4
213+
auto_failover=true
196214
auto_rejoin=true
197215
...
198216
```
199217

200-
When automatic rejoin is enabled, the MariaDB Monitor will attempt to
201-
rejoin a failed primary as a replica, if it reappears.
202-
203-
When everything is running fine, the cluster state looks like follows:
218+
When automatic rejoin is enabled, MariaDB Monitor will attempt to rejoin a failed primary as a
219+
replica should it come back online.
204220

205-
```bash
206-
$ maxctrl list servers
207-
┌─────────┬─────────────────┬──────┬─────────────┬─────────────────┐
208-
│ Server │ Address │ Port │ Connections │ State │
209-
├─────────┼─────────────────┼──────┼─────────────┼─────────────────┤
210-
│ server1 │ 192.168.121.51 │ 3306 │ 0 │ Master, Running │
211-
├─────────┼─────────────────┼──────┼─────────────┼─────────────────┤
212-
│ server2 │ 192.168.121.190 │ 3306 │ 0 │ Slave, Running │
213-
├─────────┼─────────────────┼──────┼─────────────┼─────────────────┤
214-
│ server3 │ 192.168.121.112 │ 3306 │ 0 │ Slave, Running │
215-
├─────────┼─────────────────┼──────┼─────────────┼─────────────────┤
216-
│ server4 │ 192.168.121.201 │ 3306 │ 0 │ Slave, Running │
217-
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
218-
```
219-
220-
Assuming `auto_failover=true` has been specified in the configuration
221-
file, when `server1` goes down for some reason, failover will be performed
222-
and we end up with the following cluster state:
221+
In the next example, failover (either automatic or manual) has promoted *server2* to replace failed
222+
primary *server1*:
223223

224224
```bash
225225
$ maxctrl list servers
@@ -236,16 +236,12 @@ $ maxctrl list servers
236236
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
237237
```
238238

239-
If `server1` now reappears, the MariaDB Monitor will detect that and
240-
attempt to rejoin the old primary as a replica.
241-
242-
Whether rejoining will succeed depends upon the actual state of the old
243-
primary. For instance, if the old primary was modified and the changes had
244-
not been replicated to the new primary, before the old primary went down,
245-
then automatic rejoin will not be possible.
239+
If *server1* now reappears, the monitor will detect that and attempt to rejoin the old primary as a
240+
replica. Rejoin is not always possible: If the old primary processed a write (just before crashing)
241+
and that write was never replicated to the new primary, then automatic rejoin will not be possible
242+
as the old and new primaries have diverged.
246243

247-
If rejoining can be performed, then the cluster state will end up looking
248-
like:
244+
If rejoin succeeds, then the cluster state will end up looking like:
249245

250246
```bash
251247
$ maxctrl list servers
@@ -264,25 +260,32 @@ $ maxctrl list servers
264260

265261
## Switchover
266262

267-
Switchover is for cases when you explicitly want to move the primary
268-
role from one server to another.
263+
Switchover is for cases when you explicitly want to move the primary role from one server to
264+
another. Switchover is safer than failover, as switchover prevents writes to the cluster during the
265+
operation.
269266

270-
If we continue from the cluster state at the end of the previous example
271-
and want to make `server1` primary again, then we must issue the following
272-
command:
267+
Continuing from the cluster state at the end of the previous example, to make *server1* primary
268+
again, issue the following command:
273269

274270
```bash
275271
$ maxctrl call command mariadbmon switchover TheMonitor server1 server2
276272
OK
277273
```
278274

279-
There are quite a few arguments, so let's look at each one separately_`call command` indicates that it is a module command that is to be_
280-
\&#xNAN;_invoked,_ `mariadbmon` indicates the module whose command we want to invoke,_`switchover` is the command we want to invoke, and_ `TheMonitor` is the first argument to the command, the name of the monitor
281-
as specified in the configuration file,_`server1` is the second argument to the command, the name of the server we_
282-
\&#xNAN;_want to make into primary, and_ `server2` is the third argument to the command, the name of the _currentprimary_.
275+
The MaxCtrl command invocation is composed of the following parts:
276+
1. `call command` launches a module command
277+
2. `mariadbmon` is the module which implements the command
278+
3. `switchover` is the command to invoke
279+
4. `TheMonitor` specifies the target monitor
280+
5. `server1` is the server to promote
281+
6. `server2` is the server to demote, the current primary
283282

284-
If the command executes successfully, we will end up with the following
285-
cluster state:
283+
Specifying the current primary is optional. The name of the new primary server can also be left out
284+
if autoselection is tolerable, leaving just `maxctrl call command mariadbmon switchover TheMonitor`.
285+
As with *failover*, in MaxScale 25.10, the configured monitor name can be used as the module name:
286+
`maxctrl call command TheMonitor switchover`.
287+
288+
If the switchover succeeds, we end up with the following cluster state:
286289

287290
```bash
288291
$ maxctrl list servers
@@ -299,6 +302,9 @@ $ maxctrl list servers
299302
└─────────┴─────────────────┴──────┴─────────────┴─────────────────┘
300303
```
301304

305+
For more details on what exactly switchover does, see
306+
[MariaDB Monitor documentation](../reference/maxscale-monitors/mariadb-monitor.md#operation-details).
307+
302308
<sub>_This page is licensed: CC BY-SA / Gnu FDL_</sub>
303309

304310
{% @marketo/form formId="4316" %}
