Upgrading bitnami/postgres-ha from 14 to 15 in a production k8s environment: experience and process

Summary: a fully automated major-version upgrade of bitnami/postgres-ha from 14 to 15 using pg_upgrade

Infrastructural notes

Reasons to upgrade

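Initially, the product I work on planned a k8s-based clustered edition, and postgres needed a reliable HA setup with automatic primary/standby failover. Not wanting to reinvent the wheel, we chose bitnami/postgres-ha. During evaluation we found that with witness/pg_rewind enabled, pg restarted repeatedly, so we went to production with witness/pg_rewind turned off. Later, one of our pg test clusters, with roughly 100 GB of storage, lost all of its data once; after investigating, we decided to enable pg_rewind to prevent data loss. While researching this we found that the latest bitnami/postgres-ha is based on postgres 15, that pg 14 does not work once pg_rewind is enabled, and that postgres 15 had already been proven in production for some time, so we decided to try upgrading from pg14 to 15.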

The upgrade approaches currently offered officially include:

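  • Dump & restore: the simplest and also the slowest method, with the longest downtime. In one of our test environments we streamed the dump straight into the new server without writing it to disk, and 30 GB of data took about 30 minutes; a larger database or different hardware may give different timings.
  • Logical replication: the approach with the shortest downtime (once all database instances have finished syncing, switching from the old cluster to the new one makes the service unavailable for a few seconds). Logical replication has been built in since pg10.
  • PostgreSQL official upgrade tool: pg_upgrade, an officially maintained command-line tool for exactly this scenario.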
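
For reference, the "no intermediate file" variant of dump & restore from the first option can be as simple as piping the dump into the new server. This is a minimal sketch, assuming both servers are reachable from one host; the function name and both connection strings are placeholders, not values from our environment.

```bash
#!/usr/bin/env bash
# Sketch: dump & restore without writing the dump to disk.
# $1 = connection string of the old server, $2 = of the new server (placeholders).

dump_restore_pipe() {
  # pg_dumpall emits plain SQL for roles and every database;
  # psql replays it on the new server and stops at the first error.
  # Note: the pipeline's exit status is psql's; a failing pg_dumpall
  # is only reported if the caller has enabled `set -o pipefail`.
  pg_dumpall --dbname="$1" | psql -X -v ON_ERROR_STOP=1 --dbname="$2"
}

# Usage (hypothetical hosts):
# dump_restore_pipe 'postgres://postgres:test@old-host:5432/postgres' \
#                   'postgres://postgres:test@new-host:5432/postgres'
```

The service has to stay stopped (or read-only) for the whole run, which is why this option has the longest downtime.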

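Our scenario requires an in-place upgrade: replacing the old cluster with a new bitnami/postgres-ha cluster is not acceptable, but a fairly short maintenance window (with the service stopped) is allowed. We therefore chose option 3, the PostgreSQL official upgrade tool pg_upgrade. If you have strict downtime requirements, option 2, logical replication, is still the recommended choice.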

Our setup is containerized, with all database and business services deployed in a single k8s cluster, so we first get familiar with pg_upgrade in a docker environment.

Exploring the postgres upgrade process with docker

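During our research we came across a migration approach well suited to containers, described in the blog post Terabyte-scale PostgreSQL upgrade from 9.6 to 14, whose authors built a custom docker image for their upgrade. The project describes itself as follows: This is a PoC for using pg_upgrade inside Docker -- learn from it, adapt it for your needs; don't expect it to work as-is!. Below we use this method to reproduce a major-version upgrade.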

Upgrade process

We first start a postgres 14 instance, then use tianon/postgres-upgrade:14-to-15 to upgrade from 14 to 15, and finally start pg15 to verify that the upgrade succeeded.

First, the docker-compose.yaml file:

version: "3"
services:
  db14:
    image: postgres:14-bookworm
    container_name: db14
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: test
      POSTGRES_DB: test
      PGDATA: /var/lib/postgresql/all/db14
    volumes:
      - ./data/:/var/lib/postgresql/all
    ports:
      - 5432:5432

  db15:
    image: postgres:15-bookworm
    container_name: db15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: test
      POSTGRES_DB: test
      PGDATA: /var/lib/postgresql/all/db15
    volumes:
      - ./data/:/var/lib/postgresql/all
    ports:
      - 5433:5432

  upgrade:
    image: tianon/postgres-upgrade:14-to-15
    container_name: upgrade
    environment:
      PGDATAOLD: /var/lib/postgresql/all/db14
      PGDATANEW: /var/lib/postgresql/all/db15
      POSTGRES_USER: test
      POSTGRES_PASSWORD: password
    command: ["tail", "-f", "/dev/null"]
    volumes:
      - ./data/:/var/lib/postgresql/all

First, start pg 14 and create some data.

docker-compose up -d db14
[+] Building 0.0s (0/0)                                                          docker-container:multiarch
[+] Running 2/2
  Network docker_default  Created                                                                     0.1s
  Container db14          Started

export database='postgres://postgres:test@localhost:5432/test?sslmode=disable'
docker-compose exec db14 psql "$database" -c 'select version();'
------------------------------------------------------------------------------------------------------------
-----------------
 PostgreSQL 14.10 (Debian 14.10-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14
) 12.2.0, 64-bit
(1 row)

docker-compose exec db14 psql "$database" -c 'CREATE TABLE IF NOT EXISTS t_test( ID INT NOT NULL, NAME      TEXT  NOT NULL, AGE      INT   NOT NULL, ADDRESS    CHAR(50), SALARY     REAL );'
CREATE TABLE

docker-compose exec db14 psql "$database" -c 'select count(*) from t_test;'
 count
-------
     0
(1 row)

docker-compose exec db14 psql "$database" -c 'insert into t_test SELECT generate_series(1,1) as key,repeat( chr(int4(random()*26)+65),4), (random()*(6^2))::integer,null,(random()*(10^4))::integer;'
INSERT 0 1

docker-compose exec db14 psql "$database" -c 'select count(*) from t_test;'
 count
-------
     1
(1 row)

Start the upgrade

docker-compose stop db14
[+] Stopping 1/1
 ✔ Container db14  Stopped
docker-compose up -d upgrade

[+] Running 1/1
 ✔ Container upgrade  Started
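
Before the real run, pg_upgrade's own consistency checks can be executed without migrating any data by passing `--check`. This is a minimal sketch, assuming the image's docker-upgrade wrapper hands extra arguments through to pg_upgrade (it still runs initdb for the new cluster first); the `COMPOSE` variable and the function name are hypothetical conveniences.

```bash
#!/usr/bin/env bash
# Sketch: run only pg_upgrade's consistency checks in the "upgrade" service.
# COMPOSE is a hypothetical override, e.g. COMPOSE="docker compose".

pg_upgrade_check() {
  # -T disables TTY allocation so the helper also works from scripts.
  ${COMPOSE:-docker-compose} exec -T upgrade docker-upgrade pg_upgrade --check
}

# Usage:
# pg_upgrade_check && docker-compose exec upgrade docker-upgrade pg_upgrade
```
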
docker-compose exec upgrade docker-upgrade pg_upgrade

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/all/db15 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/all/db15 -l logfile start

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting oldest XID for new cluster                          ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Copying user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to delete old cluster                       ok
Checking for extension updates                              ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/lib/postgresql/15/bin/vacuumdb --all --analyze-in-stages

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
----------------
ExecutionTime: 0h:00m:12s
docker-compose up -d db15

[+] Running 1/1
 ✔ Container db15  Started
docker-compose logs db15

db15  |
db15  | PostgreSQL Database directory appears to contain a database; Skipping initialization
db15  |
db15  | 2023-12-24 02:50:09.279 UTC [1] LOG:  starting PostgreSQL 15.5 (Debian 15.5-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
db15  | 2023-12-24 02:50:09.279 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db15  | 2023-12-24 02:50:09.279 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db15  | 2023-12-24 02:50:09.282 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db15  | 2023-12-24 02:50:09.290 UTC [30] LOG:  database system was shut down at 2023-12-24 02:48:38 UTC
db15  | 2023-12-24 02:50:09.304 UTC [1] LOG:  database system is ready to accept connections
db15  | 2023-12-24 02:50:24.376 UTC [34] WARNING:  database "test" has a collation version mismatch
db15  | 2023-12-24 02:50:24.376 UTC [34] DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
db15  | 2023-12-24 02:50:24.376 UTC [34] HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE test REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
db15  | 2023-12-24 02:50:39.360 UTC [35] WARNING:  database "postgres" has a collation version mismatch
db15  | 2023-12-24 02:50:39.360 UTC [35] DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
db15  | 2023-12-24 02:50:39.360 UTC [35] HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
docker-compose exec db15 psql "$database" -c 'ALTER DATABASE test REFRESH COLLATION VERSION;'

WARNING:  database "test" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE test REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
NOTICE:  changing version from 2.31 to 2.36
ALTER DATABASE
docker-compose exec db15 vacuumdb --username=postgres --all --analyze-in-stages

WARNING:  database "postgres" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
WARNING:  database "postgres" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
WARNING:  database "template1" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "test": Generating minimal optimizer statistics (1 target)
WARNING:  database "postgres" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
WARNING:  database "template1" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "test": Generating medium optimizer statistics (10 targets)
WARNING:  database "postgres" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
WARNING:  database "template1" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
vacuumdb: processing database "template1": Generating default (full) optimizer statistics
vacuumdb: processing database "test": Generating default (full) optimizer statistics
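
The vacuumdb output above still warns about postgres and template1 because only the test database had its collation version refreshed. A small helper can generate the statement for every database; this is a sketch with a hypothetical function name, and it only prints SQL. Note that REFRESH COLLATION VERSION merely updates the recorded version; as the HINT says, objects that depend on the collation may need to be rebuilt first.

```bash
#!/usr/bin/env bash
# Sketch: emit one REFRESH COLLATION VERSION statement per database name
# read from stdin. Pure text processing; nothing is executed here.

gen_refresh_sql() {
  local db quoted
  while IFS= read -r db; do
    [ -n "$db" ] || continue                      # skip empty lines
    # Double any embedded double quote so the identifier stays valid.
    quoted=$(printf '%s' "$db" | sed 's/"/""/g')
    printf 'ALTER DATABASE "%s" REFRESH COLLATION VERSION;\n' "$quoted"
  done
}

# Usage (assumes the db15 service from docker-compose.yaml):
# docker-compose exec -T db15 psql -U postgres -At \
#     -c 'SELECT datname FROM pg_database WHERE datallowconn' |
#   gen_refresh_sql |
#   docker-compose exec -T db15 psql -U postgres -v ON_ERROR_STOP=1
```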

Verify that the upgrade succeeded

docker-compose exec db15 psql "$database" -c 'select count(*) from t_test;'

 count
-------
     1
(1 row)
docker-compose exec db15 psql "$database" -c 'select * from t_test;'

 id | name | age | address | salary
----+------+-----+---------+--------
  1 | EEEE |  29 |         |   9179
(1 row)

Verification passed.
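
With more than one row, eyeballing the output stops scaling. One option is to compare a cheap fingerprint of the table taken before and after the upgrade. This is only a sketch: the function name and the `PSQL` override are hypothetical, it assumes psql is available on the host and the ports from docker-compose.yaml are published, and it assumes the text form of every column is identical on both versions.

```bash
#!/usr/bin/env bash
# Sketch: print "rowcount|md5" for t_test, hashing all rows in id order.
# $1 = connection string; PSQL is a hypothetical override for the psql binary.

table_fingerprint() {
  ${PSQL:-psql} -X -At -c \
    "SELECT count(*), md5(coalesce(string_agg(t::text, ',' ORDER BY id), '')) FROM t_test AS t" \
    "$1"
}

# Usage:
# before=$(table_fingerprint 'postgres://postgres:test@localhost:5432/test?sslmode=disable')  # db14 running
# after=$(table_fingerprint 'postgres://postgres:test@localhost:5433/test?sslmode=disable')   # db15 running
# [ "$before" = "$after" ] && echo "t_test unchanged"
```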

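Upgrade caveats

If you use PostgreSQL extensions, they must be installed manually

Install them manually in the upgrade container, for the new PG version as well as the old one: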

apt update && apt install postgresql-14-pgaudit
apt update && apt install postgresql-15-pgaudit

Old postgres:14-alpine, pg_upgrade run from postgres:15-bookworm, new runtime postgres:15-alpine

Running pg_upgrade with postgres:15-bookworm produced a warning:

database "postgres" has no actual collation version, but a version was recorded

REINDEX
REINDEX DATABASE splat;
ALTER DATABASE splat REFRESH COLLATION VERSION;
ERROR:  invalid collation version change

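A change in the glibc version or OS version causes errors after pg_upgrade. The official recommendation is to use the same Linux runtime for the old, new, and upgrade containers; in this example, all of them should use alpine:3.14.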

Finally, I wrote a complete automated verification script:

# docker-compose.yaml
version: "3"
services:
  db14:
    image: postgres:14-bookworm
    container_name: db14
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: test
      POSTGRES_DB: test
      PGDATA: /var/lib/postgresql/all/db14
    volumes:
      - ./data/:/var/lib/postgresql/all
    ports:
      - 5432:5432

  db15:
    image: postgres:15-bookworm
    container_name: db15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: test
      POSTGRES_DB: test
      PGDATA: /var/lib/postgresql/all/db15
    volumes:
      - ./data/:/var/lib/postgresql/all
    ports:
      - 5433:5432

  upgrade:
    image: tianon/postgres-upgrade:14-to-15
    container_name: upgrade
    environment:
      PGDATAOLD: /var/lib/postgresql/all/db14
      PGDATANEW: /var/lib/postgresql/all/db15
      POSTGRES_USER: test
      POSTGRES_PASSWORD: password
    command: ["tail", "-f", "/dev/null"]
    volumes:
      - ./data/:/var/lib/postgresql/all
#!/usr/bin/env bash
# test.sh

set -o errtrace
set -o errexit
set -o nounset
set -o pipefail
set -o xtrace

cd "$(dirname "$0")"

export database='postgres://postgres:test@localhost:5432/test?sslmode=disable'

docker-compose down
rm -rf data
sleep 3

docker-compose up -d db14
sleep 5
docker-compose exec db14 pg_isready -d "$database" -t 20

docker-compose exec db14 psql "$database" -c 'select version();'
docker-compose exec db14 psql "$database" -c 'CREATE TABLE IF NOT EXISTS t_test( ID INT NOT NULL, NAME      TEXT  NOT NULL, AGE      INT   NOT NULL, ADDRESS    CHAR(50), SALARY     REAL );'
docker-compose exec db14 psql "$database" -c 'select count(*) from t_test;'
docker-compose exec db14 psql "$database" -c 'insert into t_test SELECT generate_series(1,1) as key,repeat( chr(int4(random()*26)+65),4), (random()*(6^2))::integer,null,(random()*(10^4))::integer;'
docker-compose exec db14 psql "$database" -c 'select count(*) from t_test;'

docker-compose stop db14

docker-compose up -d upgrade

docker-compose exec upgrade docker-upgrade pg_upgrade

docker-compose up -d db15
sleep 3
docker-compose exec db15 psql "$database" -c 'ALTER DATABASE test REFRESH COLLATION VERSION;'
docker-compose exec db15 vacuumdb --username=postgres --all --analyze-in-stages

docker-compose logs -f db15
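
The fixed `sleep` calls in test.sh are the fragile part of the script: on a slow machine the server may not be up yet. A small retry helper is one way around that. This is a sketch with a hypothetical function name, not part of the original script.

```bash
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds, instead of sleeping a fixed time.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]

wait_until() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0                      # command succeeded, stop waiting
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                          # gave up after <attempts> tries
}

# Usage in test.sh, replacing "sleep 3" before the first psql call on db15:
# wait_until 30 1 docker-compose exec -T db15 pg_isready -U postgres
```

Because test.sh runs with `set -o errexit`, a timeout makes the script stop instead of continuing against a server that is not ready.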

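Attempting the bitnami/postgres-ha upgrade on k8s

Terabyte-scale PostgreSQL upgrade from 9.6 to 14 describes an in-place upgrade method; here we reproduce it and automate the process.

Reproducing the upgrade method.

First, create a postgres 14 pg cluster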

helm pull oci://registry-1.docker.io/bitnamicharts/postgresql-ha --version 12.3.1
helm upgrade --install shared postgresql-ha-12.3.1.tgz --set postgresql.image.tag='14.10.0-debian-11-r6' --set postgresql.replicaCount=1 --set postgresql.sharedPreloadLibraries='"pgaudit, repmgr, pg_stat_statements"'

Next, prepare the upgrade pod

# Stop pg
kubectl scale --replicas=0 sts shared-postgresql-ha-postgresql

cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: pgupgrade
spec:
  completions: 1
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: data-sharedx-pgha-0
          persistentVolumeClaim:
            claimName: data-shared-postgresql-ha-postgresql-0
      containers:
        - name: pg-upgrade
          image: tianon/postgres-upgrade:14-to-15
          imagePullPolicy: IfNotPresent
          command:
            - "sh"
            - -c
            - "tail -f /dev/null"
          env:
            - name: PGDATABASE
              value: /bitnami/postgresql/
            - name: PGDATAOLD
              value: /bitnami/postgresql/data
            - name: PGDATANEW
              value: /bitnami/postgresql/datanew
          volumeMounts:
            - mountPath: "/bitnami/postgresql"
              name: data-sharedx-pgha-0
EOF

# Get a shell in the upgrade pod
kubectl exec -it $(kubectl get pod -l batch.kubernetes.io/job-name=pgupgrade -o name) -- bash
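
The `kubectl exec` above fails if the Job's pod is not running yet, for example while the image is still being pulled. A possible guard is to wait for the pod first. This is a sketch: the function name and the `KUBECTL` override are hypothetical, and `kubectl wait` itself errors out if the pod object has not been created at all, so a short retry around it may still be needed.

```bash
#!/usr/bin/env bash
# Sketch: block until the pod of the "pgupgrade" Job reports Ready.
# $1 = optional timeout (default 120s); KUBECTL is a hypothetical override.

wait_for_upgrade_pod() {
  ${KUBECTL:-kubectl} wait pod \
    -l batch.kubernetes.io/job-name=pgupgrade \
    --for=condition=Ready --timeout="${1:-120s}"
}

# Usage:
# wait_for_upgrade_pod 300s && \
#   kubectl exec -it "$(kubectl get pod -l batch.kubernetes.io/job-name=pgupgrade -o name)" -- bash
```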

First attempt

docker-upgrade pg_upgrade


Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

*failure*
Consult the last few lines of "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T060414.361/log/pg_upgrade_server.log" for
the probable cause of the failure.

connection to server on socket "/var/lib/postgresql/.s.PGSQL.50432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?

could not connect to source postmaster started with the command:
"/usr/lib/postgresql/14/bin/pg_ctl" -w -l "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T060414.361/log/pg_upgrade_server.log" -D "/bitnami/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql'" start
Failure, exiting

cat /bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T060414.361/log/pg_upgrade_server.log
-----------------------------------------------------------------
  pg_upgrade run on Tue Nov 28 06:04:14 2023
-----------------------------------------------------------------

command: "/usr/lib/postgresql/14/bin/pg_ctl" -w -l "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T060414.361/log/pg_upgrade_server.log" -D "/bitnami/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql'" start >> "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T060414.361/log/pg_upgrade_server.log" 2>&1
waiting for server to start....postgres: could not access the server configuration file "/bitnami/postgresql/data/postgresql.conf": No such file or directory
 stopped waiting
pg_ctl: could not start server
Examine the log output.

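It looks like the old data directory needs a postgresql.conf before the server can start, so copy the old configuration into the data directory. Note that this command runs in the original postgres pod, which has to be running: do it before scaling the StatefulSet down to 0, or scale it back up temporarily.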

kubectl exec -it shared-postgresql-ha-postgresql-0 -- cp /opt/bitnami/postgresql/conf/postgresql.conf /bitnami/postgresql/data
root@task-pg-upgrade-545cbc8cdf-6z8mf:/var/lib/postgresql# cat /bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T064300.020/log/pg_upgrade_server.log
-----------------------------------------------------------------
  pg_upgrade run on Tue Nov 28 06:43:00 2023
-----------------------------------------------------------------

command: "/usr/lib/postgresql/14/bin/pg_ctl" -w -l "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T064300.020/log/pg_upgrade_server.log" -D "/bitnami/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql'" start >> "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T064300.020/log/pg_upgrade_server.log" 2>&1
waiting for server to start....2023-11-28 06:43:00.206 GMT [177] LOG:  could not open configuration directory "/bitnami/postgresql/data/conf.d": No such file or directory
2023-11-28 06:43:00.207 GMT [177] FATAL:  configuration file "/bitnami/postgresql/data/postgresql.conf" contains errors
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Fix this by running mkdir -p /bitnami/postgresql/data/conf.d

cat /bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T064744.773/log/pg_upgrade_server.log
-----------------------------------------------------------------
  pg_upgrade run on Tue Nov 28 06:47:44 2023
-----------------------------------------------------------------

command: "/usr/lib/postgresql/14/bin/pg_ctl" -w -l "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T064744.773/log/pg_upgrade_server.log" -D "/bitnami/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql'" start >> "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T064744.773/log/pg_upgrade_server.log" 2>&1
waiting for server to start....2023-11-28 06:47:44.970 GMT [255] FATAL:  58P01: could not access file "repmgr": No such file or directory
2023-11-28 06:47:44.970 GMT [255] LOCATION:  internal_load_library, dfmgr.c:208
2023-11-28 06:47:44.970 GMT [255] LOG:  00000: database system is shut down
2023-11-28 06:47:44.970 GMT [255] LOCATION:  UnlinkLockFiles, miscinit.c:970
 stopped waiting
pg_ctl: could not start server
Examine the log output.

The configuration preloads repmgr and pgaudit. They are not needed during the upgrade and can be removed from the configuration:

sed -i 's/repmgr, pgaudit, repmgr, //' /bitnami/postgresql/data/postgresql.conf

Retry after the fix

command: "/usr/lib/postgresql/14/bin/pg_ctl" -w -l "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T070009.292/log/pg_upgrade_server.log" -D "/bitnami/postgresql/data" -o "-p 50432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql'" start >> "/bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T070009.292/log/pg_upgrade_server.log" 2>&1
waiting for server to start....2023-11-28 07:00:09.671 GMT [522] FATAL:  58P01: could not open log file "/opt/bitnami/postgresql/logs/postgresql.log": No such file or directory
2023-11-28 07:00:09.671 GMT [522] LOCATION:  logfile_open, syslogger.c:1223
2023-11-28 07:00:09.688 GMT [522] LOG:  00000: database system is shut down
2023-11-28 07:00:09.688 GMT [522] LOCATION:  UnlinkLockFiles, miscinit.c:970
 stopped waiting
pg_ctl: could not start server
Examine the log output.
mkdir -p /opt/bitnami/postgresql/logs/
chown -R 999:999 /opt/bitnami/postgresql/logs/
chown -R 999:999 /opt/bitnami/postgresql/

Retry after the fix

connection to server on socket "/tmp/.s.PGSQL.50432" failed: fe_sendauth: no password supplied
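echo 'local all all trust' >> /bitnami/postgresql/data/pg_hba.conf # does not work: pg_hba.conf uses the first matching record, and the earlier md5 line still matches first

sed -i 's/local    all              all                    md5/local    all              all                    trust/' /bitnami/postgresql/data/pg_hba.conf # this works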

Retry after the fix

Failure, exiting

cat /bitnami/postgresql/datanew/pg_upgrade_output.d/20231128T081622.981/loadable_libraries.txt
could not load library "$libdir/repmgr": ERROR:  could not access file "$libdir/repmgr": No such file or directory
In database: repmgr

https://opensource-db.com/the-quick-and-easy-way-to-upgrade-postgres-using-pg_upgrade/

apt install postgresql-15-repmgr

Finally, it works.
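
The file-level fixes from the attempts above can be collected into one helper. This is a rough sketch, not the script from exfly/bitnami-pg-upgrade: the function name is hypothetical, it assumes postgresql.conf has already been copied into the data directory and that shared_preload_libraries is written as a single-quoted list, and it leaves out the log directory, the chown, and the apt install of the repmgr package.

```bash
#!/usr/bin/env bash
# Sketch: make a Bitnami data directory startable by pg_upgrade.
# $1 = old data directory, e.g. /bitnami/postgresql/data

prepare_old_datadir() {
  local data_dir=$1
  local conf="$data_dir/postgresql.conf"
  local hba="$data_dir/pg_hba.conf"

  # postgresql.conf includes conf.d, so the directory must exist.
  mkdir -p "$data_dir/conf.d"

  # Drop repmgr and pgaudit from shared_preload_libraries, keep the rest.
  awk '
    /^[ \t]*shared_preload_libraries[ \t]*=/ {
      split($0, parts, "\047")          # parts[2] is the quoted list
      n = split(parts[2], libs, ",")
      kept = ""
      for (i = 1; i <= n; i++) {
        lib = libs[i]
        gsub(/[ \t]/, "", lib)
        if (lib == "" || lib == "repmgr" || lib == "pgaudit") continue
        if (kept == "") kept = lib; else kept = kept ", " lib
      }
      print "shared_preload_libraries = \047" kept "\047"
      next
    }
    { print }
  ' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"

  # Let pg_upgrade connect over the local socket without a password.
  sed -E 's/^(local[[:space:]]+all[[:space:]]+all[[:space:]]+)md5[[:space:]]*$/\1trust/' \
    "$hba" > "$hba.tmp" && mv "$hba.tmp" "$hba"
}

# Usage inside the upgrade pod:
# prepare_old_datadir /bitnami/postgresql/data
```

Editing the existing md5 line in place, rather than appending a trust line, matters because pg_hba.conf is evaluated top to bottom and the first match wins.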

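Automation

The k8s bitnami/postgres-ha upgrade involves quite a few steps, so they are consolidated into an automated script; the code is in exfly/bitnami-pg-upgrade. Adapt it to your own situation as needed.

References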