
Hadoop ResourceManager Behind Nginx: Reverse Proxying the Hadoop ResourceManager Web UI with Nginx

Summary: This article briefly describes how the Hadoop ResourceManager web UI handles HA, and builds on that to reverse proxy the ResourceManager through Nginx with a dynamic OpenResty + Lua script. The code provided can be run as-is.

Problem scenario

The Hadoop ResourceManager (RM) web UI uses an HA scheme that works much like client-side load balancing. The ResourceManagers have two roles, Master and Standby. When an HTTP request reaches the Standby, it redirects the client to the Master, so the browser ends up on the RM that is actually in charge. In this setup, putting Nginx in front of the RMs as a plain load balancer is a poor fit, because Nginx expects every upstream instance to be able to serve requests on its own. Can Nginx offer some mechanism to locate the actual RM master and send all requests to that node?

Solution

The Hadoop ResourceManager HA documentation describes load balancer support as follows:

If you are running a set of ResourceManagers behind a Load Balancer (e.g. Azure or AWS ) and would like the Load Balancer to point to the active RM, you can use the /isActive HTTP endpoint as a health probe. http://RM_HOSTNAME/isActive will return a 200 status code response if the RM is in Active HA State, 405 otherwise.
  1. The ResourceManager version we run does not expose the /isActive route, so this option is not available to us (a probe sketch is shown after this list for reference).
  2. While reading around, I found that http://ip:8088/jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics contains the master node's hostname, and only the Master actually answers this endpoint. Based on that, whether a node returns the master information can be used to decide whether it is the active RM.
  3. In addition, the body of the Standby's 307 response explicitly states that the node is a standby, so this behavior can also be used to tell active from standby.
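
For reference, the probe the documentation describes would just be a plain HTTP status check, roughly like the sketch below (RM_HOSTNAME and the 8088 port are placeholders; since our 3.1.1 build does not expose the route, this only applies to newer versions):

# Hypothetical probe for RM versions that expose /isActive:
# per the docs quoted above, the active RM answers 200 and the standby 405.
curl -s -o /dev/null -w '%{http_code}\n' "http://RM_HOSTNAME:8088/isActive"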

Our current component versions:
ResourceManager version: 3.1.1.3.1.0.0-78 from e4f82af51faec922b4804d0232a637422ec29e64 by jenkins source checksum 47cebb2682958b68f58d47415f5d2555 on 2018-12-06T12:28Z
Hadoop version: 3.1.1.3.1.0.0-78 from e4f82af51faec922b4804d0232a637422ec29e64 by jenkins source checksum eab9fa2a6aa38c6362c66d8df75774 on 2018-12-06T12:26Z

The actual responses look like this.

First, try a request against the Standby:

curl xx.xx.xx.xx:8088/ -k -v
*   Trying xx.xx.xx.xx:8088...
* Connected to xx.xx.xx.xx (xx.xx.xx.xx) port 8088 (#0)
> GET / HTTP/1.1
> Host: xx.xx.xx.xx:8088
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 307 Temporary Redirect
< Date: Wed, 25 Oct 2023 02:05:48 GMT
< Cache-Control: no-cache
< Expires: Wed, 25 Oct 2023 02:05:48 GMT
< Date: Wed, 25 Oct 2023 02:05:48 GMT
< Pragma: no-cache
< Content-Type: text/plain;charset=utf-8
< X-Frame-Options: SAMEORIGIN
< Location: http://bigdata3:8088/
< Content-Length: 43
<
This is standby RM. The redirect url is: /

curl 'http://xx.xx.xx.xx:8088/jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics' -k -v
*   Trying xx.xx.xx.xx:8088...
* Connected to xx.xx.xx.xx (xx.xx.xx.xx) port 8088 (#0)
> GET /jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics HTTP/1.1
> Host: xx.xx.xx.xx:8088
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 307 Temporary Redirect
< Date: Wed, 25 Oct 2023 02:10:32 GMT
< Cache-Control: no-cache
< Expires: Wed, 25 Oct 2023 02:10:32 GMT
< Date: Wed, 25 Oct 2023 02:10:32 GMT
< Pragma: no-cache
< Content-Type: text/plain;charset=utf-8
< X-Frame-Options: SAMEORIGIN
< Location: http://bigdata3:8088/jmx?qry=Hadoop%3Aservice%3DResourceManager%2Cname%3DClusterMetrics
< Content-Length: 109
<
This is standby RM. The redirect url is: /jmx?qry=Hadoop%3Aservice%3DResourceManager%2Cname%3DClusterMetrics
* Connection #0 to host xx.xx.xx.xx left intact

Now request the Master node and see what the responses look like:

curl xx.xx.xx.xx:8088/ -k -v
*   Trying xx.xx.xx.xx:8088...
* Connected to xx.xx.xx.xx (xx.xx.xx.xx) port 8088 (#0)
> GET / HTTP/1.1
> Host: xx.xx.xx.xx:8088
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 25 Oct 2023 02:06:01 GMT
< Cache-Control: no-cache
< Expires: Wed, 25 Oct 2023 02:06:01 GMT
< Date: Wed, 25 Oct 2023 02:06:01 GMT
< Pragma: no-cache
< Content-Type: text/plain;charset=utf-8
< X-Frame-Options: SAMEORIGIN
< Vary: Accept-Encoding
< Location: http://xx.xx.xx.xx:8088/cluster
< Content-Length: 0

curl 'http://xx.xx.xx.xx:8088/jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics' -k -v
*   Trying xx.xx.xx.xx:8088...
* Connected to xx.xx.xx.xx (xx.xx.xx.xx) port 8088 (#0)
> GET /jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics HTTP/1.1
> Host: xx.xx.xx.xx:8088
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 25 Oct 2023 02:09:49 GMT
< Cache-Control: no-cache
< Expires: Wed, 25 Oct 2023 02:09:49 GMT
< Date: Wed, 25 Oct 2023 02:09:49 GMT
< Pragma: no-cache
< Content-Type: application/json; charset=utf8
< X-Frame-Options: SAMEORIGIN
< Vary: Accept-Encoding
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Origin: *
< Transfer-Encoding: chunked
<
{
  "beans" : [ {
    "name" : "Hadoop:service=ResourceManager,name=ClusterMetrics",
    "modelerType" : "ClusterMetrics",
    "tag.ClusterMetrics" : "ResourceManager",
    "tag.Context" : "yarn",
    "tag.Hostname" : "bigdata3",
    "NumActiveNMs" : 3,
    "NumDecommissioningNMs" : 0,
    "NumDecommissionedNMs" : 0,
    "NumLostNMs" : 0,
    "NumUnhealthyNMs" : 0,
    "NumRebootedNMs" : 0,
    "NumShutdownNMs" : 0,
    "AMLaunchDelayNumOps" : 1713,
    "AMLaunchDelayAvgTime" : 63.0,
    "AMRegisterDelayNumOps" : 1700,
    "AMRegisterDelayAvgTime" : 33170.0
  } ]
}

nginx

To determine the active upstream using only what nginx supports natively, any of the three signals listed above (/isActive, the JMX endpoint, or the 307 response body) could in principle serve as a health check.

Skimming the nginx upstream documentation turns up: "Dynamically configurable group with periodic health checks is available as part of our commercial subscription." Still, it seemed worth trying ngx_http_upstream_hc_module first, using the stock nginx:latest image with the following configuration:

upstream backend {
    server bing.com:80;
    # server www.baidu.com:80;
}

server {
    listen       8080;
    server_name  localhost;

    location / {
        proxy_set_header Host 'bing.com';
        proxy_pass http://backend;
        health_check;
    }
}

Starting nginx with this configuration fails with the following error:

2023/10/23 01:56:53 [emerg] 1#1: unknown directive "health_check" in /etc/nginx/conf.d/default.conf:13

The error means this nginx build does not include ngx_http_upstream_hc_module. In fact, the health_check directive is part of the commercial NGINX Plus distribution, so the stock open-source image cannot provide it; getting periodic upstream health checks would mean either NGINX Plus or a custom nginx build with a third-party health-check module. Not wanting to maintain a custom build, I dropped this approach.
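
As an aside, a quick way to see which modules a given nginx binary was actually built with is to dump its configure arguments; since health_check comes only with the commercial build, nothing related shows up for the stock image (command for illustration only):

# List the configure flags compiled into the nginx binary of the official image
docker run --rm nginx:latest nginx -V 2>&1 | tr ' ' '\n' | grep -- '--with'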

openresty+lua

Taking a different angle: what we really want is to tell the two roles apart at request time. With OpenResty + Lua, which sits on top of nginx but is far more scriptable, we can write a small script that finds the current master dynamically, and the problem is solved.

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

First, prepare the OpenResty Docker image:

# Build: docker build --platform linux/amd64 -t openresty:dev -f Dockerfile .
FROM openresty/openresty:latest

RUN sed -E -i 's/(deb|security).debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list

RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        wget \
        curl \
    && rm -rf /var/lib/apt/lists/*
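
As a quick smoke test of the freshly built image (optional; it simply confirms the bundled nginx binary is on the PATH, the same way the run command at the end of this article invokes it):

# Print the nginx version shipped inside the openresty:dev image
docker run --rm openresty:dev nginx -v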

The nginx.conf file is as follows:

nginx.conf_root

daemon off;
worker_processes  3;
user root root;
events {
    worker_connections  1024;
}

http {
    default_type  application/octet-stream;
    include /etc/nginx/conf.d/*.conf;
}
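
Before actually serving traffic, the configuration can be sanity-checked with nginx -t, mounting the config directory the same way as the final run command does (this assumes the current directory holds the files shown in this article):

# Validate the configuration without starting the proxy
docker run --rm -v $(pwd)/:/etc/nginx/conf.d/ openresty:dev nginx -t -c /etc/nginx/conf.d/nginx.conf_root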

The core logic of the default route is shown below. When a request comes in, a shell script queries all nodes to find the hostname of the current master, and the request is then routed to the upstream that matches that hostname.

default.conf

server {
    listen       80;
    server_name  localhost;

    location / {
        access_by_lua_block {
            -- Ask the probe script for the active RM's hostname, then jump to
            -- the named location that proxies to that node.
            os.execute("/bin/bash /etc/nginx/conf.d/hadoop-status.sh > /tmp/hadoop-status.tmp")
            local handle = io.open("/tmp/hadoop-status.tmp", "r")
            local result = handle:read("*a")
            handle:close()

            ngx.log(ngx.ERR, result)
            ngx.exec("@" .. result)
        }
    }

    location @bigdata1 {
        proxy_pass http://192.168.100.41:8088;
    }

    location @bigdata2 {
        proxy_pass http://192.168.100.42:8088;
    }

    location @bigdata3 {
        proxy_pass http://192.168.100.43:8088;
    }
}

Finding the master node is simple: iterate over all nodes, and whichever one returns data from the JMX endpoint is the Master. That JMX response contains the node's hostname, which the script prints back so nginx can use it to pick the route. The code is as follows:

hadoop-status.sh

#!/usr/bin/env bash

set -o errtrace
set -o errexit
set -o nounset
# set -o pipefail
# set -o xtrace

cd "$(dirname "$0")"
cat /etc/nginx/conf.d/host_ip.txt | while read line;
do
    # Skip comment lines
    if [[ $line == \#* ]]
    then
        continue
    fi

    linearray=( $line )
    RAW_IP=${linearray[0]}
    RAW_HOSTNAME=${linearray[1]}
    # Only the active RM answers the JMX query; pull tag.Hostname out of its response
    MASTER_NODE=$(wget -qO- "http://${RAW_IP}:8088/jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics" | sed 's/,/\n/g' | grep 'tag.Hostname' | sed 's/tag.Hostname" : "//g' | sed 's/[" ]//g')
    if [ "${RAW_HOSTNAME}" == "${MASTER_NODE}" ]; then
        echo -n $RAW_HOSTNAME
    fi
done

host_ip.txt lists the IP and hostname of every node:

192.168.100.41 bigdata1
192.168.100.42 bigdata2
192.168.100.43 bigdata3
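
Before wiring everything together, the probe script can be exercised by hand inside the container; it should print the hostname of the active RM (bigdata3 in the captures above), assuming the RM nodes are reachable from wherever the container runs:

# Run the probe script standalone; expected output: the active RM's hostname
docker run --rm -v $(pwd)/:/etc/nginx/conf.d/ openresty:dev bash /etc/nginx/conf.d/hadoop-status.sh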

After running the command below, open 127.0.0.1:12346 to see the result:

docker run --rm -it -p 12346:80 -v $(pwd)/:/etc/nginx/conf.d/ openresty:dev nginx -c /etc/nginx/conf.d/nginx.conf_root
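
To check the proxying itself, the same JMX endpoint used by the probe can be requested through the proxy; the ClusterMetrics JSON from the active RM should come back no matter which node currently holds the Master role (assuming the cluster from the earlier captures):

# The proxy should forward this to the active RM and return the ClusterMetrics JSON
curl -s 'http://127.0.0.1:12346/jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics'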

Summary

As shown above, OpenResty + Lua is used to check which of the downstream upstream nodes is currently the master, and nginx then routes requests to that master.

ref