Fixing an Inaccessible Web Management Interface on PVE 6

Authors
  • Morphy Chan

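Proxmox Virtual Environment (PVE) is a popular KVM-based virtualization platform that is open source and highly customizable. I have always used ESXi, but I wanted to try PVE on my home server.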

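Following the instructions, the installation went smoothly, and I installed PVE 6 onto the server's USB boot drive. After installation, however, I could not reach the web management interface at https://x.x.x.x:8006/, which is supposed to be the system's management entry point. After a few seconds the browser simply showed a connection error, with no response at all.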

To troubleshoot, I tried the following:

  1. Switched between several mainstream browsers, to no avail;
  2. Did fresh installs of different PVE versions (6.3-1 and 6.2-1), and the problem persisted;
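  3. Compared the SHA hash of the downloaded ISO image against the officially published value to rule out a corrupted download; the hashes matched.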
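
The hash comparison in step 3 can be sketched as a small shell helper. This is a minimal sketch that assumes the published checksum is SHA-256 (the post only says "SHA"); verify_sha256 is a hypothetical name, and sha256sum comes from GNU coreutils.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_DIGEST
# Hypothetical helper: prints OK and returns 0 when the file's SHA-256
# digest equals the expected value, otherwise prints MISMATCH and returns 1.
verify_sha256() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha256sum "$file" | awk '{print $1}')"
    # Normalize the expected digest to lowercase before comparing.
    expected="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

It would be called as verify_sha256 path/to/image.iso DIGEST, where both the ISO path and the digest are placeholders to be filled in from the download page.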

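After several rounds of this I was somewhat frustrated and began to doubt PVE's stability. Being unable to access a system right after a fresh install is unusual; by comparison, I had never run into this with ESXi.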

After some searching, threads on forum.proxmox.com helped me pin down the problem.

I logged in to the installed PVE system over SSH and ran a few Linux debugging commands. The output of systemctl status pve-cluster.service was:

# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2021-02-09 22:09:24 EST; 15s ago
Process: 1494 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
...
Feb 09 22:09:24 localhost systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 09 22:09:24 localhost systemd[1]: Failed to start The Proxmox VE cluster filesystem.
-- The unit pvesr.service has entered the 'failed' state with result 'exit-code'.
Feb 09 22:16:00 localhost systemd[1]: Failed to start Proxmox VE replication runner.
...
-- The job identifier is 2800 and the job result is failed.
Feb 09 22:16:01 localhost pveproxy[1849]: worker exit
...
Feb 09 22:16:01 localhost pveproxy[1861]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PV
Feb 09 22:16:01 localhost pveproxy[1862]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PV
...

pve-cluster.service never started successfully. The key error message in this output is:

Feb 09 22:16:01 localhost pveproxy[1861]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PV

The output of journalctl -u pve-cluster:

# journalctl -u pve-cluster
-- Logs begin at Tue 2021-02-09 22:02:18 EST, end at Tue 2021-02-09 22:26:50 EST. --
Feb 09 22:02:22 localhost systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 09 22:02:22 localhost pmxcfs[947]: [main] crit: Unable to get local IP address
Feb 09 22:02:22 localhost pmxcfs[947]: [main] crit: Unable to get local IP address
Feb 09 22:02:22 localhost systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Feb 09 22:02:22 localhost systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 09 22:02:22 localhost systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 09 22:02:22 localhost systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.

The key error message:

Feb 09 22:02:22 localhost pmxcfs[947]: [main] crit: Unable to get local IP address
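
Since pmxcfs complains that it cannot get a local IP address, one thing worth checking is what the machine's hostname maps to in /etc/hosts. The following is a rough sketch, not the tool PVE itself uses: first_hosts_ip and is_loopback are hypothetical helper names, and the lookup only approximates how the resolver reads a hosts file (first matching line wins).

```shell
#!/usr/bin/env bash
# first_hosts_ip HOSTS_FILE NAME
# Hypothetical helper: print the address on the first non-comment line
# of a hosts-format file that lists NAME as a hostname or alias.
first_hosts_ip() {
    local hosts_file="$1" name="$2"
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # drop comments
        {
            for (i = 2; i <= NF; i++) {    # field 1 is the address
                if ($i == name) { print $1; exit }
            }
        }
    ' "$hosts_file"
}

# is_loopback ADDRESS
# Succeeds when ADDRESS is in 127.0.0.0/8 or is the IPv6 loopback ::1.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_loopback "$(first_hosts_ip /etc/hosts "$(hostname)")" succeeding would suggest the hostname maps to a loopback address. On a live system, getent hosts "$(hostname)" shows what the resolver actually returns.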

With these two error messages, I eventually found the solution on the forum.[1]

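The forum answer said:

"The problem may lie in /etc/hosts and the hostname configuration: your /etc/hosts contains the entry 192.168.1.3 proxmox pvelocalhost, so if you have configured the IP 192.168.1.3, the system expects the hostname to be proxmox.

Fix: remove the 127.0.1.1 debian line (unless you really need it), set the hostname correctly, and edit /etc/hosts so that it contains the correct hostname."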

After installation, PVE's /etc/hosts file contained several host records:

127.0.0.1 localhost.localdomain localhost

192.168.*.* localhost.xxx localhost

...

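The first record was presumably added by the system by default; the second was added during PVE installation (the installer asks for the system's IP address and hostname).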

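After the system rebooted, the first host record conflicted with the second, which caused the problem described above. Both records map the name localhost, so the hostname presumably resolved to the loopback address 127.0.0.1 first, which would explain the "Unable to get local IP address" error.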

After I commented out the first record, the problem was resolved and the web management interface became accessible.
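
The edit itself can be sketched as follows. This is only an illustration of commenting out the 127.0.0.1 record; comment_out_loopback_record is a hypothetical helper name, and the backup suffix .bak is an arbitrary choice.

```shell
#!/usr/bin/env bash
# comment_out_loopback_record HOSTS_FILE
# Hypothetical helper: keep a backup copy as HOSTS_FILE.bak, then prefix
# every line that starts with "127.0.0.1" plus whitespace with '#'.
comment_out_loopback_record() {
    local hosts_file="$1"
    cp -p "$hosts_file" "$hosts_file.bak"
    # Read from the backup and write the edited result over the original.
    sed 's/^127\.0\.0\.1[[:space:]]/#&/' "$hosts_file.bak" > "$hosts_file"
}
```

On the server this would be run as root with /etc/hosts as the argument. The post does not say whether a reboot or a service restart followed the edit; something like systemctl restart pve-cluster pveproxy is probably needed before the change takes effect.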

Footnotes

  1. /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/