Hi all.
I've run into a problem: roughly once or twice a month Proxmox restarts a VM because of an
out of memory error.
This is what Grafana shows just before the node goes down:
Memory Basic
31,68 Total
9,72 Used
19,66 Cache + Buffer
2,04 Free
Proxmox VE 5.4
File system: ext4
Logs from Proxmox:
Jul 22 02:40:25 services pve-firewall[1838]: status update error: command '/sbin/iptables-save' failed: open3: fork failed: Cannot allocate memory at /usr/share/perl5/PVE/Tools.pm line 431.
Jul 22 02:40:26 services pvestatd[1834]: fork failed: Cannot allocate memory
Jul 22 02:40:26 services pvestatd[1834]: command '/sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count' failed: open3: fork failed: Cannot allocate memory at /usr/share/perl5/PVE/Tools.pm line 431.
Jul 22 02:40:34 websvc2 kernel: zabbix_agentd invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=1, oom_score_adj=0
Jul 22 02:40:34 websvc2 kernel: zabbix_agentd cpuset=/ mems_allowed=0-1
Jul 22 02:40:34 websvc2 kernel: CPU: 3 PID: 18994 Comm: zabbix_agentd Tainted: G O 4.15.18-12-pve #1
Jul 22 02:40:34 websvc2 kernel: Call Trace:
Jul 22 02:40:34 websvc2 kernel: dump_stack+0x63/0x8b
Jul 22 02:40:34 websvc2 kernel: dump_header+0x6e/0x285
Jul 22 02:40:34 websvc2 kernel: ? security_capable_noaudit+0x4b/0x70
Jul 22 02:40:34 websvc2 kernel: oom_kill_process+0x21d/0x440
Jul 22 02:40:34 websvc2 kernel: out_of_memory+0x11d/0x4d0
Jul 22 02:40:34 websvc2 kernel: __alloc_pages_slowpath+0xdf4/0xee0
Jul 22 02:40:34 websvc2 kernel: __alloc_pages_nodemask+0x25b/0x280
Jul 22 02:40:34 websvc2 kernel: alloc_pages_current+0x6a/0xe0
Jul 22 02:40:34 websvc2 kernel: __get_free_pages+0xe/0x30
Jul 22 02:40:34 websvc2 kernel: pgd_alloc+0x1e/0x170
Jul 22 02:40:34 websvc2 kernel: mm_init+0x198/0x280
Jul 22 02:40:34 websvc2 kernel: copy_process.part.35+0xa43/0x1b00
Jul 22 02:40:34 websvc2 kernel: ? security_file_alloc+0x29/0xa0
Jul 22 02:40:34 websvc2 kernel: ? security_file_alloc+0x68/0xa0
Jul 22 02:40:34 websvc2 kernel: _do_fork+0xdf/0x3f0
Jul 22 02:40:34 websvc2 kernel: ? get_unused_fd_flags+0x30/0x40
Jul 22 02:40:34 websvc2 kernel: ? __do_pipe_flags+0x5f/0xd0
Jul 22 02:40:34 websvc2 kernel: SyS_clone+0x19/0x20
Jul 22 02:40:34 websvc2 kernel: do_syscall_64+0x73/0x130
Jul 22 02:40:34 websvc2 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jul 22 02:40:34 websvc2 kernel: RIP: 0033:0x7ff2e364238b
Jul 22 02:40:34 websvc2 kernel: RSP: 002b:00007ffd8462daa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jul 22 02:40:34 websvc2 kernel: RAX: ffffffffffffffda RBX: 00007ffd8462daa0 RCX: 00007ff2e364238b
Jul 22 02:40:34 websvc2 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Jul 22 02:40:34 websvc2 kernel: RBP: 00007ffd8462db30 R08: 00007ff2e4ccb740 R09: 00007ff2e4ccb740
Jul 22 02:40:34 websvc2 kernel: R10: 00007ff2e4ccba10 R11: 0000000000000246 R12: 0000000000000000
Jul 22 02:40:34 websvc2 kernel: R13: 0000000000000020 R14: 0000000000000000 R15: 00007ffd8462dac0
Jul 22 02:40:34 websvc2 kernel: Mem-Info:
Jul 22 02:40:34 websvc2 kernel: active_anon:32158038 inactive_anon:202297 isolated_anon:0
active_file:191 inactive_file:0 isolated_file:0
unevictable:1251 dirty:0 writeback:0 unstable:0
slab_reclaimable:89972 slab_unreclaimable:149766
mapped:36021 shmem:327938 pagetables:78140 bounce:0
free:100563 free_pcp:1014 free_cma:0
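The Mem-Info counters above are page counts, not bytes, so they are hard to compare with the Grafana figures as printed. Below is a minimal sketch that converts a few of them to GiB. It assumes 4 KiB pages (the usual x86-64 default, not confirmed for this host), and the names PAGE_SIZE, parse_mem_info and pages_to_gib are made up for this illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86-64 default

# Counters copied from the Mem-Info dump above, syslog prefix stripped.
MEM_INFO = """\
active_anon:32158038 inactive_anon:202297 isolated_anon:0
active_file:191 inactive_file:0 isolated_file:0
slab_reclaimable:89972 slab_unreclaimable:149766
mapped:36021 shmem:327938 pagetables:78140 bounce:0
free:100563 free_pcp:1014 free_cma:0
"""


def parse_mem_info(text):
    """Return {counter_name: pages} from the name:value pairs in the dump."""
    return {name: int(value) for name, value in re.findall(r"([a-z_]+):(\d+)", text)}


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


counters = parse_mem_info(MEM_INFO)
for name in ("active_anon", "inactive_anon", "shmem", "free"):
    print(f"{name:>14}: {pages_to_gib(counters[name]):8.2f} GiB")
```

If the 4 KiB assumption holds, active_anon alone comes to roughly 122.7 GiB, which is far more than the 31.68 total in the Grafana panel, so it is probably worth checking which machine each set of numbers actually describes.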
A list of processes is printed next, followed by:
Jul 22 02:40:34 websvc2 kernel: Out of memory: Kill process 30983 (kvm) score 247 or sacrifice child
Jul 22 02:40:34 websvc2 kernel: Killed process 30983 (kvm) total-vm:34615152kB, anon-rss:33571088kB, file-rss:404kB, shmem-rss:20kB
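To make the "Killed process" line easier to read, here is a small sketch that pulls out the PID, the process name and the sizes, and prints the sizes in GiB. The kernel reports these values in kB of 1024 bytes; the function name parse_killed and the dictionary layout are made up for this illustration.

```python
import re

# Text copied from the log line above, syslog prefix stripped.
KILLED_LINE = (
    "Killed process 30983 (kvm) total-vm:34615152kB, "
    "anon-rss:33571088kB, file-rss:404kB, shmem-rss:20kB"
)


def parse_killed(line):
    """Extract pid, command name and the kB sizes from an oom-killer line."""
    m = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if m is None:
        raise ValueError("not an oom-killer 'Killed process' line")
    sizes = {k: int(v) for k, v in re.findall(r"([a-z-]+):(\d+)kB", line)}
    return {"pid": int(m.group(1)), "comm": m.group(2), "sizes_kb": sizes}


info = parse_killed(KILLED_LINE)
for name, kb in info["sizes_kb"].items():
    # 1 GiB = 2**20 kB when a kB is 1024 bytes
    print(f"{name:>9}: {kb / 2**20:6.2f} GiB")
```

For the line above this gives an anon-rss of about 32 GiB for the killed kvm process.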
How can I find the cause of this failure?