Good day.

The server (2.6.27-ovz-smp-alt9 #1 SMP x86_64) is used, among other things, as a router. Routing is handled by a VE with eth0 and eth1 passed through to it. A VLAN is configured on eth0. All of this worked for several months. I have not updated anything in the last week (or before that, for that matter)... Please advise who is to blame and what to do.

Today I ran into this problem:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    7 root  15  -5     0    0    0 R 85.8  0.0 24:12.35 ksoftirqd/1
   13 root  15  -5     0    0    0 R 70.5  0.0 21:55.17 ksoftirqd/3
    4 root  15  -5     0    0    0 R 61.2  0.0 22:09.86 ksoftirqd/0
   10 root  15  -5     0    0    0 R 51.3  0.0 22:51.07 ksoftirqd/2

Since yesterday I have hit this twice. The first time a reset helped; the second time the guard was probably asleep, so I postponed the reboot until today. I searched around: everywhere people blamed problems with the network card driver. The server would only let me in over ssh, so I started digging...

1. Brought down eth0 (the internal interface) and unloaded 8021q - did not help.

2. Watched how the interrupts arrive. I hope I understood correctly what the numbers in /proc/interrupts mean:

        CPU0     CPU1     CPU2     CPU3     Time
  14:   15448    15289    15610    15530    20:13:02  IO-APIC-edge ide0
        15616    15490    15806    15721    20:17:15  00:04:13  avg 3
  16:   886216   884792   886101   885861   20:13:02  IO-APIC-fasteoi aacraid
        892159   890754   891996   891830   20:17:15  00:04:13  avg 94
  17:   124397   125099   124218   124657   20:13:02  IO-APIC-fasteoi ehci_hcd:usb1,uhci_hcd:usb2
        126660   127303   126480   126884   20:17:15  00:04:13  avg 35
  2296: 1302917  1303284  1301544  1299754  20:13:02  PCI-MSI-edge eth1
        1303305  1303680  1301954  1300130  20:17:15  00:04:13  avg 6
  LOC:  4554114  3947750  4023168  3822185  20:13:02  Local timer interrupts
        4620588  4014420  4105265  3890368  20:17:15  00:04:13  avg 1120
  RES:  48597    85163    42248    87307    20:13:02  Rescheduling interrupts
        51718    87104    44109    90804    20:17:15  00:04:13  avg 41
  TLB:  2853653  2470564  2986077  3606213  20:13:02  TLB shootdowns
        2855765  2471715  2987263  3607910  20:17:15  00:04:13  avg 24

I computed the average over those 4 minutes (the "avg" figure is interrupts per second, summed over all CPUs). The leader was aacraid (a hardware RAID controller on a separate card); the remaining network card generated almost no interrupts.

3. Stopped nagios, the internal site, the external site and collectd - did not help.

4. Looked at what is going on with the external interface:

# ethtool -i eth1
driver: e1000e
version: 0.3.3.3-k6
firmware-version: 1.6-12
bus-info: 0000:06:00.0

# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

06:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Super Micro Computer Inc Device 0000
        Flags: bus master, fast devsel, latency 0, IRQ 2296
        Memory at d8420000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 2020 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Mask- 64bit+ Count=1/1 Enable+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 7e-b7-34-ff-ff-48-30-00
        Kernel driver in use: e1000e
        Kernel modules: e1000e

eth1      Link encap:Ethernet  HWaddr ***
          inet addr:***  Bcast:***  Mask:***
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2396787 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6475512 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:213544646 (203.6 MiB)  TX bytes:7619333922 (7.0 GiB)
          Memory:d8420000-d8440000

Somebody has torrents running on their machine, but that load should have stopped when I brought down eth0... Meanwhile, ping from the outside is 20 ms.

Then I did something stupid: I decided to check what netstat would say. As a result I got this:

May 3 21:21:35 cita kernel: [24748.676007] BUG: soft lockup - CPU#0 stuck for 61s!
[netstat:12144]
May 3 21:21:35 cita kernel: [24748.676007] Modules linked in: garp mptctl mptbase simfs vzethdev vznetdev vzrst vzcpt vzdquota vzmon vzdev af_packet xt_tcpudp ipt_ttl iptable_mangle ipt_REJECT rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc bridge stp nf_conntrack_ftp iptable_filter dm_mod sr_mod ppdev joydev pl2303 usbserial i5000_edac sg edac_core i2c_i801 psmouse pcspkr ide_cd_mod i2c_core serio_raw cdrom e1000e floppy rtc_cmos rtc_core parport_pc parport rtc_lib evdev thermal processor container tun ipt_MASQUERADE iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack button ohci_hcd uhci_hcd usbhid hid ff_memless ehci_hcd usb_storage libusual usbcore ext3 jbd mbcache ata_generic aacraid ata_piix pata_acpi libata dock sd_mod crc_t10dif scsi_mod ide_disk piix ide_pci_generic ide_core [last unloaded: 8021q]
May 3 21:21:35 cita kernel: [24748.676007] CPU 0:
May 3 21:21:35 cita kernel: [24748.676007] Modules linked in: garp mptctl mptbase simfs vzethdev vznetdev vzrst vzcpt vzdquota vzmon vzdev af_packet xt_tcpudp ipt_ttl iptable_mangle ipt_REJECT rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc bridge stp nf_conntrack_ftp iptable_filter dm_mod sr_mod ppdev joydev pl2303 usbserial i5000_edac sg edac_core i2c_i801 psmouse pcspkr ide_cd_mod i2c_core serio_raw cdrom e1000e floppy rtc_cmos rtc_core parport_pc parport rtc_lib evdev thermal processor container tun ipt_MASQUERADE iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack button ohci_hcd uhci_hcd usbhid hid ff_memless ehci_hcd usb_storage libusual usbcore ext3 jbd mbcache ata_generic aacraid ata_piix pata_acpi libata dock sd_mod crc_t10dif scsi_mod ide_disk piix ide_pci_generic ide_core [last unloaded: 8021q]
May 3 21:21:35 cita kernel: [24748.676007] Pid: 12144, comm: netstat Not tainted 2.6.27-ovz-smp-alt9 #1 briullov
May 3 21:21:35 cita kernel: [24748.676007] RIP: 0010:[] [] dst_release+0x22/0x40
May 3 21:21:35 cita kernel: [24748.676007] RSP: 0018:ffffffff807408b8 EFLAGS: 00000202
May 3 21:21:35 cita kernel: [24748.676007] RAX: 0000000000000002 RBX: ffffffff807408c8 RCX: ffff8801285878ce
May 3 21:21:35 cita kernel: [24748.676007] RDX: 0000000000000008 RSI: 000000000000000e RDI: ffff880139acf048
May 3 21:21:35 cita kernel: [24748.676007] RBP: ffffffff80740830 R08: ffff88013d1a30e0 R09: ffff880137574000
May 3 21:21:35 cita kernel: [24748.676007] R10: 000000000000000e R11: 0000000000000000 R12: ffffffff8020d386
May 3 21:21:35 cita kernel: [24748.676007] R13: ffffffff80740830 R14: ffff880137453080 R15: ffff8800820a8c00
May 3 21:21:35 cita kernel: [24748.676007] FS: 00007fe1301f06f0(0000) GS:ffffffff80749080(0000) knlGS:0000000000000000
May 3 21:21:35 cita kernel: [24748.676007] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 3 21:21:35 cita kernel: [24748.676007] CR2: 00007fe12fd05480 CR3: 00000001259e4000 CR4: 00000000000006e0
May 3 21:21:35 cita kernel: [24748.676007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 3 21:21:35 cita kernel: [24748.676007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 3 21:21:35 cita kernel: [24748.676007]
May 3 21:21:35 cita kernel: [24748.676007] Call Trace:
May 3 21:21:35 cita kernel: [24748.676007] [] ? eth_type_trans+0x2e/0x100
May 3 21:21:35 cita kernel: [24748.676007] [] ? veth_xmit+0x134/0x2c0 [vzethdev]
May 3 21:21:35 cita kernel: [24748.676007] [] ? dev_hard_start_xmit+0x26c/0x2f0
May 3 21:21:35 cita kernel: [24748.676007] [] ? dev_queue_xmit+0x35e/0x580
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_dev_queue_xmit+0x0/0x50 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_dev_queue_push_xmit+0x67/0x90 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_dev_queue_xmit+0x1f/0x50 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_post_routing+0x168/0x230 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_iterate+0x67/0xa0
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_dev_queue_push_xmit+0x0/0x90 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_hook_slow+0xa3/0x100
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_dev_queue_push_xmit+0x0/0x90 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_forward_finish+0x41/0x60 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_forward_finish+0x128/0x130 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_forward_ip+0x230/0x2f0 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? tcp_error+0xfc/0x260 [nf_conntrack]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_iterate+0x67/0xa0
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_forward_finish+0x0/0x60 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_hook_slow+0xa3/0x100
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_forward_finish+0x0/0x60 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? __br_forward+0x51/0x90 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_forward+0x58/0x70 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_handle_frame_finish+0x146/0x1e0 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_pre_routing_finish+0x2e8/0x320 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_pre_routing_finish+0x0/0x320 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_hook_slow+0xa3/0x100
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_pre_routing_finish+0x0/0x320 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? kmem_cache_alloc+0x80/0x100
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_nf_pre_routing+0x3d6/0x7c0 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_iterate+0x67/0xa0
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_handle_frame_finish+0x0/0x1e0 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? nf_hook_slow+0xa3/0x100
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_handle_frame_finish+0x0/0x1e0 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? br_handle_frame+0x1ae/0x250 [bridge]
May 3 21:21:35 cita kernel: [24748.676007] [] ? ip_rcv+0x27b/0x310
May 3 21:21:35 cita kernel: [24748.676007] [] ? netif_receive_skb+0x1ed/0x660
May 3 21:21:35 cita kernel: [24748.676007] [] ? process_backlog+0x74/0xf0
May 3 21:21:35 cita kernel: [24748.676007] [] ? net_rx_action+0xee/0x230
May 3 21:21:36 cita kernel: [24748.676007] [] ? __do_softirq+0xc2/0x190
May 3 21:21:36 cita kernel: [24748.676007] [] ? call_softirq+0x1c/0x30
May 3 21:21:36 cita kernel: [24748.676007] [] ? do_softirq+0x45/0x80
May 3 21:21:36 cita kernel: [24748.676007] [] ? local_bh_enable_ip+0xa1/0xb0
May 3 21:21:36 cita kernel: [24748.676007] [] ? _read_unlock_bh+0x10/0x20
May 3 21:21:36 cita kernel: [24748.676007] [] ? established_get_first+0xf8/0x130
May 3 21:21:36 cita kernel: [24748.676007] [] ? tcp_seq_next+0xb0/0xc0
May 3 21:21:36 cita kernel: [24748.676007] [] ? seq_read+0x232/0x3a0
May 3 21:21:36 cita kernel: [24748.676007] [] ? proc_reg_read+0x78/0xb0
May 3 21:21:36 cita kernel: [24748.676007] [] ? vfs_read+0xc8/0x180
May 3 21:21:36 cita kernel: [24748.676007] [] ? sys_read+0x50/0xe0
May 3 21:21:36 cita kernel: [24748.676007] [] ? system_call_fastpath+0x16/0x1b
May 3 21:21:36 cita kernel: [24748.676007]
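
For anyone who wants to repeat the /proc/interrupts measurement from step 2, here is a minimal sketch in Python of how the per-line averages can be derived from two snapshots. It assumes the figure is the counter difference summed over all CPUs and divided by the elapsed seconds (the table's numbers are consistent with that reading). The helper names `parse_counts` and `irq_rate` are made up for this illustration, and the snapshot rows are copied from the table with the time column left out.

```python
def parse_counts(line, ncpus):
    """Split one /proc/interrupts row into (label, [per-CPU counters])."""
    label, rest = line.split(":", 1)
    fields = rest.split()
    # The first ncpus fields are counters; the rest is the chip/device text.
    return label.strip(), [int(x) for x in fields[:ncpus]]


def irq_rate(before, after, seconds):
    """Interrupts per second between two snapshots, summed over all CPUs."""
    return sum(a - b for a, b in zip(after, before)) / seconds


NCPUS = 4
ELAPSED = 4 * 60 + 13  # 20:13:02 -> 20:17:15, i.e. 253 seconds

# Rows taken from the two snapshots in step 2.
snap_before = [
    "16: 886216 884792 886101 885861 IO-APIC-fasteoi aacraid",
    "2296: 1302917 1303284 1301544 1299754 PCI-MSI-edge eth1",
]
snap_after = [
    "16: 892159 890754 891996 891830 IO-APIC-fasteoi aacraid",
    "2296: 1303305 1303680 1301954 1300130 PCI-MSI-edge eth1",
]

rates = {}
for row_before, row_after in zip(snap_before, snap_after):
    label, counts_before = parse_counts(row_before, NCPUS)
    _, counts_after = parse_counts(row_after, NCPUS)
    rates[label] = irq_rate(counts_before, counts_after, ELAPSED)

for label, rate in rates.items():
    print(f"IRQ {label}: {rate:.1f} interrupts/s")
```

With these inputs the script gives roughly 94 interrupts/s for aacraid and about 6 for eth1, matching the table, so the external NIC really was close to idle in terms of hardware interrupts during the measurement.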