From: Dmitry Kovalsky
Organization: IMBG
To: community@altlinux.ru
Date: Thu, 2 Oct 2003 17:41:37 +0300
Message-Id: <200310021741.37066.dikov@imbg.org.ua>
Subject: [Comm] Broken kernel in ALM2.2????

Hi,

I have two dual AMD 2000+ machines connected by gigabit Ethernet, and I run parallel jobs on them. Everything used to run on Mandrake, and then I decided to install ALM2.2 on them. Now a machine hangs every day, and the logs show this:

Oct 2 06:50:00 node1 crond[164]: (dikov) CMD (/home/dikov/bin/md_parallel_p9v)
Oct 2 07:00:00 node1 crond[546]: (dikov) CMD (/home/dikov/bin/md_parallel_p9v)
Oct 2 07:01:01 node1 crond[655]: (root) CMD (run-parts /etc/cron.hourly)
Oct 2 07:06:38 node1 kernel: Unable to handle kernel paging request at virtual address 3747ab64
Oct 2 07:06:38 node1 kernel: printing eip:
Oct 2 07:06:38 node1 kernel: c01338e5
Oct 2 07:06:38 node1 kernel: *pde = 00000000
Oct 2 07:06:38 node1 kernel: Oops: 0002 2.4.20-alt5-smp #1 SMP Sun Feb 16 16:07:02 MSK 2003
Oct 2 07:06:38 node1 kernel: CPU: 0
Oct 2 07:06:38 node1 kernel: EIP: 0010:[kmem_cache_alloc_batch+101/208] Not tainted
Oct 2 07:06:38 node1 kernel: EIP: 0010:[] Not tainted
Oct 2 07:06:38 node1 kernel: EFLAGS: 00010056
Oct 2 07:06:38 node1 kernel: eax: df8d6bd8 ebx: c9b39800 ecx: c547a2c0 edx: 3747ab60
Oct 2 07:06:38 node1 kernel: esi: df8d6bd0 edi: 00000016 ebp: dffc2c60 esp: cc271c94
Oct 2 07:06:38 node1 kernel: ds: 0018 es: 0018 ss: 0018
Oct 2 07:06:38 node1 kernel: Process mdrun_mpi (pid: 9934, stackpage=cc271000)
Oct 2 07:06:38 node1 lamd[1566]: died: caught child death; trying to detach
Oct 2 07:06:38 node1 lamd[1566]: died: detaching table entry 10
Oct 2 07:06:38 node1 lamd[1566]: died: finished
Oct 2 07:06:38 node1 kernel: Stack: df8d6bd0 00000206 df8d6bd0 00000246 c0133c33 df8d6bd0 dffc2c60 000001f0
Oct 2 07:06:38 node1 kernel: cc270000 cde0a060 c5406900 cde0a1a0 000027ec c42bc8e0 00000206 000001f0
Oct 2 07:06:38 node1 kernel: 0000d8b8 c01b768f 0000071c 000001f0 c576e6c0 00000000 cde0a1a0 c01d95d4
Oct 2 07:06:38 node1 kernel: Call Trace: [kmalloc+163/384] [alloc_skb+239/448] [tcp_sendmsg+580/4656] [nf_hook_slow+266/384] [do_page_fault+0/1307]
Oct 2 07:06:38 node1 kernel: Call Trace: [] [] [] [] []
Oct 2 07:06:38 node1 kernel: [error_code+52/64] [__generic_copy_to_user+48/64] [memcpy_toiovec+57/96] [skb_copy_datagram_iovec+77/592] [kfree+131/144] [inet_sendmsg+53/64]
Oct 2 07:06:38 node1 kernel: [] [] [] [] [] []
Oct 2 07:06:38 node1 kernel: [sock_sendmsg+108/144] [sock_readv_writev+148/160] [sock_writev+59/80] [do_readv_writev+428/736] [poll_freewait+68/80] [do_select+566/592]
Oct 2 07:06:38 node1 kernel: [] [] [] [] [] []
Oct 2 07:06:38 node1 kernel: [sys_select+1138/1152] [do_fcntl+379/656] [sys_writev+67/96] [system_call+51/64]
Oct 2 07:06:38 node1 kernel: [] [] [] []
Oct 2 07:06:38 node1 kernel: Code: 89 42 04 89 10 c7 01 00 00 00 00 c7 41 04 00 00 00 00 8b 06
Oct 2 07:06:38 node1 lamd[1566]: kenyad: did not start this process with the flatd; detaching now
Oct 2 07:06:38 node1 lamd[1566]: kenyad: in pdetachindex
Oct 2 07:06:38 node1 lamd[1566]: kenyad: detatching, checking for RTF_FLAT (0x800) in flags: 0x159691
Oct 2 07:06:38 node1 lamd[1566]: kouter: kqdetach detached process pid=9933
Oct 2 07:06:38 node1 lamd[1566]: kouter: kqdetach calling kio_close
Oct 2 07:06:38 node1 lamd[1566]: kouter: kqdetach calling knuke
Oct 2 07:06:39 node1 lamd[1566]: kouter: kqdetach detached process pid=9932
Oct 2 07:06:39 node1 lamd[1566]: kouter: kqdetach calling kio_close
Oct 2 07:06:39 node1 lamd[1566]: kouter: kqdetach calling knuke
Oct 2 07:10:00 node1 crond[938]: (dikov) CMD (/home/dikov/bin/md_parallel_p9v)
Oct 2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1041, pri=1095
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - successfully created file /tmp/lam-dikov@node1.entry.kiev.ua/lam-flatd2
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - file descriptor 13
Oct 2 07:10:10 node1 lamd[1566]: kenyad: pqcreating with rtf 0x159290
Oct 2 07:10:10 node1 lamd[1566]: kenyad: looking for executable "/usr/local/bin/mdrun_mpi" in directory "/home/gromacs/dikov/P9V"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: found "/usr/local/bin/mdrun_mpi"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: creating new user process...
Oct 2 07:10:10 node1 lamd[1566]: kenyad: attempting to receive stdout/stderr file descriptors
Oct 2 07:10:10 node1 lamd[1566]: kenyad: recv_stdio_fds: happiness
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting environment variables to pass to new process
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSFD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSRTF
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMJOBID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMKENYAPID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMWORLD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMPARENT
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMRANK
Oct 2 07:10:10 node1 lamd[1566]: kenyad: checking for working directory flag
Oct 2 07:10:10 node1 lamd[1566]: kenyad: working directory set explicitly
Oct 2 07:10:10 node1 lamd[1566]: kenyad: running in directory /home/gromacs/dikov/P9V
Oct 2 07:10:10 node1 lamd[1566]: kenyad: fork/exec succeeded, pid 1042, index 10, rtf 0x159292
Oct 2 07:10:10 node1 lamd[1566]: kenyad: create succeeded, process running
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - successfully created file /tmp/lam-dikov@node1.entry.kiev.ua/lam-flatd3
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - file descriptor 13
Oct 2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1042, pri=0
Oct 2 07:10:10 node1 lamd[1566]: kenyad: pqcreating with rtf 0x159290
Oct 2 07:10:10 node1 lamd[1566]: kenyad: looking for executable "/usr/local/bin/mdrun_mpi" in directory "/home/gromacs/dikov/P9V"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: found "/usr/local/bin/mdrun_mpi"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: creating new user process...
Oct 2 07:10:10 node1 lamd[1566]: kenyad: attempting to receive stdout/stderr file descriptors
Oct 2 07:10:10 node1 lamd[1566]: kenyad: recv_stdio_fds: happiness
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting environment variables to pass to new process
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSFD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSRTF
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMJOBID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMKENYAPID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMWORLD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMPARENT
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMRANK
Oct 2 07:10:10 node1 lamd[1566]: kenyad: checking for working directory flag
Oct 2 07:10:10 node1 lamd[1566]: kenyad: working directory set explicitly
Oct 2 07:10:10 node1 lamd[1566]: kenyad: running in directory /home/gromacs/dikov/P9V
Oct 2 07:10:10 node1 lamd[1566]: kenyad: fork/exec succeeded, pid 1043, index 11, rtf 0x159292
Oct 2 07:10:10 node1 lamd[1566]: kenyad: create succeeded, process running
Oct 2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1043, pri=0
Oct 2 07:10:14 node1 pam_tcb[1044]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 sshd[1049]: input_userauth_request: illegal user sysadmin
Oct 2 07:10:14 node1 sshd[1049]: Failed none for UNKNOWN USER from 10.1.1.1 port 49371 ssh2
Oct 2 07:10:14 node1 sshd[1049]: Failed password for UNKNOWN USER from 10.1.1.1 port 49371 ssh2
Oct 2 07:10:14 node1 sshd[1049]: Connection closed by 10.1.1.1
Oct 2 07:10:14 node1 pam_tcb[1051]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 sshd[1056]: input_userauth_request: illegal user sysadmin
Oct 2 07:10:14 node1 sshd[1056]: Failed none for UNKNOWN USER from 10.1.1.1 port 50883 ssh2
Oct 2 07:10:14 node1 sshd[1056]: Failed password for UNKNOWN USER from 10.1.1.1 port 50883 ssh2
Oct 2 07:10:14 node1 sshd[1056]: Connection closed by 10.1.1.1
Oct 2 07:10:14 node1 pam_tcb[1057]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 pam_tcb[1061]: sshd: Authentication failed for root from (uid=0)
Oct 2 07:10:14 node1 pam_tcb[1065]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 pam_tcb[1069]: sshd: Authentication failed for root from (uid=0)
Oct 2 07:10:14 node1 sshd[1075]: input_userauth_request: illegal user sysadmin
Oct 2 07:10:14 node1 sshd[1075]: Failed none for UNKNOWN USER from 10.1.1.1 port 57066 ssh2
Oct 2 07:10:14 node1 sshd[1075]: Failed password for UNKNOWN USER from 10.1.1.1 port 57066 ssh2
Oct 2 07:10:14 node1 last message repeated 2 times

As far as I understand, this is a problem in the kernel itself. Am I right?

Dima

--
Sincerely yours,
Ph.D. Student Dmytro Kovalskyy
Institute of Molecular Biology & Genetics
150 Akad. Zabolotnogo Street, Kiev-143, 03143 UKRAINE
E-mail: dikov@imbg.org.ua
Fax: +380 (44) 266-0759
Tel.: +380 (44) 266-5589