From: Alexey Tourbin
Date: Thu, 23 Sep 2021 12:20:59 +0300
To: ALT Linux Team development discussions
Subject: Re: [devel] Not proprietary, but sovereign / Apple M1 (Was: I: gcc 11.2.1 && binutils 2.37)

By the way, a description of the Apple M1 processor has come out. It explains how it manages to execute 8 instructions per cycle. I'm afraid VLIW won't catch up with it.

What most code looks like is that it consists of short chains of sequentially dependent macroinstructions (say 5 to 7 macroinstructions, 10 to 20 instructions long in total) which store their result to memory or a register, and that memory or register is not accessed until many (hundreds) of cycles later. This means that while each sequentially dependent macroinstruction has to execute one after the other, you can execute many of the chains in parallel...
That sounds good but you need a variety of machinery to track which instructions are independent of previous instructions, and to track the program order of instructions so that as branches are resolved as correct, you know which of the instructions in program order now resolve as correct.

(This fact is why so many people’s intuition about the value of superscalarity is so flawed. Most people hone their assembly optimization skills on long stretches of sequentially dependent instructions; but such code is actually unrepresentative of most of what runs on a CPU. This fact is also why OoO superscalarity works so well, whereas most attempts to create static wide machines have been problematic. All the pieces -- out of order, prediction, and superscalarity -- work synergistically. In particular most of these chains that are running in parallel come from different basic blocks [i.e. are separated by some sort of if() statement that the compiler can’t see past] and so are impossible to aggregate statically.)

https://drive.google.com/file/d/1WrMYCZMnhsGP4o3H33ioAUKL_bjuJSPt/view
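
To make the quoted argument concrete, here is a toy model in Python. It is only a sketch under loud assumptions: every instruction has unit latency, chains are fully independent of each other, and there are no branches or memory stalls. The function name `cycles_needed` and its parameters are hypothetical and say nothing about how the M1 is actually built. `width` is how many instructions can issue per cycle, and `window` is how many chains ahead (in program order) the core can look for independent work.

```python
def cycles_needed(num_chains, chain_len, width, window):
    """Cycles to run `num_chains` independent chains of `chain_len` ops each.

    Toy assumptions: unit latency, so each chain can issue at most one op
    per cycle (the next op depends on the previous one in the same chain).
    """
    remaining = [chain_len] * num_chains
    head = 0      # oldest chain that still has work left
    cycles = 0
    while head < num_chains:
        issued = 0
        # Look only at chains inside the window, oldest first.
        for i in range(head, min(head + window, num_chains)):
            if issued == width:
                break
            if remaining[i] > 0:
                remaining[i] -= 1
                issued += 1
        cycles += 1
        # Slide the window past chains that are finished.
        while head < num_chains and remaining[head] == 0:
            head += 1
    return cycles


total_ops = 64 * 6
narrow = cycles_needed(num_chains=64, chain_len=6, width=8, window=1)
wide = cycles_needed(num_chains=64, chain_len=6, width=8, window=64)
print(total_ops / narrow)  # 1.0 instructions per cycle
print(total_ops / wide)    # 8.0 instructions per cycle
```

With the same 8-wide issue, a core that can see only one chain at a time is stuck at 1 instruction per cycle, while one that can see across many chains fills all 8 slots. In this toy model, the `window` stands in for what the quote attributes to out-of-order execution plus branch prediction: finding independent chains that sit in different basic blocks.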