From: Andrey Savchenko
To: ALT Linux Team development discussions
Date: Tue, 14 Apr 2020 17:57:13 +0300
Subject: Re: [devel] RFC: girar: optimize rebuild
Message-Id: <20200414175713.7355b93735c94869697c5610@altlinux.org>
In-Reply-To: <20200411233143.GC4490@altlinux.org>

On Sun, 12 Apr 2020 02:31:43 +0300 Alexey V. Vissarionov wrote:
> On 2020-04-11 13:36:31 +0300, Andrey Savchenko wrote:
>
> >> The first part of rebuilt packages optimization for girar.
> >> It introduces pkg_identity() and simple optimization of the
> >> rebuilt sourcerpm.
> >> pkg_identity() takes RPM package and returns a value called
> >> package identity, a hash of subset of RPM package header.
> >> That subset is the entire header without some nonessential
> >> artifacts like buildhost, buildtime, header hashsum, etc.
> > I see two problems with proposed approach:
> > 1) It assumes there will be not pkg_identity hash collisions.
> > This is wrong. They may occur sooner or later and the code
> > *must* correctly deal with such collisions.
>
> The solution is well known: prefix the hash with a time_t value
> to let it grow monotonously while still being strictly dependent
> on sensitive data.

Yes, this is a good idea.

> Whether we'd face a hash collision, we could check whether the
> timestamps differ significantly.
>
> > 2) The hash function choice - sha256 - is very unfortunate:
> > it has longer digest than sha1, but otherwise is vulnerable
> > to the same attack; so right now it is still marginally secure,
> > but it will not last long.

> We don't really need any cryptographic-grade hash function here:
> all we need is just a checksum with a good distribution to detect
> whether something had changed - obviously enough, nobody would
> try to build and exploit collisions here. Said that, we can use
> almost any polynomial.

Still, it may be a security issue. Consider what happens if the
wrong source rpm is used: new modifications, including security
fixes, may be silently omitted from a branch.

> > Moreover sha256 is quite slow.
>
> SHA2 is implemented in the hardware in some modern CPUs, so it's
> quite fast there.

Only in some CPUs, and only on the amd64 arch. But our main build
infrastructure also uses ppc64le and aarch64, so it is very
important to be efficient, especially on aarch64, which is a
bottleneck for most tasks. Also consider that we have secondary
build systems for other arches such as mips, riscv and e2k.

Talk is cheap, so let's see some numbers.

0) dd if=/dev/urandom of=/tmp/test.file bs=1M count=2048

1) Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

$ time sha256sum -b /tmp/test.file
8.67user 0.27system 0:08.94elapsed 99%CPU (0avgtext+0avgdata 1944maxresident)k
8.70user 0.25system 0:08.96elapsed 99%CPU (0avgtext+0avgdata 2148maxresident)k
8.65user 0.28system 0:08.93elapsed 99%CPU (0avgtext+0avgdata 2064maxresident)k

$ time b2sum -b /tmp/test.file
2.48user 0.32system 0:02.81elapsed 99%CPU (0avgtext+0avgdata 2120maxresident)k
2.46user 0.30system 0:02.76elapsed 99%CPU (0avgtext+0avgdata 2120maxresident)k
2.47user 0.29system 0:02.77elapsed 99%CPU (0avgtext+0avgdata 2068maxresident)k

2) E8C (1300 MHz, MBE8C-PC v.2)

$ time sha256sum -b /tmp/test.file
11.69user 0.93system 0:12.64elapsed 99%CPU (0avgtext+0avgdata 3784maxresident)k
11.78user 0.85system 0:12.63elapsed 99%CPU (0avgtext+0avgdata 3836maxresident)k
11.72user 0.90system 0:12.63elapsed 99%CPU (0avgtext+0avgdata 3956maxresident)k

$ time b2sum -b /tmp/test.file
6.90user 1.37system 0:08.27elapsed 99%CPU (0avgtext+0avgdata 3896maxresident)k
6.76user 1.10system 0:07.87elapsed 99%CPU (0avgtext+0avgdata 3844maxresident)k
6.93user 0.95system 0:07.88elapsed 99%CPU (0avgtext+0avgdata 3872maxresident)k

I see no reason to use the slower and less secure sha256 algorithm.
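
To make the two objections concrete, here is a minimal sketch in Python. It is not girar code and does not read real RPM headers: the dict-based header, the VOLATILE_TAGS set, the tag names and the find_identical() helper are all hypothetical. It hashes the non-volatile part of a header with BLAKE2b (hashlib.blake2b) and treats a digest match only as a hint, confirming it by comparing the hashed data itself, so that a collision cannot silently select the wrong source rpm.

```python
import hashlib

# Hypothetical set of tags left out of the identity; the proposal
# mentions buildhost, buildtime, header hashsum, etc.
VOLATILE_TAGS = frozenset({"buildhost", "buildtime", "sha1header"})


def canonical_bytes(header):
    """Serialize the non-volatile part of a header deterministically."""
    parts = []
    for tag in sorted(header):
        if tag in VOLATILE_TAGS:
            continue
        key = tag.encode()
        value = header[tag].encode()
        # Length-prefix each field so field boundaries are unambiguous.
        parts.append(b"%d:%s%d:%s" % (len(key), key, len(value), value))
    return b"".join(parts)


def pkg_identity(header):
    """Return a 256-bit BLAKE2b digest of the canonical header bytes."""
    data = canonical_bytes(header)
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def find_identical(new_header, known):
    """Look up a previously built package identical to new_header.

    `known` maps identity -> list of headers with that identity.
    A matching digest is only a candidate: the canonical bytes are
    compared as well, so a hash collision yields "not found" instead
    of a wrong match.
    """
    wanted = canonical_bytes(new_header)
    for candidate in known.get(pkg_identity(new_header), []):
        if canonical_bytes(candidate) == wanted:
            return candidate
    return None


# Illustrative data only: a package, its rebuild on another host,
# and a genuinely changed version.
old = {"name": "foo", "version": "1.0",
       "buildhost": "a.example", "buildtime": "1586000000"}
rebuilt = dict(old, buildhost="b.example", buildtime="1586900000")
changed = dict(rebuilt, version="1.1")

known = {pkg_identity(old): [old]}
assert find_identical(rebuilt, known) is old    # rebuild is recognized
assert find_identical(changed, known) is None   # real change is not

# Simulated collision: a different header filed under the same digest
# is rejected by the full comparison.
collided = {pkg_identity(rebuilt): [changed]}
assert find_identical(rebuilt, collided) is None
```

The full comparison costs little next to the hashing itself, and it removes the assumption that collisions never happen, whichever hash function is chosen in the end.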

Best regards,
Andrew Savchenko