ALT Linux Team development discussions
From: "Vladimir D. Seleznev" <vseleznv@altlinux.org>
To: ALT Linux Team development discussions <devel@lists.altlinux.org>
Subject: Re: [devel] RFC: girar: optimize rebuild
Date: Tue, 14 Apr 2020 19:20:00 +0300
Message-ID: <20200414162000.GA618226@portlab>
In-Reply-To: <20200414175713.7355b93735c94869697c5610@altlinux.org>

On Tue, Apr 14, 2020 at 05:57:13PM +0300, Andrey Savchenko wrote:
> On Sun, 12 Apr 2020 02:31:43 +0300 Alexey V. Vissarionov wrote:
> > On 2020-04-11 13:36:31 +0300, Andrey Savchenko wrote:
> > 
> >  >> The first part of the rebuilt-package optimization for girar.
> >  >> It introduces pkg_identity() and a simple optimization of the
> >  >> rebuilt source rpm.
> >  >> pkg_identity() takes an RPM package and returns a value called
> >  >> the package identity: a hash of a subset of the RPM package
> >  >> header. That subset is the entire header minus some nonessential
> >  >> artifacts like buildhost, buildtime, header hashsum, etc.
> >  > I see two problems with the proposed approach:
> >  > 1) It assumes there will be no pkg_identity hash collisions.
> >  > This is wrong. They may occur sooner or later, and the code
> >  > *must* deal with such collisions correctly.
> > 
> > The solution is well known: prefix the hash with a time_t value
> > to let it grow monotonically while still depending strictly
> > on the sensitive data.
> 
> Yes, this is a good idea.

I don't get the idea.
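
One possible reading of the time_t-prefix proposal quoted above, as a
minimal Python sketch: the identity becomes "<time_t>:<digest>", and two
identities with equal digests but widely different timestamps are
flagged as a suspected collision. The thread does not say which
timestamp would be used, so `ts` is an assumption, as are the helper
names and the `max_skew` threshold.

```python
import hashlib
from typing import Tuple


def make_identity(ts: int, payload: bytes) -> str:
    """Prefix a digest of `payload` with a time_t value (seconds since the epoch).

    Hypothetical helper: the thread does not say which timestamp is used.
    """
    return f"{ts}:{hashlib.sha256(payload).hexdigest()}"


def parse_identity(identity: str) -> Tuple[int, str]:
    """Split "<time_t>:<digest>" back into its two parts."""
    ts, digest = identity.split(":", 1)
    return int(ts), digest


def suspect_collision(a: str, b: str, max_skew: int = 86400) -> bool:
    """True if the digests match but the timestamps are more than max_skew apart.

    max_skew (one day) is an arbitrary, hypothetical threshold.
    """
    ts_a, digest_a = parse_identity(a)
    ts_b, digest_b = parse_identity(b)
    return digest_a == digest_b and abs(ts_a - ts_b) > max_skew
```

If the prefix were taken from buildtime, every rebuild would get a new
identity, which is exactly what pkg_identity() avoids by excluding
buildtime; that is why the timestamp source is left open here.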

> > Should we face a hash collision, we could check whether the
> > timestamps differ significantly.
> > 
> >  > 2) The choice of hash function (sha256) is very unfortunate:
> >  > it has a longer digest than sha1 but is otherwise vulnerable
> >  > to the same attack, so right now it is still marginally secure,
> >  > but that will not last long.
> > We don't really need a cryptographic-grade hash function here:
> > all we need is a checksum with a good distribution to detect
> > whether something has changed; obviously enough, nobody would
> > try to construct and exploit collisions here. That said, we can
> > use almost any polynomial.
> 
> Still, it may be a security issue. Consider what happens if the
> wrong source rpm is used: new modifications, including security
> fixes, may be silently omitted from a branch.

Nothing bad will happen. I see you don't understand the task: it's
about neither new modifications nor new releases. It's only about
package rebuilds, which use no new sources.
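
A minimal Python sketch of the idea behind pkg_identity() as described
in the quoted RFC: hash every header tag except a few nonessential
ones, and reuse the repository srpm when a rebuilt srpm has the same
identity. This is not the girar code (which is shell, in
gb/gb-sh-functions); the dict representation of a header and the
exact excluded tag names are assumptions for illustration.

```python
import hashlib
import json

# Tags assumed to differ between otherwise identical rebuilds; the RFC names
# buildhost, buildtime and the header hashsum, and the exact keys are illustrative.
NONESSENTIAL_TAGS = {"buildhost", "buildtime", "sha1header"}


def pkg_identity(header: dict) -> str:
    """Hex SHA-256 of the header with nonessential tags removed.

    Assumes a header is a dict of JSON-serializable values (a simplification).
    """
    essential = {k: v for k, v in header.items() if k not in NONESSENTIAL_TAGS}
    # Deterministic serialization: tag order must not change the identity.
    blob = json.dumps(essential, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def can_reuse_repo_srpm(rebuilt: dict, in_repo: dict) -> bool:
    """The rebuilt srpm may be replaced by the repository one if identities match."""
    return pkg_identity(rebuilt) == pkg_identity(in_repo)


repo = {"name": "foo", "version": "1.0", "release": "alt1",
        "buildhost": "host1", "buildtime": 1586500000}
rebuilt = dict(repo, buildhost="host2", buildtime=1586900000)
print(can_reuse_repo_srpm(rebuilt, repo))                        # True
print(can_reuse_repo_srpm(dict(rebuilt, release="alt2"), repo))  # False
```

Hashing a filtered copy rather than the raw header bytes is what lets
two rebuilds from the same sources compare equal despite different
build hosts and build times.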

> >  > Moreover, sha256 is quite slow.
> > 
> > SHA2 is implemented in hardware in some modern CPUs, so it's
> > quite fast there.
> 
> Only in some, and only on amd64. But our main build infrastructure
> also uses ppc64le and aarch64, so efficiency is very important,
> especially on aarch64, which is the bottleneck for most tasks. Also
> consider that we have secondary build systems for other arches such
> as mips, riscv and e2k.
> 
> Talk is cheap, so let's see some numbers.
> 
> 0) dd if=/dev/urandom of=/tmp/test.file bs=1M count=2048
> 
> 1) Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> $ time sha256sum -b /tmp/test.file
> 8.67user 0.27system 0:08.94elapsed 99%CPU (0avgtext+0avgdata 1944maxresident)k
> 8.70user 0.25system 0:08.96elapsed 99%CPU (0avgtext+0avgdata 2148maxresident)k
> 8.65user 0.28system 0:08.93elapsed 99%CPU (0avgtext+0avgdata 2064maxresident)k
> 
> $ time b2sum -b /tmp/test.file
> 2.48user 0.32system 0:02.81elapsed 99%CPU (0avgtext+0avgdata 2120maxresident)k
> 2.46user 0.30system 0:02.76elapsed 99%CPU (0avgtext+0avgdata 2120maxresident)k
> 2.47user 0.29system 0:02.77elapsed 99%CPU (0avgtext+0avgdata 2068maxresident)k
> 
> 2) E8C (1300 MHz, MBE8C-PC v.2)
> $ time sha256sum -b /tmp/test.file
> 11.69user 0.93system 0:12.64elapsed 99%CPU (0avgtext+0avgdata 3784maxresident)k
> 11.78user 0.85system 0:12.63elapsed 99%CPU (0avgtext+0avgdata 3836maxresident)k
> 11.72user 0.90system 0:12.63elapsed 99%CPU (0avgtext+0avgdata 3956maxresident)k
> 
> $ time b2sum -b /tmp/test.file
> 6.90user 1.37system 0:08.27elapsed 99%CPU (0avgtext+0avgdata 3896maxresident)k
> 6.76user 1.10system 0:07.87elapsed 99%CPU (0avgtext+0avgdata 3844maxresident)k
> 6.93user 0.95system 0:07.88elapsed 99%CPU (0avgtext+0avgdata 3872maxresident)k
> 
> I see no reason to use the slower and less secure sha256 algorithm.
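
To compare the runs above as throughput rather than raw times, a small
Python sketch can parse the default GNU time(1) output lines and divide
the 2048 MiB test file size by the mean elapsed time. It assumes the
"...user ...system ...elapsed" format shown above, where elapsed is
[hours:]minutes:seconds; the function names are illustrative.

```python
import re
from statistics import mean

# Wall-clock field of default GNU time output, e.g. "0:08.94elapsed".
ELAPSED_RE = re.compile(r"([\d:.]+)elapsed")


def elapsed_seconds(line: str) -> float:
    """Return wall-clock seconds from one line of default GNU time output."""
    m = ELAPSED_RE.search(line)
    if m is None:
        raise ValueError(f"not a GNU time line: {line!r}")
    seconds = 0.0
    for part in m.group(1).split(":"):  # [hours:]minutes:seconds
        seconds = seconds * 60 + float(part)
    return seconds


def throughput_mib_s(lines, size_mib: float = 2048) -> float:
    """Mean throughput in MiB/s over several runs on a file of size_mib MiB."""
    return size_mib / mean(elapsed_seconds(line) for line in lines)


sha256_i7 = ["8.67user 0.27system 0:08.94elapsed 99%CPU",
             "8.70user 0.25system 0:08.96elapsed 99%CPU",
             "8.65user 0.28system 0:08.93elapsed 99%CPU"]
b2sum_i7 = ["2.48user 0.32system 0:02.81elapsed 99%CPU",
            "2.46user 0.30system 0:02.76elapsed 99%CPU",
            "2.47user 0.29system 0:02.77elapsed 99%CPU"]
print(round(throughput_mib_s(sha256_i7)))  # 229
print(round(throughput_mib_s(b2sum_i7)))   # 737
```

Elapsed is wall-clock time, so it also includes the system time spent
reading the file.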

We can use a faster algorithm. Again, this is not about security.
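
Since the identity is only used to detect whether a rebuilt header
changed, the digest function can be treated as a parameter. A minimal
Python sketch, assuming the header subset has already been serialized
to bytes; `identity_digest` is a hypothetical name, BLAKE2b is the
algorithm behind b2sum(1) from the benchmark above, and CRC-32 stands
in for the "almost any polynomial" option mentioned earlier in the
thread.

```python
import hashlib
import zlib


def identity_digest(blob: bytes, algo: str = "blake2b") -> str:
    """Digest a serialized header subset with a selectable algorithm."""
    if algo == "crc32":
        # Non-cryptographic polynomial checksum: very cheap, but only 32 bits
        # wide, so a false match is far likelier than with a 256-bit digest.
        return f"{zlib.crc32(blob):08x}"
    if algo == "blake2b":
        # Same algorithm as b2sum(1), which defaults to 512 bits;
        # digest_size=32 gives a 256-bit value here.
        return hashlib.blake2b(blob, digest_size=32).hexdigest()
    # Any other name hashlib knows, e.g. "sha256".
    return hashlib.new(algo, blob).hexdigest()
```

Keeping the algorithm behind one function means switching from sha256
to something faster touches a single place.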

-- 
   WBR,
   Vladimir D. Seleznev



Thread overview: 22+ messages
2020-04-10 23:10 Vladimir D. Seleznev
2020-04-10 23:10 ` [devel] [PATCH 1/2] gb/gb-sh-functions: introduce pkg_identity() Vladimir D. Seleznev
2020-04-13 18:01   ` Dmitry V. Levin
2020-04-13 19:32     ` Vladimir D. Seleznev
2020-04-10 23:10 ` [devel] [PATCH 2/2] gb: optimize rebuilt srpm if its identity is equal to identity of srpm in the repo Vladimir D. Seleznev
2020-04-11 11:29   ` Alexey Tourbin
2020-04-14 16:42     ` Vladimir D. Seleznev
2020-04-16 21:51       ` Alexey Tourbin
2020-04-17 13:54         ` Dmitry V. Levin
2020-04-20  9:05           ` [devel] stopping a cascade of rebuilds Alexey Tourbin
2020-04-23 19:21             ` Vladimir D. Seleznev
2020-04-23 20:54               ` Dmitry V. Levin
2020-04-27  5:38               ` Alexey Tourbin
2020-04-20  8:36         ` [devel] [PATCH 2/2] gb: optimize rebuilt srpm if its identity is equal to identity of srpm in the repo Alexey Tourbin
2020-04-11 10:36 ` [devel] RFC: girar: optimize rebuild Andrey Savchenko
2020-04-11 15:33   ` Vladimir D. Seleznev
2020-04-11 23:31   ` Alexey V. Vissarionov
2020-04-14 14:57     ` Andrey Savchenko
2020-04-14 16:20       ` Vladimir D. Seleznev [this message]
2020-04-11 11:04 ` Gleb Fotengauer-Malinovskiy
2020-04-11 15:21   ` Vladimir D. Seleznev
2020-04-11 16:41     ` Gleb Fotengauer-Malinovskiy
