Date: Sat, 11 Apr 2020 18:33:49 +0300
From: "Vladimir D. Seleznev"
To: ALT Linux Team development discussions
Message-ID: <20200411153349.GB1624106@portlab>
References: <20200410231044.1436970-1-vseleznv@altlinux.org> <20200411133631.daac861f97979c67511cf3ef@altlinux.org>
In-Reply-To: <20200411133631.daac861f97979c67511cf3ef@altlinux.org>
Subject: Re: [devel] RFC: girar: optimize rebuild
X-BeenThere: devel@lists.altlinux.org
List-Id: ALT Linux Team development discussions

On Sat, Apr 11, 2020 at 01:36:31PM +0300, Andrey Savchenko wrote:
> On Sat, 11 Apr 2020 02:10:42 +0300 Vladimir D. Seleznev wrote:
> >
> > Hi!
> >
> > This is the first part of the rebuilt-package optimization for girar.
> > It introduces pkg_identity() and a simple optimization of the rebuilt
> > sourcerpm.
> >
> > pkg_identity() takes an RPM package and returns a value called the
> > package identity: a hash of a subset of the RPM package header. That
> > subset is the entire header without some nonessential artifacts like
> > buildhost, buildtime, header hashsum, etc.
> >
> > Two builds of the same NEVR may have equal or different package
> > identities. Equal identities mean that the build results of these
> > packages are equal too, which allows build optimization. A practical
> > example, a simple optimization of the rebuilt sourcerpm, is also
> > introduced.
> >
> > Future work could cover optimizing a sourcerpm "copied" to another
> > branch by using the sourcerpm retrieved from the archive, and
> > optimizing binary packages (the latter has an issue when binary
> > subpackages have mixed archs, i.e. arch and noarch, so it could
> > probably work only with single-arch builds).
> >
> > Please review and discuss.
>
> I see two problems with the proposed approach:
>
> 1) It assumes there will be no pkg_identity hash collisions. This
> is wrong. They may occur sooner or later, and the code *must*
> correctly deal with such collisions. Remember what happened to
> Subversion when a collision occurred in a repository, while Git was
> resilient.

Any hash function has collisions by definition, since it maps inputs of
arbitrary length to a fixed-size digest. The only way to avoid them is
not to use hashes at all.

> As the proposal stands now, an identity hash collision will lead to
> a degraded repository at best and a broken one at worst.

No, it will not, because any issues such a collision might bring up
will be caught by later build checks.

> I see no easy way to fix this problem, but it must either be fixed
> or the proposed optimization rejected.
>
> 2) The choice of hash function, sha256, is very unfortunate: it has
> a longer digest than sha1, but is otherwise vulnerable to the same
> attack; so right now it is still marginally secure, but it will not
> last long. Moreover, sha256 is quite slow.

The good news: this is not about security.

> It is better to use a newer generation of hash functions, e.g.
> blake2b, which is based on the ChaCha stream cipher. It is more
> future-proof and faster at the same time. You can just use the b2sum
> implementation from GNU coreutils.

--
WBR, Vladimir D. Seleznev
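For illustration only, the pkg_identity() idea from the quoted RFC can be sketched in Python. This is not the girar code, which is not shown in this thread. The sketch assumes the RPM header has already been decoded into a dict of tag name to JSON-serializable value, and VOLATILE_TAGS is a hypothetical stand-in for the RFC's "buildhost, buildtime, header hashsum, etc.". The `algo` parameter lets the same function use either sha256 or the blake2b suggested in the thread, both of which Python's hashlib provides.

```python
import hashlib
import json
from typing import Any, Mapping

# Hypothetical set of "nonessential" tags; the RFC only names buildhost,
# buildtime and the header hashsum, followed by "etc.".
VOLATILE_TAGS = frozenset({"BUILDHOST", "BUILDTIME", "SHA1HEADER", "SHA256HEADER"})


def pkg_identity(header: Mapping[str, Any], algo: str = "sha256") -> str:
    """Return a hex digest of `header` with the volatile tags left out.

    Assumes the RPM header is already decoded into a mapping of tag name
    to JSON-serializable value (reading it from a .rpm file is out of scope).
    """
    essential = {tag: value for tag, value in header.items()
                 if tag.upper() not in VOLATILE_TAGS}
    # Canonical serialization: sorted keys and fixed separators, so equal
    # essential content always produces identical bytes.
    blob = json.dumps(essential, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")
    # hashlib.new accepts "sha256", "blake2b" and other supported names.
    return hashlib.new(algo, blob).hexdigest()


if __name__ == "__main__":
    build_a = {"NAME": "foo", "VERSION": "1.0", "RELEASE": "alt1",
               "REQUIRENAME": ["libc.so.6"],
               "BUILDHOST": "host1", "BUILDTIME": 1586600000}
    # Same package rebuilt elsewhere: only volatile tags differ.
    build_b = dict(build_a, BUILDHOST="host2", BUILDTIME=1586700000)
    # A rebuild whose dependencies changed.
    build_c = dict(build_a, REQUIRENAME=["libc.so.6", "libm.so.6"])

    print(pkg_identity(build_a) == pkg_identity(build_b))  # True
    print(pkg_identity(build_a) == pkg_identity(build_c))  # False
    print(pkg_identity(build_a, "blake2b"))
```

Two builds that differ only in BUILDHOST/BUILDTIME get the same identity, while a change to any other tag yields a different digest in practice. Equal digests are strong evidence of equal content rather than proof, which is exactly the collision question debated above.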