From: "Alexey V. Vissarionov"
To: ALT Linux Team development discussions
Date: Sun, 12 Apr 2020 02:31:43 +0300
Subject: Re: [devel] RFC: girar: optimize rebuild
Message-ID: <20200411233143.GC4490@altlinux.org>
In-Reply-To: <20200411133631.daac861f97979c67511cf3ef@altlinux.org>
References: <20200410231044.1436970-1-vseleznv@altlinux.org>
 <20200411133631.daac861f97979c67511cf3ef@altlinux.org>

On 2020-04-11 13:36:31 +0300, Andrey Savchenko wrote:

>> The first part of rebuilt packages optimization for girar.
>> It introduces pkg_identity() and simple optimization of the
>> rebuilt sourcerpm.
>> pkg_identity() takes RPM package and returns a value called
>> package identity, a hash of subset of RPM package header.
>> That subset is the entire header without some nonessential
>> artifacts like buildhost, buildtime, header hashsum, etc.

> I see two problems with the proposed approach:
> 1) It assumes there will be no pkg_identity hash collisions.
> This is wrong. They may occur sooner or later and the code
> *must* correctly deal with such collisions.

The solution is well known: prefix the hash with a time_t value
so that it grows monotonically while still depending strictly on
the hashed header data.
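A minimal sketch of that prefixing scheme, assuming the header is
available as a mapping of tag names to raw bytes and that the
timestamp (for example, the build time, which is itself excluded
from the hash) is passed in separately. NONESSENTIAL_TAGS,
identity_digest(), prefixed_identity() and split_identity() are
hypothetical names for illustration, not the code from the girar
patch; the hash algorithm is pluggable through hashlib.new().

```python
from __future__ import annotations

import hashlib
import struct

# Hypothetical list of "nonessential" tags left out of the identity; the
# actual set used by pkg_identity() in the girar patch may differ.
NONESSENTIAL_TAGS = {"BUILDHOST", "BUILDTIME", "SHA1HEADER", "SIGMD5"}


def identity_digest(header: dict[str, bytes], algo: str = "sha256") -> bytes:
    """Hash all header tags except the nonessential ones, in sorted order."""
    h = hashlib.new(algo)
    for tag in sorted(header):
        if tag in NONESSENTIAL_TAGS:
            continue
        # Length-prefix both tag and value so that different splits of the
        # same bytes (e.g. "AB"+"C" vs "A"+"BC") feed different input.
        for part in (tag.encode("ascii"), header[tag]):
            h.update(struct.pack(">I", len(part)))
            h.update(part)
    return h.digest()


def prefixed_identity(header: dict[str, bytes], timestamp: int,
                      algo: str = "sha256") -> str:
    """Prepend the timestamp as 16 hex digits (64 bits) to the digest."""
    if not 0 <= timestamp < 2**64:
        raise ValueError("timestamp must fit in an unsigned 64-bit value")
    return f"{timestamp:016x}" + identity_digest(header, algo).hex()


def split_identity(identity: str) -> tuple[int, str]:
    """Recover (timestamp, hex digest) from a prefixed identity."""
    return int(identity[:16], 16), identity[16:]
```

Because the prefix is fixed-width hex, comparing two identities as
plain strings orders them by timestamp first; when the digest parts
match, split_identity() exposes both timestamps for comparison.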
If we ever hit a hash collision, we can check whether the
timestamps differ significantly.

> 2) The hash function choice (sha256) is very unfortunate:
> it has longer digest than sha1, but otherwise is vulnerable
> to the same attack; so right now it is still marginally secure,
> but it will not last long.

We don't really need a cryptographic-grade hash function here:
all we need is a checksum with a good distribution to detect
whether something has changed. Obviously, nobody would try to
craft and exploit collisions here. That said, we can use almost
any polynomial.

> Moreover sha256 is quite slow.

SHA-2 is implemented in hardware in some modern CPUs, so it's
quite fast there.

> It is better to use newer generation of hash functions, e.g.
> blake2b based on the chacha stream cipher.

Both are still quite marginal... but, once again, we should not
care about that too much: any hash function would do the job,
even if we switch to another polynomial in the future.

-- 
Alexey V. Vissarionov
gremlin AT altlinux DOT org; +vii-cmiii-ccxxix-lxxix-xlii
GPG: 0D92F19E1C0DC36E27F61A29CD17E2B43D879005 @ hkp://keys.gnupg.net