From: "Vladimir D. Seleznev"
Date: Tue, 14 Apr 2020 19:42:44 +0300
To: ALT Linux Team development discussions
Subject: Re: [devel] [PATCH 2/2] gb: optimize rebuilt srpm if its identity is equal to identity of srpm in the repo
Message-ID: <20200414164244.GC618226@portlab>
References: <20200410231044.1436970-1-vseleznv@altlinux.org> <20200410231044.1436970-3-vseleznv@altlinux.org>

On Sat, Apr 11, 2020 at 02:29:55PM +0300, Alexey Tourbin wrote:
> On Sat, Apr 11, 2020 at 2:11 AM Vladimir D. Seleznev wrote:
> > +osrpm_identity=
> > +osrpm="$GB_REPO_DIR/files/SRPMS/$srpmsu"
> > +if [ -f "$osrpm" ]; then
> > +    echo >&2 "$I: Found $srpmsu in the repo, this means the package was rebuilt"
> > +    osrpm_identity="$(pkg_identity "$osrpm")"
> > +fi
> > +
> >  for arch in $GB_ARCH; do
> >      [ -d "$arch/srpm" -o ! -s "$arch/excluded" ] || continue
> >      f="$arch/srpm/$srpmsu"
> >      [ -f "$f" ] || continue
> > +    srpm_identity="$(pkg_identity "$f")"
> > +    echo >&2 "$I: $arch $srpmsu identity = $srpm_identity"
> > +    # non-empty $osrpm_identity means the NEVR was rebuilt
> > +    # optimize rebuilt sourcerpm if identities of original and rebuilt sourcerpms are equal
> > +    if [ -n "$osrpm_identity" ] &&
> > +       [ "$osrpm_identity" = "$srpm_identity" ]; then
> > +        echo >&2 "$I: $arch: optimize rebuilt $srpmsu cause its identity is equal to $srpmsu in the repo"
> > +        install -p "$osrpm" "$f"
> > +    fi
> >      built_pkgname="$(rpmquery --qf '%{name}' -p -- "$f")"
> >      echo "$built_pkgname" > pkgname
> >      break
>
> So how does it work in practice? Suppose I first uploaded a .src.rpm
> package. Do we store the original src.rpm, the one with the uploader's
> signature? When it gets rebuilt, this should not affect the original
> .src.rpm (as if it was uploaded again). No special handling is
> required in this case.

Yes. This is all about packages built from a gear repo: the point is
not to multiply the generated sourcerpms.

> Then suppose I build a gearified package from Sisyphus for p9. But
> your code only handles GB_REPO_DIR, not the NEIGHBOUR_REPO_DIR the
> package comes from. To be clear, that information is lost: when you
> request to build a signed tag from /gears, it does not imply that
> there is a corresponding .src.rpm in any REPO_DIR.

That part is for the future. I wrote some code that checks the uprepos,
but I didn't like it. The correct way is to check the uprepos archives
as well.

> There is already a problem with cross-repo copying: if done in
> earnest, both repos need to be locked. And of course this is
> deadlock-prone. You can do better without any locking if you identify
> every package in all repos with your new identity hash. This can be
> done relatively easily, since you already have that big
> content-addressable storage. You can hardlink it into a shadow
> identity-addressable storage. Once you've done that, you obtain the
> global / beatific vision: given a package, you instantly know if you
> have already seen something like this. (On second thought: you
> don't need locking because the -f test is atomic and files cannot be
> removed from the storage, but there will still be race conditions.
> It's not too bad in practice. Further, those race conditions can be
> detected at the task-commit stage.)

I like the idea, but there is an issue with this solution: there *are*
collisions. I explain this below. For sourcerpms, though, the idea will
work perfectly.

The problem is that if we want to handle binary rpms as well, there
will be collisions of a kind by design. For example, package foo has
two subpackages: foo-data and libfoo. After foo is rebuilt, foo-data
has the same identity as in the previous build, but libfoo now has a
different one. According to the plan, the rebuild as a whole has
significant changes, so all its binary packages should be replaced with
the new ones. And now we have two foo-data packages with the same
identity value that belong to different builds.

> There is one specific problem with the outlined approach: the notion
> of identity is flawed, because the disttag may or may not matter.
> Sometimes you cannot substitute a package for another package with the
> same identity but a different disttag. Specifically this is the case
> with strict dependencies between subpackages. You cannot substitute a
> subpackage unless you also substitute all the other subpackages.

Yes, that is correct, I considered this.

> This is further complicated by noarch subpackages: you need to
> coordinate substitution across architectures.

It gets even more complicated with mixed-arch builds.

-- 
WBR,
Vladimir D. Seleznev
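
For illustration, the shadow identity-addressable storage discussed
above could be sketched roughly as follows. This is only a sketch under
assumptions: identity_of and shadow_add are hypothetical names, and
identity_of simply hashes the file contents as a stand-in for
pkg_identity, whose implementation is not shown in this message. The
sketch relies on ln(1) refusing to overwrite an existing name, so at
most one file claims a given identity and no lock is taken.

```shell
#!/bin/sh
# Hypothetical stand-in for pkg_identity from the patch: the real
# identity function is not shown here, so we just hash file contents.
identity_of() {
	sha256sum -- "$1" | cut -d' ' -f1
}

# shadow_add SHADOW_DIR FILE
# Hardlink FILE into SHADOW_DIR under its identity.
# Prints "new ID" if this identity was not seen before,
# "seen ID" if an entry with this identity already exists.
# The body runs in a subshell, so its variables do not leak.
shadow_add() (
	shadow="$1"
	file="$2"
	id="$(identity_of "$file")" || exit 1
	[ -n "$id" ] || exit 1
	mkdir -p -- "$shadow" || exit 1
	# ln without -f fails if the name already exists, so the first
	# file to arrive keeps the identity and later ones are rejected.
	if ln -- "$file" "$shadow/$id" 2>/dev/null; then
		echo "new $id"
	elif [ -f "$shadow/$id" ]; then
		echo "seen $id"
	else
		# e.g. SHADOW_DIR is on another filesystem
		exit 1
	fi
)
```

Calling shadow_add on every file of the content-addressable storage
would populate the shadow tree; a later "seen" answer for a freshly
built sourcerpm means something with the same identity already exists.
As noted above, this fits sourcerpms; for binary subpackages the same
identity may legitimately belong to different builds.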