From: "Vladimir D. Seleznev" <vseleznv@altlinux.org>
To: ALT Linux Team development discussions <devel@lists.altlinux.org>
Subject: Re: [devel] [PATCH 2/2] gb: optimize rebuilt srpm if its identity is equal to identity of srpm in the repo
Date: Tue, 14 Apr 2020 19:42:44 +0300
Message-ID: <20200414164244.GC618226@portlab>
In-Reply-To: <CA+qzen=TBAFzUgt-4=XBEeC3AmO_CSMXG6VR03a9Rdpi9mo74A@mail.gmail.com>

On Sat, Apr 11, 2020 at 02:29:55PM +0300, Alexey Tourbin wrote:
> On Sat, Apr 11, 2020 at 2:11 AM Vladimir D. Seleznev
> <vseleznv@altlinux.org> wrote:
> > +osrpm_identity=
> > +osrpm="$GB_REPO_DIR/files/SRPMS/$srpmsu"
> > +if [ -f "$osrpm" ]; then
> > +	echo >&2 "$I: Found $srpmsu in the repo, this means the package was rebuilt"
> > +	osrpm_identity="$(pkg_identity "$osrpm")"
> > +fi
> > +
> >  for arch in $GB_ARCH; do
> >  	[ -d "$arch/srpm" -o ! -s "$arch/excluded" ] || continue
> >  	f="$arch/srpm/$srpmsu"
> >  	[ -f "$f" ] || continue
> > +	srpm_identity="$(pkg_identity "$f")"
> > +	echo >&2 "$I: $arch $srpmsu identity = $srpm_identity"
> > +	# non-empty $osrpm_identity means the NEVR was rebuilt
> > +	# optimize rebuilt sourcerpm if identities of original and rebuilt sourcerpms are equal
> > +	if [ -n "$osrpm_identity" ] &&
> > +	   [ "$osrpm_identity" = "$srpm_identity" ]; then
> > +		echo >&2 "$I: $arch: optimize rebuilt $srpmsu cause its identity is equal to $srpmsu in the repo"
> > +		install -p "$osrpm" "$f"
> > +	fi
> >  	built_pkgname="$(rpmquery --qf '%{name}' -p -- "$f")"
> >  	echo "$built_pkgname" > pkgname
> >  	break
>
> So how does it work in practice? Suppose I first uploaded a .src.rpm
> package. Do we store the original src.rpm, the one with the uploader's
> signature? When it gets rebuilt, this should not affect the original
> .src.rpm (as if it was uploaded again). No special handling is
> required in this case.

Yes. This was all about packages built from a gear repo, so that the
generated sourcerpms do not multiply.

> Then suppose I build a gearified package from Sisyphus for p9. But
> your code only handles GB_REPO_DIR, not the NEIGHBOUR_REPO_DIR the
> package comes from. To be clear, that information is lost: when you
> request to build a signed tag from /gears, it does not imply that
> there is a corresponding .src.rpm in any REPO_DIR.

That is a future part. I wrote some code that checks the uprepos, but I
didn't like it. The correct way is to check the uprepos' archives as well.

> There is already a problem with cross-repo copying: if done in
> earnest, both repos need to be locked. And of course this is
> deadlock-prone. You can do better without any locking if you identify
> every package in all repos with your new identity hash. This can be
> done relatively easy, since you already have that big
> content-addressable storage. You can hardlink it into a shadow
> identity-addressable storage. Once you've done that, you obtain the
> global / beatific vision: given a package, you instantly know if you
> have already seen something like this. (On the second thought: you
> don't need locking because the -f test is atomic and files cannot be
> removed from the storage, but there will still be race conditions.
> It's not too bad in practice. Further those race conditions can be
> detected at the task-commit stage.)

I like the idea, but there are some issues with this solution: there
*are* collisions. I explain this below. The idea will work perfectly
with sourcerpms, though.

The problem is that if we want to handle binary rpms as well, there will
be a kind of collision by design. For example, package foo has two
subpackages: foo-data and libfoo. After foo is rebuilt, foo-data has the
same identity as the previous foo-data build, but libfoo now has a
different one. According to the plan, the rebuild as a whole has
significant changes, so all binary packages should be replaced with the
new ones. Now we have two foo-data packages with the same identity value
that belong to different builds.
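
A minimal sketch of the hardlinked identity-addressable storage discussed
above. This is an illustration under stated assumptions, not girar code:
demo_identity() and shadow_link() are hypothetical names, and
demo_identity() merely hashes the file content as a stand-in for
pkg_identity() from patch 1/2, whose real definition is not shown in this
message. It relies on ln(1) refusing to overwrite an existing target, so
the first package seen with a given identity wins without explicit locking.

```shell
# Hypothetical stand-in for pkg_identity(): hashes the whole file.
# The real identity is computed differently (see patch 1/2).
demo_identity()
{
	sha256sum < "$1" | cut -d' ' -f1
}

# Hardlink a package into an identity-addressable directory.
# Prints "new" if this identity was not seen before, "seen" otherwise.
shadow_link()
{
	local pkg="$1" store="$2" id
	id="$(demo_identity "$pkg")"
	mkdir -p -- "$store"
	# ln without -f fails when the target exists, so no lock is needed
	# to decide which package claims the identity.
	if ln -- "$pkg" "$store/$id" 2>/dev/null; then
		echo new
	elif [ -e "$store/$id" ]; then
		echo seen
	else
		echo >&2 "shadow_link: cannot link $pkg"
		return 1
	fi
}
```

With binary packages, a "seen" answer is exactly the collision described
above: a second foo-data with the same identity may belong to a different
build, and this storage keeps only the first one.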

> There is one specific problem with the outlined approach: the notion
> of identity is flawed, because the disttag may or may not matter.
> Sometimes you cannot substitute a package for another package with the
> same identity but a different disttag. Specifically this is the case
> with strict dependencies between subpackages. You cannot substitute a
> subpackage unless you also substitute all the other subpackages.

Yes, that is correct, I considered this.

> This is further complicated by noarch subpackages: you need to
> coordinate substitution across architectures.

It is even more complicated with mixed-arch builds.

-- 
WBR,
Vladimir D. Seleznev
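
The all-or-nothing rule for subpackages could be sketched as follows. This
is only an assumption about how such a check might look. can_substitute()
and the list format ("subpackage-name identity", one pair per line) are
hypothetical and do not appear in the patch. The previous build may be
reused only when every subpackage identity matches.

```shell
# Hypothetical helper: succeed only if the old and the new build have
# exactly the same set of "subpackage-name identity" pairs.
# $1 and $2 are files listing one pair per line, in any order.
can_substitute()
{
	local old="$1" new="$2"
	# Compare the sorted lists: a single differing, missing or extra
	# subpackage forbids substituting any of them.
	[ "$(sort -- "$old")" = "$(sort -- "$new")" ]
}
```

For the foo example from this thread, a list where libfoo has a new
identity makes the check fail, so foo-data is not substituted either, even
though its own identity is unchanged. Coordinating this across
architectures for noarch subpackages is not covered by the sketch.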