* [devel] rpmlndup
@ 2009-02-20 21:19 Igor Vlasenko
2009-02-20 21:20 ` Dmitry V. Levin
2009-02-20 21:22 ` Mikhail Gusarov
0 siblings, 2 replies; 9+ messages in thread
From: Igor Vlasenko @ 2009-02-20 21:19 UTC (permalink / raw)
To: devel
[-- Attachment #1: Type: text/plain, Size: 663 bytes --]
Since the topic of scripts has already come up,
I'll share my script rpmlndup.
=head1 NAME
rpmlndup - a tool that reduces rpm repositories size by hardlinking identical rpms.
identical = name, real size and sha1/md5 sig are the same.
When I ran it on my machine, the hard drive slimmed down by 70 GB,
even though I already use --link-dest.
You can't specify --link-dest for every dest.
If there is interest, I'll write the help and upload it to Sisyphus.
--
Dr. Igor Vlasenko
--------------------
Topology Department
Institute of Math
Kiev, Ukraine
[-- Attachment #2: rpmlndup --]
[-- Type: text/plain, Size: 5402 bytes --]
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Find;
use RPM::Header;
use Getopt::Long;
my $verbose=1;
my $skipnosum=0;
my $result = GetOptions (
'quiet'=> sub {$verbose=0},
"skip-no-sum" => \$skipnosum,
"verbose+" => \$verbose,
);
my @directories = @ARGV;
map {-d $_ or die "argument is not a directory: $_\n"} @directories;
# first step is just a usual find; to find dup names
my %rpmbyname;
find(\&wanted, @directories);
sub wanted {
# $File::Find::dir = /some/path/
# $_ = foo.ext
# $File::Find::name = /some/path/foo.ext
my $name=$_;
return unless /\.rpm$/ and not -l $_;
$rpmbyname{$name}=[] unless defined $rpmbyname{$name};
my @stat=stat $name;
# 0 dev device number of filesystem
# 1 ino inode number
# 2 mode file mode (type and permissions)
# 3 nlink number of (hard) links to the file
# 4 uid numeric user ID of file's owner
# 5 gid numeric group ID of file's owner
# 6 rdev the device identifier (special files only)
# 7 size total size of file, in bytes
# 8 atime last access time in seconds since the epoch
# 9 mtime last modify time in seconds since the epoch
# 10 ctime inode change time in seconds since the epoch (*)
# 11 blksize preferred block size for file system I/O
# 12 blocks actual number of blocks allocated
my $size = $stat[7];
push @{$rpmbyname{$name}}, {
NAME=> $name,
# DIR => $File::Find::dir,
PATH=> $File::Find::name,
INODE => $stat[1],
SIZE => $stat[7],
};
}
# second step is to find genuine dups; the same size and sha1/md5sum.
my %rpmbysum;
while (my ($rpm, $lptr)=each %rpmbyname) {
next if $#{$lptr}<1;
my %inodes;
map {$inodes{$_->{INODE}}=1} @$lptr;
next if scalar keys(%inodes) < 2;
map {&bysum($_)} @$lptr;
}
undef %rpmbyname;
my $dupcount=0;
my $economy=0;
my @rpmtolink;
while (my ($rpm, $lptr)=each %rpmbysum) {
next if $#{$lptr}<1;
my %inodes;
map {$inodes{$_->{INODE}}=1} @$lptr;
next if scalar keys(%inodes) < 2;
my $dupnum=keys(%inodes)-1;
#print "$rpm\n";
$economy+=$lptr->[0]->{SIZE}*$dupnum;
$dupcount+=$dupnum;
push @rpmtolink, $lptr;
}
undef %rpmbysum;
print STDERR "hardlinking duplicate rpms will save $economy bytes in $dupcount rpms.\n";
print STDERR "Do you want to continue (y/n)?\n";
# read the answer from STDIN explicitly; treat EOF as "no"
my $answer=<STDIN>;
exit 0 unless defined $answer and $answer =~ /^\s*y/i;
print "continue with ".scalar @rpmtolink." dups\n";
foreach my $lref (@rpmtolink) {
die "internal error! not enough files!" if @$lref < 2;
my $master=$lref->[0];
my $masterinode=$master->{INODE};
my $masterpath=$master->{PATH};
for (my $i=1; $i < @$lref; $i++) {
my $slave=$lref->[$i];
my $slavepath=$slave->{PATH};
#warn "already linked $masterpath $slavepath\n" if $slave->{INODE} == $masterinode;
if ($slave->{INODE} != $masterinode) {
die "impossible :(" if $slavepath eq $masterpath;
# 'or' instead of '||': '||' binds to the string argument, so die could never fire
rename $slavepath, $slavepath.'.bak' or die "rename $slavepath, $slavepath.bak failed: $!";
unless (link $masterpath, $slavepath) {
warn "link $masterpath, $slavepath failed: $!";
rename $slavepath.'.bak', $slavepath or warn "restoring $slavepath from $slavepath.bak failed: $!";
die "execution aborted.";
}
system('touch','-acm','-r',$slavepath.'.bak','--',$slavepath);
unlink $slavepath.'.bak' or die "cleanup of $slavepath.bak failed: $!";
print "linked successfully: $masterpath -> $slavepath\n" if $verbose;
}
}
}
sub bysum {
my $rpm=$_[0];
my $size = $rpm->{SIZE};
my $header;
eval {
$header=RPM::Header->new($rpm->{PATH});
};
if ($@) {
warn "$rpm->{PATH} skipped: $@\n" if $verbose;
return;
}
my $sum = $header->{SHA1HEADER}->[0];
unless ($sum) {
warn "no sha1sum for $rpm->{NAME} - trying MD5\n" if $verbose;
$sum = $header->{SIGMD5}->[0];
unless ($sum) {
warn "no md5sum for $rpm->{NAME}\n" if $verbose;
return if $skipnosum;
# let at least the declared size be the same
$sum=$header->{SIGSIZE}->[0];
$sum||=$size;
}
}
$rpm->{SUM}=$sum;
my $key=$rpm->{NAME}.'!'.$sum.'|'.$size;
$rpmbysum{$key}=[] unless defined $rpmbysum{$key};
push @{$rpmbysum{$key}}, $rpm;
}
=head1 NAME
rpmlndup - a tool that reduces rpm repositories size by hardlinking identical rpms.
=head1 SYNOPSIS
B<rpmlndup>
[B<-h|--help>]
[B<-v|--verbose>]
[B<-q|--quiet>]
[B<-y|--yes|--batch>]
[B<-a|--ask|--interactive>]
[B<-n|--no|--count>]
[B<-s|--skip-no-sum>]
[I<DIR>...]
=head1 DESCRIPTION
B<rpmlndup>
=head1 OPTIONS
=over
=item B<-h, --help>
Display this help and exit.
=item B<-v, --verbose>, B<-q, --quiet>
Verbosity level. Multiple -v increase the verbosity level, -q sets it to 0.
=item B<-y|--yes>, B<--batch>
Batch mode. Links identical rpms after counting.
=item B<-n|--no>, B<--count>
Do not link identical rpms; only count the space that would be freed.
=item B<-a|--ask>, B<--interactive>
Interactive mode (default). Counts the space to be freed and asks whether to proceed with linking.
=item B<-s|--skip-no-sum>
Skip unsigned rpms (that have no sha1 or md5 sum).
=back
=head1 AUTHOR
Written by Igor Vlasenko <viy@altlinux.org>.
=head1 COPYING
Copyright (c) 2009 Igor Vlasenko, ALT Linux Team.
This is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software Foundation;
either version 2 of the License, or (at your option) any later version.
=cut
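The rename/link/restore sequence used in the attached script can be sketched in isolation. This is a minimal sketch, not the author's code: the helper name relink and its error messages are hypothetical, and the timestamp-preserving touch call is left out. It uses the low-precedence "or" so that die is tied to the result of rename/unlink rather than to their string argument.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $dup with a hard link to $master.
# The old file is kept as "$dup.bak" until the link has succeeded.
sub relink {
    my ($master, $dup) = @_;
    my $bak = "$dup.bak";

    # 'or' (not '||') so that die depends on rename's return value
    rename($dup, $bak) or die "rename $dup -> $bak failed: $!\n";

    unless (link($master, $dup)) {
        my $err = $!;
        # try to put the original file back before giving up
        rename($bak, $dup) or warn "could not restore $dup from $bak: $!\n";
        die "link $master -> $dup failed: $err\n";
    }

    unlink($bak) or die "cleanup of $bak failed: $!\n";
    return 1;
}
```

Calling relink('/repo/a/foo.rpm', '/repo/b/foo.rpm') (hypothetical paths) would leave both names pointing at the same inode, provided both paths are on the same filesystem.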
* Re: [devel] rpmlndup
From: Dmitry V. Levin @ 2009-02-20 21:20 UTC (permalink / raw)
To: ALT Devel discussion list
On Fri, Feb 20, 2009 at 11:19:13PM +0200, Igor Vlasenko wrote:
> Since the topic of scripts has already come up,
> I'll share my script rpmlndup.
>
> =head1 NAME
>
> rpmlndup - a tool that reduces rpm repositories size by hardlinking identical rpms.
Isn't hardlink(1) from the package of the same name suitable?
--
ldv
* Re: [devel] rpmlndup
From: Mikhail Gusarov @ 2009-02-20 21:22 UTC (permalink / raw)
To: ALT Linux Team development discussions
Twas brillig at 23:19:13 20.02.2009 UTC+02 when vlasenko@imath.kiev.ua did gyre and gimble:
IV> Since the topic of scripts has already come up, I'll share my script rpmlndup.
Someone recently tried to upload a fourth package with exactly the same
functionality to Debian :) The uploader was nearly hounded out.
--
* Re: [devel] [jt] rpmlndup
From: Dmitry V. Levin @ 2009-02-20 21:23 UTC (permalink / raw)
To: ALT Linux Team development discussions
On Sat, Feb 21, 2009 at 03:22:06AM +0600, Mikhail Gusarov wrote:
>
> Twas brillig at 23:19:13 20.02.2009 UTC+02 when vlasenko@imath.kiev.ua did gyre and gimble:
>
> IV> Since the topic of scripts has already come up, I'll share my script rpmlndup.
>
> Someone recently tried to upload a fourth package with exactly the same
> functionality to Debian :) The uploader was nearly hounded out.
They're mean for no reason, that's all. :(
--
ldv
* Re: [devel] [jt] rpmlndup
From: Mikhail Gusarov @ 2009-02-20 21:25 UTC (permalink / raw)
To: ALT Linux Team development discussions
Twas brillig at 00:23:55 21.02.2009 UTC+03 when ldv@altlinux.org did gyre and gimble:
>> Someone recently tried to upload a fourth package with exactly the same
>> functionality to Debian :) The uploader was nearly hounded out.
DVL> They're mean for no reason, that's all. :(
Not quite. "Hounded out" there means, in terms of emotional heat and anger,
roughly what a typical pleasant conversation on this mailing list looks like :)
--
* Re: [devel] rpmlndup
From: Igor Vlasenko @ 2009-02-20 21:28 UTC (permalink / raw)
To: ALT Linux Team development discussions
On Sat, Feb 21, 2009 at 12:20:49AM +0300, Dmitry V. Levin wrote:
> On Fri, Feb 20, 2009 at 11:19:13PM +0200, Igor Vlasenko wrote:
> > Since the topic of scripts has already come up,
> > I'll share my script rpmlndup.
> >
> > =head1 NAME
> >
> > rpmlndup - a tool that reduces rpm repositories size by hardlinking identical rpms.
>
> Isn't hardlink(1) from the package of the same name suitable?
Live and learn :)
On the other hand, hardlink compares the contents of the files,
while rpmlndup only checks the md5 sig embedded in the RPM header.
In theory it should run considerably faster,
but I didn't know about hardlink, so I haven't compared them.
--
Dr. Igor Vlasenko
--------------------
Topology Department
Institute of Math
Kiev, Ukraine
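The grouping idea behind this comparison (treat two files as identical when name, digest and size all match, without reading file contents) can be sketched without any RPM library. This is only an illustration under stated assumptions: group_by_key and the $digest_of callback are hypothetical names, and the callback stands in for reading the digest stored in the RPM header, which the attached script does through RPM::Header.

```perl
use strict;
use warnings;

# Group file records by a key built from name, digest and size,
# in the same shape as the key used by bysum() in the script.
# $digest_of is a caller-supplied code ref (an assumption of this
# sketch) that returns the stored digest for a record, or undef.
sub group_by_key {
    my ($records, $digest_of) = @_;
    my %groups;
    for my $r (@$records) {
        my $sum = $digest_of->($r);
        next unless defined $sum;          # no digest: skip the record
        my $key = $r->{NAME} . '!' . $sum . '|' . $r->{SIZE};
        push @{ $groups{$key} }, $r;
    }
    # only groups with more than one member are duplicate sets
    return grep { @$_ > 1 } values %groups;
}
```

Because the digest comes from metadata, the cost per file is one small read rather than a pass over the whole payload; whether that matters in practice depends on how many files share a name.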
* Re: [devel] rpmlndup
From: Mikhail Gusarov @ 2009-02-20 21:32 UTC (permalink / raw)
To: ALT Linux Team development discussions
Twas brillig at 23:28:53 20.02.2009 UTC+02 when vlasenko@imath.kiev.ua did gyre and gimble:
IV> On the other hand, hardlink compares the contents of the files, while
IV> rpmlndup only checks the md5 sig embedded in the RPM header.
IV> In theory it should run considerably faster, but I didn't know about
IV> hardlink, so I haven't compared them.
hardlink(1) compares contents only when the sizes match,
so the difference will be negligible.
--
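A minimal sketch of the size-first strategy described above, using the core File::Compare module. It is not the code of hardlink(1); same_file_content is a hypothetical name, and the sketch only shows the order of checks: a cheap stat comparison first, a full content read only when the sizes agree.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Return true if two files are byte-for-byte identical.
# File contents are read only when the sizes already match.
sub same_file_content {
    my ($x, $y) = @_;
    my $size_x = (stat $x)[7];
    my $size_y = (stat $y)[7];
    return 0 unless defined $size_x and defined $size_y;  # stat failed
    return 0 if $size_x != $size_y;                       # cheap rejection
    return compare($x, $y) == 0;                          # 0 means equal
}
```

For files whose sizes differ, this costs two stat calls; for files of equal size, both files may be read in full.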
* Re: [devel] rpmlndup
From: Led @ 2009-02-20 21:44 UTC (permalink / raw)
To: ALT Linux Team development discussions
On Friday, 20 February 2009 23:32:28 Mikhail Gusarov wrote:
> Twas brillig at 23:28:53 20.02.2009 UTC+02 when vlasenko@imath.kiev.ua did
> gyre and gimble:
>
> IV> On the other hand, hardlink compares the contents of the files, while
> IV> rpmlndup only checks the md5 sig embedded in the RPM header.
> IV> In theory it should run considerably faster, but I didn't know about
> IV> hardlink, so I haven't compared them.
>
> hardlink(1) compares contents only when the sizes match,
> so the difference will be negligible.
...and it's not clear in whose favor (if rpmlndup doesn't compare sizes before
digging out the md5) :)
--
Led
* Re: [devel] rpmlndup
From: Igor Vlasenko @ 2009-02-20 21:49 UTC (permalink / raw)
To: ALT Linux Team development discussions
On Fri, Feb 20, 2009 at 11:44:18PM +0200, Led wrote:
> On Friday, 20 February 2009 23:32:28 Mikhail Gusarov wrote:
> > IV> On the other hand, hardlink compares the contents of the files, while
> > IV> rpmlndup only checks the md5 sig embedded in the RPM header.
> > IV> In theory it should run considerably faster, but I didn't know about
> > IV> hardlink, so I haven't compared them.
> >
> > hardlink(1) compares contents only when the sizes match,
> > so the difference will be negligible.
>
> ...and it's not clear in whose favor (if rpmlndup doesn't compare sizes before
> digging out the md5) :)
It digs the md5 out only of the duplicate candidates.
--
Dr. Igor Vlasenko
--------------------
Topology Department
Institute of Math
Kiev, Ukraine
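The candidate preselection mentioned in this reply corresponds to the first pass of the attached script: files are grouped by basename, and a group is inspected further only if it spans at least two distinct inodes. A minimal sketch of that filter, assuming records shaped like the ones the script builds (NAME and INODE fields); candidate_groups is a hypothetical name.

```perl
use strict;
use warnings;

# Given records with NAME and INODE fields, return the groups
# (array refs) whose files share a basename but are not yet all
# hard links to one inode. Only these groups need their headers read.
sub candidate_groups {
    my (@records) = @_;
    my %by_name;
    push @{ $by_name{ $_->{NAME} } }, $_ for @records;

    my @candidates;
    for my $group (values %by_name) {
        my %inodes;
        $inodes{ $_->{INODE} } = 1 for @$group;
        push @candidates, $group if keys(%inodes) >= 2;
    }
    return @candidates;
}
```

A name that occurs once, or whose copies are already hard-linked together, never reaches the digest step, so no header is opened for it.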