From: Michael Shigorin <mike@osdn.org.ua>
To: community@altlinux.ru
Cc: oo-discuss@openoffice.ru
Subject: [Comm] a little script for converting &iacute; to 8bit
Date: Thu, 25 Aug 2005 14:56:00 +0300
Message-ID: <20050825115600.GP13435@osdn.org.ua>

[-- Attachment #1: Type: text/plain, Size: 520 bytes --]

	Hello.

I came across a little script that overlaps somewhat with
CyrillicTools for OpenOffice.org: it can convert HTML entities
(the latin1 ones, specifically) to 8-bit characters or to &#xxx;.

I needed it while untangling a document that Word had exported to
HTML (the result was a mess of 8-bit text and gibberish characters).
Since it looks like it could be useful to others as well, I'm
attaching it :)

http://mailman.mamemu.de/pipermail/webmin-trans/2003-November/000052.html

-- 
 ---- WBR, Michael Shigorin <mike@altlinux.ru>
  ------ Linux.Kiev http://www.linux.kiev.ua/

[-- Attachment #2: char2ent.pl --]
[-- Type: text/plain, Size: 5576 bytes --]

#!/usr/bin/perl -w

# char2ent.pl
#
# Simple utility to convert files with &#ddd; to/from 8bit chars
# See usage at end of this file ( or ./char2ent -h )
# PS works only with 8bit chars, not talking about UTF-16 Unicode here
#
# mode=html (default)
#   Convert 8bit chars (with high bit set) to html entity &#ddd;
#
# mode=work:
#   Convert html entities &#ddd; to the corresponding 8bit char
#
# Christophe Chisogne <christophe@publicityweb.com>

use Getopt::Long;
use strict;

my $PROG    = 'char2ent';    # prog name to display
my $VERSION = '0.02';
my $DATE    = '2003/11/07';
my $BACK    = 'bak';         # extension for backup files

# vars from CLI options
my ($mode, $backup, $confirm, $keep, $version, $help);
$mode = 'html';

my $resopt = GetOptions('version|v' => \$version,
                        'help|h'    => \$help,
                        'mode=s'    => \$mode,
                        'backup|b'  => \$backup,
                        'confirm|c' => \$confirm,
                        'keep|k'    => \$keep,
                       ) or usage();

version() if defined $version;
# at least one file is required; every file named is converted
usage() if (@ARGV < 1) || (defined $help);

my $conv;
if ($mode =~ /html/i) {
    print "Conversion from 8bit chars to &#ddd; entities\n";
    $conv = \&char2ent;
} elsif ($mode =~ /work/i) {
    print "Conversion from &#ddd; entities to 8bit chars\n";
    $conv = \&ent2char;
} else {
    usage();
}

# Latin1 convert table taken (thanks awk ;-) from
# http://www.w3.org/TR/html401/sgml/entities.html
#
# Portions © International Organization for Standardization 1986
# Permission to copy in any form is granted for use with
# conforming SGML systems and applications as defined in
# ISO 8879, provided this notice is included in all copies.

# warning, case sensitive for matches
my %latin1 = (
    '&nbsp;'   => '&#160;', '&iexcl;'  => '&#161;', '&cent;'   => '&#162;',
    '&pound;'  => '&#163;', '&curren;' => '&#164;', '&yen;'    => '&#165;',
    '&brvbar;' => '&#166;', '&sect;'   => '&#167;', '&uml;'    => '&#168;',
    '&copy;'   => '&#169;', '&ordf;'   => '&#170;', '&laquo;'  => '&#171;',
    '&not;'    => '&#172;', '&shy;'    => '&#173;', '&reg;'    => '&#174;',
    '&macr;'   => '&#175;', '&deg;'    => '&#176;', '&plusmn;' => '&#177;',
    '&sup2;'   => '&#178;', '&sup3;'   => '&#179;', '&acute;'  => '&#180;',
    '&micro;'  => '&#181;', '&para;'   => '&#182;', '&middot;' => '&#183;',
    '&cedil;'  => '&#184;', '&sup1;'   => '&#185;', '&ordm;'   => '&#186;',
    '&raquo;'  => '&#187;', '&frac14;' => '&#188;', '&frac12;' => '&#189;',
    '&frac34;' => '&#190;', '&iquest;' => '&#191;', '&Agrave;' => '&#192;',
    '&Aacute;' => '&#193;', '&Acirc;'  => '&#194;', '&Atilde;' => '&#195;',
    '&Auml;'   => '&#196;', '&Aring;'  => '&#197;', '&AElig;'  => '&#198;',
    '&Ccedil;' => '&#199;', '&Egrave;' => '&#200;', '&Eacute;' => '&#201;',
    '&Ecirc;'  => '&#202;', '&Euml;'   => '&#203;', '&Igrave;' => '&#204;',
    '&Iacute;' => '&#205;', '&Icirc;'  => '&#206;', '&Iuml;'   => '&#207;',
    '&ETH;'    => '&#208;', '&Ntilde;' => '&#209;', '&Ograve;' => '&#210;',
    '&Oacute;' => '&#211;', '&Ocirc;'  => '&#212;', '&Otilde;' => '&#213;',
    '&Ouml;'   => '&#214;', '&times;'  => '&#215;', '&Oslash;' => '&#216;',
    '&Ugrave;' => '&#217;', '&Uacute;' => '&#218;', '&Ucirc;'  => '&#219;',
    '&Uuml;'   => '&#220;', '&Yacute;' => '&#221;', '&THORN;'  => '&#222;',
    '&szlig;'  => '&#223;', '&agrave;' => '&#224;', '&aacute;' => '&#225;',
    '&acirc;'  => '&#226;', '&atilde;' => '&#227;', '&auml;'   => '&#228;',
    '&aring;'  => '&#229;', '&aelig;'  => '&#230;', '&ccedil;' => '&#231;',
    '&egrave;' => '&#232;', '&eacute;' => '&#233;', '&ecirc;'  => '&#234;',
    '&euml;'   => '&#235;', '&igrave;' => '&#236;', '&iacute;' => '&#237;',
    '&icirc;'  => '&#238;', '&iuml;'   => '&#239;', '&eth;'    => '&#240;',
    '&ntilde;' => '&#241;', '&ograve;' => '&#242;', '&oacute;' => '&#243;',
    '&ocirc;'  => '&#244;', '&otilde;' => '&#245;', '&ouml;'   => '&#246;',
    '&divide;' => '&#247;', '&oslash;' => '&#248;', '&ugrave;' => '&#249;',
    '&uacute;' => '&#250;', '&ucirc;'  => '&#251;', '&uuml;'   => '&#252;',
    '&yacute;' => '&#253;', '&thorn;'  => '&#254;', '&yuml;'   => '&#255;',
);

my $ok = 'y';
foreach my $filename (@ARGV) {
    if (defined $confirm) {
        print "Convert file [$filename]? [Yn] ";
        $ok = <STDIN>;
    }
    unless ($ok =~ /n/i) {
        print "Converting file [$filename]...\n";
        convertfile($filename);
    }
}

exit 0;

# convertfile($filename)
sub convertfile {
    my $filename = shift;
    my $tmpname  = "$filename.$$";

    open INFILE, $filename
        or die "Can't open $filename\n";
    open OUTFILE, ">$tmpname"
        or die "Can't write $tmpname\n";
    while (<INFILE>) {
        print OUTFILE &$conv($_);
    }
    close INFILE;
    close OUTFILE;

    if ($backup) {
        rename($filename, "$filename.$BACK")
            or die "Can't backup $filename.$BACK\n";
    }
    rename($tmpname, $filename)
        or die "Can't write $filename from $tmpname\n";
}

# $line2 = char2ent($line)
sub char2ent {
    my $line = shift;
    $line =~ s/(.)/(ord $1 > 127) ? '&#'.ord($1).';' : $1/ge;
    $line;
}

# $line2 = ent2char($line)
sub ent2char {
    my $line = shift;
    # first change all &eacute; etc to &#ddd; unless told otherwise
    unless (defined $keep) {
        foreach my $lat_ent (keys %latin1) {
            $line =~ s/$lat_ent/$latin1{$lat_ent}/ge;
        }
    }
    # then &#ddd; to 8bit char
    $line =~ s/&#(\d\d\d);/chr($1)/ge;
    $line;
}

# version()
sub version {
    print "$PROG v$VERSION, $DATE\n\n";
    print "Convert files with 8bit chars to/from &#ddd; entities\n";
    print "Can convert &name; entities from latin1 (160-255)\n";
    print "\n";
    usage();
    exit 0;
}

# usage()
sub usage {
    print <<EOF;
Usage: $PROG [--mode=html|work] [-b] [-c] [-k] 8bitfile.txt ...
       $PROG [--help] [--version]

  --mode=x, -m=x   choose html mode (default) or work mode
  --backup, -b     backup of modified file
  --confirm, -c    confirm conversion of each file
  --keep, -k       don't translate &name; entities to &#ddd;
EOF
    exit 1;
}
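
For anyone who wants to see what the two modes do before running the script on real files, here is a minimal, self-contained sketch of the same two conversions applied to a single string. It is not part of the attachment: the two-entry %latin1 excerpt and the sample string are assumptions made for illustration, and char2ent here uses a character class instead of the attachment's per-character ord() test, which should be equivalent for 8-bit input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-entry excerpt of the attachment's %latin1 table.
my %latin1 = (
    '&eacute;' => '&#233;',
    '&laquo;'  => '&#171;',
);

# 8-bit byte -> numeric entity (what the script does in "html" mode).
sub char2ent {
    my $line = shift;
    $line =~ s/([\x80-\xff])/'&#' . ord($1) . ';'/ge;
    return $line;
}

# Entity -> 8-bit byte (what the script does in "work" mode).
sub ent2char {
    my $line = shift;
    # Named entities are first rewritten as numeric ones...
    for my $name (keys %latin1) {
        $line =~ s/\Q$name\E/$latin1{$name}/g;
    }
    # ...then every three-digit numeric entity becomes a single byte.
    $line =~ s/&#(\d\d\d);/chr($1)/ge;
    return $line;
}

my $bytes = "caf\xe9";                  # "cafe" with e-acute, in Latin-1
my $html  = char2ent($bytes);           # numeric-entity form
my $back  = ent2char('caf&eacute;');    # named entity back to one byte

print "$html\n";
print $back eq $bytes ? "round trip ok\n" : "round trip failed\n";
```

Note that the numeric step only matches exactly three digits, so entities such as &#8212; pass through untouched in both the sketch and the attachment.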