ALT Linux Community general discussions
* [Comm] &#1042; &#1101;
@ 2005-10-11  3:02 Andrei Lomov
  2005-10-11  6:32 ` Dmytro O. Redchuk
  2005-10-12 11:33 ` [Comm] entities 2 chars (was: &#1042; &#1101;) Michael Shigorin
  0 siblings, 2 replies; 5+ messages in thread
From: Andrei Lomov @ 2005-10-11  3:02 UTC
  To: community

There is nothing wrong with the encoding of the subject line.

I received a message from hotmail that looks roughly like this:

&#1042; &#1101;&#1090;&#1086;&#1090;
and so on.

How can I read it?

Thanks

-- 
All the best,
A.L.





* Re: [Comm] &#1042; &#1101;
From: Dmytro O. Redchuk @ 2005-10-11  6:32 UTC
  To: community

On Tue, Oct 11, 2005 at 10:02:52AM +0700, Andrei Lomov wrote:
> There is nothing wrong with the encoding of the subject line.
> 
> I received a message from hotmail that looks roughly like this:
> 
> &#1042; &#1101;&#1090;&#1086;&#1090;
> and so on.
> 
> How can I read it?
As HTML.

> 
> Thanks
> 
> -- 
> All the best,
> A.L.

-- 
  _,-=._              /|_/|
  `-.}   `=._,.-=-._.,  @ @._,
     `._ _,-.   )      _,.-'
        `    G.m-"^m`m'        Dmytro O. Redchuk
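
The short answer above works because the body consists of HTML numeric character references: each `&#NNNN;` names a Unicode code point in decimal. As a minimal sketch (in Python, which nobody in the thread used), the standard library's `html.unescape` decodes such text without a browser. The function name `decode_entities` is made up for illustration.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references (numeric and named) with characters."""
    return html.unescape(text)


if __name__ == "__main__":
    # The sample line quoted in the original message.
    sample = "&#1042; &#1101;&#1090;&#1086;&#1090;"
    print(decode_entities(sample))  # prints: В этот
```

Saving the decoded text as UTF-8 keeps the Cyrillic readable in any editor; a legacy 8-bit encoding would need an explicit conversion step.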




* [Comm] entities 2 chars (was: &#1042; &#1101;)
From: Michael Shigorin @ 2005-10-12 11:33 UTC
  To: community


[-- Attachment #1.1: Type: text/plain, Size: 432 bytes --]

On Tue, Oct 11, 2005 at 10:02:52AM +0700, Andrei Lomov wrote:
> There is nothing wrong with the encoding of the subject line.
> I received a message from hotmail that looks roughly like this:
> &#1042; &#1101;&#1090;&#1086;&#1090;
> and so on.  How can I read it?

Take a look at the attachment, it may come in handy (I used it to
unpack something similar: Unicode wrapped in HTML entities).

-- 
 ---- WBR, Michael Shigorin <mike@altlinux.ru>
  ------ Linux.Kiev http://www.linux.kiev.ua/

[-- Attachment #1.2: char2ent.pl --]
[-- Type: text/plain, Size: 5811 bytes --]

#!/usr/bin/perl -w

# char2ent.pl
# 
# Simple utility to convert files with &#ddd; to/from 8bit chars
# See usage at end of this file ( or ./char2ent -h )
# PS works only with 8bit chars, not talking about UTF-16 Unicode here
#
# mode=html (default)
#   Convert 8bit chars (with high bit set) to html entity &#ddd;
#
# mode=work:
#   Convert html entities &#ddd; to the corresponding 8bit char
#
# Christophe Chisogne <christophe@publicityweb.com>

use Getopt::Long;
use strict;

my $PROG = 'char2ent';		# prog name to display
my $VERSION = '0.02';
my $DATE = '2003/11/07';
my $BACK = 'bak';		# extension for backup files

# vars from CLI options
my ($mode, $backup, $confirm, $keep, $version, $help); 
$mode = 'html';
my $resopt = GetOptions('version|v' => \$version,
	'help|h' => \$help,
	'mode=s' => \$mode,
	'backup|b' => \$backup,
	'confirm|c' => \$confirm,
	'keep|k' => \$keep,
	)
or usage();

version() if defined $version;
# at least one file is required; several may be given (see the loop below)
usage() if (@ARGV < 1) || (defined $help);
my $conv;
if ($mode =~ /html/i) {
	print "Conversion from 8bit chars to &#ddd; entities\n";
	$conv = \&char2ent;
} elsif ($mode =~ /work/i) {
	print "Conversion from &#ddd; entities to 8bit chars\n";
	$conv = \&ent2char;
} else {
	usage();
}

# Latin1 convert table taken (thanks awk ;-) from
# http://www.w3.org/TR/html401/sgml/entities.html
#
# Portions © International Organization for Standardization 1986
# Permission to copy in any form is granted for use with
# conforming SGML systems and applications as defined in
# ISO 8879, provided this notice is included in all copies.

# warning, case sensitive for matches
my %latin1 = (
'&nbsp;' => '&#160;',
'&iexcl;' => '&#161;',
'&cent;' => '&#162;',
'&pound;' => '&#163;',
'&curren;' => '&#164;',
'&yen;' => '&#165;',
'&brvbar;' => '&#166;',
'&sect;' => '&#167;',
'&uml;' => '&#168;',
'&copy;' => '&#169;',
'&ordf;' => '&#170;',
'&laquo;' => '&#171;',
'&not;' => '&#172;',
'&shy;' => '&#173;',
'&reg;' => '&#174;',
'&macr;' => '&#175;',
'&deg;' => '&#176;',
'&plusmn;' => '&#177;',
'&sup2;' => '&#178;',
'&sup3;' => '&#179;',
'&acute;' => '&#180;',
'&micro;' => '&#181;',
'&para;' => '&#182;',
'&middot;' => '&#183;',
'&cedil;' => '&#184;',
'&sup1;' => '&#185;',
'&ordm;' => '&#186;',
'&raquo;' => '&#187;',
'&frac14;' => '&#188;',
'&frac12;' => '&#189;',
'&frac34;' => '&#190;',
'&iquest;' => '&#191;',
'&Agrave;' => '&#192;',
'&Aacute;' => '&#193;',
'&Acirc;' => '&#194;',
'&Atilde;' => '&#195;',
'&Auml;' => '&#196;',
'&Aring;' => '&#197;',
'&AElig;' => '&#198;',
'&Ccedil;' => '&#199;',
'&Egrave;' => '&#200;',
'&Eacute;' => '&#201;',
'&Ecirc;' => '&#202;',
'&Euml;' => '&#203;',
'&Igrave;' => '&#204;',
'&Iacute;' => '&#205;',
'&Icirc;' => '&#206;',
'&Iuml;' => '&#207;',
'&ETH;' => '&#208;',
'&Ntilde;' => '&#209;',
'&Ograve;' => '&#210;',
'&Oacute;' => '&#211;',
'&Ocirc;' => '&#212;',
'&Otilde;' => '&#213;',
'&Ouml;' => '&#214;',
'&times;' => '&#215;',
'&Oslash;' => '&#216;',
'&Ugrave;' => '&#217;',
'&Uacute;' => '&#218;',
'&Ucirc;' => '&#219;',
'&Uuml;' => '&#220;',
'&Yacute;' => '&#221;',
'&THORN;' => '&#222;',
'&szlig;' => '&#223;',
'&agrave;' => '&#224;',
'&aacute;' => '&#225;',
'&acirc;' => '&#226;',
'&atilde;' => '&#227;',
'&auml;' => '&#228;',
'&aring;' => '&#229;',
'&aelig;' => '&#230;',
'&ccedil;' => '&#231;',
'&egrave;' => '&#232;',
'&eacute;' => '&#233;',
'&ecirc;' => '&#234;',
'&euml;' => '&#235;',
'&igrave;' => '&#236;',
'&iacute;' => '&#237;',
'&icirc;' => '&#238;',
'&iuml;' => '&#239;',
'&eth;' => '&#240;',
'&ntilde;' => '&#241;',
'&ograve;' => '&#242;',
'&oacute;' => '&#243;',
'&ocirc;' => '&#244;',
'&otilde;' => '&#245;',
'&ouml;' => '&#246;',
'&divide;' => '&#247;',
'&oslash;' => '&#248;',
'&ugrave;' => '&#249;',
'&uacute;' => '&#250;',
'&ucirc;' => '&#251;',
'&uuml;' => '&#252;',
'&yacute;' => '&#253;',
'&thorn;' => '&#254;',
'&yuml;' => '&#255;',
);

my $ok = 'y';
foreach my $filename (@ARGV) {
	if (defined $confirm) {
		print "Convert file [$filename]? [Yn] ";
		$ok = <STDIN>;
	}
	unless ($ok =~ /n/i) {
		print "Converting file [$filename]...\n";
		convertfile($filename);
	}
}
exit 0;

# convertfile($filename): rewrite the file in place via a temporary file
sub convertfile {
	my $filename = shift;
	my $tmpname = "$filename.$$";	# temp file named after our PID
	open my $in, '<', $filename or die "Can't open $filename: $!\n";
	open my $out, '>', $tmpname or die "Can't write $tmpname: $!\n";
	while (my $line = <$in>) {
		print {$out} $conv->($line);
	}
	close $in;
	close $out or die "Can't close $tmpname: $!\n";
	if ($backup) {
		rename($filename, "$filename.$BACK") 
			or die "Can't back up $filename to $filename.$BACK: $!\n";
	}
	rename($tmpname, $filename) 
		or die "Can't write $filename from $tmpname: $!\n";
}

# $line2 = char2ent($line)
sub char2ent {
	my $line = shift;
	$line =~ s/(.)/(ord $1 > 127) ? '&#'.ord($1).';' : $1/ge;
	$line;
}

# $line2 = ent2char($line)
sub ent2char {
	my $line = shift;
	# first change all &eacute; etc to &#ddd; unless told otherwise
	unless (defined $keep) {
		foreach my $lat_ent (keys %latin1) {
			$line =~ s/$lat_ent/$latin1{$lat_ent}/ge;
		}
	}
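	# then &#ddd; to 8bit char (the pattern matches exactly three digits,
	# so two- or four-digit references such as &#1042; are left untouched)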
	$line =~ s/&#(\d\d\d);/chr($1)/ge;
	$line;
}

# version()
sub version {
	print "$PROG v$VERSION, $DATE\n\n";
	print "Convert files with 8bit chars to/from &#ddd; entities\n";
	print "Can convert &name; entities from latin1 (160-255)\n";
	print "\n";	
	usage();
	exit 0;
}

# usage()
sub usage {
	print <<EOF;
Usage:
$PROG [--mode=html|work] [-b] [-c] [-k] 8bitfile.txt ...
$PROG [--help] [--version]

--mode=x,  -m=x   choose html mode (default) or work mode
--backup,  -b     backup of modified file
--confirm, -c     confirm conversion of each file
--keep,    -k     don't translate &name; entities to &#ddd;
EOF
	exit 1;
}
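
The script's `ent2char` matches exactly three digits, so it covers Latin-1 but leaves references such as `&#1042;` alone. Below is a rough sketch of the same two directions for any code point. It is written in Python rather than Perl and assumes decimal references only (no `&#x...;`, no named entities). The function names mirror the script's, but this is not the attachment's code.

```python
import re

# Decimal numeric character reference of any length, e.g. &#233; or &#1042;
_NUMERIC_ENTITY = re.compile(r"&#(\d+);")

# Largest valid Unicode code point.
_MAX_CODE_POINT = 0x10FFFF


def ent2char(text: str) -> str:
    """Replace each decimal &#N; reference with the character it names."""
    def repl(match: "re.Match[str]") -> str:
        code = int(match.group(1))
        # Leave out-of-range numbers as they were instead of raising.
        return chr(code) if code <= _MAX_CODE_POINT else match.group(0)

    return _NUMERIC_ENTITY.sub(repl, text)


def char2ent(text: str) -> str:
    """Replace every non-ASCII character with its decimal &#N; reference."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 127 else ch for ch in text)


if __name__ == "__main__":
    encoded = "&#1042; &#1101;&#1090;&#1086;&#1090;"
    decoded = ent2char(encoded)
    print(decoded)
    print(char2ent(decoded) == encoded)
```

Working on Unicode strings rather than bytes is the design difference from the 8-bit script: the output encoding is chosen only when the text is written out.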


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* [Comm] Re: entities 2 chars (was: &#1042; &#1101;)
From: Andrei Lomov @ 2005-10-12 15:56 UTC
  To: community

Michael Shigorin wrote:

> On Tue, Oct 11, 2005 at 10:02:52AM +0700, Andrei Lomov wrote:
>> There is nothing wrong with the encoding of the subject line.
>> I received a message from hotmail that looks roughly like this:
>> &#1042; &#1101;&#1090;&#1086;&#1090;
>> and so on.  How can I read it?
> 
> Take a look at the attachment, it may come in handy (I used it to
> unpack something similar: Unicode wrapped in HTML entities).

That's great,
thanks.

For now, following the advice above, I got by with adding two tags:

<html>
&#1042; &#1101;&#1090;&#1086;&#1090; ...
</html>

and then opened the result in a browser.

-- 
All the best,
A.L.





* Re: [Comm] Re: entities 2 chars (was: &#1042; &#1101;)
From: Dmytro O. Redchuk @ 2005-10-13  6:02 UTC
  To: community

On Wed, Oct 12, 2005 at 10:56:06PM +0700, Andrei Lomov wrote:
> For now, following the advice above, I got by with adding two tags:
> 
> <html>
> &#1042; &#1101;&#1090;&#1086;&#1090; ...
> </html>
:-)

As far as I remember, *both* tags are _optional_ (see
http://www.w3c.org/TR/html40), so you could have done without them.

I think it is better to add <PRE>, so that the line breaks are preserved.

> 
> and then opened the result in a browser.
> 
> -- 
> All the best,
> A.L.

-- 
  _,-=._              /|_/|
  `-.}   `=._,.-=-._.,  @ @._,
     `._ _,-.   )      _,.-'
        `    G.m-"^m`m'        Dmytro O. Redchuk
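
The `<PRE>` suggestion can be sketched as a small helper that wraps the entity-encoded body so a browser keeps the original line breaks. This is an illustration in Python, not something from the thread. The name `wrap_in_pre` is hypothetical, and the body is assumed to be already entity-encoded, so nothing is escaped again.

```python
def wrap_in_pre(body: str) -> str:
    """Wrap an entity-encoded mail body in <pre> so line breaks survive."""
    # The body is inserted as-is: its &#N; references must stay intact so
    # that the browser decodes them.
    return "<pre>\n" + body.rstrip("\n") + "\n</pre>\n"


if __name__ == "__main__":
    sample = "&#1042; &#1101;&#1090;&#1086;&#1090;\n...\n"
    print(wrap_in_pre(sample), end="")
```

Redirecting the output to a file with an `.html` extension and opening it in a browser shows the decoded text with its line layout preserved.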


