Linux console tools development discussion
* [kbd] [RFC] tty: kb_value with flags for better Unicode support
@ 2019-04-26 10:52 Reinis Danne
  2019-05-08 10:24 ` Alexey Gladkov
  0 siblings, 1 reply; 2+ messages in thread
From: Reinis Danne @ 2019-04-26 10:52 UTC
  To: Greg Kroah-Hartman, Jiri Slaby, linux-kernel, linux-input, kbd

Complement the already existing kbdiacruc and kbdiacrsuc structs and
KD[GS]KBDIACRUC ioctls with Unicode equivalents for kb_value, kbentry and the
KD[GS]KBENT ioctls.

```
struct kb_valueuc {
	__u32 flags;		/* 15 used by KTYP */
	__u32 kb_valueuc;	/* Unicode range: 0x0–0x10ffff */
};

struct kbentryuc {
	__u32 kb_table;
	__u32 kb_index;
	struct kb_valueuc kb_value;
};

extern struct kb_valueuc *key_maps[MAX_NR_KEYMAPS];

#define KDGKBENTUC	0x????	/* get one entry in translation table */
#define KDSKBENTUC	0x????	/* set one entry in translation table */
```

Motivation
==========

Since I learned touchtyping, I want to have the same keyboard layout in VT as I
have in X.  So I wrote a keymap file for Latvian (modern) keyboard layout [1]
to use with the kbd package and it works, mostly.

I have three issues:
- Compose sequences with base above Latin-1 not working (fixed).
- CapsLock not working as expected for characters above Latin-1.
- Can't use Meta key with characters above Latin-1.

There are three letters above 0xff on level 1 of this keyboard layout:
ē — U+0113 Dec:275 LATIN SMALL LETTER E WITH MACRON
ā — U+0101 Dec:257 LATIN SMALL LETTER A WITH MACRON
ī — U+012B Dec:299 LATIN SMALL LETTER I WITH MACRON


Compose
=======

I have added some extra letters in the free places to be able to type not only
Latvian and English, but also German and Finnish (e.g., there is letter ö on
level 3 of ē key) for the rare occasions I need them.

This keyboard layout uses a dead key (dead_acute) to access level 3 symbols
(the same as AltGr):

compose diacr base to result
compose '\'' U+0113 to U+00F6

But it didn't work if the base in the compose sequence was above 0xff (patch
[2] is in tty-next).


Key value and flags
===================

The other two issues could be attributed to the lack of proper flags for key
values (key type is encoded in its value).

According to the keymaps manual:
```
Each keysym may be prefixed by a '+' (plus sign), in which case this keysym
is treated as a "letter" and therefore affected by the "CapsLock" the same way
as by "Shift" (to be correct, the CapsLock inverts the Shift state).  The ASCII
letters ('a'-'z' and 'A'-'Z') are made CapsLock'able by default.  If
Shift+CapsLock should not produce a lower case symbol, put lines like

      keycode 30 = +a  A

in the map file.
```

But it doesn't work — CapsLock is ignored for codepoints above 0xff.  Adding
plus signs to all four maps should make them behave the same way (like in X):

#              0              1              2              3
#              Plain          Shift          AltGr          AltGr+Shift
keycode  16 = +U+0113        +U+0112        +U+00F6        +U+00D6

                          |     X       VT
--------------------------+---------------
CapsLock                ē |     Ē       ē
CapsLock+Shift          ē |     ē       Ē
CapsLock+AltGr          ē |     Ö       Ö
CapsLock+Shift+AltGr    ē |     ö       ö

For the key to behave properly, its key type (KTYP) has to be 'letter':

include/uapi/linux/keyboard.h:
#define KT_LETTER	11	/* symbol that can be acted upon by CapsLock */


Thus it is necessary to set KTYP for characters beyond Latin-1, which is not
possible now.

Currently the relevant key maps, structs and macros are defined like this:
```
include/linux/keyboard.h:

extern unsigned short *key_maps[MAX_NR_KEYMAPS];


drivers/tty/vt/defkeymap.c_shipped:

ushort *key_maps[MAX_NR_KEYMAPS] = {
	plain_map, shift_map, altgr_map, NULL,
	ctrl_map, shift_ctrl_map, NULL, NULL,
	alt_map, NULL, NULL, NULL,
	ctrl_alt_map, NULL
};


include/uapi/linux/kd.h:

struct kbentry {
	unsigned char kb_table;
	unsigned char kb_index;
	unsigned short kb_value;	<-- Important!
};


#define KDGKBENT	0x4B46	/* gets one entry in translation table */
#define KDSKBENT	0x4B47	/* sets one entry in translation table */


include/linux/kbd_kern.h:

#define U(x) ((x) ^ 0xf000)

#define BRL_UC_ROW 0x2800


include/uapi/linux/keyboard.h:

#define K(t,v)		(((t)<<8)|(v))
#define KTYP(x)		((x) >> 8)
#define KVAL(x)		((x) & 0xff)
```

The use of ``unsigned short kb_value`` in ``struct kbentry`` prevents setting
KTYP for Unicode characters beyond Latin-1 since there are only two bytes in an
``unsigned short`` and KTYP needs one, not leaving enough space for code points
beyond 0xff.
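
A minimal sketch of the overlap, using the K/KTYP/KVAL macros quoted above.
The helper pack_kb_value() is hypothetical and only mimics storing the result
in an ``unsigned short``:

```c
#include <stdint.h>

/* Copied from include/uapi/linux/keyboard.h (quoted above). */
#define KT_LETTER	11
#define K(t,v)		(((t)<<8)|(v))
#define KTYP(x)		((x) >> 8)
#define KVAL(x)		((x) & 0xff)

/*
 * Hypothetical helper: pack a key type and a code point into 16 bits,
 * the way an unsigned short kb_value would have to hold them.
 */
static inline uint16_t pack_kb_value(unsigned int type, unsigned int codepoint)
{
	return (uint16_t)K(type, codepoint);
}
```

With these definitions, pack_kb_value(KT_LETTER, 'a') gives 0x0b61, which
decodes back to type 11 and value 0x61.  For U+0113 the result is 0x0b13: the
upper byte of the code point is ORed into the type byte, and KVAL() returns
0x13 instead of 0x113.  For a code point like U+0410 even the type is
corrupted (0x0b | 0x04 = 0x0f).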

This breaks CapsLock for keyboard layouts with characters above Latin-1 [3–6].

I think those bugs were closed by mistake, since, to this day, it doesn't work.
And it can't work because of the aforementioned kernel limitations (at least as
far as CapsLock issue in Unicode mode is concerned).

To illustrate, keysym is 16 bits long:

	mmmm tttt nnnn nnnn

	m — mask for (non-)Unicode characters (U macro)
	t — KTYP
	n — KVAL

This also limits the number of Unicode characters — from 0xf000 the mask is
lost. (No Klingon input in VT [not that I want one]. I think
Documentation/admin-guide/unicode.rst talks only about the output. Or am I
missing something?)

See vt_do_kdsk_ioctl() and kbd_keycode() in drivers/tty/vt/keyboard.c for how
the mask and the U macro are used.
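
A rough sketch of the masking, under two assumptions that should be checked
against keyboard.c: that the KDSKBENT path stores U(v) in the key map, and
that kbd_keycode() treats a stored entry as a plain Unicode code point when
its upper byte is below 0xf0.  Both helper functions are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define U(x)		((x) ^ 0xf000)	/* include/linux/kbd_kern.h */
#define KTYP(x)		((x) >> 8)	/* include/uapi/linux/keyboard.h */

/* Assumption: the set ioctl stores U(v) for a value v passed by userspace. */
static inline uint16_t stored_from_ioctl(uint16_t v)
{
	return (uint16_t)U(v);
}

/* Assumption: mirrors a `type < 0xf0` check on the stored entry. */
static inline bool stored_is_unicode(uint16_t stored)
{
	return KTYP(stored) < 0xf0;
}
```

Under these assumptions, userspace passes U(0x0113) = 0xf113 for U+0113, the
stored entry is 0x0113 and it is recognised as Unicode.  A typed keysym such
as 0x0b61 is stored as 0xfb61 and is not.  A code point at or above 0xf000,
for example U+F8D0, is stored as 0xf8d0 and becomes indistinguishable from a
typed keysym with type 8.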

As a side note: It seems CapsShift has never worked either.  It was suggested
as a workaround to this issue in one of the kernel bugs, but it obviously
wouldn't work.  First, CapsShift needs key map 256 and up (limited by
MAX_NR_KEYMAPS).  Second, in struct kbentry the kb_table index is unsigned char
(0–255).  So, even if one increased MAX_NR_KEYMAPS and recompiled the kernel,
they still wouldn't be able to set the key map, because the ioctl can't index
the table.
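
The indexing limit can be sketched with the struct quoted above.  The helper
table_seen_by_kernel() is hypothetical and only shows what survives the
``unsigned char`` field:

```c
/* Copied from include/uapi/linux/kd.h (quoted above). */
struct kbentry {
	unsigned char kb_table;
	unsigned char kb_index;
	unsigned short kb_value;
};

/*
 * Hypothetical helper: the key map number that actually reaches the
 * kernel when userspace asks for table `requested`.
 */
static inline unsigned int table_seen_by_kernel(unsigned int requested)
{
	struct kbentry e = { 0 };

	e.kb_table = (unsigned char)requested;	/* truncated to 0..255 */
	return e.kb_table;
}
```

Table 255 passes through unchanged, while a request for table 256 wraps
around to table 0, so a CapsShift map can never be addressed through this
struct.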


Solution
========

A possible fix could be a proper, extensible struct with flags [7] for
kb_value, used in the key_map[] and a pair of new ioctls (see the top of the
mail).

I think the increase in memory usage here is not something to worry about.

The change would turn each key_map[] entry from a ushort into an 8-byte struct
(two __u32 fields).  So instead of 2 bytes per keysym, it would use 8 bytes,
and the memory usage of keymaps would increase fourfold.  Since there are 7
keymaps by default with 256 keys each, that would increase memory usage by:

	(8-2)*7*256=42*256=10752 B

Each additional keymap would increase memory usage by:

	8*256=2048 B
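
The arithmetic above can be double-checked with a small sketch.  The struct
follows the layout proposed at the top of the mail (with fixed-width stdint
types standing in for __u32); the constant names and helper functions are
made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

#define NR_KEYS		256	/* keys per keymap, as stated above */
#define DEFAULT_KEYMAPS	7	/* non-NULL entries in the default key_maps[] */

struct kb_valueuc {
	uint32_t flags;
	uint32_t kb_valueuc;
};

/* Bytes used by nr_maps keymaps whose entries are entry_size bytes each. */
static inline size_t keymap_bytes(size_t entry_size, size_t nr_maps)
{
	return entry_size * NR_KEYS * nr_maps;
}

/* Extra bytes needed by the default keymaps after the change. */
static inline size_t extra_bytes_for_defaults(void)
{
	return keymap_bytes(sizeof(struct kb_valueuc), DEFAULT_KEYMAPS)
	     - keymap_bytes(sizeof(uint16_t), DEFAULT_KEYMAPS);
}
```

extra_bytes_for_defaults() evaluates to 10752 and
keymap_bytes(sizeof(struct kb_valueuc), 1) to 2048, matching the figures
above.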

Increasing the size of kb_table and kb_index might be useful in the future for
adding multiple keyboard layout support to VT [8].

---
The per-entry size could be halved (4 bytes instead of 8) if ``__u32 flags``
were dropped and KTYP were put in the top byte of ``__u32 kb_valueuc``:

#define K(t,v)		(((t)<<24)|(v))
#define KTYP(x)		((x) >> 24)
#define KVAL(x)		((x) & 0xffffff)

But in this case the future-proofing for flags [7,9] would be lost.

Also, there is a possible conflict for programs built with the old version of
the K macros running on newer kernels.  The macros would have to be renamed.
---
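
A sketch of the packed 32-bit alternative.  The names K32, KTYP32, KVAL32 and
pack32() are placeholders (the renamed macros do not exist anywhere), and the
cast to uint32_t is an addition to keep the shift well defined for large type
values:

```c
#include <stdint.h>

#define KT_LETTER	11	/* include/uapi/linux/keyboard.h */

/* Placeholder names for the renamed macros. */
#define K32(t,v)	(((uint32_t)(t) << 24) | (v))
#define KTYP32(x)	((x) >> 24)
#define KVAL32(x)	((x) & 0xffffff)

/* Hypothetical helper: pack a key type and a code point into 32 bits. */
static inline uint32_t pack32(unsigned int type, uint32_t codepoint)
{
	return K32(type, codepoint);
}
```

Here pack32(KT_LETTER, 0x0113) gives 0x0b000113, and both the type and the
full code point can be recovered.  The 24-bit value field covers the whole
Unicode range up to 0x10ffff.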


Affected users
==============

KTYP or KVAL are used in (they would all have to be updated):
- kernel/debug/kdb/kdb_keyboard.c
- drivers/s390/char/keyboard.c
- drivers/s390/char/tty3270.c
- drivers/staging/speakup/main.c
- drivers/tty/vt/keyboard.c
- drivers/accessibility/braille/braille_console.c
- arch/m68k/atari/atakeyb.c

In addition to those, ``key_maps`` are used in:
- drivers/s390/char/defkeymap.c
- drivers/tty/vt/defkeymap.c_shipped
- drivers/input/keyboard/amikbd.c
- include/linux/keyboard.h
- arch/m68k/amiga/config.c

Also, the kbd package would have to be updated to take advantage of the change.


Is anybody already working on this? Maybe somebody has done it a long time ago
already, and I just have to do some magic incantations to make it work?

Is it even worth doing?

I'm new to kernel programming, so comments from people with better insights are
very much appreciated.


-Reinis


[1] https://odo.lv/xwiki/bin/download/Recipes/LatvianKeyboard/Modern.png
[2] https://lkml.org/lkml/2019/4/11/362
[3] https://bugzilla.kernel.org/show_bug.cgi?id=7063
[4] https://bugzilla.kernel.org/show_bug.cgi?id=7746
[5] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=404503
[6] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/16638
[7] https://blog.ffwll.ch/2013/11/botching-up-ioctls.html
[8] https://www.happyassassin.net/2013/11/23/keyboard-layouts-in-fedora-20-and-previously/
[9] https://lwn.net/Articles/585415/

