Howto Gentoo Linux with UTF-8 and Portuguese (Portugal) localization

Purpose

How to configure a Gentoo Linux system to use UTF-8 character encoding and Portuguese (Portugal) localization (language and keyboard).

Background

UTF-8 is a variable-length character encoding: each character takes 1 to 4 bytes. The single-byte range encodes ASCII unchanged, which makes UTF-8 fully backwards compatible with ASCII. Text made up mostly of ASCII and Latin characters therefore grows very little when stored as UTF-8, because most of its characters still need only one byte.
UTF-8 allows you to work in a standards-compliant and internationally accepted multilingual environment, with comparatively low data redundancy. UTF-8 is the preferred way of transmitting non-ASCII characters over the Internet, through email, IRC or almost any other medium.
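
The variable length is easy to see from a shell. The following is only a small sketch: utf8_bytes is a hypothetical helper name, not a standard tool, and it assumes the script itself is saved as UTF-8.

```shell
# Hypothetical helper: print how many bytes a string occupies.
# wc -c counts bytes (not characters), whatever the current locale is.
utf8_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

utf8_bytes 'a'   # plain ASCII letter
utf8_bytes 'ç'   # Latin letter outside ASCII
utf8_bytes '€'   # euro sign
```

On a UTF-8 encoded script this should print 1, 2 and 3, one value per line.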

A locale is a set of information that most programs use to determine country- and language-specific settings. The locales and their data are part of the system library and can be found at /usr/share/locale on most systems. A locale name generally has the form ab_CD, where ab is the two (or three) letter language code (as specified in ISO 639) and CD is the two letter country code (as specified in ISO 3166). Variants are often appended to locale names, e.g. en_GB.utf8 or de_DE@euro.
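
To make the naming scheme concrete, here is a rough sketch that splits a locale name into its parts. It assumes the usual language_COUNTRY.codeset@modifier layout; parse_locale is a hypothetical helper written for this howto, not a system command.

```shell
# Hypothetical helper: split a locale name of the assumed form
# language[_COUNTRY][.codeset][@modifier] into its components.
parse_locale() {
    name=$1
    modifier=; codeset=; territory=
    # Peel the optional parts off from right to left.
    case $name in *@*) modifier=${name#*@}; name=${name%%@*} ;; esac
    case $name in *.*) codeset=${name#*.}; name=${name%%.*} ;; esac
    case $name in *_*) territory=${name#*_}; name=${name%%_*} ;; esac
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$name" "$territory" "$codeset" "$modifier"
}

parse_locale 'pt_PT.UTF-8@euro'
# prints: language=pt territory=PT codeset=UTF-8 modifier=euro
```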

Solution

Specify the locales we will need in /etc/locale.gen:

# vi /etc/locale.gen
en_GB ISO-8859-1
en_GB.UTF-8 UTF-8
pt_PT ISO-8859-1
pt_PT@euro ISO-8859-15
pt_PT.UTF-8 UTF-8
pt_PT.UTF-8@euro UTF-8

The next step is to run locale-gen. It will generate all the locales we have specified in the /etc/locale.gen file.

# locale-gen
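
To check that the new locales were really generated, the output of locale -a can be filtered. This is a minimal sketch: pt_locales is a hypothetical helper name, and the exact spelling of the names printed (for example pt_PT.utf8 rather than pt_PT.UTF-8) depends on the C library.

```shell
# Hypothetical helper: keep only the Portuguese (Portugal) entries from a
# list of locale names given one per line on standard input.
pt_locales() {
    grep -i '^pt_PT'
}

# Only query the system when the locale command is available.
if command -v locale >/dev/null 2>&1; then
    locale -a | pt_locales || echo "no pt_PT locales found"
fi
```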

There is one environment variable that needs to be set in order to use our new UTF-8 locales: LC_CTYPE (or optionally LANG, if you want to change the system language as well). Setting the locale globally should be done using /etc/env.d/02locale.

# vi /etc/env.d/02locale
LANG="pt_PT.UTF-8@euro"

Now update the environment after the change:

# env-update && source /etc/profile
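
A quick way to confirm that the new setting is in effect is to look at the locale variables of the current shell. The sketch below assumes the usual precedence (LC_ALL over LC_CTYPE over LANG); is_utf8_locale is a hypothetical helper that only inspects the name, it does not ask the C library.

```shell
# Hypothetical helper: succeed when a locale name carries a UTF-8 codeset.
is_utf8_locale() {
    case $1 in
        *.UTF-8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

# LC_ALL overrides LC_CTYPE, which in turn overrides LANG.
effective=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
if is_utf8_locale "$effective"; then
    echo "UTF-8 locale in effect: $effective"
else
    echo "not a UTF-8 locale: ${effective:-unset}"
fi
```

Running the plain locale command as well shows every LC_* category with its current value.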

The keyboard layout used by the console is set in /etc/conf.d/keymaps by the KEYMAP variable. For a Portuguese keyboard use pt-latin1 or pt-latin9. Also set EXTENDED_KEYMAPS to include extras such as "euro".

# vi /etc/conf.d/keymaps
KEYMAP="pt-latin9"
SET_WINDOWKEYS="yes"
EXTENDED_KEYMAPS="backspace keypad euro"

To enable UTF-8 on the console, you need to edit /etc/rc.conf and set UNICODE="yes".

# vi /etc/rc.conf
UNICODE="yes"
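
The console settings can be read back to catch typos before rebooting. This is only a sketch: it assumes both files use plain shell variable assignments, and conf_value is a hypothetical helper, not part of Gentoo.

```shell
# Hypothetical helper: print the value a shell-style config file assigns
# to a variable. $1 = file, $2 = variable name. The subshell keeps the
# sourced variables out of the current shell.
conf_value() {
    ( . "$1" && eval "printf '%s\n' \"\${$2}\"" )
}

# Only read the files that exist on this system.
if [ -r /etc/conf.d/keymaps ]; then
    echo "KEYMAP=$(conf_value /etc/conf.d/keymaps KEYMAP)"
fi
if [ -r /etc/rc.conf ]; then
    echo "UNICODE=$(conf_value /etc/rc.conf UNICODE)"
fi
```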

The keyboard layout to be used by the X server is specified in /etc/X11/xorg.conf by the XkbLayout option.

# vi /etc/X11/xorg.conf
Section "InputDevice"
    Identifier  "Keyboard0"
    Driver      "kbd"
    Option      "XkbLayout" "pt"
    ...
EndSection

There is also an additional localisation variable called LINGUAS, which affects the localisation files that get installed by gettext-based programs and selects the localisation used by some specific software packages, such as kde-base/kde-i18n and app-office/openoffice. The variable takes a space-separated list of language codes, and the suggested place to set it is /etc/make.conf:

# vi /etc/make.conf
LINGUAS="pt pt_PT en en_GB"

And that's it! Hopefully your system is now running with full UTF-8 and Portuguese support. Good linuxing ;)

Sources