HTML Character Sets


To display an HTML page correctly, the browser must know what character set (encoding) to use:

Example

<meta charset="UTF-8">
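The effect of a wrong encoding guess can be sketched in a few lines of Python. This is only an illustration: the sample text and the variable names are assumptions, and a real browser does much more than this.

```python
# Illustrative only: the sample text is an assumption, not taken from a real page.
text = "café"

# The bytes as a server might send them for a UTF-8 encoded page.
raw = text.encode("utf-8")

# Decoding with the declared (correct) encoding restores the text.
right = raw.decode("utf-8")

# Decoding with a wrong guess (Windows-1252) garbles the accented letter.
wrong = raw.decode("cp1252")

print(right)
print(wrong)
```

The same bytes give two different results, which is why the page should declare its encoding.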


Most modern web standards and many programming languages use the UTF-8 character set as default.

The encoding for the early web was ASCII. ASCII used 7 bits per character, and could only represent 128 different characters (English letters, digits, punctuation, and control codes).
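The 7-bit limit can be checked with a short Python sketch. The variable names are illustrative assumptions.

```python
# 7 bits give 2**7 distinct code values (0 through 127).
ascii_size = 2 ** 7

# An English letter fits in ASCII.
letter_a = "A".encode("ascii")

# A letter outside the ASCII range cannot be encoded.
try:
    "é".encode("ascii")
    e_acute_fits = True
except UnicodeEncodeError:
    e_acute_fits = False

print(ascii_size, letter_a, e_acute_fits)
```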

For a closer look, study our Complete ASCII Reference.

Windows-1252 was an early default character set in Windows. It extends ASCII, and uses 8 bits per character to represent up to 256 different characters (including international letters). Windows-1252 is supported by all browsers.

For a closer look, study our Complete Windows-1252 Reference.


HTML 4: ISO-8859-1

The default character set from HTML 2.0 to HTML 4.01 was ISO-8859-1.

ISO-8859-1 is an extension to ASCII, with added international characters.

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

In HTML 4, a character set different from ISO-8859-1 can be specified in the <meta> tag:

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">

All HTML 4 processors also support UTF-8:

Example

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

When a browser detects ISO-8859-1, it normally treats the page as Windows-1252, because Windows-1252 assigns printable characters to most of the 32 positions (0x80 to 0x9F) that ISO-8859-1 reserves for control codes.
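The difference between the two encodings can be seen on a single byte. The sketch below uses Python's built-in codecs. The chosen byte and the variable names are assumptions for illustration.

```python
# One byte from the range where the two encodings differ.
byte = b"\x80"

# ISO-8859-1 maps 0x80 to an invisible control character (U+0080).
as_latin1 = byte.decode("iso-8859-1")

# Windows-1252 maps the same byte to the euro sign.
as_cp1252 = byte.decode("cp1252")

# From 0xA0 to 0xFF the two encodings agree.
upper_bytes = bytes(range(0xA0, 0x100))
same_upper_range = upper_bytes.decode("iso-8859-1") == upper_bytes.decode("cp1252")

print(repr(as_latin1), as_cp1252, same_upper_range)
```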

For a closer look, please study: The Complete ISO-8859-1 Reference.



HTML5: Unicode UTF-8

The HTML5 specification encourages web developers to use the UTF-8 character set.

Example

<meta charset="UTF-8">

A character set different from UTF-8 can be specified in the <meta> tag:

Example

<meta charset="ISO-8859-1">
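Both the HTML 4 and the HTML5 form of the <meta> tag carry the name after "charset=". A minimal sketch of reading that name is shown below. The function name declared_charset and the regular expression are hypothetical. This is a simplification and not the algorithm browsers actually use.

```python
import re

# Hypothetical pattern: finds "charset=" inside a <meta> tag and captures the name.
META_CHARSET = re.compile(
    r"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(html):
    """Return the charset name declared in a <meta> tag, or None if there is none."""
    match = META_CHARSET.search(html)
    return match.group(1) if match else None


html5_form = '<meta charset="UTF-8">'
html4_form = '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'

print(declared_charset(html5_form))
print(declared_charset(html4_form))
print(declared_charset("<p>No declaration</p>"))
```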

The Unicode Consortium develops the Unicode Standard, which includes the UTF-8 and UTF-16 encodings. Unicode was created because the ISO-8859 character sets are limited, and not compatible with a multilingual environment.

The Unicode Standard covers (almost) all the characters, punctuation marks, and symbols in the world.
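UTF-8 can cover so many characters because it uses a variable number of bytes per character. The sketch below counts the bytes for a few sample characters. The samples and the variable names are assumptions for illustration.

```python
# Sample characters: an English letter, an accented letter,
# a currency symbol, and an emoji.
samples = ["A", "é", "€", "😀"]

# Number of bytes each character needs in UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Number of bytes each character needs in UTF-16 (little-endian, no byte order mark).
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_lengths, utf16_lengths):
    print(ch, "UTF-8 bytes:", n8, "UTF-16 bytes:", n16)
```

ASCII characters stay at one byte in UTF-8, so plain English text is encoded the same way in both ASCII and UTF-8.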

All HTML5 and XML processors support UTF-8, UTF-16, Windows-1252, and ISO-8859.

For a closer look, please study: The Complete Unicode Reference.


Copyright 1999-2023 by Refsnes Data. All Rights Reserved.