Whitespaces

In the Internet Object format, whitespace is any character with a Unicode code point less than or equal to U+0020. This range (U+0000 to U+0020) covers the non-printable control characters as well as the common whitespace characters: horizontal tab (U+0009), line feed (U+000A), vertical tab (U+000B), form feed (U+000C), carriage return (U+000D), and space (U+0020).

In addition to the range U+0000 to U+0020, the Internet Object format treats characters in the Unicode whitespace category as whitespace. These include the non-breaking space (U+00A0), en space (U+2002), and em space (U+2003), among others. Recognizing Unicode whitespace characters can make it easier to work with text in languages that use non-Latin scripts, such as Arabic, Chinese, or Japanese.

The Internet Object format also recognizes the zero-width no-break space (U+FEFF) as whitespace. This character is often used as a byte order mark (BOM) at the start of Unicode-encoded documents.

Recognizing this broader range of whitespace characters can make the format easier to work with, more readable, and more compatible with different systems and programming languages.

The following table lists the valid whitespace characters.

| Code Points | Name | Description |
| --- | --- | --- |
| U+0000 to U+0020 | Space, Line Feed, Carriage Return, Tab, Bell, etc. | ASCII space and control characters (any character with a code point of 0x20 or below) |
| U+1680 | Ogham Space Mark | Space used in Ogham script |
| U+2000 | En Quad | Space one en wide (half an em) |
| U+2001 | Em Quad | Space one em wide |
| U+2002 | En Space | Space half the width of an em space |
| U+2003 | Em Space | Space one em wide |
| U+2004 | Three-per-Em Space | Space equal to one-third of an em space |
| U+2005 | Four-per-Em Space | Space equal to one-quarter of an em space |
| U+2006 | Six-per-Em Space | Space equal to one-sixth of an em space |
| U+2007 | Figure Space | Space equal to the width of a digit |
| U+2008 | Punctuation Space | Space as wide as a narrow punctuation mark, such as a period |
| U+2009 | Thin Space | Space narrower than the regular space character |
| U+200A | Hair Space | Very narrow space, thinner than the thin space |
| U+2028 | Line Separator | Character used to separate lines in text |
| U+2029 | Paragraph Separator | Character used to separate paragraphs in text |
| U+202F | Narrow No-Break Space | Non-breaking space narrower than the regular space character |
| U+205F | Medium Mathematical Space | Space used in mathematical notation |
| U+3000 | Ideographic Space | Space used in East Asian scripts |
| U+FEFF | Byte Order Mark (BOM) | Zero Width No-Break Space |
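
As a rough illustration of the table above, the membership test can be sketched in a few lines of Python. This is not taken from any Internet Object implementation: the names `_IO_WHITESPACE_EXTRA` and `is_io_whitespace` are hypothetical, and U+00A0 is included on the assumption that the prose, which names it as whitespace, takes precedence over its absence from the table.

```python
# Code points from the table that lie above U+0020, plus U+00A0
# (named in the prose but not in the table; included here as an assumption).
_IO_WHITESPACE_EXTRA = frozenset(
    [0x00A0, 0x1680, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFEFF]
    + list(range(0x2000, 0x200B))  # U+2000 through U+200A
)


def is_io_whitespace(ch: str) -> bool:
    """Return True if the single character `ch` counts as whitespace."""
    cp = ord(ch)
    # Everything at or below U+0020 qualifies, then the listed code points.
    return cp <= 0x20 or cp in _IO_WHITESPACE_EXTRA


print(is_io_whitespace("\t"))      # True: U+0009 is within U+0000 to U+0020
print(is_io_whitespace("\u3000"))  # True: ideographic space
print(is_io_whitespace("\u200b"))  # False: zero width space is not listed
print(is_io_whitespace("a"))       # False
```

Checking the range first keeps the common ASCII case to a single comparison; the set lookup only runs for code points above U+0020.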

Internet Object is not whitespace-sensitive: the parser ignores whitespace surrounding values and structural elements. However, whitespace characters inside values or strings are preserved.
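
The effect on a single value can be sketched as follows. This is a minimal illustration, not the actual parser logic: `_is_ws` and `trim_io_whitespace` are hypothetical names, the whitespace set is assumed from the characters this page lists (plus U+00A0 from the prose), and a real parser would work on tokens rather than raw strings.

```python
def _is_ws(ch: str) -> bool:
    """Whitespace test assumed from the characters listed on this page."""
    cp = ord(ch)
    return (
        cp <= 0x20
        or 0x2000 <= cp <= 0x200A
        or cp in (0x00A0, 0x1680, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFEFF)
    )


def trim_io_whitespace(text: str) -> str:
    """Drop leading and trailing whitespace; leave interior characters alone."""
    start, end = 0, len(text)
    while start < end and _is_ws(text[start]):
        start += 1
    while end > start and _is_ws(text[end - 1]):
        end -= 1
    return text[start:end]


# The BOM, spaces, ideographic space, and newline around the value are
# dropped; the space between the two words is kept.
print(repr(trim_io_whitespace("\ufeff  John Doe \u3000\n")))  # 'John Doe'
```

The sketch scans explicitly instead of calling `str.strip()`, because Python's default notion of whitespace differs from the one described here (for example, it does not treat U+FEFF or U+0000 as whitespace).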

Whitespace can make an Internet Object document more readable, but it is not required. When transmitting data over the wire, the whitespace surrounding values and structural elements can be safely omitted, because it does not affect the document's content.
