About small caps
Small caps are letterforms shaped like capital letters but sized close to the lowercase x-height. The Unicode version uses codepoints from the IPA Extensions (U+0250 to U+02AF) and Phonetic Extensions (U+1D00 to U+1D7F) blocks that were originally encoded for phonetic transcription. The remaining two letters, ꜰ and ꜱ, come from Latin Extended-D (U+A720 to U+A7FF). The generator maps each lowercase ASCII letter to its small-cap codepoint equivalent, producing a string that survives in plain-text fields where CSS font-variant cannot reach.
How it works
The algorithm is a hash-table lookup keyed by the lowercase ASCII letters. For each input character, lowercase it, look up the small-cap codepoint, and substitute. Characters with no entry in the table (digits, punctuation, x, and q, whose small-cap codepoint most fonts lack) pass through unchanged.
    small_cap(c):
        c = lowercase(c)
        return SMALL_CAP[c] or c

    SMALL_CAP = {
        a: ᴀ U+1D00, b: ʙ U+0299, c: ᴄ U+1D04, d: ᴅ U+1D05,
        e: ᴇ U+1D07, f: ꜰ U+A730, g: ɢ U+0262, h: ʜ U+029C,
        i: ɪ U+026A, j: ᴊ U+1D0A, k: ᴋ U+1D0B, l: ʟ U+029F,
        m: ᴍ U+1D0D, n: ɴ U+0274, o: ᴏ U+1D0F, p: ᴘ U+1D18,
        q: q (no widely supported small-cap codepoint), r: ʀ U+0280,
        s: ꜱ U+A731, t: ᴛ U+1D1B, u: ᴜ U+1D1C, v: ᴠ U+1D20,
        w: ᴡ U+1D21, x: x (no codepoint), y: ʏ U+028F,
        z: ᴢ U+1D22
    }
Uppercase letters in the input either pass through unchanged or are lowercased first, depending on the style variant. The generator offers both approaches so users can choose.
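The lookup and the two uppercase-handling variants described above can be sketched in Python. This is a minimal sketch, not the generator's actual source: the function name `to_small_caps` and the `fold_upper` flag are hypothetical, and the table restates the mapping listed above with q and x left out so that they pass through.

```python
# Lowercase ASCII letters that have a small-cap mapping in the table above
# (q and x are deliberately absent, so they pass through).
LETTERS = "abcdefghijklmnoprstuvwyz"
SMALL_CAPS = (
    "\u1d00\u0299\u1d04\u1d05\u1d07\ua730\u0262\u029c"  # a b c d e f g h
    "\u026a\u1d0a\u1d0b\u029f\u1d0d\u0274\u1d0f\u1d18"  # i j k l m n o p
    "\u0280\ua731\u1d1b\u1d1c\u1d20\u1d21\u028f\u1d22"  # r s t u v w y z
)
SMALL_CAP = dict(zip(LETTERS, SMALL_CAPS))


def to_small_caps(text: str, fold_upper: bool = True) -> str:
    """Map ASCII letters to small-cap codepoints.

    fold_upper=True  lowercases every character before the lookup.
    fold_upper=False leaves uppercase input untouched (hypothetical flag).
    """
    out = []
    for ch in text:
        if not fold_upper and ch.isupper():
            out.append(ch)  # uppercase passes through unchanged
            continue
        key = ch.lower()
        # Fall back to the lowercased character when there is no entry.
        out.append(SMALL_CAP.get(key, key))
    return "".join(out)


print(to_small_caps("Quiet Storm"))
print(to_small_caps("Quiet Storm", fold_upper=False))
```

With `fold_upper=True` the whole title is folded, so the unmapped q comes out lowercase; with `fold_upper=False` both Q and S stay as full-height capitals.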
Worked example
Take the title Quiet Storm and apply the small-caps transform:
- Q lowercases to q, which has no small-cap mapping. The output is the regular q, or the original Q if uppercase is preserved.
- u maps to U+1D1C ᴜ.
- i maps to U+026A ɪ.
- e maps to U+1D07 ᴇ.
- t maps to U+1D1B ᴛ.
- The space passes through unchanged.
- S lowercases to s, which maps to U+A731 ꜱ.
- t, o, r, m map to ᴛ ᴏ ʀ ᴍ (U+1D1B, U+1D0F, U+0280, U+1D0D).
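The per-character steps above can be traced with a short script. This is an illustrative sketch only: the `trace` helper is hypothetical, the table covers just the letters in this title, and it assumes the variant in which an unmapped character keeps its original form.

```python
# Minimal table covering only the letters in "Quiet Storm"; q is left out
# because the generator has no small-cap form to substitute for it.
SMALL_CAP = {
    "u": "\u1d1c", "i": "\u026a", "e": "\u1d07", "t": "\u1d1b",
    "s": "\ua731", "o": "\u1d0f", "r": "\u0280", "m": "\u1d0d",
}


def trace(text):
    """Return (input, output, codepoint label) for each character."""
    rows = []
    for ch in text:
        # Unmapped characters keep their original form, including case.
        out = SMALL_CAP.get(ch.lower(), ch)
        rows.append((ch, out, f"U+{ord(out):04X}"))
    return rows


rows = trace("Quiet Storm")
for src, out, label in rows:
    print(f"{src!r} -> {out!r} ({label})")

result = "".join(out for _, out, _ in rows)
print(result)
```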
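With the original Q preserved, the result is Qᴜɪᴇᴛ ꜱᴛᴏʀᴍ. Notice the orphan Q: the generator has no usable small-cap q to substitute (the one encoded candidate, U+A7AF Latin Letter Small Capital Q from Unicode 11.0, is missing from most fonts), so the letter sits at full uppercase height while the rest of the word is small-cap. Designers who care about consistency sometimes use ǫ (U+01EB Latin Small Letter O with Ogonek) as a visual stand-in, but it is technically a different letter.
Unicode small-cap codepoint reference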
Small-cap codepoints are scattered across three Unicode blocks because they were added incrementally over Unicode versions 1.1 (1993), 4.0 (2003), and 5.1 (2008) as phonetic alphabets were standardised.
| Letter | Codepoint | Block | Notes |
|---|---|---|---|
| ᴀ A | U+1D00 | Phonetic Extensions | Unicode 4.0 (2003) |
| ʙ B | U+0299 | IPA Extensions | Unicode 1.1 (1993) |
| ᴇ E | U+1D07 | Phonetic Extensions | Unicode 4.0 |
| ɢ G | U+0262 | IPA Extensions | Unicode 1.1 |
| ɪ I | U+026A | IPA Extensions | Unicode 1.1 |
| ɴ N | U+0274 | IPA Extensions | Unicode 1.1 |
| q (gap) | U+A7AF (rarely in fonts) | Latin Extended-D | Added in Unicode 11.0 (2018); generators usually keep regular q or use ǫ as stand-in |
| ʀ R | U+0280 | IPA Extensions | Unicode 1.1 |
| ꜰ ꜱ F, S | U+A730, U+A731 | Latin Extended-D | Added in Unicode 5.1 (2008); patchy on pre-2012 fonts |
| x (gap) | none | not encoded | Stays as lowercase x |
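To see how the mapped letters split across the three blocks, a codepoint can be classified by range. This is a rough sketch: `block_of` is a hypothetical helper, and the ranges are the block boundaries quoted in this article, not values read from the Unicode Character Database.

```python
from collections import Counter

# Block boundaries as quoted in this article (inclusive ranges).
BLOCKS = [
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Phonetic Extensions", 0x1D00, 0x1D7F),
    ("Latin Extended-D", 0xA720, 0xA7FF),
]

# The 24 small-cap codepoints the generator maps to (a..z without q and x).
SMALL_CAPS = (
    "\u1d00\u0299\u1d04\u1d05\u1d07\ua730\u0262\u029c"
    "\u026a\u1d0a\u1d0b\u029f\u1d0d\u0274\u1d0f\u1d18"
    "\u0280\ua731\u1d1b\u1d1c\u1d20\u1d21\u028f\u1d22"
)


def block_of(ch):
    """Return the block name for a single character, or None if outside all three."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return None


counts = Counter(block_of(ch) for ch in SMALL_CAPS)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Most of the set sits in Phonetic Extensions, with a smaller group in IPA Extensions and only ꜰ and ꜱ in Latin Extended-D.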
Common pitfalls and limitations
- Inconsistent set. x has no small-cap codepoint, and the one for q (U+A7AF, added in Unicode 11.0) is absent from most fonts. F and S small caps (U+A730, U+A731) were added in Unicode 5.1 and are missing from fonts shipped before 2012; older Android, iOS 7 and earlier, and Windows XP render them as boxes.
- IPA semantic confusion. The codepoints were encoded as phonetic letters, and tools that rely on Unicode character data (CLDR collation, IPA dictionaries) may treat your styled bio as phonetic notation, not regular Latin text.
- Accessibility cost. Screen readers in IPA-aware mode treat ʟ as the phonetic symbol it is (in IPA, a velar lateral approximant), not as the letter L. In normal mode they often skip the character entirely. The same reasoning as WCAG 1.4.5 (Images of Text) applies: prefer real text styled with CSS, here font-variant: small-caps.
- Domain and username rejection. Most platforms restrict @handles and email local parts to ASCII, and domain registries and browsers generally refuse, or display as raw Punycode, labels that contain these confusable lookalikes. Use small caps only in display names, bios, and post bodies.
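- Search and case-folding. NFKC normalisation leaves these codepoints unchanged because they have no compatibility decomposition to ASCII, so a search engine that normalises its index still sees ꜱᴀʟᴇ and "sale" as different strings, and a page using the styled form may not rank for the query "sale". Keep the styled form for display and plain ASCII for indexable content.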
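The search pitfall in the last item is easy to reproduce with Python's standard `unicodedata` module. The snippet below is a small demonstration, not a model of how any particular search engine indexes text; the variable names are illustrative.

```python
import unicodedata

styled = "\ua731\u1d00\u029f\u1d07"  # small-cap rendering of "sale"

# NFKC applies compatibility decompositions; these codepoints have none.
nfkc = unicodedata.normalize("NFKC", styled)
# Case folding also leaves them alone, since they are not case variants of a-z.
folded = nfkc.casefold()

unchanged = nfkc == styled
matches_ascii = folded == "sale"

print("NFKC left the string unchanged:", unchanged)
print("Folded form equals 'sale':", matches_ascii)
```

Because neither step bridges the gap to ASCII, any matching has to come from an explicit mapping table.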
Frequently asked questions
Why is the q in small caps weird or missing?
Unicode's Latin small-capital set (in the IPA Extensions and Phonetic Extensions blocks) was designed for phonetic notation, not for typographic styling. As a result the set is irregular: ᴀ ʙ ᴄ ᴅ ᴇ ꜰ ɢ ʜ ɪ ᴊ ᴋ ʟ ᴍ ɴ ᴏ ᴘ ʀ ꜱ ᴛ ᴜ ᴠ ᴡ ʏ ᴢ are all present, but q has no widely supported small-cap codepoint and x has none at all. Generators substitute the regular "q" or sometimes "ǫ" as a visual approximation, and leave x as the regular lowercase letter.
How is Unicode small caps different from CSS font-variant: small-caps?
CSS small-caps is true typography: the font supplies a small-cap glyph variant at render time, and the underlying text remains regular Latin. Unicode small caps are separate codepoints, so the styled text is encoded in the string itself. The CSS version is searchable, accessible, and copies as plain text; the Unicode version travels through plain-text fields but loses semantic meaning and accessibility.
Can I use Unicode small caps in a domain name or email address?
No, not in practice. Domain names are restricted to ASCII (RFC 1035) or IDN-encoded Unicode via Punycode (RFC 3492). Whatever IDNA 2008 permits for an individual codepoint, registries and browsers apply their own confusable-character policies, and small-cap lookalikes of ASCII letters are typically refused or displayed as raw Punycode. Email local parts are similarly restricted by most providers.
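A cautious pre-check before submitting a hostname is to accept only plain ASCII labels made of letters, digits, and hyphens. The sketch below assumes that conventional rule with a 63-character label limit; `is_ascii_hostname` is a hypothetical helper and does not perform IDNA validation.

```python
import re

# One label: 1 to 63 ASCII letters, digits, or hyphens,
# with no hyphen at the start or end.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ascii_hostname(name: str) -> bool:
    """True if every dot-separated label is plain ASCII letters/digits/hyphens."""
    labels = name.rstrip(".").split(".")
    return all(_LDH_LABEL.fullmatch(label) is not None for label in labels)


print(is_ascii_hostname("example.com"))
print(is_ascii_hostname("\ua731\u1d00\u029f\u1d07.com"))  # styled "sale"
```

A Punycode label such as `xn--caf-dma` passes this check because it is already ASCII; the styled form does not.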
Will small caps confuse OCR or copy-paste workflows?
OCR generally produces regular Latin letters (uppercase or lowercase, depending on the engine) when scanning small-cap glyphs, because the shapes match capital letters drawn at roughly x-height. Copy-paste preserves the Unicode codepoints, but any downstream system that case-folds for search (e.g. Gmail, Slack search) treats ᴀ (U+1D00) and a (U+0061) as different characters. NFKC normalisation alone does not help, since these codepoints have no compatibility decomposition; to restore searchability, map the small-cap codepoints back to ASCII explicitly.
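One way to restore searchability is an explicit reverse table applied before indexing. This is a minimal sketch: `search_key` is a hypothetical helper, and the table only covers the 24 codepoints this generator emits.

```python
LETTERS = "abcdefghijklmnoprstuvwyz"
SMALL_CAPS = (
    "\u1d00\u0299\u1d04\u1d05\u1d07\ua730\u0262\u029c"  # a b c d e f g h
    "\u026a\u1d0a\u1d0b\u029f\u1d0d\u0274\u1d0f\u1d18"  # i j k l m n o p
    "\u0280\ua731\u1d1b\u1d1c\u1d20\u1d21\u028f\u1d22"  # r s t u v w y z
)

# str.maketrans pairs each small-cap codepoint with its ASCII letter.
TO_ASCII = str.maketrans(SMALL_CAPS, LETTERS)


def search_key(text: str) -> str:
    """Fold styled text to a plain lowercase key for indexing."""
    return text.translate(TO_ASCII).casefold()


print(search_key("Q\u1d1c\u026a\u1d07\u1d1b \ua731\u1d1b\u1d0f\u0280\u1d0d"))
```

The mapping also rewrites genuine phonetic transcriptions, so it is best applied only to fields known to contain styled text, such as display names and bios.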
Sources and further reading
- Unicode Consortium (2024) IPA Extensions, U+0250 to U+02AF.
- Unicode Consortium (2024) Phonetic Extensions, U+1D00 to U+1D7F.
- Unicode Consortium (2024) Latin Extended-D, U+A720 to U+A7FF.
- IETF (2010) RFC 5891 Internationalized Domain Names in Applications (IDNA): Protocol.
