About superscript and subscript
Superscript and subscript characters are Unicode codepoints that look like raised or lowered versions of regular letters and digits. They are not styled text; they are independent glyphs originally encoded for math, chemistry, and phonetic notation. The generator maps each ASCII character to its codepoint equivalent in the U+2070 to U+209F numeric range or the scattered Latin-letter superscript and subscript codepoints, producing a string that survives paste into any plain-text field: chat, email subject, Twitter bio, code comment, CSV cell.
How it works
The algorithm is a static lookup table. Each input character is looked up in a hash map from ASCII characters to Unicode codepoints. If the character has a superscript or subscript equivalent, it is substituted; otherwise it passes through unchanged. There is no font swap and no styling, because the new codepoints are intrinsically raised or lowered glyphs.
superscript_digit(c) = U+2070 + (c - '0') for c in 0, 4-9
                     = U+00B9, U+00B2, U+00B3 for c in 1, 2, 3 (legacy Latin-1 codepoints)
superscript_letter(c) = scattered codepoints across U+1D2C to U+1D6A (Phonetic Extensions),
                        U+2071 (i), U+207F (n), plus the Spacing Modifier Letters block
                        Missing: q (no widely supported glyph)
subscript_digit(c) = U+2080 + (c - '0') for c in 0-9
subscript_letter(c) = U+2090 to U+209C for a, e, o, x, h, k, l, m, n, p, s, t
                    = U+1D62 to U+1D65 for i, r, u, v; U+2C7C for j
Superscript digits 1, 2, and 3 came from Latin-1 (U+00B9, U+00B2, U+00B3) because typewriters and 8-bit code pages already supported them, 2 and 3 notably for area and volume notation; the remaining superscript digits were encoded by Unicode 1.1.
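The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the generator's actual source: the table names and the functions `convert`, `to_superscript`, and `to_subscript` are hypothetical, and the superscript letter table is deliberately partial (only the two letters named in the text).

```python
# Hypothetical tables for illustration; the generator's real table is not published here.
SUPERSCRIPT_DIGITS = {
    "0": "\u2070",
    "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",          # legacy Latin-1 codepoints
    **{str(d): chr(0x2070 + d) for d in range(4, 10)},    # U+2074 to U+2079
}
SUPERSCRIPT_LETTERS = {"i": "\u2071", "n": "\u207f"}      # partial on purpose

SUBSCRIPT_DIGITS = {str(d): chr(0x2080 + d) for d in range(10)}  # U+2080 to U+2089
SUBSCRIPT_LETTERS = {
    # Superscripts and Subscripts block
    "a": "\u2090", "e": "\u2091", "o": "\u2092", "x": "\u2093",
    "h": "\u2095", "k": "\u2096", "l": "\u2097", "m": "\u2098",
    "n": "\u2099", "p": "\u209a", "s": "\u209b", "t": "\u209c",
    # Phonetic Extensions and Latin Extended-C
    "i": "\u1d62", "r": "\u1d63", "u": "\u1d64", "v": "\u1d65", "j": "\u2c7c",
}


def convert(text: str, table: dict[str, str]) -> str:
    """Substitute each mapped character; pass everything else through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def to_superscript(text: str) -> str:
    return convert(text, {**SUPERSCRIPT_DIGITS, **SUPERSCRIPT_LETTERS})


def to_subscript(text: str) -> str:
    return convert(text, {**SUBSCRIPT_DIGITS, **SUBSCRIPT_LETTERS})


print(to_superscript("10n"))  # ¹⁰ⁿ
print(to_subscript("x2"))     # ₓ₂
```

Because unmapped characters fall through `table.get(ch, ch)`, a string such as `"Q"` comes back unchanged, which matches the pass-through behavior described above.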
Worked example
Take the chemistry-formula string H2SO4 and apply the subscript transform:
- H (U+0048) has no subscript equivalent. Pass through unchanged.
- 2 (U+0032) maps to U+2080 + 2 = U+2082, rendered ₂.
- S (U+0053) has no subscript equivalent. Pass through.
- O (U+004F) has no uppercase subscript equivalent. The generator substitutes the lowercase subscript o, U+2092, rendered ₒ. (Note: ₒ was encoded for phonetic notation, but it is what generators substitute.)
- 4 (U+0034) maps to U+2080 + 4 = U+2084, rendered ₄.
Result: H₂Sₒ₄. The widget renders all three substitutions in under 1 ms. Notice that the algorithm leaves uppercase H and S unchanged because no uppercase subscript codepoints exist; the digits transform cleanly, and O becomes the lowercase subscript ₒ.
Unicode block reference
Superscript and subscript codepoints are scattered across several blocks for historical reasons. Coverage varies by character; some letters have no codepoint at all.
| Style | Block | Range | Notes |
|---|---|---|---|
| Superscript digits 0, 4-9 | Superscripts and Subscripts | U+2070, U+2074 to U+2079 | Universal coverage |
| Superscript digits 1, 2, 3 | Latin-1 Supplement | U+00B9, U+00B2, U+00B3 | Inherited from Latin-1 (1985) |
| Superscript letters (Latin) | Phonetic Extensions, Spacing Modifier Letters, Superscripts and Subscripts | U+1D2C to U+1D6A, scattered codepoints in U+02B0 to U+02E3, U+2071, U+207F | Originally for phonetic notation; missing q |
| Subscript digits 0-9 | Superscripts and Subscripts | U+2080 to U+2089 | Universal coverage |
| Subscript letters (Latin) | Superscripts and Subscripts, Phonetic Extensions, Latin Extended-C | U+2090 to U+209C, U+1D62 to U+1D65 (i, r, u, v), U+2C7C (j) | a, e, h, i, j, k, l, m, n, o, p, r, s, t, u, v, x only |
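One way to check entries like these is to ask Python's standard `unicodedata` module for each codepoint's official character name. The sketch below uses a hypothetical helper, `describe`, and a handful of sample codepoints from the table; it is a spot check, not a full coverage audit.

```python
import unicodedata

# Sample codepoints taken from the table above.
SAMPLES = ["\u2070", "\u00b2", "\u1d2c", "\u2071", "\u207f", "\u2082", "\u2092"]


def describe(ch: str) -> str:
    """Return the codepoint in U+XXXX form followed by its Unicode character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in SAMPLES:
    print(ch, describe(ch))
```

The names make the history visible: U+1D2C reports as a MODIFIER LETTER (phonetic origin), while U+2082 reports as SUBSCRIPT TWO.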
Common pitfalls and limitations
- Missing codepoints. Latin q has no widely supported superscript codepoint, and capital letters (uppercase A-Z) have no subscript forms and only partial superscript coverage. Output for unmapped characters falls back to plain Latin, so superscripting MATHQ renders as ᴹᴬᵀᴴQ rather than fully transformed.
- Semantic loss. A subscript 2 in H₂O is just a character to spellcheckers, screen readers, and search engines, not a chemical formula marker. For real chemistry or math use MathML, LaTeX, or Word's equation editor.
- Spreadsheet trap. Excel and Google Sheets treat x² as text. =A1+1 will fail or produce garbage if A1 contains x² because the cell is a string, not a number. Numbers stored as Unicode superscript characters are not numbers.
- Search invisibility. H₂O and H2O are different strings, and a search engine may not treat them as equivalent, so a page using the styled form can rank poorly for both queries. Use plain ASCII in indexable copy and save the styled glyphs for display.
- Line spacing quirks. Superscript codepoints inherit the line height of regular characters, so they look slightly cramped against tall ascenders or descenders.
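The "not a number" and "different string" pitfalls are easy to reproduce. The sketch below is illustrative only: `parses_as_int` and `fold_to_plain` are hypothetical helper names, and NFKC normalization is offered as one possible way to recover plain ASCII, not as something the generator itself does.

```python
import unicodedata


def parses_as_int(text: str) -> bool:
    """Report whether Python's int() accepts the string as a number."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def fold_to_plain(text: str) -> str:
    """Apply NFKC compatibility normalization, which maps ² and ₂ back to 2."""
    return unicodedata.normalize("NFKC", text)


print(parses_as_int("2"))        # True
print(parses_as_int("\u00b2"))   # False: superscript two is not a decimal digit
print("H\u2082O" == "H2O")       # False: different codepoints, different strings
print(fold_to_plain("H\u2082O")) # H2O
```

Folding is lossy in the other direction: `fold_to_plain("x²")` gives `x2`, so the exponent meaning is gone once the text is normalized.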
Frequently asked questions
Why does my superscript text show some letters as plain instead of raised?
Unicode only assigns superscript codepoints to a subset of the Latin alphabet. Every lowercase letter except q has a widely supported one, as do all digits, although a few (such as c, encoded at U+1D9C) are missing from some fonts. The generator falls through to plain Latin for any unmapped character so the rest of the string survives.
Are these the same as HTML sup tags?
No. HTML <sup> and <sub> tags style the character at render time, controlled by the font and the host page's CSS. Unicode superscript characters are separate codepoints baked into the text itself, so they travel through plain-text fields (chat, email subject lines, Twitter posts) without any styling layer. The downside is that they share line height with regular text and look slightly smaller, while HTML superscript can be sized and positioned precisely with CSS.
Will math expressions like x squared survive a paste into Word or Excel?
Yes for the visible string, but it loses all semantic meaning. Excel will store x² as a 2-character text cell, not as a formula. Word preserves it as text. For real equations use Word's equation editor (Insert > Equation) or LaTeX, which encode mathematical structure that screen readers and spellcheckers can interpret correctly.
Are subscript and superscript safe in URLs and filenames?
URLs allow them in path segments only after percent-encoding: RFC 3986 restricts URI syntax to a subset of ASCII, so x² becomes x%C2%B2 (the percent-encoded UTF-8 byte sequence) in a real URL. Filenames work on macOS and Linux (UTF-8 filesystems), can break in legacy Windows tools limited to cp1252, and are rejected by many cloud upload validators. Avoid them in identifiers; keep them in display strings.
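The percent-encoding step can be reproduced with Python's standard `urllib.parse` module. This is a small sketch; `encode_path_segment` is a hypothetical wrapper name, and `safe=""` is chosen here so that every reserved character in the segment is encoded as well.

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str) -> str:
    """Percent-encode a single path segment using its UTF-8 bytes."""
    return quote(segment, safe="")


print(encode_path_segment("x\u00b2"))   # x%C2%B2
print(encode_path_segment("H\u2082O"))  # H%E2%82%82O
print(unquote("x%C2%B2"))               # x²
```

The round trip through `unquote` restores the original string, but the encoded form is what appears in logs, link previews, and copied URLs.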
Sources and further reading
- Unicode Consortium (2024) Superscripts and Subscripts, U+2070 to U+209F.
- Unicode Consortium (2024) Phonetic Extensions, U+1D00 to U+1D7F.
- Berners-Lee, T., Fielding, R., Masinter, L. (2005) RFC 3986 Uniform Resource Identifier (URI): Generic Syntax.
- W3C (2014) Mathematical Markup Language (MathML) Version 3.0.
