HTML stripper
Table of Contents
- The fastest way to strip HTML tags
- How to use
- Defaults & security behavior
- Real-world examples
- 1) Remove everything (plain text)
- 2) Strip attributes only (keep tags, remove class/style/on*)
- 3) Keep links and images but remove ARIA/analytics attributes
- 4) Remove only ARIA attributes but keep all other attributes
- 5) Keep only semantic tags (tags whitelist)
- 6) Preserve inline styles intentionally (advanced)
- 7) Keep scripts or styles intentionally
- 8) Clean scraped HTML but keep simple formatting
- Tips & edge cases
- Related tools
- Extended FAQ
- Does the tool support blacklists (removing specific tags/attributes only)?
- Are <script> and <style> removed by default?
- How do I safely keep links but remove tracking or event attributes?
- Can I preserve inline styles?
- What about ARIA attributes and accessibility metadata?
- Is processing limited?
The fastest way to strip HTML tags
PicoToolkit's HTML Stripper cleans HTML from pasted text quickly and predictably. Choose how aggressive the tool is: remove all tags and attributes, strip attributes only (keep tags), or use tag and attribute whitelists to preserve only the markup you need. Note: this tool uses whitelists only (no blacklist mode).
How to use
- Paste your HTML into the input area.
- Choose a mode:
- Remove — HTML tags and attributes: delete every tag and its attributes. Output becomes plain text (all tags removed, including <script> and <style>).
- Attributes only: keep all tags but remove attributes that are not on the attributes whitelist. Tag contents remain (so <script> and <style> blocks are preserved in this mode).
- Tags whitelist: supply a list of tags to keep (for example: p, a, strong). Any tag not in that whitelist is removed. Because only whitelists exist, you must explicitly list tags you want to preserve.
- Attrs whitelist: define which attributes are allowed per tag (for example: a[href], img[src|alt]). Any attribute not on the whitelist will be stripped.
- Preview the result, adjust whitelists or mode, then copy or download cleaned HTML.
- Processing is limited by browser memory (no hard server-side limit). For very large inputs, process in chunks.
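The Attrs whitelist notation used in the steps above (for example a[href], img[src|alt]) can be read as a mapping from tag name to allowed attribute names. The Python sketch below shows one way to parse it. The function name parse_attr_whitelist and the grammar (comma-separated entries, with | between attribute names) are assumptions inferred from the examples on this page, not the tool's documented parser.

```python
import re

# One entry looks like "img[src|alt]": a tag name, then attribute names in brackets.
# This grammar is an assumption based on the examples in this document.
ENTRY = re.compile(r"\s*([a-z][a-z0-9-]*)\s*\[([^\]]*)\]\s*")


def parse_attr_whitelist(spec):
    """Turn 'a[href], img[src|alt]' into {'a': {'href'}, 'img': {'src', 'alt'}}."""
    allowed = {}
    for entry in spec.split(","):
        if not entry.strip():
            continue  # ignore empty entries such as a trailing comma
        match = ENTRY.fullmatch(entry.lower())
        if match is None:
            raise ValueError(f"bad whitelist entry: {entry!r}")
        tag, names = match.groups()
        attrs = {name.strip() for name in names.split("|") if name.strip()}
        # Merge repeated entries for the same tag instead of overwriting them.
        allowed.setdefault(tag, set()).update(attrs)
    return allowed
```

Tag and attribute names are lowercased here because HTML treats them as case-insensitive.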
Defaults & security behavior
- Whitelist-only model: safe-by-default — attributes and tags are removed unless explicitly allowed by your whitelists or by choosing a mode that keeps tags (Attributes only).
- Script and style handling:
- In Remove mode: <script> and <style> tags and their contents are removed.
- In Attributes only mode: tags (including <script> and <style>) are preserved; only attributes are removed unless whitelisted.
- With Tags whitelist: only tags you list are kept; script/style remain only if you include them.
- Event-handler attributes (on*) and ARIA attributes are removed unless explicitly allowed in the Attrs whitelist.
- When allowing href/src values, avoid javascript: and unsafe data: URIs unless you intentionally permit them. Always validate URLs after cleaning.
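As an illustration of the URL validation step recommended above, the following Python sketch checks the scheme of an href or src value after cleaning. The function name is_safe_url and the SAFE_SCHEMES set are hypothetical choices for this example; the tool itself is not documented as performing this check.

```python
from urllib.parse import urlsplit

# Schemes treated as safe in this sketch; the set is an assumption, adjust as needed.
SAFE_SCHEMES = {"http", "https", "mailto"}

# Browsers ignore leading and trailing C0 control characters and spaces in URLs.
_C0_AND_SPACE = "".join(chr(code) for code in range(0x21))


def is_safe_url(value):
    """Return True for relative URLs and for absolute URLs with a safe scheme."""
    cleaned = value.strip(_C0_AND_SPACE)
    # Browsers also drop tabs and newlines anywhere inside the URL.
    for ch in "\t\r\n":
        cleaned = cleaned.replace(ch, "")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # unparseable values are rejected
    return scheme == "" or scheme in SAFE_SCHEMES
```

The check is deliberately conservative: anything with an unrecognized scheme, including javascript: and data:, is rejected, while scheme-less (relative) URLs pass.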
Real-world examples
1) Remove everything (plain text)
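Input:
<div>Hello <a href="http://x">link</a> <img src="i.jpg"></div>
Mode: Remove (HTML tags and attributes)
Output:
Hello link
Note: the <img> tag contributes nothing to the output, because attribute values such as src are removed together with the tag.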
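For readers who want to reproduce Remove mode outside the browser, here is a minimal Python sketch built on the standard library's html.parser. It is not the tool's own implementation; the names TextOnly and strip_all are made up for this example. It keeps text nodes, drops every tag and attribute, and skips the contents of <script> and <style>.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes and skip everything inside <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_all(html_text):
    """Return the text content of html_text with all markup removed."""
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

Because only text nodes are collected, attribute values such as href or src never reach the output. Whitespace between tags is preserved as written, so callers may want to apply str.strip() to the result.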
2) Strip attributes only (keep tags, remove class/style/on*)
Input:
<p class="lead" style="color:red" onclick="x()">Hi <strong>there</strong></p>
Mode: Attributes only (no attrs whitelisted)
Output:
<p>Hi <strong>there</strong></p>
3) Keep links and images but remove ARIA/analytics attributes
Input:
<a href="http://site.com" onclick="ga()" aria-label="x">Buy</a> <img src="p.png" alt="pic" data-track="1" aria-hidden="false">
Mode: Attributes only with Attrs whitelist: a[href], img[src|alt]
Output:
<a href="http://site.com">Buy</a> <img src="p.png" alt="pic">
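The Attributes only behavior with an Attrs whitelist can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's code: the class AttrFilter and the function strip_attributes are hypothetical names, the whitelist is passed as a dict of tag name to allowed attribute names, and comments and doctype declarations are simply dropped.

```python
from html import escape
from html.parser import HTMLParser


class AttrFilter(HTMLParser):
    """Re-emit every tag, keeping only the attributes whitelisted for that tag."""

    def __init__(self, allowed=None):
        # convert_charrefs=False keeps entities such as &amp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.allowed = allowed or {}  # e.g. {"a": {"href"}}
        self.out = []

    def _open_tag(self, tag, attrs, tail):
        keep = self.allowed.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name not in keep:
                continue
            # The parser unescapes attribute values, so escape them again.
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + tail

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)  # includes raw <script>/<style> contents

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_attributes(html_text, allowed=None):
    """Keep all tags; drop every attribute not listed in allowed[tag]."""
    parser = AttrFilter(allowed)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling strip_attributes with no whitelist removes every attribute, and passing {"a": {"href"}, "img": {"src", "alt"}} keeps only link targets and image sources. As in the tool's Attributes only mode, <script> and <style> blocks pass through untouched.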
4) Remove only ARIA attributes but keep all other attributes
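Input:
<button aria-pressed="true" class="btn" data-id="123">OK</button>
Mode: Attributes only with Attrs whitelist: button[class|data-id]
Output:
<button class="btn" data-id="123">OK</button>
Note: because there is no blacklist mode, every non-ARIA attribute you want to keep must be listed in the whitelist.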
5) Keep only semantic tags (tags whitelist)
Input:
<div class="wrap"><p>Intro <span class="meta">meta</span></p></div>
Mode: Tags whitelist with tags: p, strong, em
Output:
<p>Intro meta</p>
6) Preserve inline styles intentionally (advanced)
Input:
<h1 style="font-size:24px">Title</h1>
Mode: Attributes only with Attrs whitelist: h1[style]
Output:
<h1 style="font-size:24px">Title</h1>
Note: allowing style can reintroduce layout or hidden-content risk, so inspect CSS values.
7) Keep scripts or styles intentionally
Input:
<style>.hid{display:none}</style><script>doEvil()</script>
Mode: Attributes only (tags preserved)
Output:
<style>.hid{display:none}</style><script>doEvil()</script>
Note: scripts/styles remain in Attributes only mode; use Remove mode or exclude those tags via Tags whitelist to eliminate them.
8) Clean scraped HTML but keep simple formatting
Input (scraped):
<div class="article"><h2>News</h2><p>Text <a href="http://x" onclick="x()">link</a></p></div>
Mode: Tags whitelist: h2, p, a + Attrs whitelist: a[href]
Output:
<h2>News</h2><p>Text <a href="http://x">link</a></p>
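Combining a Tags whitelist with an Attrs whitelist can be sketched in Python as follows. The names WhitelistCleaner and clean_html are hypothetical, and two behaviors are assumptions drawn from the examples on this page rather than from the tool's source: tags that are not whitelisted are unwrapped (their text is kept), while non-whitelisted <script> and <style> blocks are dropped together with their contents.

```python
from html import escape
from html.parser import HTMLParser

RAW_TEXT = {"script", "style"}  # contents are dropped unless the tag is whitelisted


class WhitelistCleaner(HTMLParser):
    """Keep whitelisted tags and attributes; unwrap every other tag."""

    def __init__(self, tags, attrs=None):
        super().__init__(convert_charrefs=False)  # leave entities as written
        self.tags = set(tags)
        self.attrs = attrs or {}
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped <script>/<style>

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def _open_tag(self, tag, attrs, tail):
        keep = self.attrs.get(tag, set())
        kept = [name if value is None else f'{name}="{escape(value, quote=True)}"'
                for name, value in attrs if name in keep]
        return "<" + " ".join([tag] + kept) + tail

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._emit(self._open_tag(tag, attrs, ">"))
        elif tag in RAW_TEXT:
            self.skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        if tag in self.tags:
            self._emit(self._open_tag(tag, attrs, " />"))

    def handle_endtag(self, tag):
        if tag in self.tags:
            self._emit(f"</{tag}>")
        elif tag in RAW_TEXT and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def clean_html(html_text, tags, attrs=None):
    """Return html_text reduced to the given tag and attribute whitelists."""
    cleaner = WhitelistCleaner(tags, attrs)
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

For scraped content, a call such as clean_html(page, {"h2", "p", "a"}, {"a": {"href"}}) keeps headings, paragraphs, and link targets while discarding wrappers and event handlers. The sketch does not balance tags, so malformed input can still produce unmatched closing tags.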
Tips & edge cases
- Because the tool uses whitelists only, explicitly add any tag or attribute you want to keep — otherwise it will be removed in whitelist modes.
- Data URIs (images or SVG) and style url() values can hide executable content — avoid whitelisting data: URIs unless you trust the source.
- If you need to remove only lines with certain content, combine this tool with PicoToolkit's filter tool.
- Malformed HTML is processed by a tolerant parser. Still, check critical outputs (emails, imports) before publishing.
Related tools
- HTML to Markdown — convert cleaned HTML into Markdown.
- Escape HTML — encode HTML entities for safe storage or display.
- Unescape HTML — decode entities back into tags.
- HTML to CSV — extract tables after you clean the markup.
- Text to HTML — turn plain text back into basic HTML.
Extended FAQ
Does the tool support blacklists (removing specific tags/attributes only)?
No. The tool operates with whitelists only. You must explicitly list tags and attributes you want to keep. This whitelist-first approach reduces accidental retention of unsafe markup.
Are <script> and <style> removed by default?
It depends on the mode:
- Remove mode: removes all tags and keeps only the text between them; <script> and <style> blocks are removed together with their contents.
- Attributes only: does not remove tags, so script/style blocks remain (only attributes are stripped).
- Tags whitelist: only tags you list are preserved; script/style remain only if you include them.
Always check outputs when keeping script/style content.
How do I safely keep links but remove tracking or event attributes?
Use Attributes only mode with an Attrs whitelist that includes a[href] (and specific link attributes like title or rel if needed). That removes onclick and data-* tracking attributes while preserving href values.
Can I preserve inline styles?
Yes. Include style in the Attrs whitelist for the specific tag (for example, h1[style]). Be aware that allowing styles can reintroduce layout or hidden-content issues, so inspect CSS values before using the output in production.
What about ARIA attributes and accessibility metadata?
ARIA attributes are removed by default unless you add them to the Attrs whitelist. If accessibility metadata is important for your output, include only the specific ARIA attributes you need (for example: button[aria-pressed]).
Is processing limited?
There is no hard server-side limit. Processing is constrained by the browser's available memory. For very large files, break the input into chunks or use a server-side workflow.
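One possible shape for such a server-side workflow is sketched below in Python. It is an illustration only, with hypothetical names (StreamingTextOnly, strip_stream). It relies on the fact that html.parser.HTMLParser accepts input incrementally through feed(), so a tag or entity split across two chunks is still parsed correctly, and it writes the stripped text to an output stream instead of holding it in memory.

```python
import io
from html.parser import HTMLParser


class StreamingTextOnly(HTMLParser):
    """Write text nodes to a sink as they arrive; skip <script>/<style> contents."""

    def __init__(self, sink):
        super().__init__(convert_charrefs=True)
        self.sink = sink
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.sink.write(data)


def strip_stream(source, sink, chunk_size=64 * 1024):
    """Read source in fixed-size chunks and write the stripped text to sink."""
    parser = StreamingTextOnly(sink)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # incomplete tags are buffered until the next chunk
    parser.close()  # flush whatever text is still buffered


# Usage with in-memory streams; a tiny chunk size forces tags to be split.
source = io.StringIO('<p>Hello <a href="http://x">world</a> &amp; more</p>')
sink = io.StringIO()
strip_stream(source, sink, chunk_size=5)
```

In practice source and sink would be text-mode file objects opened with an explicit encoding, and the chunk size would be chosen to fit the available memory.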