How to remove HTML tags from web content (C#) (16 April 2007)


Using web content in applications can be annoying, because web content usually contains a lot of HTML markup. A single regular-expression replacement removes every HTML tag from the content. The text between the tags is kept, so what's left is a plain string without HTML formatting.

Snippet:
  using System.Text.RegularExpressions;

...

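// Removes every HTML tag: a "<", then the shortest run of characters
// (newlines included) up to the next ">". Text between tags is kept.
public static string RemoveHTML(string in_HTML)
{
    return Regex.Replace(in_HTML, "<(.|\n)*?>", "");
}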
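The snippet can be tried out as a small console program. This is a minimal sketch. The class names `HtmlStripper` and `Program` are hypothetical and only wrap the method so it compiles on its own. The second call shows that the pattern also matches tags that span several lines. The third call shows a limitation of tag stripping: the *contents* of elements such as `<script>` are kept as ordinary text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, for illustration only.
public static class HtmlStripper
{
    // Same pattern as the snippet: "<", the shortest run of any characters
    // (including '\n'), then the first ">" that follows.
    public static string RemoveHTML(string in_HTML)
    {
        return Regex.Replace(in_HTML, "<(.|\n)*?>", "");
    }
}

public static class Program
{
    public static void Main()
    {
        // Simple nested markup.
        Console.WriteLine(HtmlStripper.RemoveHTML("<p>Hello <b>world</b></p>"));
        // prints "Hello world"

        // A tag broken over two lines is still removed, because the
        // pattern explicitly allows '\n' inside a tag.
        Console.WriteLine(HtmlStripper.RemoveHTML("<a\nhref=\"x\">link</a>"));
        // prints "link"

        // Only the tags are removed; the script body stays in the output.
        Console.WriteLine(HtmlStripper.RemoveHTML("<script>alert(1)</script>ok"));
        // prints "alert(1)ok"
    }
}
```

If the input may also contain scripts, styles or comments that should disappear entirely, those parts need separate handling before the tags are stripped.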



Posted by Xander Zelders



3 Comments:

Blogger sachit said...

Nice code snippet

October 9, 2007 12:24 PM  
Blogger jon said...

You might want to add

Server.HtmlDecode()

to decode any HTML-encoded entities, such as &pound;, into the characters they stand for. So you'd have:


public static string RemoveHTML(string in_HTML)
{
return Server.HtmlDecode(Regex.Replace(in_HTML, "<(.|\n)*?>", ""));
}


Note: I've replaced lv_HTML with in_HTML in your function (typo?).
Note 2: You'd have to use the fully qualified reference System.Web.HttpContext.Current.Server.HtmlDecode if this function is in a class file rather than in a page, user control, etc.
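In a class file, a decoder that does not depend on an HttpContext may be simpler to use. The sketch below swaps in `System.Net.WebUtility.HtmlDecode`, which is available in .NET Framework 4.0 and later and in .NET Core / .NET 5+. This is a different API from the `Server.HtmlDecode` suggested in the comment. The class names `HtmlText` and `Program` are hypothetical. Tags are stripped first and entities are decoded afterwards. With that order, an encoded `&lt;b&gt;` in the text ends up as the literal characters `<b>` and is not mistaken for a tag and removed.

```csharp
using System;
using System.Net;                      // WebUtility: no HttpContext needed
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class HtmlText
{
    public static string RemoveHTML(string in_HTML)
    {
        // 1. Remove the tags, using the pattern from the post.
        string withoutTags = Regex.Replace(in_HTML, "<(.|\n)*?>", "");

        // 2. Turn entities such as &pound; and &amp; into characters.
        return WebUtility.HtmlDecode(withoutTags);
    }
}

public static class Program
{
    public static void Main()
    {
        string price = HtmlText.RemoveHTML("<p>Price: &pound;5 &amp; up</p>");
        // price == "Price: £5 & up"

        string literal = HtmlText.RemoveHTML("Use &lt;b&gt; for bold");
        // literal == "Use <b> for bold"

        Console.WriteLine(price);
        Console.WriteLine(literal);
    }
}
```

If the order were reversed (decode first, then strip), the encoded `&lt;b&gt;` would become a real `<b>` and would be removed along with the actual markup.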

February 22, 2008 4:34 PM  
Blogger Thomas said...

great work!

February 24, 2008 3:23 PM  




Disclaimer & Terms of Use | DotNet4All.Com concept & © 2004 - 2007 by Zelders² - Holland