Using web content in applications can be tedious, because web content usually contains lots of HTML markup. With a single regular-expression replacement, the HTML tags can be stripped from the content, leaving a plain string without HTML formatting.
Snippet:
using System.Text.RegularExpressions;
...
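public static string RemoveHTML(string in_HTML)
{
    // Remove every tag: a lazy match from '<' to the next '>', where (.|\n) lets a tag span line breaks
    return Regex.Replace(in_HTML, "<(.|\n)*?>", "");
}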
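To see what the snippet actually returns, here is a minimal console sketch. It assumes C# 9 or later so it can use top-level statements, and it declares RemoveHTML as a local function instead of a class member; the sample HTML string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the snippet: lazily match from '<' to the first '>' that follows,
// with (.|\n) so a tag whose attributes wrap onto a new line is still matched.
static string RemoveHTML(string in_HTML)
{
    return Regex.Replace(in_HTML, "<(.|\n)*?>", "");
}

// Hypothetical sample input: nested tags plus an <a> tag that spans two lines.
string html = "<p>Hello <b>world</b>,\n<a href=\"/x\"\n   title=\"t\">click</a></p>";

// Prints "Hello world," and then "click" on the next line.
Console.WriteLine(RemoveHTML(html));
```

Two behaviours are worth knowing. Encoded entities such as &amp; are left untouched, because only tags are removed. And since the match stops at the first '>', a '>' inside an attribute value (for example title="a>b") ends the tag early, so part of the tag leaks into the output.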
Labels: HTML, Regular Expression
Posted by Xander Zelders

3 Comments:
Nice code snippet.
You might want to add Server.HtmlDecode() to decode any HTML-encoded entities such as &pound;, etc. So you'd have:
public static string RemoveHTML(string in_HTML)
{
    // Strip the tags first, then turn encoded entities back into plain characters
    return Server.HtmlDecode(Regex.Replace(in_HTML, "<(.|\n)*?>", ""));
}
Note: I've replaced lv_HTML with in_HTML in your function (typo?).
Note 2: If this function is in a class file rather than a page, user control, etc., you'd have to use the fully qualified reference System.Web.HttpContext.Current.Server.HtmlDecode.
Great work!