How to extract SRC from IMG elements in HTML code (22 April 2007)


This piece of C# code shows how to extract the SRC URL from every IMG element in HTML code, using a regular expression (Regex). The URL captured by each match is added to an ArrayList, which the method returns.

using System.Collections;
using System.Text.RegularExpressions;

public static ArrayList ExtractAllImagesFromHTMLbyURL(string lv_HTML)
{
    ArrayList lv_Images = new ArrayList();

    // Find the SRC URL in each IMG tag; the named group "ImageFile" captures the URL.
    // RegexOptions.IgnoreCase is an addition so that <IMG SRC=...> also matches.
    Regex lv_FindAllImages = new Regex(
        @"<img[^>]*src\s*=\s*[\""\']?(?<ImageFile>[^""'>\s]*)[\""\']?[^>]*>",
        RegexOptions.IgnoreCase);

    // Get all the matches of the regular expression
    // and add the captured URL of each one to the list.
    MatchCollection mMatchCollection = lv_FindAllImages.Matches(lv_HTML);
    foreach (Match mMatch in mMatchCollection)
    {
        string lv_Image = mMatch.Groups["ImageFile"].Value;

        lv_Images.Add(lv_Image);
    }

    return lv_Images;
}
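
As a usage illustration, here is a minimal, self-contained sketch of the same technique as a console program. It is not the original author's code: the class name ImageSrcDemo, the helper ExtractImageSources, the sample HTML, the generic List<string> return type and the RegexOptions.IgnoreCase flag are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageSrcDemo
{
    // Pattern for IMG tags; the named group "ImageFile" captures the SRC value.
    // IgnoreCase is an assumption of this sketch so that <IMG SRC=...> also matches.
    private static readonly Regex ImgSrc = new Regex(
        @"<img[^>]*src\s*=\s*[\""\']?(?<ImageFile>[^""'>\s]*)[\""\']?[^>]*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the SRC value of every IMG tag found in the input.
    public static List<string> ExtractImageSources(string html)
    {
        List<string> sources = new List<string>();
        foreach (Match match in ImgSrc.Matches(html))
        {
            sources.Add(match.Groups["ImageFile"].Value);
        }
        return sources;
    }

    public static void Main()
    {
        // Sample input with a double-quoted and a single-quoted SRC attribute.
        string html = "<p><img src=\"logo.png\" alt=\"Logo\"> text <IMG SRC='photos/cat.jpg'></p>";

        foreach (string src in ExtractImageSources(html))
        {
            Console.WriteLine(src);
        }
    }
}
```

With the sample input this prints logo.png followed by photos/cat.jpg. Because the pattern works on raw text rather than parsed HTML, it can also pick up an attribute whose name merely ends in src (for example data-src), so the result is best treated as a best-effort list.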

Posted by Xander Zelders
 



 



Disclaimer & Terms of Use | DotNet4All.Com concept & © 2004 - 2007 by Zelders² - Holland