How can I download HTML source in C#

C#

C# Problem Overview


How can I get the HTML source for a given web address in C#?

C# Solutions


Solution 1 - C#

You can download files with the WebClient class:

using System.Net;

using (WebClient client = new WebClient ()) // WebClient class inherits IDisposable
{
    client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");

    // Or you can get the file content without saving it
    string htmlCode = client.DownloadString("http://yoursite.com/page.html");
}

Solution 2 - C#

Basically:

using System.Net;
using System.Net.Http;  // in LINQPad, also add a reference to System.Net.Http.dll

WebRequest req = HttpWebRequest.Create("http://google.com");
req.Method = "GET";

string source;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
	source = reader.ReadToEnd();
}

Console.WriteLine(source);

Solution 3 - C#

The newest, most recent, up to date answer
This post is really old (it's 7 years old when I answered it), so no one of the other answers used the new and recommended way, which is HttpClient class.


HttpClient is considered the new API and it should replace the old ones (WebClient and WebRequest)

string url = "page url";
HttpClient client = new HttpClient();
using (HttpResponseMessage response = client.GetAsync(url).Result)
{
   using (HttpContent content = response.Content)
   {
      string pageContent = content.ReadAsStringAsync().Result;
   }
}

for more information about how to use the HttpClient class (especially in async cases), you can refer this question


NOTE 1: If you want to use async/await

string url = "page url";
HttpClient client = new HttpClient();   // actually only one object should be created by Application
using (HttpResponseMessage response = await client.GetAsync(url))
{
   using (HttpContent content = response.Content)
   {
      string pageContent = await content.ReadAsStringAsync();
   }
}


NOTE 2: If use C# 8 features

string url = "page url";
HttpClient client = new HttpClient();
using HttpResponseMessage response = await client.GetAsync(url);
using HttpContent content = response.Content;
string pageContent = await content.ReadAsStringAsync();

Solution 4 - C#

You can get the HTML source with:

var html = new System.Net.WebClient().DownloadString(siteUrl)

Solution 5 - C#

@cms way is the more recent, suggested in MS website, but I had a hard problem to solve, with both method posted here, now I post the solution for all!

problem: if you use an url like this: www.somesite.it/?p=1500 in some case you get an internal server error (500), although in web browser this www.somesite.it/?p=1500 perfectly work.

solution: you have to move out parameters, working code is:

using System.Net;
//...
using (WebClient client = new WebClient ()) 
{
    client.QueryString.Add("p", "1500"); //add parameters
    string htmlCode = client.DownloadString("www.somesite.it");
    //...
}

here official documentation

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNotDanView Question on Stackoverflow
Solution 1 - C#Christian C. SalvadóView Answer on Stackoverflow
Solution 2 - C#Diego JancicView Answer on Stackoverflow
Solution 3 - C#Hakan FıstıkView Answer on Stackoverflow
Solution 4 - C#XenonView Answer on Stackoverflow
Solution 5 - C#XilmikiView Answer on Stackoverflow