Hello Devz,

Sometimes you need to copy part of the content of a website. That’s where web scraping comes in, and HTML Agility Pack is one of the best tools for the job in C#. In this tutorial, I will show you a simple HTML Agility Pack example.

Decide what content you need

Say I want a list of all the countries in the world along with their country codes. A quick search turns up a website listing them, which I can scrape for the content: load the web page with C#, find the elements that hold the data, and extract it.

Web scraping with this HTML Agility Pack example

HTML Agility Pack is a free, open-source HTML parser for .NET that makes it easy to select the nodes we want from a web page.

The code below shows how to use HTML Agility Pack to get the country names and codes:

using HtmlAgilityPack;
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            WebDataScrap();
        }

        public static void WebDataScrap()
        {
            try
            {
                //Get the content of the URL from the Web
                const string url = "http://www.nationsonline.org/oneworld/country_code_list.htm";
                var web = new HtmlWeb();
                var doc = web.Load(url);

                //Get the content from a file
                //var path = "countries.html";
                //var doc = new HtmlDocument();
                //doc.Load(path);

                //Filter the content: remove all script nodes
                doc.DocumentNode.Descendants()
                                .Where(n => n.Name == "script")
                                .ToList()
                                .ForEach(n => n.Remove());

                //Select every element whose class attribute is exactly "border1"
                const string classValue = "border1";
                var nodes = doc.DocumentNode.SelectNodes($"//*[@class='{classValue}']") ?? Enumerable.Empty<HtmlNode>();

                //Write the desired content to a file
                using (var file = new StreamWriter("test.txt"))
                {
                    foreach (var node in nodes)
                    {
                        //Get the country name and code from the non-empty lines of text
                        var splittedWords = Regex.Split(node.InnerText, "\n");
                        var words = splittedWords
                            .Where(x => !x.Contains("&nbsp;") && !string.IsNullOrEmpty(x.Trim()))
                            .ToList();

                        //Skip anything that is not a four-column row
                        if (words.Count != 4) continue;

                        var countryName = words[0].Trim();
                        var countryCode = words[2].Trim();
                        var result = $"{countryName};{countryCode}";
                        file.WriteLine(result);
                    }
                }

                Console.WriteLine("\r\nPlease press a key...");
                Console.ReadKey();
            }
            catch (Exception ex)
            {
                Console.WriteLine($"An error occurred:\r\n{ex.Message}");
            }
        }
    }
}
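The row-parsing step inside the loop can be pulled into a small helper so it can be tried without downloading the page. This is only a sketch: the names CountryRowParser and TryParseRow are my own, and the sample row in the usage note is invented. It assumes, like the code above, that a valid row has exactly four non-empty lines of text and that the name and code are the first and third of them.

```csharp
using System;
using System.Linq;

public static class CountryRowParser
{
    // Hypothetical helper: mirrors the filtering done in the loop above.
    // Returns true and sets result to "name;code" when the text contains
    // exactly four non-empty lines; otherwise returns false.
    public static bool TryParseRow(string innerText, out string result)
    {
        result = string.Empty;

        var words = innerText
            .Split('\n')
            .Where(x => !x.Contains("&nbsp;") && !string.IsNullOrWhiteSpace(x))
            .Select(x => x.Trim())
            .ToArray();

        // Anything that is not a four-column row is skipped.
        if (words.Length != 4) return false;

        result = $"{words[0]};{words[2]}";
        return true;
    }
}
```

For example, calling CountryRowParser.TryParseRow("\nFreedonia\nFD\nFRD\n999\n", out var line) on that made-up row returns true and sets line to "Freedonia;FRD".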

Note about CSS classes

Of course, the way to extract content depends on the page itself. This code can’t be generic: it relies on the CSS class names the page uses (here, border1), so it has to be adapted for each site.
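One thing to watch for when adapting the selector: an XPath test such as @class='border1' only matches when the attribute is exactly that string, so it misses elements that carry several classes. A common XPath 1.0 workaround is to match the class as a whole token. The sketch below builds such an expression. XPathHelper and BuildClassXPath are hypothetical names of mine and are not part of HTML Agility Pack.

```csharp
using System;

public static class XPathHelper
{
    // Hypothetical helper: builds an XPath 1.0 expression that matches any
    // element whose class attribute contains className as a whole token,
    // even when other classes are present on the same element.
    public static string BuildClassXPath(string className)
    {
        if (string.IsNullOrWhiteSpace(className) || className.Contains("'"))
            throw new ArgumentException("Class name must be non-empty and must not contain a quote.", nameof(className));

        return $"//*[contains(concat(' ', normalize-space(@class), ' '), ' {className.Trim()} ')]";
    }
}
```

Passing XPathHelper.BuildClassXPath("border1") to doc.DocumentNode.SelectNodes should then also pick up an element declared with, say, class="border1 striped" (the second class name is made up for illustration).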

Happy web scraping!  🙂