How to extract a domain from a URL

You want to extract the domain from a URL, ignoring the protocol, a leading www. prefix, the path, and the query string. The URLs you need to support might look like the ones below.
For this list you expect the domain sub.domain.com:

  • sub.domain.com/folder?p1=v1
  • www.sub.domain.com/folder?p1=v1
  • http://sub.domain.com/folder?p1=v1
  • https://sub.domain.com/folder?p1=v1
  • https://www.sub.domain.com/folder?p1=v1

For this list you expect the domain domain.com:

  • domain.com/folder?p1=v1
  • www.domain.com/folder?p1=v1
  • http://domain.com/folder?p1=v1
  • https://domain.com/folder?p1=v1
  • https://www.domain.com/folder?p1=v1

For this list you expect the domain sub.sub.domain.com:

  • sub.sub.domain.com/folder?p1=v1
  • www.sub.sub.domain.com/folder?p1=v1
  • http://sub.sub.domain.com/folder?p1=v1
  • https://sub.sub.domain.com/folder?p1=v1
  • https://www.sub.sub.domain.com/folder?p1=v1

Extract domain

If all your URLs start with http:// or https://, you can use the Uri class.

var host = new Uri(url).Host;
if (host.StartsWith("www."))
    host = host.Remove(0, 4);
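
The snippet above can be wrapped in a small helper that does not throw on bad input. This is only a sketch: the class and method names are hypothetical, and returning an empty string for anything that is not an absolute http or https URL is an assumption, not something the approach above requires.

```csharp
using System;

public static class UriDomainExtractor
{
    // Hypothetical helper: returns the host without a leading "www.",
    // or an empty string when the input is not an absolute http/https URL.
    public static string GetDomain(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
            return string.Empty;

        // Only accept the two schemes this article is about.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return string.Empty;

        var host = uri.Host;

        // Ordinal comparison avoids culture-sensitive string matching.
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }
}
```

With this sketch, UriDomainExtractor.GetDomain("https://www.sub.domain.com/folder?p1=v1") returns sub.domain.com.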

To extract the same domain from URLs that do not specify a protocol, use a regular expression instead, because new Uri(url) throws a UriFormatException for input such as sub.domain.com/folder.

// Requires: using System.Text.RegularExpressions;
// The dot after "www" is escaped so that it matches a literal dot only.
const string urlPattern = @"^(http://|https://)?(www\.)?((?<domain>[a-zA-Z0-9.\-_]+)/)";
var match = Regex.Match(url, urlPattern);
// Groups["domain"] is never null, so test match.Success instead.
if (match.Success)
    return match.Groups["domain"].Value;
return string.Empty;

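This expression tells Regex to look for a match starting at the beginning of the string (^). It optionally skips the protocol and the www prefix, then captures everything up to the first / into the named group domain, without the slash itself. It successfully extracts all the required domains listed above. Note that the pattern requires a / after the host, so a bare sub.domain.com without a path does not match.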
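
As a self-contained illustration of the regular expression approach, the sketch below compiles the pattern once and wraps it in a method. It is a variant, not the exact expression above: it assumes you also want to accept a host that is followed by the end of the string rather than a /, and to match the protocol and www prefix case-insensitively. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDomainExtractor
{
    // Assumed variant of the article's pattern: the dot after "www" is escaped,
    // and the host may be followed either by "/" or by the end of the string.
    private static readonly Regex UrlPattern = new Regex(
        @"^(?:https?://)?(?:www\.)?(?<domain>[a-zA-Z0-9.\-_]+)(?:/|$)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Hypothetical helper: returns the captured host,
    // or an empty string when the input does not match.
    public static string GetDomain(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            return string.Empty;

        var match = UrlPattern.Match(url.Trim());
        return match.Success ? match.Groups["domain"].Value : string.Empty;
    }
}
```

For example, RegexDomainExtractor.GetDomain("www.sub.sub.domain.com/folder?p1=v1") returns sub.sub.domain.com, and the same value comes back for the http:// and https:// forms of that URL.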

See example at RegexStorm.net.
