WorkTheWeb Forums > extracting the domain name

WorkTheWeb Forums > Webmaster Resources > Webmaster - General Help

William Tasso

Jul 12 2005, 03:04 PM

Greetings One and All

Been pondering domain names.

Do you have an algo (that you don't mind sharing) that can pull the domain
name from any of the likely strings (host/server name or URL) available to
the (web) server?

Either linear rule string manipulation or regular expression would suit
just fine.

Thanks for reading.
--
William Tasso

** Business as usual

Ignoramus31199

Jul 12 2005, 03:07 PM

On Tue, 12 Jul 2005 17:04:01 +0100, William Tasso <[Email Removed]> wrote:

QUOTE

Using the CGI Perl module, you can use the function

$cgi->virtual_host ();
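For context, virtual_host() gives the host name the request was addressed to, not the registered domain. A rough equivalent without CGI.pm is sketched below. It assumes a standard CGI environment in which the server sets HTTP_HOST (from the Host header) and SERVER_NAME; request_host is a hypothetical helper name, and the sketch approximates rather than reproduces what CGI.pm does.

```perl
use strict;
use warnings;

# Hypothetical helper: the host name a CGI request was addressed to.
# Prefers the Host header (HTTP_HOST) and falls back to SERVER_NAME.
sub request_host {
    my ($env) = @_;
    my $host = $env->{HTTP_HOST} || $env->{SERVER_NAME} || '';
    $host =~ s/:\d+\z//;    # drop a trailing port such as ":8080"
    return lc $host;        # host names are case-insensitive
}
```

Called as request_host(\%ENV), it still returns a host name such as office.example.com rather than example.com.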

i

QUOTE

Either linear rule string manipulation or regular expression would suit
just fine.

Thanks for reading.

William Tasso

Jul 12 2005, 03:27 PM

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ignoramus31199 <[Email Removed]> said:

QUOTE

...

$cgi->virtual_host ();

Thanks - I can get the hostname; it's the domain name I'm after.

--
William Tasso

** Business as usual

Ignoramus31199

Jul 12 2005, 03:36 PM

On Tue, 12 Jul 2005 17:27:16 +0100, William Tasso <[Email Removed]> wrote:

QUOTE

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ignoramus31199 <[Email Removed]> said:

...

$cgi->virtual_host ();

Thanks - I can get the hostname; it's the domain name I'm after.

You mean, say the hostname is www1.domain.com, you want to extract
domain.com?

I would do a "split", and then "pop" the last two items. That works well
except for countries where domains are like domain.co.uk. Perhaps if
the second pop is a 'co' or some such, you could pop one more item.
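The split-and-pop idea can be sketched in Perl roughly as follows. The set of second-level labels that trigger the extra pop is a hypothetical, incomplete list standing in for "'co' or some such", and naive_domain is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical, incomplete list of second-level labels ('co' "or some such")
# that trigger one extra pop.
my %SECOND_LEVEL = map { $_ => 1 } qw(co ac gov org com net);

sub naive_domain {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    return lc $host if @labels < 2;             # nothing to strip

    my @domain = (pop @labels);                 # first pop: the TLD, e.g. 'uk'
    unshift @domain, pop @labels;               # second pop, e.g. 'co' or 'domain'
    unshift @domain, pop @labels                # one more pop for 'co' and friends
        if $SECOND_LEVEL{ $domain[0] } && @labels;

    return join '.', @domain;
}
```

With this list, www1.domain.com gives domain.com and www.example.co.uk gives example.co.uk, but www.example.uk.net gives uk.net, because 'uk' is not a label the heuristic knows to skip.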

i
--

Matt Probert

Jul 12 2005, 03:43 PM

Once upon a time, far far away "William Tasso"
<[Email Removed]> muttered

QUOTE

Greetings One and All

And pop-pickers?

QUOTE

Been pondering domain names.

Everyone needs a hobby....

QUOTE

Do you have an algo (that you don't mind sharing) that can pull the domain
name from any of the likely strings (host/server name or URL) available to
the (web) server?

Either linear rule string manipulation or regular expression would suit
just fine.

Can you be a bit more explicit?

QUOTE

Thanks for reading.

That's okay. The invoice is in the post.

QUOTE

--
William Tasso

Matt Probert

QUOTE

** Business as usual

Same as always

Gandalf Parker

Jul 12 2005, 04:01 PM

"William Tasso" <[Email Removed]> wrote in news:op.sts6nqcvm9g4qz-
[Email Removed]:

QUOTE

$cgi->virtual_host ();

Thanks - I can get the hostname; it's the domain name I'm after.

Oh, you mean you just want to extract the part that would be used for
emailing? Like setting a spider to gather possible email addresses?

I think that's been done quite a bit, but not openly released, for obvious
reasons.

Gandalf Parker

GreyWyvern

Jul 12 2005, 04:10 PM

And lo, Matt Probert didst speak in alt.www.webmaster:

QUOTE

Once upon a time, far far away "William Tasso"
<[Email Removed]> muttered

Either linear rule string manipulation or regular expression would suit
just fine.

Can you be a bit more explicit?

William is explicit enough without you asking him to be more so! :P Next
thing you know you'll be asking him to take off that Speedo of his...

Grey

--
The technical axiom that nothing is impossible sinisterly implies the
pitfall corollary that nothing is ridiculous.
- http://www.greywyvern.com/ringmaker - Orca Ringmaker: Host a web ring
from your website!

William Tasso

Jul 12 2005, 04:39 PM

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ignoramus31199 <[Email Removed]> said:

QUOTE

...
You mean, say the hostname is www1.domain.com, you want to extract
domain.com?

I would do a "split", and then "pop" the last two items. That works well
except for countries where domains are like domain.co.uk. Perhaps if
the second pop is a 'co' or some such, you could pop one more item.

Yes - that is exactly the issue. Consider these hosts:

www.example.com
example.com
office.example.com
admin.office.example.com

www.example.co.uk
etc...

www.example.uk.net
etc...

--
William Tasso

** Business as usual

William Tasso

Jul 12 2005, 04:39 PM

Writing in news:alt.www.webmaster
From the safety of the GreyWyvern.com cafeteria
GreyWyvern <[Email Removed]> said:

QUOTE

And lo, Matt Probert didst speak in alt.www.webmaster:

Once upon a time, far far away "William Tasso"
<[Email Removed]> muttered

Either linear rule string manipulation or regular expression would suit
just fine.

Can you be a bit more explicit?

William is explicit enough without you asking him to be more so! :P
Next thing you know you'll be asking him to take off that Speedo of
his...

Ugh - as always, the invitation will be politely refused.

--
William Tasso

** Business as usual

William Tasso

Jul 12 2005, 04:44 PM

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Gandalf Parker <[Email Removed]> said:

QUOTE

"William Tasso" <[Email Removed]> wrote in news:op.sts6nqcvm9g4qz-
[Email Removed]:

$cgi->virtual_host ();

Thanks - I can get the hostname; it's the domain name I'm after.

Oh, you mean you just want to extract the part that would be used for
emailing?

Well, not really.

[Email Removed]
and
[Email Removed]

may both be valid mail addresses. I'm only interested in the domain.

QUOTE

Like setting a spider to gather possible email addresses?

Heh heh - no need; these hosts/domains already sit nicely on my servers.
I'm writing a generic component that incidentally needs to know the domain
name regardless of hostname.

QUOTE

I think that's been done quite a bit, but not openly released, for obvious
reasons.

Except that in the case of mail spiders there is no incentive to get it
right.

--
William Tasso

** Business as usual

John Bokma

Jul 12 2005, 05:30 PM

"William Tasso" <[Email Removed]> wrote:

QUOTE

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ignoramus31199 <[Email Removed]> said:

...
You mean, say the hostname is www1.domain.com, you want to extract
domain.com?

I would do a "split", and then "pop" the last two items. That works well
except for countries where domains are like domain.co.uk. Perhaps if
the second pop is a 'co' or some such, you could pop one more item.

Yes - that is exactly the issue. Consider these hosts:

www.example.com
example.com
office.example.com
admin.office.example.com

www.example.co.uk
etc...

www.example.uk.net
etc...

Asked this some time ago in the Perl group, and uhm... didn't get a nice
reply back :-) There is no simple rule to do this. There are also things
like example.uk.com, IIRC.
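One way to cope with suffixes such as co.uk, uk.net and uk.com is to keep a table of them and look for the longest one that matches, then take one more label. The sketch below assumes a small, hand-maintained %MULTI_SUFFIX table (hypothetical and far from complete); any lone last label counts as a suffix on its own. domain_by_suffix is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete table of multi-label suffixes under
# which names are registered. A lone last label (com, net, uk, ...) always
# counts as a suffix.
my %MULTI_SUFFIX = map { $_ => 1 } qw(co.uk org.uk ac.uk uk.net uk.com);

sub domain_by_suffix {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    return undef if @labels < 2;

    # Longest known suffix wins; leave at least one label in front of it.
    my $suffix_len = 1;
    for my $len (2 .. $#labels) {
        my $suffix = join '.', @labels[-$len .. -1];
        $suffix_len = $len if $MULTI_SUFFIX{$suffix};
    }

    # Registered domain = the suffix plus the label just before it.
    return join '.', @labels[-($suffix_len + 1) .. -1];
}
```

Here example.uk.com stays example.uk.com and admin.office.example.com gives example.com; the answer is only ever as good as the table.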

--
John Perl SEO tools: http://johnbokma.com/perl/
Experienced (web) developer: http://castleamber.com/
Get a SEO report of your site for just 100 USD:
http://johnbokma.com/websitedesign/seo-expert-help.html

Ignoramus31199

Jul 12 2005, 05:32 PM

On 12 Jul 2005 18:30:42 GMT, John Bokma <[Email Removed]> wrote:

QUOTE

Asked this some time ago in the Perl group, and uhm.. didn't get a nice
reply back :-)

Nice reply from a Perl group? That's unrealistic... :)

i

John Bokma

Jul 12 2005, 06:33 PM

Ignoramus31199 <[Email Removed]> wrote:

QUOTE

On 12 Jul 2005 18:30:42 GMT, John Bokma <[Email Removed]> wrote:
Asked this some time ago in the Perl group, and uhm.. didn't get a nice
reply back :-)

Nice reply from a Perl group? That's unrealistic... :)

Nah, the not-so-nice replies are often justified, but not in this case. (And
I am not saying that because I asked :-D )


Ed Wurster

Jul 12 2005, 07:55 PM

William Tasso wrote:

QUOTE

Greetings One and All

Been pondering domain names.

Do you have an algo (that you don't mind sharing) that can pull the
domain name from any of the likely strings (host/server name or URL)
available to the (web) server?

Either linear rule string manipulation or regular expression would
suit just fine.

Comment 32 on this page looks promising:

http://tinyurl.com/c76d6

Can't say that I understand it, but would like to.

Ed

William Tasso

Jul 12 2005, 09:05 PM

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ed Wurster <[Email Removed]> said:

QUOTE

...
Comment 32 on this page looks promising:

http://tinyurl.com/c76d6

Yes, but so far as I can tell (I still don't parse regular expressions in
my head) it still fails when there is no hostname preceding the domain
name. For example, http://example.com would return ".com" - I think

QUOTE

Can't say that I understand it, but would like to.

It's a long road, but regular expressions can save a heap of scripting.

--
William Tasso

** Business as usual

John Bokma

Jul 12 2005, 09:22 PM

"William Tasso" <[Email Removed]> wrote:

QUOTE

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ed Wurster <[Email Removed]> said:

...
Comment 32 on this page looks promising:

http://tinyurl.com/c76d6

Yes, but so far as I can tell (I still don't parse regular expressions
in my head) it still fails when there is no hostname preceding the
domain name. For example, http://example.com would return ".com" - I think

Can't say that I understand it, but would like to.

It's a long road, but regular expressions can save a heap of
scripting.

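for ( $domain ) {    # alias $_ to $domain so each s/// edits it in place

    # remove sub domain
    s/www?\d?\.([^.]+\.)/$1/g;             # common ww(w)(d) prefix
    s/.+\.([^.]+\.[a-z]{3})$/$1/g;         # keep name.tld for 3-letter TLDs
    s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;  # keep name.sld.cc for 2-letter TLDs
}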

is what I used for a project. Note: it's far from perfect :-D.
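To see how these substitutions behave on the hosts William listed, here they are wrapped in a hypothetical sub bokma_domain. This assumes the forum software stripped the backslashes (so d? is read as \d? and the literal dots as \.) and applies the ^ anchor John suggests later in the thread.

```perl
use strict;
use warnings;

# John's three substitutions, backslashes restored (an assumption about what
# the forum stripped), with the ^ anchor from his follow-up applied.
sub bokma_domain {
    my ($domain) = @_;
    for ($domain) {                              # alias $_ to our copy
        s/^www?\d?\.([^.]+\.)/$1/g;              # drop a leading ww./www./www2. prefix
        s/.+\.([^.]+\.[a-z]{3})$/$1/g;           # keep name.tld for 3-letter TLDs
        s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;    # keep name.sld.cc for 2-letter TLDs
    }
    return $domain;
}
```

It gets www.example.com, office.example.com and www.example.co.uk right and turns www.www.co.uk into www.co.uk, but reduces www.example.uk.net to uk.net.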


William Tasso

Jul 12 2005, 10:17 PM

Writing in news:alt.www.webmaster
From the safety of the Castle Amber - software development cafeteria
John Bokma <[Email Removed]> said:

QUOTE

...
for ( $domain ) {

    # remove sub domain
    s/www?\d?\.([^.]+\.)/$1/g; # common ww(w)(d) prefix
    s/.+\.([^.]+\.[a-z]{3})$/$1/g;
    s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;
}

thanks, but ...

QUOTE

is what I used for a project. Note: it's far from perfect :-D.

Yes, having thought about it some more, I'm coming to the conclusion that it
cannot be done purely with logic.

As these domains are all known to me I think I'll have to do a match
against a data store - splitting the hostname and starting at the back,
adding elements till I find a match.

Really can't think of anything else that is guaranteed to work (for an
acceptable value of 'work')
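William's plan (split the host name, then grow a candidate from the back until it matches a known domain) might look like this. %KNOWN_DOMAINS stands in for whatever data store holds the hosted domains, and domain_from_store is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical data store: the domains actually hosted on this server.
my %KNOWN_DOMAINS = map { $_ => 1 } qw(example.com example.co.uk example.uk.net);

# Split the host name and, starting at the back, add one label at a time
# until the candidate matches a known domain.
sub domain_from_store {
    my ($host, $known) = @_;
    my @labels = split /\./, lc $host;
    for my $i (reverse 0 .. $#labels) {
        my $candidate = join '.', @labels[$i .. $#labels];
        return $candidate if $known->{$candidate};
    }
    return undef;    # not one of our domains
}
```

Because the shortest candidate is tried first, admin.office.example.com resolves to example.com as long as example.com is in the store; a host that matches nothing returns undef.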

--
William Tasso

** Business as usual

Norman L. DeForest

Jul 13 2005, 01:22 AM

On 12 Jul 2005, John Bokma wrote:

QUOTE

"William Tasso" <[Email Removed]> wrote:

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ed Wurster <[Email Removed]> said:

...
Comment 32 on this page looks promising:

http://tinyurl.com/c76d6

Yes, but so far as I can tell (I still don't parse regular expressions
in my head) it still fails when there is no hostname preceding the
domain name. For example, http://example.com would return ".com" - I think

Can't say that I understand it, but would like to.

It's a long road, but regular expressions can save a heap of
scripting.

for ( $domain ) {

    # remove sub domain
    s/www?\d?\.([^.]+\.)/$1/g; # common ww(w)(d) prefix
    s/.+\.([^.]+\.[a-z]{3})$/$1/g;
    s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;
}

is what I used for a project. Note: it's far from perfect :-D.

What would it do with this?
http://www.www.co.uk/

What about sites that use "www." and "www2." and "www3." and ... ?

--
Can you Change: MINDWORKS to MINDWORKS (* == Book)
*HALIFAX HALIFAX*
in 76 moves? Try http://www.chebucto.ns.ca/~af380/MHPuzzle.html
(Requires a browser supporting the W3C DOM such as Firefox or IE ver 6)

John Bokma

Jul 13 2005, 03:51 PM

"Norman L. DeForest" <[Email Removed]> wrote:

QUOTE

On 12 Jul 2005, John Bokma wrote:

"William Tasso" <[Email Removed]> wrote:

Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ed Wurster <[Email Removed]> said:

...
Comment 32 on this page looks promising:

http://tinyurl.com/c76d6

Yes, but so far as I can tell (I still don't parse regular expressions
in my head) it still fails when there is no hostname preceding the
domain name. For example, http://example.com would return ".com" - I think

Can't say that I understand it, but would like to.

It's a long road, but regular expressions can save a heap of
scripting.

for ( $domain ) {

    # remove sub domain
    s/www?\d?\.([^.]+\.)/$1/g; # common ww(w)(d) prefix
    s/.+\.([^.]+\.[a-z]{3})$/$1/g;
    s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;
}

is what I used for a project. Note: it's far from perfect :-D.

What would it do with this?
http://www.www.co.uk/

Give you www.co.uk (as far as I can see).

It has a flaw, though: s/www?\d?\.([^.]+\.)/$1/g; should be written as:

s/^www?\d?\.([^.]+\.)/$1/g;

Otherwise, foo.blawww.com gives odd results.

QUOTE

What about sites that use "www." and "www2." and "www3." and ... ?

They are stripped off. With the fix above, that happens only when the hostname starts with one of them.

It also handles ww (some people use ww.example.com). If you also want to
handle wwww, you might want to use:

s/^w{2,4}\d?\.([^.]+\.)/$1/g;

But again, this is far from perfect, and there are probably a lot of
examples which make this fail.
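The anchored w{2,4} variant can be checked on its own. The sketch below wraps it in a hypothetical strip_www_prefix sub, drops the /g (redundant once the pattern is anchored), and assumes the forum stripped the backslashes, so the pattern is read as ^w{2,4}\d?\.([^.]+\.).

```perl
use strict;
use warnings;

# Strip a leading ww./www./wwww. prefix, optionally followed by one digit
# (www2., www3., ...), but only when another label follows it.
sub strip_www_prefix {
    my ($host) = @_;
    $host =~ s/^w{2,4}\d?\.([^.]+\.)/$1/;
    return $host;
}
```

The capture ([^.]+\.) requires at least one more label after the prefix, so a bare www.com is left alone. Because of the ^ anchor, foo.blawww.example.com is also left alone.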

