William Tasso
Jul 12 2005, 03:04 PM
Greetings One and All
Been pondering domain names.
Do you have an algo (that you don't mind sharing) that can pull the domain
name from any of the likely strings (host/server name or URL) available to
the (web) server?
Either linear rule string manipulation or regular expression would suit
just fine.
Thanks for reading.
--
William Tasso
** Business as usual
Ignoramus31199
Jul 12 2005, 03:07 PM
On Tue, 12 Jul 2005 17:04:01 +0100, William Tasso <[Email Removed]> wrote:
QUOTE |
Greetings One and All
Been pondering domain names.
Do you have an algo (that you don't mind sharing) that can pull the domain name from any of the likely strings (host/server name or URL) available to the (web) server?
|
With Perl's CGI module, you can call the method
$cgi->virtual_host ();
i
QUOTE |
Either linear rule string manipulation or regular expression would suit just fine.
Thanks for reading.
|
--
William Tasso
Jul 12 2005, 03:27 PM
Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ignoramus31199 <[Email Removed]> said:
QUOTE |
...
$cgi->virtual_host ();
|
thanks - I can get the hostname, it's the domain name I'm after.
--
William Tasso
** Business as usual
Ignoramus31199
Jul 12 2005, 03:36 PM
On Tue, 12 Jul 2005 17:27:16 +0100, William Tasso <[Email Removed]> wrote:
QUOTE |
Writing in news:alt.www.webmaster From the safety of the cafeteria Ignoramus31199 <[Email Removed]> said:
...
$cgi->virtual_host ();
thanks - I can get the hostname, it's the domain name I'm after.
|
You mean that if the hostname is www1.domain.com, you want to extract
domain.com?
I would "split" the name on the dots and then "pop" the last two items.
That works except for countries where domains look like domain.co.uk.
Perhaps, if the second item popped is 'co' or some such, you could pop one
more item.
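A minimal sketch of that split-and-pop idea, written in Python rather than Perl so it is easy to try out. The function name guess_domain and the small set of second-level labels are assumptions made for illustration only; they are not a complete list.

```python
# Hypothetical, deliberately incomplete set of labels that sit between the
# registered name and a country code (as in example.co.uk).
SECOND_LEVEL_HINTS = frozenset({"co", "ac", "gov"})


def guess_domain(host):
    """Keep the last two labels, or three if the second-to-last is a hint."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        # Nothing to strip: the host is already a bare domain (or shorter).
        return ".".join(labels)
    keep = 3 if labels[-2] in SECOND_LEVEL_HINTS else 2
    return ".".join(labels[-keep:])


if __name__ == "__main__":
    for host in ("www1.domain.com", "www.example.co.uk", "www.example.uk.net"):
        print(host, "->", guess_domain(host))
```

With these assumptions www1.domain.com gives domain.com and www.example.co.uk gives example.co.uk, but www.example.uk.net comes out as uk.net, so a fixed rule like this only goes so far.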
i
--
Matt Probert
Jul 12 2005, 03:43 PM
Once upon a time, far far away "William Tasso"
<[Email Removed]> muttered
QUOTE |
Greetings One and All
|
And pop-pickers?
QUOTE |
Been pondering domain names.
|
Everyone needs a hobby....
QUOTE |
Do you have an algo (that you don't mind sharing) that can pull the domain name from any of the likely strings (host/server name or URL) available to the (web) server?
Either linear rule string manipulation or regular expression would suit just fine.
|
Can you be a bit more explicit.
QUOTE |
Thanks for reading.
|
That's okay. The invoice is in the post.
Matt Probert
QUOTE |
** Business as usual
|
Same as always
Gandalf Parker
Jul 12 2005, 04:01 PM
"William Tasso" <[Email Removed]> wrote in news:op.sts6nqcvm9g4qz-
[Email Removed]:
QUOTE |
$cgi->virtual_host ();
thanks - I can get the hostname, it's the domain name I'm after.
|
Oh you mean you just want to extract the part that would be used for
emailing? Like setting a spider to gather possible email addresses?
I think that's been done quite a bit, but not openly released, for obvious
reasons.
Gandalf Parker
GreyWyvern
Jul 12 2005, 04:10 PM
And lo, Matt Probert didst speak in alt.www.webmaster:
QUOTE |
Once upon a time, far far away "William Tasso" <[Email Removed]> muttered
Either linear rule string manipulation or regular expression would suit just fine.
Can you be a bit more explicit.
|
William is explicit enough without you asking him to be more so! :P Next
thing you know you'll be asking him to take off that Speedo of his...
Grey
--
The technical axiom that nothing is impossible sinisterly implies the
pitfall corollary that nothing is ridiculous.
-
http://www.greywyvern.com/ringmaker - Orca Ringmaker: Host a web ring
from your website!
William Tasso
Jul 12 2005, 04:39 PM
Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ignoramus31199 <[Email Removed]> said:
QUOTE |
... you mean, say the hostname is www1.domain.com, you want to extract domain.com?
I would do a "split", and then "pop" the last two items. It is good except for countries where domains are like domain.co.uk. perhaps if the second pop is a 'co' or some such, you could pop one more item.
|
Yes - that is exactly the issue, consider these hosts ...
www.example.com
example.com
office.example.com
admin.office.example.com
www.example.co.uk
etc...
www.example.uk.net
etc...
--
William Tasso
** Business as usual
William Tasso
Jul 12 2005, 04:39 PM
Writing in news:alt.www.webmaster
From the safety of the GreyWyvern.com cafeteria
GreyWyvern <[Email Removed]> said:
QUOTE |
And lo, Matt Probert didst speak in alt.www.webmaster:
Once upon a time, far far away "William Tasso" <[Email Removed]> muttered
Either linear rule string manipulation or regular expression would suit just fine.
Can you be a bit more explicit.
William is explicit enough without you asking him to be more so! :P Next thing you know you'll be asking him to take off that Speedo of his...
|
Ugh - as always, the invitation will be politely refused.
--
William Tasso
** Business as usual
William Tasso
Jul 12 2005, 04:44 PM
Writing in news:alt.www.webmaster
From the safety of the cafeteria
Gandalf Parker <[Email Removed]> said:
QUOTE |
"William Tasso" <[Email Removed]> wrote in news:op.sts6nqcvm9g4qz- [Email Removed]:
$cgi->virtual_host ();
thanks - I can get the hostname, it's the domain name I'm after.
Oh you mean you just want to extract the part that would be used for emailing?
|
well not really
[Email Removed]
and
[Email Removed]
may both be valid mail addresses. I'm only interested in the domain.
QUOTE |
Like setting a spider to gather possible email addresses?
|
Heh he - no need, these hosts/domains already sit nicely on my servers.
I'm writing a generic component that incidentally needs to know the domain
name regardless of hostname.
QUOTE |
I think that's been done quite a bit, but not openly released, for obvious reasons.
|
except that in the case of mail spiders there is no incentive to get it
right.
--
William Tasso
** Business as usual
John Bokma
Jul 12 2005, 05:30 PM
"William Tasso" <[Email Removed]> wrote:
QUOTE |
Writing in news:alt.www.webmaster From the safety of the cafeteria Ignoramus31199 <[Email Removed]> said:
... you mean, say the hostname is www1.domain.com, you want to extract domain.com?
I would do a "split", and then "pop" the last two items. It is good except for countries where domains are like domain.co.uk. perhaps if the second pop is a 'co' or some such, you could pop one more item.
Yes - that is exactly the issue, consider these hosts ...
www.example.com example.com office.example.com admin.office.example.com
www.example.co.uk etc...
www.example.uk.net etc...
|
Asked this some time ago in the Perl group, and uhm.. didn't get a nice
reply back :-) There is no simple rule to do this. There are also things
like example.uk.com iirc.
--
John
Perl SEO tools: http://johnbokma.com/perl/
Experienced (web) developer: http://castleamber.com/
Get a SEO report of your site for just 100 USD:
http://johnbokma.com/websitedesign/seo-expert-help.html
Ignoramus31199
Jul 12 2005, 05:32 PM
On 12 Jul 2005 18:30:42 GMT, John Bokma <[Email Removed]> wrote:
QUOTE |
Asked this some time ago in the Perl group, and uhm.. didn't get a nice reply back :-)
|
Nice reply from a perl group? That's unrealistic... :)
i
John Bokma
Jul 12 2005, 06:33 PM
Ignoramus31199 <[Email Removed]> wrote:
QUOTE |
On 12 Jul 2005 18:30:42 GMT, John Bokma <[Email Removed]> wrote: Asked this some time ago in the Perl group, and uhm.. didn't get a nice reply back :-)
Nice reply from a perl group? That's unrealistic... :)
|
Nah, often the not so nice replies are justified, but in this case no. (And
I am not saying that because I asked :-D )
Ed Wurster
Jul 12 2005, 07:55 PM
William Tasso wrote:
QUOTE |
Greetings One and All
Been pondering domain names.
Do you have an algo (that you don't mind sharing) that can pull the domain name from any of the likely strings (host/server name or URL) available to the (web) server?
Either linear rule string manipulation or regular expression would suit just fine.
|
Comment 32 on this page looks promising:
http://tinyurl.com/c76d6
Can't say that I understand it, but would like to.
Ed
William Tasso
Jul 12 2005, 09:05 PM
Writing in news:alt.www.webmaster
From the safety of the cafeteria
Ed Wurster <[Email Removed]> said:
Yes, but so far as I can tell (I still don't parse regular expressions in
my head) it still fails when there is no hostname preceding the domain
name. e.g.
http://example.com would return ".com" - I think
QUOTE |
Can't say that I understand it, but would like to.
|
It's a long road, but regular expressions can save a heap of scripting.
--
William Tasso
** Business as usual
John Bokma
Jul 12 2005, 09:22 PM
"William Tasso" <[Email Removed]> wrote:
QUOTE |
Writing in news:alt.www.webmaster From the safety of the cafeteria Ed Wurster <[Email Removed]> said:
... Comment 32 on this page looks promising:
http://tinyurl.com/c76d6
Yes, but so far as I can tell (I still don't parse regular expressions in my head) it still fails when there is no hostname preceding the domain name. e.g. http://example.com would return ".com" - I think
Can't say that I understand it, but would like to.
It's a long road, but regular expressions can save a heap of scripting.
|
for ( $domain ) {
    # remove sub domain
    s/www?\d?\.([^.]+\.)/$1/g;               # common ww(w)(d) prefix
    s/.+\.([^.]+\.[a-z]{3})$/$1/g;           # name.tld, three-letter TLD
    s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;    # name.sld.cc, two-letter country code
}
is what I used for a project. Note: it's far from perfect :-D.
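To see what those three substitutions do with the hosts listed earlier in the thread, here is a rough Python port. It assumes the escapes (\d and \.) that appear to have been lost from the snippet as archived; the function name strip_subdomain is made up for this sketch.

```python
import re


def strip_subdomain(host):
    """Apply the three substitutions in order, as the Perl loop does."""
    # Common ww(w)(d) prefix, unanchored as in the snippet above.
    host = re.sub(r"www?\d?\.([^.]+\.)", r"\1", host)
    # Keep name.tld when the name ends in a three-letter TLD.
    host = re.sub(r".+\.([^.]+\.[a-z]{3})$", r"\1", host)
    # Keep name.sld.cc when the name ends in a two-letter country code.
    host = re.sub(r".+\.([^.]+\.[^.]+\.[a-z]{2})$", r"\1", host)
    return host


if __name__ == "__main__":
    hosts = (
        "www.example.com",
        "admin.office.example.com",
        "www.example.co.uk",
        "www.example.uk.net",
    )
    for host in hosts:
        print(host, "->", strip_subdomain(host))
```

Under that assumption the .com and .co.uk hosts come out as example.com and example.co.uk, while www.example.uk.net is cut down to uk.net, which fits the "far from perfect" note.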
William Tasso
Jul 12 2005, 10:17 PM
Writing in news:alt.www.webmaster
From the safety of the Castle Amber - software development cafeteria
John Bokma <[Email Removed]> said:
QUOTE |
...
for ( $domain ) {
# remove sub domain
s/www?\d?\.([^.]+\.)/$1/g; # common ww(w)(d) prefix
s/.+\.([^.]+\.[a-z]{3})$/$1/g;
s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;
}
|
thanks, but ...
QUOTE |
is what I used for a project. Note: it's far from perfect :-D.
|
yes, having thought on it some more I'm coming to the conclusion it cannot
be done purely with logic.
As these domains are all known to me I think I'll have to do a match
against a data store - splitting the hostname and starting at the back,
adding elements till I find a match.
Really can't think of anything else that is guaranteed to work (for an
acceptable value of 'work')
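A rough sketch of that lookup, in Python for illustration. The set KNOWN_DOMAINS stands in for the data store and the name find_domain is made up here; both are assumptions, not part of any existing component.

```python
# Stand-in for the data store of domains known to the server (assumed values).
KNOWN_DOMAINS = frozenset({"example.com", "example.co.uk", "example.uk.net"})


def find_domain(host, known=KNOWN_DOMAINS):
    """Start at the back of the hostname and add labels until one matches."""
    labels = host.lower().rstrip(".").split(".")
    for start in range(len(labels) - 1, -1, -1):
        candidate = ".".join(labels[start:])
        if candidate in known:
            return candidate
    return None  # no known domain ends this hostname


if __name__ == "__main__":
    for host in ("admin.office.example.com", "www.example.co.uk", "example.uk.net"):
        print(host, "->", find_domain(host))
```

Because candidates are tried from the shortest suffix up, the first hit is the shortest known domain that ends the hostname, and a host with no known domain returns None instead of a guess.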
--
William Tasso
** Business as usual
Norman L. DeForest
Jul 13 2005, 01:22 AM
On 12 Jul 2005, John Bokma wrote:
QUOTE |
"William Tasso" <[Email Removed]> wrote:
Writing in news:alt.www.webmaster From the safety of the cafeteria Ed Wurster <[Email Removed]> said:
... Comment 32 on this page looks promising:
http://tinyurl.com/c76d6
Yes, but so far as I can tell (I still don't parse regular expressions in my head) it still fails when there is no hostname preceding the domain name. e.g. http://example.com would return ".com" - I think
Can't say that I understand it, but would like to.
It's a long road, but regular expressions can save a heap of scripting.
for ( $domain ) {
# remove sub domain
s/www?\d?\.([^.]+\.)/$1/g; # common ww(w)(d) prefix
s/.+\.([^.]+\.[a-z]{3})$/$1/g;
s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;
}
is what I used for a project. Note: it's far from perfect :-D.
|
What would it do with this?
http://www.www.co.uk/
What about sites that use "www." and "www2." and "www3." and ... ?
--
Can you Change: MINDWORKS to MINDWORKS (* == Book)
*HALIFAX HALIFAX*
in 76 moves? Try
http://www.chebucto.ns.ca/~af380/MHPuzzle.html
(Requires a browser supporting the W3C DOM such as Firefox or IE ver 6)
John Bokma
Jul 13 2005, 03:51 PM
"Norman L. DeForest" <[Email Removed]> wrote:
QUOTE |
On 12 Jul 2005, John Bokma wrote:
"William Tasso" <[Email Removed]> wrote:
Writing in news:alt.www.webmaster From the safety of the cafeteria Ed Wurster <[Email Removed]> said:
... Comment 32 on this page looks promising:
http://tinyurl.com/c76d6
Yes, but so far as I can tell (I still don't parse regular expressions in my head) it still fails when there is no hostname preceding the domain name. e.g. http://example.com would return ".com" - I think
Can't say that I understand it, but would like to.
It's a long road, but regular expressions can save a heap of scripting.
for ( $domain ) {
# remove sub domain
s/www?\d?\.([^.]+\.)/$1/g; # common ww(w)(d) prefix
s/.+\.([^.]+\.[a-z]{3})$/$1/g;
s/.+\.([^.]+\.[^.]+\.[a-z]{2})$/$1/g;
}
is what I used for a project. Note: it's far from perfect :-D.
What would it do with this? http://www.www.co.uk/
|
Give you www.co.uk (as far as I can see).
It has a flaw though: s/www?\d?\.([^.]+\.)/$1/g; should be written as:
s/^www?\d?\.([^.]+\.)/$1/g;
otherwise foo.blawww.com gives odd results.
QUOTE |
What about sites that use "www." and "www2." and "www3." and ... ?
|
They are stripped off. If you use the above fix, only when the name starts with those.
It also handles ww (some people use ww.example.com). If you also want to
handle wwww, you might want to use:
s/^w{2,4}\d?\.([^.]+\.)/$1/g;
But again, this is far from perfect, and there are probably a lot of
examples which make this fail.
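To illustrate why the ^ anchor matters, here is a small Python comparison of the unanchored and anchored prefix patterns. It assumes the \d and \. escapes in both patterns; the names UNANCHORED, ANCHORED and strip_prefix are invented for this sketch.

```python
import re

# The original prefix pattern: it can match in the middle of a label.
UNANCHORED = re.compile(r"www?\d?\.([^.]+\.)")
# The anchored variant that also accepts ww and wwww.
ANCHORED = re.compile(r"^w{2,4}\d?\.([^.]+\.)")


def strip_prefix(host, pattern):
    """Drop the matched w-prefix and keep the label that follows it."""
    return pattern.sub(r"\1", host)


if __name__ == "__main__":
    for host in ("www2.example.com", "ww.example.com", "blawww.example.com"):
        print(host, strip_prefix(host, UNANCHORED), strip_prefix(host, ANCHORED))
```

Without the anchor, a host such as blawww.example.com is mangled into blaexample.com; with it, that host is left alone and only names that really start with ww, www or wwww (plus an optional digit) are shortened.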