WorkTheWeb Forums > Regular Expressions

Dylan Parry

Jun 28 2005, 09:21 AM

Hi folks,

Does anyone know of a quick way to match [A-Za-z0-9] and *all* letters
with accents, eg. ü and é?

Thanks,

--
Dylan Parry
http://webpageworkshop.co.uk -- FREE Web tutorials and references

Norman L. DeForest

Jun 28 2005, 02:09 PM

On Tue, 28 Jun 2005, Dylan Parry wrote:

QUOTE

Hi folks,

Does anyone know of a quick way to match [A-Za-z0-9] and *all* letters
with accents, eg. ü and é?

Thanks,

What character set? What encoding?

The expressions needed for ISO-8859-1 would be a lot different than the
expressions needed for ISO-8859-10 and a completely different (and
extremely large) expression would be needed for UTF-8.

Finding 'Ó' ('O'-acute in ISO-8859-1) may be easy if the encoding is
plain text but (depending on context and encoding) you may also need to
search for '%D3' or '=D3' or '&Oacute;' or '&#211;' or '&#xD3;' or the
two bytes hex C3,93 (the UTF-8 encoding of U+00D3).
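
As a rough illustration of the escaped forms listed above, the following sketch produces several of them for 'Ó' using only the Python standard library. Python is used purely for illustration (nobody in the thread mentions it), and the variable names are hypothetical.

```python
import html
import quopri
import urllib.parse

ch = "\u00d3"  # 'Ó', LATIN CAPITAL LETTER O WITH ACUTE

# Raw byte forms in the two encodings discussed above.
latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xD3
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0x93

# URL percent-encoding of the ISO-8859-1 byte.
url_form = urllib.parse.quote(ch, encoding="iso-8859-1")

# Quoted-printable form of the ISO-8859-1 byte (as used in e-mail bodies).
qp_form = quopri.encodestring(latin1_bytes).decode("ascii")

# HTML numeric character references, decimal and hexadecimal.
decimal_ref = "&#{};".format(ord(ch))
hex_ref = "&#x{:X};".format(ord(ch))

# All three HTML spellings decode back to the same character.
decoded = [html.unescape(ref) for ref in ("&Oacute;", decimal_ref, hex_ref)]
```

In practice it is usually easier to decode the input first (`html.unescape`, `urllib.parse.unquote`, and so on) and then run one regular expression on the decoded text than to match every escaped spelling.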

Would you want to include non-Latin letters such as Cyrillic or Hebrew or
Arabic or Greek or Chinese letters or just accented *Latin* characters?
My Unicode charts also include a number of alternate forms for digits
such as the Arabic digits '۰' to '۹'. Would you want them
included as digits?
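
Whether alternate digit forms count as digits depends on the regex engine and its flags. As one example, assuming Python's `re` module (not something used in this thread), `\d` on text strings is Unicode-aware by default and can be restricted to ASCII:

```python
import re
import unicodedata

# ASCII "42" followed by EXTENDED ARABIC-INDIC DIGIT FOUR and TWO.
sample = "42 \u06f4\u06f2"

# On str patterns, \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d", sample)

# re.ASCII narrows \d to [0-9], the same as writing the range explicitly.
ascii_digits = re.findall(r"\d", sample, re.ASCII)
explicit_digits = re.findall(r"[0-9]", sample)

# Numeric value of each matched digit, whatever its script.
values = [unicodedata.digit(c) for c in unicode_digits]
```

Here `unicode_digits` has four entries while `ascii_digits` has only the two ASCII ones, so the choice has to be made deliberately.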

This should match all digits and letters in the ISO-8859-1 character set
unless they are escaped somehow ('%xx', '=xx', '&#nn;', etc.):
[0-9A-Za-zÀ-ÖØ-öø-ÿ]
and for CP1252, this would include a few more accented characters that are
control codes in ISO-8859-1:
[0-9A-Za-zŠŒŽšœžŸÀ-ÖØ-öø-ÿ]
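
A minimal sketch of the ISO-8859-1 class in use, assuming the three high ranges are À-Ö, Ø-ö and ø-ÿ (the accented-letter ranges of that character set, which skip the multiplication sign at D7 and the division sign at F7). Python is only an illustrative choice, and the sample bytes are made up:

```python
import re

# Digits, ASCII letters, and the accented-letter ranges of ISO-8859-1.
# U+00D7 (multiplication sign) and U+00F7 (division sign) are deliberately excluded.
LATIN1_ALNUM = re.compile(r"[0-9A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+")

# Hypothetical input: bytes as they would be stored in an ISO-8859-1 file.
raw = b"Caf\xe9 M\xfcller \xd7 2005"

# Decode first, then match on text.
text = raw.decode("iso-8859-1")
words = LATIN1_ALNUM.findall(text)  # ['Café', 'Müller', '2005']
```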

Depending on the software you are using, you may have to substitute
"\ddd" for each high character with 'ddd' being the octal value of the
character or '\xdd' with 'dd' being the hexadecimal value.
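
For instance, the ISO-8859-1 class can be spelled entirely with numeric escapes and run against undecoded bytes. This is a sketch assuming Python's `re` with bytes patterns, and assuming the high ranges are C0-D6, D8-F6 and F8-FF; other tools may use a different escape syntax.

```python
import re

# Hexadecimal escapes: C0-D6, D8-F6, F8-FF.
LATIN1_ALNUM_HEX = re.compile(rb"[0-9A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\xFF]+")

# The same ranges written with octal escapes: 300-326, 330-366, 370-377.
LATIN1_ALNUM_OCT = re.compile(rb"[0-9A-Za-z\300-\326\330-\366\370-\377]+")

# Hypothetical ISO-8859-1 input, matched without decoding.
raw = b"Caf\xe9 M\xfcller \xd7 2005"

hex_matches = LATIN1_ALNUM_HEX.findall(raw)
oct_matches = LATIN1_ALNUM_OCT.findall(raw)
```

Both patterns return the same three byte strings; the byte D7 (the multiplication sign) is skipped by either spelling.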

--
Windows is *not* a "Toy OS".
/me desperately trying to hide the URL for the screenshot of my desktop
http://www.chebucto.ns.ca/~af380/temp/MyDe...Jun-22-2005.gif

John Bokma

Jun 28 2005, 04:10 PM

Dylan Parry <[Email Removed]> wrote:

QUOTE

Hi folks,

Does anyone know of a quick way to match [A-Za-z0-9] and *all* letters
with accents, eg. ü and é?

Which language?

--
John Perl SEO tools: http://johnbokma.com/perl/
Experienced (web) developer: http://castleamber.com/
Get a SEO report of your site for just 100 USD:
http://johnbokma.com/websitedesign/seo-expert-help.html

Toby Inkster

Jun 28 2005, 05:14 PM

Dylan Parry wrote:

QUOTE

Does anyone know of a quick way to match [A-Za-z0-9] and *all* letters
with accents, eg. ü and é?

In what language and what character set? Perl has quite good built-in
support for Unicode, so if you're using Perl and Unicode, it should be as
simple as "\w". See "man perlunicode" for details.
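
The same idea can be sketched in Python (an illustrative substitute; the post above is about Perl). There, `\w` on text strings is Unicode-aware too. One caveat that applies in both languages is that `\w` also matches the underscore, so a negated class is a common way to get letters and digits only:

```python
import re

text = "f\u00fcr_\u00e9lan 2005 \u00d3!"  # "für_élan 2005 Ó!"

# \w matches Unicode letters, digits, and the underscore.
word_runs = re.findall(r"\w+", text)       # ['für_élan', '2005', 'Ó']

# [^\W_] means "a word character that is not an underscore".
alnum_runs = re.findall(r"[^\W_]+", text)  # ['für', 'élan', '2005', 'Ó']
```

Note that this matches letters from every script, not only accented Latin ones.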

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Dylan Parry

Jun 28 2005, 05:29 PM

Using a pointed stick and pebbles, John Bokma scraped:

QUOTE

Does anyone know of a quick way to match [A-Za-z0-9] and *all* letters
with accents, eg. ü and é?

Which language?

PHP. I've settled, for now, on a string of acceptable foreign
characters, but the result is a bloody ugly looking regexp!

--
Dylan Parry
http://webpageworkshop.co.uk -- FREE Web tutorials and references
