What character set? What encoding?
The expressions needed for ISO-8859-1 would be a lot different than the
expressions needed for ISO-8859-10 and a completely different (and
extremely large) expression would be needed for UTF-8.
Finding 'Ó' (O-acute in ISO-8859-1) may be easy if the encoding is
plain text but (depending on context and encoding) you may also need to
search for '%D3' or '=D3' or '&Oacute;' or '&#211;' or '&#xD3;' or 'Ã“'
(hex C3,93 == UTF-8 encoding for D3).
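To give an idea of how many spellings a single character can have, here is a rough Python sketch that builds the escaped forms of one ISO-8859-1 character. The function name encoded_forms and the dictionary keys are made up for illustration; the last entry assumes the UTF-8 bytes were mis-read as CP1252.

```python
from html.entities import codepoint2name

def encoded_forms(ch):
    """Return the ways one ISO-8859-1 character might show up in text."""
    cp = ord(ch)                          # code point, e.g. 0xD3 for O-acute
    byte = ch.encode("iso-8859-1")[0]     # the single ISO-8859-1 byte
    return {
        "url": f"%{byte:02X}",                      # URL (percent) encoding
        "quoted_printable": f"={byte:02X}",         # MIME quoted-printable
        "named_entity": f"&{codepoint2name[cp]};",  # HTML named entity
        "decimal_entity": f"&#{cp};",               # HTML decimal reference
        "hex_entity": f"&#x{cp:X};",                # HTML hexadecimal reference
        # UTF-8 bytes (C3 93) displayed as if they were CP1252
        "utf8_read_as_cp1252": ch.encode("utf-8").decode("cp1252"),
    }

forms = encoded_forms("\u00D3")   # O-acute
```

A search that has to cope with all of these would need to try each value in forms as well as the plain character.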
Would you want to include non-Latin letters such as Cyrillic or Hebrew or
Arabic or Greek or Chinese letters or just accented *Latin* characters?
My Unicode charts also include a number of alternate forms for digits
such as the Arabic digits '۰' to '۹'. Would you want them
included as digits?
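The digit question is not just academic. As a small sketch, assuming Python 3 and its standard re module: on text strings '\d' matches any Unicode decimal digit unless you pass re.ASCII. The sample string is made up for illustration.

```python
import re
import unicodedata

# One ASCII digit plus the two alternate digits U+06F0 and U+06F9
sample = "7 \u06F0 \u06F9"

# Default for str patterns: \d matches every Unicode decimal digit
unicode_digits = re.findall(r"\d", sample)

# With re.ASCII, \d is restricted to 0-9
ascii_digits = re.findall(r"\d", sample, flags=re.ASCII)

# Numeric value of each digit that was found
values = [unicodedata.digit(c) for c in unicode_digits]
```

So whether those characters count as digits depends on the flag you choose, not only on the expression itself.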
This should match all digits and letters in the ISO-8859-1 character set
unless they are escaped somehow ('%xx', '=xx', '&#nn;', etc.):
[0-9A-Za-zÀ-ÖØ-öø-ÿ]
and for CP1252, this would include a few more accented characters that are
control codes in ISO-8859-1:
[0-9A-Za-zŠŒŽšœžŸÀ-ÖØ-öø-ÿ]
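A quick way to try classes like these is Python's re module. This is only a sketch: it assumes the letter ranges are C0-D6, D8-F6 and F8-FF (skipping the multiplication and division signs at D7 and F7) and that the extra CP1252 letters are S-caron, OE, Z-caron (both cases) and Y-diaeresis. The names LATIN1_ALNUM and CP1252_ALNUM are mine.

```python
import re

# Digits and letters of ISO-8859-1, written with code point escapes
LATIN1_ALNUM = re.compile(
    r"[0-9A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+"
)

# The same class plus the accented letters that CP1252 adds
CP1252_ALNUM = re.compile(
    r"[0-9A-Za-z\u0160\u0161\u0152\u0153\u017D\u017E\u0178"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+"
)

# U+00D7 (multiplication sign) and U+00F7 (division sign) are not letters
text = "\u00D3lafur \u00D7 42 \u00F7 na\u00EFve"
latin1_words = LATIN1_ALNUM.findall(text)

# U+0152 (OE ligature) is only a letter in the CP1252 class
latin1_oe = LATIN1_ALNUM.findall("\u0152uvre")
cp1252_oe = CP1252_ALNUM.findall("\u0152uvre")
```

These patterns work on decoded text, so the input has to be decoded with the right character set first.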
Depending on the software you are using, you may have to substitute
'\ddd' for each high character with 'ddd' being the octal value of the
character or '\xdd' with 'dd' being the hexadecimal value.
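For example, in Python the escaped form can be matched against raw ISO-8859-1 bytes. This is a sketch only: it assumes the ISO-8859-1 letter ranges are C0-D6, D8-F6 and F8-FF, and the names LATIN1_HEX and LATIN1_OCTAL are made up for illustration.

```python
import re

# High characters written as hexadecimal escapes (bytes pattern)
LATIN1_HEX = re.compile(rb"[0-9A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\xFF]+")

# The same ranges written as octal escapes
LATIN1_OCTAL = re.compile(rb"[0-9A-Za-z\300-\326\330-\366\370-\377]+")

# "Olafur" with O-acute, a multiplication sign, then "42",
# as ISO-8859-1 bytes
data = "\u00D3lafur \u00D7 42".encode("iso-8859-1")

hex_matches = LATIN1_HEX.findall(data)
octal_matches = LATIN1_OCTAL.findall(data)
```

Both spellings describe the same byte ranges, so which one you use comes down to what your software accepts.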
--
Windows is *not* a "Toy OS".
/me desperately trying to hide the URL for the screenshot of my desktop
http://www.chebucto.ns.ca/~af380/temp/MyDe...Jun-22-2005.gif