<?php
// Project:    Web Reference Database (refbase) <http://www.refbase.net>
// Copyright:  Matthias Steffens <mailto:refbase@extracts.de> and the file's
//             original author(s).
//
//             This code is distributed in the hope that it will be useful,
//             but WITHOUT ANY WARRANTY. Please see the GNU General Public
//             License for more details.
//
// File:       ./includes/transtab_unicode_charset.inc.php
// Repository: $HeadURL$
// Author(s):  Matthias Steffens <mailto:refbase@extracts.de>
//
// Created:    24-Jul-08, 17:00
// Modified:   $Date: 2012-02-29 00:20:51 +0000 (Wed, 29 Feb 2012) $
//             $Author$
//             $Revision: 1351 $

// Search & replace patterns and variables for matching (and conversion of) Unicode character case & classes.
// Search & replace patterns must be specified as Perl-style regular expressions, and search patterns must include
// the leading & trailing slashes.

// NOTE: Quote from <http://www.onphp5.com/article/22> ("i18n with PHP5: Pitfalls"):
// "PCRE and other regular expression extensions are not locale-aware. This most notably influences the \w class
// that is unable to work for Cyrillic letters. There could be a workaround for this if some preprocessor for the
// regex string could replace \w and friends with character range prior to calling PCRE functions."
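//
// Illustration of the quoted pitfall. It is commented out so that including this file has no side effects.
// It assumes a UTF-8 encoded string and that setlocale() has not been used to make PCRE build
// locale-specific character tables:
//
//   preg_match('/^\w+$/', "Привет");        // 0: without "u", "\w" only matches ASCII word characters
//   preg_match("/^[$alpha]+$/u", "Привет"); // 1: the Unicode-aware '$alpha' class defined below matches Cyrillic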
//
// On a UTF-8 based system, Unicode character properties ("\p{...}" or "\P{...}") can be used instead of the
// normal and POSIX character classes. They are available since PHP 4.4.0 (PHP 4 series) and PHP 5.1.0 (PHP 5 series).
// Note that the use of Unicode properties requires the "/.../u" PCRE pattern modifier! More info:
// <http://www.php.net/manual/en/regexp.reference.unicode.php>
//
// The variables '$alnum', '$alpha', '$cntrl', '$dash', '$digit', '$graph', '$lower', '$print', '$punct', '$space',
// '$upper' and '$word' must be used within a Perl-style regex character class (i.e., inside "[...]") and require
// the "/.../u" PCRE pattern modifier.
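//
// Example (commented out; '$title' is a hypothetical variable used only for illustration). The variables are
// interpolated into a double-quoted pattern string, placed inside "[...]" (several of them may be concatenated
// within one class), and the pattern carries the "u" modifier:
//
//   $title = "Ökologie 2012";
//   preg_match("/^[$alnum]+$/u", $title);        // 0: the space is not matched by '$alnum'
//   preg_match("/^[$alnum$space]+$/u", $title);  // 1: '$space' adds the space character to the class
//   preg_match("/[$dash]/u", "pp. 12–15");       // 1: "–" (U+2013 EN DASH) belongs to \p{Pd}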

// Matches Unicode letters & digits:
$alnum = "\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}"; // Unicode-aware equivalent of "[:alnum:]"

// Matches Unicode letters:
$alpha = "\p{Ll}\p{Lu}\p{Lt}\p{Lo}"; // Unicode-aware equivalent of "[:alpha:]"

// Matches Unicode control codes & characters not in other categories:
$cntrl = "\p{C}"; // Unicode-aware equivalent of "[:cntrl:]"

// Matches Unicode dashes & hyphens:
$dash = "\p{Pd}";

// Matches Unicode digits:
$digit = "\p{Nd}"; // Unicode-aware equivalent of "[:digit:]"
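
// Matches Unicode printing characters (excluding space). Since this value starts with "^", which negates a
// character class, '$graph' must be the first item inside the brackets (e.g. "[$graph]"):
$graph = "^\p{C}\t\n\f\r\p{Z}"; // Unicode-aware equivalent of "[:graph:]"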

// Matches Unicode lower case letters:
$lower = "\p{Ll}"; // Unicode-aware equivalent of "[:lower:]"

// Matches Unicode printing characters (including space):
$print = "\P{C}"; // same as "^\p{C}", Unicode-aware equivalent of "[:print:]"

// Matches Unicode punctuation. Note that, unlike the ASCII "[:punct:]" class, \p{P} does not include
// symbols (\p{S}) such as "$", "+", "^" or "|":
$punct = "\p{P}"; // closest Unicode-aware counterpart of "[:punct:]"

// Matches Unicode whitespace (separating characters with no visual representation):
$space = "\t\n\f\r\p{Z}"; // Unicode-aware equivalent of "[:space:]"

// Matches Unicode upper case letters:
$upper = "\p{Lu}\p{Lt}"; // Unicode-aware equivalent of "[:upper:]"

// Matches Unicode "word" characters:
$word = "_\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}"; // Unicode-aware equivalent of "[:word:]" (or "[:alnum:]" plus "_")

// Defines the PCRE pattern modifier(s) to be used in conjunction with the above variables.
// More info: <http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php>
$patternModifiers = "u"; // the "u" (PCRE_UTF8) pattern modifier causes PHP/PCRE to treat pattern strings as UTF-8
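//
// Example (commented out; '$authorName' and '$cleaned' are hypothetical variables used only for illustration).
// This collapses every run of non-"word" characters into a single space and takes the modifier from
// '$patternModifiers' instead of hard-coding "u":
//
//   $authorName = "Müller-Lüdenscheidt, H.-J.";
//   $cleaned = preg_replace("/[^$word]+/$patternModifiers", " ", $authorName);
//   // $cleaned is now "Müller Lüdenscheidt H J " (the trailing "." also becomes a space)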

// Converts Unicode upper case letters to their corresponding lower case letters:
// TODO!
$transtab_upper_lower = array(
);

// Converts Unicode lower case letters to their corresponding upper case letters:
// TODO!
$transtab_lower_upper = array(
);
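//
// Until the two tables above are filled in, PHP's mbstring extension offers Unicode-aware case conversion.
// This is a swapped-in alternative and not something this file provides. It is commented out and requires
// the mbstring extension:
//
//   mb_strtolower("ÄRGER", "UTF-8");  // returns "ärger"
//   mb_strtoupper("éclair", "UTF-8"); // returns "ÉCLAIR"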
?>