XML Character Encoding

Source Encoding

• Applied when the XML document is parsed
• Specified when the parser is created; cannot be changed during the parser's lifetime
• Supported encodings

  1. UTF-8 (variable width: characters of up to 21 bits are encoded in one to four bytes; PHP always uses UTF-8 for its internal document representation)
  2. US-ASCII (single byte)
  3. ISO-8859-1 (single byte; default)
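
As a rough illustration of the width difference between these encodings, the sketch below measures how many bytes a few sample characters occupy in UTF-8, then creates a parser with an explicit encoding. The sample characters and variable names are chosen here for illustration only, and the `\u{...}` escapes assume PHP 7 or later. Note also that on PHP builds backed by libxml2, the manual describes the input encoding as auto-detected, so the argument to `xml_parser_create()` may mainly govern the output side there.

```php
<?php
// Sample code points (illustrative choices), written with PHP 7+ \u{...} escapes,
// which always produce UTF-8 bytes.
$samples = [
    'U+0041'  => "\u{41}",    // Latin capital A
    'U+00E9'  => "\u{E9}",    // e with acute accent
    'U+20AC'  => "\u{20AC}",  // euro sign
    'U+1F600' => "\u{1F600}", // an emoji outside the Basic Multilingual Plane
];

// strlen() counts bytes, so this records the UTF-8 width of each character.
$widths = [];
foreach ($samples as $label => $char) {
    $widths[$label] = strlen($char);
}
print_r($widths);

// The encoding is passed once, at creation time; there is no call
// to change the source encoding of an existing parser afterwards.
$parser = xml_parser_create('ISO-8859-1');
```

In US-ASCII and ISO-8859-1 every character occupies exactly one byte, which is why only code points up to 0x7F and 0xFF respectively can be represented in them.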

Target Encoding

• Applied when PHP passes data to the XML handler functions
• Initially set to the same encoding as the source encoding
• Can be changed at any time
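
The points above can be sketched with `xml_parser_get_option()` and `xml_parser_set_option()` using the `XML_OPTION_TARGET_ENCODING` option. This is a minimal sketch, not a complete parser setup; the variable names are illustrative.

```php
<?php
// Create a parser with UTF-8 as the creation-time encoding.
$parser = xml_parser_create('UTF-8');

// The target encoding starts out the same as the encoding given at creation.
$initial = xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING);

// Unlike the source encoding, the target encoding can be changed later.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');
$changed = xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING);

echo $initial, ' -> ', $changed, "\n";
```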

Characters that the source encoding cannot represent cause the parser to return an error

Characters that the target encoding cannot represent are demoted (replaced with “?”)
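
Both behaviours can be observed with a small experiment. The sketch below is written under a few assumptions: the element name `r`, the sample text, and the variable names are invented for illustration, and the `\u{...}` escapes assume PHP 7 or later. The first parser receives valid UTF-8 but hands data to the handler as ISO-8859-1, so the euro sign (U+20AC) has no representation and should arrive as a question mark. The second parser is fed a byte that is not valid UTF-8, which should make `xml_parse()` report failure.

```php
<?php
// Part 1: demotion in the target encoding.
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

// The handler may be called several times for one text node, so accumulate.
$received = '';
xml_set_character_data_handler($parser, function ($parser, $data) use (&$received) {
    $received .= $data;
});

// U+00E9 fits in ISO-8859-1; U+20AC (euro sign) does not.
$xml = "<r>caf\u{E9} \u{20AC}</r>";
$ok = xml_parse($parser, $xml, true);

// Print as hex so the single-byte result is visible regardless of terminal encoding.
echo bin2hex($received), "\n";

// Part 2: input that the source encoding cannot represent.
$strict = xml_parser_create('UTF-8');
$bad = "<r>\xFF</r>"; // 0xFF never occurs in well-formed UTF-8
$badOk = xml_parse($strict, $bad, true);
if ($badOk === 0) {
    // The exact message depends on the underlying XML library.
    echo xml_error_string(xml_get_error_code($strict)), "\n";
}
```

Checking the return value of `xml_parse()` is the only way to notice the source-side failure; the target-side demotion is silent, so lossy output has to be detected by comparing the received data if it matters.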