XML (eXtensible Markup Language) is a data format for structured
document interchange on the Web. It is a standard defined by
The World Wide Web consortium (W3C). Information about XML and
related technologies can be found at » http://www.w3.org/XML/.
This PHP extension implements support for James Clark's
expat in PHP. This toolkit lets you
parse, but not validate, XML documents. It supports three
source character encodings
also provided by PHP: US-ASCII,
ISO-8859-1 and UTF-8.
UTF-16 is not supported.
This extension lets you create XML parsers
and then define handlers for different XML
events. Each XML parser also has a few parameters you
can adjust.
Anforderungen
This extension uses an expat compat layer by
default. It can use also expat, which can
be found at » http://www.jclark.com/xml/expat.html. The
Makefile that comes with expat does not build a library by
default, you can use this make rule for that:
These functions are enabled by default, using the bundled expat library.
You can disable XML support with
--disable-xml.
If you compile PHP as a module for Apache 1.3.9 or later, PHP will
automatically use the bundled expat library from
Apache. In order you don't want to use the bundled expat library configure
PHP --with-expat-dir=DIR, where DIR should
point to the base installation directory of expat.
Die Windowsversion von PHP enthält diese
Erweiterung. Um diese Funktionen zu verwenden, müssen Sie keine zusätzlichen
Erweiterungen aktivieren.
Laufzeit Konfiguration
Diese Erweiterung definiert keine Konfigurationseinstellungen in der php.ini.
Resource Typen
xml
The xml resource as returned by
xml_parser_create() and
xml_parser_create_ns() references an xml
parser instance to be used with the functions provided by this
extension.
Vordefinierte Konstanten
Folgende Konstanten werden von dieser
Erweiterung definiert und stehen nur zur Verfügung, wenn die Erweiterung entweder
statisch in PHP kompiliert oder dynamisch zur Laufzeit geladen wurde.
Character data is roughly all the non-markup contents of
XML documents, including whitespace between tags. Note
that the XML parser does not add or remove any whitespace,
it is up to the application (you) to decide whether
whitespace is significant.
PHP programmers should be familiar with processing
instructions (PIs) already. <?php ?> is a processing
instruction, where php is called
the "PI target". The handling of these are
application-specific, except that all PI targets starting
with "XML" are reserved.
This handler is called when the XML parser finds a
reference to an external parsed general entity. This can
be a reference to a file or URL, for example. See the external entity
example for a demonstration.
Case Folding
The element handler functions may get their element names
case-folded. Case-folding is defined by
the XML standard as "a process applied to a sequence of
characters, in which those identified as non-uppercase are
replaced by their uppercase equivalents". In other words, when
it comes to XML, case-folding simply means uppercasing.
By default, all the element names that are passed to the handler
functions are case-folded. This behaviour can be queried and
controlled per XML parser with the
xml_parser_get_option() and
xml_parser_set_option() functions,
respectively.
Error Codes
The following constants are defined for XML error codes (as
returned by xml_parse()):
XML_ERROR_NONE
XML_ERROR_NO_MEMORY
XML_ERROR_SYNTAX
XML_ERROR_NO_ELEMENTS
XML_ERROR_INVALID_TOKEN
XML_ERROR_UNCLOSED_TOKEN
XML_ERROR_PARTIAL_CHAR
XML_ERROR_TAG_MISMATCH
XML_ERROR_DUPLICATE_ATTRIBUTE
XML_ERROR_JUNK_AFTER_DOC_ELEMENT
XML_ERROR_PARAM_ENTITY_REF
XML_ERROR_UNDEFINED_ENTITY
XML_ERROR_RECURSIVE_ENTITY_REF
XML_ERROR_ASYNC_ENTITY
XML_ERROR_BAD_CHAR_REF
XML_ERROR_BINARY_ENTITY_REF
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
XML_ERROR_MISPLACED_XML_PI
XML_ERROR_UNKNOWN_ENCODING
XML_ERROR_INCORRECT_ENCODING
XML_ERROR_UNCLOSED_CDATA_SECTION
XML_ERROR_EXTERNAL_ENTITY_HANDLING
Character Encoding
PHP's XML extension supports the » Unicode character set through
different character encodings. There are
two types of character encodings, source
encoding and target encoding.
PHP's internal representation of the document is always encoded
with UTF-8.
Source encoding is done when an XML document is parsed. Upon creating an XML
parser, a source encoding can be specified (this encoding
can not be changed later in the XML parser's lifetime). The
supported source encodings are ISO-8859-1,
US-ASCII and UTF-8. The
former two are single-byte encodings, which means that each
character is represented by a single byte.
UTF-8 can encode characters composed by a
variable number of bits (up to 21) in one to four bytes. The
default source encoding used by PHP is
ISO-8859-1.
Target encoding is done when PHP passes data to XML handler
functions. When an XML parser is created, the target encoding
is set to the same as the source encoding, but this may be
changed at any point. The target encoding will affect character
data as well as tag names and processing instruction targets.
If the XML parser encounters characters outside the range that
its source encoding is capable of representing, it will return
an error.
If PHP encounters characters in the parsed XML document that can
not be represented in the chosen target encoding, the problem
characters will be "demoted". Currently, this means that such
characters are replaced by a question mark.
Beispiele
Here are some example PHP scripts parsing XML documents.
XML Element Structure Example
This first example displays the structure of the start elements in
a document with indentation.
Example#1 Show XML Element Structure
<?php $file = "data.xml"; $depth = array();
function startElement($parser, $name, $attrs) { global $depth; for ($i = 0; $i < $depth[$parser]; $i++) { echo " "; } echo "$name\n"; $depth[$parser]++; }
function endElement($parser, $name) { global $depth; $depth[$parser]--; }
$xml_parser = xml_parser_create(); xml_set_element_handler($xml_parser, "startElement", "endElement"); if (!($fp = fopen($file, "r"))) { die("could not open XML input"); }
while ($data = fread($fp, 4096)) { if (!xml_parse($xml_parser, $data, feof($fp))) { die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser))); } } xml_parser_free($xml_parser); ?>
XML Tag Mapping Example
Example#2 Map XML to HTML
This example maps tags in an XML document directly to HTML
tags. Elements not found in the "map array" are ignored. Of
course, this example will only work with a specific XML
document type.
function startElement($parser, $name, $attrs) { global $map_array; if (isset($map_array[$name])) { echo "<$map_array[$name]>"; } }
function endElement($parser, $name) { global $map_array; if (isset($map_array[$name])) { echo "</$map_array[$name]>"; } }
function characterData($parser, $data) { echo $data; }
$xml_parser = xml_parser_create(); // use case-folding so we are sure to find the tag in $map_array xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true); xml_set_element_handler($xml_parser, "startElement", "endElement"); xml_set_character_data_handler($xml_parser, "characterData"); if (!($fp = fopen($file, "r"))) { die("could not open XML input"); }
while ($data = fread($fp, 4096)) { if (!xml_parse($xml_parser, $data, feof($fp))) { die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser))); } } xml_parser_free($xml_parser); ?>
XML External Entity Example
This example highlights XML code. It illustrates how to use an
external entity reference handler to include and parse other
documents, as well as how PIs can be processed, and a way of
determining "trust" for PIs containing code.
XML documents that can be used for this example are found below
the example (xmltest.xml and
xmltest2.xml.)
Example#3 External Entity Example
<?php $file = "xmltest.xml";
function trustedFile($file) { // only trust local files owned by ourselves if (!eregi("^([a-z]+)://", $file) && fileowner($file) == getmyuid()) { return true; } return false; }
function startElement($parser, $name, $attribs) { echo "<<font color=\"#0000cc\">$name</font>"; if (count($attribs)) { foreach ($attribs as $k => $v) { echo " <font color=\"#009900\">$k</font>=\"<font color=\"#990000\">$v</font>\""; } } echo ">"; }
function endElement($parser, $name) { echo "</<font color=\"#0000cc\">$name</font>>"; }
function characterData($parser, $data) { echo "<b>$data</b>"; }
function PIHandler($parser, $target, $data) { switch (strtolower($target)) { case "php": global $parser_file; // If the parsed document is "trusted", we say it is safe // to execute PHP code inside it. If not, display the code // instead. if (trustedFile($parser_file[$parser])) { eval($data); } else { printf("Untrusted PHP code: <i>%s</i>", htmlspecialchars($data)); } break; } }