Basic use of SimpleXML. Parsing XML files

30.03.2021

Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. XML is a popular format for exchanging data on the internet. Sites that update their content frequently, such as news sites or blogs, often provide an XML feed so that external programs can keep track of content changes. Sending and parsing XML data is a common task for network-connected applications. This lesson explains how to parse XML documents and use their data.

Choose a parser

Analyze the feed

The first step in parsing a feed is to decide which fields you are interested in. The parser extracts data for those fields and ignores everything else.

Here is an excerpt from the feed that the sample app parses. Each post on stackoverflow.com appears in the feed as an entry tag that contains several nested tags:

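<feed xmlns="http://www.w3.org/2005/Atom" ...>
<title type="text">newest questions tagged android - Stack Overflow</title>
...
    <entry>
    ...
    </entry>
    <entry>
        <id>http://stackoverflow.com/q/9439999</id>
        <re:rank scheme="http://stackoverflow.com">0</re:rank>
        <title type="text">Where is my data file?</title>
        <author>
            <name>cliff2310</name>
            <uri>http://stackoverflow.com/users/1128925</uri>
        </author>
        <link rel="alternate" href="..." />
        <published>2012-02-25T00:30:54Z</published>
        <updated>2012-02-25T00:30:54Z</updated>
        <summary type="html">
            <p>I have an Application that requires a data file...</p>
        </summary>
    </entry>
    <entry>
    ...
    </entry>
...
</feed>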

The sample app extracts data for the entry tag and its nested tags title, link, and summary.

Instantiate the parser

The next step is to instantiate a parser and start the parsing process. In this snippet, the parser is initialized so that it does not process namespaces and uses the provided InputStream as its input. Parsing starts with a call to nextTag(), followed by a call to the readFeed() method, which extracts and processes the data the app is interested in:

public class StackOverflowXmlParser {
    // We don't use namespaces
    private static final String ns = null;

    public List parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
    ...
}

Read the feed

The readFeed() method does the actual work of processing the feed. It looks for elements tagged "entry" as the starting point for processing the feed recursively. If a tag is not an entry tag, it is skipped. Once the whole feed has been processed recursively, readFeed() returns a List containing the entries (including nested data members) that it extracted from the feed. This List is then returned by the parser.

private List readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List entries = new ArrayList();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}

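Parse XML

The steps for parsing an XML feed are as follows:

Identify the tags you want to include in your app, as described in "Analyze the feed". This example extracts data for the entry tag and its nested tags title, link, and summary.
Create a "read" method for each tag you are interested in, such as readEntry() and readTitle(). When the parser encounters an entry, title, link, or summary tag, it calls the matching method; otherwise, it skips the tag.
Create a method that extracts the text of a tag and advances the parser to the next tag. For title and summary this is readText(); for link, the value is taken from the href attribute when the rel attribute is "alternate".
Create a helper skip() method for tags you do not need.

This snippet shows how the parser parses entries, titles, links, and summaries.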

public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String tag = parser.getName();
    String relType = parser.getAttributeValue(null, "rel");
    if (tag.equals("link")) {
        if (relType.equals("alternate")) {
            link = parser.getAttributeValue(null, "href");
            parser.nextTag();
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
...
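
XmlPullParser ships with Android, so the snippets above do not run on a desktop JVM as written. As a rough sketch of the same idea (keep the title, link, and summary of every entry and ignore everything else), the code below uses the JDK's StAX reader (javax.xml.stream) in place of XmlPullParser. It uses one flat loop instead of the recursive "read" methods above. The class name FeedSketch, the sample XML, and the example.com link are assumptions made for illustration and are not part of the sample app.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical desktop-JVM sketch; StAX stands in for Android's XmlPullParser.
class FeedSketch {
    // Made-up sample feed with a single entry (the link URL is a placeholder).
    static final String SAMPLE =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
        + "<title>newest questions tagged android - Stack Overflow</title>"
        + "<entry>"
        + "<id>http://stackoverflow.com/q/9439999</id>"
        + "<title>Where is my data file?</title>"
        + "<author><name>cliff2310</name></author>"
        + "<link rel=\"alternate\" href=\"http://example.com/q/1\"/>"
        + "<summary>I have an Application that requires a data file...</summary>"
        + "</entry>"
        + "</feed>";

    // Simple value holder mirroring the Entry class of the sample app.
    static final class Entry {
        final String title;
        final String link;
        final String summary;

        Entry(String title, String link, String summary) {
            this.title = title;
            this.link = link;
            this.summary = summary;
        }
    }

    static List<Entry> parse(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Entry> entries = new ArrayList<>();
        String title = null;
        String link = null;
        String summary = null;
        boolean inEntry = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("entry")) {
                    // A new entry starts: reset the collected fields.
                    inEntry = true;
                    title = null;
                    link = null;
                    summary = null;
                } else if (inEntry && name.equals("title")) {
                    title = reader.getElementText();
                } else if (inEntry && name.equals("summary")) {
                    summary = reader.getElementText();
                } else if (inEntry && name.equals("link")
                        && "alternate".equals(reader.getAttributeValue(null, "rel"))) {
                    link = reader.getAttributeValue(null, "href");
                }
                // Any other tag is simply ignored.
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("entry")) {
                entries.add(new Entry(title, link, summary));
                inEntry = false;
            }
        }
        return entries;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (Entry e : parse(SAMPLE)) {
            System.out.println(e.title + " | " + e.link + " | " + e.summary);
        }
    }
}
```

The feed-level title is not picked up because titles are only read while the reader is inside an entry element.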

Skip tags you do not need

One of the steps in the XML parsing described above is for the parser to skip tags it is not interested in. The parser's skip() method is shown below:

private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
}

This is how it works:

The method throws an exception if the current event is not a START_TAG.
It consumes the START_TAG and all events up to and including the matching END_TAG.
To make sure that it stops at the correct END_TAG, and not at the first tag it meets after the original START_TAG, it keeps track of the nesting depth.

Thus, if the current element has nested elements, the value of depth will not be 0 until the parser has consumed all events between the original START_TAG and its matching END_TAG. For example, consider how the parser skips the <author> element, which has 2 nested elements, <name> and <uri>:

On the first pass through the while loop, the next tag the parser encounters after <author> is the START_TAG for <name>. The value of depth increases to 2.
On the second pass through the while loop, the next tag the parser encounters is the END_TAG </name>. The value of depth decreases to 1.
On the third pass through the while loop, the next tag the parser encounters is the START_TAG <uri>. The value of depth increases to 2.
On the fourth pass through the while loop, the next tag the parser encounters is the END_TAG </uri>. The value of depth decreases to 1.
On the fifth and final pass through the while loop, the next tag the parser encounters is the END_TAG </author>. The value of depth decreases to 0, indicating that the <author> element has been skipped successfully.
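
The walk-through above can be reproduced outside Android with a small sketch. It assumes the JDK's StAX reader (javax.xml.stream) as a stand-in for XmlPullParser and applies the same depth counter to an author element with two nested elements. The class name SkipSketch, the helper nameAfterSkip(), and the sample XML are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: depth-tracking skip() on top of StAX instead of XmlPullParser.
class SkipSketch {
    // Made-up sample: <author> has two nested elements and is followed by <title>.
    static final String SAMPLE =
        "<entry>"
        + "<author><name>cliff2310</name><uri>http://stackoverflow.com/users/1128925</uri></author>"
        + "<title>Where is my data file?</title>"
        + "</entry>";

    // Consumes the current start tag and everything up to its matching end tag.
    static void skip(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("skip() must be called on a start tag");
        }
        int depth = 1;
        while (depth != 0) {
            switch (reader.next()) {
                case XMLStreamConstants.END_ELEMENT:
                    depth--;
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    break;
                default:
                    // Text and other events do not change the nesting depth.
                    break;
            }
        }
    }

    // Skips the first child of the root element and returns the name of the next tag.
    static String nameAfterSkip(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        reader.nextTag(); // root start tag (<entry>)
        reader.nextTag(); // first child start tag (<author>)
        skip(reader);     // now positioned on </author>
        reader.nextTag(); // next sibling start tag
        return reader.getLocalName();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(nameAfterSkip(SAMPLE));
    }
}
```

If skip() stopped at the first end tag it met, the reader would still be inside author and the next tag would be uri rather than title.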

Consume XML data

The sample app fetches and parses the XML feed in an AsyncTask, so the processing happens off the main UI thread. When processing is complete, the app updates the UI in the main activity (NetworkActivity).

In the snippet below, the loadPage() method does the following:

Initializes a string variable with the URL of the XML feed.
If the user's settings and the network connection allow it, calls new DownloadXmlTask().execute(url). This creates a new DownloadXmlTask object (an AsyncTask subclass) and runs its execute() method, which downloads and parses the feed and returns a string result to be displayed in the UI.

public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;

    ...

    // Uses AsyncTask to download the XML feed from stackoverflow.com.
    public void loadPage() {
        if ((sPref.equals(ANY)) && (wifiConnected || mobileConnected)) {
            new DownloadXmlTask().execute(URL);
        } else if ((sPref.equals(WIFI)) && (wifiConnected)) {
            new DownloadXmlTask().execute(URL);
        } else {
            // show error
        }
    }
    ...
}

The AsyncTask subclass DownloadXmlTask, shown below, implements the following AsyncTask methods:

doInBackground() executes the loadXmlFromNetwork() method and passes it the feed URL as a parameter. The loadXmlFromNetwork() method fetches and processes the feed. When it finishes, it passes back a result string.
onPostExecute() takes the returned string and displays it in the UI.

// Implementation of AsyncTask used to download the XML feed from stackoverflow.com.
private class DownloadXmlTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        try {
            return loadXmlFromNetwork(urls[0]);
        } catch (IOException e) {
            return getResources().getString(R.string.connection_error);
        } catch (XmlPullParserException e) {
            return getResources().getString(R.string.xml_error);
        }
    }

    @Override
    protected void onPostExecute(String result) {
        setContentView(R.layout.main);
        // Displays the HTML string in the UI via a WebView
        WebView myWebView = (WebView) findViewById(R.id.webview);
        myWebView.loadData(result, "text/html", null);
    }
}

Below is the loadXmlFromNetwork() method, which is called from DownloadXmlTask. It does the following:

Instantiates a StackOverflowXmlParser. It also creates variables for a List of Entry objects (entries) and for title, url, and summary, which hold the values extracted from the XML feed for those fields.
Calls downloadUrl(), which fetches the feed and returns it as an InputStream.
Uses StackOverflowXmlParser to parse the InputStream. StackOverflowXmlParser fills the entries List with data from the feed.
Processes the entries List and combines the feed data with HTML markup.
Returns an HTML string that is displayed in the main activity's UI by the AsyncTask's onPostExecute() method.
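
The markup step in the list above (turning each parsed entry into a link, optionally followed by its summary) can be tried on its own, without Android classes. The following is only a sketch under assumptions: HtmlSketch, its Entry holder, and toHtml() are hypothetical names, the example.com URL is a placeholder, and no HTML escaping is applied.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "combine entries with HTML markup" step.
class HtmlSketch {
    // Minimal stand-in for the parser's Entry class.
    static final class Entry {
        final String title;
        final String link;
        final String summary;

        Entry(String title, String link, String summary) {
            this.title = title;
            this.link = link;
            this.summary = summary;
        }
    }

    // Builds one paragraph with a link per entry; appends the summary only when requested.
    static String toHtml(List<Entry> entries, boolean includeSummary) {
        StringBuilder html = new StringBuilder();
        for (Entry entry : entries) {
            html.append("<p><a href='").append(entry.link).append("'>")
                .append(entry.title).append("</a></p>");
            if (includeSummary) {
                html.append(entry.summary);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("Where is my data file?", "http://example.com/q/1",
                      "I have an Application that requires a data file..."));
        System.out.println(toHtml(entries, true));
    }
}
```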

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiate the parser
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List<Entry> entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("<h3>" + getResources().getString(R.string.page_title) + "</h3>");
    htmlString.append("<em>" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "</em>");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
    }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("<p><a href='");
        htmlString.append(entry.link);
        htmlString.append("'>" + entry.title + "</a></p>");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query
    conn.connect();
    return conn.getInputStream();
}

Some examples in this manual require an XML string. Instead of repeating this string in every example, we put it into a file that is included in each example. This file is shown in the following example. Alternatively, you could create an XML document and read it with the simplexml_load_file() function.

Example #1 The example.php include file with the XML string

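<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
 <movie>
  <title>PHP: Parser Appearance</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act</actor>
   </character>
  </characters>
  <plot>
   Thus, this is a language. It is still a programming language. Or
   is it a scripting language? Everything is revealed in this documentary film,
   like a horror movie.
  </plot>
  <great-lines>
   <line>PHP solves all my problems in the web</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>
XML;
?>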

SimpleXML is very simple to use! Try to get a string or a number from the underlying XML document.

Example #2 Getting part of the document

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

echo $movies->movie[0]->plot;
?>

The result of this example:

Thus, this is a language. It is still a programming language. Or is it a scripting language? Everything is revealed in this documentary film, like a horror movie.

In PHP, an element of an XML document whose name contains characters that are not permitted in identifiers (for example, a hyphen) can be accessed by enclosing the element name in curly braces and apostrophes.

Example #3 Getting <line>

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

echo $movies->movie->{'great-lines'}->line;
?>

The result of this example:

PHP solves all my problems in the web

Example #4 Accessing non-unique elements in SimpleXML

When several instances of an element exist as children of a single parent element, the standard iteration techniques apply.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

/* For each <character> node, we echo a separate <name>. */
foreach ($movies->movie->characters->character as $character) {
    echo $character->name, ' played by ', $character->actor, PHP_EOL;
}
?>

The result of this example:

Ms. Coder played by Onlivia Actora
Mr. Coder played by El Act

Note:
Properties ($movies->movie in the previous example) are not arrays. They are iterable objects that can be accessed like arrays.

Example #5 Using attributes

So far, we have only read the names and values of elements. SimpleXML can also access the attributes of an element. You access an attribute of an element the same way you access an element of an array.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

/* Access the <rating> nodes of the first movie.
 * Output the rating scale, too. */
foreach ($movies->movie[0]->rating as $rating) {
    switch ((string) $rating['type']) { // Get attributes as element indices
    case 'thumbs':
        echo $rating, ' thumbs up';
        break;
    case 'stars':
        echo $rating, ' stars';
        break;
    }
}
?>

The result of this example:

7 thumbs up5 stars

Example #6 Comparing elements and attributes with text

To compare an element or attribute with a string, or to pass it to a function that expects a string, you must cast it to a string using (string). Otherwise, PHP treats the element as an object.

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

if ((string) $movies->movie->title == 'PHP: Parser Appearance') {
    print 'My favorite movie.';
}

echo htmlentities((string) $movies->movie->title);
?>

The result of this example:

My favorite movie.PHP: Parser Appearance

Example #7 Comparing two elements

Starting with PHP 5.2.0, two SimpleXMLElement objects are considered different even if they point to the same element.

<?php
include "example.php";

$movies1 = new SimpleXMLElement($xmlstr);
$movies2 = new SimpleXMLElement($xmlstr);
var_dump($movies1 == $movies2); // false since PHP 5.2.0
?>

The result of this example:

bool(false)

Example #8 Using XPath

SimpleXML includes built-in XPath support. To find all <character> elements:

<?php
include "example.php";

$movies = new SimpleXMLElement($xmlstr);

foreach ($movies->xpath('//character') as $character) {
    echo $character->name, ' played by ', $character->actor, PHP_EOL;
}
?>

"//" serves as a wildcard. To specify an absolute path, omit one of the slashes.

The result of this example:

Ms. Coder played by Onlivia Actora
Mr. Coder played by El Act

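Example #9 Setting values

Data in SimpleXML does not have to be constant. The object allows all of its elements to be manipulated.

<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);

$movies->movie[0]->characters->character[0]->name = 'Miss Coder';

echo $movies->asXML();
?>

The result of this example:

<?xml version="1.0" standalone="yes"?>
<movies>
 <movie>
  <title>PHP: Parser Appearance</title>
  <characters>
   <character>
    <name>Miss Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act</actor>
   </character>
  </characters>
  <plot>
   Thus, this is a language. It is still a programming language. Or
   is it a scripting language? Everything is revealed in this documentary film,
   like a horror movie.
  </plot>
  <great-lines>
   <line>PHP solves all my problems in the web</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>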

Example #10 Adding elements and attributes

Starting with PHP 5.1.3, SimpleXML can easily add child elements and attributes.

<?php
include "example.php";
$movies = new SimpleXMLElement($xmlstr);

$character = $movies->movie[0]->characters->addChild('character');
$character->addChild('name', 'Mr. Parser');
$character->addChild('actor', 'John Doe');

$rating = $movies->movie[0]->addChild('rating', 'PG');
$rating->addAttribute('type', 'mpaa');

echo $movies->asXML();
?>

The result of this example:

<?xml version="1.0" standalone="yes"?>
<movies>
 <movie>
  <title>PHP: Parser Appearance</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act</actor>
   </character>
  <character><name>Mr. Parser</name><actor>John Doe</actor></character></characters>
  <plot>
   Thus, this is a language. It is still a programming language. Or
   is it a scripting language? Everything is revealed in this documentary film,
   like a horror movie.
  </plot>
  <great-lines>
   <line>PHP solves all my problems in the web</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 <rating type="mpaa">PG</rating></movie>
</movies>

Example #11 Interoperability with DOM

PHP can convert XML nodes between the SimpleXML and DOM formats. This example shows how a DOM element can be turned into a SimpleXML object.

<?php
$dom = new DOMDocument;
$dom->loadXML('<books><book><title>nonsense</title></book></books>');
if (!$dom) {
    echo 'Error while parsing the document';
    exit;
}

$books = simplexml_import_dom($dom);

echo $books->book[0]->title;
?>

The result of this example:

nonsense

4 years ago

There is a common "trick" often proposed to convert a SimpleXML object to an array, by running it through json_encode() and then json_decode(). I'd like to explain why this is a bad idea.

Most simply, because the whole point of SimpleXML is to be easier to use and more powerful than a plain array. For instance, you can write $foo->bar->baz['bing'] and it means the same thing as $foo->bar[0]->baz[0]['bing'], regardless of how many bar or baz elements there are in the XML; and if you write (string)$foo->bar[0]->baz[0] you get all the string content of that node - including CDATA sections - regardless of whether it also has child elements or attributes. You also have access to namespace information, the ability to make simple edits to the XML, and even the ability to "import" into a DOM object, for much more powerful manipulation. All of this is lost by turning the object into an array rather than reading and understanding the examples on this page.

Additionally, because it is not designed for this purpose, the conversion to JSON and back will actually lose information in some situations. For instance, any elements or attributes in a namespace will simply be discarded, and any text content will be discarded if an element also has children or attributes. Sometimes, this won't matter, but if you get in the habit of converting everything to arrays, it's going to sting you eventually.

Of course, you could write a smarter conversion, which didn't have these limitations, but at that point, you are getting no value out of SimpleXML at all, and should just use the lower level XML Parser functions, or the XMLReader class, to create your structure. You still won't have the extra convenience functionality of SimpleXML, but that's your loss.

2 years ago

If your XML string contains booleans encoded with "0" and "1", you will run into problems when you cast the element directly to bool:

$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<values>
 <truevalue>1</truevalue>
 <falsevalue>0</falsevalue>
</values>
XML;
$values = new SimpleXMLElement($xmlstr);
$truevalue = (bool)$values->truevalue; // true
$falsevalue = (bool)$values->falsevalue; // also true!!!

Instead you need to cast to string or int first:

$truevalue = (bool)(int)$values->truevalue; // true
$falsevalue = (bool)(int)$values->falsevalue; // false

9 years ago

If you need to output valid XML in your response, don't forget to set your header content type to XML in addition to echoing out the result of asXML():

<?php
$xml = simplexml_load_file('...');
...
... xml stuff
...

// Output XML in your response:
header('Content-Type: text/xml');
echo $xml->asXML();
?>

9 years ago

From the README file:

SimpleXML is meant to be an easy way to access XML data.

SimpleXML objects follow four basic rules:

1) Properties denote element iterators
2) Numeric indices denote elements
3) Non-numeric indices denote attributes
4) String conversion allows access to text data

When iterating properties, the extension always iterates over
all nodes with that element name. Thus the method children() must be
called to iterate over subnodes. But also doing the following:
foreach ($obj->node_name as $elem) {
  // do something with $elem
}
always results in iteration of 'node_name' elements. So no further
check is needed to distinguish the number of nodes of that type.

When an element's text data is being accessed through a property,
the result does not include the text data of subelements.

Known Issues
============

Due to engine problems it is currently not possible to access
a subelement by index 0: $object->property[0].

8 years ago

Using something like is_object($xml->module->admin) to check whether there actually is a node called "admin" doesn't seem to work as expected, since SimpleXML always returns an object - in that case an empty one - even if a particular node does not exist.
For me the good old empty() function seems to work just fine in such cases.

8 years ago

A quick tip on XPath queries and default namespaces. It looks like the XML system behind SimpleXML works the same way as I believe the XML system in .NET does: when you need to address something in the default namespace, you have to declare the namespace using registerXPathNamespace and then use its prefix to address the element that otherwise lives in the default namespace.

<?php
$string = <<<XML
<?xml version='1.0'?>
<document xmlns="http://www.w3.org/2005/Atom">
 <title>Forty What?</title>
 <from>Joe</from>
 <to>Jane</to>
 <body>
  I know that's the answer -- but what's the question?
 </body>
</document>
XML;

$xml = simplexml_load_string($string);
$xml->registerXPathNamespace("def", "http://www.w3.org/2005/Atom");

$nodes = $xml->xpath("//def:document/def:title");
?>

9 years ago

While SimpleXMLElement claims to be iterable, it does not seem to implement the standard Iterator interface functions like ::next and ::reset properly. Therefore, while foreach() works, functions like next(), current(), or each() don't seem to work as you would expect: the pointer never seems to move, or keeps getting reset.

6 years ago

If the encoding of the XML document differs from UTF-8, the encoding declaration must come immediately after version="..." and before standalone="...". This is a requirement of the XML standard.

OK.

Russian language. Russian Language.
Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in ...

Parsing XML essentially means walking through an XML document and returning the relevant data. Although a growing number of web services return data in JSON format, most still use XML, so it is important to master XML parsing if you want to use the full range of available APIs.

With the SimpleXML extension, which was added back in PHP 5.0, working with XML in PHP is easy. In this article, I will show you how to do it.

Basic usage

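Let's start with the following example, languages.xml:

<?xml version="1.0" encoding="utf-8"?>
<languages>
 <lang name="C">
  <appeared>1972</appeared>
  <creator>Dennis Ritchie</creator>
 </lang>
 <lang name="PHP">
  <appeared>1995</appeared>
  <creator>Rasmus Lerdorf</creator>
 </lang>
 <lang name="Java">
  <appeared>1995</appeared>
  <creator>James Gosling</creator>
 </lang>
</languages>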

This XML document contains a list of programming languages with some information about each language: the year it appeared and the name of its creator.

The first step is to load the XML using either simplexml_load_file() or simplexml_load_string(). As the function names suggest, the first loads XML from a file and the second loads XML from a string.

$languages = simplexml_load_file("languages.xml");

Both functions read the entire DOM tree into memory and return a SimpleXMLElement object. In the example above, the object is stored in the $languages variable. You can use var_dump() or print_r() to get detailed information about the returned object, if you want.

SimpleXMLElement Object
(
    [lang] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => C
                        )
                    [appeared] => 1972
                    [creator] => Dennis Ritchie
                )
            [1] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => PHP
                        )
                    [appeared] => 1995
                    [creator] => Rasmus Lerdorf
                )
            [2] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => Java
                        )
                    [appeared] => 1995
                    [creator] => James Gosling
                )
        )
)

This XML contains a root languages element, inside which there are three lang elements. Each item of the array corresponds to a lang element in the XML document.

You can access the properties of the object using the -> operator. For example, $languages->lang returns a SimpleXMLElement object that corresponds to the first lang element. This object contains two properties: appeared and creator.

$languages->lang[0]->appeared;
$languages->lang[0]->creator;

Listing the languages and showing their properties is easy with a standard loop such as foreach.

foreach ($languages->lang as $lang) {
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $lang->creator
    );
}

Note how I accessed the name attribute of the lang element to get the name of the language. In this way, you can access any attribute of an element represented as a SimpleXMLElement object.

Working with namespaces

When working with XML from various web services, you will often come across element namespaces. Let's change our languages.xml to show an example of using a namespace:

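<?xml version="1.0" encoding="utf-8"?>
<languages
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="C">
  <appeared>1972</appeared>
  <dc:creator>Dennis Ritchie</dc:creator>
 </lang>
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
 <lang name="Java">
  <appeared>1995</appeared>
  <dc:creator>James Gosling</dc:creator>
 </lang>
</languages>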

Now the creator element is placed in the dc namespace, which points to http://purl.org/dc/elements/1.1/. If you try to print the creators of the languages using our previous code, it will not work. To read namespaced elements, you need to use one of the following approaches.

The first approach is to use the namespace URI directly in the code when accessing the elements. The following example shows how this is done:

$dc = $languages->lang[1]->children("http://purl.org/dc/elements/1.1/");
echo $dc->creator;

The children() method takes a namespace and returns the child elements that belong to it. It accepts two arguments: the first is the XML namespace, and the second is an optional Boolean that defaults to false. If the second argument is true, the namespace is treated as a prefix. If it is false, the namespace is treated as a namespace URL.

The second approach is to read the namespace URIs from the document and use them when accessing namespaced elements. This is actually the better way to access the elements, because you do not have to hard-code the URI.

$namespaces = $languages->getNamespaces(true);
$dc = $languages->lang[1]->children($namespaces["dc"]);

echo $dc->creator;

The getNamespaces() method returns an array of namespace prefixes with their associated URIs. It accepts an optional parameter that defaults to false. If you set it to true, the method returns the namespaces used in the parent and child nodes. Otherwise, it finds only the namespaces used in the parent node.

Now you can iterate over the list of languages as follows:

$languages = simplexml_load_file("languages.xml");
$ns = $languages->getNamespaces(true);

foreach ($languages->lang as $lang) {
    $dc = $lang->children($ns["dc"]);
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $dc->creator
    );
}

Practical example: parsing a YouTube video feed

Let's look at an example that fetches the RSS feed of a YouTube channel and displays links to all of its videos. For this, you need to request the following address:

http://gdata.youtube.com/feeds/api/users/xxx/uploads

The URL returns a list of the latest videos from the given channel in XML format. We will parse the XML and get the following information for each video:

Link to the video
Thumbnail
Title

We start by fetching and loading the XML:

$channel = "Channel Name";
$url = "http://gdata.youtube.com/feeds/api/users/" . $channel . "/uploads";
$xml = file_get_contents($url);

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

If you look at the XML feed, you can see that it contains several entry elements, each of which stores the details of a particular video from the channel. We only use the thumbnail image, the video URL, and the title. These three elements are children of the group element, which in turn is a child of entry:

<entry>
 …
 <media:group>
  …
  <media:player url="…" />
  <media:thumbnail url="…" />
  <media:title>Title…</media:title>
  …
 </media:group>
 …
</entry>
We simply loop through all the entry elements and extract the necessary information from each of them. Note that player, thumbnail, and title are in the media namespace, so we must proceed as in the previous example: we read the namespaces from the document and use the namespace when accessing the elements.

foreach ($feed->entry as $entry) {
    $group = $entry->children($ns["media"]);
    $group = $group->group;
    $thumbnail_attrs = $group->thumbnail[1]->attributes();
    $image = $thumbnail_attrs["url"];
    $player = $group->player->attributes();
    $link = $player["url"];
    $title = $group->title;
    printf('<p><a href="%s"><img src="%s" alt="%s"></a></p>',
        $link, $image, $title);
}

Conclusion

Now that you know how to use SimpleXML to parse XML data, you can improve your skills by parsing XML feeds from different APIs. Keep in mind, however, that SimpleXML reads the entire DOM into memory, so if you parse a large data set you may run out of memory. To learn more about SimpleXML, read the documentation.

Now we will look at working with XML. XML is a format for exchanging data between sites. It is very similar to HTML, except that in XML you define your own tags and attributes.

Why do you need XML when parsing? Sometimes the site you need to parse has an API through which you can get what you want without much effort. So here is a piece of advice up front: before parsing a site, check whether it has an API.

What is an API? It is a set of functions with which you can send a request to the site and get the answer you need. This answer most often comes in XML format, so let's start studying it.

Working with XML in PHP

Suppose you have some XML. It can be held in a string, stored in a file, or returned in response to a request to a specific URL.

Suppose the XML is held in a string. In this case, you create an object from this string with new SimpleXMLElement:

$str = "<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>";
$xml = new SimpleXMLElement($str);

The variable $xml now holds an object with the parsed XML. By reading the properties of this object, you can access the contents of the XML tags. We will look at exactly how just below.

If the XML is stored in a file or is returned by a URL (which is the most common case), use the function simplexml_load_file, which produces the same $xml object:

<worker>
    <name>Kolya</name>
    <age>25</age>
    <salary>1000</salary>
</worker>

$xml = simplexml_load_file($path); // $path: path to the file or its URL
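Both simplexml_load_file and simplexml_load_string return false when the XML cannot be parsed, so it is worth checking the result before using it. The sketch below uses simplexml_load_string with a deliberately broken string so that it runs without any file; the broken sample is an assumption made for illustration.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Deliberately broken XML: the <age> tag is never closed.
$broken = '<worker><name>Kolya</name><age>25</worker>';

$xml = simplexml_load_string($broken);

$messages = [];
if ($xml === false) {
    // Gather the parser's error messages for logging or display.
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
}

print_r($messages);
```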

Working techniques

In the examples below, our XML is stored in a file or at a URL.

Suppose we have the following XML:

<worker>
    <name>Kolya</name>
    <age>25</age>
    <salary>1000</salary>
</worker>

Let's get the employee's name, age, and salary:

$xml = simplexml_load_file($path); // $path: path to the file or its URL
echo $xml->name; // prints "Kolya"
echo $xml->age; // prints 25
echo $xml->salary; // prints 1000

As you can see, the $xml object has properties corresponding to the tags.

You may notice that the <worker> tag never appears in these accesses. That is because it is the root tag. You can rename it, for example to <root>, and nothing will change:

<root>
    <name>Kolya</name>
    <age>25</age>
    <salary>1000</salary>
</root>

$xml = simplexml_load_file($path); // $path: path to the file or its URL
echo $xml->name; // prints "Kolya"
echo $xml->age; // prints 25
echo $xml->salary; // prints 1000

An XML document can have only one root tag, just as an ordinary HTML document has a single <html> tag.

Let's slightly modify our XML:

<root>
    <worker>
        <name>Kolya</name>
        <age>25</age>
        <salary>1000</salary>
    </worker>
</root>

In this case, we get a chain of property accesses:

$xml = simplexml_load_file($path); // $path: path to the file or its URL
echo $xml->worker->name; // prints "Kolya"
echo $xml->worker->age; // prints 25
echo $xml->worker->salary; // prints 1000

Working with attributes

Suppose some of the data is stored in attributes:

<root>
    <worker name="Kolya" age="25" salary="1000">Number 1</worker>
</root>

$xml = simplexml_load_file($path); // $path: path to the file or its URL
echo $xml->worker["name"]; // prints "Kolya"
echo $xml->worker["age"]; // prints 25
echo $xml->worker["salary"]; // prints 1000
echo $xml->worker; // prints "Number 1"
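The values read this way are SimpleXMLElement objects rather than plain strings; echo converts them implicitly, but comparisons and arithmetic are safer after an explicit cast. A minimal sketch, assuming the attribute-based sample above is held in a string:

```php
<?php
$xml = new SimpleXMLElement(
    '<root><worker name="Kolya" age="25" salary="1000">Number 1</worker></root>'
);

// Cast each value to the scalar type you actually need.
$name   = (string) $xml->worker['name'];
$age    = (int) $xml->worker['age'];
$salary = (int) $xml->worker['salary'];
$text   = (string) $xml->worker; // the text content of the tag

var_dump($name, $age, $salary, $text);
```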

Tags with hyphens

In XML, tags (and attributes) with a hyphen are allowed. Such tags are accessed like this:

<root>
    <worker>
        <first-name>Kolya</first-name>
        <last-name>Ivanov</last-name>
    </worker>
</root>

$xml = simplexml_load_file($path); // $path: path to the file or its URL
echo $xml->worker->{'first-name'}; // prints "Kolya"
echo $xml->worker->{'last-name'}; // prints "Ivanov"
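The braces are needed because PHP would otherwise read the hyphen in a property name as a minus sign. Hyphenated attributes need no special syntax, since they are already addressed by a string key. A short sketch, where the birth-date attribute is a hypothetical addition for illustration:

```php
<?php
$xml = new SimpleXMLElement(
    '<root><worker birth-date="1996-05-01">'
    . '<first-name>Kolya</first-name><last-name>Ivanov</last-name>'
    . '</worker></root>'
);

// Hyphenated tag names go inside braces and quotes.
$first = (string) $xml->worker->{'first-name'};
$last  = (string) $xml->worker->{'last-name'};

// Hyphenated attribute names work with the usual array syntax.
$birth = (string) $xml->worker['birth-date'];

echo $first . ' ' . $last . ', born ' . $birth . "\n";
```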

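Looping with foreach

Suppose we now have not one employee but several. In this case, we can iterate over our object with a foreach loop:

<root>
    <worker>
        <name>Kolya</name>
        <age>25</age>
        <salary>1000</salary>
    </worker>
    <worker>
        <name>Vasya</name>
        <age>26</age>
        <salary>2000</salary>
    </worker>
    <worker>
        <name>Petya</name>
        <age>27</age>
        <salary>3000</salary>
    </worker>
</root>

$xml = simplexml_load_file($path); // $path: path to the file or its URL
foreach ($xml as $worker) {
    echo $worker->name; // prints "Kolya", "Vasya", "Petya"
}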
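The loop can also collect data rather than just print it. The following sketch assumes the three-employee sample is held in a string and iterates over $xml->worker explicitly, which limits the loop to worker tags even if the root contains other children:

```php
<?php
$xml = new SimpleXMLElement(
    '<root>'
    . '<worker><name>Kolya</name><age>25</age><salary>1000</salary></worker>'
    . '<worker><name>Vasya</name><age>26</age><salary>2000</salary></worker>'
    . '<worker><name>Petya</name><age>27</age><salary>3000</salary></worker>'
    . '</root>'
);

$names = [];
$total = 0;
foreach ($xml->worker as $worker) {
    $names[] = (string) $worker->name;   // cast to get a plain string
    $total  += (int) $worker->salary;    // cast to get a number
}

echo implode(', ', $names) . "\n";
echo 'Total salary: ' . $total . "\n";
echo 'Employees: ' . count($xml->worker) . "\n";
```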

From an object to a normal array

If you find it inconvenient to work with the object, you can convert it to a normal PHP array with the following trick:

$xml = simplexml_load_file($path); // $path: path to the file or its URL
var_dump(json_decode(json_encode($xml), true));
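To see what the resulting array looks like, here is a small sketch that assumes a two-employee sample held in a string. All values come out as strings, and repeated tags are grouped into a numerically indexed list under the tag name.

```php
<?php
$xml = new SimpleXMLElement(
    '<root>'
    . '<worker><name>Kolya</name><age>25</age></worker>'
    . '<worker><name>Vasya</name><age>26</age></worker>'
    . '</root>'
);

// Encode the object as JSON, then decode it into nested associative arrays.
$data = json_decode(json_encode($xml), true);

print_r($data);
```

Note that the shape depends on the data: with a single worker tag, $data['worker'] would be an associative array rather than a list, so code that consumes the array has to handle both cases.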

More information

Parsing based on sitemap.xml

Sites often have a sitemap.xml file. This file stores links to all of the site's pages to make it easier for search engines to index them (indexing is essentially Yandex and Google parsing the site).

In general, we don't need to worry about why this file exists. What matters is that if it is there, you don't have to crawl the site's pages with any clever methods and can simply use this file.

How to check whether the file exists: suppose we are parsing the site site.ru. Open site.ru/sitemap.xml in the browser. If you see something, the file is there; if not, then alas.

If the sitemap exists, it contains links to all of the site's pages in XML format. Simply fetch this XML, parse it, and pick out the links to the pages you need in any convenient way (for example, by analyzing the URL, as described for the spider method).

As a result, you get a list of links to parse; all that remains is to visit them and extract the content you need.

Read more about the structure of sitemap.xml on Wikipedia.
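The steps above can be sketched with SimpleXML as follows. This is a minimal example under stated assumptions: the sitemap content is a hand-written stand-in rather than a downloaded file, and the /articles/ path prefix used for filtering is hypothetical.

```php
<?php
// Stand-in for a downloaded sitemap.xml (hypothetical pages on site.ru).
$sitemapXml = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://site.ru/</loc></url>
  <url><loc>http://site.ru/articles/first</loc></url>
  <url><loc>http://site.ru/articles/second</loc></url>
  <url><loc>http://site.ru/contacts</loc></url>
</urlset>
XML;

$sitemap = simplexml_load_string($sitemapXml);

$links = [];
foreach ($sitemap->url as $url) {
    $loc  = (string) $url->loc;
    $path = (string) parse_url($loc, PHP_URL_PATH);
    // Keep only pages whose path starts with the (assumed) section prefix.
    if (strpos($path, '/articles/') === 0) {
        $links[] = $loc;
    }
}

print_r($links);
```

For a real site, the string would be replaced by loading the sitemap's URL with simplexml_load_file.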