The DomCrawler Component
========================

- DomCrawler Component eases DOM navigation for HTML and XML documents.
+ The DomCrawler Component eases DOM navigation for HTML and XML documents.

Installation
------------
The :class:`Symfony\\Component\\DomCrawler\\Crawler` class provides methods
to query and manipulate HTML and XML documents.

- Instance of the Crawler represents a set (:phpclass:`SplObjectStorage`)
- of :phpclass:`DOMElement` objects:
-
- .. code-block:: php
+ An instance of the Crawler represents a set (:phpclass:`SplObjectStorage`)
+ of :phpclass:`DOMElement` objects, which are basically nodes that you can
+ traverse easily::

    use Symfony\Component\DomCrawler\Crawler;

@@ -43,16 +42,14 @@ of :phpclass:`DOMElement` objects:
        print $domElement->nodeName;
    }

- More specialized :class:`Symfony\\Component\\DomCrawler\\Link` and
+ Specialized :class:`Symfony\\Component\\DomCrawler\\Link` and
:class:`Symfony\\Component\\DomCrawler\\Form` classes are useful for
- interacting with html links and forms.
+ interacting with HTML links and forms as you traverse through the HTML tree.

Node Filtering
~~~~~~~~~~~~~~

- Using XPath expressions is really simplified:
-
- .. code-block:: php
+ Using XPath expressions is really easy::

    $crawler = $crawler->filterXPath('descendant-or-self::body/p');

@@ -61,15 +58,12 @@ Using XPath expressions is really simplified:
:phpmethod:`DOMXPath::query` is used internally to actually perform
an XPath query.

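To make the role of ``DOMXPath::query`` more concrete, here is a minimal sketch that runs the same XPath expression as the ``filterXPath()`` call shown earlier, using only PHP's bundled DOM extension. The sample HTML and the variable names are assumptions made for this illustration, and the sketch does not reproduce what the Crawler does beyond the query itself.

```php
<?php
// Sample markup invented for this sketch.
$html = '<html><body><p class="message">Hello</p><p>World</p><div><p>Nested</p></div></body></html>';

// Load the markup with the native DOM extension (no Symfony classes involved).
$document = new DOMDocument();
$document->loadHTML($html);

// Evaluate the same expression that was passed to filterXPath() earlier.
$xpath = new DOMXPath($document);
$nodes = $xpath->query('descendant-or-self::body/p');

// Collect the text of each matched node.
$texts = array();
foreach ($nodes as $node) {
    $texts[] = $node->nodeValue;
}
// Only the <p> elements that are direct children of <body> are matched,
// so the <p> nested inside the <div> is not part of $texts.
```

The expression selects ``p`` children of ``body`` only, which is why the nested paragraph is left out.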
- Filtering is even easier if you have CssSelector Component installed:
-
- .. code-block:: php
+ Filtering is even easier if you have the ``CssSelector`` Component installed.
+ This allows you to use jQuery-like selectors to traverse::

    $crawler = $crawler->filter('body > p');

- Anonymous function can be used to filter with more complex criteria:
-
- .. code-block:: php
+ An anonymous function can be used to filter with more complex criteria::

    $crawler = $crawler->filter('body > p')->reduce(function ($node, $i) {
        // filter even nodes
@@ -86,35 +80,25 @@ To remove a node the anonymous function must return false.
Node Traversing
~~~~~~~~~~~~~~~

- Access node by its position on the list:
-
- .. code-block:: php
+ Access a node by its position in the list::

    $crawler->filter('body > p')->eq(0);

- Get the first or last node of the current selection:
-
- .. code-block:: php
+ Get the first or last node of the current selection::

    $crawler->filter('body > p')->first();
    $crawler->filter('body > p')->last();

- Get the nodes of the same level as the current selection:
-
- .. code-block:: php
+ Get the nodes of the same level as the current selection::

    $crawler->filter('body > p')->siblings();

- Get the same level nodes after or before the current selection:
-
- .. code-block:: php
+ Get the same-level nodes after or before the current selection::

    $crawler->filter('body > p')->nextAll();
    $crawler->filter('body > p')->previousAll();

- Get all the child or parent nodes:
-
- .. code-block:: php
+ Get all the child or parent nodes::

    $crawler->filter('body')->children();
    $crawler->filter('body > p')->parents();
@@ -127,43 +111,35 @@ Get all the child or parent nodes:
Accessing Node Values
~~~~~~~~~~~~~~~~~~~~~

- Access the value of the first node of the current selection:
-
- .. code-block:: php
+ Access the value of the first node of the current selection::

    $message = $crawler->filterXPath('//body/p')->text();

- Access the attribute value of the first node of the current selection:
-
- .. code-block:: php
+ Access the attribute value of the first node of the current selection::

    $class = $crawler->filterXPath('//body/p')->attr('class');

- Extract attribute and/or node values from the list of nodes:
-
- .. code-block:: php
+ Extract attribute and/or node values from the list of nodes::

    $attributes = $crawler->filterXPath('//body/p')->extract(array('_text', 'class'));

- .. note:: Special attribute ``_text`` represents a node value.
+ .. note::

- Call an anonymous function on each node of the list:
+     Special attribute ``_text`` represents a node value.

- .. code-block:: php
+ Call an anonymous function on each node of the list::

    $nodeValues = $crawler->filter('p')->each(function ($node, $i) {
        return $node->nodeValue;
    });

The anonymous function receives the node and its position as arguments.
- Result is an array of values returned by anonymous function calls.
+ The result is an array of values returned by the anonymous function calls.

Adding the Content
~~~~~~~~~~~~~~~~~~

- Crawler supports multiple ways of adding the content:
-
- .. code-block:: php
+ The crawler supports multiple ways of adding the content::

    $crawler = new Crawler('<html><body /></html>');

@@ -176,7 +152,7 @@ Crawler supports multiple ways of adding the content:
    $crawler->add('<html><body /></html>');
    $crawler->add('<root><node /></root>');

- As Crawler's implementation is based on the DOM extension it is also able
+ As the Crawler's implementation is based on the DOM extension, it is also able
to interact with native :phpclass:`DOMDocument`, :phpclass:`DOMNodeList`
and :phpclass:`DOMNode` objects:

@@ -196,10 +172,132 @@ and :phpclass:`DOMNode` objects:
Form and Link Support
~~~~~~~~~~~~~~~~~~~~~

- todo:
+ Special treatment is given to links and forms inside the DOM tree.
+
+ Links
+ .....
+
+ To find a link by name (or a clickable image by its ``alt`` attribute), use
+ the ``selectLink`` method on an existing crawler. This returns a Crawler
+ instance with just the selected link(s). Calling ``link()`` gives you a special
+ :class:`Symfony\\Component\\DomCrawler\\Link` object::
+
+     $linksCrawler = $crawler->selectLink('Go elsewhere...');
+     $link = $linksCrawler->link();
+
+     // or do this all at once
+     $link = $crawler->selectLink('Go elsewhere...')->link();
+
+ The :class:`Symfony\\Component\\DomCrawler\\Link` object has several useful
+ methods to get more information about the selected link itself::
+
+     // return the raw href value
+     $href = $link->getRawUri();
+
+     // return the proper URI that can be used to make another request
+     $uri = $link->getUri();
+
+ The ``getUri()`` method is especially useful as it cleans the ``href`` value
+ and transforms it into how it should really be processed. For example, for a
+ link with ``href="#foo"``, this would return the full URI of the current
+ page suffixed with ``#foo``. The value returned by ``getUri()`` is always a
+ full URI that you can act on.
+
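As a rough illustration of the ``#foo`` case only, the sketch below appends a fragment-only ``href`` to the URI of the current page. ``sketchResolveFragment()`` is a hypothetical helper written for this example and is not part of DomCrawler. The real ``getUri()`` also deals with other kinds of ``href`` values, which this sketch does not attempt.

```php
<?php
// Hypothetical helper for illustration only; not a DomCrawler method.
function sketchResolveFragment(string $currentUri, string $href): string
{
    // Drop any fragment already present on the current page URI.
    $pos = strpos($currentUri, '#');
    $base = false === $pos ? $currentUri : substr($currentUri, 0, $pos);

    // Append the fragment-only href (for example "#foo").
    return $base . $href;
}

// Example URI invented for this sketch.
$uri = sketchResolveFragment('http://example.com/page?x=1#old', '#foo');
```

Here ``$uri`` keeps the path and query string of the current page and only swaps the fragment.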
+ Forms
+ .....
+
+ Special treatment is also given to forms. A ``selectButton()`` method is
+ available on the Crawler which returns another Crawler that matches a button
+ (``input[type=submit]``, ``input[type=image]``, or a ``button``) with the
+ given text. This method is especially useful because you can use it to return
+ a :class:`Symfony\\Component\\DomCrawler\\Form` object that represents the
+ form that the button lives in::
+
+     $form = $crawler->selectButton('validate')->form();
+
+     // or "fill" the form fields with data
+     $form = $crawler->selectButton('validate')->form(array(
+         'name' => 'Ryan',
+     ));
+
+ The :class:`Symfony\\Component\\DomCrawler\\Form` object has lots of very
+ useful methods for working with forms::
+
+     $uri = $form->getUri();
+
+     $method = $form->getMethod();
+
+ The :method:`Symfony\\Component\\DomCrawler\\Form::getUri` method does more
+ than just return the ``action`` attribute of the form. If the form method
+ is GET, then it mimics the browser's behavior and returns the ``action``
+ attribute followed by a query string of all of the form's values.
+
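The GET behavior described above can be approximated with PHP's built-in ``http_build_query()``. This is only a sketch of the idea: ``sketchGetFormUri()`` is a hypothetical helper, the URL and field names are invented, and the handling of an ``action`` that already contains a query string is an assumption rather than a statement about how ``Form::getUri()`` is implemented.

```php
<?php
// Hypothetical helper for illustration only; not part of the DomCrawler API.
function sketchGetFormUri(string $action, array $values): string
{
    // With no values there is nothing to append.
    if (array() === $values) {
        return $action;
    }

    // Assumption: join with "&" when the action already has a query string.
    $separator = false === strpos($action, '?') ? '?' : '&';

    return $action . $separator . http_build_query($values, '', '&');
}

// Example action and values invented for this sketch.
$uri = sketchGetFormUri('http://example.com/search', array('q' => 'symfony', 'page' => 2));
```

For a form submitted with POST, the values would travel in the request body instead, so the URI would stay equal to the ``action``.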
+ You can virtually set and get values on the form::
+
+     // set values on the form internally
+     $form->setValues(array(
+         'registration[username]' => 'symfonyfan',
+         'registration[terms]' => 1,
+     ));
+
+     // get back an array of values - in the "flat" array like above
+     $values = $form->getValues();
+
+     // returns the values like PHP would see them, where "registration" is its own array
+     $values = $form->getPhpValues();
+
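The difference between the "flat" array and the values as PHP would see them can be shown without the component at all. The sketch below round-trips the flat field names through a query string with ``http_build_query()`` and ``parse_str()``, so that PHP itself expands the bracket notation. It illustrates the two shapes only and makes no claim about how ``getPhpValues()`` is implemented.

```php
<?php
// Flat field names, in the shape the text describes for getValues().
$flat = array(
    'registration[username]' => 'symfonyfan',
    'registration[terms]' => 1,
);

// Round-trip through a query string so PHP expands the bracketed names.
parse_str(http_build_query($flat, '', '&'), $nested);

// $nested now has a single "registration" key holding an array with the
// "username" and "terms" entries; scalar values come back as strings.
```

Note that the integer ``1`` comes back as the string ``'1'``, which matches how PHP receives submitted form data.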
+ This is great, but it gets better! The ``Form`` object allows you to interact
+ with your form like a browser, selecting radio values, ticking checkboxes,
+ and uploading files::
+
+     $form['registration[username]']->setValue('symfonyfan');
+
+     // check or uncheck a checkbox
+     $form['registration[terms]']->tick();
+     $form['registration[terms]']->untick();
+
+     // select an option
+     $form['registration[birthday][year]']->select(1984);
+
+     // select many options from a "multiple" select or checkboxes
+     $form['registration[interests]']->select(array('symfony', 'cookies'));
+
+     // even fake a file upload
+     $form['registration[photo]']->upload('/path/to/lucas.jpg');
+
+ What's the point of doing all of this? If you're testing internally, you
+ can grab the information from your form as if it had just been submitted
+ by using the PHP values::
+
+     $values = $form->getPhpValues();
+     $files = $form->getPhpFiles();
+
+ If you're using an external HTTP client, you can use the form to grab all
+ of the information you need to create a POST request for the form::
+
+     $uri = $form->getUri();
+     $method = $form->getMethod();
+     $values = $form->getValues();
+     $files = $form->getFiles();
+
+     // now use some HTTP client and post using this information
+
+ One great example of an integrated system that uses all of this is `Goutte`_.
+ Goutte understands the Symfony Crawler object and can use it to submit forms
+ directly::
+
+     use Goutte\Client;
+
+     // make a real request to an external site
+     $client = new Client();
+     $crawler = $client->request('GET', 'https://github.com/login');
+
+     // select the form and fill in some values
+     $form = $crawler->selectButton('Log in')->form();
+     $form['login'] = 'symfonyfan';
+     $form['password'] = 'anypass';
+
+     // submit that form
+     $crawler = $client->submit($form);

- * selectLink()
- * selectButton()
- * link()
- * links()
- * form()
+ .. _`Goutte`: https://github.com/fabpot/goutte