Commit de2ccc4

[components] Finishing up the DomCrawler bootstrap and proofreading
1 parent 8a8d24a commit de2ccc4

1 file changed (+152, -54 lines changed)

components/dom_crawler.rst

@@ -4,7 +4,7 @@
 The DomCrawler Component
 ========================
 
-DomCrawler Component eases DOM navigation for HTML and XML documents.
+The DomCrawler Component eases DOM navigation for HTML and XML documents.
 
 Installation
 ------------
@@ -21,10 +21,9 @@ Usage
 The :class:`Symfony\\Component\\DomCrawler\\Crawler` class provides methods
 to query and manipulate HTML and XML documents.
 
-Instance of the Crawler represents a set (:phpclass:`SplObjectStorage`)
-of :phpclass:`DOMElement` objects:
-
-.. code-block:: php
+An instance of the Crawler represents a set (:phpclass:`SplObjectStorage`)
+of :phpclass:`DOMElement` objects, which are basically nodes that you can
+traverse easily::
 
     use Symfony\Component\DomCrawler\Crawler;
 
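A Crawler instance is described as a set (`SplObjectStorage`) of `DOMElement` objects. The following minimal sketch shows that idea using only PHP's bundled DOM and SPL extensions. It does not use DomCrawler and is not its internal code. It illustrates why a set fits: storing the same node twice keeps a single entry. The HTML snippet is made up for the example.

```php
<?php
// Sketch only: native DOM + SPL, no DomCrawler involved.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hello</p><p>World</p></body></html>');

// Collect every <p> element into an SplObjectStorage, which behaves as a set.
$set = new SplObjectStorage();
foreach ($dom->getElementsByTagName('p') as $domElement) {
    $set->attach($domElement);
}

// Attaching a node that is already stored does not create a duplicate entry.
$set->attach($dom->getElementsByTagName('p')->item(0));

foreach ($set as $domElement) {
    print $domElement->nodeName . ': ' . $domElement->nodeValue . "\n";
}
```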
@@ -43,16 +42,14 @@ of :phpclass:`DOMElement` objects:
         print $domElement->nodeName;
     }
 
-More specialized :class:`Symfony\\Component\\DomCrawler\\Link` and
+Specialized :class:`Symfony\\Component\\DomCrawler\\Link` and
 :class:`Symfony\\Component\\DomCrawler\\Form` classes are useful for
-interacting with html links and forms.
+interacting with HTML links and forms as you traverse through the HTML tree.
 
 Node Filtering
 ~~~~~~~~~~~~~~
 
-Using XPath expressions is really simplified:
-
-.. code-block:: php
+Using XPath expressions is really easy::
 
     $crawler = $crawler->filterXPath('descendant-or-self::body/p');
 
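The documentation states that `DOMXPath::query` is used internally to perform XPath queries. The expression passed to `filterXPath()` can therefore also be evaluated directly with PHP's native `DOMXPath` class. This is a rough sketch that assumes only the bundled DOM extension and a made-up HTML snippet. With this expression, only `<p>` elements that are direct children of `<body>` match, so the nested paragraph is skipped.

```php
<?php
// Sketch: evaluating the same XPath expression with the native DOM API.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p class="intro">First</p><div><p>Nested</p></div><p>Last</p></body></html>');

$xpath = new DOMXPath($dom);
// Only <p> elements that are direct children of <body> are selected.
$nodes = $xpath->query('descendant-or-self::body/p');

$texts = array();
foreach ($nodes as $node) {
    $texts[] = $node->nodeValue;
}

print implode(', ', $texts) . "\n";
```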
@@ -61,15 +58,12 @@ Using XPath expressions is really simplified:
 :phpmethod:`DOMXPath::query` is used internally to actually perform
 an XPath query.
 
-Filtering is even easier if you have CssSelector Component installed:
-
-.. code-block:: php
+Filtering is even easier if you have the ``CssSelector`` Component installed.
+This allows you to use jQuery-like selectors to traverse::
 
     $crawler = $crawler->filter('body > p');
 
-Anonymous function can be used to filter with more complex criteria:
-
-.. code-block:: php
+An anonymous function can be used to filter with more complex criteria::
 
     $crawler = $crawler->filter('body > p')->reduce(function ($node, $i) {
         // filter even nodes
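`reduce()` takes an anonymous function that receives a node and its position. According to the documentation, a node is removed when that function returns `false`. The helper `reduceNodes()` below is hypothetical and written against a plain `DOMNodeList` only to illustrate those semantics. It is not DomCrawler's implementation.

```php
<?php
// Hypothetical helper: mimics the described reduce() behavior on a native
// DOMNodeList. The callback gets the node and its position; returning false
// removes the node from the result.
function reduceNodes(DOMNodeList $nodes, callable $callback): array
{
    $kept = array();
    for ($i = 0; $i < $nodes->length; $i++) {
        $node = $nodes->item($i);
        if (false !== $callback($node, $i)) {
            $kept[] = $node;
        }
    }

    return $kept;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>a</p><p>b</p><p>c</p><p>d</p></body></html>');
$xpath = new DOMXPath($dom);

// Drop the nodes at even positions (0, 2, ...) and keep the odd ones.
$kept = reduceNodes($xpath->query('//body/p'), function ($node, $i) {
    return ($i % 2) !== 0;
});

foreach ($kept as $node) {
    print $node->nodeValue . "\n";
}
```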
@@ -86,35 +80,25 @@ To remove a node the anonymous function must return false.
 Node Traversing
 ~~~~~~~~~~~~~~~
 
-Access node by its position on the list:
-
-.. code-block:: php
+Access a node by its position in the list::
 
     $crawler->filter('body > p')->eq(0);
 
-Get the first or last node of the current selection:
-
-.. code-block:: php
+Get the first or last node of the current selection::
 
     $crawler->filter('body > p')->first();
     $crawler->filter('body > p')->last();
 
-Get the nodes of the same level as the current selection:
-
-.. code-block:: php
+Get the nodes of the same level as the current selection::
 
     $crawler->filter('body > p')->siblings();
 
-Get the same level nodes after or before the current selection:
-
-.. code-block:: php
+Get the same-level nodes after or before the current selection::
 
     $crawler->filter('body > p')->nextAll();
     $crawler->filter('body > p')->previousAll();
 
-Get all the child or parent nodes:
-
-.. code-block:: php
+Get all the child or parent nodes::
 
     $crawler->filter('body')->children();
     $crawler->filter('body > p')->parents();
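The traversal methods correspond to sibling, child and parent relationships in the DOM tree. The helpers below are hypothetical and for illustration only. They approximate `children()`, `nextAll()`, `previousAll()` and `parents()` using the native `childNodes`, `nextSibling`, `previousSibling` and `parentNode` properties, and they keep element nodes only. `siblings()` would roughly be the union of the previous and next results. The ordering DomCrawler itself uses may differ.

```php
<?php
// Hypothetical helpers (not DomCrawler's code) approximating traversal with
// the native DOM API. Text and comment nodes are skipped.
function childElements(DOMNode $node): array
{
    $elements = array();
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $elements[] = $child;
        }
    }

    return $elements;
}

function nextElements(DOMNode $node): array
{
    $elements = array();
    for ($sibling = $node->nextSibling; null !== $sibling; $sibling = $sibling->nextSibling) {
        if ($sibling instanceof DOMElement) {
            $elements[] = $sibling;
        }
    }

    return $elements;
}

function previousElements(DOMNode $node): array
{
    // Closest sibling first.
    $elements = array();
    for ($sibling = $node->previousSibling; null !== $sibling; $sibling = $sibling->previousSibling) {
        if ($sibling instanceof DOMElement) {
            $elements[] = $sibling;
        }
    }

    return $elements;
}

function parentElements(DOMNode $node): array
{
    // Walk up until the DOMDocument itself is reached.
    $elements = array();
    for ($parent = $node->parentNode; $parent instanceof DOMElement; $parent = $parent->parentNode) {
        $elements[] = $parent;
    }

    return $elements;
}

function nodeNames(array $nodes): array
{
    return array_map(function (DOMNode $node) { return $node->nodeName; }, $nodes);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><h1>Title</h1><p id="a">A</p><p id="b">B</p><ul><li>x</li></ul></body></html>');
$xpath = new DOMXPath($dom);
$body = $xpath->query('//body')->item(0);
$p = $xpath->query('//p[@id="b"]')->item(0);

print implode(' ', nodeNames(childElements($body))) . "\n";
```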
@@ -127,43 +111,35 @@ Get all the child or parent nodes:
 Accessing Node Values
 ~~~~~~~~~~~~~~~~~~~~~
 
-Access the value of the first node of the current selection:
-
-.. code-block:: php
+Access the value of the first node of the current selection::
 
     $message = $crawler->filterXPath('//body/p')->text();
 
-Access the attribute value of the first node of the current selection:
-
-.. code-block:: php
+Access the attribute value of the first node of the current selection::
 
     $class = $crawler->filterXPath('//body/p')->attr('class');
 
-Extract attribute and/or node values from the list of nodes:
-
-.. code-block:: php
+Extract attribute and/or node values from the list of nodes::
 
     $attributes = $crawler->filterXpath('//body/p')->extract(array('_text', 'class'));
 
-.. note:: Special attribute ``_text`` represents a node value.
+.. note::
 
-Call an anonymous function on each node of the list:
+    The special attribute ``_text`` represents a node value.
 
-.. code-block:: php
+Call an anonymous function on each node of the list::
 
     $nodeValues = $crawler->filter('p')->each(function ($node, $i) {
         return $node->nodeValue;
     });
 
 The anonymous function receives the position and the node as arguments.
-Result is an array of values returned by anonymous function calls.
+The result is an array of values returned by the anonymous function calls.
 
 Adding the Content
 ~~~~~~~~~~~~~~~~~~
 
-Crawler supports multiple ways of adding the content:
-
-.. code-block:: php
+The crawler supports multiple ways of adding the content::
 
     $crawler = new Crawler('<html><body /></html>');
 
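The behavior of `extract()` and `each()` can be sketched in a similar way. The `extractValues()` and `eachNode()` helpers are hypothetical and operate on a native `DOMNodeList`. `extractValues()` treats `_text` as the node value, as the note on `_text` describes, and reads any other name as an attribute. This is an approximation for illustration only. The exact return shape of DomCrawler's own `extract()` is not claimed here.

```php
<?php
// Hypothetical helpers approximating extract() and each() with native DOM.
// In extractValues(), "_text" stands for the node value; other names are attributes.
function extractValues(DOMNodeList $nodes, array $attributes): array
{
    $data = array();
    foreach ($nodes as $node) {
        $row = array();
        foreach ($attributes as $attribute) {
            $row[] = '_text' === $attribute ? $node->nodeValue : $node->getAttribute($attribute);
        }
        $data[] = $row;
    }

    return $data;
}

// Calls the callback with each node and its position and collects the results.
function eachNode(DOMNodeList $nodes, callable $callback): array
{
    $results = array();
    for ($i = 0; $i < $nodes->length; $i++) {
        $results[] = $callback($nodes->item($i), $i);
    }

    return $results;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p class="lead">Hello</p><p>World</p></body></html>');
$xpath = new DOMXPath($dom);

// A missing attribute comes back as an empty string from getAttribute().
$attributes = extractValues($xpath->query('//body/p'), array('_text', 'class'));

$nodeValues = eachNode($xpath->query('//body/p'), function ($node, $i) {
    return $node->nodeValue;
});
```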
@@ -176,7 +152,7 @@ Crawler supports multiple ways of adding the content:
     $crawler->add('<html><body /></html>');
     $crawler->add('<root><node /></root>');
 
-As Crawler's implementation is based on the DOM extension it is also able
+As the Crawler's implementation is based on the DOM extension, it is also able
 to interact with native :phpclass:`DOMDocument`, :phpclass:`DOMNodeList`
 and :phpclass:`DOMNode` objects:
 
@@ -196,10 +172,132 @@ and :phpclass:`DOMNode` objects:
 Form and Link support
 ~~~~~~~~~~~~~~~~~~~~~
 
-todo:
+Special treatment is given to links and forms inside the DOM tree.
+
+Links
+.....
+
+To find a link by name (or a clickable image by its ``alt`` attribute), use
+the ``selectLink()`` method on an existing crawler. This returns a Crawler
+instance with just the selected link(s). Calling ``link()`` on it returns a special
+:class:`Symfony\\Component\\DomCrawler\\Link` object::
+
+    $linksCrawler = $crawler->selectLink('Go elsewhere...');
+    $link = $linksCrawler->link();
+
+    // or do this all at once
+    $link = $crawler->selectLink('Go elsewhere...')->link();
+
+The :class:`Symfony\\Component\\DomCrawler\\Link` object has several useful
+methods to get more information about the selected link itself::
+
+    // return the raw href value
+    $href = $link->getRawUri();
+
+    // return the proper URI that can be used to make another request
+    $uri = $link->getUri();
+
+The ``getUri()`` method is especially useful as it cleans the ``href`` value and
+resolves it into the URI that should actually be requested. For example, for a
+link with ``href="#foo"``, this would return the full URI of the current
+page suffixed with ``#foo``. The return from ``getUri()`` is always a full
+URI that you can act on.
+
+Forms
+.....
+
+Special treatment is also given to forms. A ``selectButton()`` method is
+available on the Crawler which returns another Crawler that matches a button
+(``input[type=submit]``, ``input[type=image]``, or a ``button``) with the
+given text. This method is especially useful because you can use it to return
+a :class:`Symfony\\Component\\DomCrawler\\Form` object that represents the
+form that the button lives in::
+
+    $form = $crawler->selectButton('validate')->form();
+
+    // or "fill" the form fields with data
+    $form = $crawler->selectButton('validate')->form(array(
+        'name' => 'Ryan',
+    ));
+
+The :class:`Symfony\\Component\\DomCrawler\\Form` object has lots of very
+useful methods for working with forms::
+
+    $uri = $form->getUri();
+
+    $method = $form->getMethod();
+
+The :method:`Symfony\\Component\\DomCrawler\\Form::getUri` method does more
+than just return the ``action`` attribute of the form. If the form method
+is GET, then it mimics the browser's behavior and returns the ``action``
+attribute followed by a query string of all of the form's values.
+
+You can virtually set and get values on the form::
+
+    // set values on the form internally
+    $form->setValues(array(
+        'registration[username]' => 'symfonyfan',
+        'registration[terms]' => 1,
+    ));
+
+    // get back an array of values in the "flat" format used above
+    $values = $form->getValues();
+
+    // returns the values like PHP would see them, where "registration" is its own array
+    $values = $form->getPhpValues();
+
+This is great, but it gets better! The ``Form`` object allows you to interact
+with your form like a browser, selecting radio values, ticking checkboxes,
+and uploading files::
+
+    $form['registration[username]']->setValue('symfonyfan');
+
+    // check or uncheck a checkbox
+    $form['registration[terms]']->tick();
+    $form['registration[terms]']->untick();
+
+    // select an option
+    $form['registration[birthday][year]']->select(1984);
+
+    // select many options from a "multiple" select or checkboxes
+    $form['registration[interests]']->select(array('symfony', 'cookies'));
+
+    // even fake a file upload
+    $form['registration[photo]']->upload('/path/to/lucas.jpg');
+
+What's the point of doing all of this? If you're testing internally, you
+can grab the information off of your form as if it had just been submitted
+by using the PHP values::
+
+    $values = $form->getPhpValues();
+    $files = $form->getPhpFiles();
+
+If you're using an external HTTP client, you can use the form to grab all
+of the information you need to create a POST request for the form::
+
+    $uri = $form->getUri();
+    $method = $form->getMethod();
+    $values = $form->getValues();
+    $files = $form->getFiles();
+
+    // now use some HTTP client and post using this information
+
+One great example of an integrated system that uses all of this is `Goutte`_.
+Goutte understands the Symfony Crawler object and can use it to submit forms
+directly::
+
+    use Goutte\Client;
+
+    // make a real request to an external site
+    $client = new Client();
+    $crawler = $client->request('GET', 'https://github.com/login');
+
+    // select the form and fill in some values
+    $form = $crawler->selectButton('Log in')->form();
+    $form['login'] = 'symfonyfan';
+    $form['password'] = 'anypass';
+
+    // submit that form
+    $crawler = $client->submit($form);
 
-* selectLink()
-* selectButton()
-* link()
-* links()
-* form()
+.. _`Goutte`: https://github.com/fabpot/goutte
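Two ideas from the Forms documentation can be illustrated with plain PHP functions. For GET forms, `Form::getUri()` returns the `action` followed by a query string of the form's values. `getPhpValues()` returns the values "like PHP would see them", with `registration` as its own array. The sketch below uses `http_build_query()` and `parse_str()` and assumes a hypothetical `/register` action with the field names from the examples. It illustrates the concept and is not necessarily how the `Form` class computes its results.

```php
<?php
// Hypothetical form data in the "flat" format that getValues() is described as returning.
$action = '/register';
$values = array(
    'registration[username]' => 'symfonyfan',
    'registration[terms]' => '1',
);

// GET forms: the action followed by a query string of all of the values.
// http_build_query() percent-encodes the brackets in the field names.
$uri = $action . '?' . http_build_query($values);

// parse_str() applies PHP's own bracket parsing, so "registration"
// becomes a nested array, as it would in $_GET or $_POST.
parse_str(http_build_query($values), $phpValues);

print $uri . "\n";
print_r($phpValues);
```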
