Description
Allow a spider to be defined by a config file or a script.
Choices:
1. XML
<spider>
    <site>
        <charset>utf-8</charset>
        <user-agent></user-agent>
        <cookies>
            <cookie domain="" path="" name="" value="">
            </cookie>
        </cookies>
        <heads>
            <head name="" value=""/>
        </heads>
    </site>
    <startUrls>
        <url></url>
    </startUrls>
    <extraction targetUrl="" helpUrl="">
        <field name="title">
            <extractor type="xpath" value="//div[@class='title']"/>
        </field>
        <field name="content">
            <extractor type="xpath" value="//div[@class='content']"/>
        </field>
    </extraction>
</spider>
2. JSON
3. YAML
4. JavaScript
var name=xpath("//h1[@class='entry-title public']/strong/a/text()")
var readme=xpath("//div[@id='readme']/tidyText()")
var star=xpath("//ul[@class='pagehead-actions']/li[1]//a[@class='social-count js-social-count']/text()")
5. JRuby
name = xpath "//h1[@class='entry-title public']/strong/a/text()"
readme = xpath "//div[@id='readme']/tidyText()"
star = xpath "//ul[@class='pagehead-actions']/li[1]//a[@class='social-count js-social-count']/text()"
fork = xpath "//ul[@class='pagehead-actions']/li[2]//a[@class='social-count']/text()"
6. Java
Just write a PageProcessor and load it dynamically…
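As a rough illustration of option 1, the XML layout could be read with the JDK's built-in DOM parser. This is a minimal sketch, not webmagic code: the class SpiderConfigReader, its methods and the sample URL http://example.com/ are hypothetical, and only extractors with type="xpath" are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SpiderConfigReader {

    // Hypothetical sample following the proposed <spider> layout;
    // the start URL is a placeholder, not part of the proposal.
    static final String SAMPLE =
        "<spider>"
        + "<site><charset>utf-8</charset></site>"
        + "<startUrls><url>http://example.com/</url><url></url></startUrls>"
        + "<extraction targetUrl=\"\" helpUrl=\"\">"
        + "<field name=\"title\"><extractor type=\"xpath\" value=\"//div[@class='title']\"/></field>"
        + "<field name=\"content\"><extractor type=\"xpath\" value=\"//div[@class='content']\"/></field>"
        + "</extraction>"
        + "</spider>";

    // Parses the config string into a DOM, wrapping checked parser exceptions.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid spider config", e);
        }
    }

    /** Maps each <field> name to its XPath expression, in document order. */
    static Map<String, String> readFields(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList fieldNodes = parse(xml).getElementsByTagName("field");
        for (int i = 0; i < fieldNodes.getLength(); i++) {
            Element field = (Element) fieldNodes.item(i);
            Element extractor = (Element) field.getElementsByTagName("extractor").item(0);
            // Only type="xpath" extractors are handled in this sketch.
            if (extractor != null && "xpath".equals(extractor.getAttribute("type"))) {
                fields.put(field.getAttribute("name"), extractor.getAttribute("value"));
            }
        }
        return fields;
    }

    /** Collects the non-empty <url> entries inside <startUrls>. */
    static List<String> readStartUrls(String xml) {
        List<String> urls = new ArrayList<>();
        Element startUrls = (Element) parse(xml).getElementsByTagName("startUrls").item(0);
        if (startUrls == null) {
            return urls;
        }
        NodeList urlNodes = startUrls.getElementsByTagName("url");
        for (int i = 0; i < urlNodes.getLength(); i++) {
            String url = urlNodes.item(i).getTextContent().trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(readStartUrls(SAMPLE));
        readFields(SAMPLE).forEach((name, xpath) -> System.out.println(name + " -> " + xpath));
    }
}
```

A LinkedHashMap keeps fields in the order they appear in the config, which is convenient if the extracted results are later written out as columns.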
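The script variants (options 4 and 5) essentially evaluate XPath expressions against the fetched page. The sketch below shows what those expressions return, using the standard javax.xml.xpath API on a small, made-up, well-formed XHTML page; real HTML usually needs an HTML-aware parser first. tidyText() is not part of XPath 1.0, so the readme field is left out. ScriptFieldDemo, its xpath helper and the sample page are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ScriptFieldDemo {

    // Hypothetical, well-formed page shaped like the markup the scripts target.
    static final String PAGE = "<html><body>"
        + "<h1 class='entry-title public'><strong><a href='/a/b'>example-repo</a></strong></h1>"
        + "<ul class='pagehead-actions'>"
        + "<li><a class='social-count js-social-count' href='#'>120</a></li>"
        + "<li><a class='social-count' href='#'>45</a></li>"
        + "</ul></body></html>";

    /** Returns the string value of the first node matched by the expression. */
    static String xpath(String page, String expression) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return xp.evaluate(expression, new InputSource(new StringReader(page)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String name = xpath(PAGE, "//h1[@class='entry-title public']/strong/a/text()");
        String star = xpath(PAGE,
            "//ul[@class='pagehead-actions']/li[1]//a[@class='social-count js-social-count']/text()");
        String fork = xpath(PAGE,
            "//ul[@class='pagehead-actions']/li[2]//a[@class='social-count']/text()");
        System.out.println(name + " " + star + " " + fork);
    }
}
```

A script engine exposing a function like xpath(...) would only need to bind it to a helper of this shape with the current page supplied implicitly.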
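For option 6, loading a processor dynamically could mean instantiating a class by name at runtime. This sketch uses Class.forName and reflection; SimplePageProcessor is a hypothetical stand-in, not webmagic's PageProcessor interface, and TitleProcessor is an invented example. Loading from a user-supplied jar would typically go through a java.net.URLClassLoader rather than the default class loader.

```java
import java.util.HashMap;
import java.util.Map;

class DynamicProcessorLoader {

    /** Hypothetical stand-in for a page-processing interface (not the webmagic API). */
    interface SimplePageProcessor {
        Map<String, String> process(String html);
    }

    /** Hypothetical user-written processor: extracts the text between <title> tags. */
    public static class TitleProcessor implements SimplePageProcessor {
        @Override
        public Map<String, String> process(String html) {
            Map<String, String> fields = new HashMap<>();
            int start = html.indexOf("<title>");
            int end = html.indexOf("</title>");
            if (start >= 0 && end > start) {
                fields.put("title", html.substring(start + "<title>".length(), end));
            }
            return fields;
        }
    }

    /** Loads a processor class by binary name and instantiates it via its no-arg constructor. */
    static SimplePageProcessor load(String className) {
        try {
            Class<? extends SimplePageProcessor> cls =
                Class.forName(className).asSubclass(SimplePageProcessor.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load processor " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name would normally come from configuration; here it is taken from the class itself.
        SimplePageProcessor p = load(TitleProcessor.class.getName());
        System.out.println(p.process("<html><title>Hello</title></html>"));
    }
}
```

Using asSubclass makes a class that does not implement the interface fail at load time with a clear error, instead of later with a ClassCastException during crawling.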