
Commit a4bf15d

Removing Pig, Storm, and Spark 1 support, as well as support for Scala 2.10 on Spark 2 (#2092)
1 parent 034cb57 commit a4bf15d

File tree

211 files changed (+44, −20228 lines)


.gitignore

Lines changed: 0 additions & 3 deletions
@@ -14,9 +14,6 @@ metastore_db
 /hdfs/data
 /repository-hdfs/data/*
 /mr/src/main/resources/esh-build.properties
-/pig/tmp-pig/*
 /spark/keyvaluerdd.parquet
-/spark/sql-13/default_*
-/spark/sql-13/with_meta_*
 out/
 localRepo/

README.md

Lines changed: 1 addition & 54 deletions
@@ -1,6 +1,6 @@
 # Elasticsearch Hadoop [![Build Status](https://travis-ci.org/elastic/elasticsearch-hadoop.svg?branch=master)](https://travis-ci.org/elastic/elasticsearch-hadoop)
 Elasticsearch real-time search and analytics natively integrated with Hadoop.
-Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), [Apache Pig](#apache-pig), [Apache Spark](#apache-spark) and [Apache Storm](#apache-storm).
+Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), and [Apache Spark](#apache-spark).
 
 See [project page](http://www.elastic.co/products/hadoop/) and [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for detailed information.
 
@@ -184,33 +184,6 @@ INSERT OVERWRITE TABLE artists
 
 As one can note, currently the reading and writing are treated separately but we're working on unifying the two and automatically translating [HiveQL][] to Elasticsearch queries.
 
-## [Apache Pig][]
-ES-Hadoop provides both read and write functions for Pig so you can access Elasticsearch from Pig scripts.
-
-Register ES-Hadoop jar into your script or add it to your Pig classpath:
-```
-REGISTER /path_to_jar/es-hadoop-<version>.jar;
-```
-Additionally one can define an alias to save some chars:
-```
-%define ESSTORAGE org.elasticsearch.hadoop.pig.EsStorage()
-```
-and use `$ESSTORAGE` for storage definition.
-
-### Reading
-To read data from ES, use `EsStorage` and specify the query through the `LOAD` function:
-```SQL
-A = LOAD 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?q=me*');
-DUMP A;
-```
-
-### Writing
-Use the same `Storage` to write data to Elasticsearch:
-```SQL
-A = LOAD 'src/artists.dat' USING PigStorage() AS (id:long, name, url:chararray, picture: chararray);
-B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
-STORE B INTO 'radio/artists' USING org.elasticsearch.hadoop.pig.EsStorage();
-```
 ## [Apache Spark][]
 ES-Hadoop provides native (Java and Scala) integration with Spark: for reading a dedicated `RDD` and for writing, methods that work on any `RDD`. Spark SQL is also supported
 
@@ -313,30 +286,6 @@ DataFrame df = sqlContext.read.json("examples/people.json")
 JavaEsSparkSQL.saveToEs(df, "spark/docs")
 ```
 
-## [Apache Storm][]
-ES-Hadoop provides native integration with Storm: for reading a dedicated `Spout` and for writing a specialized `Bolt`
-
-### Reading
-To read data from ES, use `EsSpout`:
-```java
-import org.elasticsearch.storm.EsSpout;
-
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("es-spout", new EsSpout("storm/docs", "?q=me*"), 5);
-builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");
-```
-
-### Writing
-To index data to ES, use `EsBolt`:
-
-```java
-import org.elasticsearch.storm.EsBolt;
-
-TopologyBuilder builder = new TopologyBuilder();
-builder.setSpout("spout", new RandomSentenceSpout(), 10);
-builder.setBolt("es-bolt", new EsBolt("storm/docs"), 5).shuffleGrouping("spout");
-```
 ## Building the source
 
 Elasticsearch Hadoop uses [Gradle][] for its build system and it is not required to have it installed on your machine. By default (`gradlew`), it automatically builds the package and runs the unit tests. For integration testing, use the `integrationTests` task.
@@ -370,10 +319,8 @@ under the License.
 
 [Hadoop]: http://hadoop.apache.org
 [Map/Reduce]: http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
-[Apache Pig]: http://pig.apache.org
 [Apache Hive]: http://hive.apache.org
 [Apache Spark]: http://spark.apache.org
-[Apache Storm]: http://storm.apache.org
 [HiveQL]: http://cwiki.apache.org/confluence/display/Hive/LanguageManual
 [external table]: http://cwiki.apache.org/Hive/external-tables.html
 [Apache License]: http://www.apache.org/licenses/LICENSE-2.0
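For downstream users, the practical effect of this commit is that only the Spark 2 integration on Scala 2.11 remains. As a hedged sketch (not taken from this commit's diff), a consumer's Gradle build would then declare only the Spark 2 / Scala 2.11 artifact — the artifact name follows the project's published `elasticsearch-spark-<sparkVersion>_<scalaVersion>` convention, and `<version>` is a placeholder:

```gradle
dependencies {
    // Spark 2.x on Scala 2.11 — the only Spark line still supported after this commit;
    // Spark 1.x and Scala 2.10 variants of this artifact are no longer published
    compile "org.elasticsearch:elasticsearch-spark-20_2.11:<version>"
}
```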
