Implement delete operation #1324
Conversation
Hello, is the pull request still missing something?
@vincentarnaud90 Thanks for re-commenting on this PR, as it slipped through my todo list. I'll get a review up for it soon.
Sorry for the late review here. This is a larger lift than it looks, especially with our serialization code being as complicated as it is.
I left a few code style nitpicks, but most importantly I think that using the ValueWriter to throw out the document source on the delete operation might be the wrong way forward here. I think we should make a new implementation of BulkCommand and have AbstractBulkFactory allow subclasses to return a different command (in this case, all the factories other than Delete would use the existing logic in the super class).
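To make the suggested shape concrete, here is a minimal, self-contained sketch. The interfaces are deliberately simplified stand-ins for the connector's real BulkCommand/BulkFactory types, and the method bodies are assumptions about the proposal, not code from this PR:

```java
// Simplified sketch of the suggested design: the base factory exposes a hook
// returning the BulkCommand, and the delete factory overrides that hook to
// drop the document source. Types here are stand-ins, not the real interfaces.
interface BulkCommand { CharSequence write(Object doc); }
interface BulkFactory { BulkCommand createBulk(); }

abstract class AbstractBulkFactory implements BulkFactory {
    @Override
    public BulkCommand createBulk() {
        // ...existing template assembly would happen here...
        return createCommand();
    }

    // Default hook: render the action line followed by the document source.
    protected BulkCommand createCommand() {
        return doc -> "{\"index\":{}}\n" + doc + "\n";
    }
}

class DeleteBulkFactory extends AbstractBulkFactory {
    @Override
    protected BulkCommand createCommand() {
        // Delete entries carry only the action/metadata line, never a source.
        return doc -> "{\"delete\":{}}\n";
    }
}
```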
```java
public static final class NoDataWriter implements ValueWriter<Object> {

    @Override
    public Result write(Object writable, Generator generator) {
```
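For context, this is roughly what such a no-op writer would have to do. The following is a hedged sketch only; the Result.SUCCESFUL() call and the exact contract are assumptions about the connector's ValueWriter interface, not the PR's actual code:

```java
// Hypothetical completion of the excerpt above: swallow every value so the
// bulk entry ends up with no document body. Not the PR's actual code.
public static final class NoDataWriter implements ValueWriter<Object> {

    @Override
    public Result write(Object writable, Generator generator) {
        // Write nothing to the generator; a delete action needs no source.
        return Result.SUCCESFUL();
    }
}
```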
It looks like it only supports Hadoop Writable objects here. For instance, Spark uses regular Java and Scala objects for data types.
Since integrations can be added that have different types than described here, it makes me wonder if this is really the right place to have the connector ignore the document content for the bulk request. I don't think a ValueWriter will have enough wisdom at runtime to make the decision on a call-by-call basis. Maybe there needs to be work done in order to have a new BulkCommand that is returned from the DeleteBulkFactory ignore the document instead.
```java
@@ -221,6 +221,11 @@ public String getSerializerValueWriterClassName() {
        return getProperty(ES_SERIALIZATION_WRITER_VALUE_CLASS);
    }

    public Settings setSerializerValueWriterClassName(String className) {
```
Since the ValueWriter doesn't seem like the right place to ignore the document source for delete operations, this won't be needed.
```java
@@ -253,10 +253,18 @@ public static void validateSettings(Settings settings) {
            Assert.isTrue(settings.getMappingExcludes().isEmpty(), "When writing data as JSON, the field exclusion feature is ignored. This is most likely not what the user intended. Bailing out...");
        }

        // check the configuration is coherent in order to use the delete operation
        if (ConfigurationOptions.ES_OPERATION_DELETE.equals(settings.getOperation())) {
```
👍 thanks for adding this
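The excerpt cuts off at the condition; the assertion below is a hypothetical sketch of what such a coherence check might enforce (a delete needs a document ID, so es.mapping.id would have to be set). It is not quoted from the PR:

```java
// Hypothetical continuation of the check above, not the PR's actual code:
// a delete operation can only be built when each document carries an ID.
if (ConfigurationOptions.ES_OPERATION_DELETE.equals(settings.getOperation())) {
    Assert.isTrue(StringUtils.hasText(settings.getMappingId()),
            "When using the delete operation, the property 'es.mapping.id' must be set "
            + "so that each document's ID can be extracted. Bailing out...");
}
```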
```java
public DeleteBulkFactory(Settings settings, MetadataExtractor metaExtractor, EsMajorVersion version) {
    // we only want a specific serializer for this particular bulk factory
    super(settings.copy().setSerializerValueWriterClassName(NoDataWriter.class.getName()), metaExtractor, version);
```
Also not required if we use a new implementation of BulkCommand.
Hello,

```java
class DeleteBulkFactory extends AbstractBulkFactory {
    @Override
    public BulkCommand createBulk() {
        // Inside this BulkCommand, I can ignore the document source.
        return new DeleteTemplateBulk();
    }
}
```

Instead of overriding the createBulk method in the DeleteBulkFactory class, can I just add my logic inside the createBulk method of the AbstractBulkFactory class in order to return the DeleteTemplateBulk?

```java
public class AbstractBulkFactory implements BulkFactory {
    @Override
    public BulkCommand createBulk() {
        // the code which is actually written
        if (isDeleteOp) {
            // Inside this BulkCommand, I can ignore the document source.
            return new DeleteTemplateBulk();
        }
        // ... existing logic for index/update operations ...
    }
}
```
I think that should be fine, considering it already detects if the operation is an update vs an index operation.
Force-pushed from db71a2d to 308aa3f:
Implement delete operation and tests
Add license headers
Revert changes from codestyle and generate data in CommandTest with loops
Force-pushed from 308aa3f to e7bfc39.
Hello @jbaiera, sorry for the really late reply.
Thanks for putting in those changes! I'll run the full test suite on my own since TravisCI is busted.
LGTM! Thanks very much for your dedication on getting this in! I'll go ahead and merge it into master and backport it to the 7.x branch. It should be available in the 7.8.0 release when that lands.
This PR adds support for delete operations. Document IDs must be present for each document for this to work. A new bulk command factory has been added which will render a delete operation with the id and no document body in the bulk output.
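As a usage illustration, here is a hedged sketch of the configuration this enables; es.write.operation and es.mapping.id are the connector's standard settings, while the index name and ID field are made up:

```java
import org.apache.hadoop.conf.Configuration;

public class DeleteJobConfig {
    public static Configuration deleteConfig() {
        Configuration conf = new Configuration();
        conf.set("es.resource", "my-index");       // made-up target index
        conf.set("es.write.operation", "delete");  // emit delete actions instead of index ops
        conf.set("es.mapping.id", "id");           // field that supplies each document's ID
        return conf;
    }
}
```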
Hi @jbaiera, not sure if this is the right place to post my question, but it's relevant to your closing comment. We are using Elasticsearch and are in need of the delete feature. As you mentioned above, this feature would be backported in 7.8.0, but I don't see that being the case. Delete is still an …
@jbaiera As I mentioned above in my comment, we are using Elasticsearch 7.4.2 and it does not have delete feature support. I have created a PR against branch 7.4.2 to backport the delete feature. Please review and let me know if I need to make any changes to the code or process, since this is my first commit here. #1492
@krish2code This backport fell through the cracks and ended up not making it in for the 7.8 release train. It should be available in 7.9. Sorry for the confusion!
Hi, is there documentation on how to use this?
AFAIK, when I search the code in this repository (on the call path of the DataFrame sink), I find:

```java
Response response = execute(PUT, resource.bulk(), data);
```

So far as I can tell, the delete operation is not yet provided on RDD or DataFrame. Is that right?
Implement delete operation and tests. This is a duplicate of #1242, with testing added.
Thank you for submitting a pull request!
Please make sure you have signed our Contributor License Agreement (CLA).
We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code.
You only need to sign the CLA once.