Fix issue 137: added Transform.CurrentRawRecord() for caller of omniparser to access the raw ingested record. #138

Merged
merged 1 commit into from
Jan 9, 2021

Conversation

jf-tech
Owner

@jf-tech jf-tech commented Jan 9, 2021

Fix #137

…iparser to access the raw ingested record.

See details in #137.
@jf-tech
Owner Author

jf-tech commented Jan 9, 2021

FYI @DGollings

@codecov

codecov bot commented Jan 9, 2021

Codecov Report

Merging #138 (413a598) into master (69749f5) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #138   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           43        43           
  Lines         1961      1976   +15     
=========================================
+ Hits          1961      1976   +15     
Impacted Files                   Coverage Δ
extensions/omniv21/ingester.go   100.00% <100.00%> (ø)
transform.go                     100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 69749f5...413a598.

@jf-tech jf-tech merged commit 4aa56d6 into master Jan 9, 2021
@jf-tech jf-tech deleted the rawrecord branch January 9, 2021 20:15
@DGollings

Cool, thanks. Two remarks:

Maybe instead of returning interface{}, return, say,

type RawRecorder interface {
  UUIDv3() string // (or Checksum() string if you want to be more schema agnostic)
}

That way a user can get useful output without any casts or RTFM; if they want more, they can always cast to the actual type.
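
To make the suggestion concrete, here is a minimal sketch of how such an interface could be used. Everything in it is hypothetical: the `ediRawRecord` type, the `currentRawRecord` helper, the sample segments, and the choice of SHA-256 are illustration only and are not omniparser's actual types or API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// RawRecorder is the small interface proposed above. Checksum is used here
// instead of UUIDv3 to stay schema agnostic; the string return type is an
// assumption.
type RawRecorder interface {
	Checksum() string
}

// ediRawRecord is a hypothetical handler-specific raw record type.
type ediRawRecord struct {
	Segments []string
}

// Checksum returns the hex-encoded SHA-256 of all segments, each followed by
// a newline so that segment boundaries affect the result.
func (r *ediRawRecord) Checksum() string {
	h := sha256.New()
	for _, s := range r.Segments {
		h.Write([]byte(s))
		h.Write([]byte{'\n'})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// currentRawRecord is a hypothetical stand-in for whatever call hands the raw
// record back to the caller.
func currentRawRecord() RawRecorder {
	return &ediRawRecord{Segments: []string{"UNH+1", "BGM+220"}}
}

func main() {
	raw := currentRawRecord()
	fmt.Println(raw.Checksum()) // usable without any cast

	// Callers that need more can still assert to the concrete type.
	if rec, ok := raw.(*ediRawRecord); ok {
		fmt.Println(len(rec.Segments))
	}
}
```

The interface only requires that a string-returning method exists; each handler stays free to decide what its raw record contains.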

I kind of liked the output of

            "hash_json_input": { "custom_func": {
                "name": "javascript_with_context",
                "args": [ { "const": "_node" } ]
            }},

Seeing that say SG25/SG26/position had value "1" is much clearer than the concatenated fieldless output of rawRecord.Node.InnerText()

Had a look around; none of the simple options (json.Marshal and Sprintf("%+v")) resulted in similar output. I could probably generate similar output by walking the tree, but that's non-trivial code. Was wondering if you had a better solution.
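
For illustration, the tree walk mentioned above could look roughly like the sketch below. It assumes a simplified, hypothetical `node` type (not omniparser's `idr.Node`) and invented sample data, and it keys each leaf value by its slash-joined path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node is a hypothetical, simplified tree node. It is NOT omniparser's
// idr.Node; it exists only to illustrate the walk.
type node struct {
	Name     string
	Value    string // meaningful on leaves only
	Children []*node
}

// flatten records each leaf value under its slash-joined path.
func flatten(n *node, prefix string, out map[string]string) {
	path := n.Name
	if prefix != "" {
		path = prefix + "/" + n.Name
	}
	if len(n.Children) == 0 {
		out[path] = n.Value
		return
	}
	for _, c := range n.Children {
		flatten(c, path, out)
	}
}

// pathJSON renders the tree as a JSON object keyed by leaf path.
func pathJSON(root *node) (string, error) {
	out := map[string]string{}
	flatten(root, "", out)
	b, err := json.Marshal(out) // map keys are emitted in sorted order
	return string(b), err
}

// sampleTree builds invented sample data shaped like the example above.
func sampleTree() *node {
	return &node{Name: "SG25", Children: []*node{
		{Name: "SG26", Children: []*node{
			{Name: "position", Value: "1"},
			{Name: "quantity", Value: "5"},
		}},
	}}
}

func main() {
	s, err := pathJSON(sampleTree())
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Note that this sketch silently overwrites repeated sibling names, since they produce the same path; a real implementation would need an index or array to tell them apart.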

Although this is not really an issue: I already store a checksum of the entire schema, so if schemaChecksum == storedChecksum && rawRecordChecksum == storedRecordChecksum, I can safely assume that the schema can be used to debug an original input file, even without knowing the exact input.
And in a sense, that's better: I can feed the transformed output to a retry function (fixing import errors), but I can't feed {"SG25/SG26/position" : "1"} to anything. Well, not without making changes.

For context, say fixing an incomplete address for delivery purposes.
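
The checksum comparison described above amounts to something like the following sketch. The `storedChecksums` type and the `canReplay` name are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// storedChecksums is a hypothetical record of what was persisted when the
// input was first ingested.
type storedChecksums struct {
	Schema string
	Record string
}

// canReplay reports whether both the schema and the raw record match what was
// stored, i.e. whether the current schema can be used to debug the original
// input.
func canReplay(stored storedChecksums, schemaChecksum, rawRecordChecksum string) bool {
	return schemaChecksum == stored.Schema && rawRecordChecksum == stored.Record
}

func main() {
	stored := storedChecksums{Schema: "schema-v1", Record: "record-a"}
	fmt.Println(canReplay(stored, "schema-v1", "record-a"))
	fmt.Println(canReplay(stored, "schema-v2", "record-a"))
}
```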

rawRecord.UUIDv3() + transformed output is probably better, I just kind of liked how hash_json_input/_node looked :)

@jf-tech
Owner Author

jf-tech commented Jan 10, 2021

  1. Given that we have vastly different implementations of schema handlers across different versions, I don't want to force any constraints (e.g. an interface) onto what each schema handler wants to define as a raw record. It's highly version specific. So no.

  2. You can use idr.JSONify2(*idr.Node) to marshal an IDR tree into a minified JSON blob. That's how javascript_with_context does it.

@DGollings

  1. But it wouldn't enforce anything regarding the implementation, constraints, or definition of a raw record in any way? The only thing it would enforce is the existence of a function called Hash() or Checksum() that returns a string, and that string could be "not implemented".
  2. Thanks :)

@jf-tech
Owner Author

jf-tech commented Jan 10, 2021

After some thinking, I agree with 1, PR coming.

Successfully merging this pull request may close these issues.

Return unique hash for input