Description
So I'm busy ingesting shipments; they arrive as either CSV, JSON, XML or EDI.
The interface I'm working on should take an array of shipments, split it into individual shipments, hash those and store the original input for success/audit/retry/failure tracking. This would make it easier to ingest 99/100 shipments and retry (after localizing and fixing the issue) the one shipment that's invalid for whatever reason.
In order to decide whether something has been ingested correctly, I thought a solution could be hashing each 'unit' of input and storing the original input somewhere as well.
Quite easy for CSV.
Weird Python-and-Bash-esque pseudocode:
```
for line in csv:
    process(line) && hash(line) && gzip(line) -> store result, hash, line in db
```
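In actual Go, that per-line flow might look roughly like the sketch below (`process` and `store` are hypothetical stand-ins for my real ingestion and DB code):

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"log"
	"os"
)

// process is a hypothetical stand-in for ingesting a single shipment.
func process(line []byte) error { return nil }

// store is a hypothetical stand-in for persisting result, hash and original in the DB.
func store(processErr error, hash string, gzippedOriginal []byte) {}

func main() {
	f, err := os.Open("shipments.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Bytes()

		procErr := process(line) // ingest one shipment

		sum := sha256.Sum256(line) // hash of the original input line

		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		zw.Write(line) // keep a gzipped copy of the original line
		zw.Close()

		store(procErr, hex.EncodeToString(sum[:]), buf.Bytes())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```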
It becomes less easy for JSON and XML; even a marshal/unmarshal round trip is not 100% identical to the input.
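For example, a round trip through Go's encoding/json already re-orders keys and drops whitespace, so the bytes (and therefore any hash over them) no longer match the original:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	original := []byte(`{"b": 1, "a": 2}`)

	var v map[string]interface{}
	if err := json.Unmarshal(original, &v); err != nil {
		log.Fatal(err)
	}
	roundTripped, err := json.Marshal(v)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(original))     // {"b": 1, "a": 2}
	fmt.Println(string(roundTripped)) // {"a":2,"b":1} -- different bytes, so a different hash
}
```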
Even worse is EDI
So, even though I liked the idea of storing the original, it quickly becomes cumbersome. A decent alternative is hashing and storing the output of transform.Read().
But that comes with several issues:
- I can change the output, and thus the hash, by changing the schema (not really an issue)
- it's not the original (but it is more consistent: everything is JSON), so kind of a bug/feature
- I don't see what I haven't told omniparser to see, so new fields that might have been added to the input go unnoticed
None of these is a major issue, but they're all consequences of hashing a new representation of the input rather than the input itself.
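To make the "more consistent" point concrete with a made-up example (the two JSON blobs below stand in for transform.Read() outputs of the same shipment arriving once as CSV and once as XML):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// Hypothetical transform.Read() outputs for the same shipment that arrived
	// once as CSV and once as XML: normalized to the same JSON, they hash identically...
	fromCSV := []byte(`{"id":"S1","weight":"10"}`)
	fromXML := []byte(`{"id":"S1","weight":"10"}`)
	fmt.Printf("%x\n", sha256.Sum256(fromCSV))
	fmt.Printf("%x\n", sha256.Sum256(fromXML))
	// ...but a field the schema doesn't map never reaches this output,
	// so it can't influence the hash either (the "new fields" issue above).
}
```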
I was wondering: how hard would it be to hash the input of whatever generates the output?
So:
`hash, data, err := transform.Read()`
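From the outside, the closest I can get today is hashing the raw bytes as omniparser consumes them, by wrapping the input reader in an io.TeeReader that feeds a sha256 hasher. A minimal sketch, assuming the NewSchema/NewTransform/Read flow from the omniparser README (file names are placeholders, and I may have details of the API slightly wrong):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/jf-tech/omniparser"
	"github.com/jf-tech/omniparser/transformctx"
)

func main() {
	schemaFile, err := os.Open("shipments_schema.json") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer schemaFile.Close()

	schema, err := omniparser.NewSchema("shipments", schemaFile)
	if err != nil {
		log.Fatal(err)
	}

	inputFile, err := os.Open("shipments.edi") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer inputFile.Close()

	// Every byte omniparser pulls from the input also flows into the hasher,
	// so the final sum covers exactly the bytes that were consumed.
	hasher := sha256.New()
	tee := io.TeeReader(inputFile, hasher)

	transform, err := schema.NewTransform("shipments.edi", tee, &transformctx.Ctx{})
	if err != nil {
		log.Fatal(err)
	}

	for {
		out, err := transform.Read() // one shipment as normalized JSON
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		_ = out // ...process/store the record as today...
	}

	// Only meaningful as a whole-file hash if the loop actually reached EOF.
	fmt.Println("input hash:", hex.EncodeToString(hasher.Sum(nil)))
}
```

The obvious drawback is that this gives one hash for the whole file rather than one per record, which is exactly why a per-record hash produced from the IDR itself would be nicer.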
Is your internal data stable enough that you could, say, 'for loop' the IDR input through the sha256 hasher (it supports streaming) and return a stable/unchanging hash?
As in: in theory, ["a", "b", "c"] should return the same hash for "a", "b" and "c" regardless of ordering.
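To spell out the property I mean with plain strings (each unit's hash depends only on its own bytes, not on where it sits in the batch):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	// Reordering the batch changes nothing about the individual hashes,
	// only the order in which they are produced.
	for _, batch := range [][]string{{"a", "b", "c"}, {"c", "a", "b"}} {
		for _, unit := range batch {
			fmt.Printf("%s -> %x\n", unit, sha256.Sum256([]byte(unit)))
		}
		fmt.Println("---")
	}
}
```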
Also, I imagine being able to verify whether a file has been fully processed is interesting for more than one use case.