Adding csv2 file format. Fixing a major bug in the `csv2.reader` implementation #179

jf-tech · 2022-09-19T05:04:11Z

Because we use encoding/csv.Reader.ReuseRecord, each call of csv.Reader.Read() return value []string is a cached slice. Given we need to potentially cache multiple lines thus multiple calls to Read(), what's in the linesBuf.record is always duplicate!!

We could fix this problem trivially by turning ReuseRecord off, but that would incur an allocation cost for vast majority of single-line csv operation. That is completely undesired.

So instead, we ourselves cache all the returned strings from (potentially multiple) calls to csv.Reader.Rdad() into reader.records []string slice, that managing that buffer ourselves, thus practically eliminate the over-allocation problem. Accordingly, in the reader.linesBuf, instead of having a record []string, we have recordStart and recordNum to reference into the reader.records.

…ementation Because we use `encoding/csv.Reader.ReuseRecord`, each call of `csv.Reader.Read()` return value `[]string` is a cached slice. Given we need to potentially cache multiple lines thus multiple calls to `Read()`, what's in the `linesBuf.record` is always duplicate!! We could fix this problem trivially by turning `ReuseRecord` off, but that would incur an allocation cost for vast majority of single-line csv operation. That is completely undesired. So instead, we ourselves cache all the returned strings from (potentially multiple) calls to `csv.Reader.Rdad()` into `reader.records []string` slice, that managing that buffer ourselves, thus practically eliminate the over-allocation problem. Accordingly, in the `reader.linesBuf`, instead of having a `record []string`, we have `recordStart` and `recordNum` to reference into the `reader.records`.

codecov · 2022-09-19T05:06:09Z

Codecov Report

Merging #179 (f450a84) into master (e9d3be8) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #179   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           52        53    +1     
  Lines         2971      3020   +49     
=========================================
+ Hits          2971      3020   +49

Impacted Files	Coverage Δ
extensions/omniv21/fileformat/flatfile/csv/decl.go	`100.00% <100.00%> (ø)`
...tensions/omniv21/fileformat/flatfile/csv/format.go	`100.00% <100.00%> (ø)`
...tensions/omniv21/fileformat/flatfile/csv/reader.go	`100.00% <100.00%> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

jf-tech requested review from wangjia007bond and liangxibing September 19, 2022 05:04

jf-tech self-assigned this Sep 19, 2022

jf-tech merged commit d24ee6a into master Sep 19, 2022

jf-tech deleted the csv2f branch September 19, 2022 05:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Adding csv2 file format. Fixing a major bug in the `csv2.reader` implementation #179

Adding csv2 file format. Fixing a major bug in the `csv2.reader` implementation #179

Uh oh!

jf-tech commented Sep 19, 2022

Uh oh!

codecov bot commented Sep 19, 2022

Uh oh!

Uh oh!

Uh oh!

Adding csv2 file format. Fixing a major bug in the csv2.reader implementation #179

Adding csv2 file format. Fixing a major bug in the csv2.reader implementation #179

Uh oh!

Conversation

jf-tech commented Sep 19, 2022

Uh oh!

codecov bot commented Sep 19, 2022

Codecov Report

Uh oh!

Uh oh!

Adding csv2 file format. Fixing a major bug in the `csv2.reader` implementation #179

Adding csv2 file format. Fixing a major bug in the `csv2.reader` implementation #179