Skip to content

Adding csv2 file format. Fixing a major bug in the csv2.reader implementation #179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 19, 2022

Conversation

jf-tech
Copy link
Owner

@jf-tech jf-tech commented Sep 19, 2022

Because we use encoding/csv.Reader.ReuseRecord, each call of csv.Reader.Read() return value []string is a cached slice. Given we need to potentially cache multiple lines thus multiple calls to Read(), what's in the linesBuf.record is always duplicate!!

We could fix this problem trivially by turning ReuseRecord off, but that would incur an allocation cost for vast majority of single-line csv operation. That is completely undesired.

So instead, we ourselves cache all the returned strings from (potentially multiple) calls to csv.Reader.Rdad() into reader.records []string slice, that managing that buffer ourselves, thus practically eliminate the over-allocation problem. Accordingly, in the reader.linesBuf, instead of having a record []string, we have recordStart and recordNum to reference into the reader.records.

…ementation

Because we use `encoding/csv.Reader.ReuseRecord`, each call of `csv.Reader.Read()` return value `[]string`
is a cached slice. Given we need to potentially cache multiple lines thus multiple calls to `Read()`,
what's in the `linesBuf.record` is always duplicate!!

We could fix this problem trivially by turning `ReuseRecord` off, but that would incur an allocation
cost for vast majority of single-line csv operation. That is completely undesired.

So instead, we ourselves cache all the returned strings from (potentially multiple) calls to
`csv.Reader.Rdad()` into `reader.records []string` slice, that managing that buffer ourselves, thus
practically eliminate the over-allocation problem. Accordingly, in the `reader.linesBuf`, instead of
having a `record []string`, we have `recordStart` and `recordNum` to reference into the `reader.records`.
@jf-tech jf-tech self-assigned this Sep 19, 2022
@codecov
Copy link

codecov bot commented Sep 19, 2022

Codecov Report

Merging #179 (f450a84) into master (e9d3be8) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #179   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           52        53    +1     
  Lines         2971      3020   +49     
=========================================
+ Hits          2971      3020   +49     
Impacted Files Coverage Δ
extensions/omniv21/fileformat/flatfile/csv/decl.go 100.00% <100.00%> (ø)
...tensions/omniv21/fileformat/flatfile/csv/format.go 100.00% <100.00%> (ø)
...tensions/omniv21/fileformat/flatfile/csv/reader.go 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@jf-tech jf-tech merged commit d24ee6a into master Sep 19, 2022
@jf-tech jf-tech deleted the csv2f branch September 19, 2022 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant