Skip to content
This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Clean reconstructed objects outside pack window #716

Merged
merged 1 commit into from
Jan 15, 2018

Conversation

jfontan
Copy link
Contributor

@jfontan jfontan commented Jan 11, 2018

Object walk reconstructs delta objects but these are not cleaned up
after they got out the pack window. Without this change all
reconstructed objects reside in memory.

restoreOriginal call is moved before calling Size(). Now we can not
guarantee that the object is already undeltified.

Object walk reconstructs delta objects but these are not cleaned up
after they got out the pack window. Without this change all
reconstructed objects reside in memory.

restoreOriginal call is moved before calling Size(). Now we can not
guarantee that the object is already undeltified.

Signed-off-by: Javi Fontan <[email protected]>
@@ -261,6 +267,16 @@ func (dw *deltaSelector) walk(
}

func (dw *deltaSelector) tryToDeltify(indexMap map[plumbing.Hash]*deltaIndex, base, target *ObjectToPack) error {
// Original object might not be present if we're reusing a delta, so we
// ensure it is restored.
if err := dw.restoreOriginal(target); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this making it actually slower than before (even if it uses way less memory)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just run borges pack with repository https://github.com/numpy/numpy (downloaded locally) and the results are quite close.

  • Original version:
    • 203.83user 12.34system 2:28.80elapsed 2.4 Gb RAM
    • 204.40user 12.27system 2:29.30elapsed 2.4 Gb RAM
  • With proposed changes:
    • 217.82user 11.97system 2:25.72elapsed 1 Gb RAM
    • 220.06user 11.81system 2:26.28elapsed 1 Gb RAM

It seems that it's a bit slower (user time is bigger) but it spends more system time. Maybe heap allocation.

@erizocosmico
Copy link
Contributor

erizocosmico commented Jan 11, 2018 via email

@mcuadros
Copy link
Contributor

Can we test this with more diverse repository? Not only with the numpy repository?

@jfontan
Copy link
Contributor Author

jfontan commented Jan 15, 2018

I've did new tests with the following repos:

The times and memory are only from the push action from local to a new repository. Each test is executed twice and the smaller value for time and memory is selected.

repository master fix memory
cangallo 145ms 3 MiB 151ms 2.6MiB
octoprint-tft 4s 32.6MiB 3.8s 33.2MiB
upsilon 54.2s 387MiB 53.4s 387MiB
numpy 1m25s 1.9GiB 1m19s 0.65GiB
tensorflow 3m16s 3.9GiB 2m59s 1.28GiB
bismuth 12m16s 17.7GiB 10m37s 3.13GiB

The code to do the benchmark is here: https://gist.github.com/jfontan/42fbfe0761e5280012d285c1a4290f5d

@erizocosmico
Copy link
Contributor

So it's actually faster? And uses half the memory? Nice, great job!

@mcuadros
Copy link
Contributor

Networking is envolved, so speed number aren't reliable

@jfontan
Copy link
Contributor Author

jfontan commented Jan 15, 2018

I've just updated the table with two small repos.

@mcuadros mcuadros merged commit 861e399 into src-d:master Jan 15, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants