Skip to content
This repository was archived by the owner on Mar 27, 2024. It is now read-only.

feat: support emerge packages analyzer #337

Merged
merged 1 commit into from
Dec 21, 2020

Conversation

thehackercat
Copy link
Contributor

@thehackercat thehackercat commented Jul 26, 2020

Hi, we are using container-diff to diff our production images in every sdk release. Thanks that it helps a lot to us making our sdk image CHANGELOG.

But our images base on Gentoo/Linux, so most of packages are managed by emerge. According to Porage Uncyclo, packages installed by emerge all locate at /var/db/pkg/ with same directory foramt {package-group}/{package-name}-{package-version}, such as /var/db/pkg/app-crypt/gnupg-2.2.20

It isn't hard to get metadata of packages in /var/db/pkg/, like revision, size, build time etc. So following this make-your-own-differ suggestion, I add a emerge-differ.

It works properly on gentoo-based images. Here is one of outputs of our image diffed by emerge-differ

lexuslee@belba1 ~/tools/lexus-container-diff/container-diff $ ./container-diff diff image1 image2 --type=emerge

-----Emerge-----

Packages found only in image1:
NAME                      VERSION        SIZE
-sys-apps/gawk            5.0.1          4.4M
-sys-apps/iproute2        5.4.0          3.1M

Packages found only in image2:
NAME                          VERSION        SIZE
-dev-libs/librdkafka          1.4.0          2.2M
-dev-python/pbr               5.1.1          1.5M
-dev-python/setuptools        44.1.0         9.1M
-net-misc/rsync               3.1.3          977.7K
-sys-apps/kbd                 2.0.4          3.1M

Version differences:
PACKAGE                   IMAGE1 (image1)        IMAGE2 (image2)
-app-crypt/gnupg          2.2.20, 11.9M                                  2.2.19, 11.9M
-dev-libs/libtasn1        4.16.0, 340.9K                                 4.13, 344.5K
-net-libs/gnutls          3.6.14, 2.9M                                   3.6.13, 2.9M
-net-misc/curl            7.71.0, 3.8M                                   7.69.1, 3.7M
-sys-apps/groff           1.22.4, 14.7M                                  1.22.3, 13.7M
-sys-apps/sandbox         2.18, 619.6K                                   2.13, 605.3K
-sys-apps/sed             4.8, 1.1M                                      4.7, 1M
-sys-fs/ncdu              1.14.2, 112.7K                                 1.14.1, 117K

@nkubala hope this will help, any comments?

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@thehackercat
Copy link
Contributor Author

@googlebot I signed it!

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@thehackercat thehackercat force-pushed the emerge-differ branch 4 times, most recently from 4be029c to d886a6c Compare July 26, 2020 06:30
@thehackercat
Copy link
Contributor Author

Hi, this PR has been widely tested and used in our production cluster. Hope someone could review this PR and merge it.

If any concern, please let me know, thanks : )

}

func getPkgSize(pkgPath string) int64 {
var sizeFile *os.File

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I am missing something, but can this just use os.Stat to get the file size? https://gist.github.com/miguelmota/08bca5febf2ce460a289ef14c7e9af4f

If not, maybe a comment that says

// getPkgSize does X because it's special

Copy link
Contributor Author

@thehackercat thehackercat Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this SIZE file and created by emerge, since emerge will count the total size of a package and store it as a SIZE file in packages metadata directory.

So getPkgSize reads the file to get the size of a packge.

And it's quite complex using os.Stat to get the file size of a package, because all dependency files are scattered. For example, if installing dev-python/mysqlclient-3.0.0, it will generate a link file /usr/lib/_mysql.so, client codes in /usr/lib64/python3.7/site-packages/mysqlclient/ and python egg file in /usr/lib64/python3.7/ . Obviously, it's not easy to get size of all these files.

fileBody, err := ioutil.ReadAll(sizeFile)
if err != nil {
logrus.Debugf("unable to read SIZE for pkg %s", pkgPath)
return 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Masking the error seems weird here. I think err should be returned rather than an inaccurate size given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, I will fix this.

@thehackercat thehackercat force-pushed the emerge-differ branch 2 times, most recently from 8a3c8f1 to b41bcb2 Compare September 3, 2020 02:15
@thehackercat
Copy link
Contributor Author

thehackercat commented Oct 10, 2020

@tstromberg Updated, any other comments on this PR?

It would be appreciated if you could take a look, thanks.

Copy link
Contributor

@nkubala nkubala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @thehackercat, i'm almost embarrassed to be getting back to this PR so late 😬 our team doesn't work much on this project anymore so it's sometimes hard to find time to review PRs, but we really appreciate the contribution!

i've left several comments on here mostly around readability of the code, but the logic of this looks good. i'm going to merge this and then address a few of the changes in a follow up PR myself, and then i'll cut a v1.16.0 release so this will get shipped and you can use it in an official release!

thanks again for the contribution, and again very sorry we left you hanging for so long. we rely on contributors like you to keep this project going, so i just want to personally say thank you and i really appreciate it 🙏

@@ -0,0 +1,123 @@
/*
Copyright 2018 Google, Inc. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be 2020. i'll merge this and fix myself, so we don't have to go back and forth.

Comment on lines +34 to +35
type EmergeAnalyzer struct {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
type EmergeAnalyzer struct {
}
type EmergeAnalyzer struct {}

"github.com/sirupsen/logrus"
)

//Emerge package database location
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
//Emerge package database location
// Emerge package database location

Comment on lines +53 to +54
var path string
path = image.FSPath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in go, it's more idiomatic to say:

Suggested change
var path string
path = image.FSPath
path := image.FSPath

Comment on lines +55 to +60
switch path {
case "":
path = emergePkgFile
default:
path = filepath.Join(path, emergePkgFile)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this switch statement is really a two-case if statement in disguise, so it decreases the readability. i think it's easier to read if you just say:

Suggested change
switch path {
case "":
path = emergePkgFile
default:
path = filepath.Join(path, emergePkgFile)
}
if path == "" {
path = emergePkgFile
} else {
path = filepath.Join(path, emergePkgFile)
}

return packages, err
}

for i := 0; i < len(contents); i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in go, you can iterate directly over the items in a slice:

Suggested change
for i := 0; i < len(contents); i++ {
for _, content := range contents {

@nkubala nkubala merged commit c432e42 into GoogleContainerTools:master Dec 21, 2020
@thehackercat thehackercat deleted the emerge-differ branch December 22, 2020 02:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants