Skip to content

telemetry: report segment events asynchronously #1222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 28, 2023
Merged

telemetry: report segment events asynchronously #1222

merged 3 commits into from
Jun 28, 2023

Conversation

gcurtis
Copy link
Collaborator

@gcurtis gcurtis commented Jun 27, 2023

Instead of blocking the process from exiting, buffer Segment events to disk and start another process to upload them. With this change all telemetry should be non-blocking.

Most of the Segment event logic now lives in the telemetry package instead of the cobra middleware. A bunch of the unused logic around tracking CLI SSH sessions has also been removed. The telemetry package API now looks something like:

type EventName int

const (
	EventCommandSuccess EventName = iota
	EventShellInteractive
	EventShellReady
)

func Error(err error, meta Metadata)
func Event(e EventName, meta Metadata)
func Start()
func Stop()
func Upload()

Fixes #1003.

@gcurtis gcurtis requested review from savil and ipince June 27, 2023 01:54
Instead of blocking the process from exiting, buffer Segment events to
disk and start another process to upload them. With this change all
telemetry should be non-blocking.

Most of the Segment event logic now lives in the telemetry package
instead of the cobra middleware. The telemetry package API now looks
something like:

```go
type EventName int

const (
	EventCommandSuccess EventName = iota
	EventShellInteractive
	EventShellReady
)

func Error(err error, meta Metadata)
func Event(e EventName, meta Metadata)
func Start()
func Stop()
func Upload()
```

Fixes #1003.
Copy link
Collaborator

@savil savil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome!

@ibuildthecloud
Copy link
Contributor

FWIW, this completely breaks in my env :)

I wanted to test this because the telemetry is completely killing me. You never knew how much a couple 100ms can drive you nuts.

Anyhow.... This hangs forever

 cd -
/home/darren/src/oss/devbox
direnv: loading ~/src/oss/devbox/.envrc
direnv: using devbox
direnv: ([/home/linuxbrew/.linuxbrew/Cellar/direnv/2.32.3/bin/direnv export bash]) is taking a while to execute. Use CTRL-C to give up.

I'm debugging why. It's odd because if I run things manually it seems all happy. To be continued....

@ibuildthecloud
Copy link
Contributor

This seems suspicious... I have a bunch of /home/darren/src/oss/devbox/dist/devbox bug processes laying around. 🤔

@ibuildthecloud
Copy link
Contributor

And this commit definitely breaks it. I just tested if this is a just a regression else so I rebased on main.

@ibuildthecloud
Copy link
Contributor

Ok, it's definitely that. The devbox bug command that is getting spawned is blocking. I'll let you debug why, I think just doing any direnv will reproduce this. If I have to guess why this is a happening I would say that the devbox bug process is stuck on reading stdin or writing stdout/stderr since it's being ran as a subshell in direnv and direnv probably blocks all I/O interaction from the shell.

The background process that uploads telemetry was hanging when the
telemetry keys weren't set. This was due to a goroutine returning
before calling `WaitGroup.Done`. Add a defer for the `WaitGroup.Done`
call and add a new goroutine that calls `os.Exit` if the process
doesn't exit after 5 seconds.
@gcurtis
Copy link
Collaborator Author

gcurtis commented Jun 28, 2023

@ibuildthecloud thanks for catching that. It turns out this was happening because a goroutine was returning without calling WaitGroup.Done when the telemetry key wasn't set (which usually only happens in dev builds). I fixed that bug and added a fallback that nukes the process after 5 seconds to make sure we don't leave processes laying around.

@ibuildthecloud
Copy link
Contributor

Cool this works now (sorta/maybe). I just now realized that dev build probably don't have telemetry enabled so I don't really know if this is now faster.

@gcurtis gcurtis merged commit 80a444d into main Jun 28, 2023
@gcurtis gcurtis deleted the gcurtis/telem branch June 28, 2023 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: telemetry makes the cli tool very slow
3 participants