feat: Use S3 node store with garage #3498
base: master
Conversation
Codecov Report

All modified and coverable lines are covered by tests ✅
All tests successful. No failed tests found.

```
@@           Coverage Diff           @@
##           master    #3498   +/-   ##
=======================================
  Coverage   98.06%   98.06%
=======================================
  Files           3        3
  Lines         207      207
=======================================
  Hits          203      203
  Misses          4        4
```

View full report in Codecov by Sentry.
Any reason why you didn't use SeaweedFS per what you said yesterday?
install/bootstrap-garage.sh (outdated)

```shell
if [[ $($garage bucket list | tail -1 | awk '{print $1}') != 'nodestore' ]]; then
  node_id=$($garage status | tail -1 | awk '{print $1}')
  $garage layout assign -z dc1 -c 100G "$node_id"
```
Should this `100G` be a variable somewhere?
Yes, I think we should add a new `GARAGE_STORAGE_SIZE` env var to `.env`. That said, I'm not sure that makes much sense, as we would not honor any changes to it after the initial installation. Unless this actually reserves 100G, I think leaving it hard-coded to a "good enough" value and then documenting how to change it if needed would be a better option.

Thoughts?
@aldy505 added the env var regardless. Do you think 100G is a good size for the average self-hosted operator?
I don't know if the 100G will be immediately allocated by Garage. If it's not (meaning the current storage space won't be modified), then I think it's good. If it is, I think it's better to just allocate 25G.
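To make the discussion concrete: a minimal sketch of how the bootstrap script could honor a `GARAGE_STORAGE_SIZE` variable from `.env` while keeping the current `100G` as the fallback default. The variable name comes from the thread above; everything else here is illustrative, not the actual patch.

```shell
# Sketch: fall back to the hard-coded 100G when GARAGE_STORAGE_SIZE is unset.
GARAGE_STORAGE_SIZE="${GARAGE_STORAGE_SIZE:-100G}"
echo "Assigning Garage layout with capacity ${GARAGE_STORAGE_SIZE}"
# The actual assignment in bootstrap-garage.sh would then become:
# $garage layout assign -z dc1 -c "$GARAGE_STORAGE_SIZE" "$node_id"
```

As noted above, changing the value after the initial installation would still not be honored unless the layout is explicitly re-assigned.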
sentry/sentry.conf.example.py (outdated)

```python
SENTRY_NODESTORE = "sentry_nodestore_s3.S3PassthroughDjangoNodeStorage"
SENTRY_NODESTORE_OPTIONS = {
    "delete_through": True,
    "write_through": False,
    "read_through": True,
    "compression": False,  # we have compression enabled in Garage itself
    "endpoint_url": "http://garage:3900",
    "bucket_path": "nodestore",
    "bucket_name": "nodestore",
    "retry_attempts": 3,
    "region_name": "garage",
    "aws_access_key_id": "<GARAGE_KEY_ID>",
    "aws_secret_access_key": "<GARAGE_SECRET_KEY>",
}
```
(Docs) should we provide ways for the user to offload these to actual S3 or something?
Maybe something under the "experimental" part?
Probably.
Probably would be better if we put it on the Experimental -> External Storage page. Will backlog that.
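For that docs follow-up, a hedged sketch of what pointing the same node store at actual AWS S3 might look like. The option keys mirror the Garage example in this PR; the AWS endpoint URL, the region, and flipping `compression` on are assumptions to verify against sentry-nodestore-s3, and the `<...>` placeholders are for the operator to fill in.

```python
# Hypothetical variant of the Garage config, targeting AWS S3 instead.
SENTRY_NODESTORE = "sentry_nodestore_s3.S3PassthroughDjangoNodeStorage"
SENTRY_NODESTORE_OPTIONS = {
    "delete_through": True,
    "write_through": False,
    "read_through": True,
    "compression": True,  # no Garage-side compression here, so compress in the node store
    "endpoint_url": "https://s3.us-east-1.amazonaws.com",  # assumed AWS regional endpoint
    "bucket_path": "nodestore",
    "bucket_name": "<YOUR_BUCKET>",
    "retry_attempts": 3,
    "region_name": "us-east-1",
    "aws_access_key_id": "<AWS_ACCESS_KEY_ID>",
    "aws_secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
}
```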
Co-authored-by: Reinaldy Rafli <[email protected]>
Well, I started with that and realized 3 things:
Garage fits the bill much better, as it is explicitly created for smaller setups like this, is easy to expand without specialized roles, doesn't have any paid component, and has much more decent and familiar S3 interface support.
When I tried SeaweedFS last time (and I still use it for sourcemap/profile storage, tbh) it had single-node ability via [...]. Some of them are enabled by default.
I think Garage/MinIO are simpler for small setups; SeaweedFS looks necessary for mid-to-high-scale setups, because all the other services I know keep files as-is. And thousands upon thousands of small files like profiles are not ideal to store on most popular filesystems, I guess.
@doc-sheet Hey, I'm going to work on this PR. I'd think Seaweed is better for self-hosted Sentry. One thing I don't like about Garage is that we need to specify the storage allocation beforehand: if we set it to 100GB, there might be some people that have more data than 100GB, and I don't want that to cause any issues. That said, since you said you've used Seaweed before: how was your experience? How does it compare to MinIO or Ceph?
Yeah, if we set up an object storage, we might as well move filestore & profiles there too. But let's focus on nodestore first.
It is a bit strange sometimes, but it is fine. It has multiple options for the filer store. At first I tried Redis; it worked for several months and then... I just lost all data. I don't know if the issue was in Redis or in weed itself; I suspect a bug with TTL could be the reason too. But after that incident I wiped the cluster and started a new one with Scylla as the filer backend, and it has worked fine for almost a year already, despite that TTL bug.

SeaweedFS has multiple versions, like [...]. I suggest using `large_disk` always. The documentation is not clear, but it is easy to reach that limit. I don't know the difference between `full` and `normal` and just use [...]. Also, I don't use S3 auth; I was too lazy to set it up. Other than all that, I have had no problems and barely touched it after the initial setup. It just works.

As for MinIO and Ceph: [...] But MinIO was the reason to look for alternatives. Tons of profiles from the js-sdk, stored as separate files, started to affect my monitoring script, and soon it might start to affect MinIO performance too. And it is not that easy to scale MinIO, and it is probably impossible to optimize for small-file storage, at least in my low-cost setup.
If SeaweedFS were to control TTL, then there is another catch: weed has its own settings for collections, and it creates a collection for each S3 bucket. But if Sentry itself would clean up old data, I guess there is no difference.
Good to know about Seaweed
Ah so everyone has the same experience with minio.
The Sentry cleanup job only cleans up the one on the filesystem. If we're using S3, it won't clean up anything; we need to configure S3 data cleanup on our own.
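One common way to handle that is an S3 lifecycle rule that expires nodestore objects after a retention window. A minimal sketch; whether Garage honors lifecycle configuration is an assumption to verify, and the 90-day window is arbitrary, not Sentry's actual retention default.

```python
# Hypothetical lifecycle policy: expire every object in the nodestore bucket
# after 90 days, standing in for the filesystem cleanup job.
lifecycle = {
    "Rules": [
        {
            "ID": "expire-nodestore",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix = whole bucket
            "Expiration": {"Days": 90},
        }
    ]
}
# With boto3 this would be applied via:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="nodestore", LifecycleConfiguration=lifecycle
# )
```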
Looks like I missed that SeaweedFS now has the ability to control TTL with the S3 API. And I even linked to the correct section of the FAQ. :) I'd like to look into a new integration with SeaweedFS. And by the way, I like the idea of expanding Sentry images; I myself install some packages and modules. Maybe an extra step in the install to build user-provided Dockerfiles.
Yes, but I don't think people would go for a non-default setup if they don't need it.
Note
This patch may or may not make it to the main branch so please do not rely on this yet. You are, however, free to use it as a blueprint for your own, custom S3 or S3-like variations.
Enables S3 node store using Garage and sentry-nodestore-s3 by @stayallive
This should alleviate all the issues stemming from (ab)using PostgreSQL as the node store.