-
-
Notifications
You must be signed in to change notification settings - Fork 218
[feature] Base setup for WHOIS model #1032 #1033 #1037 #1045 #1054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gsoc25-whois
Are you sure you want to change the base?
Conversation
f20b597
to
5ea2bbe
Compare
5ea2bbe
to
e2fd7fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress.
Please start creating a dedicated docs page for this feature, shouldn't be long, just add basic info for now.
I don't see a way to turn off this feature, we need a setting to turn this off, I recommend also adding a fallback field at organization level to turn it on selectively for specific organizations, see https://openwisp.io/docs/24.11/utils/developer/custom-fields.html#openwisp-utils-fields-fallbackbooleanchoicefield.
if ( | ||
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip) | ||
and self.last_ip | ||
and ip_address(self.last_ip).is_global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could move all this logic to a private method, eg: _needs_whois_update()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the last meeting, should we create separate folders, classes, and functions for the WHOIS functionality to maintain modularity in the OpenWISP codebase?
We could introduce a new class that takes the device instance as input, where methods like _needs_whois_update
, trigger_whois_lookup
, and the WHOIS-related Celery tasks would be defined. Going forward, we would only update this class, and simply call it from the current context.
What are your thoughts on this approach, @DragnEmperor @pandafy @devkapilbansal @nemesifier ?
openwisp_controller/config/tasks.py
Outdated
@@ -157,3 +158,46 @@ def invalidate_device_checksum_view_cache(organization_id): | |||
Device.objects.filter(organization_id=organization_id).only('id').iterator() | |||
): | |||
DeviceChecksumView.invalidate_get_device_cache(device) | |||
|
|||
|
|||
@shared_task(soft_time_limit=7200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the default OpenWISP celery task settings?
@shared_task(soft_time_limit=7200) | |
@shared_task(base=OpenwispCeleryTask) |
I don't see reasons for which this task should take 7200 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DragnEmperor
Do we know how much time if takes to fetch data once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Kapil,
The API responds within few milliseconds : https://support.maxmind.com/hc/en-us/articles/26441247274011-Latency-and-Uptime-for-the-GeoIP-Web-Services
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip) | ||
and self.last_ip | ||
and ip_address(self.last_ip).is_global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also re-order the conditions, in and operations, if first is false then other statements are not evaluated
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip) | |
and self.last_ip | |
and ip_address(self.last_ip).is_global | |
self.last_ip | |
and (not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip) | |
and ip_address(self.last_ip).is_global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in the weekly meeting, I proposed the following:
- if last_ip is not falsy
- and is a public IP
- and:
- the device doesn't have whois yet
- or the device has whois but the ip is changing
Hopefully I am not forgetting anything here.
openwisp_controller/config/tasks.py
Outdated
@@ -157,3 +158,46 @@ def invalidate_device_checksum_view_cache(organization_id): | |||
Device.objects.filter(organization_id=organization_id).only('id').iterator() | |||
): | |||
DeviceChecksumView.invalidate_get_device_cache(device) | |||
|
|||
|
|||
@shared_task(soft_time_limit=7200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DragnEmperor
Do we know how much time if takes to fetch data once?
for a device. | ||
""" | ||
|
||
device = models.OneToOneField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason behind this One to One mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the initial discussions in April, we came to a conclusion that adding whois record per device would reduce bloating which might occur in case we map to ip_address. So this was done in accordance to it.
As per today's discussion and more research regarding the rate limits on webservice, mapping to ip_address adds more benefit. I still want to look into it a bit more, if there can be some cases we might be missing before doing this change so as to have a clear picture. Adding a link to the discussion : #1032
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important point to check for switching to using the ip address as primary key for this model is how to make sure we don't leave orphan data in the DB when devices are deleted. Worst case scenario we need to handle this with a post_delete
signal receiver function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated gsoc25-whois to keep it up to date (let's do that as often as possible please).
Could you run openwisp-qa-format
again and merge with the latest version of the target branch?
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip) | ||
and self.last_ip | ||
and ip_address(self.last_ip).is_global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in the weekly meeting, I proposed the following:
- if last_ip is not falsy
- and is a public IP
- and:
- the device doesn't have whois yet
- or the device has whois but the ip is changing
Hopefully I am not forgetting anything here.
for a device. | ||
""" | ||
|
||
device = models.OneToOneField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important point to check for switching to using the ip address as primary key for this model is how to make sure we don't leave orphan data in the DB when devices are deleted. Worst case scenario we need to handle this with a post_delete
signal receiver function.
help_text=_( | ||
'indicates the IP address logged from ' | ||
'the last request coming from the device' | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can get rid of this help text as we won't show this in the admin I believe as it's going to be redundant with last_ip
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add basic validation which raises a ValidationError if someone tries to save this with a private IP.
mock_response.traits.autonomous_system_number = 15169 | ||
mock_response.traits.network = '172.217.22.0/24' | ||
mock_response.location.time_zone = 'America/Los_Angeles' | ||
mock_client.return_value.city.return_value = mock_response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If more convenient we could use https://github.com/kevin1024/vcrpy for mocking API responses.
It would make test if we need to do this over and over.
# Triggering the whois lookup for new devices and old devices based | ||
# on last_ip is changed or not | ||
self.trigger_whois_lookup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment would be better placed in the function’s docstring. Additionally, we can make it configurable and trigger it only when the feature is enabled.
if ( | ||
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip) | ||
and self.last_ip | ||
and ip_address(self.last_ip).is_global |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the last meeting, should we create separate folders, classes, and functions for the WHOIS functionality to maintain modularity in the OpenWISP codebase?
We could introduce a new class that takes the device instance as input, where methods like _needs_whois_update
, trigger_whois_lookup
, and the WHOIS-related Celery tasks would be defined. Going forward, we would only update this class, and simply call it from the current context.
What are your thoughts on this approach, @DragnEmperor @pandafy @devkapilbansal @nemesifier ?
GEOIP_ACCOUNT_ID = get_setting('GEOIP_ACCOUNT_ID', None) | ||
GEOIP_LICENSE_KEY = get_setting('GEOIP_LICENSE_KEY', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add these settings in docs and steps to get these values.
openwisp_controller/config/tasks.py
Outdated
address = ', '.join( | ||
[ | ||
data.city.name, | ||
data.country.name, | ||
data.continent.name, | ||
str(data.postal.code), | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we concatenate all of this into a single field, or store each component in separate fields? If adding more fields isn’t ideal, we could consider using a jsonb field instead.
By the way, what would be the maximum length after concatenation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, storing the address as JSONField is also an option. Gives the flexibility to construct complete address as per requirement. We can create a helper to provide a standard constructed address as above.
The max length is : 130 (Country : 44 city : 64 continent : 14 postal : 8). I queried the CSV version of Maxmind's DB for this.
openwisp_controller/config/tasks.py
Outdated
except requests.RequestException as e: | ||
logger.error(f'Error fetching WHOIS details for {ip}: {e}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s monitor for request exceptions. If it’s a rate limit issue, we can retry the request after a delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s monitor for request exceptions. If it’s a rate limit issue, we can retry the request after a delay.
It depends if the rate limiting is daily or for bursts.
Added WHOIS model with the required fields. Implemented a new celery task to fetch WHOIS details using the `geoip2` web service. In order to trigger the task whenever `last_ip` changes, using existing logic of `_changed_fields` to track the changes. Closes #1032 Closes #1033 Signed-off-by: DragnEmperor <[email protected]>
Added tests for checking if `fetch_whois_details` task is called properly or not, and for checking creation of WHOIS record for a device if the last_ip recorded is public. As now whois record is also fetched whenever fetching a device, there is an increase in query count in some of the tests. Closes #1045 Signed-off-by: DragnEmperor <[email protected]>
Closes #1037 Signed-off-by: DragnEmperor <[email protected]>
e2fd7fc
to
e4a6972
Compare
@@ -335,7 +337,7 @@ def _get_initial_values_for_checked_fields(self): | |||
if not present_values: | |||
return | |||
self.refresh_from_db(fields=present_values.keys()) | |||
for field in self._changed_checked_fields: | |||
for field in present_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to add a comment here, will add it in next commit :
Let me know if I am wrong, but we are adding fields in present_values
based on certain conditions just above, so shouldn't we loop in present_values
instead of _changed_checked_fields
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure on this but it should work
Using JSONField for address gives the flexibility to format address as per convenience. Added fallbacks for typical address fields in `fetch_whois_details`. Updated docs for WhoIs feature with steps to obtain and setup the required Credentials. Signed-off-by: DragnEmperor <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have few doubts that I asked in below comments.
docs/user/whois.rst
Outdated
Setup | ||
----- | ||
|
||
Ensure the your project ``settings.py`` contains the variables as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure the your project ``settings.py`` contains the variables as follows: | |
Ensure that your project ``settings.py`` contains the variables as follows: |
docs/user/whois.rst
Outdated
|
||
.. code-block:: python | ||
|
||
GEOIP_ACCOUNT_ID = os.getenv("GEOIP_ACCOUNT_ID", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID
?
GEOIP_ACCOUNT_ID = os.getenv("GEOIP_ACCOUNT_ID", "") | |
OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID = os.getenv("OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID", "") |
Same for below as well
docs/user/whois.rst
Outdated
- Generate a New license Key. Name it whatever you like .. image:: | ||
https://github.com/user-attachments/assets/57df27bc-4f9d-4701-88bf-91e6b715e4a6 | ||
- Copy the *Account Id* and *License Key* and Paste it in the environment | ||
variables: **GEOIP_ACCOUNT_ID** and **GEOIP_LICENSE_KEY**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variables: **GEOIP_ACCOUNT_ID** and **GEOIP_LICENSE_KEY**. | |
variables: **OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID** and **OPENWISP_CONTROLLER_GEOIP_LICENSE_KEY**. |
return ( | ||
getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED) | ||
and new_ip | ||
and ip_address(new_ip).is_global | ||
and (not hasattr(self, "whois_info") or old_ip != new_ip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make calls only when all other conditions are met
return ( | |
getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED) | |
and new_ip | |
and ip_address(new_ip).is_global | |
and (not hasattr(self, "whois_info") or old_ip != new_ip) | |
return ( | |
getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED) | |
and new_ip | |
and (not hasattr(self, "whois_info") or old_ip != new_ip) | |
and ip_address(new_ip).is_global |
openwisp_controller/config/tasks.py
Outdated
# the actual delay is calculated as a random value between 0 and this value. | ||
# https://docs.celeryq.dev/en/latest/userguide/tasks.html#Task.autoretry_for | ||
retry_backoff = 10 | ||
max_retries = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we keep this configurable as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found something that we are using to define these options for zerotier tasks:
API_TASK_RETRY_OPTIONS = get_setting(
"API_TASK_RETRY_OPTIONS",
dict(max_retries=5, retry_backoff=True, retry_backoff_max=600, retry_jitter=True),
)
Is it fine if I use this or are we looking for creating these settings dedicated for WhoIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can use it.
@codesankalp What are your opinions?
openwisp_controller/config/tasks.py
Outdated
# Check if WhoIs lookup is enabled for the organization | ||
org_settings = device._get_organization__config_settings() | ||
if not getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED): | ||
logger.info( | ||
f"WhoIs lookup is disabled for organization {device.organization_id}." | ||
) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the first check? Also, I doubt it this check will ever be executed since we already have this condition in _need_whois_lookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need device in order to fetch organization so thats why it is after device fetch.
Yes, this function is currently executed only via trigger_whois_lookup
in device model, but imo its better to add the checks here as well, its a personal preference.
Let me know if these checks needs to be removed (including the device exists check, since that will also be guaranteed to exist as we are running the task via transaction.on_commit
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devkapilbansal i have updated the above comment, please take a look at your convenience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo its better to add the checks here as well
Why do you think so? Like, what benefits it may serve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo its better to add the checks here as well
Why do you think so? Like, what benefits it may serve?
I think keeping the critical checks (whois enabled or not in above), is better for cases like the task gets delayed and in between its next execution the feature gets disabled or device gets deleted then we will have these checks to ensure the task returns before fetching data.
Again, it is a personal preference and I can change it if these checks seem redundant.
Let me know your thoughts!
Edit: Now that we are going to map to ip_address, we will not require the device check and thus these checks seem redundant, but again let me know if the above case is valid or not.
openwisp_controller/config/tasks.py
Outdated
return | ||
|
||
try: | ||
# 'geolite.info' host is used for GeoLite2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reasons for the same? Comments should basically focus on why instead of what
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add the link to docs here, something like this:
For reference : https://geoip2.readthedocs.io/en/latest/#sync-web-service-example
docs/user/whois.rst
Outdated
- Create Maxmind account using the following link: `Create Account | ||
<https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's remove the query params:
- Create Maxmind account using the following link: `Create Account | |
<https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account>`_. | |
- Create Maxmind account using the following link: `Create Account | |
<https://www.maxmind.com/en/geolite2/signup`_. |
To reduce data redundancy for duplicate IPs, WhoIs model now uses IP address as primary key and devices are mapped to it via manual lookups. Updated the test cases to reflect the same. Formatting changes in WhoIs doc page and settings variables. Signed-off-by: DragnEmperor <[email protected]>
To improve modularity and flexibility, all WhoIs related code is kept under `whois` subdirectory under `config` module. This includes tests, tasks, on delete handlers as well. Added `cache` to prevent repeated lookups during multiple WhoIs details fetch during device listing. Signed-off-by: DragnEmperor <[email protected]>
@property | ||
def whois_info(self): | ||
""" | ||
Used as a shortcut to get WhoIsInfo | ||
""" | ||
service = WhoIsService(self) | ||
return service.get_whois_info() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s better to initialize the service using a cached_property
. This way, you can access it directly from the device with device.who_is_service
and call all its methods as needed from other places
@property | |
def whois_info(self): | |
""" | |
Used as a shortcut to get WhoIsInfo | |
""" | |
service = WhoIsService(self) | |
return service.get_whois_info() | |
from functools import cached_property | |
@cached_property | |
def who_is_service(self): | |
return WhoIsService(self) |
service = WhoIsService(self) | ||
service.trigger_whois_lookup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
service = WhoIsService(self) | |
service.trigger_whois_lookup() | |
self.who_is_service.trigger_whois_lookup() |
openwisp_controller/config/utils.py
Outdated
elif not getattr(device, "whois_info", None): | ||
service = WhoIsService(device) | ||
service.trigger_whois_lookup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif not getattr(device, "whois_info", None): | |
service = WhoIsService(device) | |
service.trigger_whois_lookup() | |
elif not device.who_is_service.get_whois_info(): | |
device.who_is_service.trigger_whois_lookup() |
# This file is created to allow test runners to discover the tests | ||
# defined in this directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add this comment
# If other active devices are linked to it, then new lookup will | ||
# be triggered for them. | ||
def device_whois_info_delete_handler(instance, **kwargs): | ||
transaction.on_commit(lambda: WhoIsService.delete_whois_record(instance.last_ip)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be sync, or can it be async?
asking because it is adding some time in device save operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make this function as a separate task to make the deletion non-blocking.
I was also using the same function in fetch_whois_details
task, so there won't be a problem as I can call it directly without .delay
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should keep it async
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay!
ip_client = geoip2_webservice.Client( | ||
account_id=app_settings.GEOIP_ACCOUNT_ID, | ||
license_key=app_settings.GEOIP_LICENSE_KEY, | ||
host="geolite.info", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this client can be a separate method in service.
@@ -0,0 +1,103 @@ | |||
from ipaddress import ip_address |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file name can be service.py
ip = self.device.last_ip | ||
if not ip or not ip_address(ip).is_global: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check can be on top to avoid extra db query.
|
||
def __init__(self, device): | ||
self.device = device | ||
self.org_settings = device._get_organization__config_settings() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only load the configuration when it’s actually needed. you can load it in respective method.
""" | ||
whois_info = self.get_whois_info() | ||
return ( | ||
getattr(self.org_settings, "whois_enabled", WHOIS_ENABLED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be in the last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sankalp,
Why it should be the last? I believe that third party calls should come at last so that we can prevent calling them in case other conditions are not met
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this line, it uses the ipaddress package to confirm the IP isn’t private.
Since we already fetched the WhoIs info earlier (which might involve an external lookup), we can just check if the IP is private here and return immediately if it is. This avoids any extra checks.
Also, reviewing both functions again, I notice there’s duplicated logic — this could be refactored - @DragnEmperor
@@ -335,7 +337,7 @@ def _get_initial_values_for_checked_fields(self): | |||
if not present_values: | |||
return | |||
self.refresh_from_db(fields=present_values.keys()) | |||
for field in self._changed_checked_fields: | |||
for field in present_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure on this but it should work
if ip_address(self.ip_address).is_private: | ||
raise ValidationError( | ||
_("WhoIs information cannot be created for private IP addresses.") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this condition required? We are only creating WhoIs Info if IP is not private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to add a validation check in model itself : #1054 (comment)
@@ -181,7 +181,7 @@ def test_get_cached_checksum(self): | |||
url = reverse("controller:device_checksum", args=[d.pk]) | |||
|
|||
with self.subTest("first request does not return value from cache"): | |||
with self.assertNumQueries(4): | |||
with self.assertNumQueries(3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might have to check but do you know why the number of queries have been reduced here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reduced count was due to prefetched organization settings when device was retrieved.
# If other active devices are linked to it, then new lookup will | ||
# be triggered for them. | ||
def device_whois_info_delete_handler(instance, **kwargs): | ||
transaction.on_commit(lambda: WhoIsService.delete_whois_record(instance.last_ip)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should keep it async
# slots are used to define a fixed set of attributes for the class, | ||
# which can improve performance by avoiding the overhead of a dynamic dictionary. | ||
# Particularly useful in scenarios where many instances of the class are created. | ||
# Reference: https://wiki.python.org/moin/UsingSlots | ||
__slots__ = ("device", "org_settings") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are creating a cached property, do we need to define slots? I am not sure of their relevance here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially there was no cached property here, but working on the changes I found one property 'is_whois_enabled' (will be added in the new commit) which we can cache, so removing slots for the time being.
Thanks!
|
||
whois_info = WhoIsInfo.objects.filter(ip_address=ip).first() | ||
if whois_info: | ||
cache.set(key, whois_info, timeout=60 * 60 * 24) # Cache for 24 hours |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this
# The function is staticmethod because it does not depend on the instance | ||
# of the class and deletes the WhoIs record for a given IP address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not necessary here
# The function is staticmethod because it does not depend on the instance | ||
# of the class and deletes the WhoIs record for a given IP address. | ||
@staticmethod | ||
def delete_whois_record(ip_address): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we using this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When deleting record along with device in device_whois_info_delete_handler
and in fetch_whois_details
when deleting old whois record (refering to commit : 0b1bef4)
device_pk = kwargs.get("device_pk") | ||
new_ip_address = kwargs.get("new_ip_address") | ||
device = Device.objects.get(pk=device_pk) | ||
# Notify the user about the failure via web notification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not needed
Moved all tasks to service file to keep one source. Refactored the way whois_info is fetcehd for a device. Modified the order of whois checks to keep expensive checks at the bottom. As in views like `DeviceChecksumView` whole device instance is being cached, now org settings will be fetched from db rather than through device to maintain consistency. This has led to increase in queries in some test cases. Signed-off-by: DragnEmperor <[email protected]>
Org config settings are now cached to ensure DeviceChecksumview is not degraded. This will be invalidated on org settings update/delete which are rare. Refactored code and tests for readability. Signed-off-by: DragnEmperor <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick reqview on he caching of OrganizationConfigSettings and shared few suggestions. I haven't performed manual testing.
# Remove the related WhoIs record for that ip from db and cache | ||
# If other active devices are linked to it, then new lookup will | ||
# be triggered for them. | ||
def device_who_is_info_delete_handler(instance, **kwargs): | ||
transaction.on_commit( | ||
lambda: WhoIsService.delete_who_is_record.delay(instance.last_ip) | ||
) | ||
|
||
|
||
def invalidate_org_settings_cache(instance, **kwargs): | ||
org_pk = instance.organization_id | ||
cache.delete(WhoIsService.ORG_SETTINGS_CACHE_KEY.format(org_pk=org_pk)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move these to class methods in the Whois model? In general, we want to keep all business logic in the models.
org_settings = OrganizationConfigSettings.objects.get( | ||
organization=Subquery( | ||
Device.objects.filter(pk=self.device.pk).values( | ||
"organization_id" | ||
)[:1] | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query seems over-complicated to me.
We already have the required org_pk
. Why can't we do this?
OrganizationConfiguSettings.objects.get(organization_id=org_pk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I had a case in mind what would happen if device's org is changed, so i did this to ensure latest value is fetched. I missed that we are invalidating the cache on device save :
post_save.connect( |
It was my mistake, I will remove this subquery. Thanks!
organization=Subquery( | ||
Device.objects.filter(pk=device_pk).values("organization_id")[:1] | ||
org_pk = self.device.organization.pk | ||
org_settings = cache.get(self.ORG_SETTINGS_CACHE_KEY.format(org_pk=org_pk)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For keeping it DRY (Don't Repeat Yourself), let's create a static method get_cache_key(org_id)
which would return the cache key for the organization.
# device_pk is used when task fails to report for which device failure occurred | ||
@shared_task( | ||
bind=True, | ||
base=WhoIsCeleryRetryTask, | ||
**app_settings.API_TASK_RETRY_OPTIONS, | ||
) | ||
def fetch_who_is_details(self, device_pk, initial_ip_address, new_ip_address): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is celery worker picking up this task? AFAIK, it only registers task which are present in tasks,py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c24fec3
to
5f91215
Compare
Migrated all receivers related to WhoIs to the model class. Includes minor refactoring as well. Signed-off-by: DragnEmperor <[email protected]>
5f91215
to
5539474
Compare
The imports are restructured to avoid app registry errors and aid whois task discovery. Signed-off-by: DragnEmperor <[email protected]>
10332b3
to
59f97bd
Compare
There was an issue that the task was triggered for a device whose last_ip is updated but without checking if we already have WhoIs for the latest ip. Have added a check to return from the task if that is the case. Signed-off-by: DragnEmperor <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DragnEmperor – I followed the docs and ran this locally; it worked perfectly 👏.
I think we can deploy this on staging environment.
Just a few minor suggestions below 👇
elif not device.who_is_service.get_device_who_is_info(): | ||
device.who_is_service.trigger_who_is_lookup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we do this when flag is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @codesankalp ,we are talking about the whois_enabled flag, right? The trigger_whois_lookup task has the conditions to check for all these.
return ( | ||
self.is_valid_public_ip_address(new_ip) | ||
and (initial_ip != new_ip or not self._get_who_is_info_from_db(new_ip)) | ||
and self.is_who_is_enabled | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic is complex, i think we can separate it into multiple if statement for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have split it into multiple lines as:
def _need_who_is_lookup(self, initial_ip, new_ip):
"""
This is used to determine if the WhoIs lookup should be triggered
when the device is saved.
The lookup is not triggered if:
- The new IP address is None or it is a private IP address.
- The WhoIs information of new ip is already present.
- WhoIs is disabled in the organization settings. (query from db)
"""
if not self.is_valid_public_ip_address(new_ip):
return False
if self._get_who_is_info_from_db(new_ip):
return False
return self.is_who_is_enabled
I modified the second conditions as I found that we need not to check if ip is updated or not, checking if it already exists in db should be enough.
WhoIsInfo = load_model("config", "WhoIsInfo") | ||
|
||
try: | ||
ip_client = WhoIsService._get_geoip2_client() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client initialization can be outside of this block
The checks in `_need_who_is_lookup` are split for better readability Signed-off-by: DragnEmperor <[email protected]>
07c94fb
to
18f67d6
Compare
WhoIsInfo.objects.create( | ||
organization_name=data.traits.autonomous_system_organization, | ||
asn=data.traits.autonomous_system_number, | ||
country=data.country.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is correct? I see an error on the staging VM:
value too long for type character varying(4)
For my IP address, the value of data.country.name
is India
. Did you mean to store a country code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was to store country code. As sqlite does not enforce this so was not able to catch this. Thanks @pandafy
Signed-off-by: DragnEmperor <[email protected]>
3e739d6
to
774ab9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a lot of usage of WHO_IS_ENABLED
, we want to make sure if we set it to False most of if not all of the whois code is not executed.
I am also reasoning on the possibility of making the geoip dependency optional. I need to analyze the stability and other dependencies of this library before deciding, but in the meantime I wanted to share this view to make all of us aware of this.
@@ -67,3 +67,6 @@ def get_setting(option, default): | |||
"API_TASK_RETRY_OPTIONS", | |||
dict(max_retries=5, retry_backoff=True, retry_backoff_max=600, retry_jitter=True), | |||
) | |||
GEOIP_ACCOUNT_ID = get_setting("GEOIP_ACCOUNT_ID", None) | |||
GEOIP_LICENSE_KEY = get_setting("GEOIP_LICENSE_KEY", None) | |||
WHO_IS_ENABLED = get_setting("WHO_IS_ENABLED", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is t true, we need to assert that GEOIP_ACCOUNT_ID and GEOIP_LICENSE_KEY are set (not falsy) otherwise we should raise ImproperlyConfigured
.
Checklist
Reference to Existing Issue
Closes #1032
Closes #1033
Closes #1037
Closes #1045
Description of Changes
Added a new model to store WHOIS details for a device with the required fields.
Implemented a new celery task to fetch WHOIS details using the
geoip2
web service.Using an external library instead of the wrapper class provided by django because the wrapper class is designed to work with local databases and also is not currently designed to work with ASN databases.
In order to trigger the task whenever
last_ip
changes, using existing logic of_changed_fields
to track the changes.Steps to obtain GeoIP creds
Create Maxmind account using the following link: https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account
Go to

Manage License Keys
Generate a New license Key. Name it whatever you like

Copy the Account Id and License Key and Paste it in the environment variables:
GEOIP_ACCOUNT_ID
andGEOIP_LICENSE_KEY
.