Skip to content

[feature] Base setup for WHOIS model #1032 #1033 #1037 #1045 #1054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: gsoc25-whois
Choose a base branch
from

Conversation

DragnEmperor
Copy link
Member

@DragnEmperor DragnEmperor commented May 23, 2025

Checklist

  • I have read the OpenWISP Contributing Guidelines.
  • I have manually tested the changes proposed in this pull request.
  • I have written new test cases for new code and/or updated existing tests for changes to existing code.
  • I have updated the documentation.

Reference to Existing Issue

Closes #1032
Closes #1033
Closes #1037
Closes #1045

Description of Changes

Added a new model to store WHOIS details for a device with the required fields.
Implemented a new celery task to fetch WHOIS details using the geoip2 web service.
Using an external library instead of the wrapper class provided by django because the wrapper class is designed to work with local databases and also is not currently designed to work with ASN databases.
In order to trigger the task whenever last_ip changes, using existing logic of _changed_fields to track the changes.

Steps to obtain GeoIP creds

  1. Create Maxmind account using the following link: https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account

  2. Go to Manage License Keys
    image

  3. Generate a New license Key. Name it whatever you like
    image

  4. Copy the Account Id and License Key and Paste it in the environment variables: GEOIP_ACCOUNT_ID and GEOIP_LICENSE_KEY.

@nemesifier nemesifier added gsoc Part of a Google Summer of Code project enhancement labels May 23, 2025
@DragnEmperor DragnEmperor marked this pull request as ready for review May 26, 2025 11:04
@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from f20b597 to 5ea2bbe Compare May 26, 2025 11:05
@DragnEmperor DragnEmperor changed the title [feature] Base setup for WHOIS model #1032 #1033 [feature] Base setup for WHOIS model #1032 #1033 #1045 May 26, 2025
@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from 5ea2bbe to e2fd7fc Compare May 26, 2025 17:05
Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress.

Please start creating a dedicated docs page for this feature, shouldn't be long, just add basic info for now.

I don't see a way to turn off this feature, we need a setting to turn this off, I recommend also adding a fallback field at organization level to turn it on selectively for specific organizations, see https://openwisp.io/docs/24.11/utils/developer/custom-fields.html#openwisp-utils-fields-fallbackbooleanchoicefield.

if (
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip)
and self.last_ip
and ip_address(self.last_ip).is_global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could move all this logic to a private method, eg: _needs_whois_update()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the last meeting, should we create separate folders, classes, and functions for the WHOIS functionality to maintain modularity in the OpenWISP codebase?

We could introduce a new class that takes the device instance as input, where methods like _needs_whois_update, trigger_whois_lookup, and the WHOIS-related Celery tasks would be defined. Going forward, we would only update this class, and simply call it from the current context.

What are your thoughts on this approach, @DragnEmperor @pandafy @devkapilbansal @nemesifier ?

@@ -157,3 +158,46 @@ def invalidate_device_checksum_view_cache(organization_id):
Device.objects.filter(organization_id=organization_id).only('id').iterator()
):
DeviceChecksumView.invalidate_get_device_cache(device)


@shared_task(soft_time_limit=7200)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the default OpenWISP celery task settings?

Suggested change
@shared_task(soft_time_limit=7200)
@shared_task(base=OpenwispCeleryTask)

I don't see reasons for which this task should take 7200 seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DragnEmperor
Do we know how much time if takes to fetch data once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 388 to 390
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip)
and self.last_ip
and ip_address(self.last_ip).is_global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also re-order the conditions, in and operations, if first is false then other statements are not evaluated

Suggested change
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip)
and self.last_ip
and ip_address(self.last_ip).is_global
self.last_ip
and (not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip)
and ip_address(self.last_ip).is_global

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in the weekly meeting, I proposed the following:

  • if last_ip is not falsy
  • and is a public IP
  • and:
    • the device doesn't have whois yet
    • or the device has whois but the ip is changing

Hopefully I am not forgetting anything here.

@@ -157,3 +158,46 @@ def invalidate_device_checksum_view_cache(organization_id):
Device.objects.filter(organization_id=organization_id).only('id').iterator()
):
DeviceChecksumView.invalidate_get_device_cache(device)


@shared_task(soft_time_limit=7200)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DragnEmperor
Do we know how much time if takes to fetch data once?

for a device.
"""

device = models.OneToOneField(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason behind this One to One mapping?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the initial discussions in April, we came to a conclusion that adding whois record per device would reduce bloating which might occur in case we map to ip_address. So this was done in accordance to it.

As per today's discussion and more research regarding the rate limits on webservice, mapping to ip_address adds more benefit. I still want to look into it a bit more, if there can be some cases we might be missing before doing this change so as to have a clear picture. Adding a link to the discussion : #1032

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important point to check for switching to using the ip address as primary key for this model is how to make sure we don't leave orphan data in the DB when devices are deleted. Worst case scenario we need to handle this with a post_delete signal receiver function.

Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated gsoc25-whois to keep it up to date (let's do that as often as possible please).
Could you run openwisp-qa-format again and merge with the latest version of the target branch?

Comment on lines 388 to 390
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip)
and self.last_ip
and ip_address(self.last_ip).is_global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in the weekly meeting, I proposed the following:

  • if last_ip is not falsy
  • and is a public IP
  • and:
    • the device doesn't have whois yet
    • or the device has whois but the ip is changing

Hopefully I am not forgetting anything here.

for a device.
"""

device = models.OneToOneField(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important point to check for switching to using the ip address as primary key for this model is how to make sure we don't leave orphan data in the DB when devices are deleted. Worst case scenario we need to handle this with a post_delete signal receiver function.

help_text=_(
'indicates the IP address logged from '
'the last request coming from the device'
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of this help text as we won't show this in the admin I believe as it's going to be redundant with last_ip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add basic validation which raises a ValidationError if someone tries to save this with a private IP.

mock_response.traits.autonomous_system_number = 15169
mock_response.traits.network = '172.217.22.0/24'
mock_response.location.time_zone = 'America/Los_Angeles'
mock_client.return_value.city.return_value = mock_response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If more convenient we could use https://github.com/kevin1024/vcrpy for mocking API responses.
It would make test if we need to do this over and over.

Comment on lines 288 to 291
# Triggering the whois lookup for new devices and old devices based
# on last_ip is changed or not
self.trigger_whois_lookup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment would be better placed in the function’s docstring. Additionally, we can make it configurable and trigger it only when the feature is enabled.

if (
(not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip)
and self.last_ip
and ip_address(self.last_ip).is_global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the last meeting, should we create separate folders, classes, and functions for the WHOIS functionality to maintain modularity in the OpenWISP codebase?

We could introduce a new class that takes the device instance as input, where methods like _needs_whois_update, trigger_whois_lookup, and the WHOIS-related Celery tasks would be defined. Going forward, we would only update this class, and simply call it from the current context.

What are your thoughts on this approach, @DragnEmperor @pandafy @devkapilbansal @nemesifier ?

Comment on lines 70 to 71
GEOIP_ACCOUNT_ID = get_setting('GEOIP_ACCOUNT_ID', None)
GEOIP_LICENSE_KEY = get_setting('GEOIP_LICENSE_KEY', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add these settings in docs and steps to get these values.

Comment on lines 178 to 245
address = ', '.join(
[
data.city.name,
data.country.name,
data.continent.name,
str(data.postal.code),
]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we concatenate all of this into a single field, or store each component in separate fields? If adding more fields isn’t ideal, we could consider using a jsonb field instead.

By the way, what would be the maximum length after concatenation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, storing the address as JSONField is also an option. Gives the flexibility to construct complete address as per requirement. We can create a helper to provide a standard constructed address as above.

The max length is : 130 (Country : 44 city : 64 continent : 14 postal : 8). I queried the CSV version of Maxmind's DB for this.

Comment on lines 200 to 201
except requests.RequestException as e:
logger.error(f'Error fetching WHOIS details for {ip}: {e}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s monitor for request exceptions. If it’s a rate limit issue, we can retry the request after a delay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s monitor for request exceptions. If it’s a rate limit issue, we can retry the request after a delay.

It depends if the rate limiting is daily or for bursts.

Added WHOIS model with the required fields.
Implemented a new celery task to fetch WHOIS
details using the `geoip2` web service.
In order to trigger the task whenever `last_ip`
changes, using existing logic of `_changed_fields`
to track the changes.

Closes #1032
Closes #1033

Signed-off-by: DragnEmperor <[email protected]>
Added tests for checking if `fetch_whois_details` task
is called properly or not, and for checking creation
of WHOIS record for a device if the last_ip recorded
is public.
As now whois record is also fetched whenever fetching
a device, there is an increase in query count in some
of the tests.

Closes #1045

Signed-off-by: DragnEmperor <[email protected]>
@devkapilbansal devkapilbansal self-requested a review May 29, 2025 19:15
@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from e2fd7fc to e4a6972 Compare May 29, 2025 20:39
@DragnEmperor DragnEmperor changed the title [feature] Base setup for WHOIS model #1032 #1033 #1045 [feature] Base setup for WHOIS model #1032 #1033 #1037 #1045 May 29, 2025
@@ -335,7 +337,7 @@ def _get_initial_values_for_checked_fields(self):
if not present_values:
return
self.refresh_from_db(fields=present_values.keys())
for field in self._changed_checked_fields:
for field in present_values:
Copy link
Member Author

@DragnEmperor DragnEmperor May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to add a comment here, will add it in next commit :

Let me know if I am wrong, but we are adding fields in present_values based on certain conditions just above, so shouldn't we loop in present_values instead of _changed_checked_fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure on this but it should work

Using JSONField for address gives the flexibility to
format address as per convenience. Added fallbacks
for typical address fields in `fetch_whois_details`.
Updated docs for WhoIs feature with steps to obtain
and setup the required Credentials.

Signed-off-by: DragnEmperor <[email protected]>
Copy link
Member

@devkapilbansal devkapilbansal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have few doubts that I asked in below comments.

Setup
-----

Ensure the your project ``settings.py`` contains the variables as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ensure the your project ``settings.py`` contains the variables as follows:
Ensure that your project ``settings.py`` contains the variables as follows:


.. code-block:: python

GEOIP_ACCOUNT_ID = os.getenv("GEOIP_ACCOUNT_ID", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID?

Suggested change
GEOIP_ACCOUNT_ID = os.getenv("GEOIP_ACCOUNT_ID", "")
OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID = os.getenv("OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID", "")

Same for below as well

- Generate a New license Key. Name it whatever you like .. image::
https://github.com/user-attachments/assets/57df27bc-4f9d-4701-88bf-91e6b715e4a6
- Copy the *Account Id* and *License Key* and Paste it in the environment
variables: **GEOIP_ACCOUNT_ID** and **GEOIP_LICENSE_KEY**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
variables: **GEOIP_ACCOUNT_ID** and **GEOIP_LICENSE_KEY**.
variables: **OPENWISP_CONTROLLER_GEOIP_ACCOUNT_ID** and **OPENWISP_CONTROLLER_GEOIP_LICENSE_KEY**.

Comment on lines 386 to 390
return (
getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED)
and new_ip
and ip_address(new_ip).is_global
and (not hasattr(self, "whois_info") or old_ip != new_ip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make calls only when all other conditions are met

Suggested change
return (
getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED)
and new_ip
and ip_address(new_ip).is_global
and (not hasattr(self, "whois_info") or old_ip != new_ip)
return (
getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED)
and new_ip
and (not hasattr(self, "whois_info") or old_ip != new_ip)
and ip_address(new_ip).is_global

# the actual delay is calculated as a random value between 0 and this value.
# https://docs.celeryq.dev/en/latest/userguide/tasks.html#Task.autoretry_for
retry_backoff = 10
max_retries = 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we keep this configurable as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found something that we are using to define these options for zerotier tasks:

API_TASK_RETRY_OPTIONS = get_setting(
    "API_TASK_RETRY_OPTIONS",
    dict(max_retries=5, retry_backoff=True, retry_backoff_max=600, retry_jitter=True),
)

Is it fine if I use this or are we looking for creating these settings dedicated for WhoIs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can use it.
@codesankalp What are your opinions?

Comment on lines 222 to 228
# Check if WhoIs lookup is enabled for the organization
org_settings = device._get_organization__config_settings()
if not getattr(org_settings, "whois_enabled", app_settings.WHOIS_ENABLED):
logger.info(
f"WhoIs lookup is disabled for organization {device.organization_id}."
)
return
Copy link
Member

@devkapilbansal devkapilbansal Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be the first check? Also, I doubt it this check will ever be executed since we already have this condition in _need_whois_lookup

Copy link
Member Author

@DragnEmperor DragnEmperor Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need device in order to fetch organization so thats why it is after device fetch.
Yes, this function is currently executed only via trigger_whois_lookup in device model, but imo its better to add the checks here as well, its a personal preference.
Let me know if these checks needs to be removed (including the device exists check, since that will also be guaranteed to exist as we are running the task via transaction.on_commit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devkapilbansal i have updated the above comment, please take a look at your convenience

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo its better to add the checks here as well

Why do you think so? Like, what benefits it may serve?

Copy link
Member Author

@DragnEmperor DragnEmperor Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo its better to add the checks here as well

Why do you think so? Like, what benefits it may serve?

I think keeping the critical checks (whois enabled or not in above), is better for cases like the task gets delayed and in between its next execution the feature gets disabled or device gets deleted then we will have these checks to ensure the task returns before fetching data.
Again, it is a personal preference and I can change it if these checks seem redundant.

Let me know your thoughts!

Edit: Now that we are going to map to ip_address, we will not require the device check and thus these checks seem redundant, but again let me know if the above case is valid or not.

return

try:
# 'geolite.info' host is used for GeoLite2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reasons for the same? Comments should basically focus on why instead of what

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add the link to docs here, something like this:
For reference : https://geoip2.readthedocs.io/en/latest/#sync-web-service-example

Comment on lines 34 to 35
- Create Maxmind account using the following link: `Create Account
<https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove the query params:

Suggested change
- Create Maxmind account using the following link: `Create Account
<https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account>`_.
- Create Maxmind account using the following link: `Create Account
<https://www.maxmind.com/en/geolite2/signup`_.

To reduce data redundancy for duplicate IPs,
WhoIs model now uses IP address as primary key
and devices are mapped to it via manual lookups.
Updated the test cases to reflect the same.
Formatting changes in WhoIs doc page and
settings variables.

Signed-off-by: DragnEmperor <[email protected]>
To improve modularity and flexibility, all
WhoIs related code is kept under `whois` subdirectory
under `config` module.
This includes tests, tasks, on delete handlers as well.
Added `cache` to prevent repeated lookups during
multiple WhoIs details fetch during device listing.

Signed-off-by: DragnEmperor <[email protected]>
Comment on lines 427 to 433
@property
def whois_info(self):
"""
Used as a shortcut to get WhoIsInfo
"""
service = WhoIsService(self)
return service.get_whois_info()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s better to initialize the service using a cached_property. This way, you can access it directly from the device with device.who_is_service and call all its methods as needed from other places

Suggested change
@property
def whois_info(self):
"""
Used as a shortcut to get WhoIsInfo
"""
service = WhoIsService(self)
return service.get_whois_info()
from functools import cached_property
@cached_property
def who_is_service(self):
return WhoIsService(self)

Comment on lines 384 to 385
service = WhoIsService(self)
service.trigger_whois_lookup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
service = WhoIsService(self)
service.trigger_whois_lookup()
self.who_is_service.trigger_whois_lookup()

Comment on lines 79 to 81
elif not getattr(device, "whois_info", None):
service = WhoIsService(device)
service.trigger_whois_lookup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif not getattr(device, "whois_info", None):
service = WhoIsService(device)
service.trigger_whois_lookup()
elif not device.who_is_service.get_whois_info():
device.who_is_service.trigger_whois_lookup()

Comment on lines 1 to 2
# This file is created to allow test runners to discover the tests
# defined in this directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to add this comment

# If other active devices are linked to it, then new lookup will
# be triggered for them.
def device_whois_info_delete_handler(instance, **kwargs):
transaction.on_commit(lambda: WhoIsService.delete_whois_record(instance.last_ip))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be sync, or can it be async?

asking because it is adding some time in device save operation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make this function as a separate task to make the deletion non-blocking.
I was also using the same function in fetch_whois_details task, so there won't be a problem as I can call it directly without .delay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should keep it async

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay!

Comment on lines 70 to 74
ip_client = geoip2_webservice.Client(
account_id=app_settings.GEOIP_ACCOUNT_ID,
license_key=app_settings.GEOIP_LICENSE_KEY,
host="geolite.info",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this client can be a separate method in service.

@@ -0,0 +1,103 @@
from ipaddress import ip_address
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file name can be service.py

Comment on lines 74 to 76
ip = self.device.last_ip
if not ip or not ip_address(ip).is_global:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check can be on top to avoid extra db query.


def __init__(self, device):
self.device = device
self.org_settings = device._get_organization__config_settings()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only load the configuration when it’s actually needed. you can load it in respective method.

"""
whois_info = self.get_whois_info()
return (
getattr(self.org_settings, "whois_enabled", WHOIS_ENABLED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be in the last.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sankalp,
Why it should be the last? I believe that third party calls should come at last so that we can prevent calling them in case other conditions are not met

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this line, it uses the ipaddress package to confirm the IP isn’t private.

Since we already fetched the WhoIs info earlier (which might involve an external lookup), we can just check if the IP is private here and return immediately if it is. This avoids any extra checks.

Also, reviewing both functions again, I notice there’s duplicated logic — this could be refactored - @DragnEmperor

@@ -335,7 +337,7 @@ def _get_initial_values_for_checked_fields(self):
if not present_values:
return
self.refresh_from_db(fields=present_values.keys())
for field in self._changed_checked_fields:
for field in present_values:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure on this but it should work

Comment on lines 54 to 57
if ip_address(self.ip_address).is_private:
raise ValidationError(
_("WhoIs information cannot be created for private IP addresses.")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this condition required? We are only creating WhoIs Info if IP is not private

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to add a validation check in model itself : #1054 (comment)

@@ -181,7 +181,7 @@ def test_get_cached_checksum(self):
url = reverse("controller:device_checksum", args=[d.pk])

with self.subTest("first request does not return value from cache"):
with self.assertNumQueries(4):
with self.assertNumQueries(3):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might have to check but do you know why the number of queries have been reduced here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reduced count was due to prefetched organization settings when device was retrieved.

# If other active devices are linked to it, then new lookup will
# be triggered for them.
def device_whois_info_delete_handler(instance, **kwargs):
transaction.on_commit(lambda: WhoIsService.delete_whois_record(instance.last_ip))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should keep it async

Comment on lines 17 to 22
# slots are used to define a fixed set of attributes for the class,
# which can improve performance by avoiding the overhead of a dynamic dictionary.
# Particularly useful in scenarios where many instances of the class are created.
# Reference: https://wiki.python.org/moin/UsingSlots
__slots__ = ("device", "org_settings")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are creating a cached property, do we need to define slots? I am not sure of their relevance here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially there was no cached property here, but working on the changes I found one property 'is_whois_enabled' (will be added in the new commit) which we can cache, so removing slots for the time being.
Thanks!


whois_info = WhoIsInfo.objects.filter(ip_address=ip).first()
if whois_info:
cache.set(key, whois_info, timeout=60 * 60 * 24) # Cache for 24 hours
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this

Comment on lines 88 to 89
# The function is staticmethod because it does not depend on the instance
# of the class and deletes the WhoIs record for a given IP address.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not necessary here

# The function is staticmethod because it does not depend on the instance
# of the class and deletes the WhoIs record for a given IP address.
@staticmethod
def delete_whois_record(ip_address):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are we using this function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When deleting record along with device in device_whois_info_delete_handler and in fetch_whois_details when deleting old whois record (refering to commit : 0b1bef4)

device_pk = kwargs.get("device_pk")
new_ip_address = kwargs.get("new_ip_address")
device = Device.objects.get(pk=device_pk)
# Notify the user about the failure via web notification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment not needed

Moved all tasks to service file to keep one source.
Refactored the way whois_info is fetcehd for a device.
Modified the order of whois checks to keep
expensive checks at the bottom.
As in views like `DeviceChecksumView` whole device
instance is being cached, now org settings will be
fetched from db rather than through device to
maintain consistency. This has led to increase in
queries in some test cases.

Signed-off-by: DragnEmperor <[email protected]>
Org config settings are now cached to ensure
DeviceChecksumview is not degraded. This will
be invalidated on org settings update/delete
which are rare.
Refactored code and tests for readability.

Signed-off-by: DragnEmperor <[email protected]>
Copy link
Member

@pandafy pandafy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick reqview on he caching of OrganizationConfigSettings and shared few suggestions. I haven't performed manual testing.

Comment on lines 7 to 18
# Remove the related WhoIs record for that ip from db and cache
# If other active devices are linked to it, then new lookup will
# be triggered for them.
def device_who_is_info_delete_handler(instance, **kwargs):
transaction.on_commit(
lambda: WhoIsService.delete_who_is_record.delay(instance.last_ip)
)


def invalidate_org_settings_cache(instance, **kwargs):
org_pk = instance.organization_id
cache.delete(WhoIsService.ORG_SETTINGS_CACHE_KEY.format(org_pk=org_pk))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these to class methods in the Whois model? In general, we want to keep all business logic in the models.

Comment on lines 133 to 139
org_settings = OrganizationConfigSettings.objects.get(
organization=Subquery(
Device.objects.filter(pk=self.device.pk).values(
"organization_id"
)[:1]
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query seems over-complicated to me.

We already have the required org_pk. Why can't we do this?

OrganizationConfiguSettings.objects.get(organization_id=org_pk)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I had a case in mind what would happen if device's org is changed, so i did this to ensure latest value is fetched. I missed that we are invalidating the cache on device save :

It was my mistake, I will remove this subquery. Thanks!

organization=Subquery(
Device.objects.filter(pk=device_pk).values("organization_id")[:1]
org_pk = self.device.organization.pk
org_settings = cache.get(self.ORG_SETTINGS_CACHE_KEY.format(org_pk=org_pk))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For keeping it DRY (Don't Repeat Yourself), let's create a static method get_cache_key(org_id) which would return the cache key for the organization.

Comment on lines +201 to +207
# device_pk is used when task fails to report for which device failure occurred
@shared_task(
bind=True,
base=WhoIsCeleryRetryTask,
**app_settings.API_TASK_RETRY_OPTIONS,
)
def fetch_who_is_details(self, device_pk, initial_ip_address, new_ip_address):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is celery worker picking up this task? AFAIK, it only registers task which are present in tasks,py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the tasks are getting discovered:

Screenshot 2025-06-10 at 6 06 21 PM

@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from c24fec3 to 5f91215 Compare June 10, 2025 14:24
Migrated all receivers related to WhoIs to the
model class.
Includes minor refactoring as well.

Signed-off-by: DragnEmperor <[email protected]>
@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from 5f91215 to 5539474 Compare June 10, 2025 14:27
The imports are restructured to avoid app
registry errors and aid whois task discovery.

Signed-off-by: DragnEmperor <[email protected]>
There was an issue that the task was triggered
for a device whose last_ip is updated but without
checking if we already have WhoIs for the latest
ip. Have added a check to return from the task
if that is the case.

Signed-off-by: DragnEmperor <[email protected]>
Copy link
Member

@codesankalp codesankalp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DragnEmperor – I followed the docs and ran this locally; it worked perfectly 👏.

I think we can deploy this on staging environment.

Just a few minor suggestions below 👇

Comment on lines +77 to +78
elif not device.who_is_service.get_device_who_is_info():
device.who_is_service.trigger_who_is_lookup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do this when flag is enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @codesankalp ,we are talking about the whois_enabled flag, right? The trigger_whois_lookup task has the conditions to check for all these.

Comment on lines 170 to 174
return (
self.is_valid_public_ip_address(new_ip)
and (initial_ip != new_ip or not self._get_who_is_info_from_db(new_ip))
and self.is_who_is_enabled
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is complex, i think we can separate it into multiple if statement for readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have split it into multiple lines as:

def _need_who_is_lookup(self, initial_ip, new_ip):
        """
        This is used to determine if the WhoIs lookup should be triggered
        when the device is saved.

        The lookup is not triggered if:
            - The new IP address is None or it is a private IP address.
            - The WhoIs information of new ip is already present.
            - WhoIs is disabled in the organization settings. (query from db)
        """
        if not self.is_valid_public_ip_address(new_ip):
            return False
        
        if self._get_who_is_info_from_db(new_ip):
            return False
        
        return self.is_who_is_enabled

I modified the second conditions as I found that we need not to check if ip is updated or not, checking if it already exists in db should be enough.

WhoIsInfo = load_model("config", "WhoIsInfo")

try:
ip_client = WhoIsService._get_geoip2_client()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client initialization can be outside of this block

The checks in `_need_who_is_lookup` are split for better readability

Signed-off-by: DragnEmperor <[email protected]>
@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from 07c94fb to 18f67d6 Compare June 12, 2025 19:28
WhoIsInfo.objects.create(
organization_name=data.traits.autonomous_system_organization,
asn=data.traits.autonomous_system_number,
country=data.country.name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is correct? I see an error on the staging VM:

value too long for type character varying(4)

For my IP address, the value of data.country.name is India. Did you mean to store a country code here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was to store country code. As sqlite does not enforce this so was not able to catch this. Thanks @pandafy

@DragnEmperor DragnEmperor force-pushed the issues/1032-1033-whois-setup branch from 3e739d6 to 774ab9a Compare June 13, 2025 18:02
Copy link
Member

@nemesifier nemesifier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a lot of usage of WHO_IS_ENABLED, we want to make sure if we set it to False most of if not all of the whois code is not executed.

I am also reasoning on the possibility of making the geoip dependency optional. I need to analyze the stability and other dependencies of this library before deciding, but in the meantime I wanted to share this view to make all of us aware of this.

@@ -67,3 +67,6 @@ def get_setting(option, default):
"API_TASK_RETRY_OPTIONS",
dict(max_retries=5, retry_backoff=True, retry_backoff_max=600, retry_jitter=True),
)
GEOIP_ACCOUNT_ID = get_setting("GEOIP_ACCOUNT_ID", None)
GEOIP_LICENSE_KEY = get_setting("GEOIP_LICENSE_KEY", None)
WHO_IS_ENABLED = get_setting("WHO_IS_ENABLED", False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is t true, we need to assert that GEOIP_ACCOUNT_ID and GEOIP_LICENSE_KEY are set (not falsy) otherwise we should raise ImproperlyConfigured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement gsoc Part of a Google Summer of Code project
Development

Successfully merging this pull request may close these issues.

5 participants