Skip to content

Fix logging configuration to prevent OpenTelemetry handler conflicts #8269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dmavrommatis
Copy link
Contributor

Fix logging configuration to prevent OpenTelemetry handler conflicts

This PR modifies how DSPy configures its loggers to prevent conflicts with OpenTelemetry logging handlers and avoid long blocking operations during logger configuration.

Problem

The current implementation uses logging.config.dictConfig() which triggers a full handler cleanup through _clearExistingHandlers(). This causes issues when OpenTelemetry logging handlers are present in the environment because:

  1. The cleanup process attempts to flush and shutdown all existing handlers
  2. OpenTelemetry handlers' flush operation can block for a long time (30+ seconds)
  3. This affects any application using both DSPy and OpenTelemetry for logging

Here's an example that demonstrates the issue run with opentelemetry-instrument:

import logging
import logging.config

logger = logging.getLogger(__name__)

def main():
    import time
    logger.info("Starting main")
    starttime = time.time()
    logging.config.dictConfig({"version":1})
    logger.info(f"logging config change took {time.time()-starttime}")

if __name__ == "__main__":
    main()

This simple code takes 30+ seconds to run and often needs to be interrupted. Here's the stack trace showing where it blocks:

Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in run_module_as_main
return run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in run_code
exec(code, run_globals)
File "/root/test_run.py", line 16, in <module>
main()
File "/root/test_run.py", line 11, in main
logging.config.dictConfig({"version":1})
File "/usr/lib/python3.10/logging/config.py", line 811, in dictConfig
dictConfigClass(config).configure()
File "/usr/lib/python3.10/logging/config.py", line 538, in configure
clearExistingHandlers()
File "/usr/lib/python3.10/logging/config.py", line 275, in clearExistingHandlers
logging.shutdown(logging.handlerList[:])
File "/usr/lib/python3.10/logging/init.py", line 2182, in shutdown
h.flush()
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/init.py", line 571, in flush
self.logger_provider.force_flush()
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/init.py", line 715, in force_flush
return self.multi_log_record_processor.force_flush(timeout_millis)
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/init.py", line 350, in force_flush
if not lp.force_flush((deadline_ns - current_ts) // 1000000):
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/export/init.py", line 378, in force_flush
ret = flush_request.event.wait(timeout_millis / 1e3)
File "/usr/lib/python3.10/threading.py", line 607, in wait
signaled = self.cond.wait(timeout)
File "/usr/lib/python3.10/threading.py", line 324, in wait
gotit = waiter.acquire(True, timeout)
KeyboardInterrupt

Solution

This PR replaces dictConfig with direct logger configuration using Python's logging API. The new implementation:

  1. Creates a named handler specific to DSPy
  2. Only removes previous DSPy handlers (if any) while preserving other handlers (including OpenTelemetry)
  3. Configures the logger with the same settings but without triggering global handler cleanup

Key changes:

  • Use logging.StreamHandler and logging.Formatter directly
  • Give the DSPy handler a specific name for identification
  • Selectively remove only DSPy handlers when reconfiguring
  • Preserve all other logging handlers in the environment

This approach allows DSPy to:

  • Coexist with OpenTelemetry and other logging systems
  • Avoid long blocking operations during configuration
  • Maintain the same logging functionality and format
  • Be a better citizen in the Python logging ecosystem

Testing

The modified code runs instantly without delays, even in environments with OpenTelemetry handlers present. The logging output remains unchanged, maintaining backward compatibility while fixing the performance issue.

Signed-off-by: dmavrommatis <[email protected]>
@okhat okhat requested a review from Copilot June 3, 2025 16:36
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the global dictConfig approach with direct logger/handler setup to avoid flushing OpenTelemetry handlers and to only clear DSPy’s own handler.

  • Use logging.Formatter and StreamHandler directly instead of dictConfig.
  • Name the DSPy-specific handler "dspy_handler" and remove only handlers with that name.
  • Preserve all other handlers to prevent long blocking operations.
Comments suppressed due to low confidence (1)

dspy/utils/logging_utils.py:68

  • [nitpick] There are no tests verifying that existing DSPy handlers are correctly removed and new ones added; consider adding tests for multiple consecutive calls to configure_dspy_loggers to validate handler cleanup logic.
for existing_handler in logger.handlers[:]:

@chenmoneygithub
Copy link
Collaborator

@dmavrommatis Thanks for the PR! Is this the correct code snippet to reproduce the slowdown of using opentelemetry together with DSPy logging?

import logging
import logging.config

logger = logging.getLogger(__name__)

def main():
    import time
    logger.info("Starting main")
    starttime = time.time()
    logging.config.dictConfig({"version":1})
    logger.info(f"logging config change took {time.time()-starttime}")

if __name__ == "__main__":
    main()

This finishes instantly on my side.

@dmavrommatis
Copy link
Contributor Author

@chenmoneygithub you need to also have an opentelemetry-collector running and to have the environment variables so when you call opentelemetry-instrument python example.py it sets up the OTEL handlers.

To setup everything to replicate the behavior you can run

docker run -p 127.0.0.1:4317:4317 -p 127.0.0.1:4318:4318 -p 127.0.0.1:55679:55679 otel/opentelemetry-collector-contrib:0.127.0 2>&1

then have an example.py with

import logging
import logging.config
import time
import sys


def setup_logging():
    root_logger = logging.getLogger()
    log_level = logging.INFO
    root_logger.setLevel(log_level)
    for handler in root_logger.handlers:
        handler.setLevel(log_level)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(log_level)
    formatter = logging.Formatter("%(levelname)-8s; %(name)s; %(message)s;")
    handler.setFormatter(formatter)
    root_logger.addHandler(handler)


def main():
    setup_logging() # add an stdout logger additional to the OTEL loggers
    logger = logging.getLogger("example")

    logger.info("Starting main")
    starttime = time.time()
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {
                "brief": {
                    "datefmt": "%H:%M:%S",
                    "format": "%(levelname)-8s; %(name)s; %(message)s;",
                },
            },
            "handlers": {
                "console": {
                    "level": "DEBUG",
                    "class": "logging.StreamHandler",
                    "formatter": "brief",
                    "stream": "ext://sys.stdout",
                },
            },
            "loggers": {
                "example": {
                    "handlers": ["console"],
                    "level": "INFO",
                    "propagate": False,
                },
            },
        }
    )
    logger.info(f"logging config change took {time.time()-starttime}")


if __name__ == "__main__":
    main()

and then run it with OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true opentelemetry-instrument python example.py which will output

INFO    ; example; Starting main;
WARNING ; opentelemetry.sdk._logs._internal.export; Timeout was exceeded in force_flush().;
INFO    ; example; logging config change took 29.99991464614868;

Copy link
Collaborator

@chenmoneygithub chenmoneygithub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, LGTM!

@chenmoneygithub chenmoneygithub merged commit 3e1185f into stanfordnlp:main Jun 6, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants