38
38
39
39
40
40
class FrameworkProfile:
41
- """Configuration for the collection of framework metrics in the profiler.
41
+ """
42
+ Sets up the profiling configuration for framework metrics.
43
+
44
+ Validates user inputs and fills in default values if no input is provided.
45
+ There are three main profiling options to choose from:
46
+ :class:`~sagemaker.debugger.metrics_config.DetailedProfilingConfig`,
47
+ :class:`~sagemaker.debugger.metrics_config.DataloaderProfilingConfig`, and
48
+ :class:`~sagemaker.debugger.metrics_config.PythonProfilingConfig`.
49
+
50
+ The following list shows available scenarios of configuring the profiling options.
51
+
52
+ 1. No profiling configuration, step range, or time range is specified.
53
+ SageMaker Debugger activates framework profiling based on the default settings
54
+ of each profiling option.
55
+
56
+ .. code-block:: python
57
+
58
+ from sagemaker.debugger import ProfilerConfig, FrameworkProfile
59
+
60
+ profiler_config=ProfilerConfig(
61
+ framework_profile_params=FrameworkProfile()
62
+ )
63
+
64
+ 2. A target step or time range is specified for
65
+ this :class:`~sagemaker.debugger.metrics_config.FrameworkProfile` class.
66
+ The requested target step or time range setting propagates to all of
67
+ the framework profiling options.
68
+ For example, if you configure this class as follows, all of the profiling options
69
+ profile the 6th step:
70
+
71
+ .. code-block:: python
72
+
73
+ from sagemaker.debugger import ProfilerConfig, FrameworkProfile
74
+
75
+ profiler_config=ProfilerConfig(
76
+ framework_profile_params=FrameworkProfile(start_step=6, num_steps=1)
77
+ )
78
+
79
+ 3. Individual profiling configurations are specified through
80
+ the ``*_profiling_config`` parameters.
81
+ SageMaker Debugger profiles framework metrics only for the specified profiling configurations.
82
+ For example, if the :class:`~sagemaker.debugger.metrics_config.DetailedProfilingConfig` class
83
+ is configured but not the other profiling options, Debugger only profiles based on the settings
84
+ specified in the
85
+ :class:`~sagemaker.debugger.metrics_config.DetailedProfilingConfig` class.
86
+ The following example shows a profiling configuration that performs
87
+ detailed profiling at step 10, data loader profiling at steps 9 and 10,
88
+ and Python profiling at step 12.
89
+
90
+ .. code-block:: python
91
+
92
+ from sagemaker.debugger import (
+     ProfilerConfig,
+     FrameworkProfile,
+     DetailedProfilingConfig,
+     DataloaderProfilingConfig,
+     PythonProfilingConfig,
+ )
93
+
94
+ profiler_config=ProfilerConfig(
95
+ framework_profile_params=FrameworkProfile(
96
+ detailed_profiling_config=DetailedProfilingConfig(start_step=10, num_steps=1),
97
+ dataloader_profiling_config=DataloaderProfilingConfig(start_step=9, num_steps=2),
98
+ python_profiling_config=PythonProfilingConfig(start_step=12, num_steps=1),
99
+ )
100
+ )
101
+
102
+ If the individual profiling configurations are specified in addition to
103
+ the step or time range,
104
+ SageMaker Debugger prioritizes the individual profiling configurations and ignores
105
+ the step or time range. For example, in the following code,
106
+ ``start_step=1`` and ``num_steps=10`` are ignored.
107
+
108
+ .. code-block:: python
109
+
110
+ from sagemaker.debugger import (
+     ProfilerConfig,
+     FrameworkProfile,
+     DetailedProfilingConfig,
+     DataloaderProfilingConfig,
+     PythonProfilingConfig,
+ )
111
+
112
+ profiler_config=ProfilerConfig(
113
+ framework_profile_params=FrameworkProfile(
114
+ start_step=1,
115
+ num_steps=10,
116
+ detailed_profiling_config=DetailedProfilingConfig(start_step=10, num_steps=1),
117
+ dataloader_profiling_config=DataloaderProfilingConfig(start_step=9, num_steps=2),
118
+ python_profiling_config=PythonProfilingConfig(start_step=12, num_steps=1)
119
+ )
120
+ )
42
121
43
- Validates user input and fills in default values wherever necessary.
44
122
"""
45
123
46
124
def __init__(
@@ -59,41 +137,34 @@ def __init__(
59
137
start_unix_time=None,
60
138
duration=None,
61
139
):
62
- """Set up the profiling configuration for framework metrics based on user input.
63
-
64
- There are three main options for the user to choose from.
65
- 1. No custom metrics configs or step range or time range specified. Default profiling is
66
- done for each set of framework metrics.
67
- 2. Custom metrics configs are specified. Do profiling for the metrics whose configs are
68
- specified and no profiling for the rest of the metrics.
69
- 3. Custom step range or time range is specified. Profiling for all of the metrics will
70
- occur with the provided step/time range. Configs with additional parameters beyond
71
- step/time range will use defaults for those additional parameters.
72
-
73
- If custom metrics configs are specified in addition to step or time range being specified,
74
- then we ignore the step/time range and default to using custom metrics configs.
140
+ """Initialize the FrameworkProfile class object.
75
141
76
142
Args:
77
- local_path (str): The path where profiler events have to be saved.
78
- file_max_size (int): Max size a trace file can be, before being rotated.
79
- file_close_interval (float): Interval in seconds from the last close, before being
80
- rotated.
81
- file_open_fail_threshold (int): Number of times to attempt to open a trace fail before
82
- marking the writer as unhealthy.
83
143
detailed_profiling_config (DetailedProfilingConfig): The configuration for detailed
84
- profiling done by the framework.
85
- dataloader_profiling_config (DataloaderProfilingConfig): The configuration for metrics
86
- collected in the data loader.
144
+ profiling. Configure it using the
145
+ :class:`~sagemaker.debugger.metrics_config.DetailedProfilingConfig` class.
146
+ Pass ``DetailedProfilingConfig()`` to use the default configuration.
147
+ dataloader_profiling_config (DataloaderProfilingConfig): The configuration for
148
+ dataloader metrics profiling. Configure it using the
149
+ :class:`~sagemaker.debugger.metrics_config.DataloaderProfilingConfig` class.
150
+ Pass ``DataloaderProfilingConfig()`` to use the default configuration.
87
151
python_profiling_config (PythonProfilingConfig): The configuration for stats
88
152
collected by the Python profiler (cProfile or Pyinstrument).
89
- horovod_profiling_config (HorovodProfilingConfig): The configuration for metrics
90
- collected by horovod when using horovod for distributed training.
91
- smdataparallel_profiling_config (SMDataParallelProfilingConfig): The configuration for
92
- metrics collected by SageMaker Distributed training.
153
+ Configure it using the
154
+ :class:`~sagemaker.debugger.metrics_config.PythonProfilingConfig` class.
155
+ Pass ``PythonProfilingConfig()`` to use the default configuration.
93
156
start_step (int): The step at which to start profiling.
94
157
num_steps (int): The number of steps to profile.
95
- start_unix_time (int): The UNIX time at which to start profiling.
96
- duration (float): The duration in seconds to profile for.
158
+ start_unix_time (int): The Unix time at which to start profiling.
159
+ duration (float): The duration in seconds to profile.
160
+
161
+ .. tip::
162
+ Available profiling range parameter pairs are
163
+ (**start_step** and **num_steps**) and (**start_unix_time** and **duration**).
164
+ The two parameter pairs are mutually exclusive, and this class validates
165
+ that at most one of the two pairs is used. If both pairs are specified, a
166
+ conflict error occurs.
167
+
97
168
"""
98
169
self.profiling_parameters = {}
99
170
self._use_default_metrics_configs = False
@@ -132,6 +203,7 @@ def _process_trace_file_parameters(
132
203
rotated.
133
204
file_open_fail_threshold (int): Number of times to attempt to open a trace file before
134
205
marking the writer as unhealthy.
206
+
135
207
"""
136
208
assert isinstance(local_path, str), ErrorMessages.INVALID_LOCAL_PATH.value
137
209
assert (
@@ -152,13 +224,17 @@ def _process_trace_file_parameters(
152
224
def _process_metrics_configs(self, *metrics_configs):
153
225
"""Helper function to validate and set the provided metrics_configs.
154
226
155
- In this case, the user specifies configs for the metrics they want profiled.
156
- Profiling does not occur for metrics if configs are not specified for them.
227
+ In this case,
228
+ the user specifies configurations for the metrics they want to profile.
229
+ Profiling does not occur
230
+ for metrics if the configurations are not specified for them.
157
231
158
232
Args:
159
233
metrics_configs: The list of metrics configs specified by the user.
234
+
160
235
Returns:
161
- bool: Whether custom metrics configs will be used for profiling.
236
+ bool: Indicates whether custom metrics configs will be used for profiling.
237
+
162
238
"""
163
239
metrics_configs = [config for config in metrics_configs if config is not None]
164
240
if len(metrics_configs) == 0:
@@ -173,16 +249,19 @@ def _process_metrics_configs(self, *metrics_configs):
173
249
def _process_range_fields(self, start_step, num_steps, start_unix_time, duration):
174
250
"""Helper function to validate and set the provided range fields.
175
251
176
- Profiling will occur for all of the metrics using these fields as the specified
177
- range and default parameters for the rest of the config fields (if necessary).
252
+ Profiling occurs
253
+ for all of the metrics using these fields as the specified range and default parameters
254
+ for the rest of the configuration fields (if necessary).
178
255
179
256
Args:
180
257
start_step (int): The step at which to start profiling.
181
258
num_steps (int): The number of steps to profile.
182
259
start_unix_time (int): The UNIX time at which to start profiling.
183
- duration (float): The duration in seconds to profile for.
260
+ duration (float): The duration in seconds to profile.
261
+
184
262
Returns:
185
- bool: Whether custom step or time range will be used for profiling.
263
+ bool: Indicates whether a custom step or time range will be used for profiling.
264
+
186
265
"""
187
266
if start_step is num_steps is start_unix_time is duration is None:
188
267
return False
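
The range rules described in the docstrings above (a step pair or a time pair, never both, and defaults when neither is given) can be sketched in isolation. The following is a minimal standalone illustration, not the SDK's implementation: the function name `resolve_profiling_range`, its return shape, and the use of `ValueError` for the conflict are all assumptions made for this example. It reuses the chained `is` test from the diff, which is true only when all four arguments are `None`.

```python
from typing import Optional, Tuple


def resolve_profiling_range(
    start_step: Optional[int] = None,
    num_steps: Optional[int] = None,
    start_unix_time: Optional[int] = None,
    duration: Optional[float] = None,
) -> Optional[Tuple[str, dict]]:
    """Hypothetical helper: choose the step pair or the time pair, never both."""
    # Chained identity test: equivalent to
    # (start_step is num_steps) and (num_steps is start_unix_time)
    # and (start_unix_time is duration) and (duration is None).
    if start_step is num_steps is start_unix_time is duration is None:
        return None  # no custom range; the caller would fall back to defaults

    step_fields = {"start_step": start_step, "num_steps": num_steps}
    time_fields = {"start_unix_time": start_unix_time, "duration": duration}
    uses_steps = any(value is not None for value in step_fields.values())
    uses_time = any(value is not None for value in time_fields.values())

    # The two pairs are mutually exclusive (error type is an assumption).
    if uses_steps and uses_time:
        raise ValueError("Specify a step range or a time range, not both.")

    chosen = step_fields if uses_steps else time_fields
    kind = "step" if uses_steps else "time"
    # Keep only the fields the caller actually set.
    return kind, {key: value for key, value in chosen.items() if value is not None}
```

Under these assumptions, `resolve_profiling_range(start_step=6, num_steps=1)` selects the step pair, while passing `start_step` together with `duration` raises the conflict error.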