Fix URLs #10316

Merged: 26 commits, Apr 21, 2025
2 changes: 1 addition & 1 deletion backends/vulkan/README.md
@@ -133,7 +133,7 @@ will be executed on the GPU.


::::{note}
-The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/partitioner/supported_ops.py)
+The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/op_registry.py#L194)
Vulkan partitioner code can be inspected to examine which ops are currently
implemented in the Vulkan delegate.
::::
11 changes: 6 additions & 5 deletions docs/source/Doxyfile
@@ -399,9 +399,9 @@ BUILTIN_STL_SUPPORT = NO
CPP_CLI_SUPPORT = NO

# Set the SIP_SUPPORT tag to YES if your project consists of sip (see:
-# https://www.riverbankcomputing.com/software/sip/intro) sources only. Doxygen
-# will parse them like normal C++ but will assume all classes use public instead
-# of private inheritance when no explicit protection keyword is present.
+# https://python-sip.readthedocs.io/en/stable/introduction.html) sources only.
+# Doxygen will parse them like normal C++ but will assume all classes use public
+# instead of private inheritance when no explicit protection keyword is present.
# The default value is: NO.

SIP_SUPPORT = NO
@@ -1483,8 +1483,9 @@ HTML_INDEX_NUM_ENTRIES = 100
# output directory. Running make will produce the docset in that directory and
# running make install will install the docset in
# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at
-# startup. See https://developer.apple.com/library/archive/featuredarticles/Doxy
-# genXcode/_index.html for more information.
+# startup. See
+# https://developer.apple.com/library/archive/featuredarticles/DoxygenXcode/_index.html
+# for more information.
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.

6 changes: 3 additions & 3 deletions docs/source/backends-cadence.md
@@ -89,7 +89,7 @@ executorch

***AoT (Ahead-of-Time) Components***:

-The AoT folder contains all of the python scripts and functions needed to export the model to an ExecuTorch `.pte` file. In our case, [export_example.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/export_example.py) is an API that takes a model (nn.Module) and representative inputs and runs it through the quantizer (from [quantizer.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/quantizer.py)). Then a few compiler passes, also defined in [quantizer.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/quantizer.py), will replace operators with custom ones that are supported and optimized on the chip. Any operator needed to compute things should be defined in [ops_registrations.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/ops_registrations.py) and have corresponding implementations in the other folders.
+The AoT folder contains all of the python scripts and functions needed to export the model to an ExecuTorch `.pte` file. In our case, [export_example.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/export_example.py) is an API that takes a model (nn.Module) and representative inputs and runs it through the quantizer (from [quantizer.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/quantizer/quantizer.py)). Then a few compiler passes, also defined in [quantizer.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/quantizer/quantizer.py), will replace operators with custom ones that are supported and optimized on the chip. Any operator needed to compute things should be defined in [ops_registrations.py](https://github.com/pytorch/executorch/blob/main/backends/cadence/aot/ops_registrations.py) and have corresponding implementations in the other folders.

***Operators***:

@@ -115,8 +115,8 @@ python3 -m examples.portable.scripts.export --model_name="add"
***Quantized Operators***:

The other, more complex models are custom operators, including:
-- a quantized [linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) operation. The model is defined [here](https://github.com/pytorch/executorch/blob/main/examples/cadence/operators/quantized_linear_op.py#L28). Linear is the backbone of most Automatic Speech Recognition (ASR) models.
-- a quantized [conv1d](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html) operation. The model is defined [here](https://github.com/pytorch/executorch/blob/main/examples/cadence/operators/quantized_conv1d_op.py#L36). Convolutions are important in wake word and many denoising models.
+- a quantized [linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) operation. The model is defined [here](https://github.com/pytorch/executorch/blob/main/examples/cadence/operators/test_quantized_linear_op.py#L30). Linear is the backbone of most Automatic Speech Recognition (ASR) models.
+- a quantized [conv1d](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html) operation. The model is defined [here](https://github.com/pytorch/executorch/blob/main/examples/cadence/operators/test_quantized_conv1d_op.py#L40). Convolutions are important in wake word and many denoising models.

In both cases the generated file is called `CadenceDemoModel.pte`.
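To try either example from a repo checkout, a minimal sketch (assuming the Cadence dependencies are installed; since the renamed files are test modules, running them under pytest is the safest assumption):

```bash
# Run the renamed quantized linear example as a test; the path assumes the repo root.
python3 -m pytest examples/cadence/operators/test_quantized_linear_op.py
```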

2 changes: 1 addition & 1 deletion docs/source/backends-vulkan.md
@@ -133,7 +133,7 @@ will be executed on the GPU.


::::{note}
-The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/partitioner/supported_ops.py)
+The [supported ops list](https://github.com/pytorch/executorch/blob/main/backends/vulkan/op_registry.py#L194)
Vulkan partitioner code can be inspected to examine which ops are currently
implemented in the Vulkan delegate.
::::
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -192,7 +192,7 @@
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
"python": ("https://docs.python.org/", None),
"numpy": ("https://docs.scipy.org/doc/numpy/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"torch": ("https://pytorch.org/docs/stable/", None),
}
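
Since this PR is about URL health, the new mapping can be spot-checked by confirming the Sphinx object inventory exists at the new base URL; intersphinx always fetches `<base>/objects.inv`:

```bash
# A 200 response here means the new intersphinx base URL is valid.
curl -sI https://numpy.org/doc/stable/objects.inv | head -n 1
```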

4 changes: 2 additions & 2 deletions docs/source/new-contributor-guide.md
@@ -92,8 +92,8 @@ Before you can start writing any code, you need to get a copy of ExecuTorch code
Depending on how you cloned your repo (HTTP, SSH, etc.), this should print something like:

```bash
-origin https://github.com/YOUR_GITHUB_USERNAME/executorch.git (fetch)
-origin https://github.com/YOUR_GITHUB_USERNAME/executorch.git (push)
+origin https://github.com/{YOUR_GITHUB_USERNAME}/executorch.git (fetch)
+origin https://github.com/{YOUR_GITHUB_USERNAME}/executorch.git (push)
upstream https://github.com/pytorch/executorch.git (fetch)
upstream https://github.com/pytorch/executorch.git (push)
```
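
For context, the `upstream` lines in that output come from registering the main repo as a second remote; a minimal sketch, assuming you have already cloned your fork:

```bash
# Register the canonical repo as "upstream" alongside your fork ("origin").
git remote add upstream https://github.com/pytorch/executorch.git
git remote -v  # should now list both remotes, as shown above
```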
2 changes: 1 addition & 1 deletion docs/source/runtime-profiling.md
@@ -20,4 +20,4 @@ We provide access to all the profiling data via the Python [Inspector API](model
- Through the Inspector API, users can do a wide range of analyses, from printing out performance details to doing finer-grained calculations at the module level.


-Please refer to the [Developer Tools tutorial](https://pytorch.org/executorch/main/tutorials/devtools-integration-tutorial.rst) for a step-by-step walkthrough of the above process on a sample model.
+Please refer to the [Developer Tools tutorial](https://pytorch.org/executorch/main/tutorials/devtools-integration-tutorial) for a step-by-step walkthrough of the above process on a sample model.
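
As a concrete starting point for the Inspector flow described above, a minimal sketch; the CLI module path and flag names here are assumptions, so defer to the linked tutorial for the authoritative invocation:

```bash
# Assumed devtools CLI; prints profiling data from an ETDump, correlated via an ETRecord.
python3 -m devtools.inspector.inspector_cli \
  --etdump_path etdump.etdp \
  --etrecord_path etrecord.bin
```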
2 changes: 1 addition & 1 deletion docs/source/tutorials_source/template_tutorial.py
@@ -9,7 +9,7 @@
Template Tutorial
=================

-**Author:** `FirstName LastName <https://github.com/username>`_
+**Author:** `FirstName LastName <https://github.com/{username}>`_

.. grid:: 2

4 changes: 2 additions & 2 deletions docs/source/using-executorch-android.md
@@ -59,8 +59,8 @@ You can also directly specify an AAR file in the app. We upload pre-built AAR to
### Snapshots from main branch

Starting from 2025-04-12, you can download nightly `main` branch snapshots:
-* `executorch.aar`: `https://ossci-android.s3.amazonaws.com/executorch/release/snapshot-YYYYMMDD/executorch.aar`
-* `executorch.aar.sha256sums`: `https://ossci-android.s3.amazonaws.com/executorch/release/snapshot-YYYYMMDD/executorch.aar.sha256sums`
+* `executorch.aar`: `https://ossci-android.s3.amazonaws.com/executorch/release/snapshot-{YYYYMMDD}/executorch.aar`
+* `executorch.aar.sha256sums`: `https://ossci-android.s3.amazonaws.com/executorch/release/snapshot-{YYYYMMDD}/executorch.aar.sha256sums`
* Replace `YYYYMMDD` with the actual date you want to use.
* AAR file is generated by [this workflow](https://github.com/pytorch/executorch/blob/c66b37d010c88a113560693b14dc6bd112593c11/.github/workflows/android-release-artifacts.yml#L14-L15).
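
As a worked example of the templated URLs above (the date value is an arbitrary assumption):

```bash
# Fetch a dated snapshot and verify it against the published checksums.
DATE=20250412
curl -O "https://ossci-android.s3.amazonaws.com/executorch/release/snapshot-${DATE}/executorch.aar"
curl -O "https://ossci-android.s3.amazonaws.com/executorch/release/snapshot-${DATE}/executorch.aar.sha256sums"
shasum -a 256 -c executorch.aar.sha256sums
```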

@@ -73,7 +73,7 @@ python -m examples.models.llama.export_llama --model "llama3_2" --checkpoint <pa
```
For convenience, an [exported ExecuTorch bf16 model](https://huggingface.co/executorch-community/Llama-3.2-1B-ET/blob/main/llama3_2-1B.pte) is available on Hugging Face. The export was created using [this detailed recipe notebook](https://huggingface.co/executorch-community/Llama-3.2-1B-ET/blob/main/ExportRecipe_1B.ipynb).

-For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-llama-3.2-lightweight-models-(1b/3b)-).
+For more detail using Llama 3.2 lightweight models including prompt template, please go to our official [website](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/#-llama-3.2-lightweight-models-(1b/3b)-).

### For Llama 3.1 and Llama 2 models

@@ -134,7 +134,7 @@ BUCK2_RELEASE_DATE="2024-12-16"
BUCK2_ARCHIVE="buck2-aarch64-apple-darwin.zst"
BUCK2=".venv/bin/buck2"

curl -LO "https://github.com/facebook/buck2/releases/download/$BUCK2_RELEASE_DATE/$BUCK2_ARCHIVE"
curl -LO "https://github.com/facebook/buck2/releases/download/${BUCK2_RELEASE_DATE}/${BUCK2_ARCHIVE}"
zstd -cdq "$BUCK2_ARCHIVE" > "$BUCK2" && chmod +x "$BUCK2"
rm "$BUCK2_ARCHIVE"

2 changes: 1 addition & 1 deletion examples/llm_pte_finetuning/README.md
@@ -63,7 +63,7 @@ shuffle: True
batch_size: 1
```

-Torchtune supports datasets using huggingface dataloaders, so custom datasets could also be defined. For examples on defining your own datasets, review the [torchtune docs](https://pytorch.org/torchtune/stable/tutorials/datasets.html#hugging-face-datasets).
+Torchtune supports datasets using huggingface dataloaders, so custom datasets could also be defined. For examples on defining your own datasets, review the [torchtune docs](https://pytorch.org/torchtune/stable/basics/text_completion_datasets.html#loading-text-completion-datasets-from-hugging-face).

### Loss

9 changes: 3 additions & 6 deletions examples/models/deepseek-r1-distill-llama-8B/README.md
@@ -17,7 +17,7 @@ pip install -U "huggingface_hub[cli]"
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Llama-8B --local-dir /target_dir/DeepSeek-R1-Distill-Llama-8B --local-dir-use-symlinks False
```

-2. Download the [tokenizer.model](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/original/tokenizer.model) from the Llama3.1 repo which will be needed later on when running the model using the runtime.
+2. Download the [tokenizer.model](https://huggingface.co/meta-llama/Llama-3.1-8B/tree/main/original) from the Llama3.1 repo which will be needed later on when running the model using the runtime.

3. Convert the model to pth file.
```
@@ -48,16 +48,13 @@ print("saving checkpoint")
torch.save(sd, "/tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth")
```

-4. Download and save the params.json file
-```
-wget https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/original/params.json -o /tmp/params.json
-```
+4. Download and save the [params.json](https://huggingface.co/meta-llama/Llama-3.1-8B/tree/main/original) file.
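
If a CLI one-liner is still wanted here, a sketch of a working replacement for the removed `wget` line; note the `resolve/` path (raw file) rather than `blob/` (HTML page), and that the gated repo requires a Hugging Face token (assumed to be in `HF_TOKEN`):

```bash
# Download the raw params.json from the gated Llama 3.1 repo.
curl -L -H "Authorization: Bearer ${HF_TOKEN}" \
  -o params.json \
  "https://huggingface.co/meta-llama/Llama-3.1-8B/resolve/main/original/params.json"
```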

5. Generate a PTE file for use with the Llama runner.
```
python -m examples.models.llama.export_llama \
--checkpoint /tmp/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/checkpoint.pth \
--p /tmp/params.json \
+-p params.json \
-kv \
--use_sdpa_with_kv_cache \
-X \
@@ -124,9 +124,9 @@ class TestImageTransform:
same output as the reference model.

Reference model: CLIPImageTransform
-https://github.com/pytorch/torchtune/blob/main/torchtune/models/clip/inference/_transforms.py#L115
+https://github.com/pytorch/torchtune/blob/main/torchtune/models/clip/inference/_transform.py#L127
Eager and exported models: _CLIPImageTransform
-https://github.com/pytorch/torchtune/blob/main/torchtune/models/clip/inference/_transforms.py#L26
+https://github.com/pytorch/torchtune/blob/main/torchtune/models/clip/inference/_transform.py#L28
"""

models_no_resize = initialize_models(resize_to_max_canvas=False)
@@ -147,7 +147,7 @@ def prepare_inputs(
without distortion.

These calculations are done by the reference model inside __init__ and __call__
-https://github.com/pytorch/torchtune/blob/main/torchtune/models/clip/inference/_transforms.py#L115
+https://github.com/pytorch/torchtune/blob/main/torchtune/models/clip/inference/_transform.py#L198
"""
image_tensor = F.to_dtype(
F.grayscale_to_rgb_image(F.to_image(image)), scale=True
4 changes: 2 additions & 2 deletions examples/qualcomm/qaihub_scripts/llama/README.md
@@ -19,7 +19,7 @@ Note that the pre-compiled context binaries could not be futher fine-tuned for o
2. Follow instructions in https://huggingface.co/qualcomm/Llama-v2-7B-Chat to export context binaries (will take some time to finish)

```bash
-# tokenizer.model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer.model
+# tokenizer.model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main
# tokenizer.bin:
python -m examples.models.llama.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
```
@@ -54,4 +54,4 @@ Please refer to [Check context binary version](../../README.md#check-context-bin
```bash
# AIHUB_CONTEXT_BINARIES: ${PATH_TO_AIHUB_WORKSPACE}/build/llama_v3_8b_chat_quantized
python examples/qualcomm/qaihub_scripts/llama/llama3/qaihub_llama3_8b.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --context_binaries ${AIHUB_CONTEXT_BINARIES} --tokenizer_model tokenizer.model --prompt "What is baseball?"
-```
+```
3 changes: 1 addition & 2 deletions examples/qualcomm/scripts/mobilebert_fine_tune.py
@@ -103,8 +103,7 @@ def get_fine_tuned_mobilebert(artifacts_dir, pretrained_weight, batch_size):

# grab dataset
url = (
"https://raw.githubusercontent.com/susanli2016/NLP-with-Python"
"/master/data/title_conference.csv"
"https://raw.githubusercontent.com/susanli2016/NLP-with-Python/master/data/title_conference.csv"
)
content = requests.get(url, allow_redirects=True).content
data = pd.read_csv(BytesIO(content))
2 changes: 1 addition & 1 deletion runtime/core/portable_type/c10/c10/macros/Macros.h
@@ -241,7 +241,7 @@ using namespace c10::xpu;
#ifdef __HIPCC__
// Unlike CUDA, HIP requires a HIP header to be included for __host__ to work.
// We do this #include here so that C10_HOST_DEVICE and friends will Just Work.
-// See https://github.com/ROCm-Developer-Tools/HIP/issues/441
+// See https://github.com/ROCm/hip/issues/441
#include <hip/hip_runtime.h>
#endif

12 changes: 8 additions & 4 deletions scripts/check_urls.sh
@@ -9,6 +9,7 @@ set -euo pipefail

status=0
green='\e[1;32m'; red='\e[1;31m'; cyan='\e[1;36m'; yellow='\e[1;33m'; reset='\e[0m'
+user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
last_filepath=

while IFS=: read -r filepath url; do
@@ -18,7 +18,7 @@ while IFS=: read -r filepath url; do
fi
code=$(curl -gsLm30 -o /dev/null -w "%{http_code}" -I "$url") || code=000
if [ "$code" -ge 400 ]; then
-code=$(curl -gsLm30 -o /dev/null -w "%{http_code}" -r 0-0 -A "Mozilla/5.0" "$url") || code=000
+code=$(curl -gsLm30 -o /dev/null -w "%{http_code}" -r 0-0 -A "$user_agent" "$url") || code=000
fi
if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
printf "${green}%s${reset} ${cyan}%s${reset}\n" "$code" "$url"
@@ -27,17 +28,20 @@
status=1
fi
done < <(
-git --no-pager grep --no-color -I -o -E \
-'https?://[^[:space:]<>\")\{\(\$]+' \
+git --no-pager grep --no-color -I -P -o \
+'(?<!git\+)(?<!\$\{)https?://(?![^\s<>\")]*[\{\}\$])[^[:space:]<>\")\[\]\(]+' \
-- '*' \
':(exclude).*' \
':(exclude)**/.*' \
':(exclude)**/*.lock' \
':(exclude)**/*.svg' \
':(exclude)**/*.xml' \
-':(exclude)**/*.gradle*' \
+':(exclude)**/*gradle*' \
':(exclude)**/third-party/**' \
-| sed 's/[[:punct:]]*$//' \
+| sed -E 's/[^/[:alnum:]]+$//' \
| grep -Ev '://(0\.0\.0\.0|127\.0\.0\.1|localhost)([:/])' \
| grep -Ev 'fwdproxy:8080' \
|| true
)
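
The new pattern is easier to review with a quick local test; a sketch using GNU grep's `-P` (the same PCRE engine that `git grep -P` uses in the script):

```bash
# Only line 1 should match: line 2 is a git+https pip URL, line 3 contains a ${...} placeholder.
printf '%s\n' \
  'docs live at https://github.com/pytorch/executorch for now' \
  'pip install git+https://github.com/pytorch/executorch.git' \
  'curl "https://example.com/snapshot-${DATE}/file.aar"' |
  grep -P -o '(?<!git\+)(?<!\$\{)https?://(?![^\s<>\")]*[\{\}\$])[^[:space:]<>\")\[\]\(]+'
# prints: https://github.com/pytorch/executorch
```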

4 changes: 2 additions & 2 deletions setup.py
@@ -606,8 +606,8 @@ def run(self):
# be found in the pip package. This is the subset of headers that are
# essential for building custom ops extensions.
# TODO: Use cmake to gather the headers instead of hard-coding them here.
-# For example: https://discourse.cmake.org/t/installing-headers-the-modern-
-# way-regurgitated-and-revisited/3238/3
+# For example:
+# https://discourse.cmake.org/t/installing-headers-the-modern-way-regurgitated-and-revisited/3238/3
for include_dir in [
"runtime/core/",
"runtime/kernel/",
3 changes: 1 addition & 2 deletions util/collect_env.py
@@ -220,8 +220,7 @@ def get_cudnn_version(run_lambda):
cudnn_cmd = '{} /R "{}\\bin" cudnn*.dll'.format(where_cmd, cuda_path)
elif get_platform() == "darwin":
# CUDA libraries and drivers can be found in /usr/local/cuda/. See
-# https://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html#install
-# https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installmac
+# https://docs.nvidia.com/cuda/archive/10.1/cuda-installation-guide-mac-os-x/index.html#3.2-Install
# Use CUDNN_LIBRARY when cudnn library is installed elsewhere.
cudnn_cmd = "ls /usr/local/cuda/lib/libcudnn*"
else:
2 changes: 1 addition & 1 deletion util/python_profiler.py
@@ -44,7 +44,7 @@ def _from_pstat_to_static_html(stats: Stats, html_filename: str):
html_filename: Output filename in which populated template is rendered
"""
RESTR = r'(?<!] \+ ")/static/'
-REPLACE_WITH = "https://cdn.rawgit.com/jiffyclub/snakeviz/v0.4.2/snakeviz/static/"
+REPLACE_WITH = "https://cdn.jsdelivr.net/gh/jiffyclub/snakeviz@v0.4.2/snakeviz/static/"

if not isinstance(html_filename, str):
raise ValueError("A valid file name must be provided.")