fix: fix repack pipeline step by putting inference.py in "code" sub dir #2342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged (3 commits) on May 13, 2021. Changes shown from 1 commit.

src/sagemaker/workflow/_repack_model.py (19 additions, 1 deletion)
@@ -19,6 +19,13 @@
 import tarfile
 import tempfile
 
+# Repack Model
+# The following script is run via a training job which takes an existing model and a custom
+# entry point script as arguments. The script creates a new model archive with the custom
+# entry point in the "code" directory along with the existing model. Subsequently, when the model
+# is unpacked for inference, the custom entry point will be used.
+# Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-toolkits.html
+
 # distutils.dir_util.copy_tree works way better than the half-baked
 # shutil.copytree which bombs on previously existing target dirs...
 # alas ... https://bugs.python.org/issue10948
@@ -33,17 +40,28 @@
     parser.add_argument("--model_archive", type=str, default="model.tar.gz")
     args = parser.parse_args()
 
+    # the data directory contains a model archive generated by a previous training job
     data_directory = "/opt/ml/input/data/training"
     model_path = os.path.join(data_directory, args.model_archive)
 
+    # create a temporary directory
     with tempfile.TemporaryDirectory() as tmp:
         local_path = os.path.join(tmp, "local.tar.gz")
+        # copy the previous training job's model archive to the temporary directory
         shutil.copy2(model_path, local_path)
         src_dir = os.path.join(tmp, "src")
+        # create the "code" directory which will contain the inference script
+        os.makedirs(os.path.join(src_dir, "code"))
+        # extract the contents of the previous training job's model archive to the "src"
+        # directory of this training job
         with tarfile.open(name=local_path, mode="r:gz") as tf:
             tf.extractall(path=src_dir)
 
+        # generate a path to the custom inference script
         entry_point = os.path.join("/opt/ml/code", args.inference_script)
-        shutil.copy2(entry_point, os.path.join(src_dir, args.inference_script))
+        # copy the custom inference script to the "src" dir
+        shutil.copy2(entry_point, os.path.join(src_dir, "code", args.inference_script))
 
+        # copy the "src" dir, which includes the previous training job's model and the
+        # custom inference script, to the output of this training job
         copy_tree(src_dir, "/opt/ml/model")
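
For illustration only (this sketch is not part of the PR): the snippet below reproduces the same repack steps locally, using hypothetical temporary directories (workspace, data_dir, code_dir, src_dir are made-up names) in place of the hard-coded container paths /opt/ml/input/data/training, /opt/ml/code, and /opt/ml/model, and then prints the resulting layout to show that the custom entry point lands under code/ next to the existing model artifacts.

import os
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workspace:
    # Hypothetical local stand-ins for the container paths used by the real script.
    data_dir = os.path.join(workspace, "input")   # stands in for /opt/ml/input/data/training
    code_dir = os.path.join(workspace, "entry")   # stands in for /opt/ml/code
    for d in (data_dir, code_dir):
        os.makedirs(d)

    # Fake "previous training job" output: a model file packed into model.tar.gz.
    model_file = os.path.join(workspace, "model.pth")
    with open(model_file, "w") as f:
        f.write("weights")
    with tarfile.open(os.path.join(data_dir, "model.tar.gz"), "w:gz") as tf:
        tf.add(model_file, arcname="model.pth")

    # Fake custom entry point script.
    with open(os.path.join(code_dir, "inference.py"), "w") as f:
        f.write("def model_fn(model_dir): ...\n")

    # Same steps as the repack script above: extract the old archive into "src",
    # then place the entry point under "src/code".
    src_dir = os.path.join(workspace, "src")
    os.makedirs(os.path.join(src_dir, "code"))
    with tarfile.open(os.path.join(data_dir, "model.tar.gz"), "r:gz") as tf:
        tf.extractall(path=src_dir)
    shutil.copy2(
        os.path.join(code_dir, "inference.py"),
        os.path.join(src_dir, "code", "inference.py"),
    )

    # Expected layout after repacking (what the serving container sees once unpacked):
    #   model.pth
    #   code/inference.py
    for root, _, files in os.walk(src_dir):
        for name in files:
            print(os.path.relpath(os.path.join(root, name), src_dir))

SageMaker framework serving containers generally look for a custom inference script under a code/ directory inside the unpacked model archive, which is why the repack step places it there rather than at the archive root.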