Skip to content

[AOSP] add file descriptor support to file_data_loader #6611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 4, 2024
Merged

Conversation

davidlin54
Copy link
Contributor

@davidlin54 davidlin54 commented Nov 1, 2024

Summary:
For Google's ODP, the model will execute in a process that does not have access to the file system. Instead, the caller must open the file then pass ownership of the FD to the model executor.

  1. Added support for reading files by FD passed by the prefix "fd:///" (copied from tensorflow https://cs.android.com/android/platform/superproject/main/+/main:external/federated-compute/fcp/tensorflow/file_descriptor_filesystem.cc;l=41;bpv=1;bpt=1) into file_data_loader.
  2. Added test cases by opening file first and passing the FD to the data loader

Test Plan:

  1. run C++ tests
  2. use script with following instruction:
$ cd aosp
$ source build/envsetup.sh
$ lunch aosp_arm64-trunk_staging-userdebug
$ m executor_runner
$ FILENAME="<model_path>"
$ exec {FD}<${FILENAME}     # open file for read, assign descriptor
$ echo "Opened ${FILENAME} for read using descriptor ${FD}"
$ out/host/linux-x86/bin/executor_runner --model_path fd:///${FD} --is_fd_uri=true
$ exec {FD}<&-    # close file

result:

Opened out/host/linux-x86/bin/add.pte for read using descriptor 10
I 00:00:00.000256 executorch:executor_runner.cpp:94] Model file fd:///10 is loaded.
I 00:00:00.000279 executorch:executor_runner.cpp:103] Using method forward
I 00:00:00.000282 executorch:executor_runner.cpp:150] Setting up planned buffer 0, size 48.
I 00:00:00.000294 executorch:executor_runner.cpp:173] Method loaded.
I 00:00:00.000310 executorch:executor_runner.cpp:183] Inputs prepared.
I 00:00:00.000336 executorch:executor_runner.cpp:192] Model executed successfully.
I 00:00:00.000345 executorch:executor_runner.cpp:196] 1 outputs: 
Output 0: tensor(sizes=[1], [2.])

Copy link

pytorch-bot bot commented Nov 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6611

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 83f3ff9 with merge base 97a4600 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 1, 2024
@@ -26,6 +26,24 @@ namespace extension {
*/
class FileDataLoader final : public executorch::runtime::DataLoader {
public:
/**
* Creates a new FileDataLoader that wraps the named file descriptor.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be mention that the fd ownership will be passed?

Copy link
Contributor Author

@davidlin54 davidlin54 Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does passing the FD ownership mean that ET is responsible for closing the file afterwards? or is ownership of the FD independent of ownership of the file opened?

edit: read the destructor for FileDataLoader, makes sense now!

@@ -0,0 +1,6 @@
#!/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not necessary to put this file to repo. Maybe just mention it in commit message?

@kirklandsign
Copy link
Contributor

Overall LGTM! Let's try to merge to main, not aosp-test?

I will rebase aosp-test later

lind added 2 commits November 4, 2024 08:59
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@davidlin54 davidlin54 changed the base branch from aosp-test to main November 4, 2024 17:01
int fd = parsed_fd.get();

// Cache the file size.
struct stat st;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's possible to factor the common code between here and FileDataLoader::from?

@davidlin54
Copy link
Contributor Author

@kirklandsign moved centralized logic to a private method

@davidlin54 davidlin54 merged commit 3a1f8d2 into main Nov 4, 2024
37 of 38 checks passed
@davidlin54 davidlin54 deleted the lind/aosp branch November 4, 2024 23:14
davidlin54 pushed a commit to davidlin54/executorch that referenced this pull request Nov 5, 2024
Summary: pytorch#6611 causes internal builds to break due to missing static qualifier. adding it here

Differential Revision: D65490319
davidlin54 pushed a commit that referenced this pull request Nov 5, 2024
@dbort
Copy link
Contributor

dbort commented Nov 5, 2024

This shouldn't have been part of FileDataLoader; we should create a new DataLoader with this functionality. This also caused the test-binary-size-linux-gcc job to start failing: https://github.com/pytorch/executorch/actions/runs/11674336719/job/32509252770

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants