Expose http_cache parameter, with its docs and recipe #407
Conversation
Force-pushed from 46814f2 to f88a573
I encountered a failure while adopting the
This is because CLI creates multiple MSAL https://github.com/python/cpython/blob/v3.9.7/Lib/dbm/dumb.py#L215-L221
once MSAL changes the content of
On Windows, there is no https://github.com/python/cpython/blob/v3.9.7/Lib/dbm/dumb.py#L1-L22 (from python/cpython@9f824a7, created in 1995)
https://github.com/python/cpython/blob/v3.9.7/Lib/dbm/dumb.py#L232-L235
Problems:
Alternative solutions I can think of are
Thank you, Jiashuo, for your in-depth report, as usual! Let me answer your questions in reverse order.
This PR introduces a generic http cache concept. When an instance of
The current PR, at its core, defines and exposes a generic http cache as a dict-like interface. It is not tied to any specific implementation of such a dict-like object. An earlier prototype of the recipe was indeed based on
Thanks for bringing me to the amusing history of Python's dbm. I wasn't aware that your term "its very developer" meant GvR. :-)
But it seems only those beginning lines were added in the first commit and then never changed. All of the implementation has been changed since then. In particular, Jiashuo, the lines you quoted above were added in this commit, which came with this commit message: "databases are associated with corruption problems, so I studied this code carefully and ran some brutal stress tests. I didn't find any bugs...".
True. If you/we foresee that multiple Azure CLI instances will run concurrently, Azure CLI can consider using
This might be the very specific reason for your running into the issue. Perhaps you can try creating only one
As I tested and inspected the source code, this task is still not implemented:

```python
import dbm.dumb

db = dbm.dumb.open(r'D:\testdbm')
db['mykey'] = 'a' * 512
db['mykey'] = 'b' * 1024
```

Running this script multiple times will make
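The growth is observable with a self-contained variant of that snippet (a sketch; the temp-directory path here is illustrative, the original report used `D:\testdbm` on Windows), reopening the database to simulate separate runs:

```python
import dbm.dumb
import os
import tempfile

# Illustrative path; the original snippet used r'D:\testdbm' on Windows.
path = os.path.join(tempfile.mkdtemp(), "testdbm")

def one_run():
    """Simulate one execution of the original script."""
    db = dbm.dumb.open(path, "c")
    db["mykey"] = "a" * 512
    db["mykey"] = "b" * 1024
    db.close()
    return os.path.getsize(path + ".dat")  # the data file holding the values

sizes = [one_run() for _ in range(3)]
print(sizes)  # the .dat file keeps growing: freed blocks are never reclaimed
```

Each "run" shrinks the value in place but then re-appends the larger value at the end of the `.dat` file, and the freed blocks are never reclaimed, which matches the TODO in the dbm.dumb source linked above.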
This would certainly be a nice-to-have feature if MSAL wants to support concurrency.
This is exactly what I am going to do next.
On Linux, consider a simple script:

```python
# testdbm.py
import dbm.gnu
import time

db = dbm.gnu.open('testdbm', 'c')
time.sleep(60)  # Put it in background
```

```shell
$ python3 testdbm.py &
[1] 1007
# This will fail
$ python3 testdbm.py
Traceback (most recent call last):
  File "testdbm.py", line 4, in <module>
    db = dbm.gnu.open('testdbm', 'c')
_gdbm.error: [Errno 11] Resource temporarily unavailable: 'testdbm'
```

I also see this: https://docs.python.org/3/library/shelve.html#restrictions
OK, good to know. Now, given that you mentioned there could potentially be multiple Azure CLI instances running on the same machine, the goal becomes "finding a dict-like component that supports concurrent access". We may be able to sift through existing packages on PyPI to see whether any would fit, or we may eventually develop a new package for that. But for now, perhaps this recipe would be a good-enough workaround.
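For reference, the general shape of such a recipe could look like the sketch below (illustrative names and path only, not MSAL's shipped code), incorporating the binary-mode and error-handling points raised elsewhere in this thread:

```python
import atexit
import os
import pickle
import tempfile

def load_http_cache(path):
    """Load a previously persisted http cache dict; fall back to an
    empty dict on the expected failure modes."""
    try:
        with open(path, "rb") as f:   # pickle requires binary mode
            return pickle.load(f)
    except (FileNotFoundError,        # no cache file yet
            pickle.UnpicklingError,   # corrupted cache file
            EOFError):                # empty/truncated cache file
        return {}

# Illustrative location; a real app would pick a stable per-user path.
path = os.path.join(tempfile.mkdtemp(), "http_cache.bin")
http_cache = load_http_cache(path)    # {} on the first run
atexit.register(lambda: pickle.dump(http_cache, open(path, "wb")))
# The http_cache dict would then be passed into the MSAL application
# object via the new http_cache=... parameter that this PR exposes.
```

Note this sketch does nothing about concurrent writers; two processes dumping to the same file can still interleave and corrupt it, which is exactly why the error handling above swallows those failure modes and starts over with an empty cache.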
Just want to mention that the file should be opened in binary mode: https://docs.python.org/3/library/pickle.html#pickle.Pickler
Also, writing to the same file from 2 processes can result in corrupted file content (such as Azure/azure-cli#9427). Unpickling from a corrupted binary file will result in
Thanks for the catch! Both of your comments were correct, and I have now updated the previous recipe accordingly. Please confirm, and then I'll fold it into this PR and merge it.
I further noticed that the documentation mentions: https://docs.python.org/3/library/pickle.html#pickle.UnpicklingError
I will try to ignore all errors during unpickling.
Good finding, and an intriguing topic.
Force-pushed from 0f9b4b8 to 6c92857
I admit the Python documentation is pretty vague about the error.
We don't know exactly what the bug is. In other words, we are not sure which kinds of corrupted files result in which kinds of exceptions - it may be possible the file is corrupted in one way which triggers
For example, unpickling from an empty file results in:
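That empty-file case is easy to reproduce (a minimal sketch, using an in-memory stream in place of an actual empty file):

```python
import io
import pickle

# Unpickle from an empty stream, simulating an empty http cache file.
try:
    pickle.load(io.BytesIO(b""))
    outcome = None
except EOFError as e:
    outcome = type(e).__name__  # pickle surfaces empty input as EOFError
print(outcome)
```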
Force-pushed from 6c92857 to d9d0e38
msal/application.py
Outdated

```python
IOError,  # A non-existent http cache file
pickle.UnpicklingError,  # A corrupted http cache file
EOFError,  # An empty http cache file
AttributeError, ImportError, IndexError,  # Other corruption
```
The doc says "but not necessarily limited to". I think other errors could break the execution. 🤔
Technically, yes. That was exactly why I mentioned earlier that "There is no way for us to list ALL exceptions that could potentially be raised by pickle.load(...)".

But I still think those underlying exceptions would probably only be raised when an issue/bug occurs OUTSIDE OF the pickle module, therefore they even "surprise" the pickle module. For example, the EOFError that you managed to produce in our lab environment would probably NOT happen with normal concurrent pickle.dump(), because even pickle.dumps(None) (or pickle.dumps("") etc.) produces a non-empty outcome. In other words, when/if an EOFError happens in real life, it would more likely be caused by an issue in OUR application, such as our app incorrectly truncating the http cache file. In that case, we would want such an EOFError to bubble up, so that we would become aware of it and fix it in our application code base.

In short, I still think EOFError, AttributeError, ImportError, and IndexError do not strictly need to be caught.
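The "even pickle.dumps(None) produces a non-empty outcome" point above is easy to verify (a quick sketch):

```python
import pickle

# Even the smallest payloads serialize to a few bytes, so an ordinary,
# non-interleaved pickle.dump() should never leave an empty file behind.
sizes = {repr(v): len(pickle.dumps(v)) for v in (None, "", 0, {})}
print(sizes)  # every length is > 0
```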
It seems EOFError, AttributeError, ImportError, IndexError still made their way into the final commit fcf34a2.
Force-pushed from d9d0e38 to 62752ad
This PR implements the internal proposal of http cache persistence.

Because MSAL 1.14's in-memory http cache behavior (i.e. #379) was specifically designed with a mechanism of "using a generic key-value pair storage as http cache", all the heavy lifting was already done in MSAL 1.14. This PR adds only 1 line to expose the new http_cache parameter, and another line to wire it up. The rest of the change is documentation, available for proofreading (please search for the http_cache keyword in this doc_staging environment).

When an app developer utilizes this PR's new parameter to use a persisted http cache, MSAL Python will cache those internal "instance metadata" across different PublicClientApplication and/or ConfidentialClientApplication instances, therefore resolving #80. Consequently, it also resolves #334. UPDATE: Thanks @jiasli for testing and confirming that this PR also resolves #334.