Skip to content

Add support for internal enums via stub generation #7212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

bwoebi
Copy link
Member

@bwoebi bwoebi commented Jul 3, 2021

This uses persistently malloc()'ed objects, which get duplicated to an request-unique object.
Those request-unique objects are supposed to be immutable for the lifetime of the request, i.e. they can be just efree()'d without issues.

This patch requires nikic/PHP-Parser#792 (and a subsequent update of the tag) before merging. Done.

@bwoebi bwoebi force-pushed the internal_enums branch 2 times, most recently from a6b1741 to 53bdf98 Compare July 3, 2021 01:59
@bwoebi bwoebi force-pushed the internal_enums branch 7 times, most recently from 40a7b1c to 75f8e12 Compare July 3, 2021 19:02
Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use more tests, especially things like unserialize. Can't comment on persistent objects.

}

return null;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary nesting.


case "string":
$suffix = "_string";
$additionalParams = ", \"$value\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only work for simple string values, but I guess that's enough for stubs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've been inspired by other places using string values… but an addslashes() would do no harm though

@@ -47,6 +48,22 @@ static zend_always_inline zval *zend_enum_fetch_case_value(zend_object *zobj)
return OBJ_PROP_NUM(zobj, 1);
}

ZEND_API zend_class_entry *zend_register_internal_enum(const char *name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these variations? I often find it the many functions hard to navigate. Maybe that's just me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends … well, you have to know that IS_UNDEF means unbacked and NULL means no functions (i.e. you don't need to define an empty array with just ZEND_FE_END). It's not always obvious what you would have to do if these simpler functions wouldn't be there.

Especially the distinction backed vs. unbacked makes sense to me

@bwoebi
Copy link
Member Author

bwoebi commented Jul 3, 2021

Good point, didn't think about deserialization at all. Will check that out. Also need to add a test for assignment to simple constant (there shall be no constant propagation of enum objects).

@bwoebi bwoebi force-pushed the internal_enums branch 4 times, most recently from 59f5465 to 892a15e Compare July 3, 2021 22:50
@bwoebi
Copy link
Member Author

bwoebi commented Jul 3, 2021

Deserialization indeed had only ZEND_COPY instead of ZEND_COPY_OR_DUP. Fixed that. Added test. Anything else you can think of where a test would be needed? @iluuu1994
Note that for most cases it should be identical to userland enums, just that the persistent ZEND_OBJECT should never leak into userland. I.e. all places possibly accessing class constants must prevent their propagation, but that's in general covered by ZEND_COPY_OR_DUP() which applies also to internal arrays being in class constants.

This uses persistently malloc()'ed objects, which get duplicated to an request-unique object.
Those request-unique objects are supposed to be immutable for the lifetime of the request, i.e. they can be just efree()'d without issues.
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so here is my understanding of this patch:

  • Persistent objects are malloced and get a GC_PERSISTENT flag. We don't have space for this flag, so one bit of the GC root space is stolen.
  • Persistent objects cannot be returned into userland code and need to be duplicated first. It looks like you missed doing this at least in the from/tryFrom and cases implementation. Try running with ZEND_RC_DEBUG=1: https://gist.github.com/nikic/5bd0c0f1280c217c957ef8fa55c82f8e.
  • Persistent objects are duplicated in the following way: There is a single allocation for all persistent objects, whose number and size is known at startup. The persistent object stores the offset into this allocation. If the refcount of the corresponding object is zero, it is initialized once using a shallow copy. All per-request objects are freed at once at the end of the request and cannot be individually freed.

I think the principal alternative to this approach is to do the same as in userland, i.e. use CONSTANT_AST constant values that evaluate to the (non-persistent objects). However, this would require support for mutable_data for internal classes. I think overall this may be the cleaner solution, because it brings handling of internal and userland classes closer and does not require introduction of any new concepts.

Anyway, I would like to hear @dstogov's thoughts on this patch and the persistent object concept.

return zend_register_internal_backed_enum_ex(name, IS_UNDEF, NULL);
}

ZEND_API zend_class_entry *zend_register_internal_enum_ex(const char *name, const zend_function_entry *functions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd combine these two functions. Especially as these are intended for use with stubs, having an _ex variant to save a NULL doesn't seem particularly useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I tend to agree, I should just distinguish backed vs unbacked.

zend_class_constant *c = zend_declare_class_constant_ex(ce, name, &casezv, ZEND_ACC_PUBLIC, NULL);
ZEND_CLASS_CONST_FLAGS(c) |= ZEND_CLASS_CONST_IS_CASE;

zend_string_release(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you releasing the name here? APIs generally shouldn't release values they are passed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's unergonomic to do zend_string * ... = zend_string_init(..); zend_register_enum_case(...); zend_string_release(...); it essentially becomes three statements instead of just one zend_register_enum_case(... zend_string_init (..)) … I particular I cannot imagine a case where you would want to retain the zend_string afterwards.

_zend_object_minimal_init(obj, ce);
GC_TYPE_INFO(obj) |= IS_OBJ_PERSISTENT << GC_FLAGS_SHIFT;
obj->handle = zend_objects_persistent_handle_counter;
zend_objects_persistent_handle_counter += (alloc_size - 1) / sizeof(zend_object) + 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're doing something like this anyway, why is persistent_objects/zend_objects_persistent_handle_counter quantized to zend_objects rather than just a raw memory region?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably does not make too much sense, the main reason is that I'm accessing EG(persistent_objects) + obj->handle where EG(persistent_objects) is a zend_object-array. But could just cast to char and back to zend_object at that point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, why is it zend_object* in the first place, rather than char*?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it contains objects :-D that was the logical thought I had, but obviously, this chunk of memory can just be made more generic and the zend_objects_persistent_handle_counter just be a counter. There's no reason why this chunk of memory cannot be just shared between multiple potential usages. (and then I'd also decouple that "handle_counter" from zend_objects.c)

@bwoebi
Copy link
Member Author

bwoebi commented Jul 5, 2021

It looks like you missed doing this at least in the from/tryFrom and cases implementation.

Yes I did. Arghs. I checked around where constants are accessed, but forgot to check the parts where the enums are directly accessed :-/

I think the principal alternative to this approach is to do the same as in userland, i.e. use CONSTANT_AST constant values that evaluate to the (non-persistent objects). However, this would require support for mutable_data for internal classes. I think overall this may be the cleaner solution, because it brings handling of internal and userland classes closer and does not require introduction of any new concepts.

You precisely found the issue. Doing the same as userland is not so easy though, because the constant gets replaced by the unique object. (in userland) ... In internal there is just one overall copy of the constants table. If we'd replace it there with ast evaluation, it would break. Creating a new object obviously is out of question as well.

Also, how would you even implement that mutable data? In particular with ZTS, you need thread local mutable data. Which presumably would live somewhere in EG() in some mapping I guess. At least it's not possible to just do ce->thread_local_mutable_data = &zend_tls_marked_variable; because the address is unique per thread... (EDIT: technically you could do ce->thread_local_mutable_getter = &function_returning_pointer_to_zend_tls_data;)
At that point, obviously, we could have an AST node, which, when evaluated goes to look up into the mapping for mutable class data and from there - needs to initialize that mutable data if not present yet, then check if the object exists and if not, do a shallow copy as emalloc()'ed refcountable object.

Maybe I'm overestimating what would need to be done, but yeah, how do you imagine that working out? If you find a nice easy way.

Comment on lines +499 to +500
zend_string_release(name);
zend_string_release(str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static zend_always_inline void zend_string_release(zend_string *s)
{
	if (!ZSTR_IS_INTERNED(s)) {
		if (GC_DELREF(s) == 0) {
			pefree(s, GC_FLAGS(s) & IS_STR_PERSISTENT);
		}
	}
}
Suggested change
zend_string_release(name);
zend_string_release(str);

str = zend_new_interned_string(str);

zval namezv;
ZVAL_STR_COPY(&namezv, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ZVAL_STR_COPY(&namezv, name);
ZVAL_INTERNED_STR(&namezv, name);

}

zval zv;
ZVAL_STR(&zv, str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ZVAL_STR(&zv, str);
ZVAL_INTERNED_STR(&zv, str);

ZVAL_LONG(&zv, value);
zend_object *case_obj = add_internal_enum_case(ce, name, &zv);

zend_string_release(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments on ZVAL_INTERNED_STR and not needing zval_string_released on zend_new_interned_string. It seems like both zend_string_init_interned_permanent and zend_new_interned_string_request return interned strings where refcounting or freeing isn't necessary.

The code works but it's longer than it needs to be?

@nikic
Copy link
Member

nikic commented Jul 23, 2021

I've posted an alternative implementation at #7302.

nikic added a commit that referenced this pull request Jul 27, 2021
This adds support for internal enums with the same basic approach
as userland enums. Enum values are stored as CONSTANT_AST and
objects created during constant updating at runtime. This means
that we need to use mutable_data for internal enums.

This just adds basic support and APIs, it does not include the
stubs integration from #7212.

Closes GH-7302.
@nikic
Copy link
Member

nikic commented Aug 31, 2021

Added the stub generation part in 32d4821, so internal enum support should now be complete.

@nikic nikic closed this Aug 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants