Better uop coverage in the JIT optimizer #131798
Open
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), topic-JIT, type-feature (A feature request or enhancement)
Out of 263 total uops, 155 of these are ignored by the tier two optimizer. These represent over half of all uops by dynamic execution count.
This issue will serve as a checklist for auditing these missing uops and adding them where they make sense. At first glance, there's quite a bit of potential here, especially around the ability to narrow known output types (like _CONTAINS_OP_SET) and the ability to narrow and remove guards on input types (like _BINARY_OP_SUBSCR_LIST_INT). As I'm going through, I'll cross out anything that doesn't seem like it makes sense to add.

First, here are the 53 missing uops that each represent at least 0.1% of all uops executed:
- _SET_IP (12.1%)
- _CHECK_VALIDITY (10.1%)
- _CHECK_VALIDITY_AND_SET_IP (6.5%)
- _CHECK_PERIODIC (3.1%)
- _MAKE_WARM (2.8%)
- _START_EXECUTOR (1.7%)
- _GUARD_NOS_INT (1.5%)
- _BINARY_OP_SUBSCR_LIST_INT (1.0%)
- _CHECK_FUNCTION (1.0%)
- _CHECK_MANAGED_OBJECT_HAS_VALUES (0.7%)
- _ITER_CHECK_LIST (0.7%)
- _CONTAINS_OP_SET (0.6%)
- _FOR_ITER_TIER_TWO (0.6%)
- _GUARD_NOT_EXHAUSTED_LIST (0.6%)
- _ITER_NEXT_LIST_TIER_TWO (0.6%)
- _SAVE_RETURN_OFFSET (0.6%)
- _CALL_LEN (0.5%)
- _CALL_LIST_APPEND (0.5%)
- _POP_TOP (0.5%)
- _RESUME_CHECK (0.5%)
- _BINARY_OP_SUBSCR_STR_INT (0.4%)
- _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT (0.4%)
- _GUARD_KEYS_VERSION (0.4%)
- _BINARY_OP_SUBSCR_DICT (0.3%)
- _CALL_BUILTIN_FAST (0.3%)
- _CHECK_STACK_SPACE_OPERAND (0.3%)
- _GET_ITER (0.3%)
- _STORE_SUBSCR (0.3%)
- _GUARD_NOT_EXHAUSTED_RANGE (0.2%)
- _BINARY_SLICE (0.2%)
- _BUILD_LIST (0.2%)
- _CALL_BUILTIN_O (0.2%)
- _CALL_NON_PY_GENERAL (0.2%)
- _CHECK_IS_NOT_PY_CALLABLE (0.2%)
- _GUARD_NOS_FLOAT (0.2%)
- _ITER_CHECK_RANGE (0.2%)
- _ITER_CHECK_TUPLE (0.2%)
- _LOAD_DEREF (0.2%)
- _STORE_SUBSCR_LIST_INT (0.2%)
- _BINARY_OP_EXTEND (0.1%)
- _CALL_ISINSTANCE (0.1%)
- _CALL_METHOD_DESCRIPTOR_FAST (0.1%)
- _CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS (0.1%)
- _CALL_METHOD_DESCRIPTOR_NOARGS (0.1%)
- _CALL_TYPE_1 (0.1%)
- _CHECK_ATTR_CLASS (0.1%)
- _CONTAINS_OP_DICT (0.1%)
- _GUARD_BINARY_OP_EXTEND (0.1%)
- _GUARD_NOT_EXHAUSTED_TUPLE (0.1%)
- _ITER_NEXT_TUPLE (0.1%)
- _LIST_APPEND (0.1%)
- _STORE_ATTR_SLOT (0.1%)
- _STORE_SUBSCR_DICT (0.1%)

And here are the 102 missing uops that are less than 0.1%. These are less important, but still may net us some wins on individual benchmarks:
- _BINARY_OP_SUBSCR_CHECK_FUNC
- _BINARY_OP_SUBSCR_TUPLE_INT
- _BUILD_MAP
- _BUILD_SET
- _BUILD_SLICE
- _BUILD_STRING
- _CALL_BUILTIN_CLASS
- _CALL_BUILTIN_FAST_WITH_KEYWORDS
- _CALL_INTRINSIC_1
- _CALL_INTRINSIC_2
- _CALL_KW_NON_PY
- _CALL_METHOD_DESCRIPTOR_O
- _CALL_STR_1
- _CALL_TUPLE_1
- _CHECK_ATTR_METHOD_LAZY_DICT
- _CHECK_EG_MATCH
- _CHECK_EXC_MATCH
- _CHECK_FUNCTION_VERSION_INLINE
- _CHECK_FUNCTION_VERSION_KW
- _CHECK_IS_NOT_PY_CALLABLE_KW
- _CHECK_METHOD_VERSION
- _CHECK_METHOD_VERSION_KW
- _CHECK_PERIODIC_IF_NOT_YIELD_FROM
- _CONVERT_VALUE
- _COPY_FREE_VARS
- _DELETE_ATTR
- _DELETE_DEREF
- _DELETE_FAST
- _DELETE_GLOBAL
- _DELETE_NAME
- _DELETE_SUBSCR
- _DEOPT
- _DICT_MERGE
- _DICT_UPDATE
- _END_FOR
- _END_SEND
- _ERROR_POP_N
- _EXIT_INIT_CHECK
- _EXPAND_METHOD
- _EXPAND_METHOD_KW
- _FATAL_ERROR
- _FORMAT_SIMPLE
- _FORMAT_WITH_SPEC
- _GET_AITER
- _GET_ANEXT
- _GET_AWAITABLE
- _GET_LEN
- _GET_YIELD_FROM_ITER
- _GUARD_DORV_NO_DICT
- _GUARD_GLOBALS_VERSION
- _GUARD_TOS_FLOAT
- _GUARD_TOS_INT
- _GUARD_TYPE_VERSION_AND_LOCK
- _IMPORT_FROM
- _IMPORT_NAME
- _IS_NONE
- _LIST_EXTEND
- _LOAD_ATTR_NONDESCRIPTOR_NO_DICT
- _LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
- _LOAD_BUILD_CLASS
- _LOAD_COMMON_CONSTANT
- _LOAD_FAST_LOAD_FAST
- _LOAD_FROM_DICT_OR_DEREF
- _LOAD_GLOBAL
- _LOAD_GLOBAL_BUILTINS
- _LOAD_GLOBAL_MODULE
- _LOAD_LOCALS
- _LOAD_NAME
- _LOAD_SUPER_ATTR_ATTR
- _LOAD_SUPER_ATTR_METHOD
- _MAKE_CALLARGS_A_TUPLE
- _MAKE_CELL
- _MAKE_FUNCTION
- _MAP_ADD
- _MATCH_CLASS
- _MATCH_KEYS
- _MATCH_MAPPING
- _MATCH_SEQUENCE
- _MAYBE_EXPAND_METHOD_KW
- _NOP
- _POP_EXCEPT
- _POP_TWO_LOAD_CONST_INLINE_BORROW
- _PUSH_EXC_INFO
- _PUSH_NULL_CONDITIONAL
- _SETUP_ANNOTATIONS
- _SET_ADD
- _SET_FUNCTION_ATTRIBUTE
- _SET_UPDATE
- _STORE_ATTR
- _STORE_ATTR_INSTANCE_VALUE
- _STORE_ATTR_WITH_HINT
- _STORE_DEREF
- _STORE_FAST_LOAD_FAST
- _STORE_FAST_STORE_FAST
- _STORE_GLOBAL
- _STORE_NAME
- _STORE_SLICE
- _TIER2_RESUME_CHECK
- _UNARY_INVERT
- _UNARY_NEGATIVE
- _UNPACK_SEQUENCE_LIST
- _WITH_EXCEPT_START

Linked PRs
- int/float/str guards #131800
- _CONTAINS_OP_SET to bool #132057
- _BINARY_OP_SUBSCR_STR_INT to str #132153
- self/NULL checks for some known non-methods #132278
- dict, frozenset, list, set, and tuple #132289
- CALL_TYPE_1 into several uops #132419
- sym_new_type instead of sym_new_not_null for _BUILD_LIST, _BUILD_SLICE, and _BUILD_MAP #132434
- sym_new_type instead of sym_new_not_null for _BUILD_STRING, _BUILD_SET #132564
- _BINARY_OP_SUBSCR_TUPLE_INT #133003
- isinstance for some known arguments #133172
- CALL_ISINSTANCE into several uops #133339
- _GET_LEN to int #133345
- _POP_CALL_TWO_LOAD_CONST_INLINE_BORROW #134268
- _POP_CALL_TWO_LOAD_CONST_INLINE_BORROW #134369
- _CALL_ISINSTANCE for class tuples #134543
- remove_unneeded_uops #134554
- _ITER_CHECK_TUPLE #134803
- _CALL_TYPE_1 when the result is known #135194
- _UNARY_INVERT #135222
- _UNARY_NEGATIVE #135223
- _GUARD_TOS_SLICE #144470
- _ITER_CHECK_RANGE and _ITER_CHECK_LIST in the JIT #144583
- CHECK_FUNCTION_VERSION into two guards in the JIT #145080
- _FORMAT_SIMPLE and _FORMAT_WITH_SPEC to str #146639
- _CALL_BUILTIN_CLASS to smaller uops #148094
- MATCH_SEQUENCE/MAPPING to the JIT optimizer #148124
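
As a rough illustration of the two kinds of wins the issue describes (narrowing a uop's known output type, and removing guards whose input type is already known), here is a toy Python model of an abstract pass over a linear trace. It is only a sketch and not CPython's optimizer: `GUARDS`, `BASELINE_EFFECTS`, `NARROWED_EFFECTS`, and `optimize` are hypothetical names, the stack effects are simplified, `_LOAD_FAST` always pushes an unknown type, and treating `_TO_BOOL_BOOL` as a pure type guard is an assumption made for brevity.

```python
# Toy model only: hypothetical tables and helper, not CPython's optimizer API.

# Guards: uop name -> (stack depth counted from the top, required type).
GUARDS = {
    "_GUARD_TOS_INT": (1, "int"),
    "_GUARD_NOS_INT": (2, "int"),
    "_TO_BOOL_BOOL": (1, "bool"),  # assumption: modeled as a plain type guard
}

# Other uops: name -> (values popped, types pushed). None means "unknown".
BASELINE_EFFECTS = {
    "_LOAD_FAST": (0, [None]),
    "_BINARY_OP_ADD_INT": (2, ["int"]),
    "_CONTAINS_OP_SET": (2, [None]),  # uop ignored: output type unknown
}

# Same table, but with the output of _CONTAINS_OP_SET narrowed to bool.
NARROWED_EFFECTS = {**BASELINE_EFFECTS, "_CONTAINS_OP_SET": (2, ["bool"])}


def optimize(trace, effects):
    """Return a copy of trace with provably redundant guards replaced by _NOP."""
    stack = []  # abstract stack of known type names (or None)
    out = []
    for uop in trace:
        if uop in GUARDS:
            depth, required = GUARDS[uop]
            if stack[-depth] == required:
                out.append("_NOP")  # type already known: guard is redundant
            else:
                stack[-depth] = required  # after the guard, the type is known
                out.append(uop)
        else:
            popped, pushed = effects[uop]
            if popped:
                del stack[-popped:]
            stack.extend(pushed)
            out.append(uop)
    return out


# Roughly the shape of: ((a + b) + c) in s, followed by a truthiness check.
trace = [
    "_LOAD_FAST", "_LOAD_FAST",
    "_GUARD_NOS_INT", "_GUARD_TOS_INT", "_BINARY_OP_ADD_INT",
    "_LOAD_FAST",
    "_GUARD_NOS_INT", "_GUARD_TOS_INT", "_BINARY_OP_ADD_INT",
    "_LOAD_FAST", "_CONTAINS_OP_SET", "_TO_BOOL_BOOL",
]

baseline = optimize(trace, BASELINE_EFFECTS)
narrowed = optimize(trace, NARROWED_EFFECTS)
print(baseline)
print(narrowed)
```

In this model, the second `_GUARD_NOS_INT` is dropped in both runs because the result of the first addition is already known to be an int. The trailing `_TO_BOOL_BOOL` is only dropped once `_CONTAINS_OP_SET` reports a bool output, which is the kind of gain that teaching the optimizer about a previously ignored uop can unlock.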