shadPS4

mirror of https://github.com/shadps4-emu/shadPS4.git synced 2025-07-12 04:35:56 +00:00

Author	SHA1	Message	Date
Paris Oplopoios	4eaa992aff	Rename 'AddCary' to 'AddCarry' (#3206 )	2025-07-07 13:29:11 +03:00
Paris Oplopoios	146e81a56a	Fix V_ADDC_U32 carry-out edge cases (#3200 ) * Fix V_ADDC_U32 carry-out edge cases * Use IAddCarry instead	2025-07-07 12:44:06 +03:00
TheTurtle	48460d1cbe	vector_alu: Improve handling of mbcnt append/consume patterns (#3184 ) * vector_alu: Improve handling of mbcnt append/consume patterns The existing implementation was written to handle a single pattern of mbcnt before the DS_APPEND instruction: v_mbcnt_hi_u32_b32 vX, exec_hi, 0; v_mbcnt_lo_u32_b32 vX, exec_lo, vX; ds_append vY offset:4 gds; v_add_i32 vX, vcc, vY, vX. In this case, however, the DS_APPEND comes before the mbcnt pattern: ds_append vX gds; v_mbcnt_hi_u32_b32 vY, exec_hi, vX; v_mbcnt_lo_u32_b32 vZ, exec_lo, vY. The mbcnt instructions always come in hi/lo pairs and in general are quite flexible, but they assume the subgroup size is 64, so they are not recompiled literally. Together with DS_APPEND they are used to derive a unique per-thread index in a buffer (different from using thread_id, as the order could be random). The DS_APPEND instruction works at the per-subgroup level, adding the number of active threads of the subgroup to the GDS counter, essentially giving a multiple-of-64 base index to all threads. Then each thread executes the mbcnt pair, which returns the number of active threads with an id less than its own and adds it to the base. The recompiler translates DS_APPEND into an atomic increment of a storage buffer counter, which already gives the desired unique index, so this pattern is a no-op. On main it was set to zero as per the first pattern to avoid altering the DS_APPEND result. The new handling passes through the initial value of the pattern instead, which has the same effect but works in either case. * vk_rasterizer: Always sync DMA buffers	2025-07-03 13:19:38 +03:00
nickci2002	9eae6b57ce	V_CMP_EQ_U64 support (#3153 ) * Added V_CMP_EQ_U64 shader opcode support and added 64-bit relational operators (<,>,<=,>=) * Fixed clang-format crying because I typed xargs clang-format instead of xargs clang-format-19 * Replaced V_CMP_EQ_U64 code to match V_CMP_U32 to test * Updated V_CMP_U64 for future addons	2025-07-02 19:22:30 +03:00
Marcin Mikołajczyk	4d1a1ce9c2	v_rcp_legacy_f32 (#3040 )	2025-06-04 16:55:47 -07:00
Marcin Mikołajczyk	8fffdc3918	Handle V_CVT_F64_U32 (#3008 )	2025-05-29 12:20:16 -07:00
Marcin Mikołajczyk	484fbcc320	Handle -1 as V_CMP_NE_U64 argument (#2919 )	2025-05-13 13:19:56 -07:00
baggins183	e816bc4b99	Use GetSrc in VALU insts instead of assuming vector reg (was vcc_lo) (#2845 ) * Use GetSrc in v_add_i32 instead of assuming vector reg (was vcc_lo) * some other cases	2025-04-25 19:44:03 -07:00
squidbus	52ab1ed04b	shader_recompiler: Implement S_FLBIT_I32_B32 and V_MUL_HI_I32. (#2793 )	2025-04-16 18:08:09 +03:00
squidbus	bec1b9056f	shader_recompiler: Misc shader fixes. (#2781 ) * shader_recompiler: Fix frexp exponent type. * shader_recompiler: Implement V_CMP_CLASS_F32 negative class mask. * shader_recompiler: Define operands for DS_ORDERED_COUNT.	2025-04-13 23:46:30 -07:00
squidbus	afd0251dd2	shader_recompiler: Use VK_AMD_shader_trinary_minmax when available. (#2739 ) * shader_recompiler: Use VK_AMD_shader_trinary_minmax when available. * shader_recompiler: Simplify signed/unsigned trinary instruction variants.	2025-04-02 23:36:54 +03:00
squidbus	cfe249debe	shader_recompiler: Replace texel buffers with in-shader buffer format interpretation (#2363 ) * shader_recompiler: Replace texel buffers with in-shader buffer format interpretation * shader_recompiler: Move 10/11-bit float conversion to functions and address some comments. * vulkan: Remove VK_KHR_maintenance5 as it is no longer needed for buffer views. * shader_recompiler: Add helpers for composites and bitfields in pack/unpack. * shader_recompiler: Use initializer_list for bitfield insert helper.	2025-02-06 20:40:49 -08:00
squidbus	56f4b8a2b8	shader_recompiler: Implement shader export formats. (#2226 )	2025-01-24 10:41:58 -08:00
squidbus	4d12de8149	hotfix: 64-bit shift fixups	2025-01-24 03:14:37 -08:00
Marcin Mikołajczyk	9dcf40e261	Handle more 64bit shifts in Translator (#1825 )	2025-01-24 03:07:36 -08:00
DanielSvoboda	1c3048ccc2	Fix V_FRACT_F64 (#2156 )	2025-01-15 16:45:02 +01:00
squidbus	82cb298c5c	shader_recompiler: Remove AMD native CubeFaceCoord. (#2129 )	2025-01-11 13:57:49 -08:00
squidbus	5810c88c00	hotfix: Fix cube instructions.	2025-01-11 12:04:46 -08:00
squidbus	725814ce01	shader_recompiler: Improvements to array and cube handling. (#2083 ) * shader_recompiler: Account for instruction array flag in image type. * shader_recompiler: Check da flag for all mimg instructions. * shader_recompiler: Convert cube images into 2D arrays. * shader_recompiler: Move image resource functions into sharp type. * shader_recompiler: Use native AMD cube instructions when possible. * specialization: Fix buffer storage mistake.	2025-01-10 10:48:12 +02:00
squidbus	86038e6a71	shader_recompiler: Fix V_CMP_U_F32 (#2082 )	2025-01-07 11:36:14 +02:00
baggins183	3c0c921ef5	Tessellation (#1528 ) * shader_recompiler: Tessellation WIP * fix compiler errors after merge DONT MERGE set log file to /dev/null DONT MERGE linux pthread bb fix save work DONT MERGE dump ir save more work fix mistake with ES shader skip list add input patch control points dynamic state random stuff * WIP Tessellation partial implementation. Squash commits * test: make local/tcs use attr arrays * attr arrays in TCS/TES * dont define empty attr arrays * switch to special opcodes for tess tcs/tes reads and tcs writes * impl tcs/tes read attr insts * rebase fix * save some work * save work probably broken and slow * put Vertex LogicalStage after TCS and TES to fix bindings * more refactors * refactor pattern matching and optimize modulos (disabled) * enable modulo opt * copyright * rebase fixes * remove some prints * remove some stuff * Add TCS/TES support for shader patching and use LogicalStage * refactor and handle wider DS instructions * get rid of GetAttributes for special tess constants reads. Immediately replace some upon seeing readconstbuffer. Gets rid of some extra passes over IR * stop relying on GNMX HsConstants struct. Change runtime_info.hs_info and some regs * delete some more stuff * update comments for current implementation * some cleanup * uint error * more cleanup * remove patch control points dynamic state (because runtime_info already depends on it) * fix potential problem with determining passthrough --------- Co-authored-by: IndecisiveTurtle <47210458+raphaelthegreat@users.noreply.github.com>	2024-12-14 12:56:17 +02:00
squidbus	f1c23d514b	shader_recompiler: Implement FREXP instructions. (#1766 )	2024-12-13 21:51:39 +02:00
squidbus	c076ba69e8	shader_recompiler: Implement V_LSHL_B64 for immediate arguments. (#1674 )	2024-12-07 23:28:17 +02:00
Daniel R.	6904764aab	shader_recompiler/frontend: implement `V_MIN3_U32`	2024-11-21 19:52:48 +01:00
Ruah Devlin	96cd79f272	Implement V_MED3_U32 vector ALU Opcode (#1553 )	2024-11-20 17:23:59 +01:00
TheTurtle	87f8fea4de	renderer_vulkan: Commize and adjust buffer bindings (#1412 ) * shader_recompiler: Implement finite cmp class * shader_recompiler: Implement more opcodes * renderer_vulkan: Commonize buffer binding * liverpool: More dma data impl * fix * copy_shader: Handle additional instructions from Knack * translator: Add V_CMPX_GE_I32	2024-10-19 15:30:58 +03:00
squidbus	09cbccb40b	shader_recompiler: Implement V_SUBB_U32 and V_SUBBREV_U32. (#1331 )	2024-10-10 19:40:19 +03:00
Mahmoud Adel	76644a0169	add Opcodes to switch case (#1233 ) * add Opcodes to switch case Added Opcodes to switch case, they were done here but weren't added to switch `9f79764b01 (diff-9a6c2e2027c03231e88aaaab30908baecae202661839f35c31a777fec2500c7aR659)` * clang	2024-10-04 11:24:45 +03:00
korenkonder	9f79764b01	Add various V_CVT opcodes (#1223 )	2024-10-04 08:48:05 +02:00
korenkonder	da519f9091	Moved opcode to it's proper location (#1221 )	2024-10-03 22:47:26 +02:00
dbz400	54dafce541	Add V_CVT_F64_I32 (#1219 )	2024-10-03 18:48:28 +02:00
dbz400	c7ff0419ad	Fix V_CMP_CLASS_F32 (#1153 )	2024-09-30 11:36:26 +03:00
jnack	beb809b612	add V_CMPX_LE_I32 (#1056 )	2024-09-24 18:22:31 +03:00
TheTurtle	ee38eec7fe	shader_recompiler: Additional scope handling and user data as push constants (#1013 ) * shader_recompiler: Use push constants for user data regs * shader: Add some GR2 instructions * shader: Add some instructions * shader: Add instructions for knack * touchups * spirv: Better names * buffer_cache: Ignore non gpu modified images * clang format * Add log * more fixes	2024-09-23 08:55:43 +02:00
squidbus	a18419dd73	shader_recompiler: Exclude non-float results from output modifiers. (#1016 )	2024-09-22 15:03:17 +03:00
korenkonder	8811cc5cc6	Add V_CVT_PK_U8_F32 opcode (#1022 )	2024-09-22 15:02:34 +03:00
TheTurtle	edde0a3e7e	hotfix: Revert ADDC change	2024-09-22 01:53:10 +03:00
squidbus	dd184fd95d	shader_recompiler: Use SetDst in more instructions. (#1015 )	2024-09-22 01:41:19 +03:00
korenkonder	07de1ee977	Sort opcodes by their indices. Group them too when applicable (#945 )	2024-09-19 20:29:56 +02:00
Raven	84e2c4d3bb	Add other 64-bit floating point shader instructions (#944 )	2024-09-17 18:01:33 +02:00
Daniel R.	dcf245b814	shader_recompiler: Implement basic 64-bit floating point support (#915 ) * shader_recompiler: Implement basic 64-bit floating point support * Fix formatting	2024-09-15 22:53:08 +02:00
TheTurtle	13743b27fc	shader_recompiler: Implement data share append and consume operations (#814 ) * shader_recompiler: Add more format swap modes * texture_cache: Handle stencil texture reads * emulator: Support loading font library * readme: Add thanks section * shader_recompiler: Constant buffers as integers * shader_recompiler: Typed buffers as integers * shader_recompiler: Separate thread bit scalars * We can assume guest shader never mixes them with normal sgprs. This helps avoid errors where ssa could view an sgpr write dominating a thread bit read, due to how control flow is structurized, even though its not possible in actual control flow * shader_recompiler: Implement data append/consume operations * clang format * buffer_cache: Simplify invalidation scheme * video_core: Remove some invalidation remnants * adjust	2024-09-07 00:14:51 +03:00
baggins183	bb29224daf	Implement V_MOVREL variants (#745 ) * shader_recompiler: Implement V_MOVRELS_B32, V_MOVRELD_B32, V_MOVRELSD_B32 Generates a ton of OpSelects to hardcode reading or writing from each possible vgpr depending on the value of m0 Future work is to do range analysis to put an upper bound on m0 and check fewer registers. * fix runtime info after rebase	2024-09-06 23:47:47 +03:00
squidbus	d48836d5ae	shader_recompiler: Limit src0 to 4-bit in V_CVT_OFF_F32_I4 (#759 )	2024-09-03 21:37:52 +03:00
TheTurtle	f087f43736	shader_recompiler: Implement render target swizzles when no format is available (#739 ) * shader_recompiler: Use null image when shader is compiled with unbound sharp * video_core: Refactor and render target swizzles * liverpool_to_vk: Add missing swap format from RDR * video_core: Refactor shader recompiler interface * Makes it much easier to pass runtime information to the recompiler and have it treated as part of the shader key. Also pulls out most runtime state from Info struct * shader_recompiler: Avoid some asserts	2024-09-03 14:04:30 +03:00
baggins183	101aeb920d	Implement V_BFM_B32 and V_FFBH_U32 (#663 ) * Implement V_BFM_B32 * Render.Recompiler: Implement V_FFBH_U32 * fix clang-format	2024-09-01 22:20:42 +03:00
TheTurtle	66e96dd944	video_core: Account of runtime state changes when compiling shaders (#575 ) * video_core: Compile shader permutations * spirv: Only specific storage image format for atomics * ir: Avoid cube coord patching for storage image * spirv: Fix default attributes * data_share: Add more instructions * video_core: Query storage flag with runtime state * kernel: Use std::list for semaphore * video_core: Use texture buffers for untyped format load/store * buffer_cache: Limit view usage * vk_pipeline_cache: Fix invalid iterator * image_view: Reduce log spam when alpha=1 in storage swizzle * video_core: More features and proper spirv feature detection * video_core: Attempt no2 for specialization * spirv: Remove conflict * vk_shader_cache: Small cleanup	2024-08-29 19:29:54 +03:00
Grégoire Hage	288db9a0cf	Implement V_LSHL_B64 (#608 )	2024-08-27 14:15:32 +03:00
DanielSvoboda	2a737d0800	V_NOP | PfpSyncMe | S_CMPK_EQ_U32 (#426 ) * V_NOP V_NOP = Do nothing * PfpSyncMe PfpSyncMe ensures that all previous commands are completed before continuing. 'break' should be enough for now * S_CMPK_EQ_U32 S_CMPK_EQ_U32 SCC = (D.u == SIMM16) * S_CMPK_EQ_U32 * OperandField::Undefined: * Update translate.cpp remove OperandField::Undefined: * Update image_view.cpp [Render.Vulkan] <Error> image_view.cpp:ImageViewInfo:109: Storage image (num_comps = 4) requires swizzling [BGRA] format 43 dst_sel 3886 * Update liverpool_to_vk.cpp * S_CMPK_EQ_U32 * S_CMPK_EQ_U32	2024-08-25 22:07:46 +02:00
psucien	b687ae5e34	GnmDriver: Clear context support (#567 ) * gnmdriver: added support for gpu context reset * shader_recompiler: minor validation fixes * shader_recompiler: added `V_CMPX_GT_I32` * shader_recompiler: fix for crash on inline sampler access * compilation warnings and dead code elimination * amdgpu: fix for registers addressing * libraries: videoout: reduce logging pressure * shader_recompiler: fix for divergence scope detection	2024-08-25 23:01:05 +03:00
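
The reasoning in commit 48460d1cbe (mbcnt together with DS_APPEND yields a unique per-thread index, so a per-thread atomic increment makes the mbcnt pair a pass-through) can be sketched on the CPU. This is only an illustrative model, not code from the repository: the names `PopCount`, `Mbcnt`, `GuestIndices` and `RecompiledIndices` are hypothetical, the wavefront is simulated as a 64-bit exec mask, and the "atomic" increments run in lane order here, whereas on a GPU their order is arbitrary.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU model of one 64-lane wavefront; all names are illustrative.
constexpr int kWaveSize = 64;

// Portable population count (number of set bits).
inline uint32_t PopCount(uint64_t v) {
    uint32_t n = 0;
    for (; v != 0; v &= v - 1) {
        ++n;
    }
    return n;
}

// Combined effect of the mbcnt lo/hi pair: number of set bits in `mask`
// strictly below `lane`, added to `src`.
inline uint32_t Mbcnt(uint64_t mask, int lane, uint32_t src) {
    const uint64_t below = (lane == 0) ? 0 : (mask & (~0ULL >> (kWaveSize - lane)));
    return PopCount(below) + src;
}

// Guest-style pattern: one append per wavefront returns a base index and bumps
// the counter by the number of active lanes; each lane then adds its mbcnt rank.
inline std::vector<uint32_t> GuestIndices(uint64_t exec, uint32_t& counter) {
    const uint32_t base = counter;
    counter += PopCount(exec);
    std::vector<uint32_t> out;
    for (int lane = 0; lane < kWaveSize; ++lane) {
        if ((exec >> lane) & 1) {
            out.push_back(Mbcnt(exec, lane, base));
        }
    }
    return out;
}

// Recompiled-style pattern: every active lane performs its own increment, which
// already yields a unique index, so the mbcnt pair just passes the value through.
inline std::vector<uint32_t> RecompiledIndices(uint64_t exec, uint32_t& counter) {
    std::vector<uint32_t> out;
    for (int lane = 0; lane < kWaveSize; ++lane) {
        if ((exec >> lane) & 1) {
            const uint32_t idx = counter++;  // stands in for an atomic increment
            out.push_back(idx);              // mbcnt pass-through: index unchanged
        }
    }
    return out;
}
```

In this model both functions hand out the same contiguous range of indices and leave the counter at the same value, which is why treating the mbcnt pair as a pass-through does not change the result of the append.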

79 commits