Commit graph

95 commits

Author SHA1 Message Date
Lander Gallastegi
f9bbde9c79
video_core: Implement DMA. (#2819)
* Import memory

* 64K pages and fix memory mapping

* Queue coverage

* Buffer syncing, faulted readback adn BDA in Buffer

* Base DMA implementation

* Preparations for implementing SPV DMA access

* Base impl (pending 16K pages and getbuffersize)

* 16K pages and stack overflow fix

* clang-format

* clang-format but for real this time

* Try to fix macOS build

* Correct decltype

* Add testing log

* Fix stride and patch phi node blocks

* No need to check if it is a deleted buffer

* Clang format once more

* Offset in bytes

* Removed host buffers (may do it in another PR)

Also some random barrier fixes

* Add IR dumping from my read-const branch

* clang-format

* Correct size insteed of end

* Fix incorrect assert

* Possible fix for NieR deadlock

* Copy to avoid deadlock

* Use 2 mutexes insteed of copy

* Attempt to range sync error

* Revert "Attempt to range sync error"

This reverts commit dd287b48682b50f215680bb0956e39c2809bf3fe.

* Fix size truncated when syncing range

And memory barrier

* Some fixes (and async testing (doesn't work))

* Use compute to parse fault buffer

* Process faults on submit

* Only sync in the first time we see a readconst

Thsi is partialy wrong. We need to save the state into the submission context itself, not the rasterizer since we can yield and process another sumission (if im not understanding wrong).

* Use spec const and 32 bit atomic

* 32 bit counter

* Fix store_index

* Better sync (WIP, breaks PR now)

* Fixes for better sync

* Better sync

* Remove memory coveragte logic

* Point sirit to upstream

* Less waiting and barriers

* Correctly checkout moltenvk

* Bring back applying pending operations in wait

* Sync the whole buffer insteed of only the range

* Implement recursive shared/scoped locks

* Iterators

* Faster syncing with ranges

* Some alignment fixes

* fixed clang format

* Fix clang-format again

* Port page_manager from readbacks-poc

* clang-format

* Defer memory protect

* Remove RENDERER_TRACE

* Experiment: only sync on first readconst

* Added profiling (will be removed)

* Don't sync entire buffers

* Added logging for testing

* Updated temporary workaround to use 4k pages

* clang.-format

* Cleanup part 1

* Make ReadConst a SPIR-V function

---------

Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
2025-05-22 21:00:15 +03:00
squidbus
99eaba7c96
liverpool: Lower SetQueueReg to warning log. (#2930)
Some checks are pending
Build and Release / reuse (push) Waiting to run
Build and Release / clang-format (push) Waiting to run
Build and Release / get-info (push) Waiting to run
Build and Release / windows-sdl (push) Blocked by required conditions
Build and Release / windows-qt (push) Blocked by required conditions
Build and Release / macos-sdl (push) Blocked by required conditions
Build and Release / macos-qt (push) Blocked by required conditions
Build and Release / linux-sdl (push) Blocked by required conditions
Build and Release / linux-qt (push) Blocked by required conditions
Build and Release / linux-sdl-gcc (push) Blocked by required conditions
Build and Release / linux-qt-gcc (push) Blocked by required conditions
Build and Release / pre-release (push) Blocked by required conditions
2025-05-13 16:08:25 -07:00
Marcin Mikołajczyk
647b1d3ee4
Handle VgtStreamoutFlush event (#2929) 2025-05-13 14:34:22 -07:00
Marcin Mikołajczyk
e5b675d607
Handle IT_WAIT_REG_MEM with Register argument (#2927) 2025-05-13 13:56:20 -07:00
squidbus
02d3ed4973
liverpool: Log more information on SetQueueReg. (#2912) 2025-05-11 20:27:43 -07:00
squidbus
a1439b15cf
gnm: Implement sceGnmDrawIndexIndirectMulti (#2889) 2025-05-09 10:04:37 -07:00
kalaposfos13
5caab76a45
Implement DmaDataSrc::MemoryUsingL2 and DmaDataDst::MemoryUsingL2 (#2680)
* Implement DmaDataSrc::MemoryUsingL2 and DmaDataDst::MemoryUsingL2

* Add L2 handling to the other place it's used
2025-03-26 23:03:50 +02:00
kalaposfos13
a1ec8b0a88
Handle compute packets that are split between the ends of two command buffers (#2476)
* Squashed initial implementation

* Logging for checking if buffers are memory contiguous

* Add check to see if first instruction is valid in the next buffer to avoid false positives

* Oof

* Replace old code with IndecisiveTurtle's new, better implementation

* Add `unlikely` keyword to the split packet handling branches

Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>

---------

Co-authored-by: IndecisiveTurtle <47210458+raphaelthegreat@users.noreply.github.com>
2025-03-26 00:01:21 +02:00
TheTurtle
76b4da6212
video_core: Various small improvements and bug fixes (#2525)
* ir_passes: Add barrier at end of block too

* vk_platform: Always assign names to resources

* texture_cache: Better overlap handling

* liverpool: Avoid resuming ce_task when its finished

* spirv_quad_rect: Skip default attributes

Fixes some crashes

* memory: Improve buffer size clamping

* liverpool: Relax binary header validity check

* liverpool: Stub SetPredication with a warning

* Better than outright crash

* emit_spirv: Implement round to zero mode

* liverpool: queue::pop takes the front element

* image_info: Remove obsolete assert

The old code assumed the mip only had 1 layer thus a right overlap could not return mip 0. But with the new path we handle images that are both mip-mapped and multi-layer, thus this can happen

* tile_manager: Fix size calculation

* spirv_quad_rect: Skip default attributes

---------

Co-authored-by: poly <47796739+polybiusproxy@users.noreply.github.com>
Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>
2025-02-24 14:31:12 +02:00
squidbus
a5a1253185
liverpool: Implement PM4 MEM_SEMAPHORE. (#2235) 2025-01-25 04:12:18 -08:00
squidbus
eb49193309
liverpool: Revert queue scope markers. (#2166) 2025-01-16 18:24:29 -08:00
squidbus
b3739bea92
renderer_vulkan: Simplify debug marker settings. (#2159)
* renderer_vulkan: Simplify debug marker settings.

* liverpool: Add scope markers for graphics/compute queues.

* liverpool: Remove unneeded extra label from command buffer markers.

* vk_rasterizer: Add scopes around filtered draw passes.
2025-01-16 12:14:34 +02:00
psucien
9d3143231c macOS build fixed; indirect_args_addr moved out from queues context 2025-01-04 22:44:46 +01:00
psucien
7459d9c333 hot-fix: amdgpu: use different indirect dispatch packet on ASC 2025-01-04 22:23:12 +01:00
polybiusproxy
a76e8f0211
clang-format 2025-01-01 13:21:00 +01:00
psucien
d69341fd31 hot-fix: detiler: forgotten lut optimizations 2025-01-01 03:40:28 +01:00
TheTurtle
f09a95453e
hot-fix: Correct queue id in dispatch indirect
I missed this
2024-12-29 12:48:45 +02:00
Mahmoud Adel
e952013fe0
add EventWrite and DispatchIndirect to ProcessCompute (#1948)
* add EventWrite and DispatchIndirect to ProcessCompute

helps Alienation go Ingame

* apply review changes

Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>

---------

Co-authored-by: TheTurtle <47210458+raphaelthegreat@users.noreply.github.com>
2024-12-29 12:47:15 +02:00
psucien
e7c4ffe032 hot-fix: Tracy operation restored; memory leak fix as a bonus 2024-12-15 20:53:29 +01:00
psucien
0fd1ab674b
GPU processor refactoring (#1787)
* coroutine code prettification

* asc queues submission refactoring

* better asc ring context handling

* final touches and review notes

* even more simplification for context saving
2024-12-15 00:54:46 +02:00
TheTurtle
e9ede8d627
Revert "DmaData and Recompiler fixes (#1775)" (#1784)
This reverts commit cafd40f2c2.
2024-12-14 16:17:14 +02:00
Vladislav Mikhalin
cafd40f2c2
DmaData and Recompiler fixes (#1775)
* liverpool: fix dmadata packet handling

* recompiler: emit a label right after s_branch to prevent dead code interferrence

* specialize barriers
2024-12-14 14:33:06 +02:00
Daniel R.
2a953391ef
liverpool: implement Rewind and IndirectBuffer packets 2024-12-11 19:40:45 +01:00
squidbus
17abbcd74d
misc: Fix clang format (#1673) 2024-12-06 02:21:35 +02:00
IndecisiveTurtle
77da8bac00 core: Return proper address of eh frame/add more opcodes 2024-12-06 00:47:11 +02:00
TheTurtle
22a2741ea0
shader_recompilers: Improvements to SSA phi generation and lane instruction elimination (#1667)
* shader_recompiler: Add use tracking for Insts

* ssa_rewrite: Recursively remove phis

* ssa_rewrite: Correct recursive trivial phi elimination

* ir: Improve read lane folding pass

* control_flow: Avoid adding unnecessary divergant blocks

* clang format

* externals: Update ext-boost

---------

Co-authored-by: Frodo Baggins <baggins31084@proton.me>
2024-12-05 23:14:16 +02:00
Marcin Mikołajczyk
642dedea8c
Handle INDIRECT_BUFFER_CONST in ProcessCeUpdate (#1613) 2024-12-05 23:09:59 +02:00
Daniel R.
98f0cb65d7
The way to Unity, pt.1 (#1659) 2024-12-05 17:21:35 +01:00
psucien
16e1d679dc
video_core: clean-up of indirect draws logic (#1589) 2024-11-24 15:43:28 +01:00
TheTurtle
c4506da0ae
kernel: Rewrite pthread emulation (#1440)
* libkernel: Cleanup some function places

* kernel: Refactor thread functions

* kernel: It builds

* kernel: Fix a bunch of bugs, kernel thread heap

* kernel: File cleanup pt1

* File cleanup pt2

* File cleanup pt3

* File cleanup pt4

* kernel: Add missing funcs

* kernel: Add basic exceptions for linux

* gnmdriver: Add workload functions

* kernel: Fix new pthreads code on macOS. (#1441)

* kernel: Downgrade edeadlk to log

* gnmdriver: Add sceGnmSubmitCommandBuffersForWorkload

* exception: Add context register population for macOS. (#1444)

* kernel: Pthread rewrite touchups for Windows

* kernel: Multiplatform thread implementation

* mutex: Remove spamming log

* pthread_spec: Make assert into a log

* pthread_spec: Zero initialize array

* Attempt to fix non-Windows builds

* hotfix: change incorrect NID for scePthreadAttrSetaffinity

* scePthreadAttrSetaffinity implementation

* Attempt to fix Linux

* windows: Address a bunch of address space problems

* address_space: Fix unmap of region surrounded by placeholders

* libs: Reduce logging

* pthread: Implement condvar with waitable atomics and sleepqueue

* sleepq: Separate and make faster

* time: Remove delay execution

* Causes high cpu usage in Tohou Luna Nights

* kernel: Cleanup files again

* pthread: Add missing include

* semaphore: Use binary_semaphore instead of condvar

* Seems more reliable

* libraries/sysmodule: log module on `sceSysmoduleIsLoaded`

* libraries/kernel: implement `scePthreadSetPrio`

---------

Co-authored-by: squidbus <175574877+squidbus@users.noreply.github.com>
Co-authored-by: Daniel R. <47796739+polybiusproxy@users.noreply.github.com>
2024-11-21 22:59:38 +02:00
psucien
8fbd9187f8
libraries: gnmdriver: few more functions implemented (#1544) 2024-11-18 11:23:21 +02:00
TheTurtle
87f8fea4de
renderer_vulkan: Commize and adjust buffer bindings (#1412)
* shader_recompiler: Implement finite cmp class

* shader_recompiler: Implement more opcodes

* renderer_vulkan: Commonize buffer binding

* liverpool: More dma data impl

* fix

* copy_shader: Handle additional instructions from Knack

* translator: Add V_CMPX_GE_I32
2024-10-19 15:30:58 +03:00
Vinicius Rangel
25de4d6b65
Devtools improvements I (#1392)
* devtools: fix showing entire depth instead of bits

* devtools: show button for stage instead of menu bar

- fix batch view dockspace not rendering when window collapsed

* devtools: removed useless "Batch" collapse & don't collapse last batch

* devtools: refactor DrawRow to templating

* devtools: reg popup size adjusted to the content

* devtools: better window names

* devtools: regview layout compacted

* devtools: option to show collapsed frame dump

keep most popups open when selection changes
best popup windows positioning

* devtools: show compute shader regs

* devtools: tips popup
2024-10-16 13:12:46 +03:00
Vinicius Rangel
cf2e617f08
Devtools - Inspect regs/User data/Shader disassembly (#1358)
* devtools: pm4 - show markers

* SaveDataDialogLib: fix compile with mingw

* devtools: pm4 - show program state

* devtools: pm4 - show program disassembly

* devtools: pm4 - show frame regs

* devtools: pm4 - show color buffer info as popup

add ux improvements for open new windows with shift+click
better window titles

* imgui: skip all textures to avoid hanging with crash diagnostic enabled

not sure why this happens :c

* devtools: pm4 - show reg depth buffer
2024-10-13 15:02:22 +03:00
korenkonder
6e986f8133
video_core: Implement sceGnmInsertPushColorMarker (#989) 2024-10-10 18:03:12 +03:00
Daniel R.
80bf46da4c
core/memory: Pooled memory implementation (#1085) 2024-09-29 10:28:41 +03:00
baggins183
bc66fe8fb5
Fix copyGpuBuffers when resize invalidates commands in flight (#876)
* Fix copyGpuBuffers when resize invalidates commands in flight

* Use _MB macro for size constant
2024-09-12 21:54:54 +02:00
psucien
adfb3af95f hot-fix: nullGpu functionality restored 2024-09-09 08:59:47 +02:00
TheTurtle
13743b27fc
shader_recompiler: Implement data share append and consume operations (#814)
* shader_recompiler: Add more format swap modes

* texture_cache: Handle stencil texture reads

* emulator: Support loading font library

* readme: Add thanks section

* shader_recompiler: Constant buffers as integers

* shader_recompiler: Typed buffers as integers

* shader_recompiler: Separate thread bit scalars

* We can assume guest shader never mixes them with normal sgprs. This helps avoid errors where ssa could view an sgpr write dominating a thread bit read, due to how control flow is structurized, even though its not possible in actual control flow

* shader_recompiler: Implement data append/consume operations

* clang format

* buffer_cache: Simplify invalidation scheme

* video_core: Remove some invalidation remnants

* adjust
2024-09-07 00:14:51 +03:00
psucien
34ffd95306
video_core: added VK_LAYER_LUNARG_crash_diagnostic (#751) 2024-09-03 21:56:23 +02:00
baggins183
3f8a8d3a24
video_core: Add bounds checking for subspan use in liverpool functions (#717) 2024-09-03 13:58:45 +03:00
psucien
ca1613258f
video_core: added support for indirect draws (#678)
* video_core: added support for indirect draws

* barriers simplified
2024-08-30 22:59:56 +02:00
psucien
9d349a1308 video_core: added support for indirect dispatches (gfx only) 2024-08-29 12:32:37 +02:00
georgemoralis
be49871c68
Merge pull request #618 from vertver/main
video_core: Added copyGPUCmdBuffers option
2024-08-28 14:00:26 +03:00
Anton Kovalev
dfb30ea955 Use pair of spans instead of references in copy command buffers function 2024-08-28 11:24:15 +02:00
Random
c37679154e
Handle PM4 type-2 packets (#556)
* video_core: handle PM4 type-2 packets

* video_core: rewrite pm4 comand type handling into a switch statement
2024-08-28 09:53:27 +02:00
Anton Kovalev
87ccfdfbbd Fixed type on function 2024-08-28 09:42:31 +02:00
Anton Kovalev
1a02efbd15 clang-format style fix 2024-08-28 05:42:48 +02:00
Anton Kovalev
3842993a43 Use input dcb and ccb instead of copy 2024-08-28 00:21:12 +02:00
Anton Kovalev
3d46a5d492 Do not shrink buffer's size on submit 2024-08-27 23:33:24 +02:00