* Import memory
* 64K pages and fix memory mapping
* Queue coverage
* Buffer syncing, faulted readback adn BDA in Buffer
* Base DMA implementation
* Preparations for implementing SPV DMA access
* Base impl (pending 16K pages and getbuffersize)
* 16K pages and stack overflow fix
* clang-format
* clang-format but for real this time
* Try to fix macOS build
* Correct decltype
* Add testing log
* Fix stride and patch phi node blocks
* No need to check if it is a deleted buffer
* Clang format once more
* Offset in bytes
* Removed host buffers (may do it in another PR)
Also some random barrier fixes
* Add IR dumping from my read-const branch
* clang-format
* Correct size insteed of end
* Fix incorrect assert
* Possible fix for NieR deadlock
* Copy to avoid deadlock
* Use 2 mutexes insteed of copy
* Attempt to range sync error
* Revert "Attempt to range sync error"
This reverts commit dd287b48682b50f215680bb0956e39c2809bf3fe.
* Fix size truncated when syncing range
And memory barrier
* Some fixes (and async testing (doesn't work))
* Use compute to parse fault buffer
* Process faults on submit
* Only sync in the first time we see a readconst
Thsi is partialy wrong. We need to save the state into the submission context itself, not the rasterizer since we can yield and process another sumission (if im not understanding wrong).
* Use spec const and 32 bit atomic
* 32 bit counter
* Fix store_index
* Better sync (WIP, breaks PR now)
* Fixes for better sync
* Better sync
* Remove memory coveragte logic
* Point sirit to upstream
* Less waiting and barriers
* Correctly checkout moltenvk
* Bring back applying pending operations in wait
* Sync the whole buffer insteed of only the range
* Implement recursive shared/scoped locks
* Iterators
* Faster syncing with ranges
* Some alignment fixes
* fixed clang format
* Fix clang-format again
* Port page_manager from readbacks-poc
* clang-format
* Defer memory protect
* Remove RENDERER_TRACE
* Experiment: only sync on first readconst
* Added profiling (will be removed)
* Don't sync entire buffers
* Added logging for testing
* Updated temporary workaround to use 4k pages
* clang.-format
* Cleanup part 1
* Make ReadConst a SPIR-V function
---------
Co-authored-by: georgemoralis <giorgosmrls@gmail.com>
* shader_recompiler: Implement AMD buffer bounds checking behavior.
* shader_recompiler: Use SRT flatbuf for bounds check size.
* shader_recompiler: Fix buffer atomic bounds check.
* buffer_cache: Prevent false image-to-buffer sync.
Lowering vertex fetch to formatted buffer surfaced an issue where a CPU modified range may be overwritten with stale GPU modified image data.
* Address review comments.
* shader_recompiler: Move shared mem lowering into emitter
* IR can be quite verbose during first stages of translation, before ssa and constant prop passes have run that drastically simplify it. This lowering can also be done during emission so why not do it then to save some compilation time
* runtime_info: Pack PsColorBuffer into 8 bytes
* Drops the size of the total structure by half from 396 to 204 bytes. Also should make comparison of the array a bit faster, since its a hot path done every draw
* emit_spirv_context: Add infrastructure for buffer aliases
* Splits out the buffer creation function so it can be reused when defining multiple type aliases
* shader_recompiler: Merge srt_flatbuf into buffers list
* Its no longer a special case, yay
* shader_recompiler: Complete buffer aliasing support
* Add a bunch more types into buffers, such as F32 for float reads/writes and 8/16 bit integer types for formatted buffers
* shader_recompiler: Remove existing shared memory emulation
* The current impl relies on backend side implementaton and hooking into every shared memory access. It also doesnt handle atomics. Will be replaced by an IR pass that solves these issues
* shader_recompiler: Reintroduce shared memory on ssbo emulation
* Now it is performed with an IR pass, and combined with the previous commit cleanup, is fully transparent from the backend, other than requiring workgroup_index be provided as an attribute (computing this on every shared memory access is gonna be too verbose
* clang format
* buffer_cache: Reduce buffer sizes
* vk_rasterizer: Cleanup resource binding code
* Reduce noise in the functions, also remove some arguments which are class members
* Fix gcc
* shader_recompiler: Replace texel buffers with in-shader buffer format interpretation
* shader_recompiler: Move 10/11-bit float conversion to functions and address some comments.
* vulkan: Remove VK_KHR_maintenance5 as it is no longer needed for buffer views.
* shader_recompiler: Add helpers for composites and bitfields in pack/unpack.
* shader_recompiler: Use initializer_list for bitfield insert helper.
* shader_recompiler: Account for instruction array flag in image type.
* shader_recompiler: Check da flag for all mimg instructions.
* shader_recompiler: Convert cube images into 2D arrays.
* shader_recompiler: Move image resource functions into sharp type.
* shader_recompiler: Use native AMD cube instructions when possible.
* specialization: Fix buffer storage mistake.
* shader_recompiler: Tessellation WIP
* fix compiler errors after merge
DONT MERGE set log file to /dev/null
DONT MERGE linux pthread bb fix
save work
DONT MERGE dump ir
save more work
fix mistake with ES shader
skip list
add input patch control points dynamic state
random stuff
* WIP Tessellation partial implementation. Squash commits
* test: make local/tcs use attr arrays
* attr arrays in TCS/TES
* dont define empty attr arrays
* switch to special opcodes for tess tcs/tes reads and tcs writes
* impl tcs/tes read attr insts
* rebase fix
* save some work
* save work probably broken and slow
* put Vertex LogicalStage after TCS and TES to fix bindings
* more refactors
* refactor pattern matching and optimize modulos (disabled)
* enable modulo opt
* copyright
* rebase fixes
* remove some prints
* remove some stuff
* Add TCS/TES support for shader patching and use LogicalStage
* refactor and handle wider DS instructions
* get rid of GetAttributes for special tess constants reads. Immediately replace some upon seeing readconstbuffer. Gets rid of some extra passes over IR
* stop relying on GNMX HsConstants struct. Change runtime_info.hs_info and some regs
* delete some more stuff
* update comments for current implementation
* some cleanup
* uint error
* more cleanup
* remove patch control points dynamic state (because runtime_info already depends on it)
* fix potential problem with determining passthrough
---------
Co-authored-by: IndecisiveTurtle <47210458+raphaelthegreat@users.noreply.github.com>
* page_manager: Enable userfaultfd by default
* Much faster than page faults and causes less problems
* shader_recompiler: Add texel buffer multiplier
* Fixes format mismatch assert when vsharp stride is multiple of format stride
* shader_recompiler: Specialize UBOs on size
* Some games can perform manual vertex pulling and thus bind read only buffers of varying size. We only recompile when the vsharp size is larger than size in shader, in opposite case its not needed
* clang format
* Implement shader resource tables
* fix after rebase + squash
* address some review comments
* fix pipeline_common
* cleanup debug stuff
* switch to using single codegenerator
* shader_recompiler: Define fragment output type based on number format.
* shader_recompiler: Fix GetAttribute SPIR-V output type.
* shader_recompiler: Don't bitcast on SetAttribute unless integer target.
* shader_recompiler: Use push constants for user data regs
* shader: Add some GR2 instructions
* shader: Add some instructions
* shader: Add instructions for knack
* touchups
* spirv: Better names
* buffer_cache: Ignore non gpu modified images
* clang format
* Add log
* more fixes
* video_core: texture: image subresources state tracking
* shader_recompiler: use one binding if the same image is read and written
* video_core: added rebinding of changed textures after overlap resolve
* don't use pointers; slight `FindTexture` refactoring
* video_core: buffer_cache: don't copy over the image size
* redundant barriers removed; fixes
* regression fixes
* texture_cache: 3d texture layers count fixup
* shader_recompiler: support for partially bound cubemaps
* added support for cubemap arrays
* don't bind unused color buffers
* fixed depth promotion to do not use stencil
* doors
* bonfire lit
* cubemap array index calculation
* final touches
* shader_recompiler: Use null image when shader is compiled with unbound sharp
* video_core: Refactor and render target swizzles
* liverpool_to_vk: Add missing swap format from RDR
* video_core: Refactor shader recompiler interface
* Makes it much easier to pass runtime information to the recompiler and have it treated as part of the shader key. Also pulls out most runtime state from Info struct
* shader_recompiler: Avoid some asserts
* video_core: Compile shader permutations
* spirv: Only specific storage image format for atomics
* ir: Avoid cube coord patching for storage image
* spirv: Fix default attributes
* data_share: Add more instructions
* video_core: Query storage flag with runtime state
* kernel: Use std::list for semaphore
* video_core: Use texture buffers for untyped format load/store
* buffer_cache: Limit view usage
* vk_pipeline_cache: Fix invalid iterator
* image_view: Reduce log spam when alpha=1 in storage swizzle
* video_core: More features and proper spirv feature detection
* video_core: Attempt no2 for specialization
* spirv: Remove conflict
* vk_shader_cache: Small cleanup
* translator: Use templates for stronger type guarantees
* spirv: Define buffer offsets upfront
* Saves a lot of shader instructions
* buffer_cache: Use dynamic vertex input when available
* Fixes issues when games like dark souls rebind vertex buffers with different stride
* externals: Update boost
* spirv: Use runtime array for ssbos
* ssbos can be large and typically their size will vary, especially in generic copy/clear cs shaders
* fs: Lock when doing case insensitive search
* Dark Souls does fs lookups from different threads
* texture_cache: More precise invalidation from compute
* Fixes unrelated render targets being cleared
* texture_cache: Use hashes for protect gpu modified images from reupload
* translator: Treat V_CNDMASK as float
* Sometimes it can have input modifiers. Worst this will cause is some extra calls to uintBitsToFloat and opposite. But most often this is used as float anyway
* translator: Small optimization for V_SAD_U32
* Fix review
* clang format
* translator: Implemtn f32 to f16 convert
* shader_recompiler: Add bit instructions
* shader_recompiler: More data share instructions
* shader_recompiler: Remove exec contexts, fix S_MOV_B64
* shader_recompiler: Split instruction parsing into categories
* shader_recompiler: Better BFS search
* shader_recompiler: Constant propagation pass for cmp_class_f32
* shader_recompiler: Partial readfirstlane implementation
* shader_recompiler: Stub readlane/writelane only for non-compute
* hack: Fix swizzle on RDR
* Will properly fix this when merging this
* clang format
* address_space: Bump user area size to full
* shader_recompiler: V_INTERP_MOV_F32
* Should work the same as spirv will emit flat decoration on demand
* kernel: Add MAP_OP_MAP_FLEXIBLE
* image_view: Attempt to apply storage swizzle on format
* vk_scheduler: Barrier attachments on renderpass end
* clang format
* liverpool: cs state backup
* shader_recompiler: More instructions and formats
* vector_alu: Proper V_MBCNT_U32_B32
* shader_recompiler: Port some dark souls things
* file_system: Implement sceKernelRename
* more formats
* clang format
* resource_tracking_pass: Back to assert
* translate: Tracedata
* kernel: Remove tracy lock
* Solves random crashes in Dark Souls
* code: Review comments
* amdgpu: proper CB and DB sizes calculation; minor refactoring
* texture_cache: separate file for image_info
* texture_cache: image guest address moved into image info
* texture_cache: surface size calculation
* shader_recompiler: fixed sin/cos
Thanks to red_pring and gandalfthewhite0173
* initial preparations for subresources upload
* review comments
* vk_rasterizer: Clear depth buffer when DB_RENDER_CONTROL says so
* video_core: Preliminary storage image support, more opcodes
* renderer_vulkan: a fix for vertex buffers merging
* renderer_vulkan: a heuristic for blend override when alpha out is masked
---------
Co-authored-by: psucien <bad_cast@protonmail.com>
* video_core: Remove hack in rasterizer
* The hack was to skip the first draw as the display buffer had not been created yet and the texture cache couldn't create one itself. With this patch it now can, using the color buffer parameters from registers
* shader_recompiler: Implement attribute loads/stores
* video_core: Add basic vertex, index buffer handling and pipeline caching
* externals: Make xxhash lowercase