ryujinx

Author	SHA1	Message	Date
gdkchan	611bec6e44	Implement DrawTexture functionality (#2747 ) * Implement DrawTexture functionality * Non-NVIDIA support * Disable some features that should not affect draw texture (slow path) * Remove space from shader source * Match 2D engine names * Fix resolution scale and add missing XML docs * Disable transform feedback for draw texture fallback	2021-11-10 15:37:49 -03:00
gdkchan	d512ce122c	Initial tessellation shader support (#2534 ) * Initial tessellation shader support * Nits * Re-arrange built-in table * This is not needed anymore * PR feedback	2021-10-18 18:38:04 -03:00
gdkchan	464a92d8a7	Force index buffer update for games using Vulkan (#2726 )	2021-10-12 23:46:42 +02:00
riperiperi	0bce4a074a	Don't force scaling on 2D copy sources (#2701 ) Some games (GameMaker Studio) build texture atlases out of sprites during initialization, using the 2D copy method. These copies are done from textures loaded into memory, not rendered, so they are not scaled to begin with. I had set srcTexture in these copies to force scaling, but really it only needs to scale if the texture already exists and was scaled by rendering or something else. I just set that to false, so it doesn't change if the texture is scaled or not. This will also avoid the destination being scaled if the source wasn't. The copy can handle mismatching scales just fine. This prevents scaling artifacts in GMS games, and maybe others (not Super Mario Maker 2, that has another issue).	2021-10-12 23:12:17 +02:00
riperiperi	d92fff541b	Replace CacheResourceWrite with more general "precise" write (#2684 ) * Replace CacheResourceWrite with more general "precise" write The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance. This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced. The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization. I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested. This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle) I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing) * Add tests. 
* Add XML docs for GpuRegionHandle * Skip UpdateProtection if only precise actions were called This allows precise actions to skip reprotection costs.	2021-09-29 02:27:03 +02:00
gdkchan	fd7567a6b5	Only make render target 2D textures layered if needed (#2646 ) * Only make render target 2D textures layered if needed * Shader cache version bump * Ensure topology is updated on channel swap	2021-09-29 01:55:12 +02:00
riperiperi	7c5ead1c19	Fast path for Inline2Memory buffer write that skips write tracking (#2624 ) * Fast path for Inline2Memory buffer write This PR adds a method to PhysicalMemory that attempts to write all cached resources directly, so that memory tracking can be avoided. The goal of this is both to avoid flushing buffer data, and to avoid raising the sequence number when data is written, which causes buffer and texture handles to be re-checked. This currently only targets buffers, with a side check on textures that falls back to a tracked write if any exist within the target range. It's not expected to write textures from here - this is just a mechanism to protect us if someone does decide to do that. It's possible to add a fast path for this in future (and for ShaderCache, once that starts using tracking) The forced read before inline2memory begins has been skipped, as the data is fully written when the transfer is completed anyways. This allows us to flush on read in emergency situations, but still write the new data over the flushed data. Improves performance on Xenoblade 2 and DE, which was flushing buffer data on the GPU thread when trying to write compute data. May improve performance in other games that write SSBOs from compute, and update data in the same/nearby pages often. Super Smash Bros Ultimate should probably be tested to make sure the vertex explosions haven't returned, as I think that's what this AdvanceSequence was for. * ForceDirty before write, to make sure data does not flush over the new write	2021-09-19 15:09:53 +02:00
riperiperi	b0af010247	Set texture/image bindings in place rather than allocating and passing an array (#2647 ) * Remove allocations for texture bindings and state * Rent rather than stackalloc + copy A bit faster.	2021-09-19 14:03:05 +02:00
gdkchan	ac4ec1a015	Account for negative strides on DMA copy (#2623 ) * Account for negative strides on DMA copy * Should account for non-zero Y	2021-09-11 22:54:18 +02:00
riperiperi	b0e410a828	Lift textures in the AutoDeleteCache for all modifications. (#2615 ) * Lift textures in the AutoDeleteCache for all modifications. Before, this would only apply to render targets and texture blit. Now it applies to image stores, the fast dma copy path and any other type of modification. Image store always at least has one reference in the texture pool, so the function of the AutoDeleteCache keeping textures _alive_ is not useful, but a very important function for a while has been its use to flush textures in order of modification when they are dereferenced, so that their data is not lost. Before, textures populated using image stores were being dereferenced and reloaded as garbage. Now, when these textures are dereferenced, their data will be put back into memory, and everything stays intact. Fixes lighting breaking when switching levels in THPS1+2, and potentially some more UE4 games. I've tested a bunch more games for regressions and performance impact, but they all seem fine. * Lift copy srcTexture so that it doesn't remain referenceless * Perform lift before reference count change on unbind. It's important to lift on unbind as that is the moment the texture was truly last modified, but definitely not after releasing every single reference.	2021-09-11 21:52:54 +02:00
gdkchan	82cefc8dd3	Handle indirect draw counts with non-zero draw starts properly (#2593 )	2021-08-29 16:52:38 -03:00
riperiperi	ec3e848d79	Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501 ) * Initial Implementation About as fast as nvidia GL multithreading, can be improved with faster command queuing. * Struct based command list Speeds up a bit. Still a lot of time lost to resource copy. * Do shader init while the render thread is active. * Introduce circular span pool V1 Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next. * Refactor SpanRef some more Use a struct to represent SpanRef, rather than a reference. * Flush buffers on background thread * Use a span for UpdateRenderScale. Much faster than copying the array. * Calculate command size using reflection * WIP parallel shaders * Some minor optimisation * Only 2 max refs per command now. The command with 3 refs is gone. 😌 * Don't cast on the GPU side * Remove redundant casts, force sync on window present * Fix Shader Cache * Fix host shader save. * Fixup to work with new renderer stuff * Make command Run static, use array of delegates as lookup Profile says this takes less time than the previous way. * Bring up to date * Add settings toggle. Fix Multithreading Off mode. * Fix warning. * Release tracking lock for flushes * Fix Conditional Render fast path with threaded gal * Make handle iteration safe when releasing the lock This is mostly temporary. * Attempt to set backend threading on driver Only really works on nvidia before launching a game. * Fix race condition with BufferModifiedRangeList, exceptions in tracking actions * Update buffer set commands * Some cleanup * Only use stutter workaround when using opengl renderer non-threaded * Add host-conditional reservation of counter events There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. 
This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made. * Address Feedback * Make counter flush tracked again. Hopefully does not cause any issues this time. * Wait for FlushTo on the main queue thread. Currently assumes only one thread will want to FlushTo (in this case, the GPU thread) * Add SDL2 headless integration * Add HLE macro commands. Co-authored-by: Mary <mary@mary.zone>	2021-08-27 00:31:29 +02:00
mpnico	8e1adb95cf	Add support for HLE macros and accelerate MultiDrawElementsIndirectCount #2 (#2557 ) * Add support for HLE macros and accelerate MultiDrawElementsIndirectCount * Add missing barrier * Fix index buffer count * Add support check for each macro hle before use * Add missing xml doc Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-08-26 23:50:28 +02:00
gdkchan	8196086f7a	Revert "Calculate vertex buffer sizes from index buffer (#1663 )" (#2544 ) This reverts commit `10d649e6d3`.	2021-08-11 22:13:48 -03:00
riperiperi	0a80a837cb	Use "Undesired" scale mode for certain textures rather than blacklisting (#2537 ) * Use "Undesired" scale mode for certain textures rather than blacklisting * Nit Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-08-11 22:44:51 +02:00
gdkchan	10d649e6d3	Calculate vertex buffer sizes from index buffer (#1663 ) * Calculate vertex buffer size from maximum index buffer index * Increase maximum index buffer count for it to be considered profitable for counting	2021-08-11 22:06:09 +02:00
gdkchan	0f6ec446ea	Replace BGRA and scale uniforms with a uniform block (#2496 ) * Replace BGRA and scale uniforms with a uniform block * Setting the data again on program change is no longer needed * Optimize and resolve some warnings * Avoid redundant support buffer updates * Some optimizations to BindBuffers (now inlined) * Unify render scale arrays	2021-08-11 21:33:43 +02:00
gdkchan	ff5df5d8a1	Support non-contiguous copies on I2M and DMA engines (#2473 ) * Support non-contiguous copies on I2M and DMA engines * Vector copy should start aligned on I2M * Nits * Zero extend the offset	2021-08-04 22:20:58 +02:00
gdkchan	04dce402ac	Implement a fast path for I2M transfers (#2467 )	2021-07-12 16:48:57 -03:00
gdkchan	40b21cc3c4	Separate GPU engines (part 2/2) (#2440 ) * 3D engine now uses DeviceState too, plus new state modification tracking * Remove old methods code * Remove GpuState and friends * Optimize DeviceState, force inline some functions * This change was not supposed to go in * Proper channel initialization * Optimize state read/write methods even more * Fix debug build * Do not dirty state if the write is redundant * The YControl register should dirty either the viewport or front face state too, to update the host origin * Avoid redundant vertex buffer updates * Move state and get rid of the Ryujinx.Graphics.Gpu.State namespace * Comments and nits * Fix rebase * PR feedback * Move changed = false to improve codegen * PR feedback * Carry RyuJIT a bit more	2021-07-11 17:20:40 -03:00
gdkchan	b02719cf41	Flush UBO updates more frequently (#2407 )	2021-07-07 21:20:52 -03:00
gdkchan	8b44eb1c98	Separate GPU engines and make state follow official docs (part 1/2) (#2422 ) * Use DeviceState for compute and i2m * Migrate 2D class, more comments * Migrate DMA copy engine * Remove now unused code * Replace GpuState by GpuAccessorState on GpuAccessor, since compute no longer has a GpuState * More comments * Add logging (disabled) * Add back i2m on 3D engine	2021-07-07 20:56:06 -03:00
gdkchan	fbb4019ed5	Initial support for separate GPU address spaces (#2394 ) * Make GPU memory manager a member of GPU channel * Move physical memory instance to the memory manager, and the caches to the physical memory * PR feedback	2021-06-29 19:32:02 +02:00
gdkchan	fefd4619a5	Add support for custom line widths (#2406 )	2021-06-25 20:11:54 -03:00
gdkchan	a10b2c5ff2	Initial support for GPU channels (#2372 ) * Ground work for separate GPU channels * Rename TextureManager to TextureCache * Decouple texture bindings management from the texture cache * Rename BufferManager to BufferCache * Decouple buffer bindings management from the buffer cache * More comments and proper disposal * PR feedback * Force host state update on channel switch * Typo * PR feedback * Missing using	2021-06-24 01:51:41 +02:00
Mary	60cf3dfebc	Do not clear gpu subchannel state on BindChannel (#2348 ) This fixes a regression caused by #980 that was causing a crash on New Super Lucky's Tale. As always, this needs feedback on possible regressions in any games. Fixes #2343.	2021-06-09 00:50:18 +02:00
gdkchan	b84ba43406	Fix texture blit off-by-one errors (#2335 )	2021-06-03 01:30:48 +02:00
riperiperi	54ea2285f0	POWER - Performance Optimizations With Extensive Ramifications (#2286 ) * Refactoring of KMemoryManager class * Replace some trivial uses of DRAM address with VA * Get rid of GetDramAddressFromVa * Abstracting more operations on derived page table class * Run auto-format on KPageTableBase * Managed to make TryConvertVaToPa private, few uses remains now * Implement guest physical pages ref counting, remove manual freeing * Make DoMmuOperation private and call new abstract methods only from the base class * Pass pages count rather than size on Map/UnmapMemory * Change memory managers to take host pointers * Fix a guest memory leak and simplify KPageTable * Expose new methods for host range query and mapping * Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists * Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking) * Add a SharedMemoryStorage class, will be useful for host mapping * Sayonara AddVaRangeToPageList, you served us well * Start to implement host memory mapping (WIP) * Support memory tracking through host exception handling * Fix some access violations from HLE service guest memory access and CPU * Fix memory tracking * Fix mapping list bugs, including a race and a error adding mapping ranges * Simple page table for memory tracking * Simple "volatile" region handle mode * Update UBOs directly (experimental, rough) * Fix the overlap check * Only set non-modified buffers as volatile * Fix some memory tracking issues * Fix possible race in MapBufferFromClientProcess (block list updates were not locked) * Write uniform update to memory immediately, only defer the buffer set. * Fix some memory tracking issues * Pass correct pages count on shared memory unmap * Armeilleure Signal Handler v1 + Unix changes Unix currently behaves like windows, rather than remapping physical * Actually check if the host platform is unix * Fix decommit on linux. 
* Implement windows 10 placeholder shared memory, fix a buffer issue. * Make PTC version something that will never match with master * Remove testing variable for block count * Add reference count for memory manager, fix dispose Can still deadlock with OpenAL * Add address validation, use page table for mapped check, add docs Might clean up the page table traversing routines. * Implement batched mapping/tracking. * Move documentation, fix tests. * Cleanup uniform buffer update stuff. * Remove unnecessary assignment. * Add unsafe host mapped memory switch On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work. * Remove C# exception handlers They have issues due to current .NET limitations, so the meilleure one fully replaces them for now. * Fix MapPhysicalMemory on the software MemoryManager. * Null check for GetHostAddress, docs * Add configuration for setting memory manager mode (not in UI yet) * Add config to UI * Fix type mismatch on Unix signal handler code emit * Fix 6GB DRAM mode. The size can be greater than `uint.MaxValue` when the DRAM is >4GB. * Address some feedback. * More detailed error if backing memory cannot be mapped. * SetLastError on all OS functions for consistency * Force pages dirty with UBO update instead of setting them directly. Seems to be much faster across a few games. Need retesting. * Rebase, configuration rework, fix mem tracking regression * Fix race in FreePages * Set memory managers null after decrementing ref count * Remove readonly keyword, as this is now modified. * Use a local variable for the signal handler rather than a register. * Fix bug with buffer resize, and index/uniform buffer binding. Should fix flickering in games. * Add InvalidAccessHandler to MemoryTracking Doesn't do anything yet * Call invalid access handler on unmapped read/write. Same rules as the regular memory manager. 
* Make unsafe mapped memory its own MemoryManagerType * Move FlushUboDirty into UpdateState. * Buffer dirty cache, rather than ubo cache Much cleaner, may be reusable for Inline2Memory updates. * This doesn't return anything anymore. * Add sigaction remove methods, correct a few function signatures. * Return empty list of physical regions for size 0. * Also on AddressSpaceManager Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-05-24 22:52:44 +02:00
gdkchan	e9c15d32cb	Use a different method for out of bounds blit (#2302 ) * Use a different method for out of bounds blit * This is not needed	2021-05-22 01:26:49 +02:00
gdkchan	4770cfa920	Only enable clip distance if written to on shader (#2217 ) * Only enable clip distance if written to on shader * Signal InstanceId use through FeatureFlags * Shader cache version bump	2021-04-20 12:33:54 +02:00
riperiperi	9b7335a63b	Improve linear texture compatibility rules (#2099 ) * Improve linear texture compatibility rules Fixes an issue where small or width-aligned (rather than byte aligned) textures would fail to create a view of existing data. Creates a copy dependency as size change may be risky. * Minor cleanup * Remove Size Change for Copy Dependencies The copy to the target (potentially different sized) texture can properly deal with cropping by itself. * Move StrideAlignment and GobAlignment into Constants	2021-03-19 02:17:38 +01:00
riperiperi	1623ab524f	Improve Buffer Textures and flush Image Stores (#2088 ) * Improve Buffer Textures and flush Image Stores Fixes a number of issues with buffer textures: - Reworked Buffer Textures to create their buffers in the TextureManager, then bind them with the BufferManager later. - Fixes an issue where a buffer texture's buffer could be invalidated after it is bound, but before use. - Fixed width unpacking for large buffer textures. The width is now 32-bit rather than 16. - Force buffer textures to be rebound whenever any buffer is created, as using the handle id wasn't reliable, and the cost of binding isn't too high. Fixes vertex explosions and flickering animations in UE4 games. * Set ImageStore flag... for ImageStore. * Check the offset and size.	2021-03-08 18:43:39 -03:00
riperiperi	b530f0e110	Texture Cache: "Texture Groups" and "Texture Dependencies" (#2001 ) * Initial implementation (3d tex mips broken) This works rather well for most games, just need to fix 3d texture mips. * Cleanup * Address feedback * Copy Dependencies and various other fixes * Fix layer/level offset for copy from view<->view. * Remove dirty flag from dependency The dirty flag behaviour is not needed - DeferredCopy is all we need. * Fix tracking mip slices. * Propagate granularity (fix astral chain) * Address Feedback pt 1 * Save slice sizes as part of SizeInfo * Fix nits * Fix disposing multiple dependencies causing a crash This list is obviously modified when removing dependencies, so create a copy of it.	2021-03-02 19:30:54 -03:00
gdkchan	caf049ed15	Avoid some redundant GL calls (#1958 )	2021-01-27 08:44:07 +11:00
gdkchan	d6bd0470fb	Fix conditional rendering without queries (#1965 )	2021-01-27 08:42:12 +11:00
riperiperi	a1f77a5b6a	Implement lazy flush-on-read for Buffers (SSBO/Copy) (#1790 ) * Initial implementation of buffer flush (VERY WIP) * Host shaders need to be rebuilt for the SSBO write flag. * New approach with reserved regions and gl sync * Fix a ton of buffer issues. * Remove unused buffer unmapped behaviour * Revert "Remove unused buffer unmapped behaviour" This reverts commit f1700e52fb8760180ac5e0987a07d409d1e70ece. * Delete modified ranges on unmap Fixes potential crashes in Super Smash Bros, where a previously modified range could lie on either side of an unmap. * Cache some more delegates. * Dispose Sync on Close * Also create host sync for GPFifo syncpoint increment. * Copy buffer optimization, add docs * Fix race condition with OpenGL Sync * Enable read tracking on CommandBuffer, insert syncpoint on WaitForIdle * Performance: Only flush individual pages of SSBO at a time This avoids flushing large amounts of data when only a small amount is actually used. * Signal Modified rather than flushing after clear * Fix some docs and code style. * Introduce a new test for tracking memory protection. Successfully demonstrates that the bug causing write protection to be cleared by a read action has been fixed. (these tests fail on master) * Address Comments * Add host sync for SetReference This ensures that any indirect draws will correctly flush any related buffer data written before them. Fixes some flashing and misplaced world geometry in MH rise. * Make PageAlign static * Re-enable read tracking, for reads.	2021-01-17 17:08:06 -03:00
gdkchan	df820a72de	Implement clear buffer (fast path) (#1902 ) * Implement clear buffer (fast path) * Remove blank line	2021-01-13 08:50:54 +11:00
gdkchan	6ed19c1488	Fix compute reserved constant buffer updates (#1892 )	2021-01-10 21:02:58 +01:00
riperiperi	10aa11ce13	Interrupt GPU command processing when a frame's fence is reached. (#1741 ) * Interrupt GPU command processing when a frame's fence is reached. * Accumulate times rather than %s * Accurate timer for vsync Spin wait for the last .667ms of a frame. Avoids issues caused by signalling 16ms vsync. (periodic stutters in smo) * Use event wait for better timing. * Fix lazy wait Windows doesn't seem to want to do 1ms consistently, so force a spin if we're less than 2ms. * A bit more efficiency on frame waits. Should now wait the remainder 0.6667 instead of 1.6667 sometimes (odd waits above 1ms are reliable, unlike 1ms waits) * Better swap interval 0 solution 737 fps without breaking a sweat. Downside: Vsync can no longer be disabled on games that use the event heavily (link's awakening - which is ok since it breaks anyways) * Fix comment. * Address Comments.	2020-12-17 19:39:52 +01:00
riperiperi	9493cdfe55	Allow copy destination to have a different scale from source (#1711 ) * Allow copy destination to have a different scale from source Will result in more scaled copy destinations, but allows scaling in some games that copy textures to the output framebuffer. * Support copying multiple levels/layers Uses glFramebufferTextureLayer to copy multiple layers, copies levels individually (and scales the regions). Remove CopyArrayScaled, since the backend copy handles it now.	2020-11-20 17:14:45 -03:00
gdkchan	5189a807c4	Fix buffer to texture copy with remap enabled (#1721 )	2020-11-17 19:06:02 -03:00
gdkchan	787e20937f	Propagate zeta format properly (#1716 )	2020-11-16 09:37:16 +01:00
riperiperi	c652494219	Use "Screen Scissor" as size hint for render targets (#1703 ) "Screen scissor" is the minimum size of all render targets, and is set when any render target is bound on NVN or OpenGL. Since it works on all active texture's real sizes, it is therefore more reliable than viewport 0's width, and is actually set before clear. This fixes a regression with Hyrule Warriors: Age Of Calamity's cubemaps, which did not set viewport dimensions before clear. This resulted in attempting to create a cubemap with rectangular sides, which is logically and physically impossible. (also it just fails)	2020-11-13 10:40:26 +11:00
Mary	48f6570557	Salieri: shader cache (#1701 ) Here comes Salieri, my implementation of a disk shader cache! "I'm sure you know why I named it that." "It doesn't really mean anything." This implementation collects shaders at runtime and caches them to be compiled later when starting a game.	2020-11-13 00:15:34 +01:00
riperiperi	02872833b6	Size hints for copy regions and viewport dimensions to avoid data loss (#1686 ) * Size hints for copy regions and viewport dimensions to avoid data loss * Reword comment. * Use info for the rule rather than calculating aligned size. * Reorder min/max, remove spaces	2020-11-09 21:41:13 -03:00
gdkchan	934a78005e	Simplify logic for bindless texture handling (#1667 ) * Simplify logic for bindless texture handling * Nits	2020-11-09 19:35:04 -03:00
gdkchan	8d168574eb	Use explicit buffer and texture bindings on shaders (#1666 ) * Use explicit buffer and texture bindings on shaders * More XML docs and other nits	2020-11-08 12:10:00 +01:00
riperiperi	5561a3b95e	Synchronize Rasterizer State before Clear (#1680 )	2020-11-07 16:21:10 -03:00
riperiperi	500b48251c	Only report that GPU commands are available when the queue is not empty. (#1656 ) * Only report that commands are available when the queue is not empty. * Address Feedback Co-authored-by: FICTURE7 <FICTURE7@gmail.com>	2020-11-06 23:04:26 -03:00
gdkchan	24dbfc0fe6	Correct BPP of buffer to texture copies (#1670 )	2020-11-06 18:37:05 +01:00
gdkchan	a89b81a812	Separate zeta from color formats (#1647 )	2020-11-05 23:50:34 +01:00
gdkchan	2dcc6333f8	Fix image binding format (#1625 ) * Fix image binding format * XML doc	2020-10-20 19:03:20 -03:00
riperiperi	b4d8d893a4	Memory Read/Write Tracking using Region Handles (#1272 ) * WIP Range Tracking - Texture invalidation seems to have large problems - Buffer/Pool invalidation may have problems - Mirror memory tracking puts an additional `add` in compiled code, we likely just want to make HLE access slower if this is the final solution. - Native project is in the messiest possible location. - [HACK] JIT memory access always uses native "fast" path - [HACK] Trying some things with texture invalidation and views. It works :) Still a few hacks, messy things, slow things More work in progress stuff (also move to memory project) Quite a bit faster now. - Unmapping GPU VA and CPU VA will now correctly update write tracking regions, and invalidate textures for the former. - The Virtual range list is now non-overlapping like the physical one. - Fixed some bugs where regions could leak. - Introduced a weird bug that I still need to track down (consistent invalid buffer in MK8 ribbon road) Move some stuff. I think we'll eventually just put the dll and so for this in a nuget package. Fix rebase. [WIP] MultiRegionHandle variable size ranges - Avoid reprotecting regions that change often (needs some tweaking) - There's still a bug in buffers, somehow. - Might want different api for minimum granularity Fix rebase issue Commit everything needed for software only tracking. Remove native components. Remove more native stuff. Cleanup Use a separate window for the background context, update opentk. (fixes linux) Some experimental changes Should get things working up to scratch - still need to try some things with flush/modification and res scale. Include address with the region action. Initial work to make range tracking work Still a ton of bugs Fix some issues with the new stuff. * Fix texture flush instability There's still some weird behaviour, but it's much improved without this. 
(textures with cpu modified data were flushing over it) * Find the destination texture for Buffer->Texture full copy Greatly improves performance for nvdec videos (with range tracking) * Further improve texture tracking * Disable Memory Tracking for view parents This is a temporary approach to better match behaviour on master (where invalidations would be soaked up by views, rather than trigger twice) The assumption is that when views are created to a texture, they will cover all of its data anyways. Of course, this can easily be improved in future. * Introduce some tracking tests. WIP * Complete base tests. * Add more tests for multiregion, fix existing test. * Cleanup Part 1 * Remove unnecessary code from memory tracking * Fix some inconsistencies with 3D texture rule. * Add dispose tests. * Use a background thread for the background context. Rather than setting and unsetting a context as current, doing the work on a dedicated thread with signals seems to be a bit faster. Also nerf the multithreading test a bit. * Copy to texture with matching alignment This extends the copy to work for some videos with unusual size, such as tutorial videos in SMO. It will only occur if the destination texture already exists at XCount size. * Track reads for buffer copies. Synchronize new buffers before copying overlaps. * Remove old texture flushing mechanisms. Range tracking all the way, baby. * Wake the background thread when disposing. Avoids a deadlock when games are closed. * Address Feedback 1 * Separate TextureCopy instance for background thread Also `BackgroundContextWorker.InBackground` for a more sensible identifier for if we're in a background thread. * Add missing XML docs. * Address Feedback * Maybe I should start drinking coffee. * Some more feedback. * Remove flush warning, Refocus window after making background context	2020-10-16 17:18:35 -03:00
gdkchan	bd28ce90e6	Implement small indexed draws and other fixes to make guest Vulkan work (#1558 )	2020-09-24 09:48:34 +10:00
gdkchan	1eea35554c	Better viewport flipping and depth mode detection method (#1556 ) * Use a better viewport flipping approach * New approach to detect depth mode * nit: Sort method on the OpenGL backend * Adjust spacing on comment * Unswap near and far parameters based on ScaleZ	2020-09-19 19:46:49 -03:00
riperiperi	5d69d9103e	Texture/Buffer Memory Management Improvements (#1408 ) * Initial implementation. Still pending better valid-overlap handling, disposed pool, compressed format flush fix. * Very messy backend resource cache. * Oops * Dispose -> Release * Improve Release/Dispose. * More rule refinement. * View compatibility levels as an enum - you can always know if a view is only copy compatible. * General cleanup. Use locking on the resource cache, as it is likely to be used by other threads in future. * Rename resource cache to resource pool. * Address some of the smaller nits. * Fix regression with MK8 lens flare Texture flushes done the old way should trigger memory tracking. * Use TextureCreateInfo as a key. It now implements IEquatable and generates a hashcode based on width/height. * Fix size change for compressed+non-compressed view combos. Before, this could set either the compressed or non compressed texture with a size with the wrong size, depending on which texture had its size changed. This caused exceptions when flushing the texture. Now it correctly takes the block size into account, assuming that these textures are only related because a pixel in the non-compressed texture represents a block in the compressed one. * Implement JD's suggestion for HashCode Combine Co-authored-by: jduncanator <1518948+jduncanator@users.noreply.github.com> * Address feedback * Address feedback. Co-authored-by: jduncanator <1518948+jduncanator@users.noreply.github.com>	2020-09-10 16:44:04 -03:00
sharmander	bc19114bb5	Fix: Issue #1475 Texture Compatibility Check methods need to be centralized (#1482 ) * Texture Compatibility Check methods need to be centralized #1475 * Fix spacing * Fix spacing * Undo removal of .ToString() * Move isPerfectMatch back to Texture.cs Rename parameters in TextureCompatibility.cs for consistency * Add switch from 1474 to TextureCompatibility as requested by mageven. * Actually add TextureCompatibility changes to the PR (Add DeriveDepthFormat method) * Alignment corrections + Derive method signature adjustment. * Removed empty line as requested * Remove empty lines * Remove blank lines, fix alignment * Fix alignment * Remove empty line	2020-08-31 21:06:27 -03:00
mageven	2a314f3c28	Add missing depth-color conversions in CopyTexture (#1474 ) * Add missing depth-color conversions in CopyTexture * Whitespace * switch expression	2020-08-14 20:03:19 +10:00
LDj3SNuD	8624dd8de6	Fix MacroJit SubtractWithBorrow Alu Reg Operation. (#1473 )	2020-08-13 12:08:48 -03:00
gdkchan	157ad3f54f	Silence several build warnings (#1428 ) * Silence several build warnings * Remove fixed buffers from NVDEC struct * Remove unused field and usings * Fix wrong name * Silence more warning on H264 PictureInfo	2020-08-06 23:40:41 +02:00
mageven	a33dc2f491	Improved Logger (#1292 ) * Logger class changes only Now compile-time checking is possible with the help of Nullable Value types. * Misc formatting * Manual optimizations PrintGuestLog PrintGuestStackTrace Surfaceflinger DequeueBuffer * Reduce SendVibrationXX log level to Debug * Add Notice log level This level is always enabled and used to print system info, etc... Also, rewrite LogColor to switch expression as colors are static * Unify unhandled exception event handlers * Print enabled LogLevels during init * Re-add App Exit disposes in proper order nit: switch case spacing * Revert PrintGuestStackTrace to Info logs due to #1407 PrintGuestStackTrace is now called in some critical error handlers so revert to old behavior as KThread isn't part of Guest. * Batch replace Logger statements	2020-08-04 01:32:53 +02:00
gdkchan	60db4c3530	Implement a Macro JIT (#1445 ) * Implement a Macro JIT * Nit: space	2020-08-03 03:36:57 +02:00
gdkchan	43c13057da	Implement alpha test using legacy functions (#1426 )	2020-07-28 18:30:08 -03:00
gdkchan	51fbc1fde4	Use polygon offset clamp if supported (#1429 )	2020-07-26 18:11:28 -03:00
gdkchan	111534a74e	Remove GPU MemoryAccessor (#1423 ) * Remove GPU MemoryAccessor * Update outdated XML doc * Update more outdated stuff	2020-07-25 16:39:45 +10:00
gdkchan	5a7df48975	New GPFifo and fast guest constant buffer updates (#1400 ) * Add new structures from official docs, start migrating GPFifo * Finish migration to new GPFifo processor * Implement fast constant buffer data upload * Migrate to new GPFifo class * XML docs	2020-07-23 23:53:25 -03:00
mageven	723ae240dc	GL: Implement more Point parameters (#1399 ) * Fix GL_INVALID_VALUE on glPointSize calls * Implement more of Point primitive state * Use existing Origin enum	2020-07-20 21:59:13 -03:00
gdkchan	788ca6a411	Initial transform feedback support (#1370 ) * Initial transform feedback support * Some nits and fixes * Update ReportCounterType and Write method * Can't change shader or TFB bindings while TFB is active * Fix geometry shader input names with new naming	2020-07-15 13:01:10 +10:00
gdkchan	4d02a2d2c0	New NVDEC and VIC implementation (#1384 ) * Initial NVDEC and VIC implementation * Update FFmpeg.AutoGen to 4.3.0 * Add nvdec dependencies for Windows * Unify some VP9 structures * Rename VP9 structure fields * Improvements to Video API * XML docs for Common.Memory * Remove now unused or redundant overloads from MemoryAccessor * NVDEC UV surface read/write scalar paths * Add FIXME comments about hacky things/stuff that will need to be fixed in the future * Cleaned up VP9 memory allocation * Remove some debug logs * Rename some VP9 structs * Remove unused struct * No need to compile Ryujinx.Graphics.Host1x with unsafe anymore * Name AsyncWorkQueue threads to make debugging easier * Make Vp9PictureInfo a ref struct * LayoutConverter no longer needs the depth argument (broken by rebase) * Pooling of VP9 buffers, plus fix a memory leak on VP9 * Really wish VS could rename projects properly... * Address feedback * Remove using * Catch OperationCanceledException * Add licensing informations * Add THIRDPARTY.md to release too Co-authored-by: Thog <me@thog.eu>	2020-07-12 05:07:01 +02:00
riperiperi	f224769c49	Implement Logical Operation registers and functionality (#1380 ) * Implement Logical Operation registers and functionality. * Address Feedback 1	2020-07-10 14:23:15 -03:00
riperiperi	484eb645ae	Implement Zero-Configuration Resolution Scaling (#1365 ) * Initial implementation of Render Target Scaling Works with most games I have. No GUI option right now, it is hardcoded. Missing handling for texelFetch operation. * Realtime Configuration, refactoring. * texelFetch scaling on fragment shader (WIP) * Improve Shader-Side changes. * Fix potential crash when no color/depth bound * Workaround random uses of textures in compute. This was blacklisting textures in a few games despite causing no bugs. Will eventually add full support so this doesn't break anything. * Fix scales oscillating when changing between non-native scales. * Scaled textures on compute, cleanup, lazier uniform update. * Cleanup. * Fix stupidity * Address Thog Feedback. * Cover most of GDK's feedback (two comments remain) * Fix bad rename * Move IsDepthStencil to FormatExtensions, add docs. * Fix default config, square texture detection. * Three final fixes: - Nearest copy when texture is integer format. - Texture2D -> Texture3D copy correctly blacklists the texture before trying an unscaled copy (caused driver error) - Discount small textures. * Remove scale threshold. Not needed right now - we'll see if we run into problems. * All CPU modification blacklists scale. * Fix comment.	2020-07-07 04:41:07 +02:00
gdkchan	76e5af967a	Fix buffer to 3D texture copy (#1354 )	2020-07-04 01:37:36 +02:00
gdkchan	dbeb50684d	Support inline index buffer data (#1351 ) * Support inline index buffer data * Sort usings	2020-07-04 00:41:27 +02:00
gdkchan	b0d9ec8a82	Fix compute restore of previous shader state (#1352 )	2020-07-04 00:30:41 +02:00
gdkchan	96951b7d04	Fix regression caused by wrong SB descriptor offset (#1316 )	2020-06-22 13:48:32 +02:00
riperiperi	bea1fc2e8d	Optimize texture format conversion, and MethodCopyBuffer (#1274 ) * Improve performance when converting texture formats. Still more work to do. * Speed up buffer -> texture copies. No longer copies byte by byte. Fast path when formats are identical. * Fix a few things, 64 byte block fast copy. * Spacing cleanup, unrelated change. * Fix base offset calculation for region copies. * Fix Linear -> BlockLinear * Fix some nits. (part 1 of review feedback) * Use a generic version of the Convert* functions rather than lambdas. This is some real monkey's paw shit. * Remove unnecessary span constructor. * Revert "Use a generic version of the Convert* functions rather than lambdas." This reverts commit aa43dcfbe8bba291eea4e10c68569af7a56a5851. * Fix bug with rectangle destination writing, better rectangle calculation for linear textures.	2020-06-13 19:31:06 -03:00
gdkchan	44d7fcff39	Implement FIFO semaphore (#1286 ) * Implement FIFO semaphore * New enum for FIFO semaphore operation	2020-05-29 10:51:10 +02:00
gdkchan	a15b951721	Fix wrong face culling once and for all (#1277 ) * Viewport swizzle support on NV and clip origin * Initialize default viewport swizzle state, emulate viewport swizzle on shaders when not supported * Address PR feedback	2020-05-28 09:03:07 +10:00
gdkchan	5795bb1528	Support separate textures and samplers (#1216 ) * Support separate textures and samplers * Add missing bindless flag, fix SNORM format on buffer textures * Add missing separation * Add comments about the new handles	2020-05-27 16:07:10 +02:00
gdkchan	5011640b30	Spanify Graphics Abstraction Layer (#1226 ) * Spanify Graphics Abstraction Layer * Be explicit about BufferHandle size	2020-05-23 11:46:09 +02:00
gdkchan	b8eb6abecc	Refactor shader GPU state and memory access (#1203 ) * Refactor shader GPU state and memory access * Fix NVDEC project build * Address PR feedback and add missing XML comments	2020-05-06 11:02:28 +10:00
riperiperi	cd48576f58	Implement Counter Queue and Partial Host Conditional Rendering (#1167 ) * Implementation of query queue and host conditional rendering * Resolve some comments. * Use overloads instead of passing object. * Wake the consumer threads when incrementing syncpoints. Also, do a busy loop when awaiting the counter for a blocking flush, rather than potentially sleeping the thread. * Ensure there's a command between begin and end query.	2020-05-04 12:24:59 +10:00
mageven	53369e79bd	Implement user-defined clipping on GL state pipeline (#1118 )	2020-05-04 12:04:49 +10:00
riperiperi	c2ac45adc5	Fix depth clamp enable bit, unit scale for polygon offset. (#1178 ) Verified with deko3d and opengl driver code.	2020-04-30 11:47:24 +10:00
gdkchan	3cb1fa0e85	Implement texture buffers (#1152 ) * Implement texture buffers * Throw NotSupportedException where appropriate	2020-04-25 23:02:18 +10:00
mageven	a728610b40	Implement Constant Color blends (#1119 ) * Implement Constant Color blends and init blend states * Address gdkchan's comments Also adds Set methods to GpuState * Fix descriptions of QueryModified	2020-04-25 23:00:43 +10:00
gdkchan	6bfe4715f0	Initial conditional rendering support (#1012 ) * Initial conditional rendering support * Properly reset state * Support conditional modes and skeleton a counter cache for future host conditional rendering * Address PR feedback	2020-04-22 16:00:11 +10:00
gdkchan	03711dd7b5	Implement SULD shader instruction (#1117 ) * Implement SULD shader instruction * Some nits	2020-04-22 09:35:28 +10:00
Cristallix	4738113f29	Suppress warnings from fields never used or never assigned (CS0169 and CS0649) (#919 ) * chore : disable unwanted warnings and minor code cleanup * chore : remove more warnings * fix : reorder struct correctly * fix : restore _isKernel and remove useless comment * fix : copy/paste error * fix : restore CallMethod call * fix : whitespace * chore : clean using * feat : remove warnings * fix : simplify warning removal on struct * fix : revert fields deletion and code clean up * fix : re-add RE value * fix : typo	2020-04-21 07:59:59 +10:00
gdkchan	91fa1debd4	Report more realistic GPU timestamps when FastGpuTime is enabled (#1139 )	2020-04-20 22:41:07 +10:00
Thog	644de99e86	Implement GPU syncpoints (#980 ) * Implement GPU syncpoints This adds support for GPU syncpoints on the GPU backend & nvservices. Everything that was implemented here is based on my research, hardware testing of the GM20B and reversing of nvservices (8.1.0). Thanks to @fincs for the information about some behaviours of the pusher and for the initial information about syncpoints. * syncpoint: address gdkchan's comments * Add some missing logic to handle SubmitGpfifo correctly * Handle the NV event API correctly * evnt => hostEvent * Finish addressing gdkchan's comments * nvservices: write the output buffer even when an error is returned * dma pusher: Implement prefetch barrier Also fix when the commands should be prefetched. * Partially fix prefetch barrier * Add a missing syncpoint check in QueryEvent of NvHostSyncPt * Address Ac_K's comments and fix GetSyncpoint for ChannelResourcePolicy == Channel * fix SyncptWait & SyncptWaitEx cmds logic * Address ripinperi's comments * Address gdkchan's comments * Move user event management to the control channel * Fix mm implementation, nvdec works again * Address ripinperi's comments * Address gdkchan's comments * Implement nvhost-ctrl close accurately + make nvservices dispose channels when stopping the emulator * Fix typo in MultiMediaOperationType	2020-04-19 11:25:57 +10:00
mageven	4960ab85f8	Implement Depth Clamping (#1120 ) * Implement Depth Clamping and add misc enums * Fix formatting	2020-04-17 11:16:49 +10:00
mageven	468d8f841f	Simple GPU fixes (#1093 ) * Implement RasterizeEnable * Match viewport count to hardware * Simplify ScissorTest tracking around Blits * Disable RasterizerDiscard around Blits and track its state * Read RasterizeEnable reg as bool and add doc	2020-04-07 19:19:45 +10:00
gdkchan	9948a7be53	Support constant attributes (with a value of zero) (#1066 ) * Support constant attributes (with a value of zero) * Remove extra line	2020-03-30 13:11:24 +11:00
gdkchan	ab4867505e	Implement GPU scissors (#1058 ) * Implement GPU scissors * Remove unused using * Add missing changes for Clear	2020-03-29 14:02:58 +11:00
gdkchan	7e4d986a73	Support compute uniform buffers emulated with global memory (#924 )	2020-02-11 01:10:05 +01:00
riperiperi	6db16b4110	Only enumerate cached textures that are modified when flushing. (#918 ) * Only enumerate cached textures that are modified when flushing, rather than all of them. * Remove locking. * Add missing clear. * Remove texture from modified list when data is disposed. In case the game does not call either flush method at any point. * Add ReferenceEqualityComparer from jD for the HashSet	2020-02-07 08:49:26 +11:00
gdkchan	796e5d14b4	Use correct shader local memory size instead of a hardcoded size (#914 ) * Use correct shader local size instead of a hardcoded size * Remove unused uniform block * Update XML doc * Local memory size has 23 bits on maxwell * Generate compute QMD struct from nv open doc header * Remove dummy arrays when shared or local memory is not used, other improvements	2020-02-02 14:25:52 +11:00
gdkchan	f373f870f7	Support configurable point size (#916 )	2020-02-02 10:19:46 +11:00
gdkchan	532ccf929a	Ignore exit flag on branch delay slot (#899 )	2020-01-22 02:11:43 +01:00


181 Commits