
The trouble with 64-bit DMA

By Jonathan Corbet
August 11, 2022
We live in a 64-bit world, to the point that many distributors want to stop supporting 32-bit systems at all. However, lurking within our 64-bit kernels is a subsystem that has not really managed to move past 32-bit addresses. The quick merge-window failure of an attempt to use 64-bit addresses in the I/O memory-management unit (IOMMU) subsystem shows how hard it can be to leave all of one's 32-bit history behind.

Peripheral devices that move data at any significant rate have to support direct memory access (DMA) to get reasonable performance. As the DMA name suggests, these devices once had direct access to the system's memory in the physical address space. Over time, though, most systems have moved to interposing an IOMMU between devices and memory, for a number of reasons. The IOMMU can help to ensure that the device only accesses the memory that was intended for it, for example. It is also possible to use the IOMMU to make pages scattered throughout physical memory appear to be contiguous from the device's point of view.

For all of this to work, a device driver must create an IOMMU mapping for an I/O buffer before presenting the mapped addresses to the device. Those addresses, called I/O virtual addresses (or IOVAs), look like physical addresses, but they have their own 64-bit address space. One would expect to be able to pass an address anywhere in that range to a device, but life is not so simple; many devices have surprising limitations on how many address bits they can actually use. The kernel's DMA-mapping layer takes this into account; drivers pass in a mask indicating the address range that the device can handle, and the kernel finds an address within that range.

The IOMMU layer imposes an additional constraint, though, in that it will pick an address below 4GB (i.e. one that fits in 32 bits) if at all possible. In the early days of the PCI bus, a device performing DMA to a 64-bit address had to use a special "dual-address cycle" (DAC) mode with each access; DAC cycles were slower than single-address cycles, and a lot of devices either didn't implement them at all or had buggy implementations. Limiting IOVAs to the 32-bit range helped performance and danced around the ever-present possibility of hardware bugs.

It is now 2022, and the PCI bus has been superseded by PCI-Express, which does not have the same performance problems with DAC addresses. One might think that current hardware would not have trouble with 64-bit addresses, which are not exactly new technology at this point. The 32-bit constraint is still in place, though, and it is causing some pain of its own. Back in June, Robin Murphy posted a patch describing that pain:

The IOVA allocator doesn't behave all that well once the 32-bit space starts getting full. As DMA working sets get bigger, this optimisation increasingly backfires and adds considerable overhead to the dma_map path for use-cases like high-bandwidth networking. We've increasingly bandaged the allocator in attempts to mitigate this, but it remains fundamentally at odds with other valid requirements to try as hard as possible to satisfy a request within the given limit.

At a first glance, 4GB of DMA address space seems like it should be enough for anybody, but a big system with the right workload can fragment that space and make allocations hard. On the theory that the kernel is needlessly restricting its options to satisfy constraints that no longer make sense, Murphy changed the default so that the IOMMU layer would no longer try to find a 32-bit-compatible address and would, instead, use the full address range that the target device claimed to support. That makes the performance problem go away, which is a good thing.

The problem of buggy devices, though, cannot be made to disappear with a simple kernel patch. In a sense, that problem is even worse now, in that the 32-bit constraint may have papered over bugs in both devices and the drivers that control them for years. A driver author, perhaps an inexperienced developer who has not yet learned about the mendacity of hardware data sheets, may have trusted the documentation and told the DMA-mapping layer that their hardware could handle full 64-bit IOVAs when, in fact, it cannot. Now the only thing making that hardware actually work is the 32-bit constraint applied by the IOMMU layer.

Murphy acknowledged the risk that this change would expose this kind of bug; the patch included a couple of options for restoring the old behavior. But Murphy wanted to push the change through:

Let's be brave and default it to off in the hope that CI systems and developers will find and fix those bugs, but expect that desktop-focused distro configs are likely to want to turn it back on for maximum compatibility.

IOMMU maintainer Joerg Roedel applied the patch with reservations: "I don't have an overall good feeling about this, but as you said, let's be brave". The patch then landed in the mainline during the 6.0 merge window.

It didn't stay there for long, though. One of the core rules of kernel development is that good things rarely result from breaking Linus Torvalds's machine, and that is what happened here. He promptly reverted the change, saying: "It turns out that it was hopelessly naive to think that this would work, considering that we've always done this. The first machine I actually tested this on broke at bootup". He added that Murphy could "try again in a decade or so".

The problem, of course, is that the problems created by the 32-bit constraint are unlikely to get better by themselves in the next decade or so. There is going to be increasing pressure to leave that behavior behind, at least on machines where the hardware is known to work properly. Somehow, the community is going to have to find a way to change things that doesn't break systems across the planet. Perhaps drivers could set a new flag for hardware that is known to be good, or perhaps some sort of list could be maintained separately. The kernel has spent years papering over buggy hardware and drivers; climbing out of the resulting hole is likely to take a while as well.

Index entries for this article
Kernel: Direct memory access
Kernel: Releases/6.0



The trouble with 64-bit DMA

Posted Aug 11, 2022 14:32 UTC (Thu) by dullfire (guest, #111432) [Link] (26 responses)

> It is now 2022, and the PCI bus has been superseded by PCI-Express, which does not have the same performance problems with DAC addresses.

Corbet, as you may know, this is nuanced. There is still a performance penalty for using an address with any of the upper 32 bits set on PCIe: it makes the packet one "dword" longer (32-bit addresses have shorter PCIe headers). A 32-bit PCIe address access has a 3-dword TLP header (IIRC), while a 64-bit address requires the larger 4-dword TLP header.

Of course, for performant DMA, you'd hope that the payload of the packet is large enough that the one extra dword will not be noticeable.

Perhaps the IOVA allocator could prefer the lower 4GiB initially (while reserving, say, 1GiB for hardware in the same translation space that mandates it), and then later fill applicable requests with higher addresses. This should work well, especially in cases where the performant/important devices are loaded first (and thus get first pick of 32-bit addresses).

The trouble with 64-bit DMA

Posted Aug 11, 2022 15:21 UTC (Thu) by Paf (subscriber, #91811) [Link] (2 responses)

“Perhaps the IOVA allocator could prefer the lower 4GiB initially (while reserving, say, 1GiB for hardware in the same translation space that mandates it), and then later fill applicable requests with higher addresses. This should work well, especially in cases where the performant/important devices are loaded first (and thus get first pick of 32-bit addresses).”

Is there any reason to think they are loaded first?

The trouble with 64-bit DMA

Posted Aug 11, 2022 15:47 UTC (Thu) by developer122 (guest, #152928) [Link] (1 responses)

More importantly, this approach does nothing to mitigate buggy drivers or hardware. You're maybe less likely to encounter bugs because you're trying to pack things in the lower 4GB, but when A) the range fills up or B) the range becomes too fragmented to use, you're going to immediately crash headfirst into the bugs.

The trouble with 64-bit DMA

Posted Aug 12, 2022 4:52 UTC (Fri) by TheGopher (subscriber, #59256) [Link]

Agree - this seems like a great way to introduce difficult-to-reproduce errors!

The trouble with 64-bit DMA

Posted Aug 11, 2022 23:07 UTC (Thu) by Karellen (subscriber, #67644) [Link] (22 responses)

This should work well especially in cases where the performant/important devices are loaded first (and thus get first pick of 32-bit addresses).

Wait... there's only one 32-bit mapping across the whole system? There isn't one mapping per device?

O. M. G.

The trouble with 64-bit DMA

Posted Aug 12, 2022 5:36 UTC (Fri) by cladisch (✭ supporter ✭, #50193) [Link] (1 responses)

PCI is a shared bus with a common address space.

PCIe's programming model is backwards compatible. Peer-to-peer transfers are still possible (see https://lwn.net/Articles/767281/), but not all PCIe root complexes (chipsets) support it.

The trouble with 64-bit DMA

Posted Aug 12, 2022 15:59 UTC (Fri) by k8to (guest, #15413) [Link]

Irrelevant nitpick: what we think of as a PCI bus on computers can be a single PCI bus, or multiple PCI busses with their own independent memory spaces, usually with arranged mapped memory windows. The latter is pretty rare outside of weird server hardware, though. In such cases, the busses each have their own shared memory space, with mappings from bus to bus handled by bridge logic. Regardless, it's still a bus-wide memory layout as stated.

The trouble with 64-bit DMA

Posted Aug 12, 2022 7:33 UTC (Fri) by Wol (subscriber, #4433) [Link] (19 responses)

> Wait... there's only one 32-bit mapping across the whole system? There isn't one mapping per device?

That's the point. They are NOT mapped addresses. OSes have suffered this problem forever. And it's absolutely crazy, but the first "big" consumer OS *mostly* solved the problem: CP/M. If only that technique had made its way into MS-DOS we *might* not be having this conversation right now. (And we would have avoided all that trouble with lo/hi mem.)

At some point, mapped addresses have to be mapped to physical addresses. And if your I/O device expects to put a 32-bit address onto the physical address bus, you're stuffed ... Because MS-DOS assumed everything had a fixed physical address - and allocated the 640K-1MB area specifically to hardware buses - we're still paying the price now :-(

Cheers,
Wol

The trouble with 64-bit DMA

Posted Aug 12, 2022 14:05 UTC (Fri) by Karellen (subscriber, #67644) [Link]

They are NOT mapped addresses.

Huh? According to the article:

As the DMA name suggests, these devices once had direct access to the system's memory in the physical address space. Over time, though, most systems have moved to interposing an IOMMU between devices and memory, for a number of reasons. The IOMMU can help to ensure that the device only accesses the memory that was intended for it, for example. It is also possible to use the IOMMU to make pages scattered throughout physical memory appear to be contiguous from the device's point of view.

Emphasis mine - but how does that work if the addresses aren't mapped?

The trouble with 64-bit DMA

Posted Aug 13, 2022 3:38 UTC (Sat) by developer122 (guest, #152928) [Link] (17 responses)

Elaborate on CP/M?

The trouble with 64-bit DMA

Posted Aug 13, 2022 9:11 UTC (Sat) by Wol (subscriber, #4433) [Link] (16 responses)

CP/M made no assumptions about memory ranges and reserved space.

The only thing it stipulated, to my knowledge, was that the first ?100? bytes of memory (which pretty much have to exist, on any system) contained a jump table telling it where to find everything else. CP/M itself lived in high memory, which moved around depending on how much RAM you had, so when you added more RAM to the system you had to re-gen the OS to move it.

But that assumption that nothing had a fixed address - I don't know exactly how it handled hardware, but it would have taken the exact same attitude - would have meant that a lot of these problems would not have arisen. And I think that approach did carry over into CP/M-86, which was widely acknowledged as much better than MS-DOS. It just lost out because - I think - Bill Gates Junior had a bunch of contacts in IBM who swung it Bill III's way.

We would never have had that memory hole between 640K and 1MB, for example. Hardware would never have assumed that it could just claim certain memory addresses. Etc etc - the mindset would have been different. We'll never know what differences it would have made, but it would have made people think rather more deeply about the consequences of their choices. Taking short cuts would have been a lot harder.

Cheers,
Wol

The trouble with 64-bit DMA

Posted Aug 13, 2022 10:47 UTC (Sat) by mpr22 (subscriber, #60784) [Link] (11 responses)

> It just lost out because - I think - Bill Gates Junior had a bunch of contacts in IBM who swung it Bill III's way.

From reading Wikipedia, and an archived January 1982 Byte magazine review of the IBM PC referenced therefrom, the situation appears to have been:

Digital Research and IBM couldn't agree on terms – IBM wanted an NDA, and DR wanted per-unit royalties rather than a one-time payment – so IBM went looking for someone else. Microsoft bought an OS from Seattle Computer Products and offered it to IBM as "MS-DOS".

CP/M-86 still ended up being made available on the PC, but only because Gary Kildall threatened IBM with a copyright infringement lawsuit over... something (Wikipedia doesn't tell me exactly what).

Very few people bought it, because (a) the PC started shipping in October 1981, but CP/M-86 for the IBM Personal Computer 1.0 wasn't ready until spring 1982, (b) buying CP/M-86 with your PC cost $240, while buying MS-DOS cost $40, and (c) porting CP/M software to CP/M-86 was about as much work as porting it to MS-DOS.

As for hardware addresses? The reason the IBM PC memory map became the industry standard for MS-DOS 8088/8086 machines was because application software was accessing hardware directly based on that memory map.

The trouble with 64-bit DMA

Posted Aug 13, 2022 10:56 UTC (Sat) by mjg59 (subscriber, #23239) [Link] (6 responses)

> As for hardware addresses? The reason the IBM PC memory map became the industry standard for MS-DOS 8088/8086 machines was because application software was accessing hardware directly based on that memory map.

Right? The hardware is mapped. Whether that mapping is fixed or mutable is up to the hardware, not the OS. You can't get away from the fact that the CPU is going to start executing code from a fixed address, so there'd better be something there that contains startup code. Yes, life is probably better if you avoid fixed mappings other than those that are absolutely necessary, but that's still something that's determined by the hardware and not by the OS that you want to run there. Throwing CP/M on a modern PC wouldn't avoid the fact that (e.g.) the TPM is specified to decode a specific address range.

The trouble with 64-bit DMA

Posted Aug 13, 2022 11:28 UTC (Sat) by Wol (subscriber, #4433) [Link] (5 responses)

So what you're saying, is that if two different pieces of hardware both want the same address, the user is stuffed?

The point I'm making is that CP/M used indirect references as a matter of course. If that philosophy had carried on (and not been screwed by "the race to the bottom"), the OS would have used a soft mapping by default, and life would have been easier.

We'll never know, but it's my perpetual Word / WordPerfect moan ... WP tried to do things right; it always set out to solve the immediate problem by addressing the underlying cause. Word just shoved a fix out the door asap. That's why I hate Word - because it's a bundle of single-shot wizards. WordPerfect had far *fewer*, and *better thought out*, tools - so it was easier to comprehend, and did a far better job ... :-)

It's the mindset that matters, and the current mindset has been horribly corrupted from the mini-computer "get it right" ethos I was brought up with in my early career. Now it's "I need a bodge now" and "if it looks like it's working, get it out the door".

Cheers,
Wol

The trouble with 64-bit DMA

Posted Aug 13, 2022 13:39 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

> if two different pieces of hardware both want the same address, the user is stuffed?

If the hardware doesn't include a mechanism for selecting the resource allocations (physical address ranges, edge-triggered interrupt lines, etc.) of a device, then yes, the user is stuffed if two different pieces of hardware are expecting to use the same resource.

These days, of course, far more of those mechanisms are conveniently software controllable; I remember Dad having to open up a Commodore 1541 and make/break solder links on the PCB so that we could sensibly have two disk drives on our Commodore 64.

> The point I'm making is that CP/M used indirect references as a matter of course.

CP/M needed them, because CP/M ran on a bewildering array of hardware. Some machines had display adapters; others had serial ports. Different manufacturers had different disk formats.

MS-DOS had many of those mechanisms, but didn't actually need them, because the IBM Model 5150 (to say nothing of the clones, which started appearing on the market less than a year after it was released) sold so many units, and had such a well documented architecture, that you could just target "IBM PC or 100% compatible" and then directly access the screen, the 8250 UART, the keyboard controller, etc if you wanted to.

The trouble with 64-bit DMA

Posted Aug 13, 2022 14:03 UTC (Sat) by khim (subscriber, #9252) [Link]

> If that philosophy had carried on (and not been screwed by "the race to the bottom"), the OS would have used a soft mapping by default, and life would have been easier.

Except that philosophy was there in MS-DOS, and there were lots of MS-DOS-compatible-yet-not-IBM-PC-compatible machines with their own special versions of MS-DOS.

Microsoft even used one of these to link some software since it could use 960KB of RAM!

But these could not run Lotus 1-2-3, and that meant they pretty soon became half-forgotten history.

> Now it's "I need a bodge now" and "if it looks like it's working, get it out the door".

Yes, but Microsoft didn't invent it; it adopted it. MS-DOS, in particular, is a result of said adoption.

Did you know that Microsoft actually had an operating system when it signed the contract with IBM? It wasn't the one they eventually delivered, but it's not as if they agreed to deliver something they didn't have at all.

It was much closer to what you like, but it was much heavier and wouldn't even run on a standard IBM PC (the first models had between 16KB and 64KB of memory, and while MS-DOS is pretty happy with 64KB, M-DOS needed more), and thus it was eventually replaced.

> It's the mindset that matters, and the current mindset has been horribly corrupted from what I was brought up in my early career with the mini-computer "get it right" ethos.

We will be getting back to that mindset soon, but the transition is not expected to be painless.

The "if it looks like it's working, get it out the door" approach only makes sense when markets are rapidly expanding and Moore's law covers your back.

We had a few decades of that, but we are near the end of the era. Soon there will be a crash (and a big one; the Great Depression will be felt as something mild in comparison) and then the mindset will change.

Not sure if that's a good thing or a bad one, though.

The trouble with 64-bit DMA

Posted Aug 13, 2022 20:54 UTC (Sat) by mjg59 (subscriber, #23239) [Link] (1 responses)

> So what you're saying, is that if two different pieces of hardware both want the same address, the user is stuffed?

Yes. Thankfully that doesn't typically happen because the hardware that has fixed addressing is part of the platform and so is just designed not to conflict, and everything else is dynamically programmed into empty areas of address space (these days, at least - in the old days you'd need to jumper your cards to map them without conflict). And again, this isn't an OS issue, it's a hardware design issue. Changing the OS running on a PC doesn't alter the fact that this hardware exists in one place, so CP/M rather than DOS wouldn't have resulted in a different outcome.

Now, you *could* avoid any knowledge of the underlying hardware layout by restricting yourself to performing all access via the BIOS, but then you're limited to whatever subset of functionality the BIOS happens to offer. So instead we abstract at the OS driver level (in general userland apps have no idea what physical address anything is associated with), and further abstract it at the IOMMU level (so, for instance, a 32-bit PCI device can still DMA into physical addresses that are above 4GB despite having no way to express that itself).

The trouble with 64-bit DMA

Posted Aug 14, 2022 11:11 UTC (Sun) by farnz (subscriber, #17727) [Link]

And on that note, the IBM PC's bus really was just the processor bus exposed to expansion cards. ISA PnP hacked around this by relying on the fact that a microcontroller in 1993 was cheap, so cards could detect complex sequences of accesses on the bus and respond to them.

If IBM had intended the PC to be expandable by ordinary consumers (not skilled technicians who'd handle the conflicts by setting jumpers, or even soldering wires on cards), they'd have implemented the 5150's bus (and hence ISA) differently - rather than having the expansion bus be the CPU's raw bus, they'd have had the motherboard implement a "slot inhibit" feature to tell cards when they should decode the bus, they'd have had one IRQ per slot (routing to a unique pin on the 8259A), and they'd have had the motherboard assign DMA channels to cards.

The trouble with 64-bit DMA

Posted Aug 15, 2022 12:06 UTC (Mon) by wittenberg (subscriber, #4473) [Link]

This is a classic example of "Worse is Better" (https://www.jwz.org/doc/worse-is-better.html), a talk that Richard Gabriel gave in 1990 comparing C and Lisp. He gives the history, including his later misgivings, at https://www.dreamsongs.com/WorseIsBetter.html
This problem has been around for a long time, and isn't likely to go away soon.

--David

The trouble with 64-bit DMA

Posted Aug 13, 2022 11:17 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

> CP/M-86 still ended up being made available on the PC, but only because Gary Kildall threatened IBM with a copyright infringement lawsuit over... something (Wikipedia doesn't tell me exactly what).

So the story goes ... Gary asked IBM to bring a pristine IBM PC to court, then hit a magic key sequence that triggered a Digital Research copyright message.

Not much IBM could do in their defence after that ... the claim basically was that PC-DOS contained a load of DR code, which was quite plausible, because DOS was written by ?Seattle Computer? as a hobby project, never really intended for commercial sale. So the guy who wrote it probably didn't see any harm in copying loads of stuff, and when Bill bought it he probably didn't go looking for stuff that shouldn't be there ...

Cheers,
Wol

The trouble with 64-bit DMA

Posted Aug 13, 2022 14:22 UTC (Sat) by khim (subscriber, #9252) [Link] (2 responses)

That's a pretty nice story, except for the fact that somehow no one can present this magical key sequence and show anything like that.

On the contrary, the exact same kind of story - where Bill Gates inserted a copyright message into a Commodore computer - is very easy to repeat.

This, to me, strongly hints at the fact that this Digital Research tale is just an urban legend.

Because I have heard it from quite a few sources, and we have pristine copies of PC-DOS 1.0 and lots of original IBM 5150 devices still working… and yet no one has been able to repeat this feat.

P.S. Keep in mind that the triggering sequence shouldn't be too complicated. Otherwise we would have something like this.

The trouble with 64-bit DMA

Posted Aug 15, 2022 15:02 UTC (Mon) by smoogen (subscriber, #97) [Link] (1 responses)

The version I heard when I was a kid around 1978 was between Digital Equipment Corporation and Data General. Another version I heard in the early 1980s was between Apple ][ and Franklin. When I got to college in the 1980s, the one the old hands of the IT group talked about was between IBM and Burroughs in the 1960s. And at a different workplace, there was some version between Univac and a clone. The stories all have the same core: in a court case, the defamed creator does some 'plug change', 'toggle-switch maneuver', 'keystroke', etc. and shows that the system prints up something from their company.

In the years since, I have yet to find anyone able to show where their case actually happened. Especially now that trial records are digitized, these sorts of things are findable, but hundreds of these cases have somehow been purged... or never existed in the first place. I expect that some version (or versions) is true. I just don't know which ones anymore.

The trouble with 64-bit DMA

Posted Aug 15, 2022 16:19 UTC (Mon) by khim (subscriber, #9252) [Link]

As I have said: the only one which actually looks plausible is the one about Commodore, Bill Gates, and a trade show.

Because we know where the text was kept, how to trigger the message, and how the trigger does what it does; we even know it was only in one version of BASIC (it was, apparently, removed from later versions when Bill Gates showed it to Tramiel), but it was present in all copies of that BASIC, including ones that survive to this day.

All the others… there is no evidence, and they are too numerous to all be true.

Maybe, just maybe, one of them is true… but it's unlikely. More likely they are just fairy tales.

P.S. I have become much more sceptical about accusations of Bill Gates stealing anything after the existence of M-DOS was revealed.

It's one thing to have an OS when you sign a deal and to buy a better one when you see that yours is problematic (M-DOS was too heavy for the IBM PC, which was initially sold with between 16KiB and 64KiB of RAM… not a good fit when even the most powerful “standard” config is not powerful enough to run the “default” OS); it's a completely different thing to promise to deliver what you don't have at all.

The trouble with 64-bit DMA

Posted Oct 13, 2022 1:54 UTC (Thu) by Thomas (subscriber, #39963) [Link] (3 responses)

Atari ST's TOS has entered the chat.

A jump table (among other useful stuff) in the first kB. You cannot hide proper design, can you?

The trouble with 64-bit DMA

Posted Oct 13, 2022 7:42 UTC (Thu) by geert (subscriber, #98403) [Link] (2 responses)

And so the Amiga has to follow ;-) Single fixed pointer to ExecBase at 4. All the rest is reached by indirection from that.

Actually the Commodore KERNAL (as used on e.g. PET and C64) had a fixed jump table at the end of the address space, too. This provided only basic I/O, so no fancy color graphics, sprites, or border tricks on the C64 ;-)

The trouble with 64-bit DMA

Posted Oct 13, 2022 11:04 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

For the ultimate expression of this pattern, you get ARM's SWI instruction in 1986 - a hardware jump to the OS preserving almost all registers, allowing the OS to handle the indirection from there.

The trouble with 64-bit DMA

Posted Oct 13, 2022 15:21 UTC (Thu) by geert (subscriber, #98403) [Link]

Or the 6502's BRK instruction? Which is probably predated by BRK on the 6800...

The trouble with 64-bit DMA

Posted Aug 11, 2022 14:49 UTC (Thu) by fhuberts (subscriber, #64683) [Link] (3 responses)

The default way to fix such a problem is to reroute the original function to a new function that takes an argument indicating whether the 'old' or the 'new' behaviour is required, and then to convert the callers of the original function, one by one, to the new function with the argument set to 'new' (evolve them).

The trouble with 64-bit DMA

Posted Aug 11, 2022 15:22 UTC (Thu) by Paf (subscriber, #91811) [Link] (1 responses)

But this isn’t about functions, it’ll have to be a hardware whitelist.

The trouble with 64-bit DMA

Posted Aug 11, 2022 15:59 UTC (Thu) by fhuberts (subscriber, #64683) [Link]

A function doesn't need to be an actual code function. It can be an architecture function or conceptual function as well. It's about the strategy of evolution.

The trouble with 64-bit DMA

Posted Aug 12, 2022 7:44 UTC (Fri) by marcH (subscriber, #57642) [Link]

Exactly, this sort of problem is not solved by "trying again in a decade or so" but by gradually switching hardware configurations one by one over the next decade.

The trouble with 64-bit DMA

Posted Aug 11, 2022 15:44 UTC (Thu) by developer122 (guest, #152928) [Link] (12 responses)

Ok, sure, it broke Torvalds' machine. It's expected to break a lot of machines, that's why it defaults to off.

But if the option (off by default) isn't there, how in the world are people supposed to slowly find and fix buggy drivers, or blacklist truly broken devices?

Add Torvalds' machine to the blacklist, merge the patch, and let's move on.

The trouble with 64-bit DMA

Posted Aug 11, 2022 19:43 UTC (Thu) by mwilck (subscriber, #1966) [Link] (9 responses)

I was thinking the same. It's just a config option, so why did he have to wipe it?

The trouble with 64-bit DMA

Posted Aug 11, 2022 19:52 UTC (Thu) by corbet (editor, #1) [Link] (8 responses)

From the email linked in the article: "but at least on x86-64, that config option is basically a "stop modern machines from working", and I don't want to have that even as an option."

The trouble with 64-bit DMA

Posted Aug 12, 2022 2:53 UTC (Fri) by developer122 (guest, #152928) [Link] (7 responses)

And there aren't other config options that can bork your kernel when used improperly?

How does he ever expect this to get fixed?

The trouble with 64-bit DMA

Posted Aug 12, 2022 8:59 UTC (Fri) by joib (subscriber, #8541) [Link]

Add a flag to the DMA allocator saying "yes, I can handle 64-bit addresses", then individual drivers can be tested and converted one-by-one? Presumably there aren't that many drivers that want and benefit from such large DMA spaces in the first place, so probably less effort than spending a decade chasing down bugs in a zillion obscure drivers.

The trouble with 64-bit DMA

Posted Aug 12, 2022 13:32 UTC (Fri) by flussence (guest, #85566) [Link] (5 responses)

There certainly are. Off the top of my head: the config option to clean up stray bus-master flags left enabled by EFI (which *sounds* like a good idea for security) causes my desktop to not boot. Turning on the AMD memory encryption support caused my (also AMD) GPU to not work and I've been dissuaded from trying again, though the BIOS option to have it always enabled seems to work without problems.

On the other hand running with the *defaults* causes my laptop's CPU frequency controls to not work correctly because it does special things for acpi_osi=Linux. And so on.

We do actually need options for this stuff, or else we're left with a padded cell of lowest common denominator functionality.

The trouble with 64-bit DMA

Posted Aug 12, 2022 14:17 UTC (Fri) by flussence (guest, #85566) [Link] (3 responses)

Oh, I just remembered an even easier one! Ethernet MTU. It's incredibly easy to cause your network to break in weird and wonderful ways just by setting each card's MTU to its maximum advertised amount, and you don't even need to reboot, let alone tweak the kernel.

The trouble with 64-bit DMA

Posted Aug 12, 2022 21:42 UTC (Fri) by WolfWings (subscriber, #56790) [Link] (2 responses)

That's usually your switches not supporting the higher MTU, though, not the network card. :)

The trouble with 64-bit DMA

Posted Aug 15, 2022 19:31 UTC (Mon) by immibis (guest, #105511) [Link] (1 responses)

All devices on the network must have the same MTU. Otherwise, one device may send a frame to another device which the latter can't receive, and there is no way to detect this. End result: the network just doesn't work sometimes.

The trouble with 64-bit DMA

Posted Aug 16, 2022 9:52 UTC (Tue) by farnz (subscriber, #17727) [Link]

That's one of the bugs of switched Ethernet as opposed to doing routing at every device. If you had a router on the other end of the cable (or in WiFi, if the AP routed between all stations), you'd be able to have per-device MTUs and path MTU discovery, and slowly increase the MTU as you upgrade equipment (at the expense of a small number of ICMP Packet Too Big messages when you cross segments).

WiFi, for example, has a "native" MTU of 2304 bytes, but we choose to limit it to 1500 because of the need to switch to 1500 byte MTU Ethernet - in theory, though, you could have an infrastructure with a 2304 byte MTU (non-standard for Ethernet, mind), and get the efficiency gains across the network.

This is, of course, not the only consideration in choosing between switching L2 frames or routing L3 packets, but it is a downside of choosing to switch instead of route.

The trouble with 64-bit DMA

Posted Aug 12, 2022 21:41 UTC (Fri) by WolfWings (subscriber, #56790) [Link]

By default the kernel actively ignores and denies any _OSI(Linux) requests, and on newer kernels it in fact logs an alert to dmesg that your BIOS attempted one.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/li...

That's been true dating all the way back before the v3.0 days (I didn't verify a specific git link for earlier):

https://github.com/torvalds/linux/blob/02f8c6aee8df3cdc93...

The trouble with 64-bit DMA

Posted Aug 11, 2022 21:27 UTC (Thu) by Tov (subscriber, #61080) [Link] (1 responses)

It seems like the default "off" actually disables the old behavior.

But yes: "Let's be brave, default this new behavior across the board for all the odd devices out there, do no significant testing, and hope other developers and users will find and fix all the buggy devices for us".

No wonder Linus has given them a decade to think about another way forward...

The trouble with 64-bit DMA

Posted Aug 12, 2022 8:58 UTC (Fri) by Wol (subscriber, #4433) [Link]

> No wonder Linus has given them a decade to think about another way forward...

The problem is we DON'T HAVE a decade. The old way is already breaking top-end machines, and the problem is only going to get worse as that becomes consumer hardware ...

Cheers,
Wol

The trouble with 64-bit DMA

Posted Aug 11, 2022 21:09 UTC (Thu) by jreiser (subscriber, #11027) [Link]

Murphy acknowledged the risk ...

Murphy's law: If anything can go wrong, it will.

The trouble with 64-bit DMA

Posted Aug 11, 2022 23:41 UTC (Thu) by roc (subscriber, #30627) [Link] (3 responses)

How about clearing that "64 bit addresses supported" feature bit on all devices and then adding it back if/when each device has actually been tested with it?

The trouble with 64-bit DMA

Posted Aug 12, 2022 13:12 UTC (Fri) by mss (subscriber, #138799) [Link] (1 responses)

There's very little incentive for people to test their devices' compatibility with 64-bit DMA.

On the other hand, the opt-out variant, while apparently providing miserable UX, at least makes people report incompatible devices.

The trouble with 64-bit DMA

Posted Aug 13, 2022 22:29 UTC (Sat) by gray_-_wolf (subscriber, #131074) [Link]

> There's very little incentive for people to test their devices' compatibility with 64-bit DMA.

Hm, but most people just don't care about this, no? I don't feel like my laptop is limited by only having 32-bit mappings. Wouldn't it be a safe assumption that the likes of Google would bother testing the drivers for those few places where they actually suffer from this problem? Then you could enable 64-bit (or rather, the supported range) just for those devices/drivers?

The trouble with 64-bit DMA

Posted Oct 13, 2022 2:10 UTC (Thu) by Thomas (subscriber, #39963) [Link]

That's going to be a tough challenge.

The 64-bit address support is read from a device's PCI capabilities, i.e. from the label on the PCI device's box. That is something one cannot modify: it comes with the equipment and is read-only. So in order to overwrite this value, one has to already know that a device is incapable of supporting a 64-bit address space despite reporting it. That's a chicken-and-egg problem. On top of that, a modify-on-write copy of a device's capabilities needs to be maintained, because some devices need to have the 64-bit capability overwritten.
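The "modify-on-write copy" idea above amounts to a quirk list: rather than trusting the read-only capability the hardware advertises, the kernel consults its own table of devices known to lie. A minimal sketch, with entirely made-up vendor/device IDs:

```python
# Toy quirk table: (vendor, device) IDs known to advertise 64-bit DMA
# support they cannot actually deliver. The IDs are hypothetical.

QUIRK_NO_64BIT_DMA = {(0x1234, 0xabcd)}

def effective_dma_bits(vendor, device, advertised_bits):
    """Return the DMA address width the kernel should actually use,
    overriding a bogus 64-bit advertisement for quirked devices."""
    if advertised_bits >= 64 and (vendor, device) in QUIRK_NO_64BIT_DMA:
        return 32
    return advertised_bits
```

The chicken-and-egg problem remains, of course: a device only lands in the table after someone has already hit the bug.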

Reserve the lower four gigs

Posted Aug 12, 2022 4:42 UTC (Fri) by epa (subscriber, #39769) [Link] (3 responses)

> At a first glance, 4GB of DMA address space seems like it should be enough for anybody, but a big system with the right workload can fragment that space and make allocations hard.

On a big system, why not reserve the lower four gigabytes (or most of it) for DMA? Then it wouldn't get fragmented.

Reserve the lower four gigs

Posted Aug 12, 2022 5:40 UTC (Fri) by cladisch (✭ supporter ✭, #50193) [Link] (2 responses)

> why not reserve the lower four gigabytes (or most of it) for DMA?

With an IOMMU, the I/O address space is virtual and separate from the physical address space. The entire I/O address space already is reserved for DMA; the problem is that all of it gets filled up.

Reserve the lower four gigs

Posted Aug 12, 2022 14:20 UTC (Fri) by mss (subscriber, #138799) [Link]

As far as I know, making a large IOVA-contiguous DMA allocation still needs the memory to be contiguous in the CPU (physical) address space, even with an IOMMU.

It would help avoid the issues caused by memory fragmentation if more drivers were converted to the dma_alloc_noncontiguous() API instead, which allows assembling such an IOVA-contiguous allocation from non-contiguous single pages.
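The trick this relies on can be shown with a toy page table (a simplified model, not the kernel's implementation): physical pages scattered anywhere in memory are mapped into one contiguous IOVA range, so the device never sees the gaps.

```python
# Toy illustration of what an IOMMU makes possible: build an
# IOVA -> physical-page mapping in which the device-visible
# addresses are contiguous even though the physical pages are not.

PAGE = 4096

def map_contiguous_iova(phys_pages, iova_base=0x100000):
    """Map scattered physical pages to a contiguous IOVA range."""
    table = {}
    for i, phys in enumerate(phys_pages):
        table[iova_base + i * PAGE] = phys
    return table

# Physical pages are nowhere near each other...
scattered = [0x70000000, 0x1000, 0x3ffff000]
pt = map_contiguous_iova(scattered)
# ...but the IOVAs the device uses are contiguous.
assert sorted(pt) == [0x100000, 0x101000, 0x102000]
```

This is why, with an IOMMU, a driver need not demand physically contiguous memory for a large DMA buffer at all.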

Reserve the lower four gigs

Posted Aug 15, 2022 9:11 UTC (Mon) by epa (subscriber, #39769) [Link]

Oh, I understood that the bottom four gigs of *physical* memory needed to be used, but it was fragmented because of other uses.

The trouble with 64-bit DMA

Posted Aug 12, 2022 5:53 UTC (Fri) by maniax (subscriber, #4509) [Link] (4 responses)

This is something I see every month: anything you try to do with hardware that's not what the Linux kernel does by default doesn't work. Nobody tests that, nobody seems to care, and half the time you're on your own, working around weird hardware bugs.

(so for example for every DPDK driver and related things, for every problem you see you also need to go look in the kernel how something is implemented)

The trouble with 64-bit DMA

Posted Aug 12, 2022 9:07 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

And Windows is different how?

Oh yes, all development is done behind closed doors, and we don't see the hardware manufacturers screaming about their drivers.

The problem is no different, IF the manufacturer can be bothered to provide a Linux driver at all. For any OS the attitude is typically "if it works, ship it", and the kernel devs' attitude is, sadly but inevitably, "it works for me, I can't test it on anything else, so I'll assume it's fine".

As an application programmer, I hate that attitude; there's loads of (sadly pretty ubiquitous) software out there that is buggy / illogical / downright frustrating as hell. I try to make sure everything I do makes logical sense and addresses the entire "truth table" range (even if only as "I don't want to go there, but I'm not going to stuff you up if you do"). But the majority of crap out there doesn't even try :-(

And for a hardware engineer/device driver writer, that attitude is very hard to take in the face of hardware bugs ...

Cheers,
Wol

The trouble with 64-bit DMA

Posted Aug 12, 2022 9:19 UTC (Fri) by maniax (subscriber, #4509) [Link] (1 responses)

Sorry, I think I wasn't clear :)

My beef is with the hardware vendors on this (and you can do s/Linux/Windows/ in my comment, or say "the most-used OSes"). What they test with seems to be what the OSes do, so using any feature Linux doesn't use, or using one in a way that Linux doesn't, leads to problems.

I'm not talking about extremely obscure stuff, either... Without naming names, one vendor of NICs had a problem that, when filtering packets in the NIC based on UDP port, it tended to corrupt the packets. They'd come in with the flag for a correct checksum set, but the packet would be either truncated or would have another packet starting at a 4k offset. Never got them to fix it, as it was only easily reproducible with serious amounts of traffic.

And we get to the situation where a hardware bug is not fixed because the mainstream OSes work around it or don't use that functionality, and the hardware vendors don't really care for the rest. So we all end up implementing the same (bad) workarounds...

The trouble with 64-bit DMA

Posted Aug 12, 2022 23:29 UTC (Fri) by Avamander (guest, #152359) [Link]

Even Linux defaults might not function, RDRAND and TSC come to mind. But there are other, more subtle bugs, like incomplete ACPI tables.

The trouble with 64-bit DMA

Posted Aug 13, 2022 8:17 UTC (Sat) by thoeme (subscriber, #2871) [Link]

>for example for every DPDK driver
Hm, I have to ask our DPDK developer about that... I know the DPDK driver for the AMD EPYC 3000 SoC (on a COM express 7 module) did not work when I tested it, according to one of the people responsible because the specific PHY used on the dev carrier board was not supported. Now I wonder if there's more to it...

The trouble with 64-bit DMA

Posted Aug 13, 2022 0:47 UTC (Sat) by xxiao (guest, #9631) [Link] (1 responses)

How does Nvidia's CUDA do this, then? It has hundreds of GB of memory on both sides (GPU and CPU), which definitely needs to be allocated and moved via DMA in large regions (and across PCIe); there might be some smart way to overcome this IOVA limit?

The trouble with 64-bit DMA

Posted Aug 13, 2022 15:43 UTC (Sat) by khim (subscriber, #9252) [Link]

Both nVidia and AMD only recently offered a way to do that. AMD calls it Smart Access Memory, and nVidia calls it Resizable BAR.

Before that they just used 256MB-512MB fixed preallocated “window” to move data.

The trouble with 64-bit DMA

Posted Aug 13, 2022 10:27 UTC (Sat) by andy_shev (subscriber, #75870) [Link] (1 responses)

The problem with Linux nowadays is that Linus wants his branch to be (almost) stable all the time, and all the CIs are geared toward that. The two-month cycle is too short for such a patch, and that basically blocks Linux improvements (something similar happened to the printk threads). We need a real development Linux kernel tree, with a release cycle of six months or so.

The trouble with 64-bit DMA

Posted Aug 13, 2022 12:30 UTC (Sat) by mss (subscriber, #138799) [Link]

Sounds like there should be a linux-next-longterm tree for the really brave.

And CI infrastructure and bots should be testing that.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds