Developer Discovers Hardware Issue in Apple Neural Engine on iPhone 16 Pro Max

Spotify developer Rafael Costa found a hardware bug in Apple's Neural Engine that causes local language models to malfunction on the iPhone 16 Pro Max, while other Apple devices remain unaffected.

Costa was working on a finance tracker. The app was supposed to automatically record expenses, classify purchases for later analysis, and update an Apple Watch widget showing the percentage of the budget spent. He decided to build the classification feature on a language model. On the first launch of the test build, the app received a restaurant purchase and returned the category "unknown". Log analysis showed that Apple Intelligence was stuck in a loading state.

Since Apple Intelligence was not working, Costa decided to run a neural network himself using the MLX framework. When he launched the app on the iPhone 16 Pro Max, the smartphone's processor was loaded to almost 100%, while the model generated a stream of unrelated symbols and phrases without ever emitting a stop token. For example, when asked "what is 2 + 2," the model produced "Applied.....*_dAK[...]".
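The failure mode described above, output that never reaches a stop token, is usually contained by capping the number of generated tokens. The following is a minimal sketch in plain Python, not Costa's code and not the MLX API: `generate`, `healthy_model`, `broken_model`, and the token ids are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Tuple

STOP_TOKEN = 0  # hypothetical id of the end-of-sequence token


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    stop_token: int = STOP_TOKEN,
    max_new_tokens: int = 64,
) -> Tuple[List[int], bool]:
    """Return (generated_tokens, stopped_cleanly)."""
    context = list(prompt)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(context)
        if token == stop_token:
            return generated, True  # the model ended its answer on its own
        generated.append(token)
        context.append(token)
    # The cap was hit without a stop token: a sign of runaway output.
    return generated, False


def healthy_model(context: List[int]) -> int:
    # Toy stand-in: emits one answer token, then the stop token.
    return 7 if len(context) < 4 else STOP_TOKEN


def broken_model(context: List[int]) -> int:
    # Toy stand-in: always returns a value in 1..50, so never the stop token.
    return 1 + (len(context) * 31) % 50


prompt = [5, 6, 8]
healthy_out, healthy_ok = generate(healthy_model, prompt)
broken_out, broken_ok = generate(broken_model, prompt, max_new_tokens=8)
```

Here `healthy_ok` is `True` after a single generated token, while `broken_ok` is `False` and `broken_out` holds exactly eight tokens. A flag like this lets an app fall back to a default category instead of spinning at full processor load.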

Costa notes that at this point he considered himself a talentless developer who could not even run a ready-made framework on a smartphone. He returned to the project three days later, and his new attempt to track down the model's strange behavior began with an experiment.

Until then, Costa had been running the test build on the iPhone 16 Pro Max with iOS 26. He decided to run the same code on his old iPhone 15 Pro with iOS 18, and everything worked without errors. After he updated that phone to the current version of the OS, there were still no problems.

To find out exactly where the error occurs, Costa decided to compare how neural networks work on different Apple devices. For the experiment, he set the following rules:

  • use the quantized version of Gemma on all devices;

  • pass a simple prompt "what is 2 + 2" as a query;

  • set the model temperature to 0.0 to eliminate variability in responses;

  • log tensor values at each layer of the neural network.
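To illustrate why a temperature of 0.0 removes variability: a common convention is that temperature 0 means greedy decoding, where the highest-scoring token is always picked and no random draw takes place. The sketch below is plain Python under that assumption; the function `sample` and the logit values are made up for illustration and are not taken from MLX or from Costa's code.

```python
import math
import random
from typing import List


def sample(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index from raw scores (logits)."""
    if temperature == 0.0:
        # Greedy decoding: always the index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise: softmax over temperature-scaled logits, then a random draw.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, 1.0]  # made-up scores for three candidate tokens

# With temperature 0.0 the seed does not matter: the result is always index 1.
greedy = [sample(logits, 0.0, random.Random(seed)) for seed in range(5)]
```

With the randomness removed, two devices running the same model on the same prompt should agree, so any remaining difference points at the computation itself rather than at sampling.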

The test results on three devices are as follows:

Device               Tensor Values   Result
iPhone 15 Pro        3: ""           Successful
MacBook Pro          3: ""           Successful
iPhone 16 Pro Max    3: ""           Failed
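One way to use per-layer logs of this kind is to compare a known-good device against the suspect one and report the first layer where the values drift beyond a tolerance. The sketch below assumes each log is a mapping from layer index to a flat list of floats. The function name `first_divergent_layer`, the tolerance, and every number in it are made up for illustration and are not Costa's measurements.

```python
import math
from typing import Dict, List, Optional

LayerLog = Dict[int, List[float]]


def first_divergent_layer(
    reference: LayerLog, candidate: LayerLog, tol: float = 1e-3
) -> Optional[int]:
    """Return the first layer whose values differ by more than tol, else None."""
    for layer in sorted(reference):
        ref_values = reference[layer]
        cand_values = candidate.get(layer)
        if cand_values is None or len(cand_values) != len(ref_values):
            return layer  # a missing or misshapen layer counts as divergence
        for ref, cand in zip(ref_values, cand_values):
            diff = abs(ref - cand)
            # NaN never compares greater than tol, so check it explicitly.
            if math.isnan(diff) or diff > tol:
                return layer
    return None


# Made-up logs: three layers, two values per layer.
reference_log = {0: [0.10, -0.20], 1: [0.35, 0.05], 2: [1.20, -0.70]}
matching_log = {0: [0.10, -0.20], 1: [0.3501, 0.05], 2: [1.20, -0.7002]}
faulty_log = {0: [0.10, -0.20], 1: [0.35, 0.05], 2: [48.0, float("nan")]}

ok_result = first_divergent_layer(reference_log, matching_log)   # None
bad_result = first_divergent_layer(reference_log, faulty_log)    # 2
```

Small differences between devices are normal in floating-point code, which is why the comparison uses a tolerance rather than exact equality.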

Costa believes the problem is most likely hardware-related, since the error occurs only on the iPhone 16 Pro Max. The smartphone is built on the A18 Pro chip with the Neural Engine, and computations go through the MLX and Metal frameworks. The fault likely lies somewhere within this stack.

Later, Costa repeated the experiment on the new iPhone 17 Pro Max, where language models ran without errors. This further suggests that the bug is hardware-related and manifests only when running models on the A18 Pro processor.
