Failures, mistakes and confusion

There seems to be quite a bit of confusion in online discussions of "error handling models" in programming. I think the likely cause is that basically every error handling model I've seen in a programming language conflates two (or more!) separate concepts into a single system.

Failures

Failures happen when the system cannot perform some action, usually because of a constraint. Constraints can be physical (limited memory), security-related (authentication), business-related (quotas), and so on. It is (almost always) impossible to prove that a given piece of code will never encounter a failure. Failures are usually communicated explicitly, and they can often be worked around or fixed without human intervention. You usually handle a failure either by working around or fixing its source, or by reporting a failure of your own to your caller, depending on context and philosophy (fail-fast, etc.).
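To make the two ways of handling a failure concrete, here is a minimal sketch. Everything in it is hypothetical and only for illustration: the QuotaExceeded exception, the store function, and the shrink callback are assumptions, not part of any real API. The first attempt may hit a quota; the handler tries a workaround once, and if that also fails it simply lets the failure propagate to its own caller.

```python
class QuotaExceeded(Exception):
    """Hypothetical failure: a business constraint (a storage quota) was not met."""


def store(blob: bytes, used: int, quota: int) -> int:
    """Pretend to store blob and return the new usage, or raise QuotaExceeded."""
    # Constraint check: the calling code can be entirely correct and still hit this.
    if used + len(blob) > quota:
        raise QuotaExceeded(f"need {len(blob)} bytes, only {quota - used} left")
    return used + len(blob)


def store_with_fallback(blob: bytes, used: int, quota: int, shrink) -> int:
    """Try to store blob; on a quota failure, retry once with shrink(blob)."""
    try:
        return store(blob, used, quota)
    except QuotaExceeded:
        # Workaround attempt. If the smaller blob still does not fit, the
        # second QuotaExceeded propagates: we report a failure ourselves.
        return store(shrink(blob), used, quota)
```

Note that nothing here suggests the caller's code is wrong. The quota is a property of the environment, which is why retrying or reporting are both reasonable responses.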

Mistakes

Mistakes are faults in the code that break invariants, whether explicit or implicit. Invariants can usually be expressed in some kind of logic, and large classes of mistakes can be proven never to happen in a given piece of code. A mistake may be communicated explicitly when an invariant check fails, but sometimes it takes effect immediately, with no communication at all. Working around mistakes in code is usually undesirable; the widely preferred response is to stop execution safely, so that the broken invariant does not propagate.
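As a small sketch of an explicit invariant check, assume a hypothetical average function whose contract says callers never pass an empty list. The InvariantViolation name is also an assumption made for this example. An empty list here does not mean the environment refused something; it means some calling code is wrong.

```python
class InvariantViolation(AssertionError):
    """Hypothetical marker for a broken invariant, i.e. a mistake in the code."""


def average(values: list) -> float:
    """Return the mean of values. Invariant: values is never empty."""
    if not values:
        # Explicit invariant check. There is nothing sensible to retry or
        # work around; the caller has a bug that needs a code change.
        raise InvariantViolation("average() called with no values")
    return sum(values) / len(values)
```

In this sketch the intended handling is to not handle it: let the exception reach the top of the program (or the nearest safe boundary) and stop there, rather than catching it and carrying on with state that is already known to be wrong.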

Confusion

Both failures and mistakes are usually mashed together into "errors", except in static analysis, where essentially only mistakes can show up. They are predominantly communicated through the same channels, and in fact a lot of invariant checking happens in the same place as checking for failures. EINVAL is the kernel saying "hey, I think you have a mistake": it checked some invariants before performing an operation and found them broken. The kernel's next step after checking invariants is to check constraints to see whether the operation can be performed, and an unsatisfied constraint indicates a failure. Because both kinds of check happen in the same place, mistakes detected this way are reported in the same way as failures, including the failures you may well want to handle, like a disk filling up or the system losing its internet connection.
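The shared channel is easy to see from Python, where both kinds of result arrive as the same OSError type and differ only in the errno field. The sketch below is an assumption-laden illustration: the LIKELY_MISTAKES set and the classify helper are hypothetical, and the split is neither complete nor authoritative (EINVAL, for instance, can also be caused by bad user input rather than a bug).

```python
import errno

# Assumption: an illustrative, incomplete guess at which errno values
# usually point at a bug in the calling code rather than at the environment.
LIKELY_MISTAKES = {errno.EINVAL, errno.EBADF, errno.EFAULT}


def classify(exc: OSError) -> str:
    """Guess whether an OSError reports a mistake or a failure."""
    return "mistake" if exc.errno in LIKELY_MISTAKES else "failure"


# Both of these travel through exactly the same exception type.
bad_argument = OSError(errno.EINVAL, "Invalid argument")
disk_full = OSError(errno.ENOSPC, "No space left on device")
```

Here classify(bad_argument) gives "mistake" and classify(disk_full) gives "failure", but the caller has to recover that distinction by hand from a single numeric code; the channel itself does not carry it.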

Mistakes and failures differ greatly in what they mean and in how you want to handle them, and yet they are treated as a single concept, the "error". I don't think they should live in a single system.

Conclusion

There may be a benefit in separating these out more. I don't have any solutions, though; I just wanted to bring this up in the hope that others see it and start thinking about it.