Lean is a language that supports theorem prooving. It has, among other features, dependend data types - types that allow to check state transition
This code represents the very core of AI training and inference. All in 200 lines of code.
Mercor is a company that created APEX benchmark for AI models. The benchmark is concentrated on finance.
AI creates proofs of Erdos's theorems
Cover article to the Poison article
AI Halucination is ineviteble as users prefer certainty of answers over correctness - and so the models are trained that way. In addition, models are statistical engines, where errors accumulate with length and complexity.