A National Deep Inference Facility

July 4, 2023

David Bau

Update: See the NDIF website for current information.

In October 2022, the White House Office of Science and Technology Policy released a Blueprint for an AI Bill of Rights delineating a consumer’s right to AI systems that “provide explanations that are technically valid, meaningful and useful.” In January 2023, the National AI Research Resource Task Force identified the development of trustworthy AI as one of four critical opportunities for strengthening the U.S. AI R&D ecosystem, noting that “supporting research on AI’s societal implications, developing testing and evaluation approaches, improving auditing capabilities, and developing best practices for responsible AI R&D can help improve understanding and yield tools to manage AI risks.” Two months later, in March 2023, the Future of Life Institute published a “Pause Giant AI” open letter, which has since garnered more than 25,000 signatories, including many national leaders in AI research, recommending “a significant increase in public funding for technical AI safety research in the areas of alignment, robustness and assurance, and explainability and interpretability”. These three documents, all published within a span of six months, highlight the urgency of research to explain, audit, evaluate, and manage the impacts of Large Language Models (LLMs).

The Challenge Posed by Black-Box Deployment

Meanwhile, LLMs such as ChatGPT are being adopted more quickly than any previous technology, with widespread deployment in consumer-facing technologies, touching every field involving reading, writing, or programming, even as their mechanisms remain unexplained. Because we do not understand how LLMs make their predictions, we find ourselves in a situation where the most impactful class of AI model today is inscrutable: the opacity of LLMs has become a foundational challenge to our national goal of developing trustworthy AI.

When Senator Schumer spoke at CSIS to introduce the SAFE Innovation Framework, he summarized the challenge posed by large-scale deployments of black-box systems:

Explainability is about transparency. When you ask an AI system a question and it gives you an answer—perhaps an answer you weren’t expecting—you want to know where that answer came from. You should be able to ask “why did AI choose this answer, over some other answer that could have also been a possibility?” And it should be done in a simple way, so all users can understand how these systems come up with answers.

Congress should make this issue a top priority, and companies must take the lead in helping us solve this problem. Because without explainability, we may not be able to move forward.

If the user of an AI system cannot determine the source of the sentence or paragraph or idea—and can’t get some explanation of why it was chosen over other possibilities—then we may not be able to accomplish our other goals of accountability, security, or protecting our foundations.

Explainability is thus perhaps the greatest challenge we face on AI. Even the experts don’t always know why these algorithms produce the answers they do. It’s a black box.

No everyday user of AI will understand the complicated and ever-evolving algorithms that determine what AI systems produce in response to a question or task.

And of course, those algorithms represent the highest level of intellectual property for AI developers. Forcing companies to reveal their IP would be harmful, it would stifle innovation, and it would empower our adversaries to use them for ill.

Fortunately the average person does not need to know the inner workings of these algorithms. But we do need to require companies to develop a system where, in simple and understandable terms, users understand why the system produced a particular answer and where that answer came from.

This is very complicated work. And here we will need the ingenuity of the experts and companies to come up with a fair solution that Congress can use to break open AI’s black box.

The National Deep Inference Facility Proposal

To enable scientists to meet this challenge, we are proposing a new cyberinfrastructure to the NSF. It is called the National Deep Inference Facility, and it provides investment in a pivotal software-hardware infrastructure that will unlock research into cracking open the black box and explaining the largest class of LLMs.

Our proposal to the NSF can be found here.