DeepMind claims its new AI coding engine is as good as an average human programmer


DeepMind has created an AI system named AlphaCode which it says “writes computer programs at a competitive level”. The Alphabet subsidiary tested its system against coding challenges used in human competitions and found that its program achieved an “estimated rank” placing it within the top 54% of human coders. The result is a significant step forward for autonomous coding, says DeepMind, though AlphaCode’s skills aren’t necessarily representative of the kind of programming tasks faced by the average coder.

Oriol Vinyals, principal research scientist at DeepMind, told The Verge via email that the research was still in its early stages, but that the results brought the company one step closer to creating a flexible problem-solving AI – a program capable of autonomously tackling coding challenges that are currently the domain of humans alone. “Long term, we are excited about [AlphaCode’s] potential to help programmers and non-programmers write code, improve productivity, or create new ways to build software,” Vinyals said.

AlphaCode was tested against challenges curated by Codeforces, a competitive coding platform that shares weekly problems and publishes rankings for coders similar to the Elo rating system used in chess. These challenges are different from the kind of tasks a coder might face when building, say, a business application. They are more self-contained and require a broader knowledge of both algorithms and theoretical concepts in computer science. Think of them as highly specialized puzzles that combine logic, math, and coding expertise.
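For context on what those rankings mean, the Elo system maps the gap between two ratings to an expected head-to-head score. A minimal sketch of the standard chess formula (Codeforces uses a similar scheme):

```python
# Standard Elo expected-score formula, as used in chess ratings.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Two equally rated players are expected to split points evenly.
print(expected_score(1238, 1238))  # 0.5
# A 400-point rating edge corresponds to roughly a 10-to-1 expected edge.
print(round(expected_score(1638, 1238), 3))  # 0.909
```

Under this formula, a 1238-rated player would be expected to score about 50% against an equally rated opponent and to lose most games against the site's strongest competitors.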

In one example challenge on which AlphaCode was tested, competitors are asked to find a way to convert one string of random, repeated s and t letters into another string of the same letters using a limited set of inputs. Competitors cannot, for example, just type new letters, but instead have to use a “backspace” command that deletes several letters in the original string. You can read a full description of the challenge below:

A sample challenge titled “Backspace” that was used to evaluate DeepMind’s program. The problem is of medium difficulty, with the left side showing the problem description and the right side showing sample test cases.
Image: DeepMind/Codeforces
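A puzzle of this shape is typically solved greedily by matching the two strings from the back: if the trailing characters agree, consume both; otherwise the trailing source character must be erased, which also costs the character before it. A sketch of that idea, assuming the backspace semantics described above (the function name and exact rules are illustrative, not taken from the contest statement):

```python
def can_obtain(s: str, t: str) -> bool:
    """Greedy check: can typing s, with some characters replaced by a
    backspace press (which deletes the previously kept character),
    produce t? Scans both strings from the end."""
    i, j = len(s) - 1, len(t) - 1
    while j >= 0:
        if i < 0:
            return False      # ran out of source characters
        if s[i] == t[j]:
            i -= 1            # characters match: keep this one
            j -= 1
        else:
            i -= 2            # mismatch: s[i] must be erased, taking s[i-1] with it
    return True

print(can_obtain("ababa", "ba"))  # True
print(can_obtain("ab", "ba"))     # False
```

The greedy works because deletions only ever remove characters in pairs from the end of the untyped suffix, so a backward scan never needs to reconsider an earlier match.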

Ten of these challenges were fed into AlphaCode in exactly the same format as they are given to humans. AlphaCode then generated a massive number of possible answers and winnowed these down by running the code and checking the output, just as a human competitor might. “The whole process is automatic, with no human selection of the best samples,” Yujia Li and David Choi, joint leads of the AlphaCode paper, told The Verge over email.
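The generate-then-filter step Li and Choi describe can be illustrated with a toy sketch: run each candidate program against the problem’s sample tests and keep only the ones whose output matches. All names and the execution mechanism here are illustrative; DeepMind’s actual pipeline is far more elaborate.

```python
# Toy sketch of filtering generated candidate programs by executing them
# against sample tests. Illustrative only, not DeepMind's pipeline.
import io
import contextlib

def passes_samples(candidate_src: str, samples) -> bool:
    """Run candidate source on each (stdin_text, expected_stdout) sample."""
    for stdin_text, expected in samples:
        buf = io.StringIO()
        lines = iter(stdin_text.splitlines())
        try:
            with contextlib.redirect_stdout(buf):
                # For the sketch, feed sample input by overriding input().
                exec(candidate_src, {"input": lambda: next(lines)})
        except Exception:
            return False  # crashed candidates are discarded
        if buf.getvalue().strip() != expected.strip():
            return False
    return True

candidates = [
    "print(int(input()) + int(input()))",   # correct for the sample below
    "print(int(input()) - int(input()))",   # wrong output, gets filtered out
]
samples = [("2\n3", "5")]
survivors = [c for c in candidates if passes_samples(c, samples)]
print(survivors)
```

The real system generates orders of magnitude more candidates than this and applies additional clustering before choosing its final submissions, but the filter-by-execution idea is the same.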

AlphaCode was tested on 10 of the challenges, which had been tackled by 5,000 users on the Codeforces site. On average, it ranked within the top 54.3% of responses, and DeepMind estimates that this gives the system a Codeforces Elo of 1238, which places it within the top 28% of users who have competed on the site over the last six months.

“I can safely say that AlphaCode’s results exceeded my expectations,” Codeforces founder Mike Mirzayanov said in a statement shared by DeepMind. “I was skeptical because even in simple competitive problems you often have to not only implement the algorithm, but also (and this is the hardest part) invent it. AlphaCode has managed to place itself at the level of a promising new competitor.”

An example interface of AlphaCode tackling a coding challenge. The problem is given to the program exactly as it is given to humans, shown on the left, with the generated output shown on the right.
Image: DeepMind

DeepMind notes that AlphaCode’s current skill set is only applicable within the realm of competitive programming, but its capabilities open the door to creating future tools that make programming more accessible and one day fully automated.

Many other companies are working on similar applications. For example, Microsoft and the artificial intelligence lab OpenAI have adapted the latter’s language-generating program GPT-3 to function as an autocomplete program that finishes strings of code. (Like GPT-3, AlphaCode is also based on an artificial intelligence architecture known as a transformer, which is particularly adept at parsing sequential text, both natural language and code.) For the end user, these systems work just like Gmail’s Smart Compose feature – suggesting ways to finish whatever you’re writing.

Many advances have been made in the development of AI coding systems in recent years, but these systems are far from ready to simply take over the work of human programmers. The code they produce is often buggy, and because the systems are usually trained on public code libraries, they sometimes reproduce copyrighted material.

In one study of an AI programming tool named Copilot, developed by the code repository GitHub, researchers found that around 40% of its output contained security flaws. Security analysts have even suggested that bad actors could intentionally write and share code with hidden backdoors online, which could then be used to train AI programs that would insert those errors into future programs.

Such challenges mean that AI coding systems are likely to be integrated slowly into the work of programmers – starting as assistants whose suggestions are treated with suspicion before they are trusted to carry out work on their own. In other words: they have an apprenticeship to serve. But so far, these programs are learning fast.

