Goodbye, programmer? AI already writes code better than you

Dmitry Rozhkov, manager of the Kubernetes services team and creator of the popular YouTube channel Senior Software Vlogger, shared his experience testing AI assistants for programming. He talked about whether neural networks can replace programmers, what pitfalls await when working with AI assistants, and why we still do not see a boom in new applications created with the help of artificial intelligence.

We asked Dmitry Rozhkov to talk about the future of programming after his video with testing AI programmers. Dmitry's experiment sparked heated discussions in the IT community and raised many questions about the role of artificial intelligence in software development. We decided to delve into the topic and find out firsthand how real the threat of replacing a human programmer with artificial intelligence is today.

The full version of the interview, organized by our company Artezio, can be viewed on the Ai4Dev channel on YouTube. We also run a Telegram channel for developers who use AI, where you can exchange opinions and real-world cases.

“Expecting a neural network to write everything correctly on the first try is a big mistake”

Dmitry, tell us why you decided to test these AI tools and how you chose the products for testing?

I work as a manager, leading a team of 15 people. Although I haven't programmed at work for a long time, I still write code for personal projects, including media projects, blogs, and Telegram bots.

AI agents came into my area of interest partly because of my media activities, and also because I still consider myself a developer. The choice of tools for testing started with the high-profile Devin project. However, since Devin is in closed beta, I decided to explore other available tools on the market, preferably open source or subscription-based.

First, I tested Devika, which appeared shortly after the announcement of Devin. Devika provides a development environment with a chat for communicating with the agent, a virtual internet browser, a console, and a virtual IDE for writing code.

Then I tried Cursor, an IDE with artificial intelligence features. Its interface rethinks the traditional IDE, letting users write instructions to the agent instead of writing code directly.

Next was Aider - a terminal tool that, in my opinion, fits most organically into the current workflow of programmers.

Finally, I tested the tool from Replit. They provide hosting and have developed an agent that allows you to write code directly on their service, deploy it immediately, and pay for hosting. This is part of their strategy to attract more users and democratize development.

When choosing tools for testing, I focused on those that were most discussed in the developer community. Of course, I couldn't cover all the available options. For example, I tried OpenHands, but it didn't work at all in my case.

It is interesting to note how quickly this area is developing. Literally the day after my video was released, OpenAI released a new model, o1. And just recently, I wrote a post about the emergence of two more tools working in the same direction. Progress in this area is measured in weeks, making any review or test quickly outdated. In essence, my testing video became obsolete almost immediately after publication.

The essence of my experiment was that the project had to be created completely from scratch. There was an empty folder without any code or templates. I wanted to write one prompt, as if it were a task from a freelance site. Theoretically, such a task could indeed be received on freelance.

I told myself, let's imagine that I am a blogger with a large amount of video materials. I need a service that would accept video or audio files, transcribe them into text, then compress this text and create YouTube headings in a specific format. The task seems quite simple and well-formalized. Moreover, I even indicated which services should be used, since I already use them myself and have my own code for these tasks.

During testing, I encountered a number of problems. Some agents, for example, started writing code in Python, although I explicitly asked to use TypeScript. It is worth noting that Cursor and Aider correctly used TypeScript, as requested. However, I must say that none of the projects created by various agents initially started and were fully ready for use, regardless of the programming language used or the quality of the code itself.

Since some agents started writing in Python, I had interesting conversations with them. For example: "Hey, can you run a file with the .py extension as TypeScript?" "No, that extension is for Python." I say: "Yes, but I asked you to write in TypeScript." Then they apologized and rewrote everything in TypeScript, though again with uneven quality.

The second thing I discovered: they write code, but with environments like Node.js and TypeScript, it takes a fair amount of scaffolding to get TypeScript running on Node.js. You need to configure the TypeScript compiler itself, and none of the AI agents did this. They honestly wrote code that simply could not be run. To run it, you had to do some more work and set everything up, and that didn't always succeed.

And that's the problem. On the one hand they write code; on the other hand, they write only code. In some environments that is enough: in Python, for example, you have code and dependencies, you install the dependencies from requirements.txt, and everything works. With TypeScript, there are many different setup options.

Plus, probably the biggest mistake is to expect the neural network to write everything correctly in one attempt. In my projects, I noticed that a neural network copes much better when there is already some code: when the project has dependencies it can refer to, look at, and follow by example. In this regard, perhaps my experiment was not constructed quite correctly. But since I mainly use JavaScript and TypeScript, I was interested in testing the network's capabilities with my languages.

Why don't these agents write quality code right away?

I did not delve specifically into how the agents break down the task and how they execute it. It is important to note that not all agents initially wrote code in Python. However, I assume that Python may be a preferred language for them, and this is often demonstrated in examples. My assumption is based on the fact that there should be more examples of Python code in the training set, especially considering its popularity on platforms like GitHub.

Interestingly, I even wrote a post on my Telegram channel about the fact that languages that we traditionally do not consider "beautiful", such as PHP, may become more in demand for AI precisely because of the large amount of existing code. The more code written in a language, the more material for training the neural network, which can potentially increase the AI's efficiency when working with this language.

This observation emphasizes that the effectiveness of AI in programming may depend not so much on our subjective preferences in languages, but on the amount of data available for training.

“Reproducing the flexibility and efficiency of human thinking in the context of programming has not yet been achieved”

Recently, we have observed a trend of integrating code execution environments directly into AI model interfaces. Claude has implemented this through "artifacts", and OpenAI is experimenting with similar solutions. How do you assess the impact of these built-in sandboxes on the efficiency of AI in development? How much does the ability to immediately execute and test generated code bring us closer to creating a full-fledged AI assistant for programmers?

These thoughts largely reflect what I talked about in my two videos on artificial intelligence in programming. The integration of code execution environments into AI models is indeed very important. It seems to me that these AI agents, which are now being actively developed, will help us better understand the programming process itself. At first glance, it seems that a person is just writing code, but in fact, this process includes many other aspects.

First, the code needs to be run somewhere. Consequently, the AI agent must have software interfaces for working with the code and the execution platform. This is why the creation of sandboxes is becoming an important direction. For example, Replit is engaged in this, providing a sandbox directly on their website. There is also Canvas from OpenAI, which I recently tested, and StackBlitz with their new service bolt.new. All these platforms are moving in a similar direction. There is also Aider, which works directly from the terminal and can execute code on a local machine, although that may raise security issues.

In fact, the programming process includes many such moments. We read documentation, search for information on Google, study APIs, and then try to adapt this knowledge to our specific task. I am not saying that a neural network cannot do this, but an AI agent needs to be constructed in a special way.

It is necessary to create an agent so that it can consider the entire process step by step, keeping all relevant information in its context. In this context, it should hold the current task, all written code, and relevant documentation. At the same time, the agent should be able to choose from the documentation what needs to be kept in context and what can be omitted.

It is important to understand that all these large language models (LLMs) working with prompts actually operate on one large prompt, not a series of separate messages, as it may seem in the chat. Each new request or clarification is simply appended to the existing prompt. Because of this, the longer we talk to the neural network while writing a program, the more it can lose the context of what was said at the beginning of the conversation. This creates certain problems that need to be considered when developing AI assistants for programming.
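The mechanics described above can be sketched in a few lines of TypeScript. This is an illustration, not any real API: the `Message` type and the character-based window are simplifications (real models count tokens, not characters), but the effect is the same, namely that old context is the first to be lost.

```typescript
// Sketch: a "chat" is really one growing prompt. Each turn is appended,
// and the whole history is re-sent on every request.
type Message = { role: "system" | "user" | "assistant"; content: string };

function buildPrompt(history: Message[]): string {
  // Every request serializes the entire history into one block of text.
  return history.map((m) => `${m.role}: ${m.content}`).join("\n");
}

function truncateToWindow(prompt: string, contextWindow: number): string {
  // When the prompt outgrows the window, the oldest text goes first,
  // which is why long sessions "forget" the start of the conversation.
  return prompt.length <= contextWindow
    ? prompt
    : prompt.slice(prompt.length - contextWindow);
}

const history: Message[] = [
  { role: "system", content: "You are a TypeScript assistant." },
  { role: "user", content: "Write a transcription service." },
  { role: "assistant", content: "Here is a first draft..." },
  { role: "user", content: "Now add YouTube chapter output." },
];

console.log(truncateToWindow(buildPrompt(history), 120));
```

With a small window, the system instruction at the top is exactly what gets cut off, mirroring the problem described above.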

In light of these limitations, a more effective strategy seems to be the development of an initial prompt capable of immediately generating high-quality code, and then iteratively working with this result.

Returning to the issue of sandboxes and integration of execution environments, it is important to note that companies working on AI for programming strive to create comprehensive systems. These systems allow neural network agents to interact with various development components. We are talking about platforms for running code and handling errors, accessing the internet for search queries and gathering information from documentation and other sources. In addition, vector databases are needed for efficient search and retrieval of relevant information, such as RAG search, as well as tools for creating and managing a code repository map.

From the point of view of human activity, we, programmers, do not keep the entire context of the project in our heads, especially when it comes to millions of lines of code. Instead, we are able to effectively switch between different contexts using certain "anchors" or landmarks in the code.

It is this ability to flexibly manage context that is now being attempted to recreate in AI agents. However, so far it has not been possible to fully reproduce such flexibility and efficiency of human thinking in the context of programming. This remains one of the key challenges in the development of AI for software development.

During your testing of various AI agents, which ones showed the best results in solving the task? What factors, in your opinion, determined their superiority? Did you notice a significant difference in the quality of code generated by different systems?

The most impressive results were shown by Cursor Composer, based on the Claude 3.5 model from Anthropic. I want to note that I did not test Anthropic's Artifacts separately, but used an integrated solution within my Cursor subscription.

However, it is important to emphasize that technologies in this area are developing rapidly. Already after the completion of the main experiment, I discovered that the bolt.new service provides even better results. This tool created well-structured code, breaking my task into logical modules with a clear architecture. It created an index file as an entry point and divided the functionality into separate modules, which significantly improves the readability and maintainability of the code.

For comparison, Canvas from OpenAI generated code in the form of one large block, which resembles my own style of writing drafts or prototypes. Although this reflects a common approach to rapid prototyping, I expected a more structured and optimized solution from AI.

This experience confirms the reputation of the Claude model as one of the most advanced for programming tasks at the time of the test. However, it also demonstrates how quickly this area is developing, and how new tools can offer even more advanced solutions in a short period of time.

The key factor in evaluating the quality of the generated code for me was its structure, modularity, and adherence to modern clean code practices. Tools that were able to not only solve the problem but do it elegantly and professionally left the most positive impression.

"Garbage in, garbage out"

Let's talk about the complexity of interacting with AI models in the context of programming. How would you characterize the process of formulating a task for AI? How iterative is it usually, and what level of detail in the prompt do you consider optimal for obtaining a quality result? Are there any features or techniques of prompt engineering that you find particularly effective when working with AI in the field of software development?

One of the key problems in working with neural networks is the quality of the input data. People who have worked a lot with language models (LLMs) come to the conclusion: "garbage in, garbage out." In other words, the quality of the answer directly depends on the quality of the query.

Take, for example, neural networks for generating images. It would seem simple: write "a beautiful kitten at sunset" and you will get the corresponding picture. But if you need a specific image that you have in mind, difficulties arise. How to describe it so that you get exactly what you want?

The situation is similar with programming. A neural network can easily handle the request "make a random web page." But if you need something specific that can be verified, then problems begin.

Interestingly, none of the tested models suggested writing tests for the code. This is surprising, considering that testing is a widely recognized best practice in development, helping to verify the program.

As for the detail of the prompt, I tend to think that it should be in human language and not too detailed. Otherwise, writing a prompt for a simple task can take as much time as writing the code itself. I have seen examples where people use 200-line prompts, detailing the structure of the database and ORM in TypeScript. But this already looks like you wrote half of the code yourself and then say that the neural network did it.

I believe that the task for AI should be formulated in much the same way as a technical task from a competent manager for a programmer.

The number of iterations when working with AI depends on the user. I have come to the conclusion that the target audience for these tools is people who understand the basics of programming. They should be able to read code, know about variables, loops, dependencies, be able to run and test code. Essentially, this should be at least a competent tester who understands internal processes, rather than just working on the "black box" principle.

In addition, working with AI requires time and patience. You need to be able to improve the initial prompt, try different approaches. In my experiments, each neural network took from half an hour to an hour. At some point, I just stopped, not wanting to spend the whole day getting fully working code from one neural network.

Are you saying that a non-programmer cannot create a program on their own? But an LLM can report errors back, and the neural network will keep refining the code until it reaches a working state. You don't need to be a programmer to see whether a program works or not.

The question of whether a person without programming skills can create a program independently with the help of an LLM is quite complex. On the one hand, modern language models have built-in error-handling mechanisms and can iteratively improve code. Theoretically, even a non-programmer can determine whether a program works or not.

I would not categorically state that this is impossible. Surely there are examples where people without programming experience have successfully created simple programs using LLM. After all, neural networks are capable of generating code even from a single prompt. However, the result largely depends on the complexity of the project and the number of "degrees of freedom" in the task.

For example, in my test, there were several interconnected tasks: uploading a file to remote storage, converting it to text, compressing the text, creating chapters for YouTube, and publishing a blog post. This is already quite a complex chain of actions.
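To make those "degrees of freedom" concrete, here is a sketch of that chain with stub implementations. Every function body is a placeholder: real versions would call external services (remote storage, speech-to-text, an LLM, a blog platform), and the URL and helper names are invented for illustration.

```typescript
async function uploadToStorage(file: string): Promise<string> {
  // Stub: a real version would upload to remote storage and return its URL.
  return `https://storage.example.com/${file}`;
}

async function transcribe(url: string): Promise<string> {
  // Stub: a real version would call a speech-to-text service.
  return `transcript of ${url}`;
}

async function summarize(text: string): Promise<string> {
  // Stub: a real version would ask an LLM to compress the transcript.
  return text.slice(0, 40);
}

function makeChapters(transcript: string): string[] {
  // Stub: a real version would derive timestamped YouTube chapters.
  return ["00:00 Intro", "01:30 Main topic"];
}

async function publishPost(summary: string, chapters: string[]): Promise<string> {
  // Stub: a real version would post to a blog platform.
  return `published: ${summary} (${chapters.length} chapters)`;
}

async function run(file: string): Promise<string> {
  // Each step alone is simple; the difficulty is wiring them into one
  // working system with credentials, error handling, and deployment.
  const url = await uploadToStorage(file);
  const transcript = await transcribe(url);
  const summary = await summarize(transcript);
  const chapters = makeChapters(transcript);
  return publishPost(summary, chapters);
}

run("episode-01.mp4").then(console.log);
```

Each stub on its own is the kind of isolated task the agents handled well; it is the end-to-end pipeline that tripped them up.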

On the other hand, there are examples where people create simple web interfaces that call one or two APIs using Python - a language well known to neural networks. In such cases, you can get a working result in 30-40 minutes of working with various prompts.

The key question here is the efficiency of time use. Should a manager spend a whole day communicating with a neural network instead of performing their direct duties? If the value of the created product exceeds the potential value of other tasks of the manager (for example, communicating with clients or optimizing processes), then this approach may make sense. However, it is often more efficient to assign work with the neural network to a junior developer who already has basic programming knowledge. This will allow you to get the result faster and better.

In light of the development of AI technologies, how do you assess the future of the programmer profession? Many companies are investing in platforms where a client can order a website "in one click," and AI supposedly does all the work. Do you consider this a realistic scenario or more of a marketing ploy?

I tend to think that such a scenario is possible, but with some reservations. It is important to understand that for AI there is no fundamental difference between writing a complex algorithm used in interviews or simple code for saving data to a database. However, there are limitations that need to be considered. These are the quality of the input data and the AI training set, the limitations of the context window with which the neural network works, and the complexity of integrating various system components.

In my tests, I noticed that models cope better with simpler, isolated tasks, but begin to experience difficulties when it is necessary to create a system with a large number of interconnected components. For example, AI cannot simply take and run code on your computer - additional infrastructure and setup are needed for this.

Therefore, although the idea of a "one-click site" sounds attractive, implementing such a system requires solving many technical and organizational issues. It is not impossible, but it is not as simple as it may seem at first glance. The key challenge is not so much in writing code, but in creating a holistic, integrated system capable of working in real conditions.

How do you evaluate new approaches to AI code generation, in particular, systems like o1-preview? How much, in your opinion, can such innovations improve the quality of programming with AI assistants? What approaches are currently the most common in this area?

It seems to me that with o1-preview, they just packaged what some people, the so-called prompt engineers, were doing on their own. One of my acquaintances joked that the profession of prompt engineer didn't even have time to appear before it was automated.

The thing is, it really improves the quality of the output. I watched an interview with Maxim Strakhov on "Podlodka", I think, where he explains in great detail how LLM works. In particular, he talks about one of his cases where he asked the neural network to generate a haiku, a Japanese poem. And with the second request, he asked if it looked like a haiku. That is, he asked the same neural network without prior context to verify the result. And this second request significantly improved the output.

It turns out that if the answer is "no, it doesn't look like a haiku," then we start the cycle again and, say, try this cycle 10 times. If after 10 attempts the haiku has not been generated, we fail with an error or go on to the next step. This cycle of generation, verification, and possible repetition really does dramatically improve the quality of the output.
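The cycle just described can be sketched as a small generic loop. Here `generate` and `verify` are stand-ins for real model calls (in the haiku example, both would go to the same LLM), and the retry limit of 10 matches the description above.

```typescript
// Generate-then-verify loop: produce a candidate, check it, retry up to a limit.
function generateVerifyLoop(
  generate: (attempt: number) => string,
  verify: (candidate: string) => boolean,
  maxAttempts = 10,
): string {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = generate(attempt);
    if (verify(candidate)) return candidate; // accepted: stop the cycle
  }
  // After maxAttempts failures: fail loudly (or hand off to a next step).
  throw new Error(`no acceptable result after ${maxAttempts} attempts`);
}

// Toy usage: "generation" improves with each attempt, the verifier checks length.
const result = generateVerifyLoop(
  (n) => "x".repeat(n),
  (s) => s.length >= 3,
);
console.log(result); // "xxx", accepted on the third attempt
```

The loop itself is trivial; the hard part, as the next paragraph notes, is building a `verify` step for programs whose correctness cannot be checked by string inspection.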

The only difference, probably, is that we need to learn to verify the result of the written program again. If we are talking about a program that simply adds up some numbers, it is probably easy to verify. If the program controls a drone, then we need to somehow confirm that it does so correctly. Accordingly, some kind of emulator is needed, and the data from that emulator must be automatically analyzable, so we can tell that everything is happening correctly. A person can simply see with their own eyes that the drone really took off, flew two meters, and landed. A neural network cannot do this yet.

That is, it already needs to connect the body, hands, eyes, some kind of physical interface, to move in the direction of connecting artificial intelligence to the body, and then the results will improve.

“When programmers say that AI cannot yet write complex systems, it's a defensive reaction”

How complex can projects built with an AI assistant be? The developers we talked to believe only simple ones are feasible: some kind of website, a bot, and that's it. But something complex, some kind of multimodal system, is almost impossible. That is, if we are talking about, say, a bank, then AI cannot be applied there comprehensively. You can write individual pieces of code, but you cannot implement the whole thing end to end. What do you think about this?

I think this is about the same problem. It's not about the complexity of the algorithms, because basically the neural network doesn't care what complexity of code it generates. It generates code that writes "Hello World" or implements the Union-Find algorithm equally quickly. It doesn't care.

But the problem, in particular with large systems, is that they usually use a large context. And when we write large systems, we usually involve many people. Including in order to load this huge context that we have in small pieces into different people. Accordingly, none of the people has the full context in their head. It just doesn't fit in the head.

We have, let's say, some architect who just draws UML diagrams and roughly understands how this system works. Then we have each individual brick, which represents a group of systems, breaking down into organizations. In these organizations, systems break down into teams. Teams write microservices. And then it all integrates together. And, accordingly, it all comes together from the bottom up. We have some metrics on dashboards. And based on these readings, we draw conclusions.

It seems to me that the problem is not that the neural network is not capable of writing this code. The problem is that we have not yet learned to overcome these, as they say in English, "air gaps" - gaps between systems. It is necessary to teach the neural network to somehow transmit information through these gaps.

Suppose we have technical documentation, a thick book on the development of a banking system. Someone has to read all this documentation. But even if he reads it all, he will not remember it all. Accordingly, this person will make a summary. A summary is essentially a text summarization. This problem is solved. That is, each individual small problem can be solved by a neural network. But linking them together is a problem. AI developers are currently working on solving it. They are trying to fill these gaps with something, some kind of "ether" that the neural network can also use.

It seems to me that development is moving in this direction. And the fact that programmers say that such complex systems cannot yet be written by a neural network seems to me to be a kind of defense mechanism.

Will the development of AI assistants and automatic code generation systems affect the profession of a programmer in the near future? Do you expect a significant reduction in the number of developers or a change in the structure of demand in the labor market?

It's a difficult question. Some people say that we will simply write more programs. On the other hand, as the situation in the US and European economies after the COVID bubble burst shows, when tens of thousands of programmers were laid off, it turned out that not so many programs were needed. Free venture money ran out and stopped flowing into all those mobile applications. Why should the arrival of neural networks suddenly increase the number of programmers, if tens of thousands were laid off even without them? That question needs to be answered first, and I probably don't have an answer to it.

It all depends on how much this network can reproduce itself. Theoretically, it is possible, probably, to come up with such a step that we will have a perfect black box that no one else can reproduce again, but it solves all our problems. Perhaps there will be a group of scientists of a hundred people who service it, and a specialized institute where we prepare a replacement for these scientists. That is, the training is no longer for hundreds of thousands of engineers a year, but for people at the PhD level who specifically go to work on this system.

If an error occurred somewhere, it goes back to the initial requirements. This is the moment I talked about in my first video about the perfect artificial programmer. What is the problem with live programmers? When new requirements come in, we have to somehow fit them into the already written system. But the artificial programmer does not need to fit them. He will simply rewrite the entire system from scratch perfectly with the new requirements.

“You need to take the time to figure it out”

And can't the LLM itself create its own programming language that will allow people without programming experience to successfully control the effectiveness of AI?

By and large, such attempts have been made with various languages, like Rust. The Rust language is famous for its error messages: everything is described in detail, what happened, where, and where to look. Theoretically, it seems to me that it is not the neural network that should make such a language for itself. This should be done by people whose task will be to say: "Okay, the error messages of this programming language should be written in plain English, without lines of code, without anything." But lines of code are still needed by people, to understand where to look and what to do. And that link will still have to be preserved somehow.

It seems to me that you will be able to verify whether the neural network has solved your problem correctly or not. You just need to formulate the request in such a way that it can be checked: if, say, we write a calculator, we can test it and confirm that it does the math correctly. If errors remain, that is no longer about verifying the program's correctness, but about debugging it.
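The calculator example can be made concrete: a non-programmer can verify the result as a black box by running a handful of arithmetic cases, without reading the code at all. The `evaluate` function here is a stand-in for whatever the neural network generated.

```typescript
// Stand-in for generated calculator code; only its behavior is checked below.
function evaluate(a: number, op: "+" | "-" | "*" | "/", b: number): number {
  switch (op) {
    case "+": return a + b;
    case "-": return a - b;
    case "*": return a * b;
    case "/": return a / b;
  }
}

// Black-box verification: known inputs against expected outputs.
const checks: Array<[number, "+" | "-" | "*" | "/", number, number]> = [
  [2, "+", 2, 4],
  [10, "-", 3, 7],
  [6, "*", 7, 42],
  [8, "/", 2, 4],
];

for (const [a, op, b, expected] of checks) {
  if (evaluate(a, op, b) !== expected) throw new Error(`failed: ${a} ${op} ${b}`);
}
console.log("all checks passed");
```

If a check fails, the user knows the program is wrong without understanding why; fixing it is the separate activity of debugging.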

What advice would you give to programmers, developers who want to try using AI in their work? Based on your test, what pitfalls should be considered first? And, in general, how applicable is all this today?

AI assistants are already quite applicable in the work of a programmer, but the key to their effective use is understanding their capabilities and limitations. My main advice is to invest time in studying these tools. Spend at least a day to deeply understand their functionality, understand their strengths and weaknesses. It is important to realize that AI is not a magical solution to all problems, but a powerful tool that requires skillful handling.

In programming, there is a concept of "leaky abstraction". It implies that for effective use of the tool, it is necessary to understand its internal structure. I recommend watching several interviews or reports where experts explain in detail how neural networks process requests. This knowledge will help you formulate more effective prompts and interact with AI more consciously.

Understanding the principles of AI assistants will allow you to achieve significantly higher quality results. You will be able to better anticipate which queries will give the desired output and how to interpret the received responses. As a result, this will not only increase your productivity but also help avoid common mistakes when using AI in programming.

As I said before, garbage in, garbage out. And if a person just comes to the neural network and writes: "Create a TikTok for me," and the neural network does something, but it doesn't work out, and the person says: "Haha, you can't write code" - this is a very arrogant position, I think. You need to take the time to figure it out.

Why don't we see new TikToks or programs created by neural networks today? It turned out that many non-programmers entered the market, but we do not see a boom of new ideas, new programs.

Because, as it turned out, programming is not the most difficult part. First, you need the very idea of a product that solves a real problem or meets the needs of users. Generating and validating such an idea is a separate skill that cannot be replaced by AI.

Secondly, even having an idea, it is necessary to validate it correctly. This requires understanding the market, user needs, and business processes. AI can help with data analysis, but the interpretation of results and decision-making remains with the person.

Finally, the most difficult part is marketing and attracting an audience. Creating an application with the help of AI is possible, but getting people to use it is a completely different task.

Therefore, if a hypothetical manager spends the whole day creating a landing page with the help of AI, that may be an inefficient use of his time. Instead, he should focus on validating the idea, communicating with potential customers, and developing a promotion strategy. The technical part can be delegated to a junior developer who, with the help of AI, can quickly create the necessary landing page.

Thus, the absence of a boom in new applications is not due to the limitations of AI in development, but to the complexity of the process of creating a successful product, where programming is just one of many components.
