How to choose the best model for coding: using SLMs and local LLMs

Hello, this is Yulia Rogozina, business process analyst at Sherpa Robotics. Today I am sharing my translation of an article about SLMs and local LLMs. Small language models and locally hosted LLMs are becoming increasingly popular among developers; the article reviews the best of them and offers tips on how to evaluate them.

The impact of GitHub Copilot and other popular solutions on the programming process is hard to ignore; however, as this trend grows, so do the questions around it.

First, not all developers are comfortable sharing their code with third parties, especially when private data is involved. There is also a financial aspect: API costs can add up quickly, especially if you work with the most powerful models.

This is where local language models and their smaller counterparts, small language models, come to the rescue. The developer community increasingly highlights their advantages, so let's figure out what the hype is about. Beyond the concept itself, we will discuss the best models, their strengths, and their impact on AI-assisted programming in general.

What are locally hosted LLMs?

Locally hosted LLMs (Large Language Models) are advanced machine learning models that run entirely in your local environment. These models typically have billions of parameters and can generate code, understand context, and assist with debugging. Hosting LLMs locally lets developers avoid the network latency, privacy concerns, and subscription costs associated with cloud solutions.
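
In practice, "local" often means a model server such as Ollama running on your own machine. Here is a minimal sketch of querying it over its HTTP API, assuming Ollama is installed and a code model has already been pulled; the model name is only an example:

```python
# Minimal sketch: querying a locally hosted model through Ollama's HTTP API.
# Assumes Ollama is running on its default port and that a code model has
# been pulled beforehand, e.g. `ollama pull qwen2.5-coder:7b`.
import requests

def ask_local_model(prompt: str, model: str = "qwen2.5-coder:7b") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_local_model("Write a Python function that reverses a linked list."))
```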

Running LLMs locally allows for fine-tuning the model to specific needs, which is especially important for specialized workflows.

In addition, the ability to fine-tune a model on private code bases yields more accurate, context-aware suggestions, which noticeably simplifies complex workflows. Keeping sensitive data on local servers also reduces the risk of leaks, making this option attractive for corporate developers who must comply with strict data protection requirements.
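
As an illustration of what such fine-tuning can look like, below is a minimal LoRA sketch using the transformers, peft, and datasets packages. The base model name and the JSONL file of code snippets are placeholders, not recommendations:

```python
# Minimal sketch: LoRA fine-tuning a local code model on a private codebase.
# The base model and data file are placeholders; adjust to your setup.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "deepseek-ai/deepseek-coder-6.7b-base"  # example base checkpoint
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# LoRA trains small adapter matrices while the base weights stay frozen,
# which is what makes fine-tuning feasible on a single local GPU.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Hypothetical dataset: one JSON object per line with a "text" field
# holding snippets from your private codebase.
ds = load_dataset("json", data_files="private_code.jsonl")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```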

However, running large models requires significant computational resources, usually multi-core CPUs or GPUs with large amounts of memory. Such solutions therefore suit those who have powerful hardware or specific performance needs. In return, you get a powerful and flexible tool capable of deep understanding and support in complex coding scenarios.

What is SLM?

SLMs, or small language models, are lightweight counterparts of LLMs. Their main advantage is a smaller number of parameters, which makes them faster and more efficient without sacrificing core functionality such as code autocompletion and basic context handling. Of course, they can't do everything, but what they can do, they do really well.

The smaller architecture of SLMs also makes them extremely efficient for tasks where low latency and a compact memory footprint matter. These models are ideal for scenarios such as rapid prototyping, embedded systems development, or working on devices with limited computing resources.
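
For example, a small quantized model in GGUF format can run entirely on a laptop CPU via llama-cpp-python. In this sketch the model file path is a placeholder for whatever SLM you have downloaded:

```python
# Minimal sketch: running a small quantized model on CPU with llama-cpp-python.
# The GGUF path is a placeholder for any small code model you have locally.
from llama_cpp import Llama

llm = Llama(
    model_path="models/small-coder-q4.gguf",  # hypothetical local file
    n_ctx=2048,    # a modest context window keeps memory usage low
    n_threads=4,   # tune to your CPU
)

out = llm(
    "# Complete this Python function\ndef fibonacci(n):",
    max_tokens=128,
    stop=["\n\n\n"],
)
print(out["choices"][0]["text"])
```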

The main limitation of SLMs is their weaker handling of complex and broad contexts compared to LLMs, which can hurt their performance on large, complex projects or extensive code bases.

Nevertheless, SLMs are attracting specialists' attention, since it is expected that within just a few months smartphones will be able to run such models effectively. I have personally seen experiments where SLMs processed bank statements with computer vision and pushed the data into FreshBooks; in the future there will be more such examples.

While giants like Google, Microsoft, and Anthropic focus on large models offered as a service, Apple has become a leader in open SLMs. Its OpenELM family is designed to run on mobile devices, and early reviews indicate these models can handle coding tasks effectively.

How to choose the best model for coding?

Choosing the optimal local LLM or SLM for your development needs is always a combination of community knowledge, empirical benchmarks, and personal testing. Start by studying community leaderboards that rank models on metrics such as speed, accuracy, and parameter efficiency.

These ratings give a good idea of which models lead in their field and how actively their communities improve and optimize them. However, remember that this is only the general picture.

Next, it is recommended to check how the model performs on more standardized benchmarks, such as:

  • HumanEval: a benchmark of 164 programming tasks that evaluates the functional correctness of code generated by LLMs. Models are tested on their ability to produce correct, executable code.

  • MBPP (and MultiPL-E): MBPP is a set of mostly basic Python programming problems; MultiPL-E extends HumanEval and MBPP into many other programming languages, testing whether a model can generate correct code across languages.

  • BigCodeBench: a comprehensive benchmark that evaluates models on practical code generation involving complex instructions and diverse library calls, going beyond toy exercises.

  • LiveCodeBench: a dynamic benchmark that continuously collects new problems from platforms like LeetCode, AtCoder, and Codeforces. It evaluates models on code generation, bug fixing, code execution, and test output prediction.

  • EvoEval: a benchmark suite that evolves existing tasks into new challenges, which helps expose overfitting to public benchmarks and shows how well models adapt to unseen tasks.
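
Community leaderboards aside, you can score a candidate model on HumanEval yourself with OpenAI's reference human-eval package. Below is a minimal sketch, assuming the package is installed (pip install human-eval); generate is a stand-in for a call into whatever local model you are testing:

```python
# Minimal sketch: producing HumanEval samples for OpenAI's human-eval harness.
# Assumes `pip install human-eval`; `generate` is a placeholder for your model.
from human_eval.data import read_problems, write_jsonl

def generate(prompt: str) -> str:
    # Plug in your local model here, e.g. the ask_local_model helper above.
    ...

problems = read_problems()
samples = [
    {"task_id": task_id, "completion": generate(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)

# Scoring (pass@k) then runs as a separate, sandboxed step from the shell:
#   evaluate_functional_correctness samples.jsonl
```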

While benchmarks are important, they are not the whole story. Public benchmarks give a general sense of model performance on standardized tasks, but the real test is always how a model performs in your specific development environment.

Running your own benchmarks that reflect the typical tasks in your work will show how well a model meets your real requirements, whether that is generating boilerplate code, debugging legacy applications, or providing contextual recommendations. Decide what you need from a coding model, try different models, and repeat the exercise regularly as new models are released.
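
Such a personal benchmark does not need to be elaborate. The sketch below, reusing the hypothetical ask_local_model helper from earlier with purely illustrative model names and tasks, runs the same in-house prompts against several local models and counts how many outputs pass your own checks:

```python
# Minimal sketch of a personal benchmark: run the same in-house tasks against
# several local models (via the ask_local_model helper defined earlier) and
# count how many generated snippets pass your own unit checks.
TASKS = [
    {"prompt": "Write a Python function slugify(s) that lowercases s and "
               "replaces spaces with hyphens. Return only the code.",
     "check": lambda ns: ns["slugify"]("Hello World") == "hello-world"},
]
MODELS = ["qwen2.5-coder:7b", "deepseek-coder:6.7b"]  # illustrative names

for model in MODELS:
    passed = 0
    for task in TASKS:
        code = ask_local_model(task["prompt"], model=model)
        ns = {}
        try:
            # A real harness would strip markdown fences and sandbox this call;
            # never exec untrusted output outside an isolated environment.
            exec(code, ns)
            if task["check"](ns):
                passed += 1
        except Exception:
            pass  # a crash counts as a failure
    print(f"{model}: {passed}/{len(TASKS)} tasks passed")
```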

The best local LLMs for programming

The word "best" always carries a subjective load, and it is important to remember that any list is essentially the author's personal opinion. Each benchmark, each test, and each application are different from each other, and a model that is perfect for one user may not be the best for another. Nevertheless, let's look at some of the most interesting local machine learning models designed for programming tasks.

DeepSeek V2.5

DeepSeek V2.5 is an open model that combines the capabilities of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, improving both general conversational ability and programming skills. It supports a context length of up to 128K tokens, which lets it work effectively with large projects and complex data.

In tests, DeepSeek V2.5 showed significant improvements in code writing and instruction following, surpassing its predecessors on benchmarks such as AlpacaEval 2.0 and ArenaHard. The model is available through web platforms and APIs, offering a convenient and efficient user experience.
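
For a quick first impression before committing local hardware to it, DeepSeek's hosted endpoint speaks the OpenAI-compatible chat protocol. A minimal sketch, assuming the openai package and your own API key:

```python
# Minimal sketch: trying DeepSeek through its OpenAI-compatible API.
# Requires `pip install openai` and a DEEPSEEK_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

reply = client.chat.completions.create(
    model="deepseek-chat",  # chat endpoint name per DeepSeek's docs
    messages=[{"role": "user",
               "content": "Write a Python function that checks whether "
                          "a string is a palindrome."}],
)
print(reply.choices[0].message.content)
```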

Qwen2.5-Coder-32B-Instruct

Qwen2.5-Coder-32B-Instruct is an advanced open model from the Qwen team at Alibaba Cloud. It is a strong alternative to GPT-4o, with excellent programming skills and solid mathematical ability.

The model supports a context length of up to 128K tokens and covers 92 programming languages. It posts outstanding results on benchmarks such as EvalPlus, LiveCodeBench, and BigCodeBench, and performs code repair at the GPT-4o level.

A notable feature of this family is that it ships in sizes ranging from 0.5 to 32 billion parameters, each also available in various quantized variants, which makes it usable for programming tasks even on less powerful hardware.
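
Running one of the smaller variants locally with transformers looks roughly like the sketch below; the 7B instruct checkpoint is chosen here only because the 32B one needs far more VRAM:

```python
# Minimal sketch: running a smaller Qwen2.5-Coder variant locally with
# transformers. The 7B checkpoint is used; the 32B one needs a large GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-Coder-7B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user",
             "content": "Write a Python function that merges two sorted lists."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```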

Nxcode-CQ-7B-orpo

Nxcode-CQ-7B-orpo is a local model optimized for programming tasks. It delivers balanced performance on simple tasks, providing a lightweight option for developers who need to generate and interpret code efficiently.

Interestingly, this model is not built from scratch: as its authors emphasize, it is a fine-tune of Qwen/CodeQwen1.5-7B on programming-related data.

Unlike heavier models such as Qwen2.5 or Llama 3, Nxcode-CQ-7B-orpo is at its best on basic tasks, making it a good tool for learning programming and for the basics of web development in JavaScript. However, it may disappoint on more complex projects, such as animations in Three.js.

OpenCodeInterpreter-DS-33B

OpenCodeInterpreter-DS-33B is a high-parameter model focused on interpreting complex code and dynamic problem solving, developed by a team of Chinese researchers. It excels at analyzing intricate code structures and generating advanced solutions.

Unlike the Qwen-based models, this one builds on Deepseek-coder-33b-base. Already at release it attracted community attention with strong results on the HumanEval and MBPP tests.

With a huge number of parameters, the model effectively handles more complex programming tasks, making it a valuable tool for developers working with deeper code analysis and generation.

Artigenz-Coder-DS-6.7B

Artigenz-Coder-DS-6.7B, developed by an Indian team, is designed for rapid code prototyping. This model is optimized for high-speed code generation but does not have the power of larger models.

It is ideal for projects that require quick creation of working prototypes but is not suitable for tasks related to handling complex programming scenarios. Despite this, it is an excellent solution for developers who need to generate code quickly without deep analysis.

Its roughly 13 GB memory footprint also makes Artigenz-Coder-DS-6.7B one of the easier models on this list to run on consumer hardware.
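
That figure follows from simple arithmetic: at 16-bit precision each parameter takes two bytes, so 6.7 billion parameters occupy roughly 13.4 GB, and quantization shrinks the footprint further. A quick estimator:

```python
# Back-of-the-envelope weight-memory estimate: parameters x bytes per parameter.
# Real usage is somewhat higher once activations and the KV cache are included.
def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"6.7B params @ {bits}-bit ~= {weight_gb(6.7, bits):.1f} GB")
# 16-bit ~= 13.4 GB, 8-bit ~= 6.7 GB, 4-bit ~= 3.4 GB
```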

Disadvantages of local LLMs for programming

The main problem with local models is hardware. A top-end Nvidia H100 GPU costs up to $40,000, and at that level of computing power it is hard to compete with large companies and their billion-dollar AI investments. Renting GPU time for training or fine-tuning is possible in theory, but it remains an expensive solution.

Data security is also an important issue. Even a locally running model does not guarantee complete protection of your information; data can still be stolen by attackers, for example over untrusted Wi-Fi networks, and users are likely to grow more cautious about this.

Finally, it should be acknowledged that solutions like Claude 3.5 Sonnet and o1-preview are far ahead of all open local counterparts: vast amounts of VRAM and tens of billions of dollars in research and development are hard to beat. Nevertheless, the goal of such models is not to compete with the giants, but to offer a free, open, and customizable option for developers.

Conclusion

Many believe that local LLMs and SLMs are the future of programming assistants. Although solutions like Copilot, ChatGPT, and Claude are backed by colossal financial resources, local models offer freedom and independence from third-party restrictions, censorship, and cloud service issues.

Local models keep your code private, require no exchange of code with external servers, and let you work without depending on cloud services or API budgets.

However, despite their promise, local LLMs still trail more mature solutions like Copilot in performance and ease of use. But as open technologies develop, we are approaching a level that can confidently compete with the large cloud services. Exciting times are ahead.
