Testing BMC: let's talk about load testing
Hello, tekkix! At Aquarius, we strive to make testing run with as little human involvement as possible. This article continues our previous piece on automated BMC testing, "Testing BMC: Automate! You can't do everything by hand". Here I will describe the universal solution we are building to measure BMC performance: why it is needed, and how we are applying the experience we have gained in other areas, for example in performance testing of storage systems (SDS), a new line of business for the company.
What is BMC and why is it needed?
BMC (Baseboard Management Controller) is a specialized microcontroller that handles tasks such as server inventory, monitoring, and power management. With a BMC, you can manage servers remotely, monitor their state, and keep them running reliably. Without a reliable BMC, administrators struggle to manage and monitor the server fleet.
What makes testing a microcontroller different?
A BMC is, first and foremost, a low-performance microcontroller. At the time of writing, the most popular BMC chip is the ASPEED AST2500, with an ARM11-based core running at 800 MHz, paired with 512 MB of DDR4 SDRAM. These constraints shape how performance testing has to be done.
The two main issues in performance testing are insufficient throughput and queue saturation, and in most cases the second follows from the first. However well the BMC software is optimized, it cannot sustain loads comparable to those of server or desktop systems. If you start with high RPS (requests per second) values, the BMC can fill all of its queues with requests and then lack the resources (CPU/RAM) to serve each of them. This can slow the microcontroller down or even turn it into a "zombie".
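The queue-saturation effect can be illustrated with a deliberately simplified fluid model, a sketch rather than a model of any real firmware: requests arrive at a constant rate, the BMC completes at most a fixed number of them per second, and whatever it cannot finish waits in a queue. The service rate of 20 requests per second below is a hypothetical number; the real value has to be measured on the hardware.

```python
from typing import List


def simulate_backlog(arrival_rps: float, service_rps: float, seconds: int) -> List[float]:
    """Queue length at the end of each second in a simple fluid model.

    Assumes requests arrive at a constant rate and the BMC can complete at most
    `service_rps` of them per second; unserved requests wait in the queue.
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        backlog += arrival_rps                  # new requests this second
        backlog -= min(backlog, service_rps)    # requests the BMC manages to finish
        history.append(backlog)
    return history


# Hypothetical BMC that sustains 20 requests per second.
print(simulate_backlog(arrival_rps=15, service_rps=20, seconds=5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(simulate_backlog(arrival_rps=30, service_rps=20, seconds=5))  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

While the arrival rate stays below the service rate, the queue stays empty; once it exceeds it, the backlog grows every second without bound, which is why starting a BMC test at a high RPS is risky.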
Two important points should be noted here. First, a BMC is not required to handle thousands of requests per second. It is usually polled by one or two higher-level (northbound) management and monitoring systems. One example is a DCIM (Data Center Infrastructure Management) system that talks to the BMC using Redfish requests. Second, performance testing lets us give our customers recommendations on how often to poll their servers. This mainly saves time for system administrators, who want to always see the current state of their systems rather than tune polling intervals for the whole server fleet by trial and error.
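One way to turn a benchmark result into a polling recommendation is simple arithmetic on average rates. The sketch below assumes you have measured the request rate the BMC can sustain, know how many Redfish requests one polling round issues, and want to keep the BMC at or below a chosen share of that capacity. All of these inputs, and the 50% headroom default, are illustrative assumptions rather than figures from our tests.

```python
def min_poll_interval(sustainable_rps: float, requests_per_poll: int,
                      pollers: int = 1, max_utilization: float = 0.5) -> float:
    """Shortest polling interval, in seconds, that keeps the average load on
    the BMC at or below `max_utilization` of its measured sustainable rate."""
    if sustainable_rps <= 0 or requests_per_poll <= 0 or pollers <= 0:
        raise ValueError("rates and counts must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    requests_per_round = pollers * requests_per_poll   # all pollers together
    allowed_rps = sustainable_rps * max_utilization    # budget we let monitoring use
    return requests_per_round / allowed_rps


# Hypothetical figures: the BMC sustains 10 RPS, each poll makes 20 requests,
# two monitoring systems poll it, and they may use half of its capacity.
print(min_poll_interval(10, 20, pollers=2, max_utilization=0.5))  # 8.0
```

This bounds only the average load. If a monitoring system fires all of a round's requests at once, the burst itself must also fit within what the BMC can queue, which a separate load test can check.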
What methods and technologies should be used for microcontroller load testing?
By 2024, the community has accumulated extensive experience with Enterprise solutions, so we decided to follow the best traditions of that approach, adapting it to the specifics of our task.
Our load testing work rests on several key practices that we recommend to others:
• Containerization: the main advantage is deploying tests in containers, which gives isolation and easy scaling. For example, if you have 20 or more servers with different firmware versions to test, the best solution is a test queue (ours is built on RabbitMQ) whose jobs are distributed among several load generators packaged in Docker containers. With this setup, developers can queue tests themselves, or tests are queued automatically when new firmware is released, and our system sends out the results of the completed benchmarks.
• Automation: automating the selection of load profiles helps find bottlenecks faster. We recommend implementing, once, a service that searches for boundary load values based on the SLA or other criteria; this saves time later because test configurations no longer have to be chosen by hand. Report generation should also be automated as much as possible, so that QA engineers can move on to new tests and other projects instead of digging through dozens of pages of reports.
• A wide range of load generation tools: we reviewed several tools (JMeter, Locust, k6) and built our own benchmark concept. In our project, a benchmark is a composite entity that consists of a performance test, the load generator that runs it, and a data extractor for that generator. This lets tests be written with whatever tool a person already knows, which greatly simplifies the team's work and shortens onboarding for newcomers.
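The benchmark concept described above (a test, the generator that runs it, and an extractor for that generator's output) can be sketched as a small composition. This is a hypothetical Python illustration, not our actual code: `Benchmark`, `FakeGenerator`, `parse_key_values` and the `key=value` output format are invented for the example. A real generator would wrap JMeter, Locust or k6, and a real extractor would parse that tool's result files.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class LoadGenerator(Protocol):
    """Anything that can run a scenario and return its raw output as text."""

    def run(self, scenario: str, rps: float, duration_s: int) -> str: ...


@dataclass
class Benchmark:
    test: str                                   # scenario id, e.g. a JMeter plan or Locust file
    generator: LoadGenerator                    # tool that executes the scenario
    extract: Callable[[str], Dict[str, float]]  # parser for that tool's output

    def execute(self, rps: float, duration_s: int) -> Dict[str, float]:
        raw = self.generator.run(self.test, rps, duration_s)
        return self.extract(raw)


class FakeGenerator:
    """Stand-in generator that fabricates output instead of sending requests."""

    def run(self, scenario: str, rps: float, duration_s: int) -> str:
        return f"requests={rps * duration_s} errors=0"


def parse_key_values(raw: str) -> Dict[str, float]:
    """Extractor for the fake generator's 'key=value key=value' output."""
    return {key: float(value) for key, value in (pair.split("=") for pair in raw.split())}


bench = Benchmark("redfish_get_systems", FakeGenerator(), parse_key_values)
print(bench.execute(rps=5, duration_s=60))  # {'requests': 300.0, 'errors': 0.0}
```

Switching tools then means supplying a new `run` implementation and a matching extractor; the rest of the pipeline (queueing, reporting) only sees the dictionary of metrics.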
Together, these three practices let us create new benchmarks quickly, deliver test reports on time, and reuse any development experience. We call this approach Performance Testing as a Service.
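The automated search for boundary load values from the Automation bullet can be done with a plain binary search over the request rate. This is one possible technique, sketched under two assumptions: latency grows monotonically with load, and a single measurement is representative (in practice each call is a full test run and results are noisy, so repeats may be needed). `measure_p95_ms` is a hypothetical callback that would run a benchmark at a given RPS and return its 95th-percentile latency; the latency model below is synthetic.

```python
from typing import Callable


def find_max_rps(measure_p95_ms: Callable[[float], float], sla_p95_ms: float,
                 low: float, high: float, tolerance: float = 0.5) -> float:
    """Binary-search the highest request rate whose p95 latency meets the SLA.

    Assumes latency is non-decreasing in load, `low` meets the SLA, and `high`
    does not. Returns a rate within `tolerance` RPS below the boundary.
    """
    if measure_p95_ms(low) > sla_p95_ms:
        raise ValueError("the lower bound already violates the SLA")
    while high - low > tolerance:
        mid = (low + high) / 2
        if measure_p95_ms(mid) <= sla_p95_ms:
            low = mid    # SLA met: the boundary is at or above mid
        else:
            high = mid   # SLA violated: the boundary is below mid
    return low


def synthetic(rps: float) -> float:
    """Illustrative latency model only: 40 ms plus 5 ms per RPS."""
    return 40 + 5 * rps


print(find_max_rps(synthetic, sla_p95_ms=200, low=1, high=100))  # 31.9375 (true boundary: 32)
```

Each iteration halves the search interval, so the number of full test runs grows only logarithmically with the initial range divided by the tolerance.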
How strongly has Enterprise development influenced BMC testing?
Let's look at what else we have adopted from the Enterprise world and how it has affected the efficiency of our work:
Design and planning: the system architecture should account for the specifics of microservices and for scaling. I recommend C4 models, which let you keep the architecture in plain files, as "architecture as code". And don't forget everything Kubernetes (K8s) offers with its huge ecosystem of features and tools: it simplifies scaling, which can, for example, be triggered on demand when the queue of pending tests overflows.
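On-demand scaling when the test queue overflows comes down to deciding how many load-generator containers to run for the current queue length. The function below is a hedged sketch of one simple rule (one generator per N queued tests, clamped to a minimum and maximum), with hypothetical parameter values. In Kubernetes such a rule is usually delegated to an autoscaler fed with a queue-length metric rather than hand-written. The sketch also assumes each queued job targets its own server, so extra generators do not pile load onto the same BMC.

```python
import math


def desired_generators(queued_tests: int, tests_per_generator: int,
                       min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Number of load-generator containers for the current test queue."""
    if tests_per_generator <= 0:
        raise ValueError("tests_per_generator must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    wanted = math.ceil(queued_tests / tests_per_generator)  # generators needed to drain the queue
    return max(min_replicas, min(max_replicas, wanted))     # keep within the allowed range


for queued in (0, 7, 100):
    print(queued, desired_generators(queued, tests_per_generator=2))
# 0 1
# 7 4
# 100 10
```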
Generalize entities for reuse in other contexts: just as notification services have become a common building block across projects, we generalized tests into benchmarks. This let us standardize tests, make results more uniform and easier to understand, and quickly start testing not only BMCs but also storage systems.
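Making results uniform across BMC and storage benchmarks largely means agreeing on one result record that every extractor fills in. The record below is a hypothetical example of such a schema: the field names and the choice of median, p95 and error rate are assumptions. Percentiles are computed with the standard library's `statistics.quantiles`, and the record is serialized to JSON for a report.

```python
import json
import statistics
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BenchmarkResult:
    target: str        # kind of system under test, e.g. "bmc" or "storage"
    firmware: str      # firmware or product version that was tested
    requests: int      # total requests sent, successful and failed
    error_rate: float  # failed / total, between 0 and 1
    p50_ms: float      # median latency of successful requests
    p95_ms: float      # 95th-percentile latency of successful requests


def summarize(target: str, firmware: str, latencies_ms: List[float], failed: int) -> BenchmarkResult:
    """Reduce raw latencies and a failure count to the common result record."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two successful requests")
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")  # 99 cut points
    total = len(latencies_ms) + failed
    return BenchmarkResult(target, firmware, total, failed / total, cuts[49], cuts[94])


# Hypothetical run: 100 successful requests with latencies of 1..100 ms.
result = summarize("bmc", "fw-1.2.3", [float(ms) for ms in range(1, 101)], failed=0)
print(json.dumps(asdict(result)))
```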
Integration of microservices with each other and with external systems: we use REST and message queues (MQ) for communication between services. This let us offer several interfaces for launching load tests: any specialist can choose the Web UI, Jenkins, or the CLI, whichever is most convenient. Under the hood, all three launch methods send one and the same REST request!
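Having the Web UI, Jenkins and the CLI all send one and the same REST request can be sketched as a single shared function that builds that request, with each front end only collecting parameters. Everything service-specific here is hypothetical: the endpoint URL, the JSON fields and the CLI options are invented for illustration, and only the Python standard library (`urllib.request`, `argparse`, `json`) is used.

```python
import argparse
import json
import urllib.request
from typing import List, Optional

# Hypothetical endpoint of the performance-testing service.
API_URL = "http://perf-service.example/api/v1/runs"


def build_run_request(benchmark: str, target: str, firmware: str) -> urllib.request.Request:
    """The one REST call shared by every launcher (Web UI, Jenkins job, CLI)."""
    payload = {"benchmark": benchmark, "target": target, "firmware": firmware}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def main(argv: Optional[List[str]] = None) -> None:
    """CLI front end: parse arguments, then send (or just show) the shared request."""
    parser = argparse.ArgumentParser(description="Queue a performance benchmark")
    parser.add_argument("benchmark")
    parser.add_argument("--target", required=True, help="BMC address")
    parser.add_argument("--firmware", required=True)
    parser.add_argument("--dry-run", action="store_true", help="print the request instead of sending it")
    args = parser.parse_args(argv)

    request = build_run_request(args.benchmark, args.target, args.firmware)
    if args.dry_run:
        print(request.get_method(), request.full_url, request.data.decode("utf-8"))
        return
    with urllib.request.urlopen(request, timeout=10) as response:
        print(response.status, response.read().decode("utf-8"))
```

For example, `main(["redfish_smoke", "--target", "10.0.0.5", "--firmware", "2.1", "--dry-run"])` prints the request without sending it; a Jenkins step or the Web UI backend would call `build_run_request` with the same three values.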
UI and CLI interfaces: an intuitive UI and a command-line tool let testers work with the system faster. Make your colleagues' work easier by giving them as many options as possible, and you will end up with an excellent solution: specialists will understand the impact of patches even before they are merged into the project's main branches. Guided by these principles, we are building a shared service with a flexible, convenient interface for testing BMCs, and adapting it for other server products. And we are doing this with a small but very active team.
How did our approach help improve the BMC?
Success Story #1: improving the logging system
We found that the BMC logging subsystem may not keep up under high load, which led to some logs being lost. Using benchmarks, we identified the critical points in the logging subsystem and proposed improvements that made logging more reliable.
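One way a benchmark can detect lost log records is to tag every event it provokes with a sequence number and, after the run, check which numbers never reached the log. The sketch below is hypothetical: the `perf-seq=` tag and the plain-text log lines are assumptions. How the events are generated and the log read back (for example, through the BMC's Redfish log service) depends on the firmware under test.

```python
import re
from typing import Iterable, List

# Hypothetical tag the benchmark embeds in every event it triggers.
SEQ_TAG = re.compile(r"perf-seq=(\d+)")


def missing_sequence_numbers(log_lines: Iterable[str], expected: int) -> List[int]:
    """Sequence numbers 0..expected-1 that never appeared in the collected log."""
    seen = set()
    for line in log_lines:
        match = SEQ_TAG.search(line)
        if match:
            seen.add(int(match.group(1)))
    return [n for n in range(expected) if n not in seen]


def loss_ratio(log_lines: Iterable[str], expected: int) -> float:
    """Share of expected events that were lost, between 0 and 1."""
    return len(missing_sequence_numbers(log_lines, expected)) / expected


# Ten events were triggered, but records 3 and 7 never made it into the log.
collected = [f"event perf-seq={i}" for i in range(10) if i not in (3, 7)]
print(missing_sequence_numbers(collected, 10))  # [3, 7]
print(loss_ratio(collected, 10))                # 0.2
```

Repeating this check at increasing load levels shows the rate at which the loss ratio first rises above zero, which is one way to locate the critical point of a logging subsystem.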
Success Story #2: adapting the service for other teams
Thanks to the benchmarks and the unified interfaces, we built a shared service that can be used not only by testers but also by developers, analysts, and other teams for performance research. This significantly expanded our testing capabilities, made collaboration between departments easier, and reduced the time it takes to get test results for new firmware and product versions.
Conclusion
Load testing the BMC turned out to be a genuinely interesting task, one that required not only performance testing skills but also a deep understanding of how microcontrollers work. Our experience shows that combining classic methods with modern Enterprise practices gives excellent results and helps reveal all the nuances of BMC behavior. We were able to identify bottlenecks, improve performance, and build a universal product that different teams can use.
How is your load testing of BMCs or other microcontrollers going? Have you run into similar problems? Let's discuss! Share your thoughts, stories, and approaches in the comments; it's always interesting to learn how others tackle similar tasks.
By the way, if this topic interests you, come to the Highload++ conference on December 2-3, 2024, where we can discuss performance issues in person at the Aquarius PC booth! I will be happy to exchange experience and talk about new opportunities and tools. At the booth you can also chat with colleagues from other areas. Together we will play "AQ Memo" and assemble a server, and every day we will raffle off a super prize: the book "A Hundred Years of Incompleteness. Quantum Mechanics for Everyone in 25 Essays", signed by the author!
See you!