8 best tips for outsourcing data annotation
Any CV project starts with annotating large volumes of images and videos. Only high-quality, accurately annotated data ensures that the model can be trained correctly.
But what if the internal team cannot cope with the volumes, and it is difficult to find qualified specialists? The answer is simple: delegate the task to professionals.
Outsourcing data annotation helps speed up the process and take the project to a whole new level. However, finding a reliable partner who will become your strategic ally is not an easy task.
How do you choose a company and set up processes so that the cooperation is productive? In this article, our experts share Data Light's many years of experience in organizing annotation:
Aleksey Kornilov, Special Projects Group Manager
Dmitry Rogalsky, Moderation Group Manager
Why is annotation outsourcing required?
Saving time and resources
When a startup or a large company starts a CV project, 1,000-2,000 images may seem like a lot. In reality, the more complex the task, the more data is needed to train the model.
For example, a smart robot vacuum cleaner manufacturer wants the device to distinguish trash cans from sidewalks or cars. To do this, hundreds of thousands of objects need to be labeled in thousands of images. If you assign five in-house employees to the job, it will take months, and who will work on the product itself in the meantime?
Outsourcing will ensure not only speed (the company will immediately assemble a team of the required size), but also quality, because its employees are specialists with experience in such tasks. Many of these companies also carry out thorough validation, i.e., data verification.
We would like to share our experience, talk about the main secrets of annotation, and give tips that will help you build the outsourcing process and find the perfect partner for you.
Complex metrics for complex tasks
Data annotation for CV is not always just highlighting a car with a rectangle. For example, medical diagnostics tasks require complex, precise annotation.
Imagine: a company developing a tool for analyzing X-ray images needs to train a model to identify changes in lung tissue to detect early stages of cancer. Errors are unacceptable here. Specialists with medical education are needed, who will work according to strict instructions.
What does outsourcing have to do with it? Outsourcing companies can bring in experts from narrow fields and train them in annotation within a few weeks, while guaranteeing quality and control.
Seasonal or one-time projects
Sometimes a company needs a large amount of data, but only for a short period. For example, to test a hypothesis or launch a new product.
Imagine that a manufacturer of automotive cameras wants to test an algorithm that determines the condition of the road surface in winter. To do this, thousands of videos with frames of snowy roads, puddles, potholes, and ice need to be annotated.
After the experiment is completed, annotation may no longer be needed. Hiring employees for one project is impractical. Outsourcing allows you to solve the problem quickly, preserving the company's resources.
Quality control and standardization
Different projects require compliance with strict annotation standards. This is especially important for CV, where incorrect annotation can "teach" the model to make mistakes.
For example, a company developing a quality control system for production lines needs to train AI to find the smallest defects on metal parts. Defect annotation requires high accuracy and strict adherence to instructions.
Outsourcing companies working in this field have a multi-level quality control system and can ensure results regardless of the amount of data.
Resource saving
We all understand: having your own team is often expensive and time-consuming. You need not only to hire people but also to train them, organize the necessary infrastructure, and constantly monitor the process.
Imagine: a retail company wants to train a model to track customer behavior in stores: how they approach shelves, which products they take, where they linger. To do this, it is necessary to label videos with thousands of people, highlighting heads, hands, eye movements.
Outsourcing lets you complete this task within a fixed budget, without hidden costs for hiring and training and, most importantly, with predictable quality.
8 steps to successful outsourcing of labeling
Each company may have its own nuances in organizing data annotation. We want to share our experience and recommendations based on many years of practice.
Step 1: Determine your needs
Different projects require different types of labeling. Determine what data you need, what goals you want to achieve, what budget you have, and how you will evaluate the results. The more accurately you describe your requirements, the easier it will be to get the result you need.
At the initial stage, it is also important to understand what budget you are ready to allocate, as this helps to distribute tasks properly. Sometimes the client names an approximate amount in advance, and a solution can then be proposed within that budget.
But sometimes the budget is unknown. In that case we run a pilot project to determine the cost and metrics. For example, when testing image labeling, we annotate a few images, calculate the price per image, and offer it to the customer. The customer can then accept the cost or adjust the volume of the project.
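The arithmetic behind such a pilot estimate can be sketched in a few lines of Python. This is only an illustration: the function names and all the numbers are hypothetical, and a real estimate would also account for task complexity and validation costs.

```python
def estimate_project_cost(pilot_total_cost, pilot_images, project_images):
    """Extrapolate a per-image price from a pilot to the full project."""
    if pilot_images <= 0:
        raise ValueError("the pilot must contain at least one image")
    price_per_image = pilot_total_cost / pilot_images
    return price_per_image, price_per_image * project_images


def affordable_volume(budget, price_per_image):
    """Number of whole images that fit into a fixed budget."""
    return int(budget // price_per_image)


# Hypothetical pilot: 100 images annotated for a total cost of 150.
price, total = estimate_project_cost(
    pilot_total_cost=150.0, pilot_images=100, project_images=20_000
)
print(f"price per image: {price:.2f}, full project: {total:.2f}")
print("images within a budget of 12,000:", affordable_volume(12_000, price))
```

If the extrapolated total is too high, the second function shows how the customer might adjust the volume instead of the price.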
Sometimes quality requirements can also be matched to the budget. For example, one kind of labeling will be more accurate and more expensive, another less accurate but cheaper. Accuracy depends on the number of points placed on the image: when labeling a tree, for example, you can outline it with a single rough shape or draw a detailed contour around every leaf.
Pre-labeling can also sometimes be used to reduce costs. It speeds up the work and allows us to offer the client better prices, but it is not suitable for every project.
Step 2: Choose the right provider
This is probably the most important step for successful annotation. Make sure the company has the necessary experience and access to modern tools.
Ask yourself the following questions:
Do they have the necessary expertise? Experience in data annotation outsourcing can vary significantly, which matters most for highly specialized projects. Find out whether they have relevant experience, and give preference to companies with a proven reputation and expertise.
For example, if the project is related to medicine, it is better to choose a company with experience in this field. If there is no experience, you can find other specialists or subcontractors. In any case, experience in related fields will be a plus, although sometimes it is not critical.
Do they have the technical capabilities? Data annotation may require the use of various tools depending on the AI model you are working with.
Check what technologies the company has and whether they can offer suitable software solutions for task execution and rapid scaling.
The availability of suitable tools also matters. For example, if one contractor has a tool for high-precision work and the other does not, this can affect the price and quality.
But if the customer asks the team to use the customer's own tool, it is important that the team can adapt and work with any system.
Can they be trusted with confidential data? Your partner should not only help scale the project but also protect your data. A leak can cause serious harm to the company, so choose a provider who can reliably ensure the security of your information.
All our annotators sign an NDA to ensure data security. This is especially important when working with sensitive data such as medical images or corporate images. In one major project for an insurance company, the customer even insisted that we work in their system.
How do they control quality on the project? This is also a very important point, although all companies approach it differently.
For example, all our data goes through another mandatory stage: validation. Our approach includes selecting a certain number of representative images for verification, although sometimes the project requires checking all of them.
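As a rough sketch of how a verification sample might be drawn, the snippet below picks a random share of image IDs for re-checking. Simple random sampling, the 10% share, and the function name are assumptions made for illustration; the article does not describe how the representative images are actually selected.

```python
import random


def pick_validation_sample(image_ids, share=0.1, seed=42):
    """Return a random subset of image IDs to be re-checked by validators."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the range (0, 1]")
    if not image_ids:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample_size = max(1, round(len(image_ids) * share))
    return sorted(rng.sample(list(image_ids), sample_size))


batch = [f"img_{i:04d}" for i in range(1000)]  # hypothetical batch of IDs
sample = pick_validation_sample(batch, share=0.1)
print(len(sample), "images selected for validation")
```

Setting `share=1` covers the case where the project requires checking every image.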
During validation, we actively ask the teams clarifying questions and pass information about detected anomalies to the team leaders, along with statistics on the most productive and the weakest annotators. We pay special attention to the quality of the validators' own work: our training department is responsible for their training and professional development.
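Per-annotator statistics of this kind can be computed from validation results in a very simple way. The sketch below assumes a hypothetical input format, a list of (annotator, is_correct) pairs, and is not a description of our internal tooling.

```python
from collections import defaultdict


def annotator_error_rates(validation_results):
    """Return (annotator, error_rate) pairs, highest error rate first.

    validation_results: iterable of (annotator, is_correct) pairs,
    one pair per validated item.
    """
    checked = defaultdict(int)
    errors = defaultdict(int)
    for annotator, is_correct in validation_results:
        checked[annotator] += 1
        if not is_correct:
            errors[annotator] += 1
    rates = {name: errors[name] / checked[name] for name in checked}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical validation log for two annotators.
log = [("anna", True), ("anna", False), ("boris", True), ("boris", True)]
for name, rate in annotator_error_rates(log):
    print(f"{name}: {rate:.0%} of checked items had errors")
```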
It is essential to clarify such points before starting work because not all companies do this. Some of our clients have even noted that they chose us precisely because of the well-established quality control system.
And if you are interested in reading more about our QMS and adopting some approaches, you can check out this article.
Step 3: Start small: organize a pilot project
Outsourcing data annotation should always start with a small proof of concept (POC) or pilot project to test the capabilities, skills, tools, and team of the new provider.
The pilot is a key stage for assessing the quality and cost of annotation. We always strive to make the pilot as close to the production project as possible, to give the customer a realistic picture and avoid misunderstandings. It is important that the pilot data is similar to the data that will be used in the actual work.
Step 4: Carefully monitor progress
Annotation projects can have tight deadlines and often process large volumes of data daily. Monitoring progress is a very important way to ensure timely delivery of annotated data sets with the required level of accuracy and the highest possible quality.
Otherwise, you risk receiving the data months after the computer vision model was supposed to get it. Once you have the initial batch of training data, it is easier to assess the accuracy of the supplier's work.
Step 5: Monitor accuracy, do benchmarking
When the first set of images or videos is fed to the computer vision or ML/AI model, accuracy may be 70%. The model learns from the data sets it receives, so improving accuracy is critically important. To improve project results, computer vision models need larger data sets with higher accuracy, and it all starts with improving the quality of training data.
To do this, you can monitor and benchmark accuracy on open-source data sets and on image data that your company has already used in machine learning models. Benchmark data sets and algorithms, such as COCO and many others, are equally useful and effective.
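One common way to compare a supplier's bounding boxes with a trusted reference set is intersection over union (IoU). The article does not name a specific metric, so treat the sketch below as an illustrative assumption. The corner-style box format (x_min, y_min, x_max, y_max) and the 0.5 threshold are also assumptions; COCO annotation files, for instance, store boxes as (x, y, width, height), which would need converting first.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def share_above_threshold(pairs, threshold=0.5):
    """Fraction of (annotated, reference) box pairs with IoU >= threshold."""
    if not pairs:
        return 0.0
    hits = sum(1 for annotated, reference in pairs if iou(annotated, reference) >= threshold)
    return hits / len(pairs)


# Hypothetical pairs: one exact match, one box shifted by half its width.
pairs = [
    ((0, 0, 10, 10), (0, 0, 10, 10)),
    ((0, 0, 10, 10), (5, 0, 15, 10)),
]
print("share of boxes meeting the threshold:", share_above_threshold(pairs))
```

Tracking this share from batch to batch gives a simple number to discuss with the supplier.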
Step 6: Minimize errors and inaccuracies
Errors and inaccuracies mean lost time and money. Data annotation service providers should have a responsive workflow that allows for quick adjustments and re-annotation of data if necessary.
Step 7: Control costs
Costs should be carefully controlled, especially if re-annotation is required. The project manager must ensure that costs are in line with the project's expected expenses within an acceptable margin of error.
Any annotation project budget should include funds for unforeseen expenses. However, this should not get out of control, especially if time and cost overruns are the fault of an external annotation service provider. Discuss these issues before signing the contract, and review key performance indicators (KPIs) and service level agreements (SLAs).
Check the supplier's performance against deadlines, QA parameters, KPIs, and SLAs to avoid cost overruns on the project.
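A check of actual spending against the plan can be as simple as the sketch below. The 10% tolerance and all figures are hypothetical; the acceptable margin is whatever you agree on in the contract.

```python
def cost_status(planned, actual, tolerance=0.10):
    """Return (relative deviation from plan, whether it is within tolerance)."""
    if planned <= 0:
        raise ValueError("planned cost must be positive")
    deviation = (actual - planned) / planned
    return deviation, deviation <= tolerance


# Hypothetical project: 10,000 planned, 10,800 spent after re-annotation.
deviation, within_margin = cost_status(planned=10_000, actual=10_800)
print(f"deviation from plan: {deviation:+.1%}, within margin: {within_margin}")
```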
Step 8: Use tracking tools
Tracking metrics is one of the most important parts of the annotation process. With the right metric-tracking tools and a dashboard, you can build annotation workflows that ensure high-quality results.
A well-defined annotation structure reduces ambiguity and doubt among annotators. You are more likely to get high-quality results if annotation teams use appropriate tools to automate the annotation of image and video data.
Ultimately, it is important to understand that successful data annotation outsourcing is not just a matter of saving time and money. It is an opportunity to attract experienced experts who can ensure high quality and accuracy at every stage of the work.
How do you usually address data annotation issues in your projects? Have you encountered problems in the outsourcing process or have you already found the optimal solution? Share your experience in the comments!