Processing a Catalog and Products with LLMs
There is a classic problem on any marketplace related to how to sort and analyze the catalog. It is especially exacerbated by the fact that users or employees create convoluted descriptions even for the simplest products. For example, a regular blue t-shirt can be described as "sky blue," "royal blue," or even "dark blue-aquamarine." Some sellers and suppliers manage to shove product information directly into the image itself, drawing over a poorly lit photo in bright green letters: "The best t-shirt in the world!" As a result, two identical products can appear as if they are from different universes.
Consequently, searching, categorizing, and analyzing these products becomes a real headache (and guarantees job security for data specialists). Moreover, under such conditions it is challenging to match products with current trends, since identifying a trend is also, in essence, a categorization task.
Of course, over the years, various ways have emerged to deal with this:
1. Category-specific matcher. This is a separate "matching" model or algorithm for each product category: for electronics, for clothing, for cosmetics, and so on, down to subcategories. This approach is highly specialized and works, but it can turn into quite a hassle if you have a hundred thousand categories. There are many ways to implement it, usually depending on the attribute in question: for example, Named Entity Recognition for brand names, decision trees, or even a huge if-then-else script.
2. Candidate searching using embeddings. Embeddings are vector representations of data (for example, descriptions or product names) used to determine their similarity. By using text or image processing methods (like word2vec or sentence-transformers), similar products can be found based on the proximity of the resulting embeddings. Embeddings can also be generated with LLMs (a minimal sketch appears after this list).
3. Attribute extraction for each product. Product information (such as brand, model, color, size, etc.) is extracted from descriptions, for example, through regex, to analyze and match products at a deeper level.
4. Gradient boosting. Gradient boosting algorithms (CatBoost, etc.) are applied to classification tasks, determining whether products are similar. These models take into account both textual and numerical attributes, but they require pre-labeled training data.
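A minimal Python sketch of approach 2, candidate search with embeddings, assuming the sentence-transformers package and an illustrative model name:

from sentence_transformers import SentenceTransformer, util

# Example model; swap in one trained or fine-tuned on your own catalog
model = SentenceTransformer("all-MiniLM-L6-v2")

titles = [
    "Eco T-shirt made of hemp",
    "100% eco-friendly plant-based top",
    "Stainless steel chest freezer, 7 cu ft",
]

# Normalized embeddings make cosine similarity a simple dot product
embeddings = model.encode(titles, convert_to_tensor=True, normalize_embeddings=True)

# Pairwise cosine similarity; high scores are match candidates for review
scores = util.cos_sim(embeddings, embeddings)
print(scores[0][1])  # similarity between the two "eco" shirts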
All of this really helps up to a certain point, but people are remarkably inventive in describing things. Feature hell is a reality where there are 400 ways to say "comfortable," and visually identical clothing can be named in completely different ways ("Eco T-shirt made of hemp" vs. "100% eco-friendly plant-based top").
For clothing and similar products, identical items may be described in completely different ways, so matching can only be done from photographs, considering color, shape, and fabric structure; the key information is simply not in the product description.
The new approach is to use the multimodal capabilities of large language models (LLMs) and Vision-Language Models (VLMs) to tackle this problem. This approach is already widely used by leading retailers and platforms:
"We've used multiple large language models to accurately create or improve over 850 million pieces of data in a catalog.
Without the use of generative AI, this work would have required nearly 100 times the current headcount to complete in the same amount of time."
Doug McMillon, CEO of Walmart
The overall workflow (examples below):
Extraction of attributes for each product using LLM or VLM
LLMs are used to extract specific attributes (e.g., brand, color, size, material) from product descriptions and images.
For example, “Stylish bright red cotton t-shirt for men” is broken down into: color=red, material=cotton, target demographic=men, etc. Some attributes may be synthetic, i.e., not directly present in the product description, such as the style of the clothing item or its parts (e.g., the style of the neckline on a sweater). VLMs can determine style with good accuracy.
VLMs can also determine, albeit approximately, the condition of the product if it has been used, based on photos.
Grouping attributes into categories with unique values
There may be too many attributes; moreover, trends usually consist of a specific set of attributes, so grouping them may make sense.
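A minimal Python sketch of such grouping, with a hand-written mapping for illustration; in practice the mapping can be produced by an LLM or by clustering value embeddings:

# Collapse raw attribute values into canonical groups
CANONICAL_COLORS = {
    "sky blue": "blue",
    "royal blue": "blue",
    "dark blue-aquamarine": "blue",
    "pearl white": "white",
}

def normalize_color(raw_value: str) -> str:
    value = raw_value.strip().lower()
    return CANONICAL_COLORS.get(value, value)  # fall back to the raw value

print(normalize_color("Sky Blue"))  # -> "blue"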
Matching products based on extracted attributes using LLM
Once the attributes are extracted, LLMs use them to compare two products and determine if they match.
Models can consider both explicit similarities (e.g., the same brand and size) and implicit ones (e.g., “eco-friendly” and “sustainable”).
For example, matching “Men's Nike Air Max sneakers” with “Nike Air Max shoes for men” by recognizing that they are the same product.
Matching based on images using VLM (Vision-Language Models)
Some attributes, such as color, design, or unique patterns, are better analyzed visually rather than textually. VLMs combine visual and textual data for more accurate product matching. These models analyze product images along with their descriptions to better understand the product. For example: matching an image of a black leather bag with another similar bag by identifying visual characteristics (shape, texture, etc.) and combining them with textual data.
Advantages of LLMs
More accurate matching. Fewer matching errors (for example, when you receive a "turquoise" t-shirt instead of a "dark green" one).
Accuracy in the range of 90-99%, especially with fine-tuned models (both precision and recall).
Handling diverse data. Text, images, random emojis — anything.
Improved understanding of product descriptions and visual characteristics.
No specialized training required: no need to create and maintain 50 different specialized matching algorithms.
Disadvantages
Requires much more computational power and is more expensive.
Latency is not great. Works well for batch processing, but not for real-time.
Overall, for some categories it makes no sense to use an LLM if the job can be done without one. If a category is simple and standard algorithms work well, they will be much cheaper. LLMs are suitable for medium and complex categories. In other words, it is best to use LLMs in combination with standard methods to optimize cost, speed, and quality.
Attribute extraction using LLM
Example prompt:
I have a product card from the category "Refrigerators".
I need to extract and format attributes from it.
extracted_attributes = {
"model": "Refrigerator model. Include the brand or company name, but exclude color and size",
"capacity": "Total volume of the refrigerator, usually measured in liters (L). Look for terms like 'Total capacity'. If not available, set the value to null",
"energy_efficiency": "Extract the energy efficiency class, e.g., 'A++', 'A+' or 'B'. Look for terms like 'Energy efficiency class'. If not available, set the value to null",
...
"dimensions": "Include height, width, and depth (e.g., '180x70x60'). Look for terms like 'Dimensions'. If not available, set the value to null"
}
Result:
extracted_attributes = {
"model": "Upstreman UF70",
"capacity": "7.0 cubic feet",
"energy_efficiency": "ENERGY STAR certified",
"number_of_doors": "1",
"freezer_position": "Convertible",
"defrost_system": "Low frost design",
"dimensions": "56.1x21.3x22.76 inches",
}
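A minimal Python sketch of sending such an extraction prompt to an LLM; the openai client and model name here are illustrative, and any chat-style model that returns JSON will do:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """I have a product card from the category "Refrigerators".
Extract the attributes: model, capacity, energy_efficiency, dimensions.
If an attribute is missing, set it to null. Answer with JSON only.

Product card:
{card}"""

def extract_attributes(card_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model
        messages=[{"role": "user", "content": PROMPT.format(card=card_text)}],
        response_format={"type": "json_object"},  # request strict JSON output
    )
    return json.loads(response.choices[0].message.content)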
Matching using LLM
Example color comparison
Prompt:
There are two product listings from the "Freezers" category on the marketplace. Compare the attribute "color". The answer should include:
is_match: A boolean value indicating whether the values match.
reason: A detailed explanation of the comparison result. The response format is JSON.
Product Listing 1:
Model: Upstreman UF70
Color: White

Product Listing 2:
Model: Midea WHS-109FSS1
Color: Stainless Steel
Result:
{
"is_match": false,
"reason": "The color attributes do not match. Product 1 is 'White' while Product 2 is 'Stainless Steel'."
}
Matching multiple attributes
Prompt:
There are two product listings from the "Freezers" category on the marketplace. Compare their attributes to determine if the two products are similar enough to be considered the same model. The attributes for comparison are:
Model name: Remove spaces for comparison.
Capacity: Matches if the difference is less than or equal to 0.2 cubic feet.
Energy efficiency: Matches if both are ENERGY STAR certified or have identical efficiency ratings.
Dimensions: Match if the height, width, and depth differences are no more than 1 inch.
Color: Matches only if the color is identical.
The answer should include:
is_match: A boolean value indicating whether the two products are considered the same model.
reason: A detailed explanation of the comparison result. The response format is JSON.
Product Card 1:
Model: Upstreman UF70
Capacity: 7.0 cubic feet
Energy Efficiency: ENERGY STAR Certified
Dimensions: 56.1x21.3x22.8 inches (H x W x D)
Color: Pearl White

Product Card 2:
Model: Upstreman UF70
Capacity: 6.9 cubic feet
Energy Efficiency: ENERGY STAR Certified
Dimensions: 56x21.2x22.9 inches (H x W x D)
Color: White
Output:
{
"is_match": true,
"reason": "Two freezers are considered the same model based on the following comparisons:
1. The model name matches after normalization ('UpstremanUF70').
2. The capacities are similar (7.0 vs 6.9 cubic feet, a difference of 0.1 cubic feet, which is within the acceptable range of 0.2 cubic feet).
3. Both have ENERGY STAR certification, indicating identical energy efficiency.
4. The dimensions are almost identical (difference: height 0.1 inches, width 0.1 inches, depth 0.1 inches, all within an acceptable margin of error of 1 inch).
5. Both products have similar colors: 'White' and 'Pearl White'."
}
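The numeric tolerances from the prompt above (capacity within 0.2 cubic feet, dimensions within 1 inch) can also be cross-checked deterministically once the attributes are extracted. A minimal Python sketch, assuming the attributes are already parsed; color is left to the LLM, since fuzzy cases like 'Pearl White' vs 'White' need judgment:

def dims_match(d1: str, d2: str, tol_inches: float = 1.0) -> bool:
    # Dimensions are strings like "56.1x21.3x22.8"
    a = [float(x) for x in d1.split("x")]
    b = [float(x) for x in d2.split("x")]
    return all(abs(x - y) <= tol_inches for x, y in zip(a, b))

def freezers_match(p1: dict, p2: dict) -> bool:
    return (
        p1["model"].replace(" ", "") == p2["model"].replace(" ", "")
        and abs(p1["capacity"] - p2["capacity"]) <= 0.2
        and p1["energy_efficiency"] == p2["energy_efficiency"]
        and dims_match(p1["dimensions"], p2["dimensions"])
    )

card_1 = {"model": "Upstreman UF70", "capacity": 7.0,
          "energy_efficiency": "ENERGY STAR", "dimensions": "56.1x21.3x22.8"}
card_2 = {"model": "Upstreman UF70", "capacity": 6.9,
          "energy_efficiency": "ENERGY STAR", "dimensions": "56x21.2x22.9"}
print(freezers_match(card_1, card_2))  # -> True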
Working with Photos and Images
Extracting Attributes
Suppose we have an image like this. We will use the same request as for text extraction:
There is a product card from the category "Refrigerators" on a marketplace.
We need to extract and format attributes from it.
key_attributes_list = [
# ...
{
"name": "dimensions",
"attribute_comment": "Include height, width, and depth in centimeters (e.g., '180x70x60'). Look for terms like 'Dimensions'. If not available, set the value to null."
}
]
Result:
{
"dimensions": "56.10x21.30x22.76 inches"
}
Another one:
Output:
{
"category": "Headphones",
"brand": "KVIDIO",
"color": "Black",
"features": [
"Full-sized cup design",
"Wireless",
"Bluetooth connectivity",
"Soft ear cushions"
]
}
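A minimal Python sketch of how such an image could be sent to a multimodal model for attribute extraction; the openai client, model name, and image URL are illustrative:

import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # example multimodal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract category, brand, color, and key features from this "
                     "product photo. Answer with JSON only."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/product-photo.jpg"}},
        ],
    }],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))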
Matching Clothing Items
Prompt:
As a fashion expert, you should compare two photos of women's tops. The analysis should focus solely on the tops themselves, ignoring:
Any other visible clothing items, even if they are part of the outfit or match the style.
Differences in poses, body movements, or how the tops are worn.
Goal: to determine whether the tops are identical or completely different. Provide a clear answer, "Identical" or "Different," accompanied by a brief explanation. The answer should be in JSON format.
Result:
The LLM was able to detect a slight difference in the neckline, which, being a man, I am still not sure I can see.
{
"result": "Different",
"reason": "The tops have different necklines: the first has a round neckline, while the second has a boat neckline."
}
Here’s another example:
{
"result": "Identical",
"reasoning": "Both tops have the same color, design, and fabric characteristics, including long sleeves, a fitted cut, and a light aqua shade."
}
Perfecto!
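For completeness, a minimal Python sketch of how such a two-photo comparison might be sent to a VLM; the file names, client, and model name are illustrative:

import base64, json
from openai import OpenAI

client = OpenAI()

def as_data_url(path: str) -> str:
    # Encode a local photo as a data URL the API can accept
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # example multimodal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "As a fashion expert, compare only the tops in these two photos. "
                     "Answer in JSON with fields 'result' ('Identical' or 'Different') "
                     "and 'reason'."},
            {"type": "image_url", "image_url": {"url": as_data_url("top_1.jpg")}},
            {"type": "image_url", "image_url": {"url": as_data_url("top_2.jpg")}},
        ],
    }],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))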
Models and Throughput
Throughput is extremely important for a marketplace, as thousands of products are processed daily and new ones keep arriving. Throughput depends on two factors:
Your hardware
The size and type of the model
If you are using a 70B model, such as Llama or Qwen, it will work well but slowly; without a supercomputer, do not expect high throughput. There are several ways to speed things up:
Using a smaller model trained on your dataset. The size of the model depends on the task, and you will likely need to train several smaller models, as each will only be able to handle a limited number of categories. Overall, you will have to experiment. General observations are as follows:
Models with 7-12b parameters are suitable for extracting attributes from text.
Models smaller than 7b may be suitable for a limited set of finely-tuned attributes, and such a model can possibly be obtained through distillation of a larger model.
This can increase throughput by 10–20 times. However, it should be noted that smaller models struggle to extract many attributes at once and may have difficulty handling complex queries, so you will need to test them on your task.
Quantization. Quantization can increase the number of requests per second by 20–50% without significantly reducing output quality (a loading sketch follows this list).
Scenarios with large volumes of data. In such cases, it makes no sense to use anything other than a self-hosted model, as the costs of using OpenAI or Anthropic would be too high. The commercial models are, however, suitable for prototyping and testing ideas or for handling very complex cases: they parse truly complex descriptions at a level comparable to humans.
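A minimal Python sketch of loading a smaller model with 4-bit quantization, assuming the transformers and bitsandbytes packages; the model name is only an example:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen2.5-7B-Instruct"  # example 7B model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs
)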
For self-hosting, I would recommend the latest versions of the Llama or Qwen models. Start with 70B for testing, then move down to smaller models for as long as the quality remains acceptable.
You will likely need to further fine-tune the model for specific categories. For example, abbreviations are common in the medical field, as they are in construction. A universal model may struggle with such cases, so it is worth using the LoRA (Low-Rank Adaptation) method here.
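A minimal Python sketch of attaching LoRA adapters with the peft library; the base model and target modules are illustrative, and the training loop itself is omitted:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # example base model

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a typical choice
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights will be trained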
In addition, consider the language. For example, Llama works well with English, but with Chinese it is likely to hallucinate; in that case, Qwen is probably the better choice. In Russian, word endings, names, and their declensions often cause failures, so pay attention to products where a name or title is a key attribute: movies, music, books.
Pricing
Extracting a price from a description with an LLM is possible, but can LLMs help with price analysis and pricing? I think not yet, and the problem lies in working with numbers. The latest models from OpenAI can calculate fairly well, but it would be too expensive for analyzing large volumes of data.
For narrow niches with specific pricing, it is possible to build such a system on agents. Most platforms and large retailers are what is called "market-driven," or simply put, they orient themselves to competitors' prices. Therefore, either custom algorithms or pricing management systems, similar to those we helped create for the Keeprise team, are used there.
Leave your questions in the comments.
Wishing everyone well!