Zero-Shot Learning: Recognizing new classes without training data
Zero-Shot Learning (ZSL) enables AI models to make predictions for categories they have never encountered during training. This fundamentally distinguishes it from classical ML methods, which require extensive labeled datasets for each target category. ZSL becomes particularly relevant where requirements change rapidly or a complete list of classes is difficult to define in advance.
What is Zero-Shot Learning?
Zero-Shot Learning is a machine learning method where a model recognizes "unseen" classes or concepts. Instead of requiring specific training data for each new category, the model leverages existing general knowledge and transfers it to new tasks.
The core of this approach lies in the use of embeddings: concepts are represented as numerical vectors in a semantic space, which makes relationships between terms computationally accessible to the model. A model that knows "apple" and "pear" as fruits can, on this basis, also classify "orange" as a fruit, without having been explicitly trained on this category.
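The fruit example can be sketched in a few lines. This is a minimal illustration, not a real model: the three-dimensional vectors and the names `EMBEDDINGS`, `CATEGORY_OF`, and `classify_unseen` are hypothetical and hand-picked so that fruits lie close together. In practice the vectors would come from a pre-trained encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, chosen by hand for illustration only.
EMBEDDINGS = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "hammer": [0.0, 0.1, 0.9],
    "orange": [0.85, 0.15, 0.05],  # never assigned a category below
}

# Categories are known only for the "seen" concepts.
CATEGORY_OF = {"apple": "fruit", "pear": "fruit", "hammer": "tool"}

def classify_unseen(word):
    """Assign the category of the most similar seen concept."""
    query = EMBEDDINGS[word]
    nearest = max(
        CATEGORY_OF,
        key=lambda known: cosine_similarity(query, EMBEDDINGS[known]),
    )
    return CATEGORY_OF[nearest]

print(classify_unseen("orange"))
```

Because "orange" points in almost the same direction as "apple" and "pear", it inherits their category even though no label for it was ever provided.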
How does Zero-Shot Learning work?
ZSL builds upon pre-trained models and transfer learning. A model is first pre-trained on large datasets and then uses this prior knowledge to handle new tasks without task-specific fine-tuning. Similarities between concepts in the semantic space are used to establish connections to new, unknown classes.
In a multimodal context, this works as follows: a recognition model is coupled with a text encoder. The image is represented by visual features, and the target classes by semantic vectors derived from text descriptions. The class whose semantic vector best matches the image features is output as the zero-shot prediction. In the Ultralytics ecosystem, YOLO-World implements precisely this principle: users define classes at runtime via text prompts, without needing to retrain the model.
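As a rough sketch of what defining classes at runtime can look like, the following wraps the `YOLOWorld` class from the `ultralytics` package. The helper name `detect_with_text_prompts` is hypothetical, and the default weights file `yolov8s-world.pt` is an assumption about which pre-trained checkpoint is used; any YOLO-World checkpoint supported by the installed package version could be substituted.

```python
def detect_with_text_prompts(image_path, class_names, weights="yolov8s-world.pt"):
    """Run open-vocabulary detection with classes supplied at call time."""
    # Imported lazily so this module loads even where ultralytics is absent.
    from ultralytics import YOLOWorld

    model = YOLOWorld(weights)        # load a pre-trained open-vocabulary detector
    model.set_classes(class_names)    # target classes given as text, no retraining
    return model.predict(image_path)  # list of Results objects, one per image
```

A call such as `detect_with_text_prompts("path/to/image.jpg", ["lion", "tiger"])` would then return detections for the two prompted classes; the image path here is a placeholder.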
Advantages of Zero-Shot Learning
- No additional training effort: New data does not need to be collected, nor do new models need to be trained.
- Faster deployment readiness: ZSL models can be deployed immediately when requirements change.
- Flexibility with open-ended class lists: Particularly suitable when target categories change dynamically or cannot be fully defined in advance.
Practical Examples and Use Cases
Natural Language Processing (NLP): ZSL is used for automatic translations into new languages or for answering questions on topics that the model has not been specifically trained on.
Image Recognition: A model that can distinguish between dogs and cats can also recognize lions or tigers by exploiting semantic similarities, without training images of these animals.
Medicine: For rare diseases, specialized training data is often lacking. ZSL uses descriptions of more common diseases and expert knowledge of rare symptoms to identify relevant patterns.
Conservation and Agriculture: Endangered species can be identified based on attribute-based descriptions, without requiring separate image datasets for each species.
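Attribute-based identification can be illustrated with a small sketch, loosely in the spirit of direct attribute prediction. Everything here is assumed for illustration: the attribute set, the class signatures in `CLASS_ATTRIBUTES`, and the idea that some upstream model has already produced a probability for each attribute from the image. Attributes are treated as independent, which is a simplification.

```python
# Hypothetical attribute signatures: (has_stripes, has_mane, lives_in_water).
# No images of these classes are needed, only their descriptions.
CLASS_ATTRIBUTES = {
    "tiger": (1, 0, 0),
    "lion": (0, 1, 0),
    "otter": (0, 0, 1),
}

def attribute_match_score(predicted_probs, signature):
    """Probability of a signature given per-attribute probabilities,
    assuming the attributes are independent."""
    score = 1.0
    for prob, present in zip(predicted_probs, signature):
        score *= prob if present else (1.0 - prob)
    return score

def classify_by_attributes(predicted_probs, class_attributes=CLASS_ATTRIBUTES):
    """Return the class whose attribute description fits best."""
    return max(
        class_attributes,
        key=lambda name: attribute_match_score(predicted_probs, class_attributes[name]),
    )

# Assumed output of an attribute predictor for one image:
# very likely striped, probably no mane, probably not aquatic.
print(classify_by_attributes((0.9, 0.2, 0.1)))
```

Adding a new species then only requires a new entry in the signature table, not a new image dataset.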
Distinction from Related Concepts
Few-Shot Learning (FSL) typically uses 1–5 training examples of the target class. Zero-Shot Learning, by contrast, operates entirely without examples of the target class. One-Shot Learning is a special case of FSL in which exactly one example is used. Transfer Learning is the overarching term; ZSL is considered a specialized form of it, in which the transfer to unseen classes occurs via semantic attributes and embeddings.
Conclusion
Zero-Shot Learning reduces the reliance on extensive, labeled datasets and enables faster adaptation to new requirements. However, accuracy varies depending on the problem – especially compared to models that have been specifically trained on the respective target classes. For scenarios with dynamic or difficult-to-predict class lists, ZSL remains a practically relevant approach.