Detect objects in images without training!
Welcome to the CLIP Zero-Shot Object Detection project! This repository demonstrates how to perform zero-shot object detection by integrating OpenAI's CLIP (Contrastive Language-Image Pretraining) model with a Faster R-CNN for region proposal generation.
| Source Code | Website |
|---|---|
| [github.com/deepmancer/clip-object-detection](https://github.com/deepmancer/clip-object-detection) | [deepmancer.github.io/clip-object-detection](https://deepmancer.github.io/clip-object-detection) |
Set up and run the pipeline in three simple steps:
1. Clone the Repository:

   ```bash
   git clone https://github.com/deepmancer/clip-object-detection.git
   cd clip-object-detection
   ```

2. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the Notebook:

   ```bash
   jupyter notebook clip_object_detection.ipynb
   ```
CLIP (Contrastive Language-Image Pretraining) is trained on 400 million image-text pairs. It embeds images and text into a shared space where the cosine similarity between embeddings reflects their semantic relationship.
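As a quick illustration, the snippet below is a minimal sketch of that idea: it scores an image against a few text prompts via cosine similarity in the shared embedding space. It assumes OpenAI's `clip` package, PyTorch, and Pillow are installed; the image path and prompts are placeholders.

```python
# Minimal sketch: score an image against text prompts with CLIP.
# Assumes OpenAI's `clip` package, PyTorch, and Pillow are installed;
# "example.jpg" and the prompts are placeholders.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)

# Normalize so the dot product equals cosine similarity.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)

similarity = image_features @ text_features.T
print(similarity)  # higher value = closer semantic match to each prompt
```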
Our approach combines CLIP and Faster R-CNN for zero-shot object detection:
Regions proposed by Faster R-CNN's RPN:
Objects detected by CLIP based on textual queries:
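The sketch below outlines this two-stage pipeline under some simplifying assumptions: it loads a pretrained torchvision Faster R-CNN and treats its predicted boxes as region proposals (the notebook works with the RPN's proposals directly), then has CLIP score each cropped region against free-form text queries. The image path, queries, and proposal count are placeholders, not values from this repository.

```python
# Hedged sketch of the zero-shot detection pipeline: region proposals from a
# pretrained Faster R-CNN, then CLIP scoring of each crop against text queries.
# Simplification: the detector's predicted boxes stand in for raw RPN proposals.
# "example.jpg", the queries, and the proposal count are placeholders.
import clip
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector = detector.to(device).eval()

image = Image.open("example.jpg").convert("RGB")

with torch.no_grad():
    # Candidate regions as (x1, y1, x2, y2) boxes in image coordinates.
    proposals = detector([to_tensor(image).to(device)])[0]["boxes"]

    queries = ["a photo of a dog", "a photo of a bicycle"]
    text_features = clip_model.encode_text(clip.tokenize(queries).to(device))
    text_features /= text_features.norm(dim=-1, keepdim=True)

    for box in proposals[:20]:  # score the top proposals only
        x1, y1, x2, y2 = [int(v) for v in box.tolist()]
        crop = preprocess(image.crop((x1, y1, x2, y2))).unsqueeze(0).to(device)
        crop_features = clip_model.encode_image(crop)
        crop_features /= crop_features.norm(dim=-1, keepdim=True)
        scores = (crop_features @ text_features.T).squeeze(0)
        best = scores.argmax().item()
        print(f"box=({x1}, {y1}, {x2}, {y2}) -> {queries[best]} ({scores[best].item():.2f})")
```

Because the text queries are free-form, new object categories can be detected simply by changing the prompts; regions whose best similarity falls below a chosen threshold can be discarded, all without any retraining.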
Ensure Python 3 and Jupyter are installed, along with the packages listed in `requirements.txt`.
This project is licensed under the MIT License. Feel free to use, modify, and distribute the code.
If this project inspires or assists your work, please consider giving it a ⭐ on GitHub! Your support motivates us to continue improving and expanding this repository.