ImageNet: A Large-Scale Hierarchical Image Database
Summary
The paper introduces ImageNet, a large-scale hierarchical image database built on the WordNet structure and intended as a comprehensive resource for developing image search and image understanding algorithms. ImageNet aims to populate the majority of WordNet's roughly 80,000 noun synsets with an average of 500-1000 clean, full-resolution images each, which would yield tens of millions of annotated images organized by the WordNet semantic hierarchy.
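The hierarchical organization described above can be pictured as a tree of synsets, each holding its own images. The following is only a minimal sketch: the `Synset` class, its fields, and the file names are hypothetical illustrations, not the database's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Synset:
    """One node in a WordNet-style hierarchy (hypothetical structure)."""
    name: str
    # Image IDs attached directly to this synset (placeholder values).
    images: List[str] = field(default_factory=list)
    # More specific synsets below this one.
    children: List["Synset"] = field(default_factory=list)

    def walk(self) -> Iterator["Synset"]:
        """Yield this synset and every synset below it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def subtree_image_count(self) -> int:
        """Count images on this synset plus all of its descendants."""
        return sum(len(node.images) for node in self.walk())


# Toy hierarchy: mammal -> {dog -> husky, cat}
husky = Synset("husky", images=["h1.jpg", "h2.jpg"])
dog = Synset("dog", images=["d1.jpg"], children=[husky])
mammal = Synset("mammal", children=[dog, Synset("cat", images=["c1.jpg"])])

print(mammal.subtree_image_count())  # 4
```

Storing images at each node and aggregating over the subtree lets a query for a general concept such as "mammal" draw on the images of all its more specific descendants.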
ImageNet is built in two stages: candidate images are first collected from the Internet using search engines, and are then verified by human annotators through Amazon Mechanical Turk to ensure accuracy. At the time of publication, the database included 12 subtrees with 5247 synsets and 3.2 million images, and the paper reports greater scale, diversity, and accuracy than existing image datasets.
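The verification stage can be sketched as a vote-aggregation step over candidate images. This summary does not say how annotator votes are combined, so the fixed agreement threshold, the minimum vote count, and the function name `verify_candidates` below are all assumptions made for illustration.

```python
from typing import Dict, List


def verify_candidates(
    votes: Dict[str, List[bool]],
    min_votes: int = 3,          # assumed minimum number of annotators per image
    min_agreement: float = 0.7,  # assumed fraction of "yes" votes required
) -> List[str]:
    """Return IDs of candidate images that annotators agree show the target synset.

    `votes` maps an image ID to the yes/no answers collected for it.
    """
    accepted = []
    for image_id, labels in votes.items():
        if len(labels) < min_votes:
            continue  # not enough judgments yet; leave the image unverified
        if sum(labels) / len(labels) >= min_agreement:
            accepted.append(image_id)
    return accepted


# Hypothetical votes for three candidate images of one synset.
example_votes = {
    "img_a": [True, True, True],
    "img_b": [True, False, False],
    "img_c": [True, True],  # too few votes
}
print(verify_candidates(example_votes))  # ['img_a']
```

A single global threshold is the simplest possible rule; a real pipeline might reasonably need different levels of agreement for synsets that are easy or hard to tell apart.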
The paper highlights ImageNet's potential applications in object recognition, image classification, and automatic object clustering, showcasing its utility as a resource for visual recognition tasks. Experiments demonstrate that ImageNet's clean, high-resolution images improve recognition accuracy, and its hierarchical structure enhances classification performance.
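One simple way a hierarchy can help classification is to let a general synset borrow evidence from its more specific descendants. The sketch below scores a synset by the maximum classifier response anywhere in its subtree. This summary does not detail the paper's experimental method, so the max-over-subtree rule, the function name, and the toy scores are illustrative assumptions.

```python
from typing import Dict, List


def subtree_max_score(
    node: str,
    children: Dict[str, List[str]],
    scores: Dict[str, float],
) -> float:
    """Score `node` as the best per-synset classifier score in its subtree.

    `children` maps a synset name to its direct children; `scores` holds the
    independent classifier response for each synset on one image.
    """
    best = scores.get(node, float("-inf"))
    for child in children.get(node, []):
        best = max(best, subtree_max_score(child, children, scores))
    return best


# Toy example: the "dog" classifier alone is unsure, but "husky" fires strongly.
toy_children = {"dog": ["husky", "poodle"]}
toy_scores = {"dog": 0.2, "husky": 0.9, "poodle": 0.1}

print(subtree_max_score("dog", toy_children, toy_scores))  # 0.9
```

The intuition is that an image recognized confidently as a husky is also an image of a dog, even when a classifier trained on the broader and more varied "dog" category is uncertain.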
Limitations include the absence of segmentation annotations and the difficulty of achieving high labeling precision for synsets deeper in the hierarchy. Future work aims to grow ImageNet to roughly 50 million images, make it more accessible, and extend it with additional annotations and community involvement.
The authors envision ImageNet becoming a central resource for vision-related research, offering a training resource for rare object models, a benchmark dataset for algorithm evaluation, and a platform for exploring semantic relations in visual modeling. The database also holds potential for advancing human vision research by aligning cognitive and visual hierarchies.