A new human-centric image dataset, FHIBE, has been developed to support ethical benchmarking of AI models. Data collection began after Institutional Review Board approval on April 23, 2023, with all participants providing informed consent compliant with GDPR and similar privacy regulations. Subjects could withdraw consent at any time without affecting compensation.
Vendors collecting images ensured that only individuals of legal age provided signed consent and copyright agreements. Images featured one or two consenting subjects, with strict guidelines on image quality, format, and diversity to improve dataset robustness. Subjects were permitted to submit up to ten images, taken under varying conditions to enhance pose and scene variety.
Annotations were mostly self-reported, covering demographics and physical traits, while objective attributes such as facial landmarks and pose were annotated by professionals. Extensive quality control involved vendor checks, manual reviews by trained QA specialists, and automated procedures to detect duplicates, inappropriate content, or unauthorized images.
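The paper does not specify the deduplication tooling, but automated duplicate checks of this kind are commonly built on perceptual hashing. A minimal sketch, assuming the `imagehash` library (an illustrative choice, not the authors' pipeline):

```python
from pathlib import Path

import imagehash
from PIL import Image

def find_near_duplicates(image_dir, max_distance=5):
    """Flag image pairs whose perceptual hashes are within a small Hamming distance."""
    hashes = {p: imagehash.phash(Image.open(p))
              for p in Path(image_dir).glob("*.jpg")}
    paths = list(hashes)
    dupes = []
    for i, a in enumerate(paths):
        for b in paths[i + 1:]:
            # ImageHash overloads '-' to return the Hamming distance
            if hashes[a] - hashes[b] <= max_distance:
                dupes.append((a, b))
    return dupes
```

Pairs flagged this way would still warrant manual review, since perceptual hashes can collide on visually similar but distinct scenes.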
To detect and remove suspicious or fraudulent data, the team employed Google Cloud Vision’s Web Detect, manual cross-verification, and exhaustive metadata analysis, resulting in the exclusion of 3,848 questionable images. This process helped mitigate risk, though it shifted the demographic distribution because removals disproportionately affected certain groups.
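As a rough illustration of the Web Detect step, the Google Cloud Vision client can be queried for pre-existing copies of a submitted photo on the public web; the flagging logic below is a hypothetical sketch, not the team's actual procedure:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def web_matches(path):
    """Return URLs of web images that fully match a submitted photo."""
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    web = client.web_detection(image=image).web_detection
    # A submission that already exists on the public web is likely a
    # stock or scraped photo rather than an original, consented one.
    return [m.url for m in web.full_matching_images]
```

Images with non-empty matches would then be routed to manual cross-verification rather than excluded automatically.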
Privacy was further protected by inpainting with fine-tuned diffusion models to obscure identifying features of non-consenting individuals, with no significant impact observed on dataset utility. Sensitive subject attributes are reported only in aggregate, and metadata such as timestamps were generalized to protect identity.
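A minimal sketch of diffusion-based inpainting with the `diffusers` library is shown below. FHIBE fine-tuned its own models, so the public checkpoint, prompt, and file names here are stand-ins:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Public inpainting checkpoint used as a stand-in for the fine-tuned models.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg").convert("RGB")        # original image
mask = Image.open("bystander_mask.png").convert("L")  # white = region to replace

# Replace the masked region (e.g., a bystander's face) with synthetic
# content that blends into the scene instead of a crude blur or black box.
result = pipe(prompt="an anonymous person's face",
              image=image, mask_image=mask).images[0]
result.save("photo_inpainted.jpg")
```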
The dataset was released with original and downsampled image versions, along with two face-focused subsets featuring cropped-only and cropped-aligned images for facial analysis. FHIBE was benchmarked against popular human-centric datasets such as MS-COCO, FACET, WIDER FACE, and Open Images MIAP, demonstrating competitive visual diversity.
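The cropped-aligned subset presumably follows the standard practice of warping each face so that its landmarks match a canonical template. A sketch under that assumption, using scikit-image and the common five-point ArcFace-style template for a 112x112 crop:

```python
import numpy as np
from skimage import transform

# Canonical 5-point template (eyes, nose tip, mouth corners) for a 112x112
# output, as used in common face-recognition pipelines (ArcFace-style).
TEMPLATE = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
], dtype=np.float32)

def align_face(image, landmarks5):
    """Warp a face image so its 5 landmarks land on the canonical template."""
    tform = transform.SimilarityTransform()
    tform.estimate(np.asarray(landmarks5, dtype=np.float32), TEMPLATE)
    # warp() expects the map from output coordinates back to input coordinates
    return transform.warp(image, tform.inverse, output_shape=(112, 112))
```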
Performance evaluations were conducted on eight computer vision tasks, including pose estimation, segmentation, face detection, verification, reconstruction, and super-resolution, using state-of-the-art pretrained models. Metrics such as keypoint accuracy, recall at various IoU thresholds, F1 scores, and perceptual similarity were used to assess performance.
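For concreteness, recall at an IoU threshold, one of the detection metrics mentioned, can be computed as follows; this simplified matcher accepts any overlapping prediction, whereas standard COCO-style evaluation performs greedy one-to-one matching:

```python
def box_iou(a, b):
    """IoU between two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def recall_at_iou(gt_boxes, pred_boxes, thresh=0.5):
    """Fraction of ground-truth boxes matched by a prediction at IoU >= thresh."""
    matched = sum(any(box_iou(gt, p) >= thresh for p in pred_boxes)
                  for gt in gt_boxes)
    return matched / max(len(gt_boxes), 1)
```

Computing such metrics per demographic group, rather than only in aggregate, is what turns them into fairness diagnostics.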
Bias detection methods applied included pairwise statistical comparisons, regression modeling using decision trees and random forests for feature importance, and association rule mining to identify attributes linked to performance disparities. These analyses revealed patterns of bias in model outputs across demographic groups.
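The feature-importance step might look like the following scikit-learn sketch; the file path, column names, and score definition are hypothetical:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical per-image results table: attribute columns plus a per-image
# performance score (e.g., keypoint accuracy or detection IoU).
df = pd.read_csv("per_image_results.csv")
X = pd.get_dummies(df[["age_group", "skin_tone", "pose", "lighting"]])
y = df["score"]

# Fit a random forest and rank attributes by how much they explain
# variation in per-image performance.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1])
for name, importance in ranked[:10]:
    print(f"{name}: {importance:.3f}")
```

High-importance attributes point to where disparities concentrate, which the pairwise tests and association rules can then characterize more precisely.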
The dataset also enabled bias assessments of foundation vision-language models CLIP and BLIP-2. CLIP was evaluated in zero-shot classification settings with varied text prompts, while BLIP-2 was tested through visual question answering on social attributes. The results revealed inherent societal biases learned by these models, underscoring the dataset’s role in ethical AI investigation.
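Zero-shot probing of CLIP in this manner can be reproduced with Hugging Face `transformers`; the checkpoint and probe prompts below are illustrative rather than the paper's exact configuration:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("subject.jpg")  # placeholder path to a subject photo
prompts = ["a photo of a doctor", "a photo of a nurse"]  # illustrative probes

inputs = processor(text=prompts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
probs = logits.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```

Systematic shifts in these probabilities across demographic groups, under semantically neutral prompts, are the kind of learned association such probing surfaces.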
FHIBE promotes ethical AI development by providing a rigorously curated, diverse, and privacy-conscious dataset primed for fairness evaluation. Researchers are encouraged to consider potential vulnerabilities to fraudulent submissions and the complexity of demographic representation when using crowd-sourced data.
