One of our students from the Applied MSc in Data Science and AI programme, who asked to remain anonymous for privacy and professional reasons, shares her 6-month internship project. She currently works in a public research lab (I3S, INRIA, Université Côte d’Azur).
My internship project sits at the crossroads of two data science technologies, Machine Learning and the Semantic Web, and will hopefully benefit the art community. I am working with a dataset of hundreds of thousands of illustrated artwork records from French museums, provided by the French Ministry of Culture. The dataset includes structured metadata and a collection of digital images linked to that metadata.
The objective is to combine the semantic indexing of image annotations with Deep Learning image classification in order to (1) improve the accuracy of the annotations and (2) enhance and complete them.
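To make the idea concrete, here is a minimal sketch of how classifier output could be checked against existing annotations using a concept hierarchy. It is only an illustration, not the pipeline used in the project: the labels, the scores, the `BROADER` mapping (standing in for SKOS `broader` links), the 0.5 threshold and the function names are all hypothetical.

```python
# Hypothetical toy hierarchy: child concept -> broader concept
# (a stand-in for SKOS "broader" links; not taken from the real dataset).
BROADER = {
    "horse": "animal",
    "dog": "animal",
    "animal": "living being",
    "boat": "vehicle",
}


def with_ancestors(labels, broader=BROADER):
    """Return the labels plus every broader concept reachable from them."""
    expanded = set()
    for label in labels:
        current = label
        # Walk up the hierarchy; stop at the root or at a concept already seen.
        while current is not None and current not in expanded:
            expanded.add(current)
            current = broader.get(current)
    return expanded


def compare(annotations, scores, threshold=0.5):
    """Compare human annotations with classifier scores for one image.

    annotations: set of labels from the metadata
    scores: dict mapping label -> classifier confidence in [0, 1]
    threshold: assumed cut-off for accepting a predicted label
    """
    predicted = {label for label, score in scores.items() if score >= threshold}
    annotated_expanded = with_ancestors(annotations)
    predicted_expanded = with_ancestors(predicted)
    return {
        # Annotations supported by a prediction or by one of its broader concepts.
        "confirmed": sorted(set(annotations) & predicted_expanded),
        # Annotations the classifier does not support: candidates for review.
        "to_review": sorted(set(annotations) - predicted_expanded),
        # Predicted labels not yet covered by the annotations: candidates to add.
        "suggested": sorted(predicted - annotated_expanded),
    }


if __name__ == "__main__":
    result = compare(
        annotations={"animal", "boat"},
        scores={"horse": 0.91, "boat": 0.12, "dog": 0.30},
    )
    print(result)
```

In this toy case the annotation "animal" is confirmed because the predicted "horse" is a narrower concept, "boat" is flagged for review, and "horse" is suggested as a more specific annotation. This corresponds to goals (1) and (2) above.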
Technologies used: the Linked Data and Semantic Web stack (RDF, RDFS, OWL, SKOS, SPARQL); PyTorch for Deep Learning (VGGNet CNN); and Pandas, Matplotlib and RDFLib for data analysis and visualization.
As is typical for such projects, data wrangling and analysis take about 70-80% of the time, while coding takes about 20-30%.