Talk: Semanticpedia : le DBpedia francophone (Semanticpedia: the French-language DBpedia)

Presented by Alexandre Monnin at Web Sémantique 2012 on 2012/05/02 from 17:00 to 17:30 in room Bruxelles
Abstract

The semanticpedia.org project aims to extract data from the French Wikipedia using the DBpedia.org extraction framework. It is supported by INRIA, the French Ministry of Culture, and Wikimedia France.

Following the DBpedia approach for English pages, data is extracted from several elements of Wikipedia pages (title, links, infoboxes, ...). The extracted data is recorded in RDF, the W3C standard for resource description. RDF data is composed of triples of the form "subject predicate object". This makes it possible to express relations between the subjects of Wikipedia pages, for instance "France hasCapital Paris", or to give values for their attributes, for instance "France hasPopulation 60 million". This data can be queried with the SPARQL language, for instance to get the list of cities in France that have more than 100,000 inhabitants.
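To make the triple model and the example query more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual pipeline: the predicate names "locatedIn" and "hasPopulation", the helper cities_above, and all population figures except the abstract's "60 million" are hypothetical placeholders. The SPARQL text at the end is likewise only a guess at how such a query might look, assuming the dataset uses the DBpedia ontology terms dbo:City, dbo:country and dbo:populationTotal; it is not executed here.

```python
from typing import List, NamedTuple, Union


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: Union[str, int]


# Toy data. Populations are round placeholder values, not real statistics;
# only the two "France" triples come from the abstract itself.
TRIPLES = [
    Triple("France", "hasCapital", "Paris"),
    Triple("France", "hasPopulation", 60_000_000),
    Triple("Paris", "locatedIn", "France"),
    Triple("Paris", "hasPopulation", 2_000_000),
    Triple("Lyon", "locatedIn", "France"),
    Triple("Lyon", "hasPopulation", 500_000),
    Triple("Giverny", "locatedIn", "France"),
    Triple("Giverny", "hasPopulation", 500),
    Triple("Bruxelles", "locatedIn", "Belgium"),
    Triple("Bruxelles", "hasPopulation", 1_000_000),
]


def cities_above(triples: List[Triple], country: str, threshold: int) -> List[str]:
    """Return the subjects located in `country` whose population exceeds `threshold`."""
    located = {
        t.subject
        for t in triples
        if t.predicate == "locatedIn" and t.obj == country
    }
    return sorted(
        t.subject
        for t in triples
        if t.predicate == "hasPopulation"
        and t.subject in located
        and isinstance(t.obj, int)
        and t.obj > threshold
    )


# Assumed shape of the equivalent SPARQL query (property names are assumptions).
SPARQL_SKETCH = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:country <http://fr.dbpedia.org/resource/France> ;
        dbo:populationTotal ?population .
  FILTER (?population > 100000)
}
"""

if __name__ == "__main__":
    print(cities_above(TRIPLES, "France", 100_000))
```

The in-memory filter does by hand what a SPARQL engine does with a graph pattern plus a FILTER clause: match the triples that share a subject, then keep the ones whose value passes the numeric test.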

Semanticpedia differs from DBpedia.org in several ways:

  • Data is extracted directly from French-language pages. Because DBpedia.org runs the extraction from English Wikipedia pages, it misses any French page that is not reached through an interwiki link. About 15-20% of French pages are not properly linked to English pages, for instance "Yvette Horner" and "Les Frères Jacques". Semanticpedia will extract data from these pages, whereas DBpedia does not.

  • Extractors are adapted to the conventions of the French Wikipedia, which improves extraction quality.

  • Collaboration with the Wikimedia community, with several benefits:

    • a better understanding of the processes in Wikipedia
    • feedback to contributors, suggesting improvements in how pages are edited
    • tools that are better suited to the needs of contributors and users.
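The coverage gap described in the first point above can be pictured as a set difference: an extraction that starts from English pages only reaches French pages that have an interlanguage link, while a direct extraction sees all of them. The sketch below is a toy model under assumptions. The mapping FR_TO_EN_LINK and both helper functions are hypothetical, and the four entries are illustrative only, so they say nothing about the real 15-20% figure.

```python
from typing import Dict, Optional, Set

# Hypothetical toy data: French page title -> English title reached through
# an interlanguage link, or None when no such link exists. The two unlinked
# titles are the examples named in the abstract.
FR_TO_EN_LINK: Dict[str, Optional[str]] = {
    "France": "France",
    "Paris": "Paris",
    "Yvette Horner": None,
    "Les Frères Jacques": None,
}


def reachable_from_english(links: Dict[str, Optional[str]]) -> Set[str]:
    """French pages an English-rooted extraction can reach via a link."""
    return {fr for fr, en in links.items() if en is not None}


def french_only(links: Dict[str, Optional[str]]) -> Set[str]:
    """French pages that only a direct French extraction would cover."""
    return set(links) - reachable_from_english(links)


if __name__ == "__main__":
    missed = french_only(FR_TO_EN_LINK)
    share = len(missed) / len(FR_TO_EN_LINK)
    print(sorted(missed), f"{share:.0%} of this toy sample")
```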

The project stays very close to DBpedia.org and is a member of its internationalization committee. The generated data is published under both "fr.dbpedia.org" and "lab.wikimedia.fr/semanticpedia".

In addition to the data extracted from Wikipedia, several extensions are under consideration, such as extracting data from Wiktionary.

Author: Alexandre Monnin

  • Head of Web and Metadata Research at the Institut de Recherche et d'Innovation du Centre Pompidou (IRI)
  • PhD candidate in philosophy at Paris 1 (PHICO, EXeCO)
  • External collaborator at INRIA (associate member of the Wimmics project-team (EPI), Sophia-Antipolis research centre)
  • Associate PhD candidate at CNAM (DICEN team)
  • Organizer of the "Philosophie du Web" seminar, 2011-2012 (Collège des Ecoles Doctorales de Paris 1, IRI, Implications Philosophiques)
  • Organizer of the "Web social, Web Sémantique et musées" seminar, 2011-2012 (French Ministry of Culture, IRI; Centre Pompidou, Wikimedia France, W3C)
  • Lecturer at M2 (second-year master's) level at Marne-la-Vallée ("Architecture du Web et Web Sémantique")
  • Co-organizer of the "Rencontres du Web de données" at the Centre Pompidou
  • Member of the editorial board of the journal Implications Philosophiques
  • Twitter: @aamonnz & @PhiloWeb
  • Philosophy of the Web, http://web-and-philosophy.org/
  • PhiloWeb on Dailymotion, http://www.dailymotion.com/PhiloWeb
  • Philosophy and Web discussion list @INRIA, https://lists-sop.inria.fr/sympa/info/philoweb