Keynote Speakers
Dr. Tim Furche
"How the Minotaur turned into Ariadne... Ontologies in Web Data Extraction"
Abstract
The Web has been a stellar success: there are now more web pages than
stars in the Milky Way. The flip side of this success is that
navigating and understanding this information to make informed
decisions has become overwhelmingly hard: to rent an apartment, say in
Oxford, you need to sift through the web sites of hundreds of
real-estate agencies, and yet you are always left feeling that you missed
an even nicer place. Humans require automated support to profit from this
wealth of data. To enable automation, the linked open data initiative
and others have been asking data providers to publish structured,
semantically annotated data. Small data providers, however, such as most UK
real-estate agencies, are not equipped to shoulder this burden.
They are just starting to deal with the transition from simple, table-
or list-like directories to web applications with rich interfaces.
In this talk, we argue that fully automated extraction of structured
data can be the way out of this dilemma. Ironically, automated data
extraction has seen a recent revival thanks to the use of ontologies
and linked open data to guide the data extraction. First results from
the ERC DIADEM project illustrate that high-quality, fully automated
data extraction at web scale is possible if we combine domain
ontologies with a phenomenology describing how the concepts of a
domain are represented on the web. The talk concludes with a summary
of the current state of data extraction and a brief discussion of the
major open problems.
Dr. Tim Furche leads the DIADEM lab at Oxford University, UK, as a senior postdoc. DIADEM (diadem-project.info) is an ERC Advanced Investigator Grant on web data extraction recently awarded to Georg
Gottlob. His research interests include data extraction, XML and semi-structured data (in particular query evaluation and optimisation),
and advanced Web information systems. He has authored over 40 peer-reviewed scientific publications, some of them cited over 200
times. His main contributions are on XPath optimisation and evaluation, on linear time and space querying of large graphs, and on
languages for web data extraction, querying, and search. Tim Furche regularly contributes to scientific conferences and journals,
especially in the areas of the Web and the Semantic Web, as an author, reviewer, and program committee member. From 2004 to 2008, he co-coordinated the
working group on Reasoning-aware Querying in the EU Network of Excellence REWERSE.
Prof. Stefano Ceri
Title: The Anatomy of a Multi-Domain Search Infrastructure
Abstract
The Search Computing (SeCo, www.search-computing.eu) project focuses on
building answers to complex search queries, such as "Where can I attend an
interesting conference in my field close to a sunny beach?", by using the
ranking and joining of results as the dominant factors for service
composition. SeCo is funded by an ERC Senior Grant; it started in November
2008 and will last until 2013. In the talk, I will argue that search should
evolve from general-purpose monolithic engines, owned by a few market leaders,
towards a flexible and modular scenario, targeted at supporting complex
queries and at exposing specialized data sources of the "hidden web". I will
give substance to this argument by showing that such a component-based
infrastructure is indeed "under construction" and by presenting its
rationale and current state of development.
Prof. Stefano Ceri (http://home.dei.polimi.it/ceri/) is Professor of Database
Systems at Politecnico di Milano. His research has focused on extending
database technology to incorporate data distribution, deductive and active
rules, object orientation, and XML query languages. His recent work focuses on
design methods for data-intensive Web sites, stream reasoning, and search
computing. He has authored more than 300 articles and nine international books
and is co-editor of the book series "Data Centric Systems and Applications"
(Springer-Verlag). He is responsible for several EU projects, including "Large
Knowledge Collider" (2008-2011), and was awarded an IDEAS Advanced Grant of the
European Research Council (ERC) on "Search Computing" (2008-2013). He co-invented
WebML, a model for the conceptual design of Web applications, and co-founded Web
Models, a startup of Politecnico di Milano focused on the commercialization of
WebML through the product WebRatio (www.webratio.com). He is the director of
Alta Scuola Politecnica (www.asp-poli.it).