11th International Conference on Web Engineering (ICWE 2011)

June 20-24 2011, Paphos, Cyprus

Dr. Tim Furche

"How the Minotaur turned into Ariadne... Ontologies in Web Data Extraction"


The Web has been a stellar success: there are now more web pages than stars in the Milky Way. The flip side of this success is that navigating and understanding this information to make informed decisions has become overwhelmingly hard: to rent an apartment, say in Oxford, you need to sift through the web sites of hundreds of real-estate agencies, and yet you always feel like you missed that even nicer place. Humans require automated support to profit from this wealth of data. To enable automation, the linked open data initiative and others have been asking data providers to publish structured, semantically annotated data. Small data providers, such as most UK real-estate agencies, however, are not equipped to shoulder this burden. They are just starting to deal with the transition from simple, table- or list-like directories to web applications with rich interfaces. In this talk, we argue that fully automated extraction of structured data can be the way out of this dilemma. Ironically, automated data extraction has seen a recent revival thanks to the use of ontologies and linked open data to guide the data extraction. First results from the ERC DIADEM project illustrate that high-quality, fully automated data extraction at web scale is possible if we combine domain ontologies with a phenomenology describing how the concepts of a domain are represented on the web. The talk concludes with a summary of the current state of data extraction and a brief discussion of the major open problems.

Dr. Tim Furche leads the DIADEM lab at Oxford University, UK, as senior postdoc. DIADEM (diadem-project.info) is an ERC advanced investigator grant on web data extraction recently awarded to Georg Gottlob. Tim Furche's research interests include data extraction, XML and semi-structured data, in particular query evaluation and optimisation, and advanced Web information systems. He has authored over 40 peer-reviewed scientific publications, some of them cited over 200 times. His main contributions are on XPath optimisation and evaluation, on linear time and space querying of large graphs, and on languages for web data extraction, querying, and search. He regularly contributes to scientific conferences and journals, especially in the areas of Web and Semantic Web, as an author, reviewer, and program committee member. From 2004 to 2008 he co-coordinated the working group on Reasoning-aware Querying in the EU Network of Excellence REWERSE.

Prof. Stefano Ceri

Title: The Anatomy of a Multi-Domain Search Infrastructure


The Search Computing (SeCo, www.search-computing.eu) project focuses on building answers to complex search queries, like "Where can I attend an interesting conference in my field close to a sunny beach?", by using ranking and joining of results as the dominant factors for service composition. SeCo is funded by an ERC Senior Grant, started in November 2008, and will last until 2013. In the talk, I will argue that search should evolve from general-purpose monolithic engines, owned by a few market leaders, towards a flexible and modular scenario, targeted at supporting complex queries and at exposing specialized data sources of the "hidden web". I will give substance to this argument by showing that such a component-based infrastructure is indeed "under construction" and by presenting its rationale and current state of development.

Prof. Stefano Ceri (http://home.dei.polimi.it/ceri/) is Professor of Database Systems at Politecnico di Milano, where his work has focused on extending database technology to incorporate data distribution, deductive and active rules, object orientation, and XML query languages. His recent work focuses on design methods for data-intensive Web sites, stream reasoning, and search computing. He has authored more than 300 articles and nine international books and is co-editor of the book series "Data Centric Systems and Applications" (Springer-Verlag). He is responsible for several EU projects, including "Large Knowledge Collider" (2008-2011), and was awarded an IDEAS Advanced Grant of the European Research Council (ERC) on "Search Computing" (2008-2013). He co-invented WebML, a model for the conceptual design of Web applications, and co-founded Web Models, a startup of Politecnico di Milano focused on WebML commercialization by means of the product WebRatio (www.webratio.com). He is the director of Alta Scuola Politecnica (www.asp-poli.it).


