Invited talks

Pierre Baldi, University of California at Irvine, USA.

Learning and Charting Chemical Space with Strings and Graphs: Challenges and Opportunities for AI and Machine Learning

Informatics methods and computers have not yet become as pervasive in chemistry as they have in physics and biology. Drawing analogies from bioinformatics, key ingredients for progress in chemoinformatics are the availability of large, annotated databases of compounds and reactions, data structures and algorithms to efficiently search these databases, and computational methods to predict the physical, chemical, and biological properties of new compounds and reactions. We will describe how graph-based methods play a key role in the development of: (1) a large public database of compounds and reactions (ChemDB) and the underlying algorithms and representations; (2) machine learning kernel methods to predict molecular properties; and (3) the applications of these methods to drug screening/design problems and the identification of new drug leads against a major disease.

Luc De Raedt, Katholieke Universiteit Leuven, Belgium.

ProbLog and its Application to Link Mining in Biological Networks

ProbLog is a recently introduced probabilistic extension of Prolog [De Raedt, Kimmig, Toivonen, IJCAI 07]. A ProbLog program defines a distribution over logic programs by specifying for each clause the probability that it belongs to a randomly sampled program, and these probabilities are mutually independent. The semantics of ProbLog is then defined by the success probability of a query in a randomly sampled program. It has been applied to link mining and discovery in a large biological network. In the talk, I will also discuss various learning settings for ProbLog and link mining, in particular, I shall present techniques for probabilistic local pattern mining, probabilistic explanation based learning [Kimmig, De Raedt, Toivonen, ECML 07] and theory compression from examples [De Raedt et al, ILP 96].
This is joint work with Angelika Kimmig, Hannu Toivonen, Kate Revoredo and Kristian Kersting.

Jiawei Han, Univ. of Illinois at Urbana-Champaign, USA.

Mining, Indexing, and Searching Graphs in Large Data Sets

Recent research on pattern discovery has progressed from mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in Web, social network analysis, and bioinformatics. However, mining and searching large graphs in graph databases is challenging due to the presence of an exponential number of frequent subgraphs.
In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching of graphs in large databases. We introduce gSpan and CloseGraph, two efficient methods for mining frequent graph patterns in graph databases. Then we introduce constraint-based graph mining methods. Further, we introduce a graph indexing method, gIndex, and a graph approximate searching method, grafil, both taking advantages of frequent graph mining to construct a compact but highly effective graph index and perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive datasets but also claim broad applications in other fields, including DB/OS systems and software engineering.

Lise Getoor, University of Maryland, USA.

Graph Identification

Within the machine learning community, there has been a growing interest in learning structured models from input data that is itself structured. Graph identification refers to methods that transform an observed input graph into an inferred output graph. Examples include inferring organizational hierarchies from social network data and identifying gene regulatory networks from protein-protein interactions. The key processes in graph identification are entity resolution, link prediction, and collective classification. I will overview algorithms for these tasks and discuss the need for integrating the results to solve the overall problem collectively.

Alex Smola, National ICT Australia.

Learning Graph Matching

As a fundamental problem in pattern recognition, graph matching has found a variety of applications in the field of computer vision. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs. There are many ways in which the problem has been formulated, but most can be cast in general as a quadratic assignment problem, where a linear term in the objective function encodes node compatibility functions and a quadratic term encodes edge compatibility functions. The main research focus in this theme is about designing efficient algorithms for solving approximately the quadratic assignment problem, since it is NP-hard.
In this paper, we turn our attention to the complementary problem: how to estimate compatibility functions such that the solution of the resulting graph matching problem best matches the expected solution that a human would manually provide. We present a method for learning graph matching: the training examples are pairs of graphs and the “labels” are matchings between pairs of graphs. We present experimental results with real image data which give evidence that learning can improve the performance of standard graph matching algorithms. In particular, it turns out that linear assignment with such a learning scheme may improve over state-of-the-art quadratic assignment relaxations. This finding suggests that for a range of problems where quadratic assignment was thought to be essential for securing good results, linear assignment, which is far more ef icient, could be just sufficient if learning is performed. This enables speed-ups of graph matching by up to 4 orders of magnitude while retaining state-of-the-art accuracy.