Pierre Baldi, University of California at Irvine, USA.
Learning and Charting Chemical Space with Strings and Graphs: Challenges and Opportunities for AI and Machine Learning
Informatics methods and computers have not yet become as pervasive in chemistry as they have in physics and biology. Drawing analogies from bioinformatics, key ingredients for progress in chemoinformatics are the availability of large, annotated databases of compounds and reactions, data structures and algorithms to efficiently search these databases, and computational methods to predict the physical, chemical, and biological properties of new compounds and reactions. We will describe how graph-based methods play a key role in the development of: (1) a large public database of compounds and reactions (ChemDB) and the underlying algorithms and representations; (2) machine learning kernel methods to predict molecular properties; and (3) the applications of these methods to drug screening/design problems and the identification of new drug leads against a major disease.
Luc De Raedt, Katholieke Universiteit Leuven, Belgium.
ProbLog and its Application to Link Mining in Biological Networks
ProbLog is a recently introduced probabilistic
extension of Prolog [De Raedt, Kimmig, Toivonen, IJCAI 07]. A
ProbLog program defines a distribution over logic programs by
specifying for each clause the probability that it belongs to a
randomly sampled program, and these probabilities are mutually
independent. The semantics of ProbLog is then defined by the
success probability of a query in a randomly sampled program. It has
been applied to link mining and discovery in a large biological
network. In the talk, I will also discuss various learning settings
for ProbLog and link mining, in particular, I shall present
techniques for probabilistic local pattern mining, probabilistic
explanation based learning [Kimmig, De Raedt, Toivonen, ECML 07] and
theory compression from examples [De Raedt et al, ILP 96].
This is joint work with Angelika Kimmig, Hannu Toivonen, Kate Revoredo and Kristian Kersting.
Jiawei Han, Univ. of Illinois at Urbana-Champaign, USA.
Mining, Indexing, and Searching Graphs in Large Data Sets
Recent research on pattern discovery has progressed from mining
frequent itemsets and sequences to mining structured patterns
including trees, lattices, and graphs. As a general data structure,
graph can model complicated relations among data with wide
applications in Web, social network analysis, and bioinformatics.
However, mining and searching large graphs in graph databases is
challenging due to the presence of an exponential number of frequent
In this talk, we present our recent progress on developing efficient and scalable methods for mining and searching of graphs in large databases. We introduce gSpan and CloseGraph, two efficient methods for mining frequent graph patterns in graph databases. Then we introduce constraint-based graph mining methods. Further, we introduce a graph indexing method, gIndex, and a graph approximate searching method, grafil, both taking advantages of frequent graph mining to construct a compact but highly effective graph index and perform similarity search with such indexing structures. These methods not only facilitate mining and querying graph patterns in massive datasets but also claim broad applications in other fields, including DB/OS systems and software engineering.
Lise Getoor, University of Maryland, USA.
Within the machine learning community, there has been a growing interest in learning structured models from input data that is itself structured. Graph identification refers to methods that transform an observed input graph into an inferred output graph. Examples include inferring organizational hierarchies from social network data and identifying gene regulatory networks from protein-protein interactions. The key processes in graph identification are entity resolution, link prediction, and collective classification. I will overview algorithms for these tasks and discuss the need for integrating the results to solve the overall problem collectively.
Alex Smola, National ICT Australia.
Learning Graph Matching
As a fundamental problem in pattern recognition, graph matching
has found a variety of applications in the field of computer
vision. In graph matching, patterns are modeled as graphs and pattern
recognition amounts to finding a correspondence between the nodes of
different graphs. There are many ways in which the problem has been
formulated, but most can be cast in general as a quadratic assignment
problem, where a linear term in the objective function encodes node
compatibility functions and a quadratic term encodes edge
compatibility functions. The main research focus in this theme is
about designing efficient algorithms for solving approximately the
quadratic assignment problem, since it is NP-hard.
In this paper, we turn our attention to the complementary problem: how to estimate compatibility functions such that the solution of the resulting graph matching problem best matches the expected solution that a human would manually provide. We present a method for learning graph matching: the training examples are pairs of graphs and the “labels” are matchings between pairs of graphs. We present experimental results with real image data which give evidence that learning can improve the performance of standard graph matching algorithms. In particular, it turns out that linear assignment with such a learning scheme may improve over state-of-the-art quadratic assignment relaxations. This finding suggests that for a range of problems where quadratic assignment was thought to be essential for securing good results, linear assignment, which is far more ef icient, could be just sufficient if learning is performed. This enables speed-ups of graph matching by up to 4 orders of magnitude while retaining state-of-the-art accuracy.