Pierre Baldi, University of California at Irvine, USA.
Learning and Charting Chemical Space with Strings and Graphs: Challenges and Opportunities for AI and Machine Learning
Informatics methods and computers have not yet become as pervasive in chemistry as they have in physics and biology. Drawing analogies from bioinformatics, key ingredients for progress in chemoinformatics are the availability of large, annotated databases of compounds and reactions, data structures and algorithms to efficiently search these databases, and computational methods to predict the physical, chemical, and biological properties of new compounds and reactions. We will describe how graph-based methods play a key role in the development of: (1) a large public database of compounds and reactions (ChemDB) and the underlying algorithms and representations; (2) machine learning kernel methods to predict molecular properties; and (3) the applications of these methods to drug screening/design problems and the identification of new drug leads against a major disease.
Luc De Raedt, Katholieke Universiteit Leuven, Belgium.
ProbLog and its Application to Link Mining in Biological Networks
ProbLog is a recently introduced probabilistic
extension of Prolog [De Raedt, Kimmig, Toivonen, IJCAI 07]. A
ProbLog program defines a distribution over logic programs by
specifying for each clause the probability that it belongs to a
randomly sampled program, and these probabilities are mutually
independent. The semantics of ProbLog is then defined by the
success probability of a query in a randomly sampled program. It has
been applied to link mining and discovery in a large biological
network. In the talk, I will also discuss various learning settings
for ProbLog and link mining, in particular, I shall present
techniques for probabilistic local pattern mining, probabilistic
explanation based learning [Kimmig, De Raedt, Toivonen, ECML 07] and
theory compression from examples [De Raedt et al, ILP 96].
This
is joint work with Angelika Kimmig, Hannu Toivonen, Kate Revoredo
and Kristian Kersting.
Jiawei Han, Univ. of Illinois at Urbana-Champaign, USA.
Mining, Indexing, and Searching Graphs in Large Data Sets
Recent research on pattern discovery has progressed from mining
frequent itemsets and sequences to mining structured patterns
including trees, lattices, and graphs. As a general data structure,
graph can model complicated relations among data with wide
applications in Web, social network analysis, and bioinformatics.
However, mining and searching large graphs in graph databases is
challenging due to the presence of an exponential number of frequent
subgraphs.
In this talk, we present our recent progress on
developing efficient and scalable methods for mining and searching
of graphs in large databases. We introduce gSpan and CloseGraph,
two efficient methods for mining frequent graph patterns in graph
databases. Then we introduce constraint-based graph mining methods.
Further, we introduce a graph indexing method, gIndex, and a graph
approximate searching method, grafil, both taking advantages of
frequent graph mining to construct a compact but highly effective
graph index and perform similarity search with such indexing
structures. These methods not only facilitate mining and querying
graph patterns in massive datasets but also claim broad applications
in other fields, including DB/OS systems and software engineering.
Lise Getoor, University of Maryland, USA.
Graph Identification
Within the machine learning community, there has been a growing interest in learning structured models from input data that is itself structured. Graph identification refers to methods that transform an observed input graph into an inferred output graph. Examples include inferring organizational hierarchies from social network data and identifying gene regulatory networks from protein-protein interactions. The key processes in graph identification are entity resolution, link prediction, and collective classification. I will overview algorithms for these tasks and discuss the need for integrating the results to solve the overall problem collectively.
Alex Smola, National ICT Australia.
Learning Graph Matching
As a fundamental problem in pattern recognition, graph matching
has found a variety of applications in the field of computer
vision. In graph matching, patterns are modeled as graphs and pattern
recognition amounts to finding a correspondence between the nodes of
different graphs. There are many ways in which the problem has been
formulated, but most can be cast in general as a quadratic assignment
problem, where a linear term in the objective function encodes node
compatibility functions and a quadratic term encodes edge
compatibility functions. The main research focus in this theme is
about designing efficient algorithms for solving approximately the
quadratic assignment problem, since it is NP-hard.
In this
paper, we turn our attention to the complementary problem: how to
estimate compatibility functions such that the solution of the
resulting graph matching problem best matches the expected solution
that a human would manually provide. We present a method for learning
graph matching: the training examples are pairs of graphs and the
“labels” are matchings between pairs of graphs. We present
experimental results with real image data which give evidence that
learning can improve the performance of standard graph matching
algorithms. In particular, it turns out that linear assignment with
such a learning scheme may improve over state-of-the-art quadratic
assignment relaxations. This finding suggests that for a range of
problems where quadratic assignment was thought to be essential for
securing good results, linear assignment, which is far more ef icient,
could be just sufficient if learning is performed. This enables
speed-ups of graph matching by up to 4 orders of magnitude while
retaining state-of-the-art accuracy.