Fact Bases and Querying¶
As well as offering a high-level interface for mapping ASP facts to Python objects, Clorm also provides facilities for dealing with collections of facts. An ASP application typically does not simply deal with individual facts in isolation, but instead needs to deal in a collection of facts; whether they are the set of facts that make up the problem instance or, alternatively, the facts that constitute the model of a problem,
A Container for Facts¶
Clorm provides the FactBase
class as a container for storing and
querying facts. A FactBase
behaves much like a normal Python
set
object with two caveats: firstly, it can only contain instances of
Predicate
sub-classes, and secondly, it provides an interface to
a database-like query mechanism.
from clorm import Predicate, ConstantField, StringField, FactBase
class Person(Predicate):
id = ConstantField
address = StringField
class Pet(Predicate):
owner = ConstantField
petname = StringField
dave = Person(id="dave", address="UNSW")
morri = Person(id="morri", address="UNSW")
dave_cat = Pet(owner="dave", petname="Frank")
fb = FactBase([dave,morri,dave_cat])
# The "in" and "len" operators work as expected
assert dave in fb
assert len(fb) == 3
A fact base can be populated at object construction time or later. It can also
be manipulated using the standard Python set operators and member
functions. Like a Python set
object it has an
FactBase.add()
member function for adding
facts. However, because it can only store Predicate
instances
this function is able to be more flexible and has been overloaded to accept
either a single fact or a collection of facts.
dave_dog = Pet(owner="dave", petname="Bob")
morri_cat = Pet(owner="morri", petname="Fido")
morri_cat2 = Pet(owner="morri", petname="Dusty")
fb.add(dave_dog)
fb.add([morri_cat, morri_cat2])
assert dave_dog in fb
assert morri_cat in fb
assert morri_cat2 in fb
Querying¶
An important motivation for providing a specialised FactBase
container class for storing facts, as opposed to simply using a Python list
or set
object, is to support a rich mechanism for querying the contents of a
collection.
When an ASP model is returned by the solver the application developer needs to process the model in order to extract the relevant information. The simplest mechanism to do this is to simply loop through the facts in the model. This loop will typically contain a number of conditional statements to determine what action to take for the given fact; and to store it if some sort of matching needs to take place.
However, this loop-and-test approach leads to unnecessary boilerplate code as
well as making the purpose of the code more obscure. FactBase
is
intended to alleviate this problem by offering a database-like query mechanism
for extracting information from a model.
Note
The following highlights the operations of the new Query API. As of Clorm 1.2.1 this new API should be the preferred search mechanism. It provides all the functionality of the old query interface and much more; including SQL-like joins between predicates and controlling how the query results are presented.
Simple Queries¶
Continuing the running example above the
FactBase.query()
method can be used to create
Query
objects.
query1=fb.query(Pet).where(Pet.owner == "dave")
query2=fb.query(Person).where(Person.id == "dave")
The queries are defined by chaining over the member functions of a
Query
object. Each function call returns a modified copy of the
Query
object. Here the member function
Query.where()
returns a modified copy of
itself. This chaining technique will be be familiar to users of Python ORM’s
such as SQLAlchemy or Peewee, where it is used as a generator for SQL
statements.
A query object needs to be executed in order to return the search results. There
are number of end-points that can be used to execute the search. The
Query.all()
member function returns a generator to
iterate over all matching search results:
assert set(query1.all()) == set([dave_cat,dave_dog])
The Query.singleton()
member function returns
the single matching item (and raises an exception if there is not exactly one
match):
assert query2.singleton() == dave
The Query.first()
member function returns the first
matching item, and only raises an exception if there no matching items:
assert query2.first() == dave
The Query.count()
member function returns
the number of matching entries:
assert query1.count() == 2
Note
For comparison the following shows how these queries and results can be
encoded using the legacy query API. The
FactBase.select()
method is used to create
clorm.Select
objects. Note: there is no matching member function for
Query.first()
.
query1_legacy=fb.select(Pet).where(Pet.owner == "dave")
query2_legacy=fb.select(Person).where(Person.id == "dave")
assert set(query1_legacy.get()) == set([dave_cat,dave_dog])
assert query2_legacy.get_unique() == dave
assert query1_legacy.count() == 2
An important difference between the old and new interfaces is that the call
to Select.get()
executes the query and returns the
list of results. In contrast the call to Query.all()
returns a generator and the query is executed by the generator during its
iteration.
Queries with Joins¶
It is often useful to match instances of different predicates in the same way
that you would join multiple database tables in an SQL query. To perform a
search across multiple predicates it is first necessary to specify the
predicates in the call to FactBase.query()
and
then specify to how these predicates are to be joined in the chained member
function Query.join()
query3=fb.query(Person,Pet).join(Person.id == Pet.owner)
When a query contains multiple predicates the result will consist of tuples,
where each tuple contains the facts matching the signature of predicates in the
query
clause. Mathematically, the tuples are a subset of the cross-product
over instances of the predicates; where the subset is determined by the join
clause.
assert set(query3.all()) == set([(dave,dave_cat),(dave,dave_dog),
(morri,morri_cat),(morri,morri_cat2)])
Projections¶
Returning tuples of facts may not be convenient and a more usable output format
may be desired. In such a case it is possible to specify a
Query.select()
specification to provide the
projection of the results. This is much like the use of the SQL SELECT
clause.
Note
Instead of formulating the query from scratch a new query can be defined as a refinement of an existing query.
query4=query3.select(Pet.petname, Person.address)
assert set(query4.all()) == set([("Bob","UNSW"),("Frank","UNSW"),
("Fido","UNSW"),("Dusty","UNSW")])
In the general case the query result is returned as a tuple consisting of the
instances of the signature matching the
FactBase.query()
specification. However, if the
result signature is for a single item, for example you only want to return the
name of the pet, then returning a singleton tuple is not very intuitive. So,
instead, when the result signature consists only of a single item then the API
default behaviour is for the query result to return the items themselves rather
than being wrapped in a singleton tuple.
query5=query3.select(Pet.petname)
assert set(query5.all()) == set(["Bob","Frank","Fido","Dusty"])
One important point to note when using projections is that the uniqueness of the
output is no longer guaranteed. While the combinations of the cross-product of
tuples being joined are guaranteed to be unique, once a
Query.select()
signature is specified this may no
longer be the case. For example, if in the above query we only want to output
the addresses of the owners of the different pets, the projection will lead to
duplicate elements. These duplicates can be removed from the search by
specifying the Query.distinct()
modifier. In
terms of SQL this is similar to specfying a SELECT DISTINCT
query.
query6=query3.select(Person.address)
query7=query6.distinct()
assert query6.count() == 4
assert set(query6.all()) == set(["UNSW"])
assert list(query7.all()) == ["UNSW"]
Finally, for greatest flexibility the
Query.select()
member function can be passed a
single Python callable object such as a function or lambda expression. The
call signature of this object must match the signature specified in the
FactBase.query()
specification. The output of
this callable are then presented as the results of the query.
query7=fb.query(Person,Pet).join(Person.id == Pet.owner)\
.select(lambda pn,pt: f"{pt.petname} from {pn.address}")
Queries with Ordered Results¶
The Query.order_by()
member function allows for
the ordering of results similar to an SQL ORDER BY
clause. Multiple fields
can be listed as well as being able to specify ascending or descending sort
order; with ascending order being the default and descending order specified by
the desc()
function.
from clorm import desc
query8=fb.query(Pet).order_by(Pet.owner, desc(Pet.petname))\
.select(Pet.owner,Pet.petname)
assert list(query8.all()) == [("dave","Frank"),("dave","Bob"),
("morri","Fido"),("morri","Dusty")]
Grouping the Query Results¶
Query results can be grouped in a similarly to an SQL GROUP BY
clause using
the Query.group_by()
member function . An
important distinction between SQL and Clorm’s grouping mechanism is that Clorm
does not support query aggregate functions, so any aggregating needs to be
performed outside the query specification itself.
The Query.group_by()
clause modifies the
behaviour of the output of the generator returned
Query.all()
. Instead of simply iterating over the
individual items, the iterator returns pairs where the first element of the pair
is the group identifier (based on the group_by
specification) and the second
element is an iterator over the matching elements within the group.
query9=fb.query(Pet).group_by(Pet.owner)\
.order_by(desc(Pet.petname)).select(Pet.petname)
result = [(oname, list(petnames)) for oname,petnames in query9.all()]
assert result == [("dave",["Frank","Bob"]),("morri",["Fido","Dusty"])]
Querying by Positional Arguments¶
As well as querying by field name (or sub-field name) it is also possible to query by the field (sub-field) position.
query10=fb.query(Pet).where(Pet[0] == "dave").order_by(Pet[1])
However, earlier warnings still hold; use positional arguments sparingly and only in cases where the order of elements will not change as the ASP code evolves.
Querying Predicates with Complex Terms¶
Querying Predicates with complex terms is no different to the simple case. A
chain of “.” notation expressions and positional arguments can be used to
identify the appropriate field. For example we can replace the Person
definition earlier to something containing a tuple:
from clorm import Predicate, ConstantField, StringField, FactBase
class PersonAlt(Predicate):
id = ConstantField
address = (StringField,StringField)
dave = PersonAlt(id="dave", address=("Newcastle","UNSW"))
morri = PersonAlt(id="morri", address=("Sydney","UNSW"))
torsten = PersonAlt(id="torsten", address=("Potsdam","UP"))
fb2 = FactBase([dave,morri,torsten])
query11=fb2.query(PersonAlt)\
.where(PersonAlt.address[1] == "UNSW")\
.select(PersonAlt.address[0])\
.order_by(PersonAlt.address[1])
assert list(query11.all()) == ["Newcastle","Sydney"]
Complex Query Expressions¶
So far we have only seen Clorm’s support for queiries with a single where
clause, such as:
query12=fb.query(Pet).where(Pet.owner == "dave")
However, more complex queries can be specified. Firstly, a where
clause can
consist of a comma seperated list of clauses. These are treated as a
conjunction:
# Search for pets named Bob that are owned by dave
query13=fb.query(Pet).where(Pet.petname == "Bob", Pet.owner == "dave")
assert query13.singleton() == dave_dog
It is also possible to specify more complex queries using the overloaded logical
operators &
, |
, and ~
.
# Find the Person with id "torsten" or whose university address is not "UP"
query14=fb2.query(PersonAlt)\
.where((PersonAlt.id == "torsten") | ~(PersonAlt.address[1] == "UP"))
assert set(query14.all()) == set([dave,morri,torsten])
# Find the Person with id "dave" and with address "UNSW"
query15=fb2.query(PersonAlt)\
.where((PersonAlt.id == "dave") & (PersonAlt.address[1] == "UNSW"))
assert query15.singleton() == dave
Clorm also provides the explicit functions and_()
,
or_()
, and not_()
for these logical operators,
but the overloaded syntax is arguably more intuitive. With the explicit
functions the above could also be written as:
query14alt=fb2.query(PersonAlt)\
.where(or_(PersonAlt.id == "torsten", not_(PersonAlt.address[1] == "UP")))
query15alt=fb2.query(PersonAlt)\
.where(and_(PersonAlt.id == "dave", PersonAlt.address[1] == "UNSW"))
Finally, it is also possible to test for membership of a collection using the
in_()
and notin_()
functions.
query16=fb2.query(PersonAlt).where(in_(PersonAlt.id, ["dave","bob","sam"])
assert query16.singleton() == dave
Queries with Parameters¶
To support more flexible queries Clorm provides placeholders as a means of
parameterising queries. Placeholders are named ph1_
to ph4_
and
correspond to the positional parameters. These parameters are bounds to actual
values by calling Query.bind()
where the input
parameter to the function call must match the declared placeholders.
from clorm import ph1_, ph2_
query12=fb.query(Pet).where((Pet.owner == ph1_) & (Pet.petname == ph2_))
assert query12.bind("dave","Bob").singleton() == dave_dog
assert query12.bind("dave","Fido").count() == 0
Additional placeholders can be defined using the ph_()
function. For
example, ph_(5)
will create a placeholder for the 5th positional argument.
Clorm also supports named placeholders, which may be preferable if there are
a larger number of parameters. A named placeholder is created by calling the
ph_()
function with a non-numeric first parameter, and are referenced
in the call to Query.bind()
using keyword function
parameters. An advantange of named placeholders is that they allow for a default
value to be set.
from clorm import ph_
query13=fb.query(Pet).where(Pet.owner == ph_("owner","dave"))
assert set(query13.all()) == set([dave_dog,dave_cat])
assert set(query13.bind(owner="morri").all()) == set([morri_cat,morri_cat2])
Querying Negative Facts/Complex-Terms¶
ASP problems can often by compactly modelled using only default negation instead of strong negation. Because of this the use of explicitly negated literals is not particularly common in ASP programs.
Nevertheless Clorm does support negated facts and the Clorm query mechanism support querying based on the sign of a fact or complex term.
from clorm import IntegerField
class P(Predicate):
a = IntegerField
p1 = P(1)
neg_p2 = P(2,sign=False)
fb3 = FactBase([p1,neg_p2])
assert fb3.query(P).where(P.sign == True).singleton() == p1
assert fb3.query(P).where(P.sign == False).singleton() == neg_p2
Querying the Predicate Itself¶
While it is possible to query fields (and sub-fields) of a predicate using the
intutive “.” syntax (eg., Pet.owner == ph1_
), unfortunately, it is not
possible to provide this intuitive syntax for querying the predicate itself
(e.g., a query of Pet < ph1_
will fail).
Instead a helper function path()
is provided for this special case.
from clorm import path
query14=fb.query(Pet).where(path(Pet) == dave_dog)
assert query14.count() == 1
Note, querying by the predicate itself is a boundary case. While testing for
equality or inequality makes sense semantically, the semantics of a query based
on an ordering operator doesn’t always make sense (eg., path(Pet) < dave_dog
).
Furthermore, when testing for equality or inequality it is usually simpler to not use the query mechanism and instead to use the basic Python set inclusion operation:
assert dave_dog in fb
Queries that modify the FactBase¶
Querying can be used to modify the underlying FactBase
to acheive a similar effect to an SQL
DELETE
or UPDATE
query. The Query.delete()
end-point provides
a mechanism to delete the matching facts of a query from the underlying FactBase
.
For example, to delete the pets owned by people with the address “UNSW”, we can identify the matching pets in the query.
fb.query(Person,Pet).join(Person.id == Pet.owner).where(Person.address == "UNSW")\
.select(Pet).delete()
Clorm facts are immutable, so it is not possible to modify the facts themselves in the same way that
one might want to perform an SQL UPDATE
query. Nevertheless it is possible to provide query
functions to make it easy to replace the selected facts within the FactBase
. The
Query.replace()
and Query.modify()
end-points provides the mechanism to modify the underlying FactBase
based on the matches of a
query.
For example, rather than deleting all pets owned by people living in “UNSW” we can instead use the
Query.replace()
end-point to assign these pets to a new owner,
“Rob”, replacing the existing Pet fact with a modified cloned fact.
def change_owner(pet):
return pet.clone(owner="Rob")
fb.query(Person,Pet).join(Person.id == Pet.owner).where(Person.address == "UNSW") \
.select(Pet).replace(change_owner)
The Query.replace()
method takes a single function as an input. The
input signature of the function must match the query selection criteria. The expected output of the
function is a fact or set of facts that will be used to replace the matched facts. The matched facts
are deleted and the replacements inserted in their place.
The Query.modify()
method is a more general version of the
Query.replace()
method. This allows for greater control over which
facts are deleted. In this case the parameter function must return a pair of fact sets. The first
set contains the facts to be deleted and the second set the facts to be inserted.
Note, the behaviour of these modifying queries could also achieved by simply iterating over the results of a “normal” query and explictly buidling the delete and add lists. The advantage of using the special end-point methods is that it is more declarative and therefore more succint and less error prone. This can be especially convenient when chaining multiple modifications of a factbase.
FactBases with Indexes¶
A typical ASP program has models that contain relatively small numbers of facts
(e.g., 10-100 facts). With such small numbers of facts, querying these facts
from a FactBase
can often be done without regard to performance
considerations, since the solving of the combinatorial ASP problem will often
dominate.
However, as the number of the number of facts increases so to does the cost of
querying these facts from a FactBase
. Eventually this can lead
to a noticeable impact of performance.
In order to alleviate this problem a FactBase
can be defined
with indexes for one of more fields.
To highlight this the following example creates a simple test predicate that has
two fields. Instances are created where the two fields have identical values,
and these instances are added to a FactBase
where one field is
indexed and the other is not.
class Num(Predicate):
to_idx=IntegerField
not_to_idx=IntegerField
fb4 = FactBase([Num(to_idx=n,not_to_idx=n) for n in range(0,100000)], indexes=[Num.to_idx])
We can now compare the timing differences between searching for a value where one query searches for a value based on the indexed field and the other query searches for the same value based on the non-indexed field.
import time
query15=fb4.query(Num).where(Num.to_idx == 50000)
query16=fb4.query(Num).where(Num.not_to_idx == 50000)
start_q15 = time.time()
assert query15.count() == 1
q15_time = time.time() - start_q15
start_q16 = time.time()
assert query16.count() == 1
q16_time = time.time() - start_q16
assert q15_time < q16_time
print("Indexed search {} vs non-indexed search {}".format(q15_time,q16_time))
To confirm that these two queries are indeed behaving differently we can examine
the query plans for the respective queries by calling the
Query.query_plan()
methods.
print("Querying without indexing:\n{}\n".format(query15.query_plan()))
print("Query with indexing:\n{}\n".format(query16.query_plan()))
Note, currently, there is no official API for a query plan object so it is only
possible to print the object for manual examination. The key aspect to notice
here is that the search on the indexed field appears as a keyed search
whereas the search on the non-indexed field appears as a filter
clause
. Essentially the non-indexed search has to examine every fact in the
fact base while the indexed search doesn’t.
Querying without indexing:
------------------------------------------------------
QuerySubPlan:
Input Signature: ()
Root path: Num
Indexes: (Num.to_idx,)
Prejoin keyed search: [ Num.to_idx == 50000 ]
Prejoin filter clauses: None
Prejoin order_by: None
Join key: None
Post join clauses: None
Post join order_by: None
------------------------------------------------------
Query with indexing:
------------------------------------------------------
QuerySubPlan:
Input Signature: ()
Root path: Num
Indexes: (Num.to_idx,)
Prejoin keyed search: None
Prejoin filter clauses: ( [ Num.not_to_idx == 50000 ] )
Prejoin order_by: None
Join key: None
Post join clauses: None
Post join order_by: None
------------------------------------------------------
A final note. As with indexing in databases, the use of indexes should be monitored carefully. The speed up in search must always be balanced the cost of constructing and maintaining the index.