Queries and Relations
Queries in Clash are declarative. The user is in principle only allowed to explain what they want to have and Clash decides how this result is produced.
Syntax
The syntax of queries is a subset of SQL. For example, these two are Clash queries:
SELECT s.name, s.address
FROM students s
WHERE s.semester > 10
SELECT a.foo
FROM topic a('sliding', '4', 'hours')
We are currently restricted to simple select-from-where-queries without nested subqueries or aggregations.
Definitions
In this project we (try to) consistently use the terms explained in this section. Consider the following query:
SELECT a.x, b.y
FROM alice a, bob b
WHERE a.q = 1 AND a.r = b.s
- InputName
- The name of a Input (relation). In the example,
alice
andbob
are inputs. The system has to know how to get real data from these inputs, which is an orthogonal task to formulating a query. - Alias
- the name of inputs used downstream in the queries. In this case,
a
andb
. If no alias is specified, the InputName is used as alias; so internally an alias is always defined. - Attribute
- Attributes of the relations, e.g.
x
,y
,q
,r
ands
. In order to avoid naming clashes, it is forbidden to use attributes without a qualifying alias. - AttributeAccess
- The pair of alias and attribute, e.g.
('a', 'q')
in the predicatea.q = 1
. - FilterPredicate
- A predicate with a single input, e.g.
a.q = 1
. These can be evaluated at ingestion of the inputs or maybe even pushed down into the data sources. - JoinPredicate
- A predicate with two inputs, e.g.
a.r = b.s
, here tuples from both relations,a
andb
are needed in order to evaluate the predicate
Semantics and Relations
We interpret a (streamed) relation as a specification of tuples. This specification describes:
- inputs
- where do input tuples come from? (real inputs or other relations) This includes window definitions.
- filters
- which filterPredicates are applied to the inputs?
- joinPredicates
- how are multiple inputs joined together?
- aggregations
- defines how aggregated values are derived
- projections
- list of attribute accesses that should be included in the output
- alias
- a name for the generated relation
With such a query, a Relation is defined. The description of a relation comprises of the inputs, and the predicates (unary and binary). The tuples in this relation adhere the predicates.
In general, we interpret a relation as a specification of tuples. Data in relation might stem from various sources and is arranged in tuples. A tuple consists of attributes which satisfy certain properties.
Relations can also be defined in a programmatic way. Consider the following:
r1 = relationOf("R")
r2 = r.where("R.x = 1")
r3 = r.where("R.y = 2")
Here subsequently three relations are created: r1
, r2
and r3
. r1
is a BaseRelation with the base name “R”. r2
is a derived relation with the unary predicate forcing the attribute “R.x” to be 1. r3
is another derived relation with two unary predicates, one carried over from r2
, and the new one demanding “R.y = 2”.
With a join this works as follows:
r1 = relationOf("R")
r2 = r.where("R.x = 1")
s1 = relationOf("S")
rs1 = r.join("S")
rs2 = r.where("R.z = S.z")
The meaning of r1
, r2
, and s1
is like in the previous example. rs1
is now a relation with two base names, “R” and “S”, and the unary predicate “R.x = 1”. rs2
then also has two base names and the unary predicate as well as a binary predicate which describes the join “R.z = S.z”.
Schemas and Tuples of Relations
Tuples of relations need to adhere to a (potentially implicit) schema. Consider the query:
SELECT s.name, s.address
FROM students s
WHERE s.semester > 10
Here, the implicit schema for students
contains attributes name
, address
, and semester
. And for
SELECT *
FROM orders o
WHERE o.state = 'DELIVERED'
the schema for orders
consists of attribute state
as well as an unknown number of further attributes captured by the *
wildcard.
Join queries like the following show the need for qualification of attribute names:
SELECT s.name, o.name
FROM supplier s, orders o
WHERE s.suppkey = o.suppkey
For ease of programming (and enforcing good style from the user ;)) in CLASH all attribute accesses in select and from clauses need to be qualified with the base relation or the alias of the base relation.
Windows
As shortly mentioned above, Clash allows the specification of windows. A window marks for an input how long tuples are valid and thus should be considered potential join partners. For the user this is a way to specify timeliness of relevant tuples. But more important, this is a method to make the infinite stream of data a finite bag of tuples.
For now, we only support windows for input relations.
Connection to the Outside World
TODO.