Command Line Interface

You can use CLASH via the command line interface embedded into the built jarfile. Alias clashcli to java -jar $PATH_TO_CASH_JAR if you want it quicker.

Verify that clash is working by using one of these:

clashcli --help
clashcli --version

Getting CLASH’s capabilities

You can get information about the supported features of clash using a series of flags:

clashcli --supported-global-strategies
clashcli --supported-probe-order-strategies
clashcli --supported-partitioning-attribute-selection-strategies

Query Parsing

For parsing a SQL query, for example for quickly checking syntactical correctness, the query subcommand can be used. It will show the interpreted subcommand as JSON or show an error.

For example:

clashcli query -- "SELECT r.a, s.b FROM r, s WHERE r.c = s.d"

Yields:

{
  "binaryPredicates": ["r.c = s.d"],
  "unaryPredicates": [],
  "query": "SELECT r.a, s.b FROM r, s WHERE r.c = s.d",
  "baseRelations": [
    {"inner": "r"},
    {"inner": "s"}
  ],
  "baseRelationAliases": [
    {"inner": "r"},
    {"inner": "s"}
  ]
}

An erroneous query is detected:

clashcli query -- "SELECT r.a, s.b FROM r, s, r.c = s.d"

Yields this to stdout (also, catch stderr):

{
  "query": "SELECT r.a, s.b FROM r, s, r.c = s.d",
  "error": "net.sf.jsqlparser.JSQLParserException"
}

In both examples, after query two dashes -- are written. This stops the query parser from interpreting certain symbols in the following strings.

Optimization

Optimization tasks can be given via JSON documents, specifying everything needed.

{
  "query": "SELECT ... "
  "dataCharacteristics": { ... },
  "optimizationParameters": {
    "taskCapacity": 1000000,
    "availableTasks": 100,
    "globalStrategy": { "name": "Flat", ... },
    "probeOrderOptimizationStrategy": { "name": "LeastSent", ... },
    "partitioningAttributesSelectionStrategy": { "name": "Explicit", ... }
  },
}

The result will consist of two parts, once the optimization result, which is an internal representation, e.g., of a materialization tree, and a physical graph, which ultimately will be deployed as a Storm topology.

{
  "optimizationResult": { ... }
  "physicalGraphResult": { ... }
}

The json task is called via

clashcli query -- "{}"

you have to take care of proper escaping of quotes.

Running Storm

With the command

clashcli storm

you can run storm. You can start a local test server using the --local flag as follows:

clashcli --local -- $QUERY $DATACHARACTERISTICS $OUTSIDEINTERFACE

Or run it on a remote nimbus using:

clashcli --nimbus dbis-exp1 --config config.yaml -- $QUERY $DATACHARACTERISTICS $OUTSIDEINTERFACE

ATTENTION: This running storm remotely assumes some things: First, there should be a jar without storm sources next to the jar you are running. For example, if the jar is named clash-0.2.0.jar, the jar without storm sources should be named clash-0.2.0-stormCluster.jar.

The config you provide is optional, but will be used for setting up the topology. This is useful, e.g., for setting Postgres or Kafka connection options which are specific to the cluster you are running on but not the actual query.

Further you have to provide arguments $QUERY, $DATACHARACTERISTICS, and $OUTSIDEINTERFACE. $QUERY is a query string as above. $DATACHARACTERISTICS is a JSON-String containing the data characteristics. $OUTSIDEINTERFACE explains how to wire up the spouts with data sources and the result bolt with data sinks, see OutsideInterfaces.

Implementation Notes

The CLI implementation is found in the de.unikl.dbis.clash.api package. The CLI aspects are implemented using the excellent clikt package.