Pattern development

Patterns are defined as SQL embedded in XML. Patterns may use macro variables, which always start with @. The following macro variables are replaced automatically, in the order listed:

  1. @base: is replaced with “@baseId, @baseDate, @baseTarget” if @baseDate is defined, otherwise it is replaced with “@baseId, @baseTarget”. Each pattern should return @base and @columnName.
  2. @basePartitionBy: is replaced with “@baseId, @baseDate” if @baseDate is defined, otherwise it is replaced with “@baseId”. Use this substitution instead of @base in partition by clauses to make the query faster (by approximately 10%).
  3. @baseId: column(s) that is/are defined as Target Id (composite Target Id).
  4. @baseDate: a single column that is defined as Target Date.
  5. @baseTarget: column(s) that is/are defined as Target Column (multiple independent targets are permissible).
  6. @numericalColumn: the name of a numerical attribute in @propagatedTable.
  7. @nominalColumn: the name of a categorical attribute in @propagatedTable.
  8. @temporalColumn: the name of a temporal attribute in @propagatedTable.
  9. @propagatedTable: a single table name.
  10. @columnName: autogenerated name of the predictor.
  11. @targetName: a single column from @baseTarget. Intended for Weight of Evidence.
  12. @targetValue: a value of @targetName column. Intended for Weight of Evidence.
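The substitution order described above can be sketched in Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the helper `expand_macros`, the `settings` dictionary, and the sample table and column names are all hypothetical. The sketch expands the composite macros (@base, @basePartitionBy) first, so that the simple macros they introduce are resolved by the later substitutions.

```python
import re

# Simple macros, in the order listed above (keys of the hypothetical settings dict).
SIMPLE_MACROS = [
    "baseId", "baseDate", "baseTarget",
    "numericalColumn", "nominalColumn", "temporalColumn",
    "propagatedTable", "columnName", "targetName", "targetValue",
]


def expand_macros(pattern, settings):
    """Expand macro variables in a pattern (hypothetical helper, not the tool's code)."""
    has_date = bool(settings.get("baseDate"))
    # Composite macros come first; their expansion depends on whether @baseDate is defined.
    ordered = [
        ("@base", "@baseId, @baseDate, @baseTarget" if has_date else "@baseId, @baseTarget"),
        ("@basePartitionBy", "@baseId, @baseDate" if has_date else "@baseId"),
    ]
    ordered += [("@" + key, settings[key]) for key in SIMPLE_MACROS if settings.get(key)]
    for macro, value in ordered:
        # The negative lookahead keeps "@base" from matching inside "@baseId" and friends.
        regex = re.escape(macro) + r"(?![A-Za-z0-9_])"
        pattern = re.sub(regex, lambda _match, v=value: v, pattern)
    return pattern


# Illustrative pattern and settings; all names below are made up for the example.
pattern = (
    "select @base, avg(@numericalColumn) as @columnName "
    "from @propagatedTable group by @base"
)
settings = {
    "baseId": "customer_id",
    "baseDate": "event_date",
    "baseTarget": "churned",
    "numericalColumn": "amount",
    "propagatedTable": "propagated_orders",
    "columnName": "orders_amount_avg",
}
expanded = expand_macros(pattern, settings)
print(expanded)
```

Dropping "baseDate" from the settings makes @base expand to the Target Id and Target Column only, which is why patterns written against @base and @basePartitionBy keep working when no Target Date is defined.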

Characters to escape in XML documents

There are only five:

" &quot;
' &apos;
< &lt;
> &gt;
& &amp;
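As a minimal sketch of applying these five escapes, the Python standard library can escape a SQL fragment before it is placed into an XML document. The element name `code` and the sample query are assumptions made for illustration; the actual pattern schema may use different element names.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() handles &, < and > by default; the two quote characters are added explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}

# Illustrative pattern body containing characters that must be escaped in XML.
sql = (
    "select @base, count(*) as @columnName from @propagatedTable "
    "where @nominalColumn <> 'unknown' and @numericalColumn < 10 group by @base"
)

escaped = escape(sql, QUOTES)

# "code" is a hypothetical element name used only for this round-trip check.
document = "<code>" + escaped + "</code>"
restored = ET.fromstring(document).text
print(escaped)
```

Parsing the document back returns the original SQL unchanged, which is a quick way to check that a pattern has been escaped correctly.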

Naming convention for patterns

  • bare name, if possible, to keep the names short
  • use “aggregate_” prefix, if necessary, to differentiate the pattern from a “direct” pattern
  • use “aggregate_time_” prefix, if necessary, to differentiate the pattern from “direct” and “aggregate” patterns

General instructions for pattern developers

  1. Do not escape entities - they are escaped automatically with the escape symbol used by the database.
  2. Do not add database names and schemas - they are added automatically to make the pattern transferable between databases.
  3. Do not terminate the statement with a semicolon, as termination is handled by the database driver.
  4. The pattern has to consist of a single query. Multi-query patterns are not supported.
  5. The pattern should avoid unnecessary joins and sorting.
  6. If you need to join tables on @basePartitionBy, use “using(@basePartitionBy)” instead of writing “on t1.@baseId=t2.@baseId and t1.@baseDate=t2.@baseDate” to make the pattern work even if @baseDate is not defined. If the database does not support the “using” clause, it is automatically translated to an “on” clause.
  7. The select should return four columns: @baseId, @baseDate, @baseTarget and the predictor named @columnName. @baseTarget is required for calculating the predictor's predictive power. @baseId and @baseDate are required for the join with the base table.
  8. Grouping should be performed on {@baseId, @baseDate} level to support multiple target events per @baseId.
  9. It is permissible to use ODBC escape sequences (e.g. {fn log(x)} calculates the natural logarithm).
  10. It is not necessary to add a time condition - records are already filtered by time during base propagation.
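The translation of a “using” clause into an “on” clause mentioned in instruction 6 can be illustrated with a short Python sketch. It is not the tool's translator: the function `using_to_on` is hypothetical, and it assumes that the two joined tables are aliased t1 and t2 and that the column list has already been expanded from @basePartitionBy.

```python
import re


def using_to_on(sql, left="t1", right="t2"):
    """Rewrite using(a, b) as an equivalent on clause (hypothetical helper)."""

    def rewrite(match):
        # Split the column list and pair each column across the two aliases.
        columns = [column.strip() for column in match.group(1).split(",")]
        conditions = [f"{left}.{c} = {right}.{c}" for c in columns]
        return "on " + " and ".join(conditions)

    return re.sub(r"\busing\s*\(([^)]*)\)", rewrite, sql, flags=re.IGNORECASE)


# With a Target Date defined, @basePartitionBy expands to two columns.
with_date = using_to_on("select * from a t1 join b t2 using(customer_id, event_date)")

# Without a Target Date, the same pattern expands to a single column.
without_date = using_to_on("select * from a t1 join b t2 using(customer_id)")

print(with_date)
print(without_date)
```

Because the column list comes from a single macro, the same pattern yields a valid join in both cases, whereas a hand-written “on” clause that names @baseDate would fail when no Target Date is defined.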
