database.xml

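Defines the properties of propositionalization: where the input data is stored, which table and columns hold the target, and how the features are generated.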

Parameters

  • name: Unique name by which the configuration can be called from the command line.
  • description: Human-readable note. Ignored by Predictor Factory.
  • inputSchema: Comma-delimited list of the schemas that contain the input data.
  • outputSchema: Schema to store the table with the features. Default: the first inputSchema.
  • targetSchema: The schema with the target table. Default: the first inputSchema.
  • targetTable: The name of the table with the label.
  • targetId: Comma-delimited list of the columns that identify rows in the target table.
  • targetColumn: Comma-delimited list of the columns to predict.
  • targetDate: The name of the column with the time of the desired prediction.
  • unit: The temporal unit used for temporal constraint. Default: “year”.
  • lag: Length of the history window used to create the patterns. Default: 100.
  • lead: Length of the blackout period, i.e. the gap between the most recent data used and the time of the prediction. Default: 0.
  • task: Classification or regression. Default: “classification”.
  • sampleCount: How many rows to use from the target table for each unique value. Default: 2 147 483 647 (the largest 32-bit signed integer, so effectively no limit).
  • predictorMax: The maximum number of features to return. Default: the estimated maximum number of columns per table in the database, minus the number of columns taken by targetId, targetColumn and targetDate.
  • secondMax: Timeout on the feature calculation. If missing, no timeout is applied.
  • useIdAttributes: Whether to use primary and foreign keys as an input for the features. Default: true.
  • useTwoStages: Whether to first explore the data with a small sample count before running on all records in the target table. Default: true.
  • ignoreDatabaseForeignConstraints: If the foreign key constraints (FKCs) defined in the database are incomplete or unreliable, it can be better to ignore them completely and use the FKC definitions from a DDL or XML file exclusively. Default: false.
  • whiteListTable: If defined, use only the listed tables as the input. Otherwise use all the tables in the inputSchema.
  • blackListTable: If defined, do not use the listed tables. Has priority over whiteListTable.
  • whiteListColumn: If defined, use only the listed columns as the input. The notation is “table.column”. Otherwise use all the columns in the used tables.
  • blackListColumn: If defined, do not use the listed columns. Has priority over whiteListColumn.
  • whiteListPattern: If defined, use only the listed patterns. Otherwise use all patterns.
  • blackListPattern: If defined, do not use the listed patterns. Has priority over whiteListPattern.

Example

<databases>
    <database name="GUI" 
              inputSchema="financial" 
              outputSchema="predictor_factory"
              targetTable="loan" 
              targetId="account_id" 
              targetDate="date" 
              targetColumn="status" 
              task="classification"/>
</databases>
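
A second sketch, not taken from a real deployment, shows how the temporal and filtering parameters listed above might be combined. It assumes that list-valued attributes such as whiteListTable are comma delimited like inputSchema, that "month" is an accepted value of unit, and that lag and lead are counted in that unit. The configuration name, the numeric values, and the table and column names are illustrative only.

```xml
<databases>
    <!-- Hypothetical configuration; every value below is illustrative. -->
    <!-- Assumption: lag and lead are counted in "unit", so this asks for
         12 months of history with a blackout of 1 month. -->
    <!-- Assumption: whiteListTable accepts a comma delimited list. -->
    <database name="financial_monthly"
              description="Loan status with a 12 month history window"
              inputSchema="financial"
              outputSchema="predictor_factory"
              targetTable="loan"
              targetId="account_id"
              targetDate="date"
              targetColumn="status"
              task="classification"
              unit="month"
              lag="12"
              lead="1"
              sampleCount="1000"
              secondMax="60"
              whiteListTable="loan,account,trans"
              blackListColumn="trans.k_symbol"/>
</databases>
```

Under these assumptions, features would be built only from the three whitelisted tables, with the column trans.k_symbol excluded by the blacklist.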