Inputs

Datasets

A dataset is a collection of records ingested into TAL. Each record has numeric fields, categorical fields, and optional hints. TAL fingerprints the dataset deterministically on ingest — the same records always produce the same fingerprint, regardless of upload order.
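The order-independence described above can be illustrated with a minimal sketch. TAL's actual fingerprinting algorithm is not specified here, so everything below is an assumption: the function names `canonical_record` and `fingerprint_dataset` are hypothetical, and SHA-256 over sorted canonical JSON is just one common way to get a fingerprint that ignores upload order.

```python
import hashlib
import json


def canonical_record(record: dict) -> str:
    """Serialize one record to a stable JSON string (sorted keys, fixed separators)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def fingerprint_dataset(records: list[dict]) -> str:
    """Hash the sorted canonical records so that upload order does not matter.

    Assumption: this is an illustrative scheme, not TAL's documented algorithm.
    """
    canonical = sorted(canonical_record(r) for r in records)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # delimiter so record boundaries stay unambiguous
    return digest.hexdigest()


a = {"numericFields": {"revenue": 1200.0}, "categoricalFields": {"region": "north"}}
b = {"numericFields": {"revenue": 950.0}, "categoricalFields": {"region": "south"}}

# The same records in a different order give the same fingerprint.
assert fingerprint_dataset([a, b]) == fingerprint_dataset([b, a])
```

Note that this sketch hashes every field of each record. A field that depends on upload position, such as `sourceTrace.rowIndex`, would change the result, so a real implementation would likely need to exclude or normalize it. Whether TAL does so is not stated in this section.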

Record schema

{
  "id"?:                string,         // optional — derived deterministically if omitted
  "numericFields":      { [k: string]: number },
  "categoricalFields":  { [k: string]: string },
  "hints"?:             string[],       // optional categorization hints
  "sourceTrace"?:       {
    "origin":           "upload" | "api" | "sample" | "sync",
    "originRef"?:       string,         // filename / endpoint / sync source
    "rowIndex"?:        number
  }
}
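
As a rough illustration of the schema above, the following sketch checks a record client-side before sending it. The function `validate_record` and its error messages are hypothetical and not part of TAL. It also assumes that `numericFields` and `categoricalFields` are required, since the schema does not mark them optional.

```python
from numbers import Real

ALLOWED_ORIGINS = {"upload", "api", "sample", "sync"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []

    if "id" in record and not isinstance(record["id"], str):
        problems.append("id must be a string when present")

    numeric = record.get("numericFields")
    if not isinstance(numeric, dict):
        problems.append("numericFields is required and must be an object")
    else:
        for key, value in numeric.items():
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"numericFields.{key} must be a number")

    categorical = record.get("categoricalFields")
    if not isinstance(categorical, dict):
        problems.append("categoricalFields is required and must be an object")
    else:
        for key, value in categorical.items():
            if not isinstance(value, str):
                problems.append(f"categoricalFields.{key} must be a string")

    hints = record.get("hints")
    if hints is not None:
        if not (isinstance(hints, list) and all(isinstance(h, str) for h in hints)):
            problems.append("hints must be a list of strings when present")

    trace = record.get("sourceTrace")
    if trace is not None:
        if not isinstance(trace, dict) or trace.get("origin") not in ALLOWED_ORIGINS:
            problems.append("sourceTrace.origin must be upload, api, sample, or sync")
        else:
            if "originRef" in trace and not isinstance(trace["originRef"], str):
                problems.append("sourceTrace.originRef must be a string")
            row = trace.get("rowIndex")
            if row is not None and (isinstance(row, bool) or not isinstance(row, Real)):
                problems.append("sourceTrace.rowIndex must be a number")

    return problems


record = {
    "numericFields": {"revenue": 1200.0, "jobs": 4, "leads": 11},
    "categoricalFields": {"region": "north", "service": "mowing"},
    "sourceTrace": {"origin": "upload", "originRef": "q1.csv", "rowIndex": 0},
}
assert validate_record(record) == []
```

The field values in the example record are made up for illustration. Only the field names come from the landscaping sample described in this section.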

Sample datasets

Six canonical demos ship with TAL. Use them to exercise the engine without bringing your own data.

landscaping (18 rows)

18 records, 3 numeric fields (revenue, jobs, leads), 2 categorical (region, service). The canonical worked example.

3 numeric, 2 categorical

ecommerce (1,024 rows)

Synthetic D2C orders with revenue, units, AOV, discount %, channel, region.

6 numeric, 3 categorical

saas (512 rows)

Monthly account metrics: MRR, seats, support tickets, churn risk score, plan tier.

5 numeric, 2 categorical

restaurant (365 rows)

Daily covers, average ticket, wait time, table turn, day-of-week, weather bucket.

4 numeric, 3 categorical

agency (256 rows)

Client engagement: hours billed, retainer size, project stage, NPS, vertical.

4 numeric, 2 categorical

fitness (480 rows)

Member activity: visits, class signups, plan tier, churn flag, acquisition source.

4 numeric, 3 categorical

Dataset lifecycle

01 Upload or POST records to /api/v2/datasets. TAL canonicalizes and fingerprints.
02 Attach one or more hierarchy trees. The same records can be projected through multiple trees (MHC).
03 Run an analysis. TAL emits findings with per-result confidence and provenance.
04 Re-run any audit-logged job via verify. The output must be byte-identical to the original run.
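
Steps 01 and 04 above can be sketched as follows. Only the path `/api/v2/datasets` comes from this section. The host `tal.example.com`, the `{"records": [...]}` payload envelope, and the helper names `build_dataset_request` and `is_byte_identical` are assumptions made for illustration. The request is built but not sent, so the sketch runs without a TAL deployment.

```python
import json
import urllib.request

BASE_URL = "https://tal.example.com"  # hypothetical host; substitute your deployment


def build_dataset_request(records: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the POST for step 01."""
    # Assumption: records are wrapped in a {"records": [...]} envelope.
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v2/datasets",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_byte_identical(original: bytes, rerun: bytes) -> bool:
    """Step 04's check: a verified re-run must match the original byte for byte."""
    return original == rerun


request = build_dataset_request(
    [{"numericFields": {"revenue": 1200.0}, "categoricalFields": {"region": "north"}}]
)
print(request.get_method(), request.full_url)
```

Comparing raw bytes rather than parsed JSON is deliberate: two outputs that parse to the same object but differ in key order or whitespace would not count as byte-identical.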