Datasets
A dataset is a collection of records ingested into TAL. Each record has numeric fields, categorical fields, and optional hints. TAL fingerprints the dataset deterministically on ingest — the same records always produce the same fingerprint, regardless of upload order.
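One way to get an order-independent fingerprint is sketched below. This page does not document TAL's actual algorithm, so the whole approach is an assumption: the helper names (`canonicalize`, `sha256Hex`, `fingerprintDataset`), the choice of SHA-256, and the sort-then-hash strategy are illustrative only. The idea is to serialize each record with sorted keys, hash it, sort the per-record digests, and hash the sorted list.

```typescript
import { createHash } from "node:crypto";

// Hypothetical JSON value type for this sketch; not TAL's actual types.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Serialize with object keys sorted, so records that differ only in
// key order produce the same string.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hash each record, sort the digests, then hash the sorted list.
// Sorting the digests removes any dependence on upload order.
function fingerprintDataset(records: Json[]): string {
  const digests = records.map((r) => sha256Hex(canonicalize(r))).sort();
  return sha256Hex(digests.join("\n"));
}
```

With this sketch, `fingerprintDataset([a, b])` and `fingerprintDataset([b, a])` return the same hex string, which is the property described above.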
Record schema
{
  "id"?: string, // optional; derived deterministically if omitted
  "numericFields": { [k: string]: number },
  "categoricalFields": { [k: string]: string },
  "hints"?: string[], // optional categorization hints
  "sourceTrace"?: {
    "origin": "upload" | "api" | "sample" | "sync",
    "originRef"?: string, // filename / endpoint / sync source
    "rowIndex"?: number
  }
}
Sample datasets
Six canonical demos ship with TAL. Use them to exercise the engine without bringing your own data.
- 18 records, 3 numeric fields (revenue, jobs, leads), 2 categorical fields (region, service). The canonical worked example.
- Synthetic D2C orders with revenue, units, AOV, discount %, channel, region.
- Monthly account metrics: MRR, seats, support tickets, churn risk score, plan tier.
- Daily covers, average ticket, wait time, table turn, day-of-week, weather bucket.
- Client engagement: hours billed, retainer size, project stage, NPS, vertical.
- Member activity: visits, class signups, plan tier, churn flag, acquisition source.
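To make the record schema concrete, here is a minimal sketch of a record shaped like the first sample dataset (revenue, jobs, leads; region, service). The type names `TalRecord` and `SourceTrace`, the helper `isTalRecord`, and every field value are hypothetical; the values are made up for illustration and are not taken from the shipped sample data.

```typescript
// Mirrors the record schema on this page as TypeScript types.
type Origin = "upload" | "api" | "sample" | "sync";

interface SourceTrace {
  origin: Origin;
  originRef?: string; // filename / endpoint / sync source
  rowIndex?: number;
}

interface TalRecord {
  id?: string; // optional; derived deterministically if omitted
  numericFields: Record<string, number>;
  categoricalFields: Record<string, string>;
  hints?: string[];
  sourceTrace?: SourceTrace;
}

// Illustrative values only; not real sample data.
const exampleRecord: TalRecord = {
  numericFields: { revenue: 12500, jobs: 14, leads: 37 },
  categoricalFields: { region: "north", service: "installation" },
  sourceTrace: { origin: "sample", rowIndex: 0 },
};

// Hypothetical helper: runtime check of the two required field maps.
function isTalRecord(value: unknown): value is TalRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isMapOf = (m: unknown, t: "number" | "string"): boolean =>
    typeof m === "object" &&
    m !== null &&
    !Array.isArray(m) &&
    Object.values(m).every((x) => typeof x === t);
  return isMapOf(v.numericFields, "number") && isMapOf(v.categoricalFields, "string");
}
```

The example omits `id` on purpose, since the schema marks it optional. The guard checks only the required maps; optional fields such as `hints` and `sourceTrace` are left unvalidated to keep the sketch short.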
Dataset lifecycle
- 01 Upload or POST records to /api/v2/datasets. TAL canonicalizes and fingerprints.
- 02 Attach one or more hierarchy trees. The same records can be projected through multiple trees (MHC).
- 03 Run an analysis. TAL emits findings with per-result confidence and provenance.
- 04 Re-run any audit-logged job via verify; the output must be byte-identical to the original run.
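Steps 01 and 04 above can be sketched as follows. Only the `/api/v2/datasets` path comes from this page. The base URL, the `{ records: [...] }` payload envelope, the JSON content type, and the helper names `buildIngestRequest` and `byteIdentical` are assumptions, so check them against the API reference before relying on them.

```typescript
// Hypothetical request description; the shape of `init` matches what
// the standard fetch() function accepts as its second argument.
interface IngestRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Step 01: build a POST to /api/v2/datasets.
// Assumption: records are wrapped in a `{ records: [...] }` envelope.
function buildIngestRequest(baseUrl: string, records: object[]): IngestRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/v2/datasets",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ records }),
    },
  };
}

// Step 04: a re-run must be byte-identical, so compare raw bytes
// rather than parsed JSON (parsing would hide whitespace or key-order drift).
function byteIdentical(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```

A caller would pass the result to `fetch(req.url, req.init)`. Building the request separately from sending it keeps the sketch testable without a network.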