THE SMALL DATA COMPANY.AI · 2026

Agentic
Data™

Automate Data / DWH / ETL engineering with agents that work side by side with your team - then graduate to full automation.

On-prem lakehouse · Cloud-native · Open standards · Kubernetes-ready

Data engineering is still manual, brittle, and expensive - because critical work is trapped in scripts, legacy ETL tools, and tribal knowledge.

The winners will be those who can integrate, migrate, and operate faster - safely.


Platform at a glance

A modern lakehouse experience - on your infrastructure.

🏢

On-prem first

Deploy behind your firewall. Keep data where it lives.

🏗️

Lakehouse storage

S3-compatible object storage + open table format for ACID transactions and time travel.

☁️

Cloud-native

Kubernetes-ready architecture. Portable across on-prem and private cloud.

⚡

Distributed compute

Distributed computation engine for SQL, batch, and streaming.

🤖

Agentic automation

Agents work side by side with engineers - then automate with guardrails.

📚

Catalog

Discovery, governance & metadata management in one place.


Why we exist

Vision

Automate the work of Data Engineers and DWH/ETL Engineers with agentic automation.

Mission

Build agents that take over repetitive data engineering work - replication, lineage extraction, job analysis, source DB analysis, migrations, and operational support.

Approach

Start side by side (copilot mode) to earn trust. Then graduate to automation with strong guardrails, approvals, and continuous data-level verification.


Agents that work where data work happens

From assisted execution to autonomous operations.

1

Side-by-side

Agent suggests plans, SQL, mappings, and tests. Human reviews & runs. Perfect for migrations and complex pipelines.

2

Automation

Agent executes approved workflows end-to-end: replication, schema drift handling, lineage extraction, and job refactoring.

3

Continuous improvement

Every run creates feedback: quality metrics, incidents, test results. Agents learn what "good" looks like in your environment.


Use-cases

Automate the work across the full data lifecycle.

Replication & ingestion

Sources → Lakehouse
  • CDC / batch ingestion
  • Schema evolution
  • Data products → data contracts

Lineage & observability

Understand and control
  • End-to-end lineage
  • Job & dependency analysis
  • Cost / performance insights

Migration & modernization

Move off legacy ETL
  • IBM DataStage
  • Oracle ODI / OIC
  • SQL / PL/SQL scripts

All use-cases run on shared, secure, governed infrastructure - on-prem or in your VPC.
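
To make the replication & ingestion use-case concrete: below is a minimal sketch, in plain Python, of the schema-evolution idea - adding columns that appear in a CDC batch but are missing from the target table. The `evolve_schema` function and the column names are illustrative, not the platform's API.

```python
from typing import Any

def evolve_schema(current: dict[str, str], batch: list[dict[str, Any]]) -> dict[str, str]:
    """Add columns that appear in a CDC batch but are missing from the table.

    New columns land as nullable strings; type changes are never applied
    automatically - they are left to a human-approved plan.
    """
    evolved = dict(current)
    for record in batch:
        for column in record:
            evolved.setdefault(column, "string")
    return evolved

# A source system added a `discount` column mid-stream:
schema = {"order_id": "bigint", "amount": "decimal(10,2)"}
batch = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00, "discount": "0.10"},
]
print(evolve_schema(schema, batch))
# {'order_id': 'bigint', 'amount': 'decimal(10,2)', 'discount': 'string'}
```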


How it works

Unified agentic data infrastructure

Agent Solutions

Replication · Lineage · Migration · Operations

Agent Creation & Management

Agent Studio · Tools & Skills · Policies · Approvals

Testing & Verification

Automated tests · Data quality checks · Reconciliation · Regression suites

Infrastructure

Catalog · Distributed Compute · S3 Storage + Open Table Format · Connectors

Runs on

Kubernetes · On-prem or private cloud

Automated testing & verification

Fast, repeatable checks that prove an agent's output is correct.

  • Schema & drift detection - expected columns, types, constraints
  • Data contracts - freshness, volume, null/unique rules
  • Reconciliation - row counts, checksums, key-level diffs
  • Semantic tests - referential integrity, business rules
  • Performance - runtime budgets, partitioning, file sizes
  • Regression - golden datasets + snapshot comparisons

Result: faster trust-building and safer automation.
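
Reconciliation, for example, fits in a few lines of plain Python - a sketch assuming source and target rows are already fetched, and that the first column of each row is the key:

```python
import hashlib

def checksum(rows: list[tuple]) -> str:
    """Order-independent checksum over canonicalised rows."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def reconcile(source: list[tuple], target: list[tuple]) -> dict:
    """Cheap checks first; key-level diff only when counts or checksums disagree."""
    report = {
        "row_count_match": len(source) == len(target),
        "checksum_match": checksum(source) == checksum(target),
    }
    if not all(report.values()):
        src = {r[0]: r for r in source}
        tgt = {r[0]: r for r in target}
        report["missing_in_target"] = sorted(set(src) - set(tgt))
        report["missing_in_source"] = sorted(set(tgt) - set(src))
        report["value_diffs"] = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return report

print(reconcile([(1, "a"), (2, "b")], [(1, "a"), (2, "B")]))
# row counts match, checksums differ, value_diffs == [2]
```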


Verification loop

1

Agent generates / changes pipeline

2

Run on sample or staging data

3

Execute automated test suite

4

Produce report + diffs

5

Approve → deploy → monitor
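
The same five steps, written down as a control loop. A minimal sketch - `generate`, `run_on_staging`, `test_suite`, `approved`, and the other hooks are hypothetical stand-ins for platform calls, stubbed here so the example runs:

```python
# Hypothetical hooks: in the real platform these would call Agent Studio,
# the compute engine, and the test runner. Stubbed so the sketch runs.
def generate(request):         return {"name": request, "sql": "SELECT 1"}
def run_on_staging(pipeline):  return {"pipeline": pipeline, "rows": 100}
def test_suite(result):        return {"passed": result["rows"] > 0, "diffs": []}
def publish(report):           print("report:", report)
def approved(report):          return report["passed"]   # stand-in for human sign-off
def deploy(pipeline):          return f"deployed {pipeline['name']}"
def revise(request, report):   return request

def verification_loop(change_request, max_attempts=3):
    """Steps 1-5 above: generate, stage, test, report, approve/deploy."""
    for _ in range(max_attempts):
        pipeline = generate(change_request)        # 1. agent generates / changes pipeline
        result = run_on_staging(pipeline)          # 2. run on sample or staging data
        report = test_suite(result)                # 3. execute automated test suite
        publish(report)                            # 4. produce report + diffs
        if report["passed"] and approved(report):  # 5. approve -> deploy -> monitor
            return deploy(pipeline)
        change_request = revise(change_request, report)
    raise RuntimeError("verification loop exhausted without approval")

print(verification_loop("add customer_segment column"))
```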


Agent Studio & portfolio

Everything you need to ship reliable agents in production.

🛠️

Agent Studio

Build & test agents using natural language.

🔒

Governance

Policies, approvals, and guardrails.

🧪

Automated tests

Data-level verification & regression suites.

🔌

Connectors

Integrate with tools & systems via APIs.

🧩

Tools & Skills

Reusable skills for SQL, compute, data quality, and lineage.

📊

Observability

Dashboards, alerts, and run history.


Migration accelerators

From legacy ETL to modern lakehouse pipelines.

Supported inputs

  • IBM DataStage
  • Oracle ODI
  • Oracle Integration Cloud (OIC)
  • SQL / PL/SQL scripts
  • Custom schedulers / job definitions

Outputs

  • Compute jobs / SQL pipelines
  • Open table format & medallion-style layers
  • Lineage graphs + documentation
  • Automated test suites for regression

Migration flow

  • Ingest definitions (jobs, mappings, scripts)
  • Parse & normalize (control flow + SQL)
  • Map to target patterns (distributed compute / open table format)
  • Generate code + tests (data-level)
  • Validate + iterate (with humans)
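
As one illustration of "parse & normalize": a toy sketch that extracts a normalized plan (target table plus source tables) from a legacy SQL script. Real accelerators parse DataStage and ODI export formats with a proper grammar; the regex here only serves the example.

```python
import re

def normalize_sql_job(sql: str) -> dict:
    """Extract a normalized plan (target, sources) from a legacy SQL script.

    A regex is enough for a toy example; production parsing would use a real
    SQL grammar and handle control flow, parameters, and dialect quirks.
    """
    target = re.search(r"INSERT\s+INTO\s+(\w+)", sql, re.IGNORECASE)
    sources = re.findall(r"(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE)
    return {
        "target": target.group(1).lower() if target else None,
        "sources": sorted({s.lower() for s in sources}),
    }

legacy_job = """
INSERT INTO dw_sales
SELECT s.id, s.amount, c.region
FROM stg_sales s
JOIN stg_customers c ON c.id = s.customer_id
"""
print(normalize_sql_job(legacy_job))
# {'target': 'dw_sales', 'sources': ['stg_customers', 'stg_sales']}
```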

On-prem lakehouse

Open components, cloud-native deployment.

Catalog

Metadata & discovery
  • Datasets & schemas
  • Governance hooks
  • Lineage and documentation

Compute: Distributed Engine

Distributed processing
  • SQL / batch / streaming
  • Scales out across cluster nodes
  • Integrates with open table format

Storage: S3-Compatible + Open Table Format

S3-compatible object storage + open table format
  • ACID & time travel
  • Efficient files & partitions
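
What "ACID & time travel" buys in practice, as a sketch - assuming Apache Iceberg on Spark as one concrete instance of S3-compatible storage plus an open table format, with an already-configured catalog; the table name and timestamp are illustrative:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session whose catalog ("lakehouse") is an Iceberg catalog
# backed by S3-compatible object storage (MinIO, Ceph, ...).
spark = SparkSession.builder.appName("time-travel-demo").getOrCreate()

# Current state of the table.
spark.sql("SELECT count(*) FROM lakehouse.sales.orders").show()

# The same table as of a past moment - Iceberg's time-travel syntax in Spark SQL.
spark.sql(
    "SELECT count(*) FROM lakehouse.sales.orders "
    "TIMESTAMP AS OF '2026-01-01 00:00:00'"
).show()
```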

Security & governance

Enterprise control without sacrificing speed.

Principles

  • Data stays in your environment (on-prem / private cloud)
  • Least-privilege access for agents and tools
  • Human-in-the-loop approvals for risky actions
  • Full audit trail: prompts, tools, runs, and outputs
  • Deterministic verification via automated test suites

Operational Controls

  • Versioned agents + change management
  • Environment separation (dev / SIT / UAT / prod)
  • Observability: logs, metrics, and alerts
  • Safe rollback: previous pipeline versions
  • Policy-based guardrails
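
As a flavour of what "policy-based guardrails" plus human-in-the-loop approvals can look like: a minimal sketch with a hypothetical policy table; in practice the policy would live in configuration, not code.

```python
# Hypothetical policy: agent actions that always need human sign-off outside dev.
RISKY_ACTIONS = {"drop_table", "truncate", "schema_change", "prod_backfill"}

def requires_approval(action: str, environment: str) -> bool:
    """Risky actions outside dev are blocked until a human approves them."""
    return environment != "dev" and action in RISKY_ACTIONS

assert requires_approval("drop_table", "prod")
assert not requires_approval("drop_table", "dev")
assert not requires_approval("incremental_load", "prod")
```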

Let's automate data engineering - safely.

Ready to see the platform in action?

Get in touch →