THE SMALL DATA COMPANY.AI · 2026

Agentic
Data™

Automate Data / DWH / ETL engineering with agents that work side by side with your team - then graduate to full automation.

On-prem lakehouse · Cloud-native · Open standards · Kubernetes-ready

Data engineering is still manual, brittle, and expensive - because critical work is trapped in scripts, legacy ETL tools, and tribal knowledge.

The winners will be those who can integrate, migrate, and operate faster - safely.


Platform at a glance

A modern lakehouse experience - on your infrastructure.

🏢

On-prem first

Deploy behind your firewall. Keep data where it lives.

🏗️

Lakehouse storage

S3-compatible object storage + open table format for ACID transactions and time travel.

☁️

Cloud-native

Kubernetes-ready architecture. Portable across on-prem and private cloud.

⚡

Distributed compute

Distributed computation engine for SQL, batch, and streaming.

🤖

Agentic automation

Agents work side by side with engineers - then automate with guardrails.

📚

Catalog

Discovery, governance & metadata management in one place.


Why we exist

Vision

Automate the work of Data Engineers and DWH/ETL Engineers with agentic automation.

Mission

Build agents that take over repetitive data engineering work - replication, lineage extraction, job analysis, source DB analysis, migrations, and operational support.

Approach

Start side by side (copilot mode) to earn trust. Then graduate to automation with strong guardrails, approvals, and continuous data-level verification.


Agents that work where data work happens

From assisted execution to autonomous operations.

1

Side-by-side

Agent suggests plans, SQL, mappings, and tests. Human reviews & runs. Perfect for migrations and complex pipelines.

2

Automation

Agent executes approved workflows end-to-end: replication, schema drift handling, lineage extraction, and job refactoring.

3

Continuous improvement

Every run creates feedback: quality metrics, incidents, test results. Agents learn what "good" looks like in your environment.


Use-cases

Automate the work across the full data lifecycle.

Replication & ingestion

Sources → Lakehouse
  • CDC / batch ingestion
  • Schema evolution
  • Data products → data contracts

Lineage & observability

Understand and control
  • End-to-end lineage
  • Job & dependency analysis
  • Cost / performance insights

Migration & modernization

Move off legacy ETL
  • IBM DataStage
  • Oracle ODI / OIC
  • SQL / PL/SQL scripts

All use-cases run on shared, secure, governed infrastructure - on-prem or in your VPC.
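
To make the replication & ingestion use-case concrete: below is a minimal sketch, in plain Python, of the schema-evolution idea - adding columns that appear in a CDC batch but are missing from the target table. The `evolve_schema` function and the column names are illustrative, not the platform's API.

```python
from typing import Any

def evolve_schema(current: dict[str, str], batch: list[dict[str, Any]]) -> dict[str, str]:
    """Add columns that appear in a CDC batch but are missing from the table.

    New columns land as nullable strings; type changes are never applied
    automatically - they are left to a human-approved plan.
    """
    evolved = dict(current)
    for record in batch:
        for column in record:
            evolved.setdefault(column, "string")
    return evolved

# A source system added a `discount` column mid-stream:
schema = {"order_id": "bigint", "amount": "decimal(10,2)"}
batch = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00, "discount": "0.10"},
]
print(evolve_schema(schema, batch))
# {'order_id': 'bigint', 'amount': 'decimal(10,2)', 'discount': 'string'}
```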


How it works

Unified agentic data infrastructure

Agent Solutions

Replication · Lineage · Migration · Operations

Agent Creation & Management

Agent Studio · Tools & Skills · Policies · Approvals

Testing & Verification

Automated tests · Data quality checks · Reconciliation · Regression suites

Infrastructure

Catalog · Distributed Compute · S3 Storage + Open Table Format · Connectors

Runs on

Kubernetes · On-prem or private cloud

Automated testing & verification

Fast, repeatable checks that prove an agent's output is correct.

  • Schema & drift detection - expected columns, types, constraints
  • Data contracts - freshness, volume, null/unique rules
  • Reconciliation - row counts, checksums, key-level diffs
  • Semantic tests - referential integrity, business rules
  • Performance - runtime budgets, partitioning, file sizes
  • Regression - golden datasets + snapshot comparisons

Result: faster trust-building and safer automation.
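
Reconciliation, for example, fits in a few lines of plain Python - a sketch assuming source and target rows are already fetched, and that the first column of each row is the key:

```python
import hashlib

def checksum(rows: list[tuple]) -> str:
    """Order-independent checksum over canonicalised rows."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def reconcile(source: list[tuple], target: list[tuple]) -> dict:
    """Cheap checks first; key-level diff only when counts or checksums disagree."""
    report = {
        "row_count_match": len(source) == len(target),
        "checksum_match": checksum(source) == checksum(target),
    }
    if not all(report.values()):
        src = {r[0]: r for r in source}
        tgt = {r[0]: r for r in target}
        report["missing_in_target"] = sorted(set(src) - set(tgt))
        report["missing_in_source"] = sorted(set(tgt) - set(src))
        report["value_diffs"] = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return report

print(reconcile([(1, "a"), (2, "b")], [(1, "a"), (2, "B")]))
# row counts match, checksums differ, value_diffs == [2]
```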


Verification loop

1

Agent generates / changes pipeline

2

Run on sample or staging data

3

Execute automated test suite

4

Produce report + diffs

5

Approve → deploy → monitor
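
The same five steps, written down as a control loop. A minimal sketch - `generate`, `run_on_staging`, `test_suite`, `approved`, and the other hooks are hypothetical stand-ins for platform calls, stubbed here so the example runs:

```python
# Hypothetical hooks: in the real platform these would call Agent Studio,
# the compute engine, and the test runner. Stubbed so the sketch runs.
def generate(request):         return {"name": request, "sql": "SELECT 1"}
def run_on_staging(pipeline):  return {"pipeline": pipeline, "rows": 100}
def test_suite(result):        return {"passed": result["rows"] > 0, "diffs": []}
def publish(report):           print("report:", report)
def approved(report):          return report["passed"]   # stand-in for human sign-off
def deploy(pipeline):          return f"deployed {pipeline['name']}"
def revise(request, report):   return request

def verification_loop(change_request, max_attempts=3):
    """Steps 1-5 above: generate, stage, test, report, approve/deploy."""
    for _ in range(max_attempts):
        pipeline = generate(change_request)        # 1. agent generates / changes pipeline
        result = run_on_staging(pipeline)          # 2. run on sample or staging data
        report = test_suite(result)                # 3. execute automated test suite
        publish(report)                            # 4. produce report + diffs
        if report["passed"] and approved(report):  # 5. approve -> deploy -> monitor
            return deploy(pipeline)
        change_request = revise(change_request, report)
    raise RuntimeError("verification loop exhausted without approval")

print(verification_loop("add customer_segment column"))
```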


Agent Studio & portfolio

Everything you need to ship reliable agents in production.

🛠️

Agent Studio

Build & test agents using natural language.

🔒

Governance

Policies, approvals, and guardrails.

🧪

Automated tests

Data-level verification & regression suites.

🔌

Connectors

Integrate with tools & systems via APIs.

🧩

Tools & Skills

Reusable skills for SQL, compute, data quality, and lineage.

📊

Observability

Dashboards, alerts, and run history.


Migration accelerators

From legacy ETL to modern lakehouse pipelines.

Supported inputs

  • IBM DataStage
  • Oracle ODI
  • Oracle Integration Cloud (OIC)
  • SQL / PL/SQL scripts
  • Custom schedulers / job definitions

Outputs

  • Compute jobs / SQL pipelines
  • Open table format & medallion-style layers
  • Lineage graphs + documentation
  • Automated test suites for regression

Migration flow

  • Ingest definitions (jobs, mappings, scripts)
  • Parse & normalize (control flow + SQL)
  • Map to target patterns (distributed compute / open table format)
  • Generate code + tests (data-level)
  • Validate + iterate (with humans)
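
As one illustration of "parse & normalize": a toy sketch that extracts a normalized plan (target table plus source tables) from a legacy SQL script. Real accelerators parse DataStage and ODI export formats with a proper grammar; the regex here only serves the example.

```python
import re

def normalize_sql_job(sql: str) -> dict:
    """Extract a normalized plan (target, sources) from a legacy SQL script.

    A regex is enough for a toy example; production parsing would use a real
    SQL grammar and handle control flow, parameters, and dialect quirks.
    """
    target = re.search(r"INSERT\s+INTO\s+(\w+)", sql, re.IGNORECASE)
    sources = re.findall(r"(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE)
    return {
        "target": target.group(1).lower() if target else None,
        "sources": sorted({s.lower() for s in sources}),
    }

legacy_job = """
INSERT INTO dw_sales
SELECT s.id, s.amount, c.region
FROM stg_sales s
JOIN stg_customers c ON c.id = s.customer_id
"""
print(normalize_sql_job(legacy_job))
# {'target': 'dw_sales', 'sources': ['stg_customers', 'stg_sales']}
```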

On-prem lakehouse

Open components, cloud-native deployment.

Catalog

Metadata & discovery
  • Datasets & schemas
  • Governance hooks
  • Lineage and documentation

Compute: Distributed Engine

Distributed processing
  • SQL / batch / streaming
  • Scales out across cluster nodes
  • Integrates with open table format

Storage: S3-Compatible + Open Table Format

S3-compatible object storage + open table format
  • ACID & time travel
  • Efficient files & partitions
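
What "ACID & time travel" buys in practice, as a sketch - assuming Apache Iceberg on Spark as one concrete instance of S3-compatible storage plus an open table format, with an already-configured catalog; the table name and timestamp are illustrative:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session whose catalog ("lakehouse") is an Iceberg catalog
# backed by S3-compatible object storage (MinIO, Ceph, ...).
spark = SparkSession.builder.appName("time-travel-demo").getOrCreate()

# Current state of the table.
spark.sql("SELECT count(*) FROM lakehouse.sales.orders").show()

# The same table as of a past moment - Iceberg's time-travel syntax in Spark SQL.
spark.sql(
    "SELECT count(*) FROM lakehouse.sales.orders "
    "TIMESTAMP AS OF '2026-01-01 00:00:00'"
).show()
```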

Security & governance

Enterprise control without sacrificing speed.

Principles

  • Data stays in your environment (on-prem / private cloud)
  • Least-privilege access for agents and tools
  • Human-in-the-loop approvals for risky actions
  • Full audit trail: prompts, tools, runs, and outputs
  • Deterministic verification via automated test suites

Operational Controls

  • Versioned agents + change management
  • Environment separation (dev / SIT / UAT / prod)
  • Observability: logs, metrics, and alerts
  • Safe rollback: previous pipeline versions
  • Policy-based guardrails
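
As a flavour of what "policy-based guardrails" plus human-in-the-loop approvals can look like: a minimal sketch with a hypothetical policy table; in practice the policy would live in configuration, not code.

```python
# Hypothetical policy: agent actions that always need human sign-off outside dev.
RISKY_ACTIONS = {"drop_table", "truncate", "schema_change", "prod_backfill"}

def requires_approval(action: str, environment: str) -> bool:
    """Risky actions outside dev are blocked until a human approves them."""
    return environment != "dev" and action in RISKY_ACTIONS

assert requires_approval("drop_table", "prod")
assert not requires_approval("drop_table", "dev")
assert not requires_approval("incremental_load", "prod")
```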

Let's automate data engineering - safely.

Ready to see the platform in action?

Get in touch →