Overview
Manufacturing accounts for roughly 15% of global GDP, yet industrial AI
remains confined to single-machine, bespoke deployments. Foundation models
have transformed vision and language by pretraining on large, structurally
coherent corpora, but no analogous substrate exists for industrial
time-series. The gap is not merely volume: existing anomaly detection and
forecasting datasets record sensor outcomes without separating
commanded intent from measured response. For actuated
systems, learning transferable dynamics requires observing the full control
loop, from target trajectory through actuation effort to the resulting
physical state.
FactoryNet introduces the first universal pretraining corpus for industrial
time-series. It unifies novel laboratory recordings on UR3 and
KUKA KR10 industrial arms with standardized adaptations of
voraus-AD, AURSAD, and
UMich CNC, alongside a parallel synthetic track from NVIDIA
Isaac Sim. Every signal is mapped into the Setpoint–Effort–Feedback–Context
(S-E-F-C) schema, a control-theoretic decomposition grounded in
IEC 81346 that enables cross-embodiment learning across arbitrary
actuated systems.
TL;DR. With just 24 schema-aligned signals, a small MLP
reaches 83.2% mean AUROC on voraus-AD, ahead of three of the six baselines
trained on all 130 channels. A 105k-parameter TCN-Transformer trained on one
cobot transfers zero-shot to a previously unseen UR3e screwdriving robot,
beating every learned baseline and a strong kinematic prior under bias-aware
metrics.
The S-E-F-C taxonomy
Most existing benchmarks log raw sensor streams, tangling the controller's
target variables with the machine's actual physical execution. FactoryNet
maps over 300 heterogeneous signals into four standardized roles. The
taxonomy lets a single dataloader work across a 6-DOF rotational arm and a
4-axis CNC gantry: they expose the same four roles, only with different axis
counts and units.
S · Setpoint
Commanded intent
Target joint positions, velocities, accelerations, and torques: what
the controller asked the machine to do.
E · Effort
Actuation energy
Motor currents, torques, and electrical power expended by the
drives: what it cost the machine to act.
F · Feedback
Measured outcome
Actual joint positions, velocities, and TCP forces sensed by the
machine: what the physics produced.
C · Context
Environmental biases
Payload masses, temperatures, mode flags, and boundary conditions:
the static variables that shape dynamics.
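The four roles can be written down as a small data structure. What follows is a minimal sketch and not the FactoryNet loader, which is not yet released: the `Role` enum, the `Episode` class, and every channel name are hypothetical, chosen only to show how a 6-DOF arm and a 4-axis gantry could share one layout while differing in axis count.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SETPOINT = "S"  # commanded intent
    EFFORT = "E"    # actuation energy
    FEEDBACK = "F"  # measured outcome
    CONTEXT = "C"   # conditions that shape dynamics


@dataclass
class Episode:
    """One recording, with every channel filed under exactly one role."""
    embodiment: str
    # role -> {channel name -> samples}; all channel names are hypothetical
    signals: dict = field(default_factory=lambda: {r: {} for r in Role})

    def add(self, role: Role, name: str, samples: list) -> None:
        self.signals[role][name] = samples

    def channels(self, role: Role) -> list:
        return sorted(self.signals[role])


def role_counts(ep: Episode) -> dict:
    """Channels per role: the roles are shared, the counts are not."""
    return {r.value: len(ep.signals[r]) for r in Role}


arm = Episode("6-DOF arm")
for j in range(6):
    arm.add(Role.SETPOINT, f"joint{j}_target_pos", [0.0, 0.1])
    arm.add(Role.EFFORT, f"joint{j}_current", [1.2, 1.3])
    arm.add(Role.FEEDBACK, f"joint{j}_actual_pos", [0.0, 0.09])
arm.add(Role.CONTEXT, "payload_kg", [0.2])

cnc = Episode("4-axis CNC gantry")
for a in range(4):
    cnc.add(Role.SETPOINT, f"axis{a}_target_pos", [0.0, 1.0])
    cnc.add(Role.EFFORT, f"axis{a}_current", [0.8, 0.9])
    cnc.add(Role.FEEDBACK, f"axis{a}_actual_pos", [0.0, 0.98])

print(role_counts(arm))  # {'S': 6, 'E': 6, 'F': 6, 'C': 1}
print(role_counts(cnc))  # {'S': 4, 'E': 4, 'F': 4, 'C': 0}
```

Code that iterates over `Role` rather than over channel names never needs to know which machine produced the episode.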
Corpus composition
FactoryNet v1.0 spans three pillars: real-world laboratory recordings,
standardized open-source adaptations, and a synthetic generation pipeline.
Of the 9,114 lab episodes, approximately 40% are healthy and 60% contain
injected faults across 27 anomaly types spanning Pick & Place,
Screwdriving, and Peg-in-Hole.
| Source | Embodiment | Tasks | Episodes | Datapoints |
|---|---|---|---|---|
| Forgis Lab (real) | UR3 | P&P · Screw · Peg | 7,141 | 18M |
| Forgis Lab (real) | KUKA KR10 | Pick & Place | 1,973 | 4M |
| voraus-AD (real) | Yu-Cobot | Pick & Place | 2,122 | 16M |
| AURSAD (real) | UR3e | Screwdriving | 2,045 | 3M |
| UMich CNC (real) | CNC gantry | Machining | 18 | 18K |
| Isaac Sim (synthetic) | UR5 | Pick & Place | 9,799 | 10M |
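As a quick consistency check, the episode counts in the table can be summed per pillar. The snippet below simply copies the table's figures into a list; the variable names are illustrative.

```python
# (source, embodiment, episodes), copied from the corpus table
ROWS = [
    ("Forgis Lab", "UR3", 7141),
    ("Forgis Lab", "KUKA KR10", 1973),
    ("voraus-AD", "Yu-Cobot", 2122),
    ("AURSAD", "UR3e", 2045),
    ("UMich CNC", "CNC gantry", 18),
    ("Isaac Sim", "UR5", 9799),
]

lab_total = sum(n for src, _, n in ROWS if src == "Forgis Lab")
real_total = sum(n for src, _, n in ROWS if src != "Isaac Sim")
all_total = sum(n for _, _, n in ROWS)

print(lab_total)   # 9114, the lab figure quoted in the text
print(real_total)  # 13299 real-world episodes
print(all_total)   # 23098 episodes including the synthetic track
```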
Synthetic track & sim-to-real
The synthetic pipeline procedurally generates UR5 Pick & Place episodes
in NVIDIA Isaac Sim with domain randomization across payload mass
(0.10–0.30 kg), surface friction (0.30–0.50), controller gains, sensor
noise, and task geometry. Each episode comes with aligned S-E-F-C metadata
and matched healthy twins for controlled fault-deviation analysis. Batch
sim-to-real validation across 1,155 paired episodes reports a median joint
RMSE of 2.83° and a TCP position RMSE of 13.26 mm, with residual TCP rotation
spread attributed to gripper-geometry mismatch between the Isaac
(Robotiq 2FG85) and lab (OnRobot 2FG14) setups.
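The randomization and healthy-twin pairing might look roughly like the sketch below. Several things here are assumptions rather than details from the pipeline: the ranges are sampled uniformly, a "matched healthy twin" is taken to mean the same sampled physical parameters with the fault switched off, and `EpisodeParams`, `sample_params`, `healthy_twin`, and the fault label are hypothetical names. Controller gains, sensor noise, and task geometry are omitted because their ranges are not stated.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EpisodeParams:
    seed: int
    payload_kg: float     # 0.10 to 0.30 kg
    friction: float       # 0.30 to 0.50
    fault: Optional[str]  # None means a healthy episode


def sample_params(seed: int, fault: Optional[str] = None) -> EpisodeParams:
    """Draw one set of physical parameters (uniform sampling is an assumption)."""
    rng = random.Random(seed)  # per-episode generator keeps draws reproducible
    return EpisodeParams(
        seed=seed,
        payload_kg=rng.uniform(0.10, 0.30),
        friction=rng.uniform(0.30, 0.50),
        fault=fault,
    )


def healthy_twin(params: EpisodeParams) -> EpisodeParams:
    """Same physics, no fault: differences between the pair isolate the fault."""
    return replace(params, fault=None)


faulty = sample_params(seed=7, fault="example_fault")  # placeholder fault label
twin = healthy_twin(faulty)
print(twin.payload_kg == faulty.payload_kg, twin.fault)  # True None
```

Seeding each episode separately means a twin can be regenerated later from the seed alone, without storing the sampled values.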
Headline results
Anomaly detection on voraus-AD
Trained on healthy episodes only, an S-E-F-C MLP regresses motor torque from
18 setpoint signals and uses per-episode prediction error as the anomaly
score. Using just 24 schema-aligned signals, it outperforms three of the six
full-channel (130-signal) baselines (1-NN, GANF, PCA) and trails only CAE,
LSTM-VAE, and MVT-Flow, with notable wins on mechanically distinctive faults
(miscommutation 99.2, additional axis weight 95.8).
| Method | Channels | Mean AUROC (%) |
|---|---|---|
| 1-NN | 130 | 77.5 |
| GANF | 130 | 79.9 |
| PCA | 130 | 80.0 |
| S-E-F-C MLP (ours) | 24 | 83.2 |
| CAE | 130 | 85.2 |
| LSTM-VAE | 130 | 86.7 |
| MVT-Flow | 130 | 93.6 |
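The scoring recipe (predict torque, use the per-episode error as the anomaly score, then compute AUROC) can be sketched in a few lines. This is an illustration under assumptions, not the evaluation code behind the table: mean absolute residual is one possible choice of "prediction error", the torque values are made-up toy numbers, and `episode_score` and `auroc` are hypothetical helper names.

```python
def episode_score(predicted, measured):
    """Mean absolute torque residual over one episode (assumed error measure)."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def auroc(scores, labels):
    """Chance that a random anomalous episode outscores a random healthy one.

    Ties count as half a win, which matches the rank-based AUROC definition.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy episodes: (predicted torque, measured torque, label); 1 = anomalous
episodes = [
    ([1.0, 2.0], [1.1, 2.1], 0),
    ([1.0, 2.0], [1.2, 2.2], 0),
    ([1.0, 2.0], [1.15, 2.15], 1),
    ([1.0, 2.0], [1.9, 2.9], 1),
]

scores = [episode_score(pred, meas) for pred, meas, _ in episodes]
labels = [y for _, _, y in episodes]
print(auroc(scores, labels))  # 0.75
```

Because the predictor only ever sees healthy episodes during training, a large residual signals that the machine's response no longer matches what the commanded trajectory should produce.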
Zero-shot cross-embodiment transfer
A 105k-parameter TCN-Transformer trained solely on voraus-AD (Yu-Cobot) Pick
& Place is evaluated on 1,433 AURSAD UR3e Screwdriving episodes: a different
robot doing a different task. Under the mean-centered MAE (MC-MAE) metric,
which isolates dynamic forces from static payload biases, the structured
S-E-F-C model is the only learned model that beats a strong non-learned
kinematic prior.
| Model | MC-MAE ↓ | 95% CI |
|---|---|---|
| Linear | 0.928 | ±0.023 |
| Flat MLP | 0.792 | ±0.019 |
| TCN | 0.770 | ±0.017 |
| Kinematic baseline | 0.373 | ±0.005 |
| TCN-Transformer (ours) | 0.339 | ±0.006 |
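One plausible reading of mean-centered MAE is sketched below: subtract each sequence's own mean from both prediction and target before taking the absolute error, so that a constant offset such as an unmodeled payload contributes nothing. The exact definition used for the table is not given here, so treat this formulation, the function names, and the toy numbers as assumptions.

```python
def mae(pred, target):
    """Plain mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mean_centered_mae(pred, target):
    """MAE after removing each sequence's mean (assumed MC-MAE definition)."""
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    return sum(
        abs((p - mean_p) - (t - mean_t)) for p, t in zip(pred, target)
    ) / len(target)


# The prediction tracks the dynamics exactly but misses a constant offset of 2.0
target = [2.0, 3.0, 4.0]
pred = [0.0, 1.0, 2.0]

print(mae(pred, target))                # 2.0
print(mean_centered_mae(pred, target))  # 0.0
```

The example shows why the two metrics can rank models differently: plain MAE penalizes the static bias, while the mean-centered variant scores only how well the shape of the signal is reproduced.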
Getting started
Coming soon. Public dataset release and dataloader are in
preparation. The Hugging Face dataset card, S-E-F-C Parquet files, and
framework-native loaders will land alongside the v1.0 tag at
huggingface.co/datasets/factorynet/factorynet.
Citation
@inproceedings{factorynet2026,
title = {FactoryNet: A Large-Scale Dataset toward Industrial
Time-Series Foundation Models},
author = {Othman, Karim and Petersen, Jonas and
Ignuta-Ciuncanu, Matei and Maggioni, Riccardo and
Mazzoleni, Camilla and Martelli, Federico and
Petersen, Philipp},
year = {2026}
}