Digital Twin Laboratory #
Architecture Documentation #
a digital twin runtime framework
DTLab’s architecture documentation. The code is available from the DTLaboratory organization on GitHub.
STATUS: UNDER CONSTRUCTION - running the software currently requires patience with the documentation and with the initial API ergonomics.
DTLab is an actor-oriented distributed computing framework for hosting DTs (Digital Twins).
DTLab computes near-realtime insights into the state of complex systems.
Suggested applications include the Internet of Things (IoT), Augmented Reality (AR), logistics, and streaming analytics - any problem requiring near-realtime continuous recalculation of actionable states at scale. DTLab is especially suited to modeling systems of participants that combine and advance over time in otherwise difficult-to-predict ways.
What is a DT? #
Digital twins are software analogs for devices, people, processes, or collections of digital twins. The term “digital twin” is used to distinguish the DT from other software modeling. DTs emphasize individually programmable software objects, each of whose state is recalculated continuously as observations arrive in a low latency unbounded stream from the DT’s counterpart.
Candidate use cases:
- Machine DTs might monitor a machine’s engine temperature
- Supply chain replenishment DTs might monitor retail sales transaction completions
- Energy efficiency DTs might monitor doors that stay open too long
- Security DTs might fire alerts when motion detectors are triggered
- A hemisphere evacuation alerting system might act on an approaching asteroid’s DT’s current speed and distance values - our first demo app uses NASA’s Near Earth Object (NEO) API :)
In DTLab, each DT:
- Computes its own state
- Receives continuous input from counterparts
- Is independently addressable - ask any DT about its state any time
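As a concept sketch, those three properties can be expressed in a few lines of Python. All names and types here are hypothetical illustrations of the DT idea - not DTLab’s actual classes, which are Scala actors:

```python
# Hypothetical sketch of a DT: it keeps named time-series slots, advances
# them as observations arrive, and can be queried for any slot at any time.

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    name: str
    datetime: int  # epoch millis
    value: float


@dataclass
class DigitalTwin:
    series: dict = field(default_factory=dict)  # name -> latest Observation

    def advance(self, obs: Observation) -> None:
        """Recompute state from a new observation; ignore stale (older) data."""
        prev = self.series.get(obs.name)
        if prev is None or obs.datetime > prev.datetime:
            self.series[obs.name] = obs

    def query(self, name: str):
        """Independently addressable: ask the DT for a series' current value."""
        obs = self.series.get(name)
        return None if obs is None else obs.value
```

In the real system each DT is a persistent actor and the "query" is an HTTP GET against the DT’s address, but the state-advance semantics are the same idea.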
DTLab Dependencies #
- A database supported by Akka Persistence. We currently develop with Postgres - specifically Digital Ocean’s managed Postgres offering - but any Postgres later than 9.6 should work.
- Java and a Java Virtual Machine - development is currently done on Java 11 and 13
DTLab Quick Start (via Docker) #
- See dependencies above
TODO TODO TODO
DTLab Quick Start from Code #
- See dependencies above
```sh
git clone git@github.com:DTLaboratory/dtlab-scala-alligator.git
cd dtlab-scala-alligator
sbt assembly  # to create a superjar
export POSTGRES_HOST="<YOUR PGSQL HOST>"
export POSTGRES_PORT=<DB PORT>
export POSTGRES_DB="<DB NAME>"
export POSTGRES_USER="<DB USER>"
export POSTGRES_PASSWORD="<DB PWD>"
sbt run
# now you can interact with the unsecured API on port 8080 - see the examples dir for REST API usage
# see https://home.dtlaboratory.com/dtlab-api-docs for Open API docs
```
DTLab Fully Functioning Cloud Deployment #
TODO TODO TODO
DTLab Architecture #
Introduction and Goals #
The DTLab framework enables a user to instantiate a system of digital twins in the public cloud or on an on-prem cluster of computers.
The project goal is that a useful system, complete with security and integration features, can be instantiated from configured DTLab components alone. It is also a goal of DTLab to support configuration entirely in a declarative, deployment-time style via a REST-like API - no first-class programming-language coding should be required to program useful DTs.
- MIT/BSD/Apache2 licensed dependencies (100% Open Source)
- No reliance on VM abstractions (100% containerized)
- Can run on a system-on-a-board (a 1 GB Raspberry Pi)
- Completely programmable via API (no config files)
Context and Scope #
The DTLab system is operated as a utility and service.
In its initial releases, DTLab can support DTs for sources that emit telemetry in JSON format. Other data formats will be added as they are requested.
External automation can interact with the DTs via the DTLab HTTP API. An external system will be able to register a webhook with a DTLab cluster and listen for all assessments calculated by all DTs as each DT state advances.
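A sketch of what that registration could look like from the external system’s side - the path and payload shape below are assumptions for illustration only, not DTLab’s documented API, and the request is built but never sent:

```python
# Hypothetical webhook registration request: an external system asks a DTLab
# cluster to POST every DT assessment to its callback URL.

import json
import urllib.request


def webhook_registration(dtlab_host: str, callback_url: str) -> urllib.request.Request:
    """Build (but do not send) a request registering a listener for DT assessments."""
    body = json.dumps({"callback": callback_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{dtlab_host}:8080/dtlab-alligator/webhook",  # hypothetical path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

See the OpenAPI docs linked below for the actual endpoints.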
DTLab may be operated by an organization for its own purposes or by a service provider for its customers - perhaps as a SaaS.
If operated as a SaaS, the operator would need to provide a front end to its customers that supports multi-tenancy. No changes to the DTLab base code would be required to support multi-tenancy, but the API calls made to operate the system on behalf of the SaaS users should be sharded across clusters with the tenant ID as the primary component of the shard key.
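That sharding suggestion can be sketched as follows; the function names and the cluster count are illustrative assumptions:

```python
# Tenant ID as the primary component of the shard key: one tenant's API
# traffic consistently lands on the same cluster.

import hashlib


def shard_key(tenant_id: str, dt_id: str) -> str:
    """Tenant ID leads the key, so all of a tenant's DTs co-locate."""
    return f"{tenant_id}:{dt_id}"


def cluster_for(tenant_id: str, num_clusters: int) -> int:
    """Stable hash (unlike Python's built-in hash(), which is randomized per process)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_clusters
```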
Solution Strategy #
The DTLab implementation values actor programming, asynchronous messaging, and persistence via event sourcing.
The system is developed with modern cloud infrastructure-as-code tools and practices in mind. A new deployment can be instantiated in the cloud or via an IoT solution pushing firmware/appware to a smart edge device.
Input from the DT’s analogs must arrive in standard marshaling syntaxes (JSON, etc.) for out-of-the-box integration. Output is marshaled in JSON and is available as an unbounded stream so that it can be processed by modern analytics tools like OpenTSDB, InfluxDB, Apache Spark, Grafana, or Elasticsearch, or accumulated in cloud blob services like Azure Storage or AWS S3.
A foundational idea in DTLab is that information from outside the DTLab runtime is normalized into multiple time series before any DT processing. The arriving data is transformed into `name,datetime,value(double)` records before it is seen by a DT. The only context that survives ingest is the name: the name can be overloaded at ingest time via enrichment to provide more context to processing, but the limitation of context to name, datetime, and value is a hard and fast rule.
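For illustration, the normalization rule might be sketched like this - the payload shape and the device-ID enrichment prefix are assumptions for the sketch, not DTLab’s actual ingest code:

```python
# Flatten an arriving telemetry payload into name,datetime,value(double)
# records before any DT sees it.

from typing import NamedTuple


class TelemetryRecord(NamedTuple):
    name: str
    datetime: int  # epoch millis
    value: float


def normalize(device_id: str, datetime: int, payload: dict) -> list:
    """Enrichment overloads the name (here: a device-ID prefix) - the only
    way extra context survives into DT processing."""
    return [
        TelemetryRecord(f"{device_id}.{key}", datetime, float(v))
        for key, v in payload.items()
    ]
```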
NO BREAKING API CHANGES #
Each repository name contains the service name, the computer language implementation, and the version name. The initial version names are animals in alphabetic order.
The redundancy of embedding the version in the component name instead of just advancing the semantic numeral major value is due to the propensity of engineers to introduce breaking changes to APIs. We want incompatible code to be obvious, not subtle. It is never OK to break backwards compatibility in DT Lab.
See Hickey on semantic versioning, “If it is not backward compatible, rename it.”
We make an explicit commitment to the user. A new backwards-incompatible implementation will live in a new repository with a different name, without the overhead and false hopes of a git fork, merge, and semantic version number advance. The successor to the alligator version that has a backwards-incompatible change is the badger version, and no user should expect the badger version to work with code designed for the alligator version. However, we promise that all future releases of alligator will support all software that has ever worked with earlier releases of alligator. The badger version may not have the full feature set of alligator and may not even be of the quality of alligator; alligator can back-port badger features or keep implementing new features as long as it does not break existing APIs. The new name merely indicates incompatibility.
`<projectName>-<langName>-<versionName>`

- example 1: dtlab-scala-alligator
- example 2: dtlab-rust-alligator
- example 3: dtlab-rust-badger
- example 4: dtlab-ingest-rust-badger
Building Block View #
The primary building blocks are the digital twin and an incoming unbounded stream of observations. The state of the digital twins can be processed by traditional off-the-shelf data analytics tools.
Runtime View #
The system tends to flow from left to right with observations starting at the left and DT state shared to standard tools at the right.
Data in the above deployment is read from a remote MQTT server by a container instantiated from the DTLab MQTT Ingest Service.
The Ingest MQTT client then posts the incoming raw telemetry to the DTLab Ingest Service via HTTP.
The Ingest Service then transforms the incoming data into a list of `name,datetime,value(double)` observations that it posts to the DTLab runtime, where the DTs are updated in accordance with the name calculated/enriched by the ingest service.
Deployment View #
A normal deployment would be managed Kubernetes from a cloud provider paired with a managed database. However, the DTLab Docker images make no Kubernetes or PaaS-database assumptions, and the system can be run in any Docker-enabled environment.
Crosscutting Concepts #
- Actor programming - We value actor programming not so much as a means of parallelism but as a way to build a system we can then reason about. Search YouTube for a great interview of Alan Kay by Joe Armstrong to learn more about this perspective.
- Event Sourcing - in support of:
  - the goal of “Explainable AI”
  - Audits
  - Replay features
  - “What if” and back-test scenarios using the prototype-based DT clone feature
- 100% Config by HTTP API
Architectural Decisions #
Important, expensive, critical, large scale or risky architecture decisions including rationales.
- Scala and Akka
- The first implementation is written in Scala.
- Actor programming and pattern matching are leveraged throughout the code - these two features are harder to use in other languages. The choice was between Scala and Erlang/Elixir to get these features, and Erlang/Elixir has not been made container-friendly yet. Scala is a bit more accessible than Erlang/Elixir because of the popularity of the JVM it runs on and its integration with Java.
- Risk is the Scala community - ugh.
- Pushing queue processing to the outer edges of the system and using webhooks and HTTP as the main composability approach. Composing with webhooks can be optimized via sidecar containers.
- A risk is HTTP overhead - future implementations will probably want gRPC or similar intra-system binary APIs.
- Event Sourcing
- No improperly designed complex database models - fewer scaling problems due to poor partitioning and clustering.
- Rewind and recalculation (back-testing) features are enabled with event sourcing.
- Prototype-based programming of DTs is enabled via event sourcing.
- DT Singletons
- Containerization w/ sharding and actor resurrection - all DTs must advance their state and respond to queries w/o issue with the system running at n-1 containers.
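The rewind, back-test, and prototype features attributed to event sourcing above fall out of treating state as a fold over the event log. A minimal sketch with illustrative types - not DTLab’s actual persistence model:

```python
# State is a fold over the event log, so replaying a truncated log rebuilds
# any historical state exactly.

from typing import NamedTuple


class Event(NamedTuple):
    name: str
    datetime: int  # epoch millis
    value: float


def replay(events: list) -> dict:
    """Fold the event log into current state (latest value per series name)."""
    state = {}
    for e in events:
        state[e.name] = e.value
    return state


def state_at(events: list, as_of: int) -> dict:
    """Back-test: rebuild the state as it was at any point in time."""
    return replay([e for e in events if e.datetime <= as_of])
```

Because no state is stored outside the log, "what if" scenarios are just replays of an edited or truncated log.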
Quality Requirements #
- Horizontally Scalable
- Process back-testing of 1 million devices 24X faster than the live system collects the data.
- Actor state query API must remain responsive under all ingest loads.
- Fault Tolerant
- Survive chaos agent testing.
- All DTs must advance their state and respond to queries w/o issue with the system running at n-1 containers.
NO ONE IS USING THIS SOFTWARE FOR REAL WORK TO OUR KNOWLEDGE
Project Status #
Pull requests, feedback, and collaboration welcome.
We believe DTLab is worth studying, implementing, and refining in other programming languages and platforms - we intend to create Python, Rust, and Erlang/Elixir implementations, time permitting.
- Github - all the code is Open Source
- CICD is managed by Github Actions
- Code quality is monitored by Codacy
- SBT dependency updates are managed by the Scala Steward bot PRs
- Docker images are at Dockerhub (for now)
- UI - will host a React.js UI for talking to DTs - currently only a demo of Auth0 SSO
- Notebooks (Jupyterhub) - contact Navicore to get your GitHub ID whitelisted
- DTLab API Docs - OpenAPI 3.0
- DTLab API sandbox endpoint - https://sandbox.somind.tech/dtlab-alligator/(type/actor)
- DTLab Ingest API Docs - OpenAPI 3.0
- DTLab Ingest API sandbox endpoint - https://sandbox.somind.tech/dtlab-alligator/extractor/(specId)
- Security is Implemented by Auth0 - contact Navicore for access
- Project Kanban with backlog and help wanted tags is here
- This page is generated from the gh-pages branch of DTLaboratory.github.io
- The system is currently run on Digital Ocean managed Kubernetes.
- TLS is implemented with Let’s Encrypt.
- DT event sourcing is persisted to Digital Ocean managed Postgres.
- IoT Device Management and Connectivity via Cloud PaaS
- AWS IOT Core is deployed with webhook forwarding to the DTLab sandbox Ingest Service (contact Navicore for access)
- MQTT (TBD)
- Azure IotHub (TBD)
Support or Contact #
Want more information or to get involved? Open an issue here and say hello or DM @navicore on Twitter.