Managing petabytes of data is already a challenge for most organizations. But what if you needed to store and process exabytes? That’s the scale targeted by TernFS, a distributed filesystem recently released as open source by XTX Markets, a high-frequency trading firm that trades over $250 billion a day.
After running TernFS in production for three years without losing a single byte, XTX has now shared the technology with the community. That gives big data, AI, and machine learning projects a production-tested storage option built for this scale.
Why Another Filesystem?
Traditional storage solutions like NFS or even advanced options such as ZFS hit their limits when faced with hundreds of petabytes and workloads driven by modern machine learning.
XTX Markets reached that point when their datasets exceeded 650 PB. Instead of trying to push existing tools beyond their comfort zone, they built something new: TernFS, designed from the ground up for massive, immutable datasets.
In other words, it’s made for “write once, read many times” data — perfect for training large language models (LLMs), data lakes, and long-term archives.
How TernFS Works
At its core, TernFS relies on a simple but powerful architecture:
- Metadata shards – 256 logical units to handle metadata distribution
- Cross-Directory Coordinator (CDC) – manages operations across directories
- Block services – take care of the actual data storage
- Registry – orchestrates the whole system
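To make the idea of a fixed set of 256 metadata shards concrete, here is a minimal Python sketch. It assumes a directory is assigned to a shard by hashing its numeric ID; the hash choice, the function names `shard_for` and `spans_shards`, and the sample IDs are hypothetical illustrations, not TernFS's actual scheme.

```python
import hashlib

NUM_SHARDS = 256  # the number of logical metadata shards cited above


def shard_for(directory_id: int) -> int:
    """Map a directory ID to a shard index in [0, NUM_SHARDS).

    Hypothetical scheme: hash the 8-byte ID and keep the first digest byte.
    """
    digest = hashlib.sha256(directory_id.to_bytes(8, "little")).digest()
    return digest[0] % NUM_SHARDS


def spans_shards(src_dir: int, dst_dir: int) -> bool:
    """True when two directories land on different shards.

    In this sketch, such an operation would need some form of
    cross-shard coordination (assumption, for illustration only).
    """
    return shard_for(src_dir) != shard_for(dst_dir)


if __name__ == "__main__":
    for directory_id in (1, 2, 3):  # made-up directory IDs
        print(directory_id, "->", shard_for(directory_id))
```

Because the shard count is fixed, the mapping stays stable as data grows; only the placement of shards on machines would need to change.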
To ensure reliability, TernFS uses Reed-Solomon erasure coding and CRC32-C checksums, combined with high-speed communication over UDP and TCP.
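The two reliability ideas above can be sketched in a few lines of Python. The first function is a table-driven CRC32-C (Castagnoli polynomial), written by hand because the Python standard library does not ship one. The second part uses single XOR parity, which is a much simpler stand-in for Reed-Solomon: it tolerates the loss of only one block, whereas Reed-Solomon can be configured to survive several. All function names and sample data here are illustrative assumptions, not TernFS code.

```python
def _make_crc32c_table():
    """Build the 256-entry lookup table for the reflected Castagnoli polynomial."""
    poly = 0x82F63B78
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Compute the CRC32-C checksum of data, one byte at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def xor_parity(blocks):
    """XOR equal-length blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            parity[i] ^= value
    return bytes(parity)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    blocks = [b"abcd", b"efgh", b"ijkl"]  # made-up data blocks
    parity = xor_parity(blocks)
    rebuilt = recover_missing([blocks[0], blocks[2]], parity)
    print(rebuilt == blocks[1], hex(crc32c(rebuilt)))
```

The checksum detects that a block is damaged; the parity (or, in a real system, the Reed-Solomon code) supplies the redundancy needed to rebuild it.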
Real-World Performance
XTX isn’t just publishing a prototype — this is production-ready. Their infrastructure already runs:
- 500+ PB stored across 30,000 HDDs and 10,000 SSDs in three data centers
- Terabytes per second throughput during peak loads
- Native multi-region support and strong fault tolerance
On top of that, the system includes block proofs (cryptographic checks to prevent data corruption) and an automatic disk scrubbing mechanism that rereads stored data to catch failing sectors before data is lost.
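As a rough illustration of what scrubbing means, the Python sketch below records a checksum when a block is written and later rereads every block to find ones whose contents no longer match. It uses `zlib.crc32` (plain CRC-32) purely as a stand-in checksum, and the `StoredBlock` type, `write_block`, and `scrub` are hypothetical names; how TernFS schedules scrubbing and repairs blocks is not shown here.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredBlock:
    data: bytes
    checksum: int  # recorded once, at write time


def write_block(data: bytes) -> StoredBlock:
    """Store data together with the checksum computed at write time."""
    return StoredBlock(data=data, checksum=zlib.crc32(data))


def scrub(blocks: Dict[int, StoredBlock]) -> List[int]:
    """Reread every block and return the IDs whose checksum no longer matches."""
    return sorted(
        block_id
        for block_id, block in blocks.items()
        if zlib.crc32(block.data) != block.checksum
    )


if __name__ == "__main__":
    store = {1: write_block(b"alpha"), 2: write_block(b"beta")}
    store[2].data = b"bet\x00"  # simulate silent corruption on disk
    print("blocks needing repair:", scrub(store))
```

A block flagged this way would then be rebuilt from redundant copies or erasure-coded fragments, which is why scrubbing and erasure coding are usually deployed together.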
Deployment and Usage
Getting started is straightforward:
git clone https://github.com/XTXMarkets/ternfs
cd ternfs
./build.sh alpine # or ubuntu
./scripts/ternrun # for local testing
For maximum performance, you can run TernFS as a Linux kernel module. A FUSE version is available for user-space testing, and an S3-compatible API makes migration from AWS environments easier.
⚠️ Important note: TernFS is not designed for millions of tiny files. It’s optimized for huge volumes of data — think scientific datasets, logs, AI training sets, or archives. Permissions and access control also need to be managed externally.
Licensing and Outlook
TernFS is released under two licenses, split by component:
- GPLv2+ for the core
- Apache 2.0 with LLVM exception for libraries and protocols
This balance allows both community contributions and commercial adoption.
With XTX investing more than €1 billion in new data centers, including sites in Finland packed with thousands of GPUs, it’s clear the company sees infrastructure as a strategic advantage.
For organizations struggling with the limitations of ZFS or NFS at petabyte scale, TernFS offers an open-source path forward. And with the explosive growth of AI and big data, demand for exabyte-level storage is only just beginning.



