
DARWIN: The First Foundation Model for Marine Intelligence

  • David Lamb
  • Jul 3
  • 3 min read

We’d like to introduce you to DARWIN, the first foundation model designed specifically for ocean intelligence. After training on 32 years of oceanographic data, we've reached an initial milestone: demonstrating that AI can learn to understand complex ocean patterns.


We want to share our progress, detail what DARWIN can already do, and be open about the volume of work still ahead of us.


Our Creation


DARWIN is our approach to bridging the gap between traditional physics-based ocean models and modern AI capabilities. While numerical models excel at simulating known physical processes, they're computationally expensive and struggle with complex multi-variable relationships. Existing machine learning approaches in marine science typically focus on single variables or specific tasks, requiring separate models for temperature prediction, chlorophyll estimation, or current forecasting.


We've trained the model on data from the Copernicus Marine Service, covering the North-Western Atlantic-European continental shelf from 1992 to 2024. This includes 21 variables (temperature, salinity, chlorophyll, currents, nutrients, pH, oxygen levels, and more) at 7 km resolution. Rather than working with scattered datasets, we use this single, consistent dataset to help DARWIN learn how ocean systems work together.


Figure 1: Examples of variables in dataset (Copernicus Marine Service Products, 2024)

The foundation model approach means researchers can adapt its base intelligence for specific tasks rather than building everything from scratch. Whether you're studying species distribution, tracking climate changes, or monitoring pollution, you can fine-tune DARWIN instead of training a new model for each problem.


How We Know It Works


DARWIN learns ocean patterns through a masked autoencoder approach, where we deliberately hide 75% of the input data during training and ask the model to reconstruct the missing pieces. This training task forces the model to comprehend underlying relationships between ocean variables rather than simply memorising patterns.
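To make the masking step concrete, here is a minimal sketch of how 75% of the inputs might be hidden. It assumes masking is applied to whole patches chosen uniformly at random, as in standard masked autoencoders; the function name `random_mask` and the patch count of 64 are illustrative and not taken from DARWIN's code.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists.

    Assumption: patches are masked uniformly at random, as in a
    standard masked autoencoder. DARWIN's exact scheme may differ.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(indices[:num_masked])    # hidden from the encoder
    visible = sorted(indices[num_masked:])   # the only patches the encoder sees
    return visible, masked


# Hypothetical chip split into 64 patches: 16 stay visible, 48 are hidden.
visible, masked = random_mask(num_patches=64, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))
```

The encoder would only ever see the `visible` patches, and the decoder would be asked to reconstruct the `masked` ones.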


The results have been encouraging. The model reconstructs temperature fields accurately, fills in missing chlorophyll data, and predicts current patterns from limited observations. It handles randomly missing data well. We're now working on improving performance across large spatial or temporal gaps, using new temporal and spatial metadata features and custom masking strategies.


What's particularly interesting is that the model’s reconstructions make physical sense. The model doesn't just smooth over missing values; it maintains realistic relationships between variables. Warm water areas match up with expected biological activity, currents align with temperature patterns, and seasonal cycles stay consistent across different measurements.


Figure 2: Visualisation of reconstruction task performance

Current Capabilities


Intelligent Gap Filling: The model reconstructs missing ocean observations with high accuracy, which is already useful for researchers dealing with incomplete datasets caused by satellite gaps or sensor failures.


Multi-Variable Understanding: Unlike single-variable models, DARWIN processes all 21 ocean variables simultaneously, capturing the complex interdependencies that drive marine systems.


Physical Consistency: Reconstructions maintain oceanographically realistic relationships, demonstrating understanding of ocean dynamics rather than just statistical patterns.


Foundation for Applications: The model's base intelligence can be fine-tuned for specific tasks, dramatically reducing development time for new marine AI applications.



Model Architecture


Built on a Vision Transformer backbone, the model processes ocean data as 3D spatiotemporal patches, understanding space and time together rather than treating them as separate dimensions. This approach has proven important for capturing ocean dynamics where current events depend on historical conditions and spatial context.


We've implemented a masked autoencoder framework specifically adapted for marine data. Key innovations include 3D patch embeddings via convolutional layers, spatiotemporal positional encodings that capture latitude, longitude, and temporal relationships, and a custom loss function that excludes land pixels from training, so the model focuses exclusively on ocean dynamics.
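As an illustration of the land-excluding loss, the sketch below computes a mean squared error over ocean cells only. The function name `ocean_mse`, the flat-list layout, and the choice of MSE are assumptions made for this example; the post does not specify DARWIN's actual loss.

```python
def ocean_mse(pred, target, is_ocean):
    """Mean squared error over ocean cells only.

    Assumption: land cells are flagged with a boolean mask and
    contribute nothing to the loss. The real loss may weight
    variables or masked patches differently.
    """
    total, count = 0.0, 0
    for p, t, ocean in zip(pred, target, is_ocean):
        if ocean:
            total += (p - t) ** 2
            count += 1
    # Return 0.0 for an all-land input to avoid dividing by zero.
    return total / count if count else 0.0


# Toy example: the third cell is land, so its large error is ignored.
pred = [1.0, 2.0, 5.0, 4.0]
target = [1.0, 4.0, 0.0, 2.0]
is_ocean = [True, True, False, True]
loss = ocean_mse(pred, target, is_ocean)
print(loss)
```

Without the mask, the land cell's error would dominate the average and push the model to fit values that have no oceanographic meaning.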


The encoder-decoder structure processes 32×32×12 spatiotemporal chips (a 224 km × 224 km area over 12 months) with 21 variables per location. This design enables DARWIN to understand both local phenomena and broader regional patterns while maintaining computational efficiency through the transformer's attention mechanisms.
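The chip dimensions follow from the 7 km grid: 32 cells × 7 km = 224 km per side. The sketch below works through that arithmetic and counts 3D patches for a given patch size. The `ChipSpec` class is illustrative, and the 8×8×3 patch size in particular is a hypothetical value, since the post does not state DARWIN's patch size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    height: int = 32       # grid cells, north-south
    width: int = 32        # grid cells, east-west
    months: int = 12       # time steps per chip
    variables: int = 21    # ocean variables per cell
    cell_km: float = 7.0   # grid resolution

    def extent_km(self):
        """Physical size of the chip as (north-south, east-west) in km."""
        return self.height * self.cell_km, self.width * self.cell_km

    def num_values(self):
        """Total scalar values in one chip."""
        return self.height * self.width * self.months * self.variables

    def num_patches(self, patch_hw, patch_t):
        """Number of non-overlapping 3D patches of the given size."""
        if self.height % patch_hw or self.width % patch_hw or self.months % patch_t:
            raise ValueError("patch size must divide the chip evenly")
        return (self.height // patch_hw) * (self.width // patch_hw) * (self.months // patch_t)


chip = ChipSpec()
print(chip.extent_km())       # (224.0, 224.0)
print(chip.num_values())      # 258048
print(chip.num_patches(patch_hw=8, patch_t=3))  # 64, for the hypothetical 8x8x3 patch
```

Grouping cells into 3D patches is what keeps the token count small: attention runs over a few dozen patch tokens rather than over every grid cell and month.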


Figure 3: Prithvi architecture (reprinted from Szwarcman et al.).

What's Next


The reconstruction work is our foundation, but we're building toward more advanced applications. Our roadmap includes moving from monthly to daily data resolution, expanding from our current regional coverage to global scope, and adding more variables from satellites and autonomous sensors.


DARWIN's foundation model approach opens up numerous potential applications: metocean forecasts for offshore wind operations, current forecasts for shipping route optimisation, coral bleaching early warning systems, species distribution modelling, and marine hazard detection. Our initial focus is harmful algal bloom prediction, a critical application given that these blooms contaminate seafood, poison humans and wildlife, and directly impact fishing, tourism, and public health. With climate change making blooms more frequent in warmer, fresher waters, there's an urgent need for better prediction capabilities that DARWIN's multi-variable understanding can provide.


This first version of DARWIN represents our initial step toward building a true foundation model for marine science and operations. The fact that it can understand complex ocean patterns gives us confidence it can serve as a base for the next generation of marine intelligence tools.

 
 
 