Welcome to 2DseisvelGenerator!

Generative AI In Geoscience

Generative AI is transforming geoscience by making data more accessible, improving model generalization, and enhancing decision-making in subsurface studies. It generates realistic synthetic data, such as seismic signals and geological structures, addressing data scarcity and high collection costs. This improves machine learning models for tasks like seismic interpretation and full waveform inversion (FWI). Additionally, by simulating subsurface environments, Generative AI aids in better predictions for resource exploration, groundwater management, and carbon capture and storage (CCS).

DDPM Trained on the OpenFWI Dataset to Reduce Bias in Synthetic Data: An Attempt to Enhance Generalization in Seismic FWI

This model is a Denoising Diffusion Probabilistic Model (DDPM) trained on the OpenFWI dataset and designed to generate synthetic seismic velocity models with reduced bias. By generating more diverse seismic velocity fields, it aims to reduce the bias typically found in synthetic data used in seismic Full Waveform Inversion (FWI) and related geophysical tasks. It is intended for research in geophysical exploration, resource extraction, and seismic hazard assessment.
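As a rough illustration of what a DDPM does during training, the sketch below builds a noise schedule and applies the forward (noising) step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise to a toy input. The schedule values (1,000 steps, beta from 1e-4 to 0.02, linear) are assumptions borrowed from common DDPM defaults, not confirmed settings of this model, and all function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return linearly spaced noise variances (assumed values, not the model's config)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to each step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_sample(x0, alpha_bar, rng):
    """Forward-diffuse a flat list of values x0 to the noise level alpha_bar."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [0.5] * 8  # toy stand-in for normalized velocity values
x_mid = noise_sample(x0, alpha_bars[499], rng)  # partially noised sample
```

Sampling runs this process in reverse: the trained network removes a little noise at each step, starting from pure Gaussian noise.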

Model type: Generative Model

Intended Use

This model is intended for generating synthetic seismic velocity models to help reduce bias in traditional seismic inversion tasks. It introduces greater diversity in the synthetic data, improving generalization compared to conventional methods. However, this model does not eliminate bias entirely and should be validated carefully before use in real-world applications.

Training Data

OpenFWI dataset

Training Procedure

Model Architecture: Denoising Diffusion Probabilistic Model (DDPM)
Dataset: OpenFWI, 5,000 samples from each of 10 classes with 4 augmentations, for a total of 250,000 training samples.
Input Dimensions: Seismic velocity models of size 64x64
Augmentation: Horizontal flipping, rotation (-25 to +25 degrees), scaling, cropping, and elastic distortion to reduce bias.
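The training set size can be checked with a short calculation. This sketch assumes that "4 augmentations" means four augmented copies are kept alongside each original sample, which is the reading that reproduces the stated total; the card does not say how the listed transforms are grouped into those four copies. The function name is hypothetical.

```python
SAMPLES_PER_CLASS = 5_000  # from the model card
NUM_CLASSES = 10           # from the model card
AUGMENTED_COPIES = 4       # assumption: four augmented copies per original sample

def training_set_size(per_class, num_classes, augmented_copies):
    """Count the original samples plus their augmented copies."""
    originals = per_class * num_classes
    return originals * (1 + augmented_copies)

total = training_set_size(SAMPLES_PER_CLASS, NUM_CLASSES, AUGMENTED_COPIES)
print(total)  # 250000
```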

Samples Generated by the Generative Model

(1) Random Distribution Generator: Generates unconstrained random velocity models and can be used with any type of geological class. It is an experiment in making the distribution more complex.

(2) Curve Velocities: Represents curved subsurface layers. The generated data are more diverse than the OpenFWI dataset.

(3) Flat Faults Type A: Faulting mechanism in layers.

(4) Flat Faults Type B: Relative geometry, displacement, or faulting mechanism in different layers.
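Each of the four generators is published as its own Hugging Face repository. A small lookup table, sketched here, avoids typing repository names by hand. The repository ids are the ones listed in this card; the short keys and the helper name are hypothetical.

```python
# Short keys are illustrative; the repository ids come from this model card.
GENERATORS = {
    "random": "kankur0007/2DseisvelGenerator",
    "curve_velocities": "kankur0007/2DseisvelGenerator_Curve_Velocities",
    "flat_faults_a": "kankur0007/2DseisvelGenerator_Flat_Faults_Type_A",
    "flat_faults_b": "kankur0007/2DseisvelGenerator_Flat_Faults_Type_B",
}

def resolve_model_name(generator):
    """Map a short generator key to its Hugging Face repository id."""
    key = generator.strip().lower()
    if key not in GENERATORS:
        options = ", ".join(sorted(GENERATORS))
        raise ValueError(f"Unknown generator {generator!r}; choose one of: {options}")
    return GENERATORS[key]

model_name = resolve_model_name("curve_velocities")
```

The resulting `model_name` can be passed to `DDPMPipeline.from_pretrained` as usual.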

How to Use It

You can generate samples of seismic velocity models using the code below. Note that the reverse diffusion process used for sampling requires a GPU; if you do not have one, use a Google Colab notebook with a T4 GPU.

%%capture
# Install the libraries (%%capture must be the first line of the notebook cell)
!pip install diffusers transformers
-------------------------------------------------------------------------------------

# Import the required libraries
from diffusers import DDPMPipeline
import torch

----------------------------------------------------------------------------------------


# Enable GPU usage in the pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"

------------------------------------------------------------------------------------------
# Select the model you want to sample from
# For Random                model_name = "kankur0007/2DseisvelGenerator"
# For the Curve Velocity    model_name = "kankur0007/2DseisvelGenerator_Curve_Velocities"
# For Flat Fault Type A     model_name = "kankur0007/2DseisvelGenerator_Flat_Faults_Type_A"
# For Flat Fault Type B     model_name = "kankur0007/2DseisvelGenerator_Flat_Faults_Type_B"

----------------------------------------------------------------------------------------------

# Load the model from Hugging Face hub
model_name = "kankur0007/2DseisvelGenerator"  # replace with any model name listed above
pipeline = DDPMPipeline.from_pretrained(model_name)
pipeline.to(device)

-----------------------------------------------------------------------------------------------

# Generate Multiple Samples
num_samples = 10            # You can generate any number of unique samples!
generated_images = pipeline(num_inference_steps=1000, batch_size=num_samples)

--------------------------------------------------------------------------------------------------

# Let's see our image
import matplotlib.pyplot as plt
import math

# Define the number of rows and columns
num_rows = math.ceil(num_samples**0.5)
num_cols = math.ceil(num_samples / num_rows)

# Create a figure with dynamic grid size and larger figure size
# squeeze=False keeps `axes` a 2D array even when num_samples is 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10), squeeze=False)  # Adjust figsize as needed
axes = axes.flatten()

# Display the images
for idx, image in enumerate(generated_images.images):
    axes[idx].imshow(image)
    axes[idx].axis('off')  # Hide the axis

# Turn off unused subplots (if any)
for ax in axes[num_samples:]:
    ax.axis('off')

plt.tight_layout()
plt.show()

Bias, Risks, and Limitations

Bias: When training a DDPM using the OpenFWI dataset, biases may arise from various factors. Data sampling bias can occur if the dataset lacks diversity in geological structures, leading to poor generalization to unseen structures. Label bias, where seismic velocities are unevenly distributed, might cause the model to favor dominant classes. Augmentation bias from artificial transformations can introduce unrealistic patterns, while class imbalance could lead to underrepresentation of rare events. Geographical bias may arise if the dataset is region-specific, and measurement bias can result from differences in data collection methods or sensor types, affecting the model's performance across varying conditions.
Risks: When training a DDPM using the OpenFWI dataset, risks include overfitting to specific patterns or noise, leading to poor generalization to new seismic data. The model may amplify existing biases in the dataset, resulting in inaccurate or unfair predictions, especially for underrepresented regions or structures. This could cause incorrect seismic inversions, potentially misleading geophysical interpretations and decision-making. Additionally, unrealistic artifacts might be generated if the data is overly augmented or not properly curated. These risks could have serious ethical and economic consequences, particularly in critical applications like disaster prediction or resource extraction.
Limitations: Sampling requires a GPU.

Evaluation

I have dedicated substantial effort to ensuring that the learned distribution closely mimics the actual dataset distribution. Despite this, a significant divergence between the two remains. I am currently updating the model so that it reflects the true distribution more accurately.
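The card does not state how the divergence between the learned and actual distributions is measured. As one possible approach, the sketch below compares histograms of velocity values from real and generated samples using the Jensen-Shannon divergence. The choice of metric, the bin count, the value range, and the toy data are all assumptions made for illustration.

```python
import math

def histogram(values, bins, lo, hi):
    """Return normalized bin frequencies for values clipped to [lo, hi]."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clip out-of-range values into the edge bins
        counts[idx] += 1
    return [c / len(values) for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy normalized velocity values standing in for real and generated samples.
real_values = [0.1, 0.2, 0.2, 0.8]
generated_values = [0.1, 0.5, 0.6, 0.9]

p = histogram(real_values, bins=4, lo=0.0, hi=1.0)
q = histogram(generated_values, bins=4, lo=0.0, hi=1.0)
score = js_divergence(p, q)
```

A score of 0 means the two histograms are identical, and larger values indicate a larger gap. A per-pixel histogram ignores spatial structure, so it is only a coarse check.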

Acknowledgments

OpenFWI: Thanks to the creators of the OpenFWI dataset for providing the foundational seismic data.
Bias Reduction Efforts: This model incorporates advances in diffusion models and data augmentation techniques to help reduce bias in synthetic data generation.

Citing

If you use this model in your research or applications, please consider citing it as follows:

@Misc{2DseisvelGenerator,
  title =        {2DseisvelGenerator: 2D Seismic Velocity Synthesis},
  author =       {Ankur Kumar},
  howpublished = {\url{https://huggingface.co/kankur0007/2DseisvelGenerator}},
  year =         {2024}
}
