Joint NT-ESM2 DNA-Protein Models
This repository contains jointly trained Nucleotide Transformer (NT) and ESM2 models for DNA-protein sequence analysis.
Model Components
DNA Model (dna/)
- Type: Nucleotide Transformer for DNA sequences
- Context: 4096 tokens
- Training: Transcript-specific coding sequences
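Coding sequences longer than the 4096-token context must be split before encoding. The sketch below is a hypothetical helper (`chunk_sequence` is not part of this repository). It assumes that the DNA tokenizer uses non-overlapping 6-mers and adds one special token per input, as the original Nucleotide Transformer does. Check `dna/tokenizer_config.json` and `vocab.txt` before relying on those numbers. Window sizes and strides are kept at multiples of six so that token boundaries, and therefore the codon reading frame, stay aligned across chunks.

```python
from typing import List


def chunk_sequence(
    seq: str,
    max_tokens: int = 4096,
    k: int = 6,
    n_special: int = 1,
    overlap_tokens: int = 0,
) -> List[str]:
    """Split a DNA string into windows that fit the model context.

    Assumption (hypothetical): each token covers k nucleotides and
    n_special tokens (e.g. a CLS token) are added to every window.
    """
    window_nt = (max_tokens - n_special) * k  # nucleotides per window
    stride_nt = window_nt - overlap_tokens * k  # step between window starts
    if stride_nt <= 0:
        raise ValueError("overlap_tokens must be smaller than the usable window")
    chunks: List[str] = []
    start = 0
    while start < len(seq):
        chunks.append(seq[start:start + window_nt])
        if start + window_nt >= len(seq):
            break  # this window already reaches the end of the sequence
        start += stride_nt
    return chunks


# With the defaults, each window holds (4096 - 1) * 6 = 24570 nucleotides.
long_cds = "ATG" * 10_000  # 30,000 nt toy sequence
print([len(c) for c in chunk_sequence(long_cds)])  # prints [24570, 5430]
```

Setting `overlap_tokens` above zero gives neighboring windows shared context at the cost of encoding some nucleotides twice.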
Protein Model (protein/)
- Type: ESM2 for protein sequences
- Variant: Large model
- Training: Corresponding protein sequences
Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "vsubasri/joint-nt-esm2-transcript-coding"
# Load DNA model (Nucleotide Transformer)
dna_model = AutoModel.from_pretrained(repo_id, subfolder="dna")
dna_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="dna")
# Load protein model (ESM2)
protein_model = AutoModel.from_pretrained(repo_id, subfolder="protein")
protein_tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="protein")
# Example joint usage (illustrative inputs only: this DNA does not translate exactly to this protein)
dna_seq = "ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA"
protein_seq = "MKRISLHHHHHHHQVTVRWD"
dna_inputs = dna_tokenizer(dna_seq, return_tensors="pt")
protein_inputs = protein_tokenizer(protein_seq, return_tensors="pt")
# Inference only, so gradient tracking is disabled
with torch.no_grad():
    dna_outputs = dna_model(**dna_inputs)
    protein_outputs = protein_model(**protein_inputs)
# Per-token embeddings, shape (batch, seq_len, hidden_size)
dna_embeddings = dna_outputs.last_hidden_state
protein_embeddings = protein_outputs.last_hidden_state
```
Training Details
- Joint Training: Models trained together for cross-modal understanding
- Batch Size: 8
- Data: Transcript-specific coding sequences with corresponding proteins
- Architecture: Maintained original NT and ESM2 architectures
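Because training pairs each coding sequence with its protein, it can help to confirm that a DNA-protein pair actually corresponds before comparing the two models' outputs. The sketch below translates a CDS with the standard genetic code (NCBI translation table 1). `translate` and `pair_matches` are hypothetical helpers, not part of this repository, and they assume in-frame coding sequences that start at the first codon.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base outermost.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence; codons not in the table (e.g. with N) become X."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError(f"CDS length {len(cds)} is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop codon ends the protein
        protein.append(aa)
    return "".join(protein)


def pair_matches(cds: str, protein: str) -> bool:
    """True if the CDS translates exactly to the given protein."""
    return translate(cds) == protein


dna_seq = "ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA"
print(translate(dna_seq))  # prints MKRISTTITTTITITTGNGAG
print(pair_matches(dna_seq, "MKRISLHHHHHHHQVTVRWD"))  # prints False
```

Applied to the example pair from the Usage section, the DNA translates to MKRISTTITTTITITTGNGAG, which agrees with the example protein only in its first five residues.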
Repository Structure
```
├── dna/                        # NT DNA model
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── vocab.txt
│   └── special_tokens_map.json
├── protein/                    # ESM2 protein model
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── vocab.txt
│   └── special_tokens_map.json
└── joint_config.json           # Joint model configuration
```
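After downloading, you can confirm that a local copy matches the layout above by checking each expected path. This is a minimal sketch. `EXPECTED_FILES` transcribes the tree above, and `missing_files` is a hypothetical helper. The commented usage assumes the `huggingface_hub` package and its `snapshot_download` function.

```python
from pathlib import Path
from typing import Dict, List

# File layout as listed in the repository tree; "" is the repository root.
EXPECTED_FILES: Dict[str, List[str]] = {
    "dna": ["config.json", "model.safetensors", "tokenizer_config.json",
            "vocab.txt", "special_tokens_map.json"],
    "protein": ["config.json", "model.safetensors", "tokenizer_config.json",
                "vocab.txt", "special_tokens_map.json"],
    "": ["joint_config.json"],
}


def missing_files(root: Path) -> List[str]:
    """Return repository-relative paths that are absent under root."""
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            rel = Path(subdir) / name
            if not (root / rel).is_file():
                missing.append(rel.as_posix())
    return missing


# Usage, assuming huggingface_hub is installed (requires network access):
# from huggingface_hub import snapshot_download
# root = Path(snapshot_download("vsubasri/joint-nt-esm2-transcript-coding"))
# print(missing_files(root))  # an empty list means every listed file is present
```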
Citation
If you use these models, please cite the original Nucleotide Transformer and ESM2 papers, along with this joint training work.