
πŸ” Obfuscated Variable Renaming with aixcoder

This repository hosts an aixcoder-based model fine-tuned to rename obfuscated variables in source code, improving readability while preserving program semantics.

The model is designed for use cases such as malware analysis, reverse engineering, digital forensics, and general program comprehension.


🚀 Task Overview

Task: Code Deobfuscation / Variable Renaming
Base Model: aixcoder
Input: Source code with obfuscated variable names
Output: Semantically equivalent source code with readable variable names

Example

Input

function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}

Output

function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}

🧠 Model Description

  • Architecture: aixcoder (Transformer-based)
  • Fine-tuning Objective: Context-aware variable renaming
  • Approach: AST-guided identifier alignment + sequence generation
  • Languages: JavaScript (primary), extendable to others

The model learns to infer meaningful variable names from usage context, not from superficial patterns.
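To make "renaming while preserving semantics" concrete, here is a minimal sketch of applying a name mapping consistently to every occurrence of an identifier. It is not the model's internal mechanism: `apply_renaming` and the regex-based identifier matching are hypothetical stand-ins for a real AST pass, and the sketch is scope-blind (it also touches names inside strings and comments).

```python
import re

# JavaScript-style identifier; the lookbehind avoids matching inside numeric
# literals such as 0x1F. Assumption: a regex approximates a real tokenizer here.
IDENTIFIER = re.compile(r"(?<![A-Za-z0-9_$])[A-Za-z_$][A-Za-z0-9_$]*")


def apply_renaming(code: str, mapping: dict) -> str:
    """Rename identifiers in one pass, refusing mappings that would merge names."""
    existing = set(IDENTIFIER.findall(code))
    new_names = list(mapping.values())
    if len(set(new_names)) != len(new_names):
        raise ValueError("two identifiers map to the same new name")
    # A new name may reuse an existing one only if that one is renamed as well.
    clashes = (set(new_names) & existing) - set(mapping)
    if clashes:
        raise ValueError(f"new names collide with existing identifiers: {sorted(clashes)}")
    return IDENTIFIER.sub(lambda m: mapping.get(m.group(0), m.group(0)), code)


obfuscated = (
    "function _0x12af(a, b) {\n"
    "  let _0x9c3e = a * b;\n"
    "  return _0x9c3e + 10;\n"
    "}"
)
renamed = apply_renaming(obfuscated, {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"})
print(renamed)
```

The collision check matters because mapping two distinct variables to one name, or onto a name already in use, would change what the program computes.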


πŸ— Training Details

Dataset

  • Paired samples of:
    • Obfuscated code
    • Original / readable code
  • Variable mappings extracted using AST-based analysis
  • Realistic obfuscation patterns (minifiers, packers, name mangling)
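As an illustration of how a variable mapping might be recovered from one obfuscated/readable pair, the sketch below aligns identifier tokens by position. This is a simplification of the AST-based analysis mentioned above, not the pipeline actually used: `extract_mapping` is a hypothetical helper, and it assumes the two versions differ only in identifier names.

```python
import re

# JavaScript-style identifier; the lookbehind avoids matching inside numeric literals.
IDENTIFIER = re.compile(r"(?<![A-Za-z0-9_$])[A-Za-z_$][A-Za-z0-9_$]*")


def extract_mapping(obfuscated: str, readable: str) -> dict:
    """Align identifier tokens positionally and collect the pairs that differ."""
    obf_tokens = IDENTIFIER.findall(obfuscated)
    orig_tokens = IDENTIFIER.findall(readable)
    if len(obf_tokens) != len(orig_tokens):
        raise ValueError("token streams do not align; the pair differs by more than renaming")
    mapping = {}
    for obf, orig in zip(obf_tokens, orig_tokens):
        if obf != orig:
            # setdefault keeps the first pairing; a conflicting later one means a bad pair.
            if mapping.setdefault(obf, orig) != orig:
                raise ValueError(f"inconsistent mapping for {obf!r}")
    return mapping


obfuscated = "function _0x12af(a, b) {\n  let _0x9c3e = a * b;\n  return _0x9c3e + 10;\n}"
readable = "function multiplyAndAdd(a, b) {\n  let product = a * b;\n  return product + 10;\n}"
pair_mapping = extract_mapping(obfuscated, readable)
print(pair_mapping)
```

Pairs that raise here (misaligned tokens or inconsistent pairings) are candidates for filtering out, since they likely involve transformations beyond renaming.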

Training Objectives

  • Identifier-aware sequence-to-sequence learning
  • Contextual name prediction
  • Syntax preservation

📦 Installation

pip install transformers torch accelerate

▶️ Usage

Inference Example

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/aixcoder-renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

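# Move the tokenized inputs to the device chosen by device_map="auto"
inputs = tokenizer(code, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False  # greedy decoding
)

# generate() returns the prompt followed by the completion; decode only the new tokens
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))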

🧪 Evaluation

  • Identifier exact-match accuracy
  • AST equivalence checks
  • Manual readability assessment
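The first two checks can be sketched as follows. This is an illustrative approximation, not the evaluation code behind this model: `exact_match_accuracy` and `same_structure` are hypothetical helpers, and comparing code with identifiers replaced by placeholders is a lightweight stand-in for a true AST equivalence check.

```python
import re

IDENTIFIER = re.compile(r"(?<![A-Za-z0-9_$])[A-Za-z_$][A-Za-z0-9_$]*")
# Partial keyword list, enough for the example; a real check would use a parser.
KEYWORDS = {"function", "let", "const", "var", "return", "if", "else", "for", "while"}


def exact_match_accuracy(gold: dict, predicted: dict) -> float:
    """Fraction of obfuscated names whose predicted name equals the original."""
    if not gold:
        return 0.0
    hits = sum(1 for name, original in gold.items() if predicted.get(name) == original)
    return hits / len(gold)


def canonicalize(code: str) -> str:
    """Replace each non-keyword identifier with a placeholder numbered by first use."""
    names = {}

    def placeholder(match):
        token = match.group(0)
        if token in KEYWORDS:
            return token
        return names.setdefault(token, f"id{len(names)}")

    # Collapse whitespace so formatting differences do not affect the comparison.
    return " ".join(IDENTIFIER.sub(placeholder, code).split())


def same_structure(code_a: str, code_b: str) -> bool:
    """True when the two snippets are identical up to consistent renaming."""
    return canonicalize(code_a) == canonicalize(code_b)


gold = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}
accuracy = exact_match_accuracy(gold, predicted)

before = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
after = "function multiplyAndAdd(a, b) { let result = a * b; return result + 10; }"
structure_ok = same_structure(before, after)
print(accuracy, structure_ok)
```

Exact match is a strict metric: `result` is a reasonable name for `product` yet scores zero, which is one reason to pair it with a manual readability assessment.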

⚠️ Limitations

  • Generated names are semantic approximations, not original identifiers
  • Performance degrades on:
    • Extremely short contexts
    • Heavy control-flow flattening
  • Single-file scope only

πŸ” Ethical Considerations

This model is intended for:

  • Malware and binary analysis
  • Digital forensics and incident response (DFIR)
  • Code maintenance and auditing

It should not be used to violate software licenses or intellectual property rights.


🧩 Future Work

  • Multi-language support (C/C++, Python)
  • Function and class renaming
  • Control-flow–aware modeling
  • Integration with decompilers and IR tools

📜 License

Specify the license here (e.g., Apache-2.0, MIT).


📖 Citation

@misc{aixcoder_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using aixcoder},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/aixcoder-renaming}
}

Model size: 7B params (Safetensors), tensor type BF16