code-graph-v4
Packaged git clones for the graphjepa / code-transformer project, with full git history.
Contents
- clones_csharp_full.tar.gz
- clones_java_full.tar.gz
- clones_javascript_full.tar.gz
- clones_python_full.tar.gz
- clones_typescript_full.tar.gz
Each tarball contains {language}/{repo_id}/...: extract it anywhere, then
point the parser at the extracted directory.
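As a rough illustration of that layout, the sketch below walks an extracted directory and lists the {language}/{repo_id} pairs it finds. The helper name list_repos is hypothetical and not part of the project; it only assumes the two-level directory structure described above.

```python
from pathlib import Path


def list_repos(extract_root):
    """Return sorted (language, repo_id) pairs found under extract_root.

    Hypothetical helper: assumes the {language}/{repo_id}/... layout
    described in this README and ignores loose files at either level.
    """
    root = Path(extract_root)
    pairs = []
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for repo_dir in sorted(p for p in lang_dir.iterdir() if p.is_dir()):
            pairs.append((lang_dir.name, repo_dir.name))
    return pairs
```

Each returned pair maps back to a path of the form extract_root/language/repo_id, which is the directory the parser would be pointed at.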
On the receiving (big) machine
from huggingface_hub import hf_hub_download
path = hf_hub_download(
repo_id="IDMedicine/code-graph-v4",
filename="clones_python_full.tar.gz",
repo_type="model",
local_dir=".",
)
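# Then, in a shell, extract into ./data_multilang/ ($path is the file path returned above):
mkdir -p ./data_multilang/
tar -xzf "$path" -C ./data_multilang/
# Finally, process each repo with build_bundle.py: use include_git=True for
# temporal processing, or single-snapshot parsing for code-only tarballs.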
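The download and extraction steps can also be kept in a single Python script, which avoids passing the path between Python and the shell. The following is a minimal sketch, not the project's own tooling: the function names extract_tarball and fetch_and_extract are hypothetical, and the ./data_multilang default simply mirrors the tar command above.

```python
import tarfile
from pathlib import Path


def extract_tarball(archive_path, dest_dir="./data_multilang"):
    """Extract a .tar.gz archive into dest_dir, creating it if needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: use the safer "data" extraction filter.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def fetch_and_extract(filename, dest_dir="./data_multilang"):
    """Download one tarball from the Hub, then extract it (needs network)."""
    # Imported lazily so extract_tarball works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="IDMedicine/code-graph-v4",
        filename=filename,
        repo_type="model",
        local_dir=".",
    )
    return extract_tarball(path, dest_dir)
```

Calling fetch_and_extract("clones_python_full.tar.gz") would then leave the Python clones under ./data_multilang/. Note that the "data" filter refuses archive members such as symlinks pointing outside the destination, so a clone containing such links may need a different extraction policy.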
Limitations
- If packaged without .git (the _code variants), no temporal processing is possible downstream; only single-snapshot SSL.
- If packaged with .git (the _full variants), tarballs are larger but the full commit history is preserved for build_bundle.py.
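One way to apply the distinction above is to check each extracted repo for a .git entry before choosing how to process it. This is only a sketch: the function name processing_mode and the two mode labels are hypothetical, and how the result is passed on to build_bundle.py depends on that script's actual interface, which is not shown here.

```python
from pathlib import Path


def processing_mode(repo_dir):
    """Return "temporal" if repo_dir carries a .git entry, else "single-snapshot".

    Hypothetical helper: the labels are illustrative, not values that
    build_bundle.py is known to accept.
    """
    has_git = (Path(repo_dir) / ".git").exists()
    return "temporal" if has_git else "single-snapshot"
```

A repo from a _full tarball would be expected to report "temporal", while one from a _code tarball would report "single-snapshot".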