metadata
tags:
  - code-search
  - pytorch
  - gnn
  - procedural-similarity
datasets:
  - google/code_x_glue_cc_clone_detection_poj104

Procedural Code Search (GIN + CodeBERT)

This model is trained on the POJ-104 dataset to identify procedural similarity in C++ code. Unlike semantic models, which focus on intent ("what the code does"), it focuses on structure ("how the code does it").

Architecture

  • Backbone: Frozen CodeBERT (microsoft/codebert-base) for node features.
  • Head: 2-layer GIN (Graph Isomorphism Network) for structural aggregation.
  • Objective: Triplet Margin Loss (Margin=0.8) to separate structurally distinct implementations.
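
The three components above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the training code: the class names `GINLayer` and `CFGEncoderSketch`, the hidden size of 256, the dense adjacency matrix, and the sum pooling are all assumptions. Random tensors of width 768 (the hidden size of microsoft/codebert-base) stand in for the frozen CodeBERT node features.

```python
import torch
import torch.nn as nn


class GINLayer(nn.Module):
    """One GIN layer: h_v' = MLP((1 + eps) * h_v + sum of neighbour features)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable self-weight
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj is a dense (N, N) adjacency matrix, so adj @ x sums neighbour features.
        return self.mlp((1 + self.eps) * x + adj @ x)


class CFGEncoderSketch(nn.Module):
    """Hypothetical stand-in for CFGEncoder: 2 GIN layers, then sum pooling."""

    def __init__(self, in_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.gin1 = GINLayer(in_dim, hidden_dim)
        self.gin2 = GINLayer(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.gin1(x, adj))
        h = self.gin2(h, adj)
        return h.sum(dim=0)  # pool node states into one graph embedding


def random_graph(num_nodes: int, in_dim: int = 768):
    # Assumption: random features replace the frozen CodeBERT node embeddings.
    x = torch.randn(num_nodes, in_dim)
    adj = (torch.rand(num_nodes, num_nodes) < 0.3).float()
    return x, adj


torch.manual_seed(0)
encoder = CFGEncoderSketch()
anchor = encoder(*random_graph(5))
positive = encoder(*random_graph(6))
negative = encoder(*random_graph(4))

# Triplet margin loss with the margin stated in this card (0.8).
loss_fn = nn.TripletMarginLoss(margin=0.8)
loss = loss_fn(anchor.unsqueeze(0), positive.unsqueeze(0), negative.unsqueeze(0))
loss.backward()
```

In this sketch only the GIN parameters receive gradients, which mirrors the frozen-backbone setup: the node features are treated as fixed inputs.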

Usage

This is a custom PyTorch model. To load these weights, you must first define the CFGEncoder class exactly as it was defined during training, with the same layer names and shapes.
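
The short example below illustrates why the class definition has to match. It is a sketch only: `TinyEncoder`, its layer sizes, and the temporary `weights.pt` file are hypothetical placeholders, since the real CFGEncoder definition and checkpoint filename are not given here. It relies on the standard behaviour of `load_state_dict`, which checks every parameter name and shape.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in; the real CFGEncoder definition is not shown here."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(8, hidden_dim)


# Save weights from a "training-time" model to a temporary checkpoint.
trained = TinyEncoder(hidden_dim=4)
path = os.path.join(tempfile.mkdtemp(), "weights.pt")
torch.save(trained.state_dict(), path)

state = torch.load(path, map_location="cpu")

# Same definition: every key and shape matches, so loading succeeds.
same = TinyEncoder(hidden_dim=4)
same.load_state_dict(state)
same.eval()  # switch to inference mode before computing embeddings

# Different hidden size: shapes disagree and PyTorch raises RuntimeError.
try:
    TinyEncoder(hidden_dim=16).load_state_dict(state)
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
```

With the actual weights, the same pattern would apply: instantiate CFGEncoder with the training-time arguments, then call `load_state_dict` on the downloaded state dict.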