---
tags:
- code-search
- pytorch
- gnn
- procedural-similarity
datasets:
- google/code_x_glue_cc_clone_detection_poj104
---
# Procedural Code Search (GIN + CodeBERT)

This model was trained to identify **procedural similarity** in C++ code (POJ-104 dataset).
Unlike semantic models that focus on intent ("what it does"), this model focuses on structure ("how it does it").

## Architecture

- **Backbone:** Frozen CodeBERT (`microsoft/codebert-base`) provides the node features.
- **Head:** 2-layer GIN (Graph Isomorphism Network) for structural aggregation.
- **Objective:** Triplet margin loss (margin = 0.8) to separate structurally distinct implementations.
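
The plain-Python sketch below shows how these three pieces fit together. It is **not** the trained `CFGEncoder`, and several details are assumptions made only for illustration: the edge direction (messages flow `src -> dst`, as along control-flow edges), the sum readout, the toy 2-dimensional features and identity weights, and the helper names (`gin_layer`, `encode_graph`, `triplet_margin_loss`). Each GIN layer applies the standard update `h_v' = MLP((1 + eps) * h_v + sum of h_u over incoming neighbours u)`. In the real model, the initial `h_v` would be frozen CodeBERT vectors (768-dimensional for `codebert-base`), and the MLPs would be learned. The loss is the standard triplet margin loss, `max(d(a, p) - d(a, n) + margin, 0)`, with Euclidean `d` and this card's margin of 0.8.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
MLP = Callable[[Vector], Vector]


def linear(weights: Matrix, bias: Vector, v: Sequence[float]) -> Vector:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def make_mlp(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> MLP:
    """Two-layer MLP (Linear -> ReLU -> Linear), the usual network inside a GIN layer."""
    def mlp(v: Vector) -> Vector:
        hidden = [max(0.0, x) for x in linear(w1, b1, v)]
        return linear(w2, b2, hidden)
    return mlp


def gin_layer(h: List[Vector], edges: Sequence[Tuple[int, int]],
              mlp: MLP, eps: float = 0.0) -> List[Vector]:
    """One GIN update: MLP((1 + eps) * own features + sum of incoming neighbour features)."""
    incoming: List[List[int]] = [[] for _ in h]
    for src, dst in edges:  # assumption: messages flow src -> dst
        incoming[dst].append(src)
    out = []
    for v, feats in enumerate(h):
        agg = [(1.0 + eps) * x for x in feats]
        for u in incoming[v]:
            agg = [a + x for a, x in zip(agg, h[u])]
        out.append(mlp(agg))
    return out


def encode_graph(node_feats: List[Vector], edges: Sequence[Tuple[int, int]],
                 mlps: Sequence[MLP]) -> Vector:
    """Stack GIN layers, then sum-pool node vectors into one graph embedding (assumed readout)."""
    h = node_feats
    for mlp in mlps:
        h = gin_layer(h, edges, mlp)
    return [sum(col) for col in zip(*h)]


def triplet_margin_loss(anchor: Sequence[float], positive: Sequence[float],
                        negative: Sequence[float], margin: float = 0.8) -> float:
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d."""
    return max(math.dist(anchor, positive) - math.dist(anchor, negative) + margin, 0.0)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [0.0, 0.0]
    mlp = make_mlp(identity, zeros, identity, zeros)  # toy weights, not learned ones
    # Toy 3-node graph: block 0 -> block 1 -> block 2
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(encode_graph(feats, [(0, 1), (1, 2)], [mlp, mlp]))  # [5.0, 4.0]
    # Negative is already farther than positive by more than the margin, so loss is 0
    print(triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0]))  # 0.0
```

In PyTorch, `nn.TripletMarginLoss(margin=0.8)` computes the same quantity. It adds a small `eps` for numerical stability and averages over the batch by default.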

## Usage

This is a custom PyTorch model. To load these weights, you must define the `CFGEncoder` class with exactly the same structure (module names and layer dimensions) as the one used in training.
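
Once a matching `CFGEncoder` is defined and the weights are loaded (typically with the standard `model.load_state_dict(torch.load(...))` pattern), searching comes down to comparing program embeddings. The sketch below is a hypothetical illustration and is not part of this repository. It assumes the encoder returns one fixed-size vector per program, and it ranks candidates by Euclidean distance to the query. That choice of distance is itself an assumption: it matches the triplet loss only if training used PyTorch's default p=2 distance. The function name `rank_by_procedural_similarity` and the example vectors are made up.

```python
import math
from typing import Dict, List, Sequence


def rank_by_procedural_similarity(query: Sequence[float],
                                  corpus: Dict[str, Sequence[float]],
                                  top_k: int = 5) -> List[str]:
    """Return up to `top_k` corpus ids, nearest (most procedurally similar) first."""
    # sorted() is stable, so ties keep the corpus's insertion order
    ranked = sorted(corpus, key=lambda cid: math.dist(query, corpus[cid]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Made-up 2-D embeddings standing in for CFGEncoder outputs
    corpus = {
        "solution_a": [1.0, 0.0],
        "solution_b": [3.0, 4.0],
        "solution_c": [0.5, 0.5],
    }
    print(rank_by_procedural_similarity([0.0, 0.0], corpus, top_k=2))
    # ['solution_c', 'solution_a']
```

This linear scan only illustrates the ranking logic. For a large corpus, you would more likely batch the distance computation on the GPU or use a nearest-neighbour index.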