---
tags:
- code-search
- pytorch
- gnn
- procedural-similarity
datasets:
- google/code_x_glue_cc_clone_detection_poj104
---

# Procedural Code Search (GIN + CodeBERT)

This model is trained to identify **procedural similarity** in C++ code from the POJ-104 dataset.
Unlike semantic models, which focus on intent ("what it does"), it focuses on structure ("how it does it").

## Architecture
- **Backbone:** Frozen CodeBERT (microsoft/codebert-base) for node features.
- **Head:** 2-layer GIN (Graph Isomorphism Network) for structural aggregation.
- **Objective:** Triplet margin loss (margin = 0.8) to separate structurally distinct implementations.
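
The components above could fit together roughly as in the following sketch. It is not the training code: the class names `GINLayer` and `CFGEncoderSketch`, the hidden size of 256, the dense adjacency matrix, and the sum pooling are all assumptions made for illustration. Random tensors stand in for the frozen CodeBERT node features, which are 768-dimensional for `microsoft/codebert-base`.

```python
import torch
import torch.nn as nn


class GINLayer(nn.Module):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum of neighbour features)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable self-weight
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj is a dense (num_nodes, num_nodes) 0/1 matrix, so adj @ x sums
        # the features of each node's neighbours.
        return self.mlp((1 + self.eps) * x + adj @ x)


class CFGEncoderSketch(nn.Module):
    """Hypothetical 2-layer GIN head; hidden_dim and pooling are assumptions."""

    def __init__(self, in_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.gin1 = GINLayer(in_dim, hidden_dim)
        self.gin2 = GINLayer(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.gin1(x, adj))
        h = self.gin2(h, adj)
        return h.sum(dim=0)  # sum-pool node states into one graph embedding


def random_graph(num_nodes: int, in_dim: int = 768):
    """Placeholder graph: random node features and a random 0/1 adjacency."""
    x = torch.randn(num_nodes, in_dim)  # stands in for frozen CodeBERT features
    adj = (torch.rand(num_nodes, num_nodes) < 0.3).float()
    return x, adj


torch.manual_seed(0)
encoder = CFGEncoderSketch()

# One triplet: anchor and positive would share a structure, negative would not.
anchor = encoder(*random_graph(5)).unsqueeze(0)
positive = encoder(*random_graph(6)).unsqueeze(0)
negative = encoder(*random_graph(4)).unsqueeze(0)

criterion = nn.TripletMarginLoss(margin=0.8)  # margin taken from this card
loss = criterion(anchor, positive, negative)
print(anchor.shape, loss.item())
```

The loss is zero once the negative is at least 0.8 farther from the anchor than the positive is, so only triplets that violate the margin contribute gradients.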

## Usage
This is a custom PyTorch model. To load the weights, you must define the `CFGEncoder` class with exactly the structure used in training.
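
The example below illustrates why the class definition has to match. `load_state_dict` is strict by default, so a module whose parameter names differ from those in the checkpoint fails to load. `TrainedEncoder` and `MismatchedEncoder` are hypothetical stand-ins, not the real `CFGEncoder`, and the checkpoint is simulated in memory because this card does not name the weights file.

```python
import io

import torch
import torch.nn as nn


class TrainedEncoder(nn.Module):
    """Hypothetical stand-in for the class used at training time."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(768, 256)


class MismatchedEncoder(nn.Module):
    """Same layer shape but a different attribute name, so the keys differ."""

    def __init__(self):
        super().__init__()
        self.projection = nn.Linear(768, 256)


# Simulate a saved checkpoint in memory (a real one would be a file on disk).
buffer = io.BytesIO()
torch.save(TrainedEncoder().state_dict(), buffer)
buffer.seek(0)
state = torch.load(buffer, map_location="cpu", weights_only=True)

# A matching class definition loads cleanly.
matching = TrainedEncoder()
matching.load_state_dict(state)
matching.eval()  # switch to inference mode before embedding code

# A class with different parameter names raises under strict loading.
try:
    MismatchedEncoder().load_state_dict(state)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)

print(mismatch_error)
```

With the real checkpoint, the same pattern applies: instantiate `CFGEncoder` with the training-time layer names and sizes, then call `load_state_dict` on the loaded state.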