---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---

# Cancer@Home v2

A distributed computing platform for cancer genomics research that combines BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

## 🚀 Quick Start (5 minutes)

### Prerequisites

- Python 3.8+
- Docker Desktop
- 8 GB RAM minimum

### Installation

1. **Clone the repository and set up a virtual environment**

   ```bash
   cd CancerAtHome2
   python -m venv venv
   venv\Scripts\activate          # Windows
   # source venv/bin/activate     # macOS/Linux
   pip install -r requirements.txt
   ```

2. **Start the Neo4j database**

   ```bash
   docker-compose up -d
   ```

3. **Run the application**

   ```bash
   python run.py
   ```

4. **Open your browser**
   - Application: http://localhost:5000
   - Neo4j Browser: http://localhost:7474 (username: `neo4j`, password: `cancer123`)

## 🎯 Features

### 1. **Distributed Computing (BOINC Integration)**

- Submit computational tasks for cancer research
- Monitor distributed workload processing
- Track task status in real time

### 2. **GDC Data Integration**

- Download cancer genomics data from the GDC Data Portal
- Support for data from multiple GDC programs (e.g., TCGA, TARGET) covering many cancer types
- Automatic data parsing and normalization

### 3. **Sequence Analysis Pipeline**

- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation

### 4. **Neo4j Graph Database**

- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization

### 5. **GraphQL API**

- Flexible queries over cancer data
- Filter by gene, mutation, or patient cohort
- Aggregate statistics

### 6. **Interactive Dashboard**

- Real-time data visualization
- Network graphs of gene interactions
- Mutation frequency charts
- Patient cohort analysis

## 📊 Architecture

```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```

## 🗂️ Project Structure

```
CancerAtHome2/
├── backend/
│   ├── api/            # FastAPI routes
│   ├── boinc/          # BOINC integration
│   ├── gdc/            # GDC data fetcher
│   ├── neo4j/          # Neo4j database layer
│   ├── pipeline/       # Bioinformatics pipeline
│   └── graphql/        # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/ # React components
│       ├── views/      # Page views
│       └── api/        # API client
├── data/               # Downloaded datasets
├── docker-compose.yml  # Neo4j container
├── requirements.txt    # Python dependencies
└── run.py              # Main entry point
```

## 🧬 Data Flow

1. **Data Ingestion**: Download cancer genomics data from the GDC Data Portal.
2. **Processing**: Run FASTQ/BLAST analysis on the distributed BOINC network.
3. **Storage**: Store results in the Neo4j graph database.
4. **Visualization**: Query and visualize results through the web dashboard.

## 🔧 Configuration

Edit `config.yml` to customize:

- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options

## 📖 Usage Examples

### Query Mutations by Gene

```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```

### Submit Analysis Task

```python
from backend.boinc import BOINCClient

# Create a client for the BOINC integration layer
client = BOINCClient()

# Submit a variant-calling work unit for a local FASTQ file
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
print(f"Submitted task {task_id}")
```

## 🤝 Inspired By

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling

## 📄 License

MIT License

## 🛟 Support

For issues or questions, please open a discussion on Hugging Face or an issue on GitHub.
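
## 🧪 Sketch: Reading FASTQ Records

This README does not show the code for the FASTQ Parser stage of the Analysis Pipeline. The snippet below is a minimal sketch that uses only the standard library to show what such a parser typically does. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequences) and Phred+33 quality encoding. `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical names. They are not the project's actual API.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FastqRecord:
    """One FASTQ entry (hypothetical type): read header, bases, and quality string."""
    identifier: str
    sequence: str
    quality: str

    def mean_phred(self, offset: int = 33) -> float:
        """Average Phred quality, assuming Phred+33 encoding by default."""
        if not self.quality:
            return 0.0
        return sum(ord(ch) - offset for ch in self.quality) / len(self.quality)


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Hypothetical helper: yield records from a four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between or after records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        # Validate the record layout: '@' header, '+' separator, matching lengths
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].strip(), sequence, quality)


if __name__ == "__main__":
    sample = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n"
    for rec in parse_fastq(io.StringIO(sample)):
        print(rec.identifier, rec.sequence, rec.mean_phred())
    # read1 ACGT 40.0
    # read2 GGCA 0.0
```

The parser reads one record at a time, so memory use stays flat even for large sequencing runs. For gzipped input, `gzip.open(path, "rt")` returns a text handle that can be passed directly to `parse_fastq`.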
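
## 🕸️ Sketch: Loading the Gene → Mutation → Patient Graph

The Neo4j schema is described here only as the Gene → Mutation → Patient → Cancer Type chain. The sketch below is one possible way to write that chain with idempotent `MERGE` statements through the official `neo4j` Python driver. It uses hypothetical node labels (`Gene`, `Mutation`, `Patient`, `CancerType`), relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`), and property names chosen to match the GraphQL fields in the `TP53` query (`id`, `position`, `consequence`, `stage`). It also assumes that the Docker container exposes Bolt on the default port 7687 and uses the Quick Start credentials. `build_link_params` and `load_records` are illustrative helpers, not part of `backend/neo4j`.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical labels, relationship types, and properties for
# Gene -> Mutation -> Patient -> Cancer Type; MERGE keeps re-runs idempotent.
LINK_MUTATION_QUERY = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
  ON CREATE SET m.position = $position, m.consequence = $consequence
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
SET p.stage = coalesce($stage, p.stage)
"""

REQUIRED_FIELDS = ("gene", "mutation_id", "patient_id", "cancer_type")


def build_link_params(record: Dict[str, Any]) -> Dict[str, Any]:
    """Validate one flat mutation record and map it to the query parameters."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    return {
        "gene": record["gene"],
        "mutation_id": record["mutation_id"],
        "position": record.get("position"),
        "consequence": record.get("consequence"),
        "patient_id": record["patient_id"],
        "stage": record.get("stage"),
        "cancer_type": record["cancer_type"],
    }


def load_records(
    records: Iterable[Dict[str, Any]],
    uri: str = "bolt://localhost:7687",  # assumed default Bolt port
    auth: Tuple[str, str] = ("neo4j", "cancer123"),
) -> int:
    """Write records to Neo4j one at a time; returns how many were written."""
    from neo4j import GraphDatabase  # imported lazily so the helpers above work without it

    count = 0
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for record in records:
                # consume() waits for the auto-commit query to finish before the next one
                session.run(LINK_MUTATION_QUERY, build_link_params(record)).consume()
                count += 1
    return count
```

Because the query uses `MERGE` rather than `CREATE`, re-running an ingestion batch does not duplicate genes, patients, or cancer types. For large GDC downloads, sending rows as a list and expanding them with `UNWIND` inside one transaction would cut round trips considerably.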