We build intelligent systems that transform how biological knowledge is generated — from curating 20+ genomic databases to deploying autonomous AI agents for marine genomics, cancer biology, and multi-omics data integration.
Our group operates at the convergence of machine learning, multi-omics data integration, and translational genomics — developing both foundational databases and next-generation AI reasoning systems.
We develop machine learning frameworks that unify genomics, transcriptomics, proteomics, and epigenomics data streams. Our graph-neural-network approaches enable cross-modal feature learning, identifying regulatory nodes invisible to single-omics analysis. We apply these methods to cancer, neurodegeneration, and rare disease cohorts.
Moving beyond static pipelines, we pioneer agentic AI systems that reason iteratively — querying databases, evaluating intermediate outputs, and refining hypotheses without human intervention at each step. Our BLASTclaw and OpenClaw frameworks demonstrate that LLM-orchestrated agents can conduct publishable-quality genomic analysis autonomously.
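The query, evaluate, and refine cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the BLASTclaw or OpenClaw implementation: `run_agent`, `AgentState`, the toy lookup table, and the scoring rule are all hypothetical names and placeholders introduced here, and a real agent would call an LLM and live databases where the stubs sit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Current hypothesis plus a log of every step taken (hypothetical structure)."""
    hypothesis: str
    history: List[str] = field(default_factory=list)


def run_agent(
    initial_hypothesis: str,
    query: Callable[[str], List[str]],
    evaluate: Callable[[str, List[str]], float],
    refine: Callable[[str, List[str]], str],
    threshold: float = 0.8,
    max_steps: int = 5,
) -> AgentState:
    """Query for evidence, score it, and refine until the score passes the threshold."""
    state = AgentState(hypothesis=initial_hypothesis)
    for _ in range(max_steps):
        evidence = query(state.hypothesis)
        score = evaluate(state.hypothesis, evidence)
        state.history.append(f"{state.hypothesis} (score={score:.2f})")
        if score >= threshold:
            break  # evidence is judged sufficient; stop iterating
        state.hypothesis = refine(state.hypothesis, evidence)
    return state


# Toy stand-ins for a database lookup and an LLM-based judge (placeholders only).
TOY_DB = {
    "unknown protein": ["kinase domain"],
    "unknown protein with kinase domain": ["kinase domain", "ATP-binding site"],
}


def toy_query(hypothesis: str) -> List[str]:
    return TOY_DB.get(hypothesis, [])


def toy_evaluate(hypothesis: str, evidence: List[str]) -> float:
    # Assumed rule: two supporting items count as full support.
    return min(1.0, len(evidence) / 2)


def toy_refine(hypothesis: str, evidence: List[str]) -> str:
    return f"{hypothesis} with {evidence[0]}" if evidence else hypothesis


result = run_agent("unknown protein", toy_query, toy_evaluate, toy_refine)
print(result.hypothesis)
print(result.history)
```

The `max_steps` cap is the design choice that matters most here: an agent that runs without a human checking each step still needs a hard stop so that a poorly scored hypothesis cannot loop indefinitely.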
Our SingleCellStudio platform integrates deep-learning cell type classifiers with conventional Seurat/Scanpy workflows. We apply scRNA-seq to cancer immunology, rare paediatric cancers, infectious disease (scrub typhus), and congenital disorders (Hirschsprung's disease), revealing cell-state transitions invisible at bulk-tissue resolution.
Leveraging our curated genomic databases (DNGene, dbTBI, ONGene, CMGene), we apply graph-learning and network medicine to identify druggable targets. Our TxGNN-augmented pipelines predict drug repurposing candidates for diabetic nephropathy, traumatic brain injury, and rare cancers — translating database curation into actionable clinical hypotheses.
We are building a new paradigm where AI systems do not merely accelerate analysis — they conduct discovery. Our agent frameworks integrate large language models, tool orchestration, and domain knowledge graphs to reason autonomously over complex biological data.
BLASTclaw: Autonomous Post-BLAST Reasoning Agent
BLASTclaw reimagines one of the most routine yet labour-intensive steps in genomics: post-BLAST analysis. Rather than treating sequence alignment as an endpoint, BLASTclaw transforms it into the entry point for automated interpretation. The agent integrates contextual knowledge retrieval, iterative LLM reasoning chains, and workflow orchestration to produce biologically meaningful outputs (functional annotation, pathway context, and literature synthesis) with minimal human intervention. Designed for non-model organisms, marine species, and novel sequence space where reference databases are sparse.
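To make "alignment as an entry point" concrete, the sketch below shows the kind of first step a post-BLAST workflow might take: reading standard BLAST tabular output (`-outfmt 6`, whose default twelve columns end in `evalue` and `bitscore`) and keeping the best-scoring hit per query for downstream interpretation. It is an illustrative assumption about such a step, not BLASTclaw's actual code; `Hit`, `best_hits`, the E-value cutoff, and the sample rows are all made up for the example.

```python
import csv
import io
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per query from BLAST -outfmt 6 text."""
    best: Dict[str, Hit] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hit = Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))
        if hit.evalue > max_evalue:
            continue  # too weak to pass on for interpretation
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Invented sample rows; identifiers and scores are placeholders, not real results.
SAMPLE = "\n".join([
    "\t".join(["q1", "sA", "98.5", "200", "3", "0", "1", "200", "1", "200", "1e-100", "370"]),
    "\t".join(["q1", "sB", "80.0", "150", "30", "0", "1", "150", "5", "154", "1e-30", "120"]),
    "\t".join(["q2", "sC", "45.0", "60", "33", "0", "1", "60", "1", "60", "0.5", "25"]),
])

hits = best_hits(SAMPLE)
for query_id, hit in hits.items():
    print(query_id, hit.subject, hit.bitscore)
```

In this sample, `q2` has no hit below the cutoff and is dropped. For sparse reference space, a workflow might instead route such queries to a separate branch rather than discard them.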
marineClaw: Marine Genomics Discovery Agent
An early-stage autonomous agent designed to navigate complex marine genomic datasets, generate hypotheses, and iteratively refine analytical strategies. marineClaw addresses the central challenge of marine biotechnology: interpreting vast sequence data without reference genomes. It integrates genome annotation pipelines (RepeatModeler, BRAKER3), functional inference, and AI-driven hypothesis generation into a unified reasoning loop, pointing toward a near future where discovery is shaped by intelligent, adaptive systems.
OpenClaw / Zoe: Agentic Orchestration Platform
A modular multi-agent orchestration system with GitHub PR automation, Telegram notifications, and task routing. OpenClaw serves as the infrastructure layer for our research group's automated workflows, from manuscript revision tracking to bioinformatics pipeline monitoring. Zoe is the conversational front-end enabling natural-language control of complex computational tasks.
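Task routing of the kind mentioned above is often built as a registry that maps a task type to a handler. The following is a generic sketch of that pattern, not OpenClaw's code: `TaskRouter`, the task-type strings, and the task fields are hypothetical, and the handlers return strings where a real system might open a pull request or send a notification.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


class TaskRouter:
    """Maps a task's 'type' field to a registered handler (hypothetical design)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, task_type: str) -> Callable[[Handler], Handler]:
        def decorator(fn: Handler) -> Handler:
            self._handlers[task_type] = fn
            return fn
        return decorator

    def dispatch(self, task: dict) -> str:
        handler = self._handlers.get(task.get("type", ""))
        if handler is None:
            # Unknown task types are reported rather than silently dropped.
            return f"unrouted: {task.get('type')!r}"
        return handler(task)


router = TaskRouter()


@router.register("pipeline_status")
def report_pipeline(task: dict) -> str:
    return f"pipeline {task['name']} is {task['status']}"


@router.register("manuscript_revision")
def track_revision(task: dict) -> str:
    return f"revision {task['round']} logged for {task['title']}"


print(router.dispatch({"type": "pipeline_status", "name": "braker3", "status": "running"}))
print(router.dispatch({"type": "unknown"}))
```

Keeping handlers behind a registry means a new workflow can be added by registering one function, without touching the dispatch logic.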
SmartBLAST: Intelligent Sequence Analysis
SmartBLAST extends standard BLAST with AI-powered interpretation across 37 follow-up analysis types, covering functional enrichment, structural prediction integration, phylogenetic contextualisation, and regulatory inference. Designed for researchers who need publication-ready analysis beyond raw alignment scores.
SingleCellStudio: scRNA-seq AI Platform
A GUI-driven single-cell analysis platform integrating Seurat, Scanpy, and deep-learning cell type classifiers. Features automated cluster annotation, AI-assisted trajectory inference, and figure-ready visualisation, lowering the barrier for wet-lab researchers to conduct publication-quality single-cell analysis without programming expertise.
OmicsGPT: Database-Native AI Assistant
A next-generation conversational AI assistant natively integrated with our 20+ genomic databases. OmicsGPT enables natural-language querying of ONGene, dbEMT, DNGene, lncRNACancer, CMGene and sister databases, translating complex biological questions into structured database queries and synthesising results into research-ready summaries. Planned to support grant hypothesis generation and literature gap analysis.
Over a decade of manual curation and computational integration has produced a suite of high-quality, widely cited databases covering cancer, neurodegeneration, rare disease, and non-coding biology. These resources now serve as the knowledge backbone for our AI agent frameworks.
Our databases collectively represent over 10 years of literature mining, experimental validation, and community contribution — cited thousands of times across oncology, systems biology, and translational medicine. Each database is now being integrated into our AI agent architecture, transforming static knowledge repositories into dynamic, queryable knowledge graphs accessible via natural language.
The next generation of these resources will incorporate LLM-powered annotation, automatic literature monitoring, and federated querying — enabling researchers to interrogate multiple databases simultaneously through conversational interfaces.
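Federated querying, in its simplest form, means sending one question to several independent databases and merging the answers by source. The sketch below illustrates the idea with in-memory SQLite databases. It is an assumption-laden toy, not a description of how these resources are actually stored: the `genes` table, its columns, the source names `db_one` and `db_two`, and every row are placeholders.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_toy_db(rows: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical genes(symbol, annotation) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (symbol TEXT, annotation TEXT)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)", rows)
    return conn


def federated_lookup(
    symbol: str, sources: Dict[str, sqlite3.Connection]
) -> Dict[str, List[str]]:
    """Run the same parameterised query against every source and group results by source."""
    results: Dict[str, List[str]] = {}
    for name, conn in sources.items():
        rows = conn.execute(
            "SELECT annotation FROM genes WHERE symbol = ? ORDER BY annotation",
            (symbol,),
        ).fetchall()
        if rows:
            results[name] = [row[0] for row in rows]
    return results


# Placeholder sources and rows; none of this reflects real database content.
sources = {
    "db_one": make_toy_db([("GENE_A", "placeholder annotation 1")]),
    "db_two": make_toy_db([
        ("GENE_A", "placeholder annotation 2"),
        ("GENE_B", "placeholder annotation 3"),
    ]),
}

found = federated_lookup("GENE_A", sources)
print(found)
```

A conversational interface would sit in front of a function like `federated_lookup`, turning a researcher's question into the `symbol` argument and summarising the grouped results.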
Marine genomics is entering a phase of unprecedented data abundance, yet biological insight remains the limiting step. We argue that closing this gap is not merely a technology problem: it calls for a paradigm shift.
Despite advances in sequencing and computational pipelines, the interpretation of marine genomic data remains largely manual, fragmented, and difficult to scale across the vast diversity of ocean life. A central question therefore emerges: can artificial intelligence move beyond accelerating analysis to fundamentally redefining how discovery is conducted?
We synthesise recent advances in AI across the life sciences, highlighting how large-scale models are reshaping sequence annotation, functional inference, and multi-omics integration, with emerging applications in marine systems. BLASTclaw transforms post-BLAST analysis from an endpoint into an entry point for automated interpretation — integrating contextual knowledge, iterative reasoning, and workflow orchestration.
Our prototype autonomous agent marineClaw is designed to navigate complex marine genomic datasets, generate hypotheses, and iteratively refine analytical strategies — suggesting a near future in which AI actively participates in the scientific process, operating at scales and speeds beyond human cognition.
Selected publications spanning database development, cancer genomics, AI methodology, and marine biology. For a full list, visit Google Scholar.
A cross-disciplinary team spanning bioinformatics, AI engineering, cancer biology, and marine genomics — united by the goal of making biological discovery faster, smarter, and more autonomous.