AI-Driven Genomics · University of the Sunshine Coast · Sunshine Coast, Australia

Where Artificial Intelligence Meets Genomic Discovery

We build intelligent systems that transform how biological knowledge is generated — from curating 20+ genomic databases to deploying autonomous AI agents for marine genomics, cancer biology, and multi-omics data integration.

20+ Genomic Databases
200+ Publications
15K+ Monthly Users
3 AI Agent Frameworks

Research Focus

Integrating AI Across the Genomic Landscape

Our group operates at the convergence of machine learning, multi-omics data integration, and translational genomics — developing both foundational databases and next-generation AI reasoning systems.

AI CORE
Multi-omics AI
AI-Powered Multi-Omics Data Integration

We develop machine learning frameworks that unify genomics, transcriptomics, proteomics, and epigenomics data streams. Our graph-neural-network approaches enable cross-modal feature learning, identifying regulatory nodes invisible to single-omics analysis. We apply these methods to cancer, neurodegeneration, and rare disease cohorts.

[Figure: iterative reasoning loop linking sequence, LLM agent, annotation, pathway, and hypothesis]
Agentic AI
Autonomous Scientific Reasoning Agents

Moving beyond static pipelines, we pioneer agentic AI systems that reason iteratively — querying databases, evaluating intermediate outputs, and refining hypotheses without human intervention at each step. Our BLASTclaw and OpenClaw frameworks demonstrate that LLM-orchestrated agents can conduct publishable-quality genomic analysis autonomously.
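
As a rough illustration of the query, evaluate, refine cycle described above, here is a minimal Python sketch. It is not the BLASTclaw or OpenClaw implementation: the toy lookup table, the `lookup_annotation` and `refine_query` helpers, and the confidence threshold are all hypothetical stand-ins for what would be database calls and LLM reasoning in a real agent.

```python
from dataclasses import dataclass, field

# Hypothetical toy "database": query string -> (annotation, confidence).
TOY_DB = {
    "seq1": ("kinase-like protein", 0.55),
    "seq1|pathway": ("MAPK signalling", 0.90),
}


@dataclass
class AgentState:
    query: str
    hypothesis: str = "unknown"
    confidence: float = 0.0
    history: list = field(default_factory=list)


def lookup_annotation(query):
    """Stand-in for a database or tool call; returns (annotation, confidence)."""
    return TOY_DB.get(query, ("no hit", 0.0))


def refine_query(state):
    """Stand-in for LLM reasoning: extend the query with pathway context."""
    return state.query + "|pathway"


def run_agent(query, threshold=0.8, max_steps=5):
    """Query, evaluate the result, and refine until confident or out of steps."""
    state = AgentState(query=query)
    for _ in range(max_steps):
        annotation, conf = lookup_annotation(state.query)
        state.history.append((state.query, annotation, conf))
        # Keep the best-supported hypothesis seen so far.
        if conf > state.confidence:
            state.hypothesis, state.confidence = annotation, conf
        # Stop once the evidence clears the (assumed) threshold.
        if state.confidence >= threshold:
            break
        state.query = refine_query(state)
    return state
```

Calling `run_agent("seq1")` on this toy data takes two steps: the first lookup falls below the threshold, so the query is refined and the second lookup is accepted. The `max_steps` cap is the design choice worth noting, since it bounds the loop when no lookup ever clears the threshold.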

[Figure: UMAP of scRNA-seq data with T cell, B cell, myeloid, and NK clusters]
Single-Cell · scRNA-seq
Single-Cell Transcriptomics & AI Cell Typing

Our SingleCellStudio platform integrates deep-learning cell type classifiers with conventional Seurat/Scanpy workflows. We apply scRNA-seq to cancer immunology, rare paediatric cancers, and infectious and congenital disease (scrub typhus, Hirschsprung's disease), revealing cell-state transitions invisible at bulk-tissue resolution.

[Figure: drug-target network linking Drug A, Drug B, Gene X, and Path Y to a target]
Drug · Disease
AI-Driven Drug Target Discovery & Disease Networks

Leveraging our curated genomic databases (DNGene, dbTBI, ONGene, CMGene), we apply graph-learning and network medicine to identify druggable targets. Our TxGNN-augmented pipelines predict drug repurposing candidates for diabetic nephropathy, traumatic brain injury, and rare cancers — translating database curation into actionable clinical hypotheses.

AI Agent Frameworks

From Pipelines to Autonomous Scientific Agents

We are building a new paradigm where AI systems do not merely accelerate analysis — they conduct discovery. Our agent frameworks integrate large language models, tool orchestration, and domain knowledge graphs to reason autonomously over complex biological data.

Prototype
🌊
marineClaw — Marine Genomics Discovery Agent

An early-stage autonomous agent designed to navigate complex marine genomic datasets, generate hypotheses, and iteratively refine analytical strategies. marineClaw addresses the central challenge of marine biotechnology: interpreting vast sequence data without reference genomes. It integrates genome annotation pipelines (RepeatModeler, BRAKER3), functional inference, and AI-driven hypothesis generation into a unified reasoning loop — pointing toward a near future where discovery is shaped by intelligent, adaptive systems.

Marine Biotechnology · Hypothesis Generation · De Novo Annotation · Autonomous Reasoning
In Production
⚙️
OpenClaw / Zoe — Agentic Orchestration Platform

A modular multi-agent orchestration system with GitHub PR automation, Telegram notifications, and task routing. OpenClaw serves as the infrastructure layer for our research group's automated workflows — from manuscript revision tracking to bioinformatics pipeline monitoring. Zoe is the conversational front-end enabling natural-language control of complex computational tasks.

Multi-Agent · GitHub Integration · Workflow Automation · Telegram Bots
Active
🧬
SmartBLAST — Intelligent Sequence Analysis

SmartBLAST extends standard BLAST with AI-powered interpretation across 37 follow-up analysis types, covering functional enrichment, structural prediction integration, phylogenetic contextualisation, and regulatory inference. Designed for researchers who need publication-ready analysis beyond raw alignment scores.

Sequence Analysis · 37 Analysis Types · Functional Enrichment · Phylogenetics
Active
🔬
SingleCellStudio — scRNA-seq AI Platform

A GUI-driven single-cell analysis platform integrating Seurat, Scanpy, and deep-learning cell type classifiers. Features automated cluster annotation, AI-assisted trajectory inference, and figure-ready visualisation — lowering the barrier for wet-lab researchers to conduct publication-quality single-cell analysis without programming expertise.

scRNA-seq · Deep Learning · Cell Type AI · Seurat · Scanpy
Planned
💊
OmicsGPT — Database-Native AI Assistant

A next-generation conversational AI assistant natively integrated with our 20+ genomic databases. OmicsGPT enables natural-language querying of ONGene, dbEMT, DNGene, lncRNACancer, CMGene and sister databases — translating complex biological questions into structured database queries and synthesising results into research-ready summaries. Planned to support grant hypothesis generation and literature gap analysis.

RAG Architecture · 20+ Databases · NL Querying · Grant Support
Database Atlas

20+ Curated Genomic Databases

Over a decade of manual curation and computational integration has produced a suite of high-quality, widely cited databases covering cancer, neurodegeneration, rare disease, and non-coding biology. These resources now serve as the knowledge backbone for our AI agent frameworks.

Collectively, our databases draw on literature mining, experimental validation, and community contribution, and have been cited an estimated 3,000+ times across oncology, systems biology, and translational medicine. Each database is now being integrated into our AI agent architecture, turning static knowledge repositories into dynamic knowledge graphs that can be queried in natural language.

The next generation of these resources will incorporate LLM-powered annotation, automatic literature monitoring, and federated querying — enabling researchers to interrogate multiple databases simultaneously through conversational interfaces.
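
To make the idea of federated querying concrete, the following Python sketch runs one parameterised lookup against several independent SQLite databases and merges the hits. Everything here is an assumption made for illustration: the one-table `genes` schema, the source names `db_one` and `db_two`, and the placeholder rows do not reflect the actual structure or content of any of our databases.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory SQLite database with a hypothetical one-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (symbol TEXT PRIMARY KEY, note TEXT)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)", rows)
    return conn


# Toy stand-ins for separate resources; names and rows are placeholders.
SOURCES = {
    "db_one": make_db([
        ("GENE_A", "placeholder note 1"),
        ("GENE_B", "placeholder note 2"),
    ]),
    "db_two": make_db([
        ("GENE_A", "placeholder note 3"),
    ]),
}


def federated_lookup(symbol, sources=SOURCES):
    """Run the same parameterised query against every source and merge the hits."""
    hits = {}
    for name, conn in sources.items():
        row = conn.execute(
            "SELECT note FROM genes WHERE symbol = ?", (symbol,)
        ).fetchone()
        if row is not None:
            hits[name] = row[0]
    return hits
```

With this toy data, `federated_lookup("GENE_A")` returns one entry per source, while a symbol present in neither source returns an empty dictionary. Keying the results by source name keeps provenance visible, which matters once a conversational layer summarises across databases.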

20+ Genomic Databases
Cancer databases: 8
Non-coding RNA: 4
Neurological / TBI: 3
Marine / Genomics: 3
Traditional Medicine: 2+
Total citations (est.): 3,000+

ONGene
Literature-based human oncogene database with experimentally verified annotations, expression profiles, and CNV integration across TCGA cancers.
Cancer · Oncogenes
dbEMT 2.0
Gold-standard database for epithelial–mesenchymal transition genes. 370+ curated genes, pre-calculated regulatory networks, and cancer metastasis links.
Metastasis · EMT
lncRNACancer
Pan-cancer lncRNA resource with expression, co-expression networks (lnCaNet), and functional annotations across 110 cancer subtypes.
Non-coding RNA
TSGene
Comprehensive tumour suppressor gene database enabling cross-cancer comparison of suppressor gene availability and mutation burden.
Cancer · TSG
CMGene
Cancer metastasis gene database with KEGG pathway integration, enabling exploration of genetic mechanisms driving metastatic cascades.
Metastasis
BCGene
Brain cancer-implicated gene resource with subtype-specific mechanisms, Allen Brain Map expression, and TCGA mutation summaries.
Brain Cancer
DNGene
Diabetic nephropathy gene and drug target database integrating TxGNN predictions, DGIdb interactions, and clinical variant evidence.
Nephropathy · Drug
dbTBI
Traumatic brain injury multi-omics database linking genetic variants, expression changes, and network medicine drug repurposing candidates.
Neuro · TBI
CNVannotator
Copy number variant annotation tool integrating TCGA tumour data with oncogene/suppressor overlap to identify recurrent drivers.
CNV · Genomics
TCMID Pipeline
Automated scraper and SQLite integration pipeline for the Traditional Chinese Medicine Ingredients Database, enabling pharmacogenomic cross-referencing.
Traditional Medicine
lnCaNet
Pan-cancer lncRNA co-expression interactome. Pre-computed networks across 110 cancer subtypes for biomarker discovery and regulatory inference.
Network · lncRNA
MarineDB [new]
Curated marine organism genomic resource integrating de novo assemblies, functional annotations, and AI-generated pathway predictions for ocean biodiversity.
Marine · Genomics
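
The CNVannotator entry above describes overlapping copy number variants with oncogene and tumour suppressor loci. A minimal Python sketch of that kind of interval overlap follows. It is not CNVannotator's code: it assumes half-open coordinates (0-based start, exclusive end), and the gene names and coordinates are invented placeholders.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate_cnvs(cnvs, genes):
    """Map each CNV name to the names of the genes it overlaps."""
    return {cnv.name: [g.name for g in genes if overlaps(cnv, g)] for cnv in cnvs}


# Placeholder loci and CNV segments, for illustration only.
GENES = [
    Interval("chr1", 100, 200, "GENE_A"),
    Interval("chr1", 500, 600, "GENE_B"),
    Interval("chr2", 100, 200, "GENE_C"),
]
CNVS = [
    Interval("chr1", 150, 550, "cnv1"),
    Interval("chr2", 200, 300, "cnv2"),
]
```

Here `annotate_cnvs(CNVS, GENES)` assigns both chr1 genes to `cnv1` and nothing to `cnv2`, because a segment starting exactly where a gene ends does not overlap it under half-open coordinates. The pairwise scan is fine for a sketch; genome-scale data would normally use a sorted sweep or an interval tree.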
Keynote Vision

AI and Marine Biotechnology: A Conceptual Shift

Marine genomics is entering a phase of unprecedented data abundance, yet biological insight remains the limiting step. We argue that this is not merely a technology problem: addressing it requires a paradigm shift.

Despite advances in sequencing and computational pipelines, the interpretation of marine genomic data remains largely manual, fragmented, and difficult to scale across the vast diversity of ocean life. A central question therefore emerges: can artificial intelligence move beyond accelerating analysis to fundamentally redefining how discovery is conducted?

"The convergence of artificial intelligence and marine biotechnology is not merely a technological upgrade, but a conceptual shift — from human-guided bioinformatics to increasingly autonomous systems of scientific reasoning."

We synthesise recent advances in AI across the life sciences, highlighting how large-scale models are reshaping sequence annotation, functional inference, and multi-omics integration, with emerging applications in marine systems. BLASTclaw transforms post-BLAST analysis from an endpoint into an entry point for automated interpretation — integrating contextual knowledge, iterative reasoning, and workflow orchestration.

Our prototype autonomous agent marineClaw is designed to navigate complex marine genomic datasets, generate hypotheses, and iteratively refine analytical strategies — suggesting a near future in which AI actively participates in the scientific process, operating at scales and speeds beyond human cognition.

Selected Publications

High-Impact Research Output

Selected publications spanning database development, cancer genomics, AI methodology, and marine biology. For a full list, visit Google Scholar.

2024
BLASTclaw: An AI Agent Framework for Autonomous Post-BLAST Genomic Interpretation
bioRxiv preprint · Under review
AI · Agents
2023
SingleCellStudio: A GUI Platform Integrating Deep Learning Cell Type Classification for scRNA-seq Analysis
Bioinformatics · Oxford Academic
scRNA-seq
2022
DNGene: A Multi-omics Database for Diabetic Nephropathy Gene and Drug Target Discovery
Journal of Biomedical Informatics
Database
2021
BCGene: Online Database for Brain Cancer-Implicated Genes Exploring Subtype-Specific Mechanisms
BMC Genomics · 2021
Cancer
2019
dbEMT 2.0: An Updated Database for Epithelial-Mesenchymal Transition Genes with Experimentally Verified Information and Precalculated Regulation Information for Cancer Metastasis
Journal of Genetics and Genomics · 2019
Database
2017
ONGene: A Literature-Based Database for Human Oncogenes
Journal of Genetics and Genomics · 2017
Database
2015
dbEMT: An Epithelial-Mesenchymal Transition Associated Gene Resource
Scientific Reports · Nature Publishing Group
Database
Research Team

Our Group

A cross-disciplinary team spanning bioinformatics, AI engineering, cancer biology, and marine genomics — united by the goal of making biological discovery faster, smarter, and more autonomous.

Dr. Min Zhao
Principal Investigator
AI genomics · Database development · Marine biotechnology · Cancer biology
Postdoctoral Researcher Dr. Liu
AI Agent Development
LLM orchestration · BLASTclaw framework · Agentic bioinformatics
PhD Graduate
Marine Genomics
marineClaw · De novo genome annotation · Non-model organisms
PhD Candidate
Cancer Bioinformatics
scRNA-seq · SingleCellStudio · Rare paediatric cancers
We Are Hiring
PhD · Postdoc · RA
AI genomics · Marine bioinformatics · Database development. Contact us to apply.