Pathway analysis
In this tutorial, we use an LLM to interpret enriched GO terms or pathways in their actual biological context.
import scanpy as sc
adata = sc.read_h5ad("pbmc.h5ad")
sc.pl.umap(adata, color=["leiden", "cell_type_lvl1"], legend_loc="on data", frameon=False)
celltype_dic = adata.obs.set_index('leiden')['cell_type_lvl1'].to_dict()
celltype_dic
{'0': 'CD4 T',
'1': 'B',
'2': 'FCGR3A+ Monocytes',
'3': 'NK',
'4': 'CD8 T',
'5': 'CD14+ Monocytes',
'6': 'Dendritic',
'7': 'Megakaryocytes'}
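The mapping above is only well defined if every Leiden cluster carries a single cell-type label, because `to_dict()` keeps the last value seen for a repeated index. The following is a minimal sketch of that check on a toy table; the `obs` frame and its values are illustrative stand-ins for `adata.obs`, not data from this tutorial.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical values, for illustration only)
obs = pd.DataFrame({
    "leiden": ["0", "0", "1", "2", "1"],
    "cell_type_lvl1": ["CD4 T", "CD4 T", "B", "NK", "B"],
})

# Count distinct labels per cluster; a count above 1 means the mapping is ambiguous
labels_per_cluster = obs.groupby("leiden")["cell_type_lvl1"].nunique()
ambiguous = labels_per_cluster[labels_per_cluster > 1].index.tolist()

# Build the cluster -> cell type dictionary from one row per cluster
mapping = (
    obs.drop_duplicates("leiden")
       .set_index("leiden")["cell_type_lvl1"]
       .to_dict()
)
print(ambiguous, mapping)
```

If `ambiguous` is not empty, the clusters listed there need a decision (for example a majority label) before the dictionary is passed on.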
# Collect the ranked genes of all clusters from the results stored under "logreg_deg"
deg_df = sc.get.rank_genes_groups_df(adata, None, key="logreg_deg")
import gseapy as gp
term_dic = {}
# observed=True iterates only over groups that are present and silences the pandas FutureWarning
for gi, sdf in deg_df.groupby("group", observed=True):
    # Run Enrichr on the first 800 genes listed for each cluster
    enr_bp = gp.enrichr(sdf["names"][:800].tolist(), gene_sets=['GO_Biological_Process_2023'], outdir=None)
    # Keep at most 20 terms with an adjusted P-value below 0.05
    term_ls = enr_bp.res2d.loc[enr_bp.res2d["Adjusted P-value"]<0.05, "Term"].tolist()[:20]
    term_dic[gi] = term_ls
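The term selection in the loop can be tried offline on a small table. This sketch assumes only the two column names used above (`Term` and `Adjusted P-value`); the helper `top_terms` is hypothetical, the P-values are made up, and unlike the loop it sorts by adjusted P-value explicitly rather than relying on the order in which results are returned.

```python
import pandas as pd

def top_terms(res2d, alpha=0.05, n_terms=20):
    """Return up to n_terms term names with adjusted P-value below alpha (hypothetical helper)."""
    significant = res2d[res2d["Adjusted P-value"] < alpha]
    # Explicit sort, an assumption added here so the result does not depend on input order
    significant = significant.sort_values("Adjusted P-value")
    return significant["Term"].head(n_terms).tolist()

# Toy enrichment table with made-up P-values, mimicking the columns used above
toy_res2d = pd.DataFrame({
    "Term": [
        "Cytoplasmic Translation (GO:0002181)",
        "Translation (GO:0006412)",
        "Gene Expression (GO:0010467)",
    ],
    "Adjusted P-value": [1e-30, 0.2, 1e-12],
})

selected = top_terms(toy_res2d)
print(selected)
```

A cluster with no significant terms yields an empty list, so it may be worth checking for empty entries in `term_dic` before sending it to the LLM.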
import gptbioinsightor as gbi
# Set the API key
import os
os.environ['API_KEY'] = "sk-**"
background = "Cells are PBMCs from a Healthy Donor."
gbi.depict_pathway(
    term_dic,
    out="Pathway.md",
    celltype_dic=celltype_dic,
    background=background,
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
)
# Pathway summary
## Pathway set 0:
### Pathway explanation
- **Cytoplasmic Translation (GO:0002181)**: This pathway refers to the process where the ribosome decodes mRNA sequences to synthesize proteins in the cytoplasm. It is central to gene expression and protein synthesis, which are fundamental processes in all living cells. In CD4+ T cells, the regulation of translation is critical for immune response activation and differentiation.
- **Peptide Biosynthetic Process (GO:0043043)**: This term encompasses the enzymatic reactions involved in peptide chain elongation during translation. It directly contributes to the diversity and functionality of proteins produced by cells. Given that CD4+ T cells are highly active in protein synthesis for immune function, this pathway's enrichment is expected and significant.
- **Translation (GO:0006412)**: Translation is the process by which the genetic information carried by mRNA is used to synthesize proteins. This is a core process in all cells and is particularly relevant in CD4+ T cells due to their dynamic response requirements.
- **Macromolecule Biosynthetic Process (GO:0009059)**: This pathway includes the biosynthesis of large molecules such as proteins, nucleic acids, and lipids. The enrichment of this pathway in CD4+ T cells reflects the high metabolic activity required for rapid proliferation and function.
- **Gene Expression (GO:0010467)**: Gene expression involves the transcription of genes into RNA and the subsequent translation into proteins. In CD4+ T cells, the regulation of gene expression is crucial for controlling the differentiation and activation states of these cells.
- **Ribosomal Small Subunit Biogenesis (GO:0042274)**: This pathway focuses on the assembly and maturation of the small subunit of the ribosome, essential for initiating translation. Its enrichment highlights the importance of maintaining a sufficient number of functional ribosomes in CD4+ T cells for efficient protein synthesis.
- **Ribonucleoprotein Complex Biogenesis (GO:0022613)**: Ribonucleoprotein complexes, such as ribosomes and spliceosomes, play key roles in gene expression. Their biogenesis is crucial for the proper functioning of CD4+ T cells, especially during their activation and proliferation.
- **mRNA Splicing, Via Spliceosome (GO:0000398)**: This pathway involves the removal of introns from pre-mRNA and the joining of exons to form mature mRNA. This process is essential for the generation of diverse protein isoforms from a single gene, which can contribute to the complexity of CD4+ T cell responses.
- **RNA Splicing, Via Transesterification Reactions With Bulged Adenosine As Nucleophile (GO:0000377)**: This is a specific mechanism of splicing that occurs during the processing of pre-mRNA. It adds another layer of complexity to mRNA processing and can affect the stability and translation efficiency of mRNA in CD4+ T cells.
- **mRNA Processing (GO:0006397)**: This pathway includes various modifications of pre-mRNA, such as capping, splicing, and polyadenylation, necessary for the production of functional mRNA. The enrichment of this pathway underscores the importance of post-transcriptional regulation in CD4+ T cells.
- **Ribosome Biogenesis (GO:0042254)**: This pathway involves the synthesis and assembly of ribosomal components. Its enrichment suggests that CD4+ T cells have a high demand for new ribosomes to support increased protein synthesis during activation.
- **rRNA Processing (GO:0006364)**: rRNA is a critical component of ribosomes. Its processing is essential for ribosome assembly and function. The enrichment of this pathway in CD4+ T cells indicates the active turnover and maintenance of ribosomal components.
- **Ribosome Assembly (GO:0042255)**: This pathway describes the steps leading to the formation of a functional ribosome from its components. Its importance in CD4+ T cells is linked to the cell's need for efficient protein synthesis machinery.
- **Protein-RNA Complex Assembly (GO:0022618)**: This pathway involves the assembly of protein-RNA complexes, such as ribosomes and spliceosomes. The enrichment of this pathway in CD4+ T cells highlights the importance of these complexes in cellular processes.
- **rRNA Metabolic Process (GO:0016072)**: This pathway includes the synthesis, modification, and degradation of rRNA. Its relevance in CD4+ T cells is due to the cell's high metabolic demands and the need for functional ribosomes.
### Summary
[Comprehensive Analysis]: The pathways enriched in CD4+ T cells from a healthy donor reflect the cell's active involvement in immune response and its high metabolic demands. These pathways logically make biological sense in the context of CD4+ T cells, as they highlight the cell's need for efficient protein synthesis, post-transcriptional regulation, and ribosome biogenesis to maintain its function and respond to external stimuli.
A coherent biological hypothesis arising from these pathways is that CD4+ T cells are highly adapted to rapidly modulate their gene expression and protein synthesis capabilities to effectively participate in immune responses. This hypothesis is supported by the enrichment of pathways related to translation, ribosome biogenesis, and mRNA processing, which collectively ensure that CD4+ T cells can quickly adapt to changing environments by synthesizing proteins necessary for their activation, proliferation, and differentiation.
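To work with the report programmatically, it can be split into one section per pathway set. The sketch below assumes the heading layout seen in the sample output (`## Pathway set N:`); since the report is LLM-generated, other runs may format headings differently. The helper names `split_report` and `go_ids` are hypothetical and not part of gptbioinsightor.

```python
import re

def split_report(markdown_text):
    """Split a report into {set_id: section_text} on '## Pathway set N:' headings."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        match = re.match(r"^## Pathway set (\S+?):\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}

def go_ids(section_text):
    """Collect GO identifiers (GO: followed by seven digits) in order of appearance."""
    return re.findall(r"GO:\d{7}", section_text)

# Short excerpt in the same layout as the report above
sample = """# Pathway summary
## Pathway set 0:
### Pathway explanation
- **Translation (GO:0006412)**: ...
## Pathway set 1:
### Pathway explanation
- **mRNA Processing (GO:0006397)**: ...
"""

sections = split_report(sample)
ids_by_set = {key: go_ids(text) for key, text in sections.items()}
print(ids_by_set)
```

For the real report, the text would be read from `Pathway.md` first, and the resulting keys can be matched against `celltype_dic` to label each section with its cell type.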