Welcome to the BactMentha Documentation!
I. Project Overview
BactMentha is a database that provides information about bacteria-host
protein-protein interactions. Its distinctive feature is that it includes interaction
interface data predicted with the mimicINT tool in addition to experimentally detected binding regions,
thus increasing the proportion of interactions with known (or predicted) interaction regions compared to other databases.
By providing detailed information about interaction motifs, the BactMentha database
aims to help researchers better understand the mechanisms underlying bacterial infections and predict
new interactions based on known motifs. This can guide experiments to validate these interactions and offer
insights for developing new therapeutic strategies targeting these critical bacterial interactions.
Commensal and/or pathogenic bacteria secrete and inject effector proteins into
host cells to target various signaling pathways and ensure their survival. Currently, a little over ten
thousand interactions between bacterial and human proteins have been cataloged in interaction databases
(e.g., IntAct, MINT, BioGRID). As part of a European project (https://www.healthydietforhealthylife.eu/index.php/joint-actions/hdhl-intimic), we generated several
thousand new interactions between human proteins and proteins secreted by commensal intestinal Proteobacteria,
by combining experimental and computational approaches. We recently implemented a workflow to gather and process
protein-protein interaction data, integrating these new interactions with already known interactions into a
new database named BactMentha. Furthermore, the workflow uses a method we developed (mimicINT) to predict
the interfaces involved in the interactions within the database, thus providing molecular information that could
be useful for understanding how bacteria disrupt host cell networks.
II. Data Provenance
Interactions between human proteins and bacterial proteins were retrieved by the
BactMentha workflow. This workflow gathers all interactions involving the host (human/mouse/rat) from the
databases of the International Molecular Exchange (IMEx) Consortium, which includes IntAct,
MINT, and UniProt. Additionally, the taxonomy of all bacteria is
retrieved via a query on the National Center for Biotechnology Information (NCBI) website. The interactions involving human proteins
are then filtered to retain only those in which proteins from bacterial taxa are involved. The interactions are
further filtered to remove redundancy for each human-pathogen protein pair (e.g., same interaction referenced in
distinct databases).
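The filtering and redundancy-removal steps described above can be sketched as follows. This is a minimal illustration, not the BactMentha workflow's actual code: the `Interaction` record, its field names, the `filter_host_bacteria` function, and the placeholder protein identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one interaction entry from a source database."""
    host_protein: str      # placeholder accession of the host protein
    partner_protein: str   # placeholder accession of the interacting protein
    partner_taxid: int     # NCBI taxonomy identifier of the partner organism
    source_db: str         # database in which the entry was found


def filter_host_bacteria(interactions, bacterial_taxids):
    """Keep only interactions with a bacterial partner, merged per protein pair.

    Returns a dict mapping (host_protein, partner_protein) to the sorted list
    of databases that reference the pair, so each pair appears only once.
    """
    merged = {}
    for entry in interactions:
        if entry.partner_taxid not in bacterial_taxids:
            continue  # partner is not in a bacterial taxon: discard
        pair = (entry.host_protein, entry.partner_protein)
        merged.setdefault(pair, set()).add(entry.source_db)
    return {pair: sorted(dbs) for pair, dbs in merged.items()}


# Toy input: the same pair reported by two databases, plus a non-bacterial partner.
records = [
    Interaction("HOST_1", "BACT_1", 562, "IntAct"),
    Interaction("HOST_1", "BACT_1", 562, "MINT"),
    Interaction("HOST_2", "OTHER_1", 11676, "IntAct"),
]
unique_pairs = filter_host_bacteria(records, bacterial_taxids={562})
print(unique_pairs)
```

Keeping the set of source databases per pair, rather than dropping repeated entries outright, preserves the provenance of each interaction while still listing it once.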
Additional information is added to the interactions, such as protein sizes and descriptions and, when available, protein interaction regions. The BactMentha workflow then performs BLASTp (Basic Local Alignment Search Tool from NCBI) searches against the Virulence Factor Database (VFDB) and BastionHub databases to annotate bacterial proteins as virulence factors or effectors. A sequence similarity threshold of 30% and an alignment coverage of 75% are used to infer annotations for BactMentha bacterial proteins from these two databases.
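As an illustration of the thresholds above, the sketch below filters BLASTp hits read from tab-separated output. It rests on several assumptions that the text does not confirm: that BLAST+ was run with `-outfmt "6 qseqid sseqid pident length qlen"`, that the 30% similarity threshold is applied to percent identity, and that coverage is computed as alignment length over query length. The function and constant names are hypothetical.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query_id: str            # BactMentha bacterial protein (assumed to be the query)
    subject_id: str          # matching VFDB or BastionHub entry
    percent_identity: float  # BLAST "pident" column
    alignment_length: int    # BLAST "length" column
    query_length: int        # BLAST "qlen" column


MIN_IDENTITY = 30.0  # percent, threshold from the text (assumed to mean identity)
MIN_COVERAGE = 75.0  # percent, threshold from the text (assumed query coverage)


def parse_hit(line):
    """Parse one line of the assumed five-column tabular output."""
    query_id, subject_id, pident, length, qlen = line.rstrip("\n").split("\t")
    return BlastHit(query_id, subject_id, float(pident), int(length), int(qlen))


def passes_thresholds(hit):
    """Return True when a hit satisfies both the identity and coverage cut-offs."""
    coverage = 100.0 * hit.alignment_length / hit.query_length
    return hit.percent_identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def annotate(lines):
    """Map each query protein to the set of reference entries it matches."""
    annotated = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        hit = parse_hit(line)
        if passes_thresholds(hit):
            annotated.setdefault(hit.query_id, set()).add(hit.subject_id)
    return annotated


# Toy hits: only the first one meets both thresholds.
example_lines = [
    "prot_1\tref_A\t45.0\t80\t100",   # 45% identity, 80% coverage: kept
    "prot_2\tref_B\t25.0\t90\t100",   # identity below 30%: dropped
    "prot_3\tref_C\t60.0\t50\t100",   # coverage below 75%: dropped
]
annotations = annotate(example_lines)
print(annotations)
```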
The mimicINT software, run through the Snakemake workflow manager, is launched by the BactMentha workflow to identify potential interaction interfaces between human and pathogen proteins. The sequences of pathogen proteins involved in interactions with human proteins are provided to the software. First, mimicINT detects host-like interaction elements in these pathogen proteins, namely domains (using InterProScan) and SLiMs (using the SLiMProb tool from SLiMSuite and the Eukaryotic Linear Motif (ELM) database), and collects the domains present in human proteins from the InterPro database. Next, it retrieves known interaction models between two domains from the 3did database and between a domain and a motif from ELM. Based on these models, it infers interactions, and their interfaces, between domains present in human proteins and domains or motifs present in bacterial proteins.
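The template-based inference step can be pictured with the simplified sketch below. It does not reproduce mimicINT's implementation or API: the `infer_interfaces` function, the data layout, and the placeholder domain and motif identifiers are assumptions chosen only to show how domain-domain and domain-motif models could be matched against the elements detected in each protein.

```python
def infer_interfaces(host_domains, bacterial_domains, bacterial_motifs,
                     domain_domain_templates, domain_motif_templates):
    """Infer candidate interfaces for one host-bacterium protein pair.

    host_domains, bacterial_domains, bacterial_motifs: sets of identifiers
        detected in the two proteins.
    domain_domain_templates: set of (domain, domain) pairs known to interact.
    domain_motif_templates: set of (domain, motif) pairs known to interact.
    Returns a list of (interface_type, host_element, bacterial_element).
    """
    inferred = []
    for host_domain in sorted(host_domains):
        for bact_domain in sorted(bacterial_domains):
            # Domain-domain models are treated as unordered pairs here.
            if ((host_domain, bact_domain) in domain_domain_templates
                    or (bact_domain, host_domain) in domain_domain_templates):
                inferred.append(("domain-domain", host_domain, bact_domain))
        for motif in sorted(bacterial_motifs):
            # Domain-motif models pair a host domain with a bacterial motif.
            if (host_domain, motif) in domain_motif_templates:
                inferred.append(("domain-motif", host_domain, motif))
    return inferred


# Toy example with placeholder identifiers (not real InterPro or ELM entries).
interfaces = infer_interfaces(
    host_domains={"DOM_A", "DOM_B"},
    bacterial_domains={"DOM_C"},
    bacterial_motifs={"MOTIF_X"},
    domain_domain_templates={("DOM_C", "DOM_A")},
    domain_motif_templates={("DOM_B", "MOTIF_X")},
)
print(interfaces)
```

In this toy case the function reports one domain-domain interface (DOM_A with DOM_C) and one domain-motif interface (DOM_B with MOTIF_X).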
III. Key Features and Resources
Core Features:
The BactMentha website lets users navigate through host-bacteria
protein-protein interactions and provides additional information such as experimental and inferred binding
data (for bacteria-human protein interactions). Adding mimicINT interface data
increased the proportion of bacteria-human interactions with binding data from 5.3%
(experimental binding data only, including only 1.4% with data on both the host and pathogen protein
sides) to about 11% (10.5% on the human protein side and 11.7% on the bacterial
protein side). BactMentha also provides an
Advanced Search form to filter the database, as well as the option to download the resulting tables
(including any dynamic column filters applied). Finally, the complete database dataset and archives are
available on the Download page.
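For readers who want to compute this kind of coverage figure from a downloaded table, the sketch below shows one way to do it. The record layout (`host_regions`, `bacterial_regions`) and the `binding_coverage` function are hypothetical, and the toy records are invented for illustration; they are not BactMentha data and do not reproduce the percentages quoted above.

```python
def binding_coverage(interactions):
    """Return the percentage of interactions with binding data per side.

    Each interaction is a dict with two (possibly empty) lists of regions:
    "host_regions" and "bacterial_regions" (hypothetical field names).
    """
    total = len(interactions)
    if total == 0:
        return {"host": 0.0, "bacteria": 0.0, "both": 0.0}
    host = sum(1 for i in interactions if i["host_regions"])
    bacteria = sum(1 for i in interactions if i["bacterial_regions"])
    both = sum(1 for i in interactions
               if i["host_regions"] and i["bacterial_regions"])
    return {
        "host": 100.0 * host / total,
        "bacteria": 100.0 * bacteria / total,
        "both": 100.0 * both / total,
    }


# Toy records: regions are (start, end) positions on the protein sequence.
toy_interactions = [
    {"host_regions": [(10, 80)], "bacterial_regions": [(5, 12)]},
    {"host_regions": [(30, 95)], "bacterial_regions": []},
    {"host_regions": [], "bacterial_regions": [(100, 108)]},
    {"host_regions": [], "bacterial_regions": []},
]
coverage = binding_coverage(toy_interactions)
print(coverage)
```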
Navigation and Interface (User guide):
IV. Access and Usage
No registration or login is required to access the BactMentha database.
BactMentha is licensed under the Creative Commons Attribution 4.0 International license.
V. Contributions, Funding and Collaborations (TODO)
(Andreas) List the grants and institutes here (2 grants + INSERM + MESRI + who on the Rome side?)
VI. Useful links
BastionHub:
ELM database:
IMEx Consortium:
InterPro database:
mimicINT:
The Approved List of biological agents (from HSE):
VFDB:
WHO priority groups definition:
VII. Glossary
ANR: Agence Nationale de la Recherche
AMU: Aix-Marseille Université
API: Application Programming Interface
BLAST: Basic Local Alignment Search Tool
BR: Binding regions (Experimentally detected regions of interactions)
CSV: Comma-Separated Values
EBI: European Bioinformatics Institute
ELM: Eukaryotic Linear Motif
HSE: Health and Safety Executive
HUPO: Human Proteome Organization
IMEx Consortium: International Molecular Exchange Consortium
INSERM: Institut National de la Santé et de la Recherche Médicale
JPI: Joint Programming Initiative
MESR: Ministère de l'Enseignement Supérieur et de la Recherche
MI: mimicINT interfaces (inferred regions of interactions)
NCBI: National Center for Biotechnology Information
OS: Ontology Search
PA: Protein Annotation (referring to bacterial proteins being annotated as virulence factors or effectors)
PPI: Protein-protein interaction
SLiM: Short Linear Motif
TAGC: Theories and Approaches of Genomic Complexity
VFDB: Virulence Factor DataBase
WHO: World Health Organization