Welcome to the Molecule Annotation System (MAS) Quick Start.
MAS (Molecule Annotation System) is a complete data-mining and functional annotation solution for extracting and analyzing relationships among biological molecules from public knowledge bases of biological molecules and their significance. The MAS analysis platform is a web client for interactive navigation of the knowledge base. MAS uses a relational database of biological networks built from millions of individually modeled relationships between genes, proteins, diseases and tissues. MAS lets you view your data integrated into biological networks according to different biological contexts. This feature results from the multiple lines of evidence integrated in the MAS database. MAS helps you understand the relationships in gene expression data.
The primary databases of MAS integrate various well-known biological resources such as GenBank, EMBL, SwissProt, Gene Ontology, KEGG, BioCarta, GenMAPP, miRBase, EPD, HPRD, MINT, BIND, IntAct, TRANSFAC, UniGene, dbSNP, OMIM, InterPro, HUGO, MGI and RGD. MAS offers various query entry points and graphical result displays. The system represents an alternative approach to mining and understanding the biological significance of high-throughput array data.
MAS is a web-based application, so a computer with a standard web browser using default settings should work well. MAS was tested with several combinations of browsers and operating systems, including Microsoft Internet Explorer (6, 7, 8), Mozilla Firefox (3+) and Google Chrome on Windows XP/Vista/7, and Mozilla Firefox (3+), Google Chrome and Opera (10) on GNU/Linux. Other W3C standards-compliant browsers should also work.
Open your browser and go to http://bioinfo.capitalbio.com/mas3
Click "Create Account" in the upper right corner and fill out the registration form (email, name, organization, etc.). Make sure you fill in your full contact information.
After the account creation process, you will receive either an email guiding you to log in or a rejection email if your information was incorrect.
Open your browser and go to http://bioinfo.capitalbio.com/mas3
Click "Login" in the upper right corner and fill out the login form (email, password, etc.). Press the "Login" button to log into MAS. If you forget your password, you can request a new one yourself. Many help documents are available in the 'support' module, such as 'Quick Start', 'FAQ', 'Tutorial', 'Resource', 'Data Source', 'Discussion Forum' and 'Contact Us'.
Click "Project" in the top menu, then click the "Create New Project" button to create a new project for collecting analyses.
Click "Project" in the top menu, enter the project, and click the "Create new Analysis" button to create a new analysis.
The system provides two groups of sample data: Human (with experiment data columns) and A. thaliana (with a single column).
Defining a new analysis includes the following contents:
Click "Create" to submit the analysis to the server, then click the "Start Analysis" button to run it.
When an analysis is finished, click "Show Result" in the Operations side bar to view its result. The 'download' button provides a zip package of the results. The 'copy(clone)', 'edit' and 'delete' operations are also available.
You can close the web page while the analysis is waiting or running and check the result after it is completed.
Here, Gene Ontology term information associated with the input data, along with GO term significance, is shown. The distribution of the selected GO term can be displayed graphically.
Statistics on how frequently genes co-occur in a pathway, with the corresponding graphical display.
Each color indicates interactions from one database; gray summarizes all four databases. The protein interaction graph is shown for each database and for all databases combined, according to the input molecule information.
MAS provides a view of the physical distribution of genes along the chromosomes. Possible co-regulation relationships between genes may be predicted from location distances and expression levels. Gene co-regulation relationship graph: an intuitive interface that maps co-regulation relationships according to the list of gene physical locations.
Cluster analysis of the dependability of pathways or GO terms across different experiments can be run. The color range of the cluster heatmap key bar corresponds to the p-value range. GO/Pathway cluster heatmap.
The color of a GO term is defined by its p-value. GO terms related to a single protein are displayed around that protein.
The MAS graphical output of enriched GO terms in the biological process category for a sample experiment. The result graphs display enriched GOIDs and their hierarchical relationships in the "biological process", "cellular component" or "molecular function" GO categories. Boxes represent GO terms, each labeled with its GOID, term definition, p-value and detail information (see below). Significantly enriched GO terms are marked yellow. The degree of color saturation of each node is positively correlated with the significance of enrichment of the corresponding GO term. Non-significant GO terms within the hierarchical tree are shown either as white boxes or as points. Branches of the GO hierarchical tree without significantly enriched GO terms are not shown. Edges stand for connections between different GO terms. Red edges stand for relationships between two enriched GO terms, black solid edges for relationships between enriched and unenriched terms, and black dashed edges for relationships between two unenriched GO terms.
MAS provides flexible molecular relationship queries and visual result displays. The following describes the result entries in detail.
After limiting the input molecular symbols and the combined databases, run the analysis to get the corresponding result. The Summary shows abstract information on the associated molecules. Input Symbol\Gene\mRNA\Protein\GO\Pathway\CpG Island\Disease\Interaction\miRNA\Probe\Promoter\TF\UniGene, all related information is the names
All columns of the MAS CSV-format results are described below:
MAS incorporates general gene mutation-related disease databases, including GeneCards, OMIM and the Genetic Association Database, to show diseases related to the input molecules.
MAS integrates four high-quality global databases, namely HPRD, BIND, IntAct and MINT, to offer protein-protein interaction annotation.
MAS integrates the miRBase database to offer miRNA information for 8 species, including miRNA description, chromosome localization, target mRNA information, etc. miRNA and target mRNA information, downloaded from the miRBase Target data set and sorted, gives users information on how miRNAs regulate gene expression.
| Oligo Type | Definition |
|---|---|
| I | Oligo represents one transcript of an Ensembl gene with multiple transcripts |
| CI | Oligo represents the only transcript of an Ensembl gene |
| C | Oligo represents all transcripts of an Ensembl gene |
| P | Oligo represents a subset of transcripts of an Ensembl gene |
| M | Oligo represents multiple Ensembl genes |
| O | Oligo represented an Ensembl gene in an earlier release but no longer matches any transcripts |
MAS incorporates the EPD database, the authoritative eukaryotic promoter database maintained by EBI, to offer promoter annotation covering basic information, related genes and regulatory factors. By searching molecular symbols, users can obtain promoter annotation directly or indirectly.
MAS extracts transcription factor regulation information from the TRANSFAC database to offer users a query entry point for basic transcription factor information, promoter binding information, the corresponding regulated genes, and so on.
UniGene information related to the input data and its expression distribution in different tissues are provided in MAS. MAS integrates species-specific UniGene data and extracts gene-mRNA-UniGene correspondences and UniGene expression across tissues and developmental stages to show users clear, visual UniGene annotation results.
By default, the p-value of GO/Pathway enrichment is calculated as the hypergeometric probability of getting that many probes/probe-sets/genes/mRNA/protein/miRNA/promoter/TF for a GO/Pathway term. To be specific, the p-value can be calculated as:
The smaller the p-value, the more significantly the GO term is enriched in your dataset. Other statistical tests are also supported for calculating the p-value in the advanced parameter settings, e.g. Fisher's exact test and the χ² test. However, Fisher's exact test is statistically equivalent to the hypergeometric test, which can be calculated much faster in R. The χ² test is only suitable for GO/pathway terms containing a large number of genes. Thus, MAS uses the hypergeometric test by default.
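The hypergeometric calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the MAS implementation: it assumes the usual upper-tail form (the probability of seeing at least k annotated items by chance), and the function name and the parameters N, M, n and k are illustrative names chosen here.

```python
from math import comb


def hypergeom_enrichment_pvalue(N, M, n, k):
    """Upper-tail hypergeometric probability P(X >= k).

    All parameter names are assumptions made for this sketch:
      N: number of items (e.g. genes) in the background set
      M: number of background items annotated to the GO/Pathway term
      n: number of items in the input list
      k: number of input items annotated to the term
    """
    total = comb(N, n)          # ways to draw n items from the background
    upper = min(M, n)           # largest possible overlap
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / total


# Example with made-up numbers: 3 of 20 input genes hit a term that
# covers 50 of 10000 background genes.
p = hypergeom_enrichment_pvalue(N=10000, M=50, n=20, k=3)
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for large backgrounds a statistics library would normally be used instead.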
A p-value controls the type I error rate of a single statistical test, whereas the q-value is used when many tests are considered together. When MAS identifies significantly enriched GO terms, the same type of statistical test is carried out many times. Under such circumstances, controlling the false discovery rate (FDR) of the whole result set becomes important. There are several methods for adjusting raw p-values to FDR for different types of data. Because individual tests in MAS are likely to be positively correlated, especially for GO terms in the same hierarchical tree, we use the Benjamini & Hochberg (2000) method to calculate the FDR value and estimate eta0 by the adaptive method.
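To make the adjustment from raw p-values to FDR values concrete, here is a minimal Python sketch of the Benjamini-Hochberg step-up adjustment. It is an illustration under stated assumptions, not the MAS code: the function name is invented here, and the optional `null_fraction` argument is a hypothetical stand-in for an adaptively estimated proportion of true null hypotheses (the estimation step itself is not shown). With the default of 1.0 it reduces to the plain Benjamini-Hochberg procedure.

```python
def benjamini_hochberg(pvalues, null_fraction=1.0):
    """Return FDR-adjusted values in the same order as `pvalues`.

    null_fraction is an assumed, externally supplied estimate of the
    proportion of true null hypotheses; 1.0 gives the plain procedure.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * null_fraction / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Example with made-up p-values for four GO terms.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Terms whose adjusted value falls below the chosen FDR threshold would then be reported as significantly enriched.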
The organism-specific pathways are then automatically generated by matching the enzyme genes in the gene catalog with the enzymes on the reference pathway diagrams according to the EC number. The matched enzymes are colored green in the pathway diagrams. This matching process is possible because the intermediary metabolism is relatively well conserved among different organisms. In contrast, the regulatory pathways are too divergent to be represented in a single reference diagram; they are drawn separately for each organism.
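The EC-number matching step can be pictured with a small Python sketch. Everything in it is hypothetical and chosen for illustration only: the function name, the data layout (a gene catalog mapping gene names to sets of EC numbers) and the gene names are assumptions, not the actual pipeline behind the pathway diagrams.

```python
def match_enzymes_by_ec(gene_catalog, reference_ecs):
    """Map each EC number on a reference diagram to the catalog genes
    annotated with exactly that EC number.

    gene_catalog: dict of gene name -> set of EC number strings (assumed layout)
    reference_ecs: iterable of EC number strings drawn on the reference pathway
    Only EC numbers with at least one matching gene are returned; these
    would be the enzymes highlighted in the organism-specific diagram.
    """
    matched = {}
    for ec in reference_ecs:
        genes = sorted(g for g, ecs in gene_catalog.items() if ec in ecs)
        if genes:
            matched[ec] = genes
    return matched


# Hypothetical gene catalog and reference pathway.
catalog = {
    "geneA": {"2.7.1.1"},
    "geneB": {"5.3.1.9", "2.7.1.1"},
    "geneC": {"1.1.1.1"},
}
highlighted = match_enzymes_by_ec(catalog, ["2.7.1.1", "5.3.1.9", "2.7.1.11"])
```

Comparing whole EC strings, rather than prefixes, avoids confusing numbers such as 2.7.1.1 and 2.7.1.11.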
Check whether the species has been selected, and whether the symbols you entered match the molecular symbol type you selected.