Quick Start

Welcome to the Molecule Annotation System (MAS) Quick Start.

I. Overview

MAS (Molecule Annotation System) is a complete data-mining and function-annotation solution for extracting and analyzing relationships among biological molecules from public knowledge bases of biological molecules and their significance. The MAS analysis platform is a web client program for interactive navigation of the knowledge base. MAS uses a relational database of biological networks built from millions of individually modeled relationships between genes, proteins, diseases and tissues. MAS lets you view your data integrated into biological networks according to different biological contexts. This unique feature results from the multiple lines of evidence integrated in the MAS database. MAS helps you understand the relationships in gene expression data.

The primary databases of MAS integrate various well-known biological resources such as GenBank, EMBL, SwissProt, Gene Ontology, KEGG, BioCarta, GenMAPP, miRBase, EPD, HPRD, MINT, BIND, IntAct, TRANSFAC, UniGene, dbSNP, OMIM, InterPro, HUGO, MGI and RGD. MAS offers various query entries and graphical result displays. The system represents an alternative approach to mining high-throughput array data and grasping its biological significance.

II. Requirements

MAS is a web-based application, so a computer with a standard web browser using default settings should work well. MAS was tested with several combinations of browsers and operating systems, including Microsoft Internet Explorer (6, 7, 8), Mozilla Firefox (3+) and Google Chrome on Windows XP/Vista/7, and Mozilla Firefox (3+), Google Chrome and Opera (10) on GNU/Linux. Other W3C standards-compliant browsers should also work.

III. Create Account

Open your browser, go to http://bioinfo.capitalbio.com/mas3

Click "Create Account" in the upper right corner and fill out the registration form (email, name, organization, etc.). Make sure that you fill in your full contact information.

IV. Login

After the account creation process, you will receive either an email guiding you to log in or a rejection email if your information was incorrect.

Open your browser, go to http://bioinfo.capitalbio.com/mas3

Click "Login" in the upper right corner and fill out the login form (email, password, etc.). Press the "Login" button to log into MAS. If you forget your old password, you can obtain a new one yourself. Many help documents are available in the 'support' module, such as 'Quick Start', 'FAQ', 'Tutorial', 'Resource', 'Data Source', 'Discussion Forum' and 'Contact Us'.

V. Project

Click "Project" in the top menu, click the "Create New Project" button to create a new project for collecting analyses.

VI. Analysis

Click "Project" in the top menu, enter the project, and click the "Create new Analysis" button to create a new analysis.

The system provides two sets of sample data: Human (with experiment data columns) and A. thaliana (with a single column).

Defining a new analysis involves the following settings:

Basic Information

  • Name: defines the analysis name.
  • Description: defines the analysis description.

Parameters

  • Species: defines the species of data.
  • Molecule Type: defines the molecule type of data.
  • Database Symbol: defines the database symbol type of the data, e.g., "2745418" is an NCBI Gene ID, "AT1G01040" is a TAIR gene, and "NP_001030616" is an NCBI reference protein. You can select all symbol types if you don't know which one is correct.
  • Data has title: if the input data has a title line, the first line will not be involved in the analysis.
  • Data/Data File: paste a symbol list or upload a txt file to query and analyze.
  • Regulation Threshold: thresholds for filtering differentially expressed genes; genes with values within the range will not be displayed in any experiment tab of the results.
  • Email Notification: if selected, you will receive a notification email when the analysis completes.
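
The "Data has title" and "Regulation Threshold" settings can be pictured with a small Python sketch. This is only an illustration of the behaviour described above, not MAS code: the tab-separated two-column layout, the function name `prepare_input`, the threshold values 0.5 and 2.0, and the choice to treat the range bounds as inclusive are all assumptions.

```python
def prepare_input(lines, has_title=True, low=0.5, high=2.0):
    """Return (symbol, value) pairs whose value lies outside [low, high].

    `low` and `high` are hypothetical thresholds chosen for illustration.
    """
    # "Data has title": the first line is skipped when it is a title line.
    rows = lines[1:] if has_title else lines
    kept = []
    for line in rows:
        if not line.strip():
            continue  # ignore blank lines
        symbol, value = line.split("\t")[:2]
        ratio = float(value)
        # Values inside the range are treated as "not regulated" and hidden.
        if ratio < low or ratio > high:
            kept.append((symbol, ratio))
    return kept


# Hypothetical input: a title line followed by symbol/ratio rows.
data = ["Symbol\tRatio", "GENE_A\t3.1", "GENE_B\t1.2", "GENE_C\t0.4"]
print(prepare_input(data))
```

Here GENE_B falls inside the assumed range and is dropped, while GENE_A and GENE_C are kept.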

Click "Create" to submit the analysis to the server, then click the "Start Analysis" button to run it.

Result

When an analysis is finished, click "Show Result" in the Operations sidebar to view its result. The 'download' button provides a zip package of the results. The 'copy(clone)', 'edit' and 'delete' operations are also available.

You can close the web page while the analysis is waiting or running and check the result after it is completed.

What is the meaning of the MAS graphical results?

GO mapping

Here, Gene Ontology term information associated with the input data is shown, along with GO term significance. The distribution of the selected GO term can be displayed graphically.

Correlated Genes

Co-occurrence frequency statistics of genes, with the corresponding graphical display in pathways.

Protein interaction graph

Each color indicates interactions from one database; gray is the summary across all four databases. The graph shows protein interactions in each individual database and in all databases together, according to the input molecule information.

Co-regulated Gene graph

MAS lets you view the physical distribution of genes along chromosomes. Possible co-regulation relations between genes may be predicted from location distances and expression levels. The co-regulated gene graph offers an intuitive interface that maps co-regulation relationships according to the list of gene physical locations.

Pathway/GO cluster analysis heatmap

Cluster analysis can be run on the dependability of pathways or GO terms across different experiments. The color scale of the cluster heatmap key bar represents the p-value range.

The network of GO-Protein

The color of each GO term is defined by its p-value. The GO terms related to a single protein are displayed around that protein.

GO enrichment graph

This graph is the MAS graphical output of enriched GO terms, for example in the biological process category for a sample experiment. The result graphs display enriched GO IDs and their hierarchical relationships in the "biological process", "cellular component" or "molecular function" GO categories. Boxes represent GO terms, each labeled with its GO ID, term definition, p-value and detailed information. Significantly enriched GO terms are marked yellow. The degree of color saturation of each node is positively correlated with the significance of enrichment of the corresponding GO term. Non-significant GO terms within the hierarchical tree are either shown as white boxes or drawn as points. Branches of the GO hierarchical tree without significantly enriched GO terms are not shown. Edges stand for connections between different GO terms. Red edges stand for a relationship between two enriched GO terms, black solid edges for a relationship between an enriched and an unenriched term, and black dashed edges for a relationship between two unenriched GO terms.

What is the meaning of each column in the MAS csv results?

MAS provides flexible molecular relationship queries and visual result display. The following describes the result entries in detail.

Summary

After specifying the input molecule symbols and the databases to combine, run the analysis to get the corresponding result. The Summary shows summary information about the associated molecules, with the columns Input Symbol, Gene, mRNA, Protein, GO, Pathway, CPG Island, Disease, Interaction, miRNA, Probe, Promoter, TF and UniGene; all related information is given as names.

All columns of MAS csv format results are described below:

CpGisland

  • Input Symbol: the symbol you input in the text
  • Gene: the gene name found in MASCORE according to symbol
  • CPG Island Number: number of CpG islands in the promoter range of the gene
  • CPG Island Start: CpG island start site in the promoter of the gene
  • CPG Island End: CpG island end site in the promoter of the gene
  • CPG Island Length: the end site minus the start site
  • CPG Island GCC: GC content of the CpG island
  • CPG Island OBS: the ratio of observed to expected CpG (ObsCpG/ExpCpG)
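
As a rough illustration of the GCC and OBS columns, the Python sketch below computes GC content and the observed/expected CpG ratio for a sequence. It assumes the commonly used Gardiner-Garden and Frommer definition, ObsCpG/ExpCpG = (number of CpG × length) / (number of C × number of G); the document does not state which definition MAS applies, and the function name `cpg_island_stats` is hypothetical.

```python
def cpg_island_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides; "CG" cannot overlap itself
    gc_content = (c + g) / n if n else 0.0
    # Assumed definition: Obs/Exp = (#CpG * length) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


gc, ratio = cpg_island_stats("CGCGCGAT")
print(gc, ratio)
```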

Disease

MAS incorporates general gene mutation-related disease databases, including GeneCards, OMIM and the Genetic Association Database, to show diseases related to the input molecules.

  • Input Symbol: the symbol you input in the text
  • Gene: the gene name found in MASCORE according to symbol
  • Disease Name: for example: leukemia promyelocytic acute
  • Disease Class: for example: leukemia
  • Disease Source: the original database the disease information comes from.

Gene

  • Input Symbol: the symbol you input in the text
  • Gene MAS ID: MASCORE gene id
  • Gene DBID: NCBI Gene ID
  • Gene Name: public gene name
  • Gene Chromosome: the number of the chromosome that the gene belongs to
  • Gene Location: gene location on the chromosome, for example: 3p21.31
  • Gene Start: gene sequence start site on the chromosome
  • Gene End: gene sequence end site on the chromosome
  • Gene Length: the end site minus the start site
  • Gene Orient: the gene orientation; the 5' -> 3' direction is defined as '+'
  • Gene Synonymy: commonly used synonyms of the gene
  • Gene Description: the function and annotation description of the gene

GO - Index By GO

  • GO: GO acc + Ontology. GO acc is the identifier used in the Gene Ontology project; Ontology is the category the GO ID belongs to, namely "biological process", "cellular component" or "molecular function".
  • Count: count of proteins (found by input symbol) in the GO term
  • p-Value: p-value of the significance of the enrichment of the listed GO term in your dataset. Could be the raw p-value.
  • q-Value: q is the False Discovery Rate (FDR); the default threshold is 0.05. The smaller q is, the more significantly the genes (or proteins) are enriched in the pathway (or GO term), and the lower the FDR.
  • Significant: if the q-value < 0.05 (default), the genes are significantly enriched in the pathway (or GO term).
  • Protein: the protein that the input symbols (probes/probesets/genes/mRNA/miRNA/promoter/TF) belong to.
  • Input Symbol: the symbol you input in the text

GO - Index By Symbol

  • Input Symbol: the symbol you input in the text
  • Protein: the protein that the input symbols (probes/probesets/genes/mRNA/miRNA/promoter/TF) belong to.
  • Count: count of GO terms (found by input symbol) related to the protein
  • Molecular Function (MF): the GO acc + ontology of this type related to the protein
  • Biological Process (BP): the GO acc + ontology of this type related to the protein
  • Cellular Component (CC): the GO acc + ontology of this type related to the protein

Interaction

MAS integrates four high-quality global databases, namely HPRD, BIND, IntAct and MINT, to offer protein-protein interaction annotation.

  • Input Symbol: the symbol you input in the text
  • Protein: the protein that the input symbols (probes/probesets/genes/mRNA/miRNA/promoter/TF) belong to.
  • Interaction DBID: interaction id in the original database, for example, DIP-88056E
  • Interaction Type: binary or complex
  • Interaction PMID: the PubMed ID of the publication reporting the interaction
  • Interaction Detection: the interaction detection method, for example: yeast 2-hybrid
  • Interaction Confidence: confidence score obtained through homomint-score
  • Interaction Source: the original database the interaction information comes from.

miRNA

MAS integrates the miRBase database to offer miRNA information for 8 species, including miRNA description, chromosome localization, target mRNA information, etc. miRNA and target mRNA information, downloaded from the miRBase Target data set and sorted, gives users information about miRNA regulation of gene expression.

  • Input Symbol: the symbol you input in the text
  • mRNA: the mRNA that the input symbols (probes/probesets/genes/mRNA/miRNA/promoter/TF) belong to and that is related to the miRNA.
  • miRNA Name: the miRNA name given by miRBase
  • ACC: the miRNA acc given by miRBase
  • Mature miRNA: mature miRNA name
  • Description: for example: The mature sequence shown here represents the most commonly cloned form from large-scale cloning studies.
  • Sequence: sequence of the miRNA
  • Chromosome: the chromosome that the miRNA belongs to
  • Start: start site on the chromosome
  • End: end site on the chromosome
  • Orient: the miRNA orientation; the 5' -> 3' direction is defined as '+'

mRNA

  • Input Symbol: the symbol you input in the text
  • mRNA MAS ID: mRNA id in MASCORE
  • mRNA Name: public name of the mRNA
  • mRNA Synonymy: synonyms of the mRNA

Pathway - Index By Pathway

  • Pathway: pathway name used in each pathway database: BioCarta, KEGG, GenMAPP
  • Count: count of genes (found by input symbol) in the pathway term
  • p-Value: p-value of the significance of the enrichment of the listed pathway in your dataset. Could be the raw p-value.
  • q-Value: q is the False Discovery Rate (FDR); the default threshold is 0.05. The smaller q is, the more significantly the genes (or proteins) are enriched in the pathway (or GO term), and the lower the FDR.
  • Significant: if the q-value < 0.05 (default), the genes are significantly enriched in the pathway
  • Gene: the gene that the input symbols (probes/probesets/genes/mRNA/protein/miRNA/promoter/TF) belong to.
  • Input Symbol: the symbol that you input in the text and that is related to the gene

Pathway - Index By Symbol

  • Input Symbol: the symbol you input in the text
  • Gene: the gene that the input symbols (probes/probesets/genes/mRNA/protein/miRNA/promoter/TF) belong to.
  • Count: count of pathways (found by input symbol) related to the gene
  • KEGG: pathways from the original database KEGG (Kyoto Encyclopedia of Genes and Genomes)
  • GenMAPP: pathways from the original database GenMAPP, a free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes
  • BioCarta: pathways from the original database BioCarta

Probe

  • Input Symbol: the symbol you input in the text
  • mRNA: the mRNA that the input symbols (probes/probesets/genes/mRNA/protein/miRNA/promoter/TF) belong to.
  • Probe ID: the probe id or probe set id that the input symbols (probes/probe sets/genes/mRNA/protein/miRNA/promoter/TF) are related to.
  • Probe Probe Name: probe name
  • Probe Oligo Type: for example: I and CI
    Oligo Type   Definition
    I            Oligo represents one transcript of an Ensembl gene with multiple transcripts
    CI           Oligo represents the only transcript of an Ensembl gene
    C            Oligo represents all transcripts of an Ensembl gene
    P            Oligo represents a subset of transcripts of an Ensembl gene
    M            Oligo represents multiple Ensembl genes
    O            Oligo represented an Ensembl gene in an earlier release but no longer matches any transcripts
  • Probe Sequence:
  • Probe Sequence Type: for example, Consensus sequence
  • Probe Sequence Source: for example, GenBank

Promoter

MAS incorporates the EPD database, the authoritative eukaryotic promoter database maintained by EBI, to offer promoter annotation covering basic information, related genes and regulatory factors. By searching molecular symbols, users can directly or indirectly get promoter annotation.

  • Input Symbol: the symbol you input in the text
  • Gene: the gene that the input symbols (probes/probe sets/genes/mRNA/protein/miRNA/promoter/TF) belong to.
  • Promoter DBID: promoter id in the original database, for example, HS_H1F5
  • Promoter ACC: promoter acc in the original database, for example, S_H1F5EP11070
  • Promoter Description: for example: Cellular-Abelson murine leukemia virus oncogene 7kb.
  • Promoter Sequence:
  • Promoter Neighbouring Promoter:
  • Promoter Alternative Promoter: for example, Alternative promoter #3 of 3; 5' exon 2; site 1.

Protein

  • Input Symbol: the symbol you input in the text
  • Protein Name: public protein name
  • Protein DBID: public protein id
  • Protein ACC: NCBI protein acc
  • Protein Synonymy: commonly used synonyms of the protein
  • Protein Description: the function and annotation description of the protein
  • Protein Subcell Location: for example: Cytoplasm
  • Protein Function: for example: Enhances AR-mediated transactivation.
  • Protein Tissue Specificity: for example: Higher level expression is seen in the colon carcinoma tissue than normal colon tissue.
  • Protein Interpro: the related protein name in InterPro
  • Protein Pfam: the related protein name in Pfam

TF

MAS extracts transcription factor regulation information from the TRANSFAC database to offer users a query entry to basic transcription factor information, promoter binding information, information on the corresponding regulated genes and so on.

  • Input Symbol: the symbol you input in the text
  • Gene: the gene that the input symbols (probes/probe sets/genes/mRNA/protein/miRNA/promoter/TF) belong to and that the TF is related to
  • TF Name: transcription factor name
  • TF DBID: public TF id in TRANSFAC
  • TF ACC: public TF acc in TRANSFAC
  • TF Synonymy:
  • TF SwissProt Accession: TF SwissProt acc
  • TF Matrix: the sequence with the same characteristics near the identification site of the transcription factors

UniGene

MAS provides UniGene information related to the input data and its expression distribution in different tissues. MAS integrates species-specific UniGene data and extracts gene-mRNA-UniGene correspondences and UniGene expression in tissues and developmental stages to show users clear, visual UniGene annotation results.

  • Input Symbol: the symbol you input in the text
  • Gene: the gene that the input symbols (probes/probesets/genes/mRNA/protein/miRNA/promoter/TF) belong to and that the UniGene is related to
  • UniGene DBID: NCBI UniGene id
  • UniGene Chromosome: the number of the chromosome that the UniGene belongs to
  • UniGene Location: UniGene location on the chromosome, for example: 3p21.31
  • UniGene Description: property, for example: Dipeptidyl-peptidase 6, mRNA (cDNA clone IMAGE:5494573)
  • UniGene NCBI Gene ID: NCBI Gene ID of the gene found in MASCORE according to the UniGene
  • UniGene Gene: NCBI Gene name of the gene found in MASCORE according to the UniGene
  • UniGene Express: UniGene tissue-specific expression.

How does MAS calculate the p-value for enrichment by default? Why?

By default, the p-value of GO/Pathway enrichment is calculated as the hypergeometric probability of getting this many probes/probe sets/genes/mRNA/protein/miRNA/promoter/TF for a GO/Pathway term by chance.

The smaller the p-value, the more significantly the GO term is enriched in your dataset. Other statistical tests are also supported for calculating the p-value in the advanced parameter settings, e.g. Fisher's exact test and the chi-square (χ²) test. However, Fisher's exact test is statistically equivalent to the hypergeometric test, which can be calculated much faster in R. The χ² test is only suitable for GO/pathway terms containing a large number of genes. Thus, MAS uses the hypergeometric test by default.
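
A hypergeometric enrichment p-value can be sketched in a few lines of Python. The sketch assumes the usual upper-tail form, P(X >= k) = sum over i from k to min(M, n) of C(M, i) * C(N - M, n - i) / C(N, n); the document does not give the exact formula MAS uses, and the function name and the parameter names N, M, n, k are chosen here only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(N, M, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: molecules in the background set
    M: background molecules annotated to the GO/Pathway term
    n: molecules in the input list
    k: input molecules annotated to the term
    """
    upper = min(M, n)
    total = comb(N, n)  # number of ways to draw the input list
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 10 background genes, 4 in the term, 3 input genes, 2 of them in the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

With exact integer binomials the result does not depend on an external statistics package, which keeps the sketch self-contained; for large backgrounds a library routine would normally be preferred.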

Why does MAS recommend using q-value adjustment for the p-value?

The p-value is used to control the type I error rate in a single statistical test. When MAS identifies significantly enriched GO terms, the same type of statistical test is carried out many times. Under such circumstances, controlling the false discovery rate (FDR) of the whole result set becomes important. There are several methods for adjusting the raw p-value to an FDR for different types of data. Because individual tests in MAS would be positively related, especially for GO terms on the same hierarchical trees, we use the Benjamini & Hochberg (2000) method to calculate the FDR value and estimate eta0 by an adaptive method.
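
To give a feel for FDR adjustment, here is a minimal Python sketch of the classic Benjamini-Hochberg step-up procedure. It is a simplification: it is not the adaptive variant with an eta0 estimate mentioned above, and it is not taken from MAS. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A term would then be called significant when its adjusted value is below the chosen threshold, 0.05 by default in MAS.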

What is the meaning of the green color in the pathway graph?

The organism-specific pathways are automatically generated by matching the enzyme genes in the gene catalog with the enzymes on the reference pathway diagrams according to the EC number. The matched enzymes are colored green in the pathway diagrams. This matching process is possible because intermediary metabolism is relatively well conserved among different organisms. In contrast, the regulatory pathways are too divergent to be represented in a single reference diagram; they are drawn separately for each organism.
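
The EC-number matching described above amounts to a set intersection. The Python sketch below illustrates the idea with made-up data; the gene names, the dictionary layout and the function name `matched_enzymes` are assumptions for illustration and do not reflect the actual pathway data format.

```python
def matched_enzymes(gene_catalog, reference_pathway_ecs):
    """Return the EC numbers on a reference diagram that have a gene in the catalog."""
    # Collect every EC number assigned to any gene of the organism.
    organism_ecs = {ec for ecs in gene_catalog.values() for ec in ecs}
    return sorted(organism_ecs & set(reference_pathway_ecs))


# Hypothetical gene catalog (gene -> EC numbers) and reference pathway.
catalog = {"geneA": ["2.7.1.1"], "geneB": ["5.3.1.9"], "geneC": []}
reference = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
print(matched_enzymes(catalog, reference))
```

The EC numbers returned are the ones that would be colored green; "2.7.1.11" stays uncolored because no gene in the hypothetical catalog carries it.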

How to get a correct result in the summary and at all the different levels?

Check whether the species has been selected, and whether the symbols you input match the molecular symbol type you have selected.
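
Before submitting, a quick local check of the symbol type can catch mismatches. The Python sketch below guesses the type of a symbol from the three examples given in the Parameters section ("2745418", "AT1G01040", "NP_001030616"). The regular expressions are simplified assumptions about these identifier formats, not rules used by MAS, and the function name `guess_symbol_type` is hypothetical.

```python
import re

# Simplified, assumed patterns for three symbol types.
PATTERNS = {
    "NCBI Gene ID": re.compile(r"^\d+$"),
    "TAIR gene": re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE),
    "NCBI reference protein": re.compile(r"^[NXY]P_\d+(\.\d+)?$"),
}


def guess_symbol_type(symbol):
    """Return the first symbol type whose pattern matches, or None."""
    cleaned = symbol.strip()
    for name, pattern in PATTERNS.items():
        if pattern.match(cleaned):
            return name
    return None


for s in ["2745418", "AT1G01040", "NP_001030616", "unknown-id"]:
    print(s, "->", guess_symbol_type(s))
```

If most symbols in a list come back as None or as a type other than the one selected in the analysis form, the Database Symbol setting is worth rechecking.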

MAS 3.37 Copyright © 2007-2017 CapitalBio Corporation. All rights reserved.