KEGG Databases

KEGG is an integrated database consisting of fifteen databases (and one under development) shown in Table 1.

Table 1. KEGG databases

Collection      Database name  Abbreviation  Content                                    Release year
kegg²           pathway        path          KEGG pathway maps                          1995
kegg            brite          br            Brite functional hierarchies               2005
kegg            module         md            KEGG modules                               2006
kegg            ko             ko            Functional orthologs                       2002
kegg            genome         gn            KEGG organisms (complete genomes)          2000
kegg            genes¹         <org>         Genes in KEGG organisms                    1995
kegg            genes          vg            Genes in viruses category                  2015
kegg            genes          ag            Genes in addendum category                 2015
kegg, ligand³   compound       cpd           Metabolites and other small molecules      1995
kegg, ligand    glycan         gl            Glycans                                    2003
kegg, ligand    reaction       rn            Biochemical reactions                      1998
kegg, ligand    rclass         rc            Reaction class                             2010
kegg, ligand    enzyme         ec            Enzyme nomenclature                        1995
kegg, medicus   network        ne            Disease-related network elements           2017
kegg, medicus   variant        hsa_var       Human gene variants                        2017
kegg, medicus   disease        ds            Human diseases                             2008
kegg, medicus   drug           dr            Drugs                                      2005
kegg, medicus   dgroup         dg            Drug groups                                2014
kegg, medicus   environ        ev            Crude drugs and health-related substances  2010
  • 1 "genes" is a composite database consisting of KEGG organisms with three- or four-letter <org> codes, and viruses (vg) and addendum (ag) categories.
  • 2 "kegg" stands for the collection of all databases shown above.
  • 3 "ligand" stands for the collection of chemical databases: compound, glycan, reaction and enzyme.
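
The name/abbreviation pairs in Table 1 can be held in a small lookup so that either form resolves to the same database. The following is a minimal sketch, not part of KEGG itself: the dictionary is transcribed from Table 1, and `canonical_database` is a hypothetical helper name. The composite "genes" database is left out because its abbreviations (<org>, vg, ag) are not fixed strings.

```python
# Abbreviation -> database name, transcribed from Table 1.
# "genes" is omitted: its abbreviations are organism codes, vg and ag.
ABBREVIATIONS = {
    "path": "pathway", "br": "brite", "md": "module", "ko": "ko",
    "gn": "genome", "cpd": "compound", "gl": "glycan", "rn": "reaction",
    "rc": "rclass", "ec": "enzyme", "ne": "network", "hsa_var": "variant",
    "ds": "disease", "dr": "drug", "dg": "dgroup", "ev": "environ",
}


def canonical_database(name: str) -> str:
    """Return the full database name for a name or abbreviation (hypothetical helper)."""
    if name in ABBREVIATIONS:
        return ABBREVIATIONS[name]
    if name in ABBREVIATIONS.values():
        return name
    raise ValueError(f"not a Table 1 database name or abbreviation: {name!r}")


print(canonical_database("cpd"))   # compound
print(canonical_database("drug"))  # drug
```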

KEGG Identifiers

In its general form, an entry of any database is identified by
<database>:<entry>
where <database> is the database name or its abbreviation defined in Tables 1-3 and <entry> is the entry name or the accession number given by the database.
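
As an illustration of the <database>:<entry> form, the sketch below splits an identifier at its first colon. The function name `split_dbentry` is an assumption made for this example; the sample identifiers are taken from Table 2.

```python
def split_dbentry(dbentry: str) -> tuple[str, str]:
    """Split '<database>:<entry>' into its two parts (hypothetical helper)."""
    # Split at the first colon only, so any later colon stays in the entry part.
    database, sep, entry = dbentry.partition(":")
    if not sep or not database or not entry:
        raise ValueError(f"expected '<database>:<entry>', got {dbentry!r}")
    return database, entry


print(split_dbentry("hsa:3643"))      # ('hsa', '3643')
print(split_dbentry("ec:2.7.10.1"))   # ('ec', '2.7.10.1')
```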

In KEGG databases other than "genes", "enzyme" and "variant", <entry> consists of a database-dependent prefix followed by a five-digit number (see KEGG Objects for more details). This is the KEGG identifier
<kid>
shown in Table 2; examples are the K number, C number and D number, which identify entries in the "ko", "compound" and "drug" databases, respectively.

A "genes" entry is identified by
<org>:<gene>
where <org> is the three- or four-letter KEGG organism code or the T number genome identifier, and <gene> is the gene identifier, usually the NCBI GeneID or the INSDC Locus_tag (see KEGG GENES).
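
The database part of a "genes" identifier can therefore take several shapes. The sketch below classifies it under two stated assumptions: organism codes are written in lowercase (as in hsa), and a T number is "T" followed by five digits (as in T01001). Because some three- and four-letter names in Tables 1, 3 and 4 belong to other databases, they are excluded explicitly. `classify_genes_prefix` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: organism codes are three or four lowercase letters, e.g. "hsa".
ORG_CODE = re.compile(r"[a-z]{3,4}")
# Assumption: a T number is "T" plus five digits, e.g. "T01001".
T_NUMBER = re.compile(r"T\d{5}")
# Three- and four-letter names that Tables 1, 3 and 4 give to other databases.
RESERVED = {"kegg", "path", "cpd", "drug", "atc", "jtc", "ndc", "pmid"}


def classify_genes_prefix(prefix: str) -> Optional[str]:
    """Describe the <org> part of a genes identifier, or return None."""
    if prefix == "vg":
        return "viruses category"
    if prefix == "ag":
        return "addendum category"
    if T_NUMBER.fullmatch(prefix):
        return "genome T number"
    if ORG_CODE.fullmatch(prefix) and prefix not in RESERVED:
        return "organism code"
    return None


print(classify_genes_prefix("hsa"))     # organism code
print(classify_genes_prefix("T01001"))  # genome T number
print(classify_genes_prefix("cpd"))     # None
```

A pattern check of this kind only says that a string could be an organism code; whether the code actually exists has to be checked against the "genome" database.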

Table 2. KEGG identifiers

DB name   Abbrev   <kid> prefix            Example
pathway   path     map, ko, ec, rn, <org>  map00010, hsa04930
brite     br       br, jp, ko, <org>       br:08303, br:01002, br08303, ko01002
module    md       M, <org>_M              M00010
ko        ko       K                       K04527
genome    gn       T                       T01001 (hsa)
genes     <org>    -                       hsa:3643
genes     vg       -                       vg:155971
genes     ag       -                       ag:CAA76703
compound  cpd      C                       C00031
glycan    gl       G                       G00109
reaction  rn       R                       R00259
rclass    rc       RC                      RC00046
enzyme    ec       -                       ec:2.7.10.1
network   ne       N                       N00002
variant   hsa_var  -                       hsa_var:25v1
disease   ds       H                       H00004
drug      dr       D                       D01441
dgroup    dg       DG                      DG00710
environ   ev       E                       E00048

Note that "pathway", "brite" and "module" consist of manually created reference datasets and computationally generated organism-specific datasets whose prefix contains <org> (see KEGG Pathway Maps). Organism-specific identifiers are not true <kid> values and may need the <database>: prefix to be specified.
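
The prefixes in Table 2 are enough to guess the database from a bare identifier in most cases, though not all: the ko prefix is listed for both "pathway" and "brite", and an <org> prefix does not say which of the two is meant. The sketch below returns every candidate database instead of a single answer. It assumes lowercase three- or four-letter organism codes, and `candidate_databases` is a hypothetical helper; the identifier hsa_M00010 in the usage lines is constructed from the <org>_M pattern for illustration only.

```python
import re

# Prefix -> candidate databases, transcribed from Table 2.
KID_PREFIXES = {
    "map": ["pathway"], "ec": ["pathway"], "rn": ["pathway"],
    "ko": ["pathway", "brite"],  # listed under both databases
    "br": ["brite"], "jp": ["brite"],
    "M": ["module"], "K": ["ko"], "T": ["genome"],
    "C": ["compound"], "G": ["glycan"], "R": ["reaction"], "RC": ["rclass"],
    "N": ["network"], "H": ["disease"], "D": ["drug"], "DG": ["dgroup"],
    "E": ["environ"],
}
# Letters, an optional "_M" (organism-specific module), then five digits.
KID = re.compile(r"([A-Za-z]+)(_M)?(\d{5})")
# Assumption: organism codes are three or four lowercase letters.
ORG_CODE = re.compile(r"[a-z]{3,4}")


def candidate_databases(identifier: str) -> list[str]:
    """Return the databases an identifier could belong to (possibly none)."""
    match = KID.fullmatch(identifier)
    if match is None:
        return []
    prefix, module_suffix, _digits = match.groups()
    looks_like_org = bool(ORG_CODE.fullmatch(prefix)) and prefix not in KID_PREFIXES
    if module_suffix:
        return ["module"] if looks_like_org else []
    if prefix in KID_PREFIXES:
        return list(KID_PREFIXES[prefix])
    if looks_like_org:
        # Organism-specific pathway or brite identifier: ambiguous on its own.
        return ["pathway", "brite"]
    return []


print(candidate_databases("K04527"))      # ['ko']
print(candidate_databases("hsa04930"))    # ['pathway', 'brite']
print(candidate_databases("hsa_M00010"))  # ['module']
```

Returning a list makes the ambiguous cases visible to the caller, which is where an explicit <database>: prefix is needed.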

Other Database Entry Identifiers

KEGG MEDICUS is a translational bioinformatics resource, both in English and Japanese, promoting use of genomic information in society. Table 3 shows an additional set of databases in KEGG MEDICUS.

Table 3. KEGG MEDICUS extension databases

DB name      Abbrev  Entry ID                   Remark
disease_ja   ds_ja   H number                   In Japanese
drug_ja      dr_ja   D number                   In Japanese
dgroup_ja    dg_ja   DG number                  In Japanese
environ_ja   ev_ja   E number                   In Japanese
compound_ja  cpd_ja  C number                   In Japanese
brite_ja     br      jp number                  In Japanese
atc          -       7-letter ATC code          ATC classification
atc_ja       -       7-letter ATC code          ATC classification (in Japanese)
jtc          -       Therapeutic category code  Therapeutic category in Japan (in Japanese)
jtc_en       -       Therapeutic category code  Therapeutic category in Japan
ndc          -       National Drug Code         Drug products in the USA
yj           -       YJ code                    Drug products in Japan (in Japanese)

In addition to these databases, the NCBI PubMed database shown in Table 4 is tightly integrated in KEGG. Table 4 also contains databases used only for ID conversion (conv operation).

Table 4. Outside databases integrated in KEGG

DB name         Abbrev  Entry ID           Remark
pubmed          pmid    PubMed ID          -
ncbi-geneid     -       Gene ID            ID conversion only
ncbi-proteinid  -       Protein ID         ID conversion only
uniprot         up      UniProt Accession  ID conversion only
pubchem         -       PubChem SID        ID conversion only
chebi           -       ChEBI ID           ID conversion only

KEGG Database Links

The databases shown above are highly integrated with mutual links as summarized in Table 5.

Table 5. KEGG database links

Database  Linked databases (internal links first, then outside links¹)
pathway   module ko genome <org> compound glycan reaction rclass enzyme disease drug pubmed
brite     module ko genome <org> compound glycan reaction rclass disease drug environ
module    pathway brite ko genome <org> compound glycan reaction enzyme pubmed
ko        pathway brite module genes <org> vg ag reaction rclass enzyme disease drug dgroup pubmed
genome    pathway brite module <org> vg ag compound disease pubmed
<org>     pathway brite module ko genome enzyme (network disease drug dgroup)² ncbi-geneid ncbi-proteinid uniprot
vg        ko genome enzyme ncbi-geneid ncbi-proteinid uniprot
ag        ko genome enzyme pubmed ncbi-proteinid uniprot
compound  pathway brite module genome glycan reaction enzyme disease drug environ pubchem chebi
glycan    pathway brite module compound reaction enzyme disease pubchem chebi
reaction  pathway brite module ko compound glycan rclass enzyme
rclass    pathway brite ko reaction enzyme
enzyme    pathway module ko <org> vg ag compound glycan reaction rclass
network   pathway ko hsa compound hsa_var disease drug pubmed
hsa_var   hsa network pubmed
disease   pathway brite ko genome hsa compound glycan drug dgroup pubmed
drug      pathway brite ko hsa compound disease dgroup environ atc jtc ndc yj pubchem chebi
dgroup    ko hsa disease drug
environ   brite compound drug
  • 1 Outside links for ID conversion
  • 2 Parentheses for hsa (Homo sapiens) only
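
Table 5 is in effect an adjacency list, so a quick way to check whether a link between two databases is listed is to store each row as a set. The sketch below transcribes only four of the shorter rows for brevity; `LINKS` and `is_linked` are hypothetical names, and the full table would be entered the same way.

```python
# A few rows of Table 5, stored as source database -> set of linked databases.
LINKS = {
    "rclass": {"pathway", "brite", "ko", "reaction", "enzyme"},
    "hsa_var": {"hsa", "network", "pubmed"},
    "dgroup": {"ko", "hsa", "disease", "drug"},
    "environ": {"brite", "compound", "drug"},
}


def is_linked(source: str, target: str) -> bool:
    """Return True if Table 5 lists a link from source to target."""
    # Databases missing from the (partial) table are treated as having no links.
    return target in LINKS.get(source, set())


print(is_linked("environ", "drug"))     # True
print(is_linked("environ", "pathway"))  # False
```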


Last updated: December 1, 2017