Creating Local Copies of KEGG Data

PATHWAY

A pathway map consists of a png image file and a "conf" file containing coordinates of map objects in the image file, from which an html page similar to the one available at the KEGG website may be reconstructed. Each file can be obtained by:
/get/<map_number>/image
/get/<map_number>/conf
and the list of map numbers can be obtained by:
/list/pathway
/list/pathway/hsa
for example, for the reference pathway maps and human pathway maps, respectively.
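
As a minimal sketch of the steps above, the following Python helpers turn the output of a list operation into the two request paths needed per map. The function names are hypothetical, and the parser assumes that the list output is tab-delimited with the map identifier in the first column; check this against an actual response before relying on it.

```python
def parse_list_ids(list_text):
    """Return the first tab-delimited column of each non-empty line.

    Assumption: the list operation returns one entry per line with the
    identifier in the first column.
    """
    ids = []
    for line in list_text.splitlines():
        if line.strip():
            ids.append(line.split("\t", 1)[0].strip())
    return ids


def pathway_file_paths(map_number):
    """Return the request paths for the two files that make up one map."""
    return {
        "image": f"/get/{map_number}/image",
        "conf": f"/get/{map_number}/conf",
    }
```

The returned paths are relative; prepend the host of the KEGG API in use before sending the requests.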

BRITE

The BRITE database is the only database in KEGG that is not represented by a flat file. It is a collection of htext files for BRITE hierarchies and html files for BRITE tables. The content of the BRITE database can be viewed by:
/list/brite
In addition to the htext and html files, this list includes a third type: tab-delimited text files for BRITE binary relationships. These are included simply to enable retrieval through the KEGG API, such as:
/get/br:ko2go
/get/br:drug2target

KO

The KO database flat file can be copied by repeating the get operations with specification of up to 100 KO identifiers (K numbers) at a time. The entire set of K numbers can be obtained by the list operation.
/list/ko
/get/K00001+K00002+...
...
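
The repeated get operations can be sketched in Python as follows. The helper name is hypothetical; the limit of 100 identifiers per request is taken from the text above, and the optional trailing option (such as aaseq or mol) follows the path patterns used elsewhere in this document.

```python
def batched_get_paths(ids, batch_size=100, option=None):
    """Split identifiers into get requests of at most batch_size entries.

    ids        -- list of identifiers, e.g. K numbers from /list/ko
    batch_size -- identifiers per request (the text allows up to 100)
    option     -- optional trailing option such as "aaseq" or "mol"
    """
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    paths = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        path = "/get/" + "+".join(chunk)
        if option:
            path += "/" + option
        paths.append(path)
    return paths
```

Concatenating the responses in request order reproduces the flat file for the whole identifier list.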
Links to/from each database can be found by the info operation and the actual links can be obtained by the link operation. For the KO database,
/info/ko
returns the list of <linked_db> and
/link/<target_db>/<source_db>
is used to obtain actual links. For example, KO to PubMed links and PubMed to KO links can be obtained, respectively, by:
/link/pubmed/ko
/link/ko/pubmed
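
A small Python sketch of handling link results is shown below. The function names are hypothetical, and the parser assumes that each line of the link output holds a source identifier and a target identifier separated by a tab; this format is not stated in the text and should be verified against a real response.

```python
from collections import defaultdict


def link_path(target_db, source_db):
    """Build the request path for links from source_db to target_db."""
    return f"/link/{target_db}/{source_db}"


def parse_links(link_text):
    """Group target identifiers by source identifier.

    Assumption: two tab-delimited columns per line, source then target.
    Lines with fewer than two columns are skipped.
    """
    links = defaultdict(list)
    for line in link_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        links[fields[0].strip()].append(fields[1].strip())
    return dict(links)
```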

GENES

The combination of list and get operations can be used for each of the KEGG organisms (<org>), viruses (vg) or addendum category (ag). For example, the amino acid sequence data in FASTA format for Escherichia coli can be obtained as follows:
/list/eco
/get/eco:b00001+eco:b00002+.../aaseq
...
again specifying up to 100 identifiers at a time.
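
Once the batched aaseq responses are concatenated, the sequences can be indexed by gene identifier. The parser below is a sketch that relies only on the general FASTA convention (a header line starting with ">" followed by sequence lines); taking the first whitespace-delimited token of the header as the identifier is an assumption, and the function name is hypothetical.

```python
def parse_fasta(fasta_text):
    """Map each FASTA record identifier to its joined sequence string."""
    records = {}
    header = None
    chunks = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```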

COMPOUND

A full set of the COMPOUND database consists of the flat file, the SDF file (multiple MOL files) and the collection of gif image files for chemical structures.
/list/cpd
/get/C00001+C00002+...
...
/get/C00001+C00002+.../mol
...
/get/C00001/image
...
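
The three kinds of requests listed above can be generated together, as in this Python sketch. The function name is hypothetical. Flat-file and MOL requests are batched, on the assumption that the limit of 100 identifiers stated for KO and GENES also applies here, while image requests are issued one identifier at a time as in the example above.

```python
def compound_request_paths(ids, batch_size=100):
    """Yield (kind, path) pairs covering flat file, MOL and image data.

    Assumption: batch_size of up to 100 is accepted for the flat file
    and MOL requests; images are requested per identifier.
    """
    for start in range(0, len(ids), batch_size):
        joined = "+".join(ids[start:start + batch_size])
        yield "flat", f"/get/{joined}"
        yield "mol", f"/get/{joined}/mol"
    for compound_id in ids:
        yield "image", f"/get/{compound_id}/image"
```

Appending the MOL responses to a single file in request order gives the SDF file described above.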

DRUG

A full set of the DRUG database can be obtained in the same way as the COMPOUND database. In addition there are other useful datasets available in the BRITE database, in which drug-related files have br number identifiers starting with br083.
/find/brite/br083
For example, the ATC drug classification can be obtained by:
/get/br:br08303
The combination of info and link operations can be used, for example, to obtain links from drug products marketed in the USA.
/info/drug
/link/drug/ndc

API vs FTP

Since there are more than 5,000 organisms available in KEGG, it is not practical to try to download all the different types of genes data or organism-specific pathway data through the KEGG API. The file sizes of the GENES and PATHWAY databases exceed 100 GB and 50 GB, respectively, which is two to three orders of magnitude larger than the other databases. If complete datasets are necessary, please consider obtaining a full license with FTP access. Large files containing all organisms, such as non-redundant sequence files for KO annotation and GENES ID mapping files, are available only through FTP.


KEGG Mapper Analysis

Enhanced KEGG API

The KEGG API at kegg.net for subscribers is an enhanced version of the KEGG API at kegg.jp for academic users. In particular, the new upload, idmap and map operations are intended for KEGG Mapper analysis using the Search, Search&Color and Reconstruct tools for the PATHWAY, BRITE and MODULE databases. Here the KEGG API operations are explained in comparison to these web tools.

Search tools

The Search tool is the most basic tool in KEGG Mapper. Given a list of objects, such as genes, KOs and compounds represented by KEGG identifiers, the tool named Search Pathway, Search Brite or Search Module first searches against the PATHWAY, BRITE or MODULE database and then reports the list of map, brite or module identifiers that contain the given objects. Clicking on a map, brite or module identifier then displays the corresponding pathway map, brite hierarchy or module with the objects highlighted in red (foreground color).

The Search procedure can be accomplished as follows:
  1. Create a dataset containing KEGG identifiers
  2. Upload the dataset
  3. Use the idmap operation for the search
  4. Process the idmap result to create the list
  5. Use the map/<dbentry> operation for highlighting

Search&Color tools

The Search&Color tools, namely Search&Color Pathway, Search&Color Brite and Search&Color Module, allow coloring of objects in the KEGG pathway map, brite hierarchy or module. The basic procedure is the same as for the Search tools; the only difference is the data to be uploaded, which may contain a background and foreground color specification in the second column and/or a default specification in a comment line.

The Search&Color procedure can be accomplished as follows:
  1. Create a dataset containing KEGG identifiers and color specifications
  2. Upload the dataset
  3. Use the idmap operation for the search
  4. Process the idmap result to create the list
  5. Use the map/<dbentry> operation for coloring
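
Step 1 of this procedure can be sketched in Python as follows. The function name is hypothetical, and the tab separator between the two columns is an assumption. The color specification is passed through as an opaque string because its exact syntax is not given here; the "#default=" comment line follows the example shown for the Reconstruct tools.

```python
def build_color_dataset(entries, default_color=None):
    """Return dataset text with identifiers in the first column.

    entries       -- iterable of (identifier, color_spec) pairs; use None
                     as color_spec for an identifier without a color
    default_color -- optional value for a "#default=" comment line
    """
    lines = []
    if default_color:
        lines.append(f"#default={default_color}")
    for identifier, color_spec in entries:
        if color_spec:
            lines.append(f"{identifier}\t{color_spec}")  # tab is assumed
        else:
            lines.append(identifier)
    return "".join(line + "\n" for line in lines)
```

Passing None for every color specification yields the plain identifier list used by the Search procedure.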

Reconstruct tools

The Reconstruct tools, namely Reconstruct Pathway, Reconstruct Brite and Reconstruct Module, are used for mapping a KO (K number) dataset against the reference pathway maps, brite hierarchies and modules. Since they were originally developed for post-processing the results generated by the BlastKOALA and KAAS annotation servers, the K numbers should be given in the second column of the dataset. In the KEGG API the K numbers should be in the first column, which is compatible with the Search and Search&Color tools.

The Reconstruct procedure can be accomplished as follows:
  1. Create a dataset containing K numbers with an optional comment line for default coloring such as "#default=#bfffbf"
  2. Upload the dataset
  3. Use the map/<database> operation for the search and for obtaining the list
  4. Use the map/<dbentry> operation for coloring
In fact, for KO data mapping against the reference PATHWAY, BRITE or MODULE database, the Search and Search&Color procedures mentioned above can be simplified by using the map/<database> operation rather than the idmap operation:
  1. Create a dataset containing K numbers with optional color specification
  2. Upload the dataset
  3. Use the map/<database> operation for the search and for obtaining the list
  4. Use the map/<dbentry> operation for highlighting or coloring
because the map/<database> operation reports the summary of the search result.
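
Because the web tools expect K numbers in the second column while the KEGG API expects them in the first, annotation output usually needs a small conversion before upload. The Python sketch below assumes whitespace-delimited input with a gene identifier in the first column and an optional K number in the second; the function name is hypothetical, and dropping duplicate K numbers is a design choice of this sketch rather than a documented requirement.

```python
def ko_annotation_to_dataset(annotation_text, default_color=None):
    """Convert gene/K-number rows into a K-number-first dataset.

    Rows without a second column (genes with no KO assignment) and
    comment lines are skipped. Each K number is kept once, in order of
    first appearance.
    """
    seen = set()
    k_numbers = []
    for line in annotation_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        k_number = fields[1]
        if k_number not in seen:
            seen.add(k_number)
            k_numbers.append(k_number)
    lines = [f"#default={default_color}"] if default_color else []
    lines.extend(k_numbers)
    return "".join(line + "\n" for line in lines)
```

The resulting text corresponds to step 1 of the procedures above and can then be uploaded as in step 2.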

The map/module operation includes the completeness check of KEGG modules as in the Reconstruct Module tool of KEGG Mapper.


Last updated: August 1, 2017