API for NCBI Genes
This API provides access to information about human genes, taken from NCBI's Gene dataset.
Source file: gene_info.gz (version date: ). This processed data includes only the human gene data from NCBI's Gene database.
The following demo shows how this API might be used with an autocompleter we've developed. (Example: Try typing NM.)
For further experimentation with the autocompleter and this API, try the autocompleter demo page.
API Base URL: https://clinicaltables.nlm.nih.gov/api/ncbi_genes/v3/search (+ query string parameters)
This data set may also be accessed through the FHIR ValueSet $expand operation.
In addition to the base URL, you will need to specify other parameters. See the query string parameters section below for details.
At a minimum, when using the above base URL, you will need to specify the "terms" parameter containing a word or partial word to match.
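As a minimal sketch of the smallest valid request, the following Python builds the search URL from the base URL and a "terms" value. The helper name `build_search_url` is hypothetical and is not part of the API; only the base URL and the `terms` parameter come from this page.

```python
from urllib.parse import urlencode

# Base URL as given in this documentation.
BASE_URL = "https://clinicaltables.nlm.nih.gov/api/ncbi_genes/v3/search"


def build_search_url(terms: str) -> str:
    """Return a search URL for a word or partial word (hypothetical helper)."""
    if not terms:
        # "terms" is the one parameter the documentation marks as required.
        raise ValueError("the 'terms' parameter is required")
    # urlencode percent-encodes the value so spaces and symbols are safe in a URL.
    return f"{BASE_URL}?{urlencode({'terms': terms})}"


url = build_search_url("MTX")
print(url)
```

Several partial words can be passed in one `terms` string separated by spaces; the encoder turns each space into `+`.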
|Parameter Name||Default Value||Description|
|terms|| ||(Required.) The search string (e.g., just a part of a word) for which to find matches in the list. More than one partial word can be present in "terms", in which case there is an implicit AND between them.|
|maxList||7||Optional. Specifies the number of results requested, up to the upper limit of 500. If present but the value is empty, 500 will be used.|
|q|| ||An optional, additional query string used to further constrain the results returned by the "terms" field. Unlike the terms field, "q" is not automatically wildcarded, but can include wildcards and can specify field names. See the Elasticsearch query string page for documentation of supported syntax.|
|df||_code_system, _code, chromosome, Symbol, description, type_of_gene||A comma-separated list of display fields (from the fields section below) which are intended for the user to see when looking at the results.|
|sf||All fields||A comma-separated list of fields to be searched.|
|cf||GeneID||A field to regard as the "code" for the returned item data.|
|ef|| ||A comma-separated list of additional fields to be returned for each retrieved list item. (See the Output format section for how the data for fields is returned.) If you wish the keys in the returned data hash to be something other than the field names, you can specify an alias for a field by separating it from the field name with a colon, e.g., "ef=field_name1:alias1,field2,field_name3:alias3", etc. Note that not every field specified in the ef parameter needs to have an alias.|
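The parameters in the table above can be combined in a single query string. The sketch below is one possible way to assemble them in Python; the function `build_query_url` and its argument names are hypothetical, and the example values (such as `q="chromosome:1"` and the alias `aliases`) are illustrative assumptions rather than recommended settings.

```python
from urllib.parse import urlencode

BASE_URL = "https://clinicaltables.nlm.nih.gov/api/ncbi_genes/v3/search"


def build_query_url(terms, max_list=None, q=None, df=None, sf=None, cf=None, ef=None):
    """Assemble a search URL from the documented parameters (hypothetical helper).

    df and sf are lists of field names; ef maps each field name to an
    alias, or to None when no alias is wanted.
    """
    params = {"terms": terms}
    if max_list is not None:
        # The documentation states an upper limit of 500 results.
        if not 1 <= max_list <= 500:
            raise ValueError("maxList must be between 1 and 500")
        params["maxList"] = str(max_list)
    if q:
        params["q"] = q
    if df:
        params["df"] = ",".join(df)
    if sf:
        params["sf"] = ",".join(sf)
    if cf:
        params["cf"] = cf
    if ef:
        # "field:alias" when an alias is given, otherwise just "field".
        params["ef"] = ",".join(
            f"{field}:{alias}" if alias else field for field, alias in ef.items()
        )
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url(
    "MTX",
    max_list=20,
    q="chromosome:1",                     # illustrative field-scoped constraint
    df=["Symbol", "description"],
    ef={"Synonyms": "aliases", "map_location": None},
)
print(url)
```

Commas and colons are percent-encoded by `urlencode`, which is a valid encoding of the same parameter values.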
(Field descriptions were based on the NCBI Gene README file.)
|GeneID||The unique identifier for a gene.|
|HGNC_ID||The HGNC identifier for the gene, if one was provided in the dbXrefs field.|
|Symbol||The default symbol for the gene.|
|Synonyms||A bar-delimited set of unofficial symbols for the gene.|
|dbXrefs||A bar-delimited set of identifiers in other databases for this gene. The unit of the set is database:value.|
|chromosome||The chromosome on which this gene is placed. For mitochondrial genomes, the value 'MT' is used.|
|map_location||The map location for this gene (i.e. the cytogenetic location).|
|description||A descriptive name for this gene.|
|type_of_gene||The type assigned to the gene according to the list of options provided in https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/entrezgene/entrezgene.as.|
|na_symbol||The symbol from a nomenclature authority. When this is not '-', it indicates that this symbol is from a nomenclature authority.|
|na_name||The full name from a nomenclature authority. When this is not '-', it indicates that this full name is from a nomenclature authority.|
|Other_designations||A pipe-delimited set of some alternate descriptions that have been assigned to a GeneID. '-' indicates none are reported.|
|Modification_date||The last date a gene record was updated, in YYYYMMDD format.|
|_code_system||The records in this API may have multiple unique IDs, e.g., NCBI GeneID and HGNC_ID (for about half of the records). The _code_system field indicates which ID system (also known as a code system) is used for the record; see the _code field below for more details. This field is not in the original source data and is not searchable.|
|_code||The records in this API may have multiple unique IDs, e.g., NCBI GeneID and HGNC_ID (for about half of the records). The _code field contains a unique ID for the record, chosen by the 'cf' parameter (either specified in the request or its default). The _code_system field (described above) indicates which ID system (code system) the _code value comes from. This field is not in the original source data and is not searchable.|
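Several of the fields above pack multiple values into one string. The following Python sketch shows one way to unpack the bar-delimited fields, the database:value units of dbXrefs, and the YYYYMMDD date. The function names are hypothetical, the input strings are illustrative values rather than output copied from a live response, and treating '-' as "no values" for every delimited field is an assumption (this page states it only for Other_designations).

```python
from datetime import date, datetime


def split_bar_delimited(value: str) -> list[str]:
    """Split a bar-delimited field such as Synonyms or Other_designations."""
    # Assumption: '-' (or an empty string) means that no values are reported.
    if not value or value == "-":
        return []
    return value.split("|")


def parse_dbxrefs(value: str) -> list[tuple[str, str]]:
    """Split dbXrefs into (database, value) pairs."""
    pairs = []
    for unit in split_bar_delimited(value):
        # Split on the first colon only, because the value part may
        # itself contain a colon (for example "HGNC:HGNC:7504").
        database, _, identifier = unit.partition(":")
        pairs.append((database, identifier))
    return pairs


def parse_modification_date(value: str) -> date:
    """Convert a YYYYMMDD string into a date object."""
    return datetime.strptime(value, "%Y%m%d").date()


# Illustrative inputs, not taken from a live API response.
print(parse_dbxrefs("MIM:600605|HGNC:HGNC:7504"))
print(split_bar_delimited("-"))
print(parse_modification_date("20240115"))
```

A list of pairs is used instead of a dictionary so that repeated database names, if any occur, are not silently overwritten.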
Output for an API query is an array of the following elements:
- The total number of results on the server (which can be more than the number returned). For APIs in which there are millions of records, this number might be a lower bound due to early termination if there are more than a hundred thousand results.
- An array of codes for the returned items. (This is the field specified with the cf query parameter above.)
- A hash of the "extra" data requested via the "ef" query parameter above. The keys of the hash are the fields (or their requested aliases) named in the "ef" parameter, and the value for a field is an array of that field's values in the same order as the returned codes.
- An array, with one element for each returned code, where each element is an array of the display strings specified with the "df" query parameter.
- An array, with one element for each returned code, where each element is the "code system" for the returned code. Note that only code-system aware APIs will return this array.
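The parallel arrays described above can be regrouped into one record per returned item. The Python sketch below is one way to do that; `unpack_response` is a hypothetical helper, and the input is shortened to the first two rows of the sample response shown in the Sample API Queries section, with display field names taken from that sample's description.

```python
import json


def unpack_response(response, display_fields):
    """Turn the positional output array into a total and a list of records."""
    total, codes, extra, display = response[:4]
    # The fifth element is only present for code-system aware APIs.
    code_systems = response[4] if len(response) > 4 else None
    extra = extra or {}  # null when no "ef" fields were requested

    records = []
    for i, code in enumerate(codes):
        record = {
            "code": code,
            # Pair each display string with the name of its "df" field.
            "display": dict(zip(display_fields, display[i])),
            # Each "ef" value array is in the same order as the codes.
            "extra": {key: values[i] for key, values in extra.items()},
        }
        if code_systems:
            record["code_system"] = code_systems[i]
        records.append(record)
    return total, records


# First two rows of the documented sample response for terms=MTX.
raw = (
    '[5,["4580","105616916"],null,'
    '[["1","4580","HGNC:7504","MTX1","metaxin 1","protein-coding"],'
    '["5","105616916","HGNC:50545","LINC01455",'
    '"long intergenic non-protein coding RNA 1455","ncRNA"]]]'
)
fields = ["chromosome", "GeneID", "HGNC_ID", "Symbol", "description", "type_of_gene"]

total, records = unpack_response(json.loads(raw), fields)
print(total, [r["display"]["Symbol"] for r in records])
```

The `display_fields` list must match the "df" value used in the request, since the response itself carries only the values and not the field names.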
Sample API Queries
|https://clinicaltables.nlm.nih.gov/api/ncbi_genes/v3/search?terms=MTX||[5,["4580","105616916","10651","4581","345778"],null,[ ["1","4580","HGNC:7504","MTX1","metaxin 1","protein-coding"], ["5","105616916","HGNC:50545","LINC01455","long intergenic non-protein coding RNA 1455","ncRNA"], ["2","10651","HGNC:7506","MTX2","metaxin 2","protein-coding"], ["1","4581","HGNC:7505","MTX1P1","metaxin 1 pseudogene 1","pseudo"], ["5","345778","HGNC:24812","MTX3","metaxin 3","protein-coding"]]]||Returns all five genes matching MTX. The number 5 is the total number of results available, which in this case is also the number returned. The array that follows contains the GeneID values for the genes; the subsequent "null" means no extra data was requested (via the ef parameter); and the last array contains the display fields for the list items, which in this case are chromosome, GeneID, HGNC_ID, Symbol, description, and type_of_gene.|
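A query like the sample above could be issued from Python roughly as follows. This is a sketch under stated assumptions: `search_genes` and `http_get` are hypothetical names, the 10-second timeout is an arbitrary choice, and the `fetch` argument exists only so the HTTP call can be replaced, for example in tests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://clinicaltables.nlm.nih.gov/api/ncbi_genes/v3/search"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def search_genes(terms: str, fetch=http_get):
    """Run a search and return (total, [(code, display_strings), ...])."""
    url = f"{BASE_URL}?{urlencode({'terms': terms})}"
    # The first four elements are: total, codes, extra data, display strings.
    total, codes, _extra, display = json.loads(fetch(url))[:4]
    return total, list(zip(codes, display))
```

Calling `search_genes("MTX")` without a `fetch` argument performs a live request, so the result depends on the current data set and may differ from the sample shown above.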