API for HUGO Gene Nomenclature Committee (HGNC) Database

The HUGO Gene Nomenclature Committee (HGNC) is the worldwide authority that assigns standardized nomenclature to human genes. The work of the HGNC is supported by grants from the National Human Genome Research Institute (NHGRI) and the Wellcome Trust.

Source file: ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/tsv/hgnc_complete_set.txt

API Demo

This API can be used with an autocompleter we've developed. To experiment with the autocompleter and this API together, try the autocompleter demo page.

API Documentation

API Base URL: https://clinicaltables.nlm.nih.gov/api/genes/v4/search

This data set may also be accessed through the FHIR ValueSet $expand operation.

In addition to the base URL, you will need to specify other parameters. See the query string parameters section below for details.

Query String Parameters and Default Values

At a minimum, when using the above base URL, you will need to specify the "terms" parameter containing a word or partial word to match.

Parameter Name | Default Value | Description
terms | (none) | (Required.) The search string (e.g., just a part of a word) for which to find matches in the list. More than one partial word can be present in "terms", in which case there is an implicit AND between them.
maxList | 7 | Optional. Specifies the number of results requested, up to the upper limit of 500. If present but the value is empty, 500 will be used.
q | (none) | An optional, additional query string used to further constrain the results returned by the "terms" field. Unlike the terms field, "q" is not automatically wildcarded, but can include wildcards and can specify field names. See the Elasticsearch query string page for documentation of supported syntax.
df | name_mod | A comma-separated list of display fields (from the fields section below) which are intended for the user to see when looking at the results.
sf | All fields | A comma-separated list of fields to be searched that depends on the base URL used (see above).
cf | hgnc_id | A field to regard as the "code" for the returned item data.
ef | (none) | A comma-separated list of additional fields to be returned for each retrieved list item. (See the Output format section for how the data for fields is returned.) If you wish the keys in the returned data hash to be something other than the field names, you can specify an alias for the field name by separating it from its field name with a colon, e.g., "ef=field_name1:alias1,field2,field_name3:alias3", etc. Note that not every field specified in the ef parameter needs to have an alias.
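
As an illustration of how the parameters above combine into a request URL, here is a minimal Python sketch. The function name build_search_url and its keyword arguments are hypothetical helpers, not part of the API; the base URL is the one used in the sample queries in this document. Commas and colons are left unescaped so the result reads like the examples here, which assumes the server accepts them either way.

```python
from urllib.parse import urlencode

# Base URL as it appears in the sample queries of this document.
BASE_URL = "https://clinicaltables.nlm.nih.gov/api/genes/v4/search"


def build_search_url(terms, max_list=None, q=None, df=None, sf=None, cf=None, ef=None):
    """Assemble a search URL (hypothetical helper, not part of the API).

    df and sf are lists of field names. ef is a list whose items are either
    a field name or a (field_name, alias) pair.
    """
    if not terms:
        raise ValueError('the "terms" parameter is required')
    params = {"terms": terms}
    if max_list is not None:
        params["maxList"] = max_list
    if q:
        params["q"] = q
    for key, fields in (("df", df), ("sf", sf)):
        if fields:
            params[key] = ",".join(fields)
    if cf:
        params["cf"] = cf
    if ef:
        # "field:alias" for aliased fields, plain "field" otherwise.
        params["ef"] = ",".join(
            item if isinstance(item, str) else f"{item[0]}:{item[1]}" for item in ef
        )
    return BASE_URL + "?" + urlencode(params, safe=",:")


url = build_search_url("SCN1A", sf=["symbol"], df=["refseq_accession"])
```

Parameters left as None are omitted from the URL, so the server-side defaults listed in the table apply.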

Genes Field Descriptions

Field | Description (taken from the HGNC website)
hgnc_id | A unique ID provided by the HGNC for each gene with an approved symbol. Standard HGNC IDs are of the format HGNC:n, where n is a number (e.g., HGNC:10585, as in the sample results below).
hgnc_id_num | The hgnc_id with the "HGNC:" prefix removed, for apps that want the autocomplete to work on the numeric part of the ID as well.
symbol | The official gene symbol that has been approved by the HGNC. Gene symbols begin with an uppercase letter and are usually limited to 6 characters.
location | A marker for the point in the genome that can be mapped by some means.
alias_symbol | Other symbols used to refer to this gene.
prev_symbol | Displays any symbols that were previously HGNC-approved nomenclature. Many genes will have no data in this field as the symbol will never have been changed.
refseq_accession | The gene Reference Sequence (RefSeq) Accession number for the entry, provided by the NCBI. An Accession number is a unique identifier given to a sequence when it is submitted to one of the DNA repositories (GenBank, EMBL, DDBJ). The initial deposition of a sequence record is referred to as version 1. If the sequence is updated, the version number is incremented, but the Accession number remains constant.
name | The gene name.
name_mod | The "name" field with any parenthetical content removed.
alias_name | Other names used to refer to this gene.
prev_name | Displays any names that were previously HGNC-approved nomenclature. Many genes will have no data in this field as the name will never have been changed.
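
To make the relationship between the derived fields (hgnc_id_num, name_mod) and their source fields concrete, here is a small client-side sketch. It is not the service's actual implementation: the function names and the sample name string are hypothetical, and nested parentheses are not handled.

```python
import re

PREFIX = "HGNC:"


def hgnc_id_num(hgnc_id):
    """Return the numeric part of an ID such as "HGNC:10585"."""
    return hgnc_id[len(PREFIX):] if hgnc_id.startswith(PREFIX) else hgnc_id


def name_mod(name):
    """Drop parenthetical content from a gene name and tidy the spacing."""
    without_parens = re.sub(r"\s*\([^()]*\)", "", name)
    return re.sub(r"\s{2,}", " ", without_parens).strip()


numeric_id = hgnc_id_num("HGNC:10585")
# Hypothetical name string, used only to show the transformation.
short_name = name_mod("example gene (provisional) family member")
```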

Output format

Output for an API query is an array of the following elements:

  1. The total number of results on the server (which can be more than the number returned). For APIs in which there are millions of records, this number might be a lower bound due to early termination if there are more than a hundred thousand results.
  2. An array of codes for the returned items. (This is the field specified with the cf query parameter above.)
  3. A hash of the "extra" data requested via the "ef" query parameter above. The keys on the hash are the fields (or their requested aliases) named in the "ef" parameter, and the value for a field is an array of that field's values in the same order as the returned codes.
  4. An array, with one element for each returned code, where each element is an array of the display strings specified with the "df" query parameter.
  5. An array, with one element for each returned code, where each element is the "code system" for the returned code. Note that only code-system aware APIs will return this array.
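
The positional layout above can be unpacked into one record per returned code. The following is a minimal sketch, assuming the response has already been decoded from JSON; parse_search_response is a hypothetical helper name, and the sample payload is copied from one of the sample results in this document.

```python
import json


def parse_search_response(payload):
    """Turn the positional response array into (total, list of records)."""
    total, codes, extra, display = payload[:4]
    # The fifth element is only present for code-system aware APIs.
    code_systems = payload[4] if len(payload) > 4 else None
    extra = extra or {}  # null when no "ef" fields were requested
    records = []
    for i, code in enumerate(codes):
        record = {
            "code": code,
            "display": display[i],
            # Each "ef" array is in the same order as the returned codes.
            "extra": {key: values[i] for key, values in extra.items()},
        }
        if code_systems is not None:
            record["code_system"] = code_systems[i]
        records.append(record)
    return total, records


sample = json.loads(
    '[2,["HGNC:10585","HGNC:54069"],null,[["NM_006920"],["NR_110260"]]]'
)
total, records = parse_search_response(sample)
```

Note that the total can exceed len(records), since the server may hold more matches than maxList allows it to return.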

Sample API Queries

Query: https://clinicaltables.nlm.nih.gov/api/genes/v4/search?df=symbol&terms=epilepsy
Result: [7,["HGNC:6572","HGNC:16406","HGNC:14270","HGNC:3413","HGNC:2482","HGNC:21576","HGNC:2079"],null,[["LGI1"],["EFHC1"],["PCDH19"],["EPM2A"],["CSTB"],["NHLRC1"],["CLN8"]]]
Description: Returns a count of the gene symbols associated with epilepsy along with a list of the HGNC IDs and their associated gene symbols.

Query: https://clinicaltables.nlm.nih.gov/api/genes/v4/search?sf=symbol&terms=SCN1A
Result: [2,["HGNC:10585","HGNC:54069"],null,[["sodium voltage-gated channel alpha subunit 1"],["SCN1A and SCN9A antisense RNA 1"]]]
Description: Returns the HGNC IDs of the genes whose symbols start with SCN1A along with the official names of the genes.

Query: https://clinicaltables.nlm.nih.gov/api/genes/v4/search?sf=symbol&df=refseq_accession&terms=SCN1A
Result: [2,["HGNC:10585","HGNC:54069"],null,[["NM_006920"],["NR_110260"]]]
Description: Returns the HGNC IDs of the genes whose symbols start with SCN1A along with the RefSeq accession numbers of the genes.
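
For readers who want to issue a query like these from a script, here is a minimal sketch using only the Python standard library. The names http_get_json and symbols_by_code are hypothetical; the fetch argument exists so the HTTP call can be swapped out (for example in tests), and the 10-second timeout is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://clinicaltables.nlm.nih.gov/api/genes/v4/search"


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def symbols_by_code(terms, fetch=http_get_json):
    """Return (total, {code: symbol}) for genes matching the search terms."""
    url = SEARCH_URL + "?" + urlencode({"df": "symbol", "terms": terms})
    total, codes, _extra, display = fetch(url)[:4]
    # Each display row holds one string per requested "df" field; here only symbol.
    return total, {code: row[0] for code, row in zip(codes, display)}


# Example call (performs a network request):
#   total, symbols = symbols_by_code("epilepsy")
```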