API for PharmVar Star Alleles

Star alleles are a nomenclature for specifying the set of variants in a metabolic gene that causes a specified change in the metabolism of a class of drugs. The star allele data served by this API is based on the PharmVar CYP data.

This service is provided "as is" and free of charge. Please see the Frequently Asked Questions page for more details on terms of service, etc.

API Demo

The following demo shows how this API might be used with an autocompleter we've developed.

For further experimentation with the autocompleter and this API, try the autocompleter demo page.

API Documentation

API Base URL: https://clinicaltables.nlm.nih.gov/api/star_alleles/v3/search (+ query string parameters)

This data set may also be accessed through the FHIR ValueSet $expand operation.

In addition to the base URL, you will need to specify other parameters. See the query string parameters section below for details.

Query String Parameters and Default Values

At a minimum, when using the above base URL, you will need to specify the "terms" parameter containing a word or partial word to match.

| Parameter Name | Default Value | Description |
|---|---|---|
| terms | | (Required.) The search string (e.g., just a part of a word) for which to find matches in the list. More than one partial word can be present in "terms", in which case there is an implicit AND between them. |
| maxList | 7 | Optional, with a default of 7. Specifies the number of results requested, up to the upper limit of 500. If present but the value is empty, 500 will be used. Note that this parameter does not support pagination; see "count" and "offset" below for details on pagination support. |
| count | 7 | The number of results to retrieve (page size). The maximum count allowed is 500. See "offset" below for pagination support. |
| offset | 0 | The starting result number (0-based) to retrieve. Use offset and count together for pagination. Note that the current limit on the total number of results that can be retrieved (offset + count) is 7,500. We reserve the right to decrease or increase this limit based on system capacity and/or other factors. Please see the FAQ page on how to sign up to our email list to be notified of any changes or new features. |
| q | | An optional, additional query string used to further constrain the results returned by the "terms" field. Unlike the terms field, "q" is not automatically wildcarded, but can include wildcards and can specify field names. See the Elasticsearch query string page for documentation of supported syntax. |
| df | StarAlleleName, cDNANucleotideChanges, GeneNucleotideChange, OtherNames, ProteinChange | A comma-separated list of display fields (from the fields section below) which are intended for the user to see when looking at the results. |
| sf | StarAlleleName, ProteinAffected, cDNANucleotideChanges, GeneNucleotideChange, OtherNames, ProteinChange | A comma-separated list of fields to be searched. |
| cf | StarAlleleName | The field to use as the code for each returned item. The default, StarAlleleName, is the star allele name, which is the unique record ID. |
| ef | | A comma-separated list of additional fields to be returned for each retrieved list item. (See the Output format section for how the data for fields is returned.) If you wish the keys in the returned data hash to be something other than the field names, you can specify an alias for the field name by separating it from its field name with a colon, e.g., "ef=field_name1:alias1,field2,field_name3:alias3", etc. Note that not every field specified in the ef parameter needs to have an alias. |
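
As a rough illustration of how the parameters above combine, the following Python sketch assembles a query URL and computes offset/count pairs for pagination. The function names (build_search_url, page_windows) and the constants are hypothetical helpers introduced for this example and are not part of the API; only the base URL, the parameter names, and the limits of 500 and 7,500 come from this page. The sketch builds URLs only and does not send any request.

```python
from urllib.parse import urlencode

BASE_URL = "https://clinicaltables.nlm.nih.gov/api/star_alleles/v3/search"
MAX_COUNT = 500    # documented maximum page size
MAX_WINDOW = 7500  # documented current limit on offset + count


def build_search_url(terms, count=None, offset=None, df=None, sf=None, ef=None, q=None):
    """Assemble a search URL; arguments left as None use the server defaults."""
    params = {"terms": terms}
    if count is not None:
        params["count"] = count
    if offset is not None:
        params["offset"] = offset
    if df:
        params["df"] = ",".join(df)
    if sf:
        params["sf"] = ",".join(sf)
    if ef:
        # ef maps a field name to its alias, or to None when no alias is wanted.
        params["ef"] = ",".join(
            f"{field}:{alias}" if alias else field for field, alias in ef.items()
        )
    if q:
        params["q"] = q
    # urlencode percent-encodes commas and colons; the server decodes them.
    return BASE_URL + "?" + urlencode(params)


def page_windows(total, count=MAX_COUNT):
    """Yield (offset, count) pairs covering `total` results within the limits."""
    count = max(1, min(count, MAX_COUNT))
    reachable = min(total, MAX_WINDOW)  # results past offset + count = 7,500 cannot be fetched
    for offset in range(0, reachable, count):
        yield offset, min(count, reachable - offset)


# Example: request the GenBank field under the (hypothetical) alias "acc".
url = build_search_url("CYP1A", count=7, ef={"GenBank": "acc", "References": None})
print(url)

# Example: page through 56 matches, 25 at a time.
for offset, count in page_windows(56, 25):
    print(build_search_url("CYP1A", count=count, offset=offset))
```

Capping the windows on the client side keeps a caller from requesting pages the service would not return, but the limits may change, so treat the constants as values to re-check against this page.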

Star Alleles Field Descriptions

| Field | Field Description |
|---|---|
| StarAlleleName | The name of the star allele. |
| GenBank | When applicable, the GenBank accession number is listed. GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. |
| ProteinAffected | The names for the corresponding proteins have a period between the name of the gene product and the allele number (e.g., CYP2D6.2). If the allele is unable to produce full-length protein, no protein name will be assigned. |
| cDNANucleotideChanges | The location of the cDNA (complementary DNA synthesized from mRNA) nucleotide change, which corresponds to the mRNA change in "NM" and "c." in HGVS. |
| GeneNucleotideChange | The location of the non-coding genomic nucleotide change. Often uses the individual submitter’s accession numbering. |
| XbaIHaplotype | This field is only used for CYP2D6 alleles. The alleles can be characterized by an XbaI restriction fragment length polymorphism (defined in kilobases), which affects the metabolism of debrisoquine. This field gives the length of the XbaI haplotype in kilobases. |
| RFLP | RFLP stands for Restriction Fragment Length Polymorphism, which is a difference in homologous DNA sequences that can be detected by the presence of fragments of different lengths after digestion of the DNA samples in question with specific restriction endonucleases. RFLP, as a molecular marker, is specific to a single clone/restriction enzyme combination. PstI+, RsaI-, and DraI- are common polymorphisms associated with CYP2E1. |
| OtherNames | Informal names and formerly assigned star allele names. |
| ProteinChange | Refers to both the protein change that results from the cDNA nucleotide change, which corresponds to "NP" and "p." in HGVS, and the altered functionality of the allele (e.g., splicing defect). All of the cDNA nucleotide changes will correspond to the effect(s) listed. |
| InVivoEnzymeActivity | The enzyme activity studied in vivo. Usually defined by Normal, None, Increased, or Decreased drug metabolism. |
| InVitroEnzymeActivity | The enzyme activity studied in vitro. Usually defined by Normal, None, Increased, or Decreased drug metabolism. |
| References | The references to the paper(s) that use the star alleles to describe these variants. |
| ClinicalPhenotype | For POR star alleles, this field is defined by Normal, None, Increased, or Decreased drug metabolism. For CYP21A2 star alleles, this field refers specifically to phenotypes related to congenital adrenal hyperplasia: SW = salt-wasting congenital adrenal hyperplasia; SV = simple-virilizing congenital adrenal hyperplasia; NC = non-classic congenital adrenal hyperplasia. |
| Notes | These are notes that apply to specific star alleles regarding gene sequencing and numbering. |

Output format

Output for an API query is an array of the following elements:

  1. The total number of results on the server, which can be more than the number of results returned. The reported total is capped at 10,000, so it may be significantly less than the actual number of matches; the cap may significantly improve the service response time.
  2. An array of codes for the returned items. (This is the field specified with the cf query parameter above.)
  3. A hash of the "extra" data requested via the "ef" query parameter above. The keys on the hash are the fields (or their requested aliases) named in the "ef" parameter, and the value for a field is an array of that field's values in the same order as the returned codes.
  4. An array, with one element for each returned code, where each element is an array of the display strings specified with the "df" query parameter.
  5. An array, with one element for each returned code, where each element is the "code system" for the returned code. Note that only code-system aware APIs will return this array.
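
The positional layout described above can be awkward to work with directly, so here is a minimal Python sketch that converts a response into one dictionary per record. The function name decode_response and the record keys ("code", "display", "extra") are hypothetical choices for this example. It assumes the display strings come back in the same order as the "df" fields, and it uses an abbreviated, two-record excerpt of the sample result shown on this page.

```python
import json

# Default display fields, in the order listed for the "df" parameter.
DEFAULT_DF = [
    "StarAlleleName",
    "cDNANucleotideChanges",
    "GeneNucleotideChange",
    "OtherNames",
    "ProteinChange",
]


def decode_response(body, display_fields=DEFAULT_DF):
    """Turn the positional response array into a total plus one dict per record."""
    data = json.loads(body)
    total, codes, extra, display = data[0], data[1], data[2], data[3]
    extra = extra or {}  # the "ef" hash is null when no extra fields were requested
    records = []
    for i, code in enumerate(codes):
        records.append({
            "code": code,
            # Assumption: display strings are ordered like the df field list.
            "display": dict(zip(display_fields, display[i])),
            # Each "ef" value is an array in the same order as the codes.
            "extra": {name: values[i] for name, values in extra.items()},
        })
    return total, records


# Abbreviated excerpt (first two records) of the sample result for terms=CYP1A.
SAMPLE_BODY = (
    '[56,["CYP1A1*2B","CYP1A1*6"],null,'
    '[["CYP1A1*2B","","2454A>G; 3798T>C (MspI)","","I462V"],'
    '["CYP1A1*6","","1635G>T","","M331I"]]]'
)

total, records = decode_response(SAMPLE_BODY)
for record in records:
    print(record["code"], record["display"]["ProteinChange"])
```

The optional fifth element (the code system array) is ignored here because, per the list above, only code-system aware APIs return it.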

Sample API Queries

Query: https://clinicaltables.nlm.nih.gov/api/star_alleles/v3/search?terms=CYP1A

Result: [56,["CYP1A1*2B","CYP1A1*6","CYP1A1*10","CYP1A2*1B","CYP1A2*1G","CYP1A2*1M","CYP1A2*1S"],null,[["CYP1A1*2B","","2454A>G; 3798T>C (MspI)","","I462V"],["CYP1A1*6","","1635G>T","","M331I"],["CYP1A1*10","","2499C>T","","R477W"],["CYP1A2*1B","","5347T>C","",""],["CYP1A2*1G","","-739T>G; 5347T>C","",""],["CYP1A2*1M","","-163C>A; 2159G>A","",""],["CYP1A2*1S","","-3053A>G; 5347T>C","",""]]]

Description: Returns 7 (out of 56 total) star allele records that match the "CYP1A" prefix.