API for dbVar Germline Data

This API provides information from the Database of Genomic Structural Variation (dbVar) data set provided by NCBI. The subset served by this API is the germline data for assembly GRCh37.

Source files:
GRCh37.remap.all.germline.gvf.gz, GRCh37.submitted.all.germline.gvf.gz, and GRCh37.p13.remap.all.germline.gvf.gz

This service is provided "as is" and free of charge. Please see the Frequently Asked Questions page for more details on terms of service, etc.

API Demo

The following demo shows how this API might be used with an autocompleter we've developed. (Example: Try typing ns.)

For further experimentation with the autocompleter and this API, try the autocompleter demo page.

API Documentation

API Base URL: https://clinicaltables.nlm.nih.gov/api/dbvar/v3/search (+ query string parameters)

This data set may also be accessed through the FHIR ValueSet $expand operation.

In addition to the base URL, you will need to specify other parameters. See the query string parameters section below for details.

Query String Parameters and Default Values

At a minimum, when using the above base URL, you will need to specify the "terms" parameter containing a word or partial word to match.

Parameter Name | Default Value | Description
terms | | (Required.) The search string (e.g., just a part of a word) for which to find matches in the list. More than one partial word can be present in "terms", in which case there is an implicit AND between them.
autocomp | | If present and equal to 1, limits the returned result size to 7; otherwise up to 500 results are returned.
q | | An optional, additional query string used to further constrain the results returned by the "terms" field. Unlike the terms field, "q" is not automatically wildcarded, but can include wildcards and can specify field names. See the Elasticsearch query string page for documentation of supported syntax.
df | Name, var_origin, Zygosity | A comma-separated list of display fields (from the fields section below) which are intended for the user to see when looking at the results. The parameter "ef" (see below) may also be used to specify the data fields to retrieve. The main difference is that the value of "df" is always a string (for display), while the value for "ef" could be a JSON object when the field value has a complex structure.
sf | All fields | A comma-separated list of fields to be searched.
cf | Name | A field to regard as the "code" for the returned item data.
ef | | A comma-separated list of additional fields to be returned for each retrieved list item. (See the Output format section for how the data for fields is returned.) If you wish the keys in the returned data hash to be something other than the field names, you can specify an alias for the field name by separating it from its field name with a colon, e.g., "ef=field_name1:alias1,field2,field_name3:alias3,etc." Note that not every field specified in the ef parameter needs to have an alias. The parameter "df" (see above) may also be used to specify the data fields to retrieve. The main difference is that the value of "df" is always a string (for display), while the value for "ef" could be a JSON object when the field value has a complex structure.
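As an illustration of how the query string parameters above combine, here is a minimal Python sketch that assembles a search URL from the documented base URL. The function name build_search_url and its argument layout are hypothetical (they are not part of the API); only the parameter names terms, autocomp, q, df, sf, cf, and ef come from this page. The sketch only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Base URL as given in the API documentation above.
BASE_URL = "https://clinicaltables.nlm.nih.gov/api/dbvar/v3/search"


def build_search_url(terms, df=None, ef=None, sf=None, cf=None, q=None,
                     autocomp=False):
    """Hypothetical helper: build a dbVar search URL.

    df and sf are lists of field names. ef is a list whose items are
    either a field name or a (field_name, alias) tuple.
    """
    params = {"terms": terms}  # the only required parameter
    if autocomp:
        params["autocomp"] = 1
    if q:
        params["q"] = q
    if df:
        params["df"] = ",".join(df)
    if sf:
        params["sf"] = ",".join(sf)
    if cf:
        params["cf"] = cf
    if ef:
        # An alias is attached to its field name with a colon.
        parts = [f"{item[0]}:{item[1]}" if isinstance(item, tuple) else item
                 for item in ef]
        params["ef"] = ",".join(parts)
    # Keep commas and colons unescaped so the URL stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=',:')}"


url = build_search_url("nssv401", ef=[("SeqID", "chrom"), "FeatureStart"])
print(url)
```

Here SeqID is requested under the alias "chrom" while FeatureStart keeps its own name, which mirrors the note that not every field in "ef" needs an alias.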

DbVar Field Descriptions

Field descriptions were taken from the dbVar README file and the GVF format specification.

Field | Field Description
Alias | Submitted variant id
ciend | Confidence interval around column "FeatureEnd" for imprecise variants, expressed as 2 comma-delimited integers. The first is negative or 0. The second is positive or 0.
cipos | Confidence interval around column "FeatureStart" for imprecise variants, expressed as 2 comma-delimited integers. The first is negative or 0. The second is positive or 0.
clinical_int | The clinical interpretation asserted by the submitter of the variant
copy_number | The copy number of the variant
Dbxref | A link to the variant web page
End_range | Indicates fuzziness of end coordinate with an outer and/or inner
FeatureEnd | A 1-based integer of the end of the sequence_alteration on the plus strand
FeatureStart | A 1-based integer for the beginning of the sequence_alteration locus on the plus strand
gender | M or F
ID | Numeric identifier, unique within each file
Name | Variant accession
parent | Parent variant region accession
phenotype | Phenotype text or 'not_reported'
sampleset_name | sampleset_name for the variant call, if not directly tied to a sample
sampleset_type | sampleset_type for the variant call, if not directly tied to a sample
SeqID | The chromosome or contig on which the sequence_alteration is located
Start_range | Indicates fuzziness of start coordinate with an outer and/or inner
Type | The type of sequence_alteration, no_variation or a gap
var_origin | origin
Zygosity | zygosity

Output format

Output for an API query is an array of the following elements:

  1. The total number of results on the server, which can be more than the number of results returned. This reported total number of results may also be significantly less than the actual number of results and is limited to 10,000, which may significantly improve the service response time.
  2. An array of codes for the returned items. (This is the field specified with the cf query parameter above.)
  3. A hash of the "extra" data requested via the "ef" query parameter above. The keys on the hash are the fields (or their requested aliases) named in the "ef" parameter, and the value for a field is an array of that field's values in the same order as the returned codes.
  4. An array, with one element for each returned code, where each element is an array of the display strings specified with the "df" query parameter.
  5. An array, with one element for each returned code, where each element is the "code system" for the returned code. Note that only code-system aware APIs will return this array.
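The positional layout described above can be turned into one record per returned code. The following Python sketch shows one way to do that, assuming the response has already been decoded from JSON into a list. The function name parse_response and the record keys are hypothetical, and the example payload uses made-up values purely to exercise the code; it is not real dbVar data.

```python
def parse_response(payload):
    """Hypothetical helper: split a decoded response into (total, records)."""
    total, codes, extra, display = payload[:4]
    # The fifth element is only present for code-system aware APIs.
    code_systems = payload[4] if len(payload) > 4 else None
    # The "ef" hash is null (None) when no extra fields were requested.
    extra = extra or {}
    records = []
    for i, code in enumerate(codes):
        record = {
            "code": code,
            "display": display[i],
            # Each "ef" value is an array ordered like the codes array.
            "extra": {key: values[i] for key, values in extra.items()},
        }
        if code_systems is not None:
            record["code_system"] = code_systems[i]
        records.append(record)
    return total, records


# Made-up payload for illustration only (not real dbVar data).
example = [
    2,
    ["code_a", "code_b"],
    {"chrom": ["1", "2"]},
    [["code_a", "Germline", "heterozygous"],
     ["code_b", "Germline", "heterozygous"]],
]
total, records = parse_response(example)
print(total, records[0])
```

Keeping the "ef" values in a separate dictionary reflects that they may be JSON objects, whereas the "df" values are always display strings.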

Sample API Queries

Query: https://clinicaltables.nlm.nih.gov/api/dbvar/v3/search?terms=nssv401

Result: [11110,["nssv4012517","nssv4012512","nssv4012507","nssv4012500","nssv4012495","nssv4012490","nssv4012485"],null,[["nssv4012517","Germline","heterozygous"],["nssv4012512","Germline","heterozygous"],["nssv4012507","Germline","heterozygous"],["nssv4012500","Germline","heterozygous"],["nssv4012495","Germline","heterozygous"],["nssv4012490","Germline","heterozygous"],["nssv4012485","Germline","heterozygous"]]]

Description: Finds genomic structural variations with variant accession name containing "nssv401" from the dbVar data set. Seven of 11110 structural variations are returned as code fields, no extra data was requested ("ef" was not specified in the URL), and finally the three (default) display fields for each record are returned.
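To make the sample result above easier to read, the following Python sketch decodes it and labels each display row with the default "df" field names (Name, var_origin, Zygosity). The response text is copied from the sample rather than fetched from the service, and the variable names are illustrative only.

```python
import json

# Sample response copied from the query result shown above.
SAMPLE = """[11110,
 ["nssv4012517","nssv4012512","nssv4012507","nssv4012500",
  "nssv4012495","nssv4012490","nssv4012485"],
 null,
 [["nssv4012517","Germline","heterozygous"],
  ["nssv4012512","Germline","heterozygous"],
  ["nssv4012507","Germline","heterozygous"],
  ["nssv4012500","Germline","heterozygous"],
  ["nssv4012495","Germline","heterozygous"],
  ["nssv4012490","Germline","heterozygous"],
  ["nssv4012485","Germline","heterozygous"]]]"""

# Default display fields, as listed for the "df" parameter.
DEFAULT_DF = ["Name", "var_origin", "Zygosity"]

total, codes, extra, display = json.loads(SAMPLE)

# Pair each display string with the field it was taken from.
rows = [dict(zip(DEFAULT_DF, row)) for row in display]

print(f"{len(codes)} of {total} results returned")
print("extra data:", extra)  # None, because "ef" was not specified
for row in rows:
    print(row)
```

If a request sets "df" explicitly, the same list of field names would replace DEFAULT_DF, since the display strings are returned in the order the fields were requested.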