PI Corner

Using Correct Gene and Protein Nomenclature

by Dr. Jerry Kidder

Those of us in biomedical research who work at the molecular level are challenged to learn and remember the names and symbols (abbreviations) of a plethora of genes and proteins important for the processes we are studying. Often, the original name for a protein is based on what its discoverers regarded its function to be, and that can change over time as the protein’s various roles in different biological processes are elucidated. Sometimes, a protein is identified simultaneously by different research teams studying different processes, so that the same protein is given different names. When this happens, a literature search using one of the names may fail to identify published papers in which the other name was used. For example, in my own research field, where the developmental and physiological roles of the 21-member connexin family of gap junction proteins are being explored, several different symbols had been in use for each of the connexin-encoding genes. Given the confusion this caused in our field, the researchers in attendance at the 2007 International Gap Junction Conference adopted a coherent system of gene and protein symbols that had been developed by an ad hoc working group of “connexinologists” (I chaired the working group; it took us three years to reach a consensus!). Subsequently, the connexin gene nomenclature adopted at the conference was endorsed by the human and mouse genome committees and is now used on their websites.

There is an obvious need for the international research community to decide on a single name for every human protein and the gene that encodes it. As with the connexins, this task falls to committees, usually international in scope and consisting of experts in different categories of proteins (enzymes, paracrine factors, ion channels, transcription factors, etc.), who are given the responsibility of sorting out redundancies to come up with a single coherent system for naming genes and proteins within each category. Once a protein’s official name has been established, a symbol of (usually) 3-5 characters is chosen to designate the protein and its cognate gene. Once such a nomenclature system is established, it is up to individual researchers, backed by journal editors, to adhere to it in order to eliminate confusion in the literature going forward.

Just as there are officially accepted names for genes and proteins, there are definite rules concerning gene and protein symbols. Generally speaking, the official gene symbol is derived directly from the protein symbol by using italics: the human gene encoding the transcription factor MIST1, for example, is MIST1 written in italics (the same italicized symbol is used for its mRNA). Adherence to the correct use of italics eliminates any confusion about whether an author is referring to the gene/transcript or the protein it encodes. Protein symbols are always presented in upper case letters, never italicized, and do not differ from one mammal to another. Likewise, gene symbols for all mammals are presented in upper case letters, with the exception of rodents. Rodent gene symbols use both upper and lower case letters (an initial capital followed by lower case letters), e.g., Mist1. This distinction avoids confusion when a journal article refers to results obtained from both human and mouse work: the reader always knows which of the two species’ genes is being referred to. In addition to italics, there are other features that distinguish gene symbols: they do not include Greek letters, and the use of hyphens is discouraged. For example, the mouse or rat gene encoding the transforming growth factor-β type 2 receptor (protein symbol TGFβR2) is Tgfbr2. The human gene is TGFBR2. If you are unsure about the accepted name of your gene, go to the Mouse Genome Informatics website at http://www.informatics.jax.org/ and search for the name of the protein it encodes.
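For readers who handle symbols programmatically, the formatting conventions described above can be sketched in a few lines of Python. This is only an illustration: the function name, the species labels and the small Greek-letter table are assumptions made for the example, and the sketch covers letter case, Greek letters and hyphens only. Plain strings cannot carry italics, and official symbols are assigned by the nomenclature committees, so the result should always be checked against the genome databases.

```python
# Hypothetical helper illustrating the symbol-formatting conventions only;
# it does not look up official symbols.

# Small, assumed mapping of Greek letters to their Latin replacements.
GREEK_TO_LATIN = {"α": "A", "β": "B", "γ": "G", "δ": "D", "ε": "E", "κ": "K"}

# Species labels treated as rodents in this sketch (an assumption).
RODENTS = {"mouse", "rat"}


def format_gene_symbol(protein_symbol: str, species: str) -> str:
    """Return a gene-style symbol (without italics) for a protein symbol."""
    # Gene symbols do not include Greek letters.
    latin = "".join(GREEK_TO_LATIN.get(ch.lower(), ch) for ch in protein_symbol)
    # Hyphens are discouraged in gene symbols.
    latin = latin.replace("-", "")
    if species.lower() in RODENTS:
        # Rodents: initial capital followed by lower case letters.
        return latin[:1].upper() + latin[1:].lower()
    # All other mammals: upper case throughout.
    return latin.upper()


print(format_gene_symbol("TGFβR2", "mouse"))  # Tgfbr2
print(format_gene_symbol("TGFβR2", "human"))  # TGFBR2
```

The two calls reproduce the transforming growth factor-β type 2 receptor example given in the text; italics would still have to be applied when the symbol is typeset.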

There are additional rules that govern mutant alleles created via mutagenesis or genetic manipulation (transgenics, knockins, knockouts, etc.). For detailed guidelines on rodent gene nomenclature, consult the Mouse Genome Informatics website at http://www.informatics.jax.org/mgihome/nomen/gene.shtml.