Introduction to Proteins in Molecular Biology

by Pallabi Roy Chakravarty, Ph.D.

Studying and experimenting on proteins has become a mainstay of modern bioscience.

In this article we will take a quick glance at the fundamentals of protein biochemistry.

What are proteins?

Chemically, proteins are long chains of amino acids.

Just as individual bricks together make up a large wall, each constituent amino acid is a monomer, and many monomers together build up the polymer, i.e., the protein. Consequently, a protein has a very high molecular weight. Because they are such large molecules, proteins are termed ‘macromolecules’.

Individual amino acids in a protein are technically called ‘residues’.

A word on amino acids

Amino acids are a type of organic compound.

If we recap our high school chemistry, the carbon atom (C) is tetravalent. That is, a carbon atom can make four bonds. Further, C can bond with a variety of atoms and groups: hydrogen (H), groups that carry a positive charge such as the protonated amino group (-NH₃⁺), and electronegative atoms or negatively charged groups such as oxygen (O), sulfur (S) and carboxylate (-COO⁻).

In an amino acid molecule, the central carbon atom, known as the α-carbon (C-α), is bonded to these four groups (moieties):

A hydrogen atom (-H)
An amino group (-NH₂)
A carboxylic acid group (-COOH)
A variable side chain, the R group

The first three moieties are present in all amino acids. Since both the ‘amino’ group and the carboxylic ‘acid’ group are present in this class of compounds, they are termed ‘amino acids’.

Figure 1. Schematic representation of the generic structure of an amino acid.

The variation among amino acids lies in their “R groups” (side chains), which are unique to each amino acid. Amino acids are classified based on these side chains. An R group can be as simple as a single hydrogen atom. This is the case for glycine, the amino acid with the simplest structure.

Figure 1.2. The R group of glycine is a single hydrogen atom.

At the next level of complexity, the R group can be aliphatic. Aliphatic groups are organic structures without a ring: they can be straight or branched chains, but they do not contain rings. Alanine, valine, leucine and isoleucine make up this class of amino acids.

Figure 1.3. In contrast to glycine, whose R group is a single hydrogen, alanine, valine, leucine and isoleucine are examples of amino acids with more complex R groups.

Proline, though considered an aliphatic amino acid, is nevertheless an exception within this group. Here the aliphatic side chain connects back to the N of the amino group bonded to the C-α, forming a ring.

Figure 1.4. Proline is an aliphatic amino acid whose side chain closes into a ring.

For two amino acids, serine and threonine, the R group contains an alcohol group (-OH).

Figure 1.5. Examples of amino acids whose R groups contain an alcohol group (serine and threonine) or a sulfur atom (cysteine and methionine).

As a slight variation, if the -OH is replaced by a thiol group (-SH), the result is cysteine, a member of a distinct class of amino acids that contain sulfur. The other member of this class, methionine, carries its sulfur in a thioether (-S-CH₃) rather than in a free thiol. As we will see later, these amino acids have distinct physiological properties owing to the constituent S atom.

If an R group contains an amide (-CO(NH₂)) functional group, that constitutes another class of amino acids. Examples are glutamine and asparagine.

Figure 1.6. Glutamine and asparagine are examples of amino acids whose R groups contain an amide functional group.

For the amino acids tryptophan, tyrosine and phenylalanine, the R group contains an aromatic ring. They form the category of aromatic amino acids (aromatic compounds contain a benzene-like ring structure).

Figure 1.7. Aromatic rings in the R groups of tryptophan, tyrosine and phenylalanine.

For two amino acids, aspartic acid and glutamic acid, the R group has a carboxylate (-COO⁻) functional group. They are classified as amino acids with negatively charged (anionic) side chains.

Figure 1.8. The carboxylate functional group in the R groups of aspartic acid and glutamic acid.

Alternatively, the R group is positively charged (cationic) for lysine, arginine and histidine owing to the presence of a positively charged N atom in their R group.

Figure 1.9. Lysine, arginine and histidine are examples of amino acids with a positively charged R group.

Table 1 is a quick, simple guide to the amino acids commonly seen in proteins, grouped by the type of R group.

R group class	Amino acids
Simplest R (single H)	Glycine
Simple aliphatic R	Alanine, Valine, Leucine, Isoleucine, Proline
-OH containing R	Serine, Threonine
Sulfur containing R	Cysteine, Methionine
-CONH₂ containing R	Glutamine, Asparagine
Aromatic group containing R	Tryptophan, Tyrosine, Phenylalanine
COO⁻ group in R	Aspartate (Aspartic Acid), Glutamate (Glutamic Acid)
Positively charged N in R	Lysine, Arginine, Histidine
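
For readers who like to see things as data, the grouping in Table 1 can be written as a small lookup table. The Python sketch below is only an illustration: the names R_GROUP_CLASSES, CLASS_OF and classify_residue are made up for this example, and the class labels are paraphrased from the table.

```python
# Amino acids grouped by the type of R group, following Table 1.
# The dictionary keys are paraphrased labels chosen for this example.
R_GROUP_CLASSES = {
    "simplest (single H)": ["Glycine"],
    "aliphatic": ["Alanine", "Valine", "Leucine", "Isoleucine", "Proline"],
    "hydroxyl (-OH)": ["Serine", "Threonine"],
    "sulfur-containing": ["Cysteine", "Methionine"],
    "amide (-CONH2)": ["Glutamine", "Asparagine"],
    "aromatic": ["Tryptophan", "Tyrosine", "Phenylalanine"],
    "negatively charged (COO-)": ["Aspartate", "Glutamate"],
    "positively charged (N)": ["Lysine", "Arginine", "Histidine"],
}

# Invert the table so each amino acid name maps to its class.
CLASS_OF = {
    name: r_class
    for r_class, names in R_GROUP_CLASSES.items()
    for name in names
}


def classify_residue(name: str) -> str:
    """Return the R group class of an amino acid given its full name.

    Raises KeyError if the name is not one of the 20 entries in the table.
    """
    return CLASS_OF[name.strip().capitalize()]


if __name__ == "__main__":
    for residue in ["glycine", "Proline", "histidine"]:
        print(residue, "->", classify_residue(residue))
```

Keeping the table as a dictionary of lists mirrors how it is printed, while the inverted CLASS_OF dictionary makes looking up a single amino acid a one-step operation.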

Peptide bond: the road to polypeptide formation

Now that we are familiar with the individual building blocks (amino acid residues) of proteins, let us understand how amino acids are joined to each other to form a protein.

As depicted in Figure 2, when two amino acids are in close proximity to each other, their functional groups can react.

When the carboxylic acid functional group of amino acid 1 (represented with side chain R1) is near the amino group of the second amino acid (represented with side chain R2), these two moieties react. The -COOH group of one amino acid loses an OH and the -NH₂ group of the other amino acid loses an H, and these combine to form a water (H₂O) molecule as a byproduct.

Meanwhile, the two amino acids participating in the reaction are joined via a -CO-NH- bond between the carboxyl group of one amino acid and the amino group of the second amino acid. This linkage is called the ‘peptide bond’.

Figure 2. Peptide bond formation between two amino acids.

This new molecule (two amino acids joined to each other with a peptide bond) is a ‘peptide’.
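
The loss of one water molecule per peptide bond can be checked with simple atom counting. The Python sketch below is a minimal illustration rather than a chemistry toolkit: the function name peptide_formula is made up for this example, and only three amino acids (glycine C₂H₅NO₂, alanine C₃H₇NO₂, serine C₃H₇NO₃) are included to keep it short.

```python
from collections import Counter

# One water molecule is released for every peptide bond formed.
WATER = Counter({"H": 2, "O": 1})

# Atom counts of three free (unlinked) amino acids, chosen for illustration.
FORMULAS = {
    "glycine": Counter({"C": 2, "H": 5, "N": 1, "O": 2}),
    "alanine": Counter({"C": 3, "H": 7, "N": 1, "O": 2}),
    "serine": Counter({"C": 3, "H": 7, "N": 1, "O": 3}),
}


def peptide_formula(amino_acids):
    """Return the atom counts of a linear peptide built from the given amino acids.

    The atoms of the free amino acids are added up, then one H2O is
    subtracted for each peptide bond (number of amino acids minus one).
    """
    if not amino_acids:
        raise ValueError("at least one amino acid is required")
    total = Counter()
    for name in amino_acids:
        total.update(FORMULAS[name])
    for _ in range(len(amino_acids) - 1):
        total.subtract(WATER)
    return dict(total)


if __name__ == "__main__":
    # Glycine + alanine -> dipeptide + one water molecule
    print(peptide_formula(["glycine", "alanine"]))
```

For the glycine-alanine dipeptide, the two free amino acids together contain C₅H₁₂N₂O₄, and removing one H₂O leaves C₅H₁₀N₂O₃.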

At one end of this molecule is a free carboxyl group. At the other end is a free amino group.

When two amino acids are joined by a peptide bond, the resulting molecule is termed a dipeptide (‘di’=two). When a third amino acid participates and three are joined together, it becomes a tripeptide (‘tri’=three). As the length increases to four amino acids, the resultant product is a tetrapeptide (‘tetra’=four) and so on.

Two amino acids – Dipeptide
Three amino acids – Tripeptide
Four amino acids – Tetrapeptide
Five amino acids – Pentapeptide
Six amino acids – Hexapeptide
Seven amino acids – Heptapeptide

When the number of amino acids linked together is between 2 and 20, the chain can also be called an ‘oligopeptide’.

However, in most common proteins the number of constituent amino acids is far greater than twenty. In such cases, the term used is ‘polypeptide’ (‘poly’=many).
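
The naming scheme above is easy to express as a short function. This Python sketch is illustrative only: the function name peptide_names is made up for this example, and the cutoff of 20 residues between ‘oligopeptide’ and ‘polypeptide’ simply follows the convention used in this article.

```python
# Numerical prefixes for the short peptides listed in the article.
PREFIXES = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa", 7: "hepta"}


def peptide_names(n_amino_acids: int) -> list[str]:
    """Return the names that apply to a chain of the given length.

    Chains of 2-7 amino acids get a specific name (dipeptide, ...),
    chains of 2-20 are oligopeptides, and longer chains are polypeptides.
    """
    if n_amino_acids < 2:
        raise ValueError("a peptide needs at least two amino acids")
    names = []
    if n_amino_acids in PREFIXES:
        names.append(PREFIXES[n_amino_acids] + "peptide")
    if n_amino_acids <= 20:
        names.append("oligopeptide")
    else:
        names.append("polypeptide")
    return names


if __name__ == "__main__":
    for length in (2, 5, 12, 150):
        print(length, peptide_names(length))
```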

The important thing to note is that irrespective of whether a linear chain contains as few as two amino acids or many thousands, there will always be a free carboxyl group at one end and a free amino group at the other end of the molecule.

The end containing the free carboxyl group is termed the “C terminus”: ‘C’ comes from ‘carboxyl’ and ‘terminus’ means ‘end’.

The opposite end with a free amino group is called the “N terminus”. ‘N’ here signifies the N atom of the terminal amino group.

Thus, every polypeptide has a C terminus and an N terminus.

Most importantly, in the simplest terms, all proteins are polypeptides.

How does the polypeptide chain fold into higher levels of protein structure?

The most basic or ‘primary’ structure of a protein is its sequence of amino acids forming the polypeptide chain.

To understand the next level of complexity in the structure of proteins, let us again get back to high school chemistry and refresh our memories on hydrogen bonds.

In a polypeptide backbone, there are many carbonyl oxygen atoms (from the carboxyl groups) and hydrogen atoms (from the amino groups). Hydrogen bonds form between them when they are in close proximity. As a result, three-dimensional structural patterns develop in local segments of the polypeptide chain. This represents the secondary structure of proteins.

The two secondary structures most commonly seen in proteins are the alpha helix and the beta sheet. A schematic representation of an alpha helix is depicted in Figure 3.

Figure 3. Representation of an alpha-helix

The secondary structural features in proteins pave the way for the next level of protein architecture - the tertiary structure. This is the final three-dimensional structural organization in which a single polypeptide chain exists in its biological niche, such as the cell cytoplasm.

The amino and carboxy functional groups of the amino acids are sequentially bonded to each other such that only a terminal carboxy and a terminal amino group are left free. However, the side chains of constituent amino acids in a protein are free to interact in a variety of ways owing to their specific chemical natures.

For example, consider the sulfur atoms in cysteines. These can form disulfide bridges between two cysteine residues in a protein.

Other such interactions between amino acid side chains along the polypeptide chain drive the formation of the protein tertiary structure.

Figure 4. Protein folding. Amino acids are represented by the letter ‘A’.

Often the secondary structural elements facilitate the polypeptide adopting its tertiary architecture. Let us look at one very common example.

Alpha helices and beta sheets often form in such a way that certain regions are hydrophobic and others are hydrophilic. This happens because amino acids may have non-polar moieties (such as aliphatic side chains) as well as polar ones (such as carboxyl groups). Just to recall, non-polar molecules are hydrophobic while polar ones are hydrophilic.

Once this type of secondary structural organization happens, the architecture is adopted such that the hydrophobic regions of the protein are buried inside, while the hydrophilic regions face the outward environment such as the cell cytoplasm which is rich in water.
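
One common way to get a rough feel for which stretches of a sequence are hydrophobic or hydrophilic is to average a hydropathy score over a sliding window. The Python sketch below is an illustration under stated assumptions: it uses the Kyte-Doolittle hydropathy scale (not discussed in this article), it expects the sequence in standard one-letter amino acid codes, and the function name hydropathy_profile and the default window of 5 are arbitrary choices for this example.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
# Positive values are more hydrophobic, negative values more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 5) -> list[float]:
    """Return the mean hydropathy of every window along the sequence.

    The result has len(sequence) - window + 1 values, one per window start.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # A made-up sequence: a hydrophobic stretch followed by a charged stretch.
    for start, value in enumerate(hydropathy_profile("IVLLAKDEKR", window=3)):
        print(start, round(value, 2))
```

Windows with strongly positive averages point to stretches that would tend to be buried inside the folded protein, while strongly negative windows point to stretches more likely to face the watery surroundings. A profile like this is a rough guide only and does not by itself predict the tertiary structure.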

Some proteins, hemoglobin for example, are made up of more than one polypeptide chain. The intricate and precise architecture of how these multiple polypeptides (each of them having their own structural organizations up to tertiary levels) are structurally organized determines the quaternary structure of a protein.

In real physiological circumstances, proteins are often found in a complex with metals (think about iron in hemoglobin) or other biomolecules such as nucleic acids (chromosomes – complex of DNA and proteins). These kinds of structural organizations also qualify as quaternary.

