Biotech / Medical : PROTEOMICS

To: Jongmans who started this subject (8/16/2000 2:34:27 AM)
From: sim1
 
Proteomics: An Overview and Analysis of Current and Emerging Technologies, Competitive Landscape and the Market Environment

Found this at bio.com. Looks like a teaser to purchase the full report @ ~$5K. Couldn't find any threatening reproduction warnings, so here it is....

Proteomics aims to directly study the role and function of
proteins in tissues and cells. The ultimate goal is to study the
interaction of multiple proteins in healthy, diseased, and
experimental conditions. Such global studies will enable the
investigator to understand the holistic effects of a particular
therapeutic agent or experimental intervention. To this end,
proteomics requires the ability to separate and isolate all the
constituents of a proteome. These pure or near homogeneous
protein isolates must be detectable and in a form conducive to
analysis.

All proteomics studies fall into three general
categories: differential protein display (including changes in
quantity), protein characterization (including post-translational
modifications), and protein-protein interaction (including
activity assays). All of these demand careful isolation of tissues
or cells and proper sample preparation at the outset of the
study. The correct choice of starting material, and its proper
preparation, can dictate the success of the whole procedure,
because most studies rely on either the purity of the sample or
untainted comparisons between two samples.

The very first step is to choose and catalogue the proper
crude samples. In basic research, the investigator has great
discretion over this matter. For example, in differential
protein display between two in vitro tissue culture samples, the
treated and untreated cells are easily distinguishable. The
experimental and control samples can be chosen based on
their identical, or at least, similar pedigree to minimize spurious
and irrelevant differences between the two. The only
distinctions, at least theoretically, in the protein profile of the
two samples will be due to the known intervention or
therapeutic. Therefore, the selection of starting material in such
a scenario is simple. However, this is not the case for most
applications, even in basic science. The target cells are often
intermixed with a variety of other cell types or unafflicted cells
of the same kind. All proteomics protocols demand the
separation of the cells of interest from the rest of the tissue.
Otherwise, differential protein display, protein
characterization, and protein-protein interaction studies will be
tainted with the constituents of the background tissue.

A fluorescence-activated cell sorter (FACS) can be used to
separate a subset of suspension cells from the rest of the
population. This is especially useful in the study of certain
hematopoietic-derived diseases. For example, there are
known markers for cancerous leukemia and lymphoma cells
that can be used to isolate the afflicted population and
compare its protein profile to that of normal leukocytes. This
technique can also be used to isolate metastatic cells in
transit in the circulation. However, detection of cells that
have metastasized out of their tissue into the circulation
requires great sensitivity.
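
To make the gating idea concrete, here is a minimal sketch
(not from the report) of marker-based event selection,
assuming each FACS event is recorded as a row of channel
intensities; the column layout, file name, and thresholds are
illustrative assumptions only.

import numpy as np

def gate_events(events, fsc_min=200.0, marker_min=1000.0):
    # Keep events that pass a size (forward scatter) gate and a
    # marker-intensity gate; threshold values are illustrative.
    fsc, marker = events[:, 0], events[:, 1]
    mask = (fsc > fsc_min) & (marker > marker_min)
    return events[mask]

# events = np.loadtxt("events.txt")  # hypothetical file: FSC, marker per row
# afflicted = gate_events(events)    # e.g., cells bearing a leukemia marker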

Laser capture microdissection (LCM) can be used to
manually isolate a small number of adherent cells; however,
this technique is most suitable for isolation of cells grown in
vitro. For clinical studies, microdissection of solid tumors can
separate the malignant cells from the normal tissue. This
technique is often reserved for solid tumors with clear
demarcation of the tumor boundary. However, recent work
has shown that LCM can be used to enrich clinical samples
such as epithelial tissue from a human cervix specimen.
Normal and malignant cells were microdissected from normal
and malignant tissue from one cervix (hysterectomy specimen)
and the protein profile of these cells was compared.

Samples can also be compared between patients. This
approach relieves the demand for separation of the afflicted
and healthy cells, but complicates the analysis by introducing
the differences between individuals. Comparing samples from
different healthy individuals yields a number of polymorphisms
that are independent of any particular pathology. Therefore, in
comparing samples from healthy controls and patients, a large
number of samples from many individuals must be compared
to reveal only those differences that are related to the
condition of interest.
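
As a sketch of why many samples are needed, the following
(not from the report) flags spots whose intensity differs
consistently between patient and control groups, so that
individual polymorphisms average out; the array shapes and
significance cutoff are illustrative assumptions.

import numpy as np
from scipy.stats import ttest_ind

def differential_spots(control, patient, alpha=0.01):
    # control, patient: (n_gels, n_spots) arrays of spot intensities.
    # Welch's t-test per spot; small p-values mark candidate
    # disease-related differences rather than polymorphisms.
    t, p = ttest_ind(control, patient, axis=0, equal_var=False)
    return np.flatnonzero(p < alpha)  # indices of candidate spots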

Care in sample isolation must also be taken in studies
focused on protein characterization and protein-protein
interaction. For example, the post-translational
modification of a particular protein and its subsequent binding
to other polypeptides may be altered in a condition, but this
change may be very subtle and easily masked by
contamination from the healthy surrounding tissue.

Once the correct sample is isolated, the cells or tissue of
interest must be efficiently disrupted and the contents of the
cells solubilized completely in order to obtain a sample
representative of the whole protein population. Physical
disruption techniques such as sonication, homogenization,
shear-induced lysis, and rapid pressure change lysis are often
used to open the cells prior to protein extraction. Lysis buffers
containing detergents, protease inhibitors, and reducing and
chaotropic agents facilitate the solubilization of the proteins
and increase the stability of the polypeptides. For each cell or
tissue type and for each specific application, a specific
protocol is often needed to maximize the recovery of cellular
proteins.

The extent of recovery of membrane and cytoskeletal proteins
can be variable, leaving approximately 10% of the proteins in
the insoluble pellet after extraction. This is especially important
in studies seeking to understand proteins that reside in
compartments that are hard to solubilize. Often, these proteins
have evolved hydrophobic domains that facilitate their
localization to these subcellular compartments and confer their
biological activity. Therefore, these hydrophobic proteins, by
their very nature, are difficult to solubilize and retain in the
profiling procedure.

The lysed sample is often clarified by centrifugation at speeds
that pellet large membrane particles and DNA. The presence
of nucleic acids, especially DNA, has severe detrimental
effects on the separation of proteins by 2-D gel
electrophoresis. Under denaturing conditions, such as those
used to lyse and homogenize the sample, DNA is dissociated
and causes a marked increase in the viscosity of the
solution. This inhibits protein entry into the gel and retards its
migration. Furthermore, DNA binds proteins and causes
artifactual migration patterns and streaking. There are two
methods for removing DNA from the sample. Endonuclease,
which degrades DNA down to individual nucleotides, can be
added to the sample. This is an attractive method for DNA
removal, because it consists of one quick step that requires
very little handling. Another method for DNA removal uses
ampholytes that form complexes with DNA molecules. The
resulting complexes are subsequently removed by
centrifugation. This method carries the risk of losing some
proteins that can interact with the anionic DNA molecules,
hence, this must be done at high pH (where most proteins are
anionic themselves), which may not be compatible with some
protocols.

Before loading the clarified sample on a 2-D gel, some
prefractionation can be done to study a particular fraction
more carefully. Furthermore, for larger organisms it is
impossible to separate the proteome into individual spots on
one 2-D gel. Therefore, it is useful to split the original sample
between multiple 2-D gels. This approach enhances the
sensitivity and resolution of the gels and reveals more
information about the subcellular location of the separated
proteins.

There are multiple methods for pre-fractionating the sample
before loading on the 2-D gel. Traditional biochemical
separation procedures can be used to isolate subcellular
fractions or organelles. These protocols often rely on
separating nuclei and unbroken cells from cytoplasmic
organelles by differential sedimentation at low centrifugal
forces. The remaining supernatant is subjected to various
density gradients to isolate specific organelles such as
mitochondria or lysosomes. Not only do these
prefractionations improve the resolution of the 2-D gels, but
also they reveal whether a particular protein is mislocalized in
a certain malady.

Preparative liquid isoelectric focusing is another method for
pre-fractionating the original sample. It is noteworthy that
although the principles of separation are the same as those
used in the first dimension of 2-D gel electrophoresis,
preparative isoelectric focusing aims to fractionate the sample
into defined pH ranges and not to isolate each protein. This
technique concentrates all the proteins of similar isoelectric
point into specific fractions that can be separated from one
another on a 2-D gel with a narrow pH gradient.

Affinity chromatography is a more selective tool for
prefractionating the sample. Columns using ion exchange,
antibodies, or heparin can be used to separate the starting
material into smaller and more homogeneous fractions. For
example, anti-phosphotyrosine antibodies can be used to
isolate all proteins that are phosphorylated on one or more
tyrosine residues. In instances where both peptide motifs in an
intermolecular interaction are known, one of the two motifs
can be coupled to the stationary phase of the column and used
as bait to retard all cellular proteins that contain the other
motif.

These are just a few examples of all the possible
prefractionation techniques, and many others exist. They all
serve to maximize the sensitivity and resolution of the
separation protocol while simultaneously maximizing the
information yield of the entire procedure. Once the sample has
been fractionated to the desired level, it is often loaded onto a
2-D gel electrophoresis apparatus. Although 2-D gel
electrophoresis is not the only method for separation of
proteins, it is currently the most reliable technique for doing so
rapidly, in large scale, and in parallel. Two-dimensional gel
electrophoresis separates proteins based on their isoelectric
point in the first dimension. This is followed by separation of
proteins based on size in the second dimension (see the
section below for more detail).
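
As an illustration of the two coordinates, here is a rough
sketch (not part of the report) that places a protein sequence
on a virtual 2-D gel: the isoelectric point is found by bisecting
on net charge, and the molecular weight is summed from
average residue masses. The pKa and mass values are common
textbook approximations, not the report's own figures.

PKA_POS = {"nterm": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
PKA_NEG = {"cterm": 2.34, "D": 3.65, "E": 4.25, "C": 8.30, "Y": 10.07}
MASS = {"G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
        "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
        "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
        "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21}

def net_charge(seq, ph):
    # Henderson-Hasselbalch: protonated fraction of basic groups
    # minus deprotonated fraction of acidic groups.
    pos = [PKA_POS["nterm"]] + [PKA_POS[a] for a in seq if a in "KRH"]
    neg = [PKA_NEG["cterm"]] + [PKA_NEG[a] for a in seq if a in "DECY"]
    return (sum(1.0 / (1.0 + 10 ** (ph - pk)) for pk in pos)
            - sum(1.0 / (1.0 + 10 ** (pk - ph)) for pk in neg))

def isoelectric_point(seq, lo=0.0, hi=14.0):
    # Net charge falls monotonically as pH rises; bisect for zero.
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return round((lo + hi) / 2.0, 2)

def molecular_weight(seq):
    return sum(MASS[a] for a in seq) + 18.02  # plus one water

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # illustrative sequence
print(isoelectric_point(seq), molecular_weight(seq))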

After running the 2-D gel in both directions, the contents of
the gel are often transferred electrophoretically onto a
membrane for later analysis. Membranes are more robust and
compatible with automation. Proteins are also more stable on
membranes and more easily manipulated before analysis. For
example, proteins can be electroblotted through a membrane
which has a protease covalently bound to it and the resulting
peptides are trapped on a second hydrophobic membrane.
Such a method allows for automated digestion of multiple
proteins without contamination and excessive handling.
Furthermore, the final hydrophobic membrane can be used
directly to analyze the resulting peptides in instruments such as
MALDI-TOF mass spectrometers. However, electroblotting
large 2-D gels is difficult and often nonquantitative due to the
different transfer properties of the proteins and their binding
affinity to the membrane.

Whether the gel is transferred to a membrane or not, the
separated proteins (spots) must be detected. A variety of
detection agents such as Coomassie blue, silver, and SYPRO
are available. The ideal compound is one that detects all
proteins with similar affinity and at very low levels. Coomassie
blue is the most consistent, silver staining is the most sensitive,
and fluorescent agents such as SYPRO tend to be more
compatible with the automation of spot excision.

After the gel has been visualized by staining, it must be
scanned and digitized for storage and downstream
manipulation. Each spot on the 2-D gel must be defined based
on saturation thresholds and spot boundaries. The
position of each spot is of limited use unless it is in reference
to known markers that were included in the loaded sample.
The location of the known markers must be identified, and a
warping equation developed to extrapolate the properties of
the proteins in each spot. The quality of the image can be
enhanced using software to filter out the background and
enhance contrast. A reference gel is chosen and a few
best-matched spots are used to compare the migration
patterns of the gels. Triangulation using the aforementioned
information can extend the matching of spots between gels.
The differences can be analyzed based on changes in the
intensity of spots between the gels. The spots of interest can
be excised for later manipulation or all the samples can be
treated in the gel. For example, the excised sample or the
whole gel can be digested with a protease in preparation for
peptide mass fingerprinting.
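
A minimal sketch of the warping and matching steps described
above (not the report's own software): fit an affine transform
from a sample gel's marker-spot coordinates to the reference
gel's, then pair spots by proximity. At least three non-collinear
landmarks are assumed, and the pixel tolerance is illustrative.

import numpy as np

def fit_affine(src, dst):
    # Least-squares affine map src -> dst; src, dst are (n, 2)
    # arrays of landmark (marker spot) xy coordinates.
    A = np.hstack([src, np.ones((len(src), 1))])    # rows: [x, y, 1]
    coef, *_ = np.linalg.lstsq(A, dst, rcond=None)  # (3, 2) transform
    return coef

def warp(points, coef):
    return np.hstack([points, np.ones((len(points), 1))]) @ coef

def match_spots(sample_spots, ref_spots, coef, tol=5.0):
    # Pair each warped sample spot with the nearest reference spot
    # if it lies within tol pixels.
    warped = warp(sample_spots, coef)
    pairs = []
    for i, p in enumerate(warped):
        d = np.linalg.norm(ref_spots - p, axis=1)
        j = int(np.argmin(d))
        if d[j] < tol:
            pairs.append((i, j))
    return pairs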

It is noteworthy that each spot in a 2-D gel is rarely made of
one species of protein. Due to the large number of expressed
proteins in most cells and tissues, and the current limitations of
the technology, it is unlikely that many proteins migrate into
distinct positions on 2-D gels. Loading gels to their very
limit to allow for the detection of low-copy proteins,
contaminants that cause streaking, and the very nature of
many proteins all cause a great deal of overlapping migration.
Therefore, it can be beneficial to fractionate the "separated"
samples after 2-D gel electrophoresis. The best approach is to
couple a capillary separation technique directly to the final
analysis instrument, the mass spectrometer. Capillary zone
electrophoresis, high pressure liquid chromatography, and gas
chromatography can separate the constituents of each spot
into individual proteins and peptides that can be analyzed on a
mass spectrometer more easily. However, the advent of more
sensitive mass spectrometers with high resolution and tandem
mass spectrometry has reduced the need for post-2-D-gel
fractionation.

Although multiple tools for analysis are available, mass
spectrometry (MS) is the most widely used and the method
that holds the most promise for large scale studies. Mass
spectrometry technology is discussed in more detail later in
this report. This section only seeks to highlight the role of MS
in the context of the whole process. Initially, data is
automatically accumulated and the appropriate spectra
selected for data extraction. The first level is very rapid, and
serves to identify the bulk of the proteins in the sample. This is
followed by a more methodical scan to select the ions of
interest for further analysis. At this point the mass-to-charge
ratio of the proteins can be deciphered. This can be translated
to molecular weight using standards and, in conjunction with
information obtained from the 2-D gel (such as isoelectric
point), used to narrow the identification process. At
this point a variety of protocols can be used to obtain more
information about the N-terminal amino acid sequence of the
proteins or their susceptibility to digestion by known
proteases. These studies often require a tandem mass
spectrometer, which can select the parent ion in the first
mass analyzer and analyze its daughter ions in a second.
Such approaches are more time consuming but yield
high-information content. Post-translational modifications can
be studied by either the systematic digestion or removal of the
modifications, or by daughter ion scanning.
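
For the m/z-to-molecular-weight step mentioned above, a
one-line conversion suffices, assuming [M + zH]z+ ions such as
those produced by electrospray; the example values are
illustrative.

PROTON = 1.00728  # proton mass in Da

def neutral_mass(mz, z):
    # Neutral mass M of an [M + zH]z+ ion observed at m/z.
    return z * (mz - PROTON)

print(neutral_mass(mz=1000.73, z=10))  # about 9997.2 Da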

At the conclusion of this procedure, most of the proteins and
peptides are separated into near pure fractions and analyzed.
The information about each spot or each peptide in a
particular spot is usually limited at first. It often consists of one
or more of the following: a peptide mass fingerprint, a short
N-terminal sequence, amino acid composition, molecular
weight, or isoelectric point. This information is then compared
to the sequence of all putative proteins that are encoded by
the genome of that organism. Recent advances in software
technology allow for predicting the behavior of these virtual
proteins under experimental conditions. Therefore, the
experimental data can be compared to the theoretical
information to find a match. Here lies the connection between
proteomics and genomics, because it is rarely possible or
economical to sequence proteins in their entirety.
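
A bare-bones sketch of the peptide-mass-fingerprint match
described here, assuming tryptic digestion and reusing the
MASS residue table from the virtual-gel sketch above; the
database and observed masses are hypothetical placeholders.

import re

def tryptic_peptides(seq):
    # Trypsin cuts after K or R, but not before P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

def peptide_mass(pep):
    return sum(MASS[a] for a in pep) + 18.02  # plus one water

def score(observed, seq, tol=0.5):
    # Count observed peptide masses explained by the protein's
    # theoretical digest, within +/- tol Da.
    theo = [peptide_mass(p) for p in tryptic_peptides(seq)]
    return sum(any(abs(m - t) <= tol for t in theo) for m in observed)

# database = {"protein_id": "SEQUENCE"}   # hypothetical
# best = max(database, key=lambda k: score(observed, database[k]))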

Since identification of proteins is heavily dependent on
genomics databases, it is appropriate to review the progress
of genome projects in various species. The automation of high
throughput DNA sequencing has changed the landscape of
biological research. The traditional approach of characterizing
a single gene at a time has given way to systematically
identifying the information content of an organism. The
genome of the simple bacteriophage ΦX174 was the first to
be completely sequenced in the late 1970s. Haemophilus
influenzae was the first free-living organism to be sequenced in
its entirety in 1995. The genome of Escherichia coli was
sequenced in 1997, and now many prokaryotes have been
fully characterized. Amongst these organisms are a number of
human pathogens such as Helicobacter pylori associated with
ulcers and gastritis, and Rickettsia prowazekii which causes
typhus.

Genome projects have been completed in eukaryotic
organisms, also. The genome of the first unicellular eukaryote,
the yeast Saccharomyces cerevisiae, was fully sequenced in
1996, and a multicellular eukaryote, the nematode worm
Caenorhabditis elegans, was completely sequenced in late
1998. The genome projects of the fruit fly Drosophila
melanogaster, the plant Arabidopsis thaliana, and of several
other organisms are near completion. In 1998, ongoing
genome projects were reported on over 70 prokaryotic
organisms. There are also over 20 reported programs to
sequence eukaryotic model organisms.

Since a large fraction of eukaryotic DNA does not code for
protein, the initial efforts in the genome projects for larger
eukaryotic organisms have been primarily focused on
sequencing segments of the genome that code for proteins
(genes). Therefore, a great deal of information is available
about potential coding sequences of eukaryotes. This is
especially true for human genes, which account for nearly three
quarters of all expressed sequence tags (ESTs). ESTs are
short sequences obtained by random priming of human
cDNA libraries. Sequences near the 5' end or the 3' poly(A)
tail of the cDNA are amplified and sequenced. Using the
sequence information of the coding region, primers can be
designed that will allow for the mapping of that particular gene
on a chromosome. Effectively, an EST database is a catalogue
of all potential genes in that particular organism. Although
EST information can aid the identification of proteins
only if the short EST sequence overlaps with the partial
sequence of the peptide, these databases are already very
potent tools for linking proteins to their genes.
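
To illustrate the overlap requirement, the sketch below (not
from the report) translates an EST in all six reading frames
and checks whether a short peptide tag appears in any of them;
a production search would use BLAST-style tools instead.

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

def revcomp(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    return "".join(CODON.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def frames(est):
    # All six reading frames: three offsets on each strand.
    for strand in (est, revcomp(est)):
        for off in range(3):
            yield translate(strand[off:])

def peptide_hits(peptide, est):
    # True if the peptide tag occurs in any reading frame.
    return any(peptide in frame for frame in frames(est))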

Public human EST databases contain over 1,200,000 entries,
350,000 of which are considered to be non-redundant. The
power of EST databases is best illustrated by noting that
although only 3% of the human genome was sequenced by the
end of 1998, over 50% of all human genes are thought to be
represented in public EST databases. It is likely that private
EST databases cover as many as 80% of all human genes.
Furthermore, considering that the human genome is expected
to contain about 100,000 genes, some of the 350,000 unique
ESTs representing 50% of all human genes must belong to
different parts of the same gene. Therefore, a large fraction of
the coding sequences in the human genome is covered by
multiple 250-400 base pair expressed sequence tags that map
to different parts of the same genes. This coverage makes
identification of proteins based on partial peptide sequencing
and EST information more probable.
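
A quick back-of-the-envelope check of the report's own
numbers shows why multiple ESTs must hit the same gene:

unique_ests   = 350_000             # non-redundant public ESTs
genes_covered = 0.5 * 100_000       # 50% of an assumed 100,000 genes
print(unique_ests / genes_covered)  # ~7 distinct ESTs per covered gene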

At this point a positive match between the experimental
sample and the putative protein can be used to identify the
protein in that sample. However, in organisms with incomplete
genome projects, a positive hit often fails to eliminate the
possibility that a yet unidentified gene codes for the isolated
protein. The criteria used to find a match are rarely
comprehensive; therefore, it is possible for two proteins to
satisfy certain criteria equally well, because they share some
homology over one or two domains. These proteins can be
very different overall, but match the same gene based on
limited-criteria matching. Of course, the absence of a match in
such an organism will also fail to reveal any information about
the protein of interest. Currently, the only method of studying
such a protein is through traditional protocols such as partial
peptide sequencing and screening of cDNA libraries for the
corresponding DNA code.

The importance of bioinformatics has become ever more
obvious as proteomic and genomic studies have become more
efficient in data generation. This is especially true for high
throughput protein analysis, where the quantity of data is
beyond manual deciphering. In the case of HPLC or capillary
zone electrophoresis (CZE) coupled tandem mass
spectrometry, the resulting data files are enormous and very
little can be interpreted by manual viewing.

===========================================================

In the interest of fairness, here's a link to a plug for their full report. bio.com