This database integrates data on the human cDNA expression library hEx1, a human fetal brain library that I constructed in a bacterial expression vector. Upon induction with IPTG, the clones of the library produce recombinant protein from the human cDNA sequences they contain. 150,000 clones were separated into the wells of 384-well microtitre plates. To allow the whole library to be screened, e.g. with an antibody directed against the protein product of a clone in the library, arrays were constructed that hold the library's expression products in an ordered manner. The clones were arrayed on seven large membranes and colonies were grown on the membranes. The cells were then lysed to release the recombinant protein, which was immobilised on the membrane as spots.

Since only about 20% of the library clones express any useful protein, a subset was created that fits on two filters of 22 x 22 cm. The vector used to construct the library contains a 6xHistidine sequence, a His-tag, that is fused to the expressed proteins. Using an antibody against this sequence on the seven filter membranes that represent the whole library, all clones expressing recombinant protein with a His-tag were labelled (see Figure).

The labelled clones were combined in a new library of 35,000 clones. Most of these clones now express recombinant protein, as shown in Büssow et al. 2000, but many of these proteins are expressed in insoluble form. These proteins are not folded correctly but end up in inclusion body aggregates in the bacteria.

SDS-PAGE image

Screening for soluble expression and cDNA sequencing

To identify clones that express their inserts in soluble form, small-scale protein expression and purification experiments were performed in liquid culture. The results were analysed by SDS-polyacrylamide gel electrophoresis. Those clones that expressed a recombinant protein were identified by sequencing the cDNA they contain.

Sequence matching to Ensembl

The resulting sequences were matched against the Ensembl database by the cross_match program using the seqjoin script. The Ensembl database is an index of the human genome sequence and contains entries for human gene, transcript and protein sequences. Genes, transcripts and proteins are identified by identifiers starting with ENSG, ENST and ENSP, respectively. The Ensembl database is created automatically by comparing the human genome sequence with a variety of sequence databases. Since Ensembl entries are not checked (curated) by humans, they should be treated with care. On the other hand, Ensembl is a rather complete yet non-redundant data set, which is an advantage when matching external data to it.
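As an illustration of the identifier scheme, the following minimal Python sketch sorts Ensembl identifiers into genes, transcripts and proteins by their prefix. It is not part of this database; the function name `classify_ensembl_id` is hypothetical, and the pattern assumes the human identifier format of a prefix followed by 11 digits, with an optional version suffix such as `.2`.

```python
import re

# Prefix-to-type mapping for human Ensembl identifiers.
ENSEMBL_PREFIXES = {"ENSG": "gene", "ENST": "transcript", "ENSP": "protein"}

# Assumed format: prefix, 11 digits, optional ".version" suffix.
_ID_PATTERN = re.compile(r"^(ENSG|ENST|ENSP)(\d{11})(?:\.\d+)?$")


def classify_ensembl_id(identifier):
    """Return 'gene', 'transcript' or 'protein', or None if not recognised."""
    match = _ID_PATTERN.match(identifier.strip().upper())
    if match is None:
        return None
    return ENSEMBL_PREFIXES[match.group(1)]


if __name__ == "__main__":
    for example in ["ENSG00000139618", "ENST00000380152", "ENSP00000369497", "hEx1"]:
        print(example, classify_ensembl_id(example))
```

A check like this can be useful before pasting a mixed list of identifiers into a search form, since anything that returns None is probably a typo or a different kind of name.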

What this database offers

If you are looking for information on a certain clone, which you might have identified with an antibody on the protein arrays, you can enter the clone's name or ID in the Clone Search mask. If you are looking for expression clones for a certain gene or protein, use the Gene Search mask. You can also enter lists of gene, protein or transcript identifiers there. The search results can be filtered for full-length clones or clones with certain expression properties, e.g. soluble expression. It is also possible to retrieve a list of all full-length expression clones or all clones with good protein expression by leaving the search field empty. To retrieve a list of all genes and proteins for which there are clones available, choose the Gene list menu entry.
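The filtering logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the search masks: the `CloneRecord` fields, the function `filter_clones` and the example clone names and identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CloneRecord:
    clone_id: str      # hypothetical clone name
    ensembl_id: str    # matched Ensembl identifier
    full_length: bool  # insert covers the full coding sequence
    soluble: bool      # protein was expressed in soluble form


def filter_clones(records, query_ids=None, full_length_only=False, soluble_only=False):
    """Return the records that match the query and the selected filters.

    An empty or missing query places no restriction on identifiers,
    like leaving the search field empty.
    """
    wanted = set(query_ids) if query_ids else None
    result = []
    for rec in records:
        if wanted is not None and rec.ensembl_id not in wanted:
            continue
        if full_length_only and not rec.full_length:
            continue
        if soluble_only and not rec.soluble:
            continue
        result.append(rec)
    return result


# Made-up example records, for illustration only.
EXAMPLE_RECORDS = [
    CloneRecord("clone_A", "ENSG00000000001", True, True),
    CloneRecord("clone_B", "ENSG00000000001", False, True),
    CloneRecord("clone_C", "ENSG00000000002", True, False),
]

if __name__ == "__main__":
    for rec in filter_clones(EXAMPLE_RECORDS, ["ENSG00000000001"], full_length_only=True):
        print(rec.clone_id)
```

Calling `filter_clones(EXAMPLE_RECORDS, soluble_only=True)` without identifiers corresponds to the empty search field with the soluble expression filter switched on.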


Konrad Büssow, Eckhard Nordhoff, Christine Lübbert, Hans Lehrach and Gerald Walter
A human cDNA library for high-throughput protein expression screening.
Genomics 2000; 65:1-8

Konrad Büssow, Dolores Cahill, Wilfried Nietfeld, David Bancroft, Eberhard Scherzinger, Hans Lehrach and Gerald Walter
A method for global protein expression and antibody screening on high-density filters of an arrayed cDNA library.
Nucleic Acids Research 1998; 26(21):5007-8

My PhD thesis: Arrayed cDNA libraries for antibody screening and systematic analysis of expression products

