Abstract
Novel DNA-binding proteins, especially repressors of gene expression, are obtained by variegation of genes encoding known binding proteins and selection for proteins binding the desired target DNA sequence. A novel selection vector may be used to reduce artifacts. Heterooligomeric proteins which bind to a target DNA sequence which need not be palindromic are obtained by a variety of methods, e.g., variegation to obtain proteins binding symmetrized forms of the half-targets and heterodimerization to obtain a protein binding the entire asymmetric target.
Filing date: Jul 26, 1990
Issue date: Mar 30, 1993
Inventors: Robert C. Ladner, Sonia K. Guterman, Rachel B. Kent, Arthur C. Ley
Assignee: Protein Engineering Corp.
Primary Examiner: John D. Ulm
What is claimed is:
1. A method of obtaining first and second genes encoding first and second homooligomeric DNA binding proteins which hybridize to form a hybrid heterooligomeric DNA binding protein which binds to a predetermined ultimate target double stranded DNA sequence, said sequence being nonpalindromic, said sequence comprising a left target subsequence and a right target subsequence each of at least 4 base pairs length, said method comprising:
- producing a first gene encoding a first DNA-binding oligomeric protein binding to a first target sequence and a second gene encoding a second DNA-binding oligomeric protein binding to a second target sequence, wherein said first and second DNA-binding proteins each have at least two essentially dyad-symmetric DNA-binding domains, where said first target sequence is a palindrome or gapped palindrome and comprises said left target subsequence and a palindrome-completing subsequence, and where said second target sequence is a palindrome or gapped palindrome and comprises said right target subsequence and a palindrome-completing subsequence, whereby one of the DNA-binding domains of the second DNA-binding protein binds to the right target subsequence, said genes being produced by a process of at least partially random mutation followed by selection for the binding of the corresponding protein to the corresponding target sequence,
- wherein said first and second proteins can hybridize so as to obtain a heterooligomeric DNA-binding protein comprising a DNA-binding domain recognizing the left target subsequence and a DNA-binding domain recognizing the right target subsequence
- whereby said heterooligomeric protein has an affinity for the ultimate target DNA.
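The relationship in claim 1 between the nonpalindromic ultimate target and the two palindromic (or gapped palindromic) intermediate targets can be illustrated with a minimal Python sketch. The function names, the even split of the target into left and right halves, and the example sequence are illustrative assumptions and are not taken from the patent; the example sequence is hypothetical and is not the HIV target.

```python
# Sketch: derive two symmetrized targets from one asymmetric target.
# Assumption: the target splits into equal-length left and right
# subsequences around an optional central gap of `gap` bases.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str, gap: int = 0) -> bool:
    """True if the flanks of `seq` are reverse complements of each other,
    ignoring `gap` central bases (a gapped palindrome)."""
    half = (len(seq) - gap) // 2
    return seq[:half] == reverse_complement(seq[len(seq) - half:])


def symmetrize(target: str, gap: int = 0) -> tuple[str, str]:
    """Return (first_target, second_target): the left half completed to a
    palindrome, and the right half completed to a palindrome."""
    if (len(target) - gap) % 2 != 0:
        raise ValueError("flanks must have equal length")
    half = (len(target) - gap) // 2
    if half < 4:
        raise ValueError("each subsequence must be at least 4 base pairs")
    left = target[:half]
    center = target[half:half + gap]
    right = target[half + gap:]
    first = left + center + reverse_complement(left)
    second = reverse_complement(right) + center + right
    return first, second


# Hypothetical 9 bp asymmetric target with a 1 bp central gap.
first_target, second_target = symmetrize("ACCTGGATT", gap=1)
```

In this sketch the left half of `first_target` and the right half of `second_target` together reproduce the original asymmetric target, which is the arrangement a heterooligomer with one domain from each selected protein would be expected to recognize.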
2. The method of claim 1 wherein at least one of said first and second genes is obtained by
- (a) providing a cell culture, said cell culture comprising a plurality of cells, each cell bearing a selection vector, said selection vector comprising a first and a second operon, each comprising at least one expressible gene, the genes of said first and second operons being different, a copy of the target DNA sequence being included in each operon and positioned therein so that under forward selection conditions the transformed cells enjoy a selective advantage if they express a protein or polypeptide which binds to said copies of the target DNA sequence, said cell culture being transformed with a variegated gene encoding potential DNA-binding proteins or polypeptides, where said cells collectively can express a plurality of different but sequence-related potential DNA-binding proteins or polypeptides,
- (b) causing the cells of such culture to express said potential DNA-binding proteins or polypeptides;
- (c) exposing the cells to forward selection conditions to select for cells which express a protein or polypeptide which preferentially binds to said target DNA sequence; and
- (d) recovering the selected cells bearing a gene coding for such protein or polypeptide.
3. The method of claim 2 wherein the level of variegation is such that from 10⁶ to 10⁹ different potential DNA-binding proteins can be expressed.
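As a rough guide to what the range in claim 3 implies, the following sketch counts the distinct protein sequences encoded when several codons are varied independently. The helper names are hypothetical, and the assumption that every varied codon allows all twenty amino acids is made only for illustration.

```python
import math

# Sketch: number of distinct protein sequences in a variegated library.
# Assumption: codons vary independently, so the counts multiply.


def library_size(allowed_per_codon: list[int]) -> int:
    """Product of the number of amino acids allowed at each varied codon."""
    return math.prod(allowed_per_codon)


def in_claimed_range(size: int, low: int = 10**6, high: int = 10**9) -> bool:
    """True if the library size falls within the 10^6 to 10^9 range."""
    return low <= size <= high


# With all twenty amino acids allowed at every varied codon:
sizes = {n: library_size([20] * n) for n in range(4, 8)}
```

Under that assumption, five varied codons give 3,200,000 proteins and six give 64,000,000, both inside the range, while four codons fall below it and seven exceed it.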
4. The method of claim 2 wherein a gene coding for a known DNA binding protein having a helix-turn-helix DNA binding motif is variegated.
5. The method of claim 2 wherein a gene encoding a known DNA binding protein picked from the group consisting of Cro from phage λ, cI repressor from phage λ, Cro from phage 434, cI repressor from phage 434, P22 repressor, E. coli tryptophan repressor, E. coli CAP, P22 Arc, P22 Mnt, E. coli lactose repressor, MAT-a1-alpha2 from yeast, Polyoma Large T antigen, SV40 Large T antigen, Adenovirus E1A, and TFIIIA from Xenopus laevis is variegated to obtain genes coding on expression for a plurality of potential target DNA-binding proteins.
6. The method of claim 2 wherein said variegated gene comprises at least one variegated codon, said codon having three base positions, each variegated codon being characterized by a mixture of bases at at least one base position wherein the mixture of bases for at least one base position is non-equimolar.
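The effect of the base mixture at each position of a variegated codon, as recited in claim 6, can be explored with a short sketch that converts per-position base fractions into an expected amino acid distribution using the standard genetic code. The function name and both mixtures below are illustrative assumptions; in particular, the skewed mixture is hypothetical and is not a mixture disclosed in the patent.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_distribution(pos1: dict, pos2: dict, pos3: dict) -> dict:
    """Expected amino acid frequencies for one variegated codon.

    Each argument maps a base to its fraction in the synthesis mixture at
    that codon position; bases absent from a mapping have fraction zero.
    """
    dist = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        p = pos1.get(b1, 0.0) * pos2.get(b2, 0.0) * pos3.get(b3, 0.0)
        if p > 0.0:
            aa = CODON_TABLE[b1 + b2 + b3]
            dist[aa] = dist.get(aa, 0.0) + p
    return dist


# Equimolar reference: any base at positions 1 and 2, G or T at position 3.
EQUIMOLAR_N = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}
EQUIMOLAR_K = {"G": 0.5, "T": 0.5}
nnk = codon_distribution(EQUIMOLAR_N, EQUIMOLAR_N, EQUIMOLAR_K)

# Hypothetical non-equimolar first position, for comparison only.
SKEWED_FIRST = {"T": 0.2, "C": 0.2, "A": 0.3, "G": 0.3}
skewed = codon_distribution(SKEWED_FIRST, EQUIMOLAR_N, EQUIMOLAR_K)
```

Comparing `nnk` with `skewed` shows how changing the fractions at a single base position shifts the expected amino acid frequencies, which is the degree of freedom a non-equimolar mixture provides.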
7. The method of claim 2 wherein the ultimate target double stranded DNA sequence is an HIV sequence.
8. The method of claim 7 wherein the ultimate target double stranded DNA sequence is HIV 353-369 or a subsequence thereof comprising at least eight base pairs.
9. The method of claim 2 wherein at least one of said operons comprises a selectable beneficial gene, an occludible promoter operably linked to said beneficial gene and directing its transcription, an occluding promoter occluding transcription of said beneficial gene, and a copy of the target DNA sequence positioned so that the binding of said protein or polypeptide to said copy represses said occluding promoter and thereby facilitates transcription of said beneficial gene.
10. The method of claim 9 wherein the beneficial gene is aadA.
11. The method of claim 10 wherein the occludible promoter is the aadA promoter and the occluding promoter is Pcon.
12. The method of claim 2 wherein said selection vector comprises:
- a) a first operon, which operon comprises:
- i) a first binding marker gene(s),
- ii) a first promoter directing expression of said binding marker gene(s), and
- iii) a first copy of the target DNA sequence, where said target DNA sequence interferes substantially with expression of the first gene(s) if and only if a protein expressed by the transformed cell binds to the target DNA sequence,
- b) a second operon, which operon comprises:
- i) a second binding marker gene(s),
- ii) a second promoter directing expression of said binding marker gene(s); and
- iii) a second copy of the target DNA sequence, where said target DNA sequence interferes substantially with expression of said gene(s) if and only if a protein expressed by the transformed cell binds to the target DNA sequence,
- where the binding marker genes of said first and second operons are different, and where, when said cells are exposed to forward selection conditions the gene products of said first and second binding marker genes are deleterious to the cell.
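One way to read the two-operon arrangement of claim 12 is as a logical AND: under forward selection a cell survives only if neither deleterious marker gene is expressed. The sketch below models that reading. The function names, the treatment of operon inactivation as independent events, and the example probability are assumptions for illustration and are not taken from the patent.

```python
# Sketch: forward selection with deleterious binding marker genes.
# Assumption: a marker is expressed when its operon is intact and no
# expressed protein binds the target copy placed in that operon.


def survives_forward_selection(binds_target: bool,
                               operon_intact: list[bool]) -> bool:
    """A cell survives only if no deleterious marker gene is expressed."""
    marker_expressed = [intact and not binds_target
                        for intact in operon_intact]
    return not any(marker_expressed)


def artifact_frequency(p_inactivation: float, n_operons: int) -> float:
    """Chance that a non-binding cell survives because every operon was
    independently inactivated (independence is an assumption)."""
    return p_inactivation ** n_operons


# A true binder survives with both operons intact.
binder_ok = survives_forward_selection(True, [True, True])
# A non-binder that lost only one operon is still killed by the other.
single_loss = survives_forward_selection(False, [False, True])
# Only the loss of both operons lets a non-binder through.
double_loss = survives_forward_selection(False, [False, False])
```

With a hypothetical inactivation probability of 1e-6 per operon, `artifact_frequency` gives 1e-6 for one operon and about 1e-12 for two, which suggests why a second, different marker operon would reduce false positives so long as no single event disables both.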
13. The method of claim 12 wherein the binding marker genes are functionally unrelated.
14. The method of claim 12 wherein the promoters of said first and second operons are different.
15. The method of claim 12 wherein a plurality of genetic elements essential to the maintenance of the vector or the survival of the transformed cells under conditions that select for presence of said vector, said operons and said genetic elements being positioned on said vector so no single deletion event can render nonfunctional more than one of said operons without also rendering nonfunctional one of said essential genetic elements.
16. The method of claim 12, said vector further comprising a gene (pdbp) coding for a potential DNA-binding protein or polypeptide, said gene comprising:
- a) a coding region that codes for a polypeptide, each domain of said polypeptide having at least 50% sequence identity to a known DNA-binding domain, and
- b) a promoter operably linked to said coding region for controlling its expression.
17. The method of claim 12 wherein at least one of said genetic elements comprises a beneficial gene, and a control promoter operably linked to said beneficial gene, but where no instance of said target DNA sequence is associated with said genetic element.
18. The method of claim 17 wherein the control promoter is essentially identical to the promoter of one of said selectable binding marker operons, so that proteins binding to the latter promoter will also bind to the control promoter and thereby inhibit expression of said beneficial gene.
19. The method of claim 12 wherein under reverse selection conditions the gene products of said binding marker genes are beneficial to the transformed cells.
20. The method of claim 19 wherein each of the first and second operons confers a phenotype selected independently but not-identically from the group consisting of: galT,K⁺, tetA⁺, lacZ⁺, pheS⁺, argP⁺, thyA⁺, crp⁺, pyrF⁺, ptsM⁺, secA⁺/malE⁺/lacZ⁺, ompA⁺, btuB⁺, lamB⁺, tonA⁺, cir⁺, tsx⁺, aroP⁺, cysK⁺, and dctA⁺.
21. The method of claim 12 wherein the vector comprises a plurality of codons, each variegated codon has a root mean square deviation from a flat distribution over the allowed amino acids of less than 0.08.
22. The method of claim 21 wherein the variation at each variegated codon allows all twenty possible amino acids.
23. The method of claim 22 wherein at any variegated codon the expected ratio of occurrence of (Lys+Arg) codons to (Asp+Glu) codons is 0.8 to 1.25.
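The two numerical criteria in claims 21 and 23 can be checked for any proposed codon mixture once its expected amino acid frequencies are known. The sketch below assumes one plausible reading of the criteria: frequencies are renormalized over the allowed amino acids (stop codons excluded), and the deviation is the root mean square difference from 1/k for k allowed amino acids. That reading, the function names, and the use of an equimolar NNK codon as the worked example are assumptions and are not taken from the patent.

```python
import math

# Sketch: flatness and charge-balance checks for one variegated codon.
# Input: mapping of one-letter amino acid code to relative frequency
# (any positive scale; stop codons are left out by the caller).


def rms_deviation_from_flat(aa_freq: dict) -> float:
    """Root mean square deviation of the normalized amino acid
    frequencies from the flat value 1/k over the k allowed amino acids."""
    total = sum(aa_freq.values())
    k = len(aa_freq)
    flat = 1.0 / k
    return math.sqrt(
        sum((f / total - flat) ** 2 for f in aa_freq.values()) / k
    )


def basic_to_acidic_ratio(aa_freq: dict) -> float:
    """Expected ratio of (Lys + Arg) to (Asp + Glu)."""
    return (aa_freq["K"] + aa_freq["R"]) / (aa_freq["D"] + aa_freq["E"])


# Codon counts per amino acid for an equimolar NNK codon under the
# standard genetic code (32 codons, one of which is a stop).
NNK_COUNTS = {
    "L": 3, "S": 3, "R": 3,
    "P": 2, "T": 2, "V": 2, "A": 2, "G": 2,
    "F": 1, "Y": 1, "C": 1, "W": 1, "H": 1, "Q": 1,
    "I": 1, "M": 1, "N": 1, "K": 1, "D": 1, "E": 1,
}

nnk_rms = rms_deviation_from_flat(NNK_COUNTS)
nnk_ratio = basic_to_acidic_ratio(NNK_COUNTS)
```

Under these assumptions the equimolar NNK codon has an RMS deviation of roughly 0.024, below the 0.08 threshold, but a (Lys+Arg) to (Asp+Glu) ratio of 2.0, outside the 0.8 to 1.25 window, so meeting the ratio criterion would require a different, for example non-equimolar, mixture.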
24. A method of obtaining genes encoding a heterooligomeric protein which binds to a predetermined ultimate target double stranded DNA sequence, said sequence being nonpalindromic, said sequence comprising a left target subsequence and a right target subsequence each of at least 4 base pairs length, said method comprising:
- (a) providing a first gene encoding a first DNA-binding oligomeric protein binding to a first target sequence and a second gene encoding a second DNA-binding oligomeric protein binding to a second target sequence, wherein said first and second DNA-binding proteins each have at least two dyad-symmetric DNA-binding domains, wherein said first and second DNA-binding proteins each have a dimerization interface, where said first target sequence is a palindrome or gapped palindrome and comprises said left target subsequence and a palindrome-completing subsequence, whereby one of the dyad-symmetric DNA-binding domains of the first DNA-binding protein binds to said left target subsequence, and where said second target sequence is a palindrome or gapped palindrome and comprises said right target subsequence and a palindrome-completing subsequence, whereby one of the dyad-symmetric DNA-binding domains of the second DNA-binding protein binds to the right target subsequence,
- (b) variegating the dimerization interface of the protein encoded by one of said first or second genes to obtain variegants thereof and reverse selecting for expression from said variegant of a first oligomerization mutant protein, encoded by a variegant of said variegated gene, which is no longer capable of forming a homooligomer that can bind to said first or second target sequence, respectively, and verifying that said oligomerization mutant protein maintains a tertiary structure similar to the protein from which it is descended,
- (c) variegating the dimerization interface of the protein encoded by the other of said first or second genes to obtain variegants thereof,
- (d) providing host cells carrying the gene encoding said first oligomerization mutant protein and a variegant gene of step (c), each operably linked to a promoter functional in the host cell, and
- (e) forward selecting for expression from a step (c) variegant gene of a second oligomerization mutant protein which is capable of forming a heterooligomer with said first oligomerization mutant protein, said heterooligomer binding said ultimate target DNA sequence, and
- (f) isolating the genes encoding said heterooligomer.
25. The method of claim 24 wherein both of said first and second genes are provided by a process comprising (i) mutation of one or more preselected codons to encode a plurality of predetermined expected amino acids at each preselected codon, in predetermined expected proportions, and thereby obtain a plurality of different potential DNA binding proteins, and (ii) selection for genes encoding proteins which bind the corresponding target sequence.
26. The method of claim 25 in which the first and second DNA binding proteins either are unable to hybridize at all and still bind DNA, or, if they do hybridize, have a substantially diminished affinity or specificity for the corresponding subsequence of the target DNA sequence.
27. The method of claim 25 wherein at least one of said first and second genes is obtained by
- (a) providing a cell culture, said cell culture comprising a plurality of cells, each cell bearing a selection vector, said selection vector comprising a first and a second operon, each comprising at least one expressible gene, the genes of said first and second operons being different, a copy of the target DNA sequence being included in each operon and positioned therein so that under forward selection conditions the transformed cells enjoy a selective advantage if they express a protein or polypeptide which binds to said copies of the target DNA sequence, said cell culture being transformed with a variegated gene encoding potential DNA-binding proteins or polypeptides, where said cells collectively can express a plurality of different but sequence-related potential DNA-binding proteins or polypeptides,
- (b) causing the cells of such culture to express said potential DNA-binding proteins or polypeptides;
- (c) exposing the cells to forward selection conditions to select for cells which express a protein or polypeptide which preferentially binds to said target DNA sequence; and
- (d) recovering the selected cells bearing said first or second gene coding for such protein or polypeptide.
28. The method of claim 27 wherein the level of variegation is such that from 10⁶ to 10⁹ different potential DNA-binding proteins can be expressed.
29. The method of claim 27 wherein a gene coding for a known DNA binding protein having a helix-turn-helix DNA binding motif is variegated.
30. The method of claim 27 wherein a gene encoding a known DNA binding protein picked from the group consisting of Cro from phage λ, cI repressor from phage λ, Cro from phage 434, cI repressor from phage 434, P22 repressor, E. coli tryptophan repressor, E. coli CAP, P22 Arc, P22 Mnt, E. coli lactose repressor, MAT-a1-alpha2 from yeast, Polyoma Large T antigen, SV40 Large T antigen, Adenovirus E1A, and TFIIIA from Xenopus laevis is variegated to obtain genes coding on expression for a plurality of potential target DNA-binding proteins.
31. The method of claim 27 wherein said variegated gene comprises at least one variegated codon, said codon having three base positions, each variegated codon being characterized by a mixture of bases at at least one base position wherein the mixture of bases for at least one base position is non-equimolar.
32. The method of claim 27 wherein the ultimate target double stranded DNA sequence is an HIV sequence.
33. The method of claim 32 wherein the ultimate target double stranded DNA sequence is HIV 353-369 or a subsequence thereof comprising at least eight base pairs.
34. The method of claim 27 wherein at least one of said operons comprises a selectable beneficial gene, an occludible promoter operably linked to said beneficial gene and directing its transcription, an occluding promoter occluding transcription of said beneficial gene, and a copy of the target DNA sequence positioned so that the binding of said protein or polypeptide to said copy represses said occluding promoter and thereby facilitates transcription of said beneficial gene.
35. The method of claim 34 wherein the beneficial gene is aadA.
36. The method of claim 35 wherein the occludible promoter is the aadA promoter and the occluding promoter is Pcon.
37. The method of claim 27 wherein said selection vector comprises:
- a) a first operon, which operon comprises:
- i) a first binding marker gene(s),
- ii) a first promoter directing expression of said binding marker gene(s), and
- iii) a first copy of the target DNA sequence, where said target DNA sequence interferes substantially with expression of the first gene(s) if and only if a protein expressed by the transformed cell binds to the target DNA sequence,
- b) a second operon, which operon comprises:
- i) a second binding marker gene(s),
- ii) a second promoter directing expression of said binding marker gene(s); and
- iii) a second copy of the target DNA sequence, where said target DNA sequence interferes substantially with expression of said gene(s) if and only if a protein expressed by the transformed cell binds to the target DNA sequence,
- where the binding marker genes of said first and second operons are different, and where, when said cells are exposed to forward selection conditions the gene products of said first and second binding marker genes are deleterious to the cell.
38. The method of claim 37 wherein the binding marker genes are functionally unrelated.
39. The method of claim 37 wherein the promoters of said first and second operons are different.
40. The method of claim 37 wherein a plurality of genetic elements essential to the maintenance of the vector or the survival of the transformed cells under conditions that select for presence of said vector, said operons and said genetic elements being positioned on said vector so no single deletion event can render nonfunctional more than one of said operons without also rendering nonfunctional one of said essential genetic elements.
41. The method of claim 37, said vector further comprising a gene (pdbp) coding for a potential DNA-binding protein or polypeptide, said gene comprising:
- a) a coding region that codes for a polypeptide, each domain of said polypeptide having at least 50% sequence identity to a known DNA-binding domain, and
- b) a promoter operably linked to said coding region for controlling its expression.
42. The method of claim 37 wherein at least one of said genetic elements comprises a beneficial gene, and a control promoter operably linked to said beneficial gene, but where no instance of said target DNA sequence is associated with said genetic element.
43. The method of claim 42 wherein the control promoter is essentially identical to the promoter of one of said selectable binding marker operons, so that proteins binding to the latter promoter will also bind to the control promoter and thereby inhibit expression of said beneficial gene.
44. The method of claim 37 wherein under reverse selection conditions the gene products of said binding marker genes are beneficial to the transformed cells.
45. The method of claim 44 wherein each of the first and second operons confers a phenotype selected independently but not-identically from the group consisting of: galT,K⁺, tetA⁺, lacZ⁺, pheS⁺, argP⁺, thyA⁺, crp⁺, pyrF⁺, ptsM⁺, secA⁺/malE⁺/lacZ⁺, ompA⁺, btuB⁺, lamB⁺, tonA⁺, cir⁺, tsx⁺, aroP⁺, cysK⁺, and dctA⁺.
46. The method of claim 37 wherein the vector comprises a plurality of codons, each variegated codon has a root mean square deviation from a flat distribution over the allowed amino acids of less than 0.08.
47. The method of claim 46 wherein the variation at each variegated codon allows all twenty possible amino acids.
48. The method of claim 47 wherein at any variegated codon the expected ratio of occurrence of (Lys+Arg) codons to (Asp+Glu) codons is 0.8 to 1.25.