
Research

General information

Summary

We use X-ray crystallography, nuclear magnetic resonance (NMR), and electron microscopy. Although structure determination is an essential part of our approach, we emphasize that structures themselves are not a goal but a starting point for further studies. "Structures at work" is an important keyword of our research. Protein structures inspire the design of new biochemical experiments, and the feedback from those experiments produces new targets for structural biology. The tight coupling of structure determination and biological experiments is essential for successful structural studies. Recently, we have focused on the physicochemical basis that enables protein molecules to fold into three-dimensional structures smoothly in a short time.

Interactions of proteins with their ligands can be divided into two categories: strong interactions and weak interactions. The strict specificity and tight binding of strong interactions are based on shape and electrostatic complementarity at the molecular interface. The structure of the protein-ligand complex clearly describes the details of a strong interaction. In contrast, weak interactions are characterized by molecular recognition with nonstrict specificity and weak affinity. Note that nonstrict specificity is not the same as non-specificity: a protein with nonstrict specificity can bind to a set of ligands that have no apparent structural similarity. Many people are interested in strong interactions, but weak interactions are also very important in various biological phenomena.

When a protein molecule is in dynamic equilibrium between two states (or structures), we usually consider one equilibrium constant K and two rate constants k (one for the forward direction and one for the reverse). However, when K and k are measured at the level of individual amino acid residues using techniques such as NMR, the values may differ from residue to residue. Until now, this variation has been dismissed as measurement error, but when we created a log k vs. log K plot, we discovered a linear relationship. This linear relationship is the key to solving the mystery of how protein molecules can fold quickly and smoothly.

The physicochemical basis that allows protein molecules to fold into native structures within a short time

Proteins can fold smoothly into their native three-dimensional structures within a short period of time, typically on the order of milliseconds. Although this is a special property gained through natural selection during biological evolution, it is not easy to state clearly what distinguishes naturally optimized proteins from artificial polypeptide chains. Recently, using AI such as AlphaFold2, it has become possible to design artificial proteins that quickly form stable three-dimensional structures. However, such AI is trained by deep learning on the 200,000 structures registered in the Protein Data Bank (PDB) and thus uses the protein properties acquired through evolution as they are, without extracting them as rules. Therefore, even in the AlphaFold era, the mystery of quick protein folding has not yet been elucidated. How can we answer this great mystery in protein science?

Our starting point is to break with the conventional view that the folding of protein molecules is highly cooperative. Usually, protein folding, and more generally any structural change, is described using a single equilibrium constant K and a single pair of rate constants (forward k and backward k'). However, when the values of K and k are determined for each amino acid residue using techniques such as NMR, the values often differ from residue to residue. Nevertheless, the variations in the K and k values have long been ignored as experimental errors. Our key finding is a linear relationship in a log k (or log k') vs. log K plot. Such a relationship is called a "linear free energy relationship." Many linear free energy relationships have been found empirically in chemical reactions and biological phenomena, but ours is completely new in that it is based on data points from multiple amino acid residues in one polypeptide chain under one condition (J Biomol NMR 76:87-94, 2022). Furthermore, when we conducted an exhaustive literature search and created log-log plots of two-state protein exchange phenomena, we discovered many examples of linear relationships (Sci Rep. 12:16843, 2022).
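The log k vs. log K analysis described above can be illustrated with a minimal sketch. It fits a straight line to per-residue values by ordinary least squares. The function name `fit_log_log` and all numerical values are hypothetical and synthetic; they are not taken from the cited papers, and the fitting procedure used there may differ.

```python
import math

def fit_log_log(K_values, k_values):
    """Least-squares fit of log10(k) = slope * log10(K) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(K) for K in K_values]
    ys = [math.log10(k) for k in k_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Synthetic per-residue data (assumption): k = 10 * K**0.6 exactly,
# so the log-log plot is a perfect line with slope 0.6 and intercept 1.
residue_K = [0.5, 1.0, 2.0, 4.0, 8.0]
residue_k = [10.0 * K ** 0.6 for K in residue_K]

slope, intercept, r_squared = fit_log_log(residue_K, residue_k)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

With real data, each (K, k) pair would come from one residue of the same polypeptide chain measured under one condition, and the quality of the fit would indicate how well a linear free energy relationship holds.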

Once we accept that foldable polypeptide chains (i.e., proteins) have a linear free energy relationship at the residue level, the question of its physicochemical origin arises. Our answer is the "Consistency Principle." In 1983, Nobuhiro Go (an associate professor at the Faculty of Science in Kyushu University) proposed the Consistency Principle. It is expressed as "There are only two states, an initial state and a final state, in the folding process of proteins, and the non-covalent interactions that occur between two amino acid residues are limited to residue pairs that are in contact in the final folded state." Although computational studies have supported the principle, there has been no direct experimental evidence to date.

It is necessary to theoretically derive the relationship between the residue-based linear free energy relationship (rbLFER) and the Consistency Principle. We showed that the rbLFER is an experimental consequence of the Consistency Principle (BPPB. 20:e200046, 2023).

The residue-based linear free energy relationship (rbLFER) can be used to obtain information about the transition states of structural changes in proteins. The equation that describes the rbLFER is equivalent to the equation of state of an ideal gas, and the properties of real proteins are observed as deviations from the ideal protein. As an example, we showed that the data points (7 residues) that deviated from the straight line in the log k vs. log K plot of apomyoglobin corresponded to the transient movement of an α-helix in the transition state (Sci Rep. 12:16843, 2022). The rbLFER can also be used for a new Φ value analysis that does not require amino acid mutations (BPPB. 20:e200046, 2023).
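The idea of reading deviations from the straight line as information can be sketched as follows. This is only an illustration under stated assumptions: the line is estimated with the Theil-Sen estimator (median of pairwise slopes), chosen here because it is robust to a few deviating points; it is not the procedure from the cited papers. The residue numbers, data values, and the 0.3 log-unit tolerance are all hypothetical.

```python
import statistics
from itertools import combinations

def theil_sen_line(xs, ys):
    """Robust line estimate: median of all pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept

def deviating_residues(residue_ids, log_K, log_k, tolerance=0.3):
    """Return residue ids whose log k deviates from the line by more than `tolerance`."""
    slope, intercept = theil_sen_line(log_K, log_k)
    flagged = []
    for rid, x, y in zip(residue_ids, log_K, log_k):
        residual = y - (slope * x + intercept)
        if abs(residual) > tolerance:
            flagged.append(rid)
    return flagged

# Synthetic data (assumption): points lie on log k = 0.5 * log K + 2,
# except residues 13 and 16, which are shifted off the line.
residue_ids = [10, 11, 12, 13, 14, 15, 16, 17]
log_K = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
log_k = [0.5 * x + 2.0 for x in log_K]
log_k[3] += 0.8   # residue 13 lies above the line
log_k[6] -= 0.7   # residue 16 lies below the line

flagged = deviating_residues(residue_ids, log_K, log_k)
print(flagged)
```

A robust estimator is used so that the deviating residues do not pull the reference line toward themselves; an ordinary least-squares fit would be a reasonable alternative when deviations are few and small.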

Mitochondrial Protein Import System

The mitochondrion is one of the organelles in animal and plant cells, and carries out various important biological processes, including ATP production and apoptosis. The mitochondrion is enclosed by two biomembranes, the outer membrane and the inner membrane. Most mitochondrial proteins are synthesized in the cytosol and must be imported into mitochondria. What is the molecular basis for the sorting of these proteins into mitochondria?

adapted from D. W. Fawcett, The Cell, Its Organelles and Inclusions: An Atlas of Fine Structure, W. B. Saunders, 1966.

Mitochondrial matrix proteins are synthesized as precursor proteins with cleavable N-terminal signal sequences. The mitochondrial signal sequence is termed a presequence. Presequences function as a tag for targeting to mitochondria and are cleaved off after import into mitochondria. It is curious that no consensus sequence is found among the presequences of 1000 mitochondrial preproteins.

https://micro.magnet.fsu.edu/cells/mitochondria/mitochondria.html

Protein import into mitochondria is mediated by protein assemblies, TOM and TIM, in the mitochondrial outer and inner membranes, respectively. A TOM subunit, Tom20, functions as a general protein import receptor. The cytosolic domain of Tom20 can bind to 1000 different presequences of preproteins. The recognition of presequences by Tom20 is a good example of a weak interaction.

We monitored the chemical shift perturbation of the NMR signals of five different 15N-labeled presequence peptides upon the addition of the cytosolic domain of Tom20. The perturbed segments occupy different positions, either near the N-terminus or at the C-terminus, in the presequences. Now we are ready to answer the question, “Why is no consensus sequence found among mitochondrial presequences?” A presequence is composed of short amino acid sequences that are recognized by several proteins, and their organization (position, order, and overlap) is unique to each presequence. Thus, simple alignment of presequences cannot reveal any consensus sequences without a deep understanding of the cryptogram embedded in mitochondrial presequences.
PubMed: 12191763

The NMR analyses revealed a common five-residue pattern for Tom20 binding in different presequences. To refine the common amino acid motif recognized by Tom20, we introduced a new peptide library approach: we prepared a mixture of ALDH presequence variants, tethered these peptides to Tom20 in a competitive manner through an intermolecular disulfide bond, and determined the relative affinities by MALDI-TOF mass spectrometry. We successfully deduced a refined, common motif for recognition by Tom20. The 5-residue consensus is φχχφφ, where φ is a hydrophobic amino acid and χ is any amino acid. This consensus can represent a huge variety of amino acid sequences.
PubMed: 12691756
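As a minimal sketch of what the φχχφφ consensus means in practice, the motif can be written as a regular expression and scanned along a sequence. The set of residues treated as hydrophobic (`HYDROPHOBIC`) is an assumption made for illustration, not the set defined in the cited study, and the test sequence is a made-up string that merely contains the LSRLL segment.

```python
import re

# Assumed hydrophobic residue set (one-letter codes); the cited work may use another.
HYDROPHOBIC = "AVLIMFWC"

# phi-chi-chi-phi-phi: hydrophobic, any, any, hydrophobic, hydrophobic.
# The lookahead (?=...) lets overlapping matches be reported.
MOTIF = re.compile("(?=([{h}]..[{h}][{h}]))".format(h=HYDROPHOBIC))

def find_tom20_motifs(sequence):
    """Return (0-based start, matched 5-mer) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]

def count_motif_sequences(n_hydrophobic=len(HYDROPHOBIC), n_any=20):
    """Number of distinct 5-residue sequences that satisfy the motif."""
    return n_hydrophobic ** 3 * n_any ** 2

hits = find_tom20_motifs("GGRLSRLLSGG")  # hypothetical test sequence
print(hits)
print(count_motif_sequences())
```

Under the assumed eight-residue hydrophobic set, the motif covers 8 × 20 × 20 × 8 × 8 distinct pentapeptides, which illustrates how a single consensus can represent a very large number of sequences.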

We determined the three-dimensional structure of Tom20 in a complex with an 11-residue peptide derived from rat aldehyde dehydrogenase (ALDH), which contained LSRLL as the Tom20-binding consensus. The cytosolic domain of Tom20 forms an all-α-helical structure with a groove that accommodates the presequence peptide. The bound presequence forms an amphiphilic helical structure with hydrophobic leucine side chains aligned on one side to interact with a hydrophobic patch in the Tom20 groove.
PubMed: 10721992

The NMR structure of the Tom20-pALDH complex was the first structure to reveal the recognition of a signal sequence by its receptor in an α-helical conformation. Pictures of the NMR structure appear in many textbooks, including “Molecular Biology of the Cell”.

Our first attempt to cocrystallize the cytosolic domain of Tom20 with the ALDH presequence failed, probably due to the weak affinity of the presequence peptide for Tom20. Therefore, we tethered the presequence peptide to Tom20 via an intermolecular disulfide bond.

We successfully obtained three crystal forms suitable for X-ray data collection. The three-dimensional structures of the complex of Tom20 and the ALDH presequence peptide were determined to 2 Å resolution. To our surprise, Tom20 was equipped with only two hydrophobic sites for the recognition of the three hydrophobic side chains in the Tom20 consensus motif.
PubMed: 17948058, 21591667

The comparison of the crystal structures implied that a dynamic equilibrium exists between two (or more) bound states of the presequence peptide on Tom20 in solution. In accord with this model, an NMR relaxation study revealed motion on the sub-millisecond time scale at the interface between Tom20 and the presequence peptide. We propose a dynamic, multiple-mode recognition mechanism that explains the structural basis of the broadly selective specificity of Tom20 towards diverse mitochondrial presequences.
PubMed: 21591667

Creation of crystal contact-free space in protein crystals

Contacts with neighboring molecules in protein crystals inevitably restrict the internal motions of intrinsically flexible proteins. The resultant clear electron densities permit model building, yielding crystallographic snapshot structures. Although these still images are informative, they could provide biased pictures of the protein motions. If the mobile parts are located at a site lacking direct contacts in rationally designed crystals, then the amplitude of the movements can be analyzed experimentally. We call this special space a crystal contact-free space (CCFS).

We propose a fusion protein method to create a CCFS in protein crystals. We selected maltose-binding protein (MBP) as a fusion partner to construct a rigid CCFS scaffold. We successfully used α-helical spacers to firmly fuse the C-terminal α-helix of MBP and the N-terminal α-helix of the target Tom20 protein. In the case of ligands with weak affinity, the problem of partial ligand occupancy must be considered. The pALDH presequence peptide was therefore tethered to Tom20 to ensure full occupancy of the presequence in the binding site. We added a cysteine residue at the C-terminus of pALDH to form an intermolecular disulfide bond with the single cysteine residue in the fusion protein.

We collected X-ray diffraction data to 2 Å resolution. Conventional model building fails when large-amplitude motions exist. Here, the mobile presequence appears as smeared electron density in the Fo-Fc difference electron density map after suitable processing (i.e., low-pass filtering and FreeR-averaging) of the X-ray diffraction data. The moving presequence peptide is thus visualized as an L-shaped smeared electron density in the binding site of Tom20.
PubMed: 26694222

The smeared electron density in the difference electron density map corresponded to the partially overlapping volume shared by the multiple poses of the presequence helix.

Our current working hypothesis is that “a rapid equilibrium of multiple states with partial recognitions” is the molecular basis for the promiscuous binding of the Tom20 receptor to diverse mitochondrial presequences with nearly equal affinities. We expect that improved diffraction measurements and data processing will raise the signal-to-noise ratio of the Fo-Fc difference electron density map and experimentally reveal the spatial distribution of the moving α-helical presequence peptide.

Tom20 employs a “dynamic equilibrium recognition mechanism” to achieve loose specificity for presequences. If this is correct, replacing one of the three hydrophobic residues (leucine in our case) in the presequence with another hydrophobic residue should change the spatial distribution of the presequence peptide in the bound state. We replaced the leucine residue at position 15 with tryptophan, which has a bulkier side chain, and analyzed the electron density in the CCFS. Compared to the electron density of the L15 presequence (gray), the position and size of the electron density of the W15 presequence (magenta and yellow-green) changed adaptively.
PubMed: 36173160

Structural Biology of the Oligosaccharyltransferase

Protein glycosylation is one of the most important protein modifications. The transfer of oligosaccharide chains to asparagine residues in proteins occurs within the consensus sequence Asn-X-Thr/Ser, where X can be any amino acid residue except Pro. Asn-glycosylation is widespread not only in eukaryotes but also in archaea and some eubacteria.
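The Asn-X-Thr/Ser rule (X is not Pro) can be expressed as a short pattern search. The following is a minimal sketch that locates candidate sites in a one-letter amino acid sequence; the function name `find_sequons` and the test sequence are hypothetical. Matching the pattern only marks a candidate site and does not predict whether glycosylation actually occurs there.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead (?=...) allows overlapping candidate sites to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (0-based position of Asn, matched tripeptide) for each candidate site."""
    return [(m.start(), m.group(1)) for m in SEQUON.finditer(sequence.upper())]

# Made-up sequence: NGT and NAS match; NPS is excluded because X is Pro.
sites = find_sequons("MNGTANPSNAS")
print(sites)
```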

Oligosaccharyltransferase (OST) is the enzyme that catalyzes the transfer of the oligosaccharide from a lipid donor to the side chain of the Asn residue in the Asn-X-Thr/Ser consensus sequence, called the sequon. In eukaryotes, OST is a membrane-associated multisubunit protein complex. Glycosylation actually occurs at about 60% of sequons. The conformation of the sequon in the bound state is said to determine whether glycosylation occurs. We think that the occupancy rule of glycosylation will be solved once the weak-interaction nature of sequon recognition is revealed.

The catalytic subunit of the OST enzyme is referred to as STT3 in Eukarya, AglB in Archaea, and PglB in Eubacteria. We determined the 2.7 Å resolution crystal structure of the C-terminal soluble domain of Pyrococcus furiosus AglB. This was the first 3D structure of an STT3/AglB/PglB protein.
PubMed: 18046457, 17768359

We developed a new assay method for the oligosaccharyl transfer activity. The peptide substrate is a synthetic peptide that contains an N-terminal fluorescent dye for detection. The produced glycopeptide is separated from the unreacted peptide by SDS-PAGE. The addition of a C-terminal biotin tag enables the efficient purification of the glycopeptide product.
PubMed: 17693440

We determined crystal structures of the full-length Archaeoglobus fulgidus AglB. Comparison with the eubacterial PglB structure determined by another group revealed structural conservation of the catalytic core and the membrane-spanning region. The N-terminal transmembrane region consists of 13 TM helices and contains the active site, which consists of two conserved acidic residues and a metal ion for activating the carboxamide group of the Asn residue. The C-terminal globular domain contains a binding site for the Ser and Thr residues in the sequon. This Ser/Thr binding pocket might be a dynamic structure, because the plastic segment identified by the structural comparison is involved in the formation of the Ser/Thr pocket.
PubMed: 24127570

A peptide carrying an Asn-X-Thr sequence was tethered to the AglB protein through a disulfide bond. We determined the crystal structure of the AglB-peptide complex. Interestingly, the Asn residue fixed on the enzyme can accept the oligosaccharide chain. This unique reaction system showed that Gln can be glycosylated in place of Asn.
PubMed: 27997792

Next, we determined the crystal structure of the ternary complex of enzyme, peptide, and dolichol phosphate. The Dab (analog of the Asn residue)-Val-Thr peptide (yellow) in an extended conformation formed an antiparallel β-sheet structure with the TIXE motif (green) on the enzyme side. In the extended conformation, proline is excluded from the X position due to its high energetic cost. This is why a proline residue is not allowed at the X position in the consensus sequence for glycosylation.
PubMed: 34354228

Archaeal Glycobiology

The N-glycan structures in Archaea show great variety in their monosaccharide compositions, linkages, and branching patterns. We determined the chemical structures of the N-glycans from Pyrococcus furiosus, Archaeoglobus fulgidus, and Pyrobaculum calidifontis by sugar analysis, MS, and NMR. Oligosaccharide chains attached to structurally defined peptides were produced by an in vitro oligosaccharide-transfer reaction, using membrane fractions that contained AglB and lipid-linked oligosaccharides, the oligosaccharide donor for OST. For better sensitivity in the NMR measurements of the A. fulgidus glycan, 13C-glucose was added to the culture medium for stable isotope labeling of the lipid-linked oligosaccharides.
PubMed: 24562177, 26093517

The oligosaccharide donor for the OST enzyme is a lipid-linked oligosaccharide (LLO), in which an oligosaccharide chain is preassembled on a lipid-phospho carrier. We determined the archaeal LLO structures from the phylum Euryarchaeota (Pyrococcus furiosus and Archaeoglobus fulgidus) and the phylum Crenarchaeota (Pyrobaculum calidifontis and Sulfolobus solfataricus) by LC-MS/MS analysis. We found that the euryarchaeal LLOs are dolichol-monophosphate-oligosaccharides, whereas the crenarchaeal LLOs are dolichol-diphosphate-oligosaccharides. This novel finding provides insight into the evolution of the N-glycosylation system.
PubMed: 27015803

E. coli PriA protein

DNA replication forks are arrested by various internal and external threats. In bacteria, the PriA protein is a sensor protein that recognizes the arrested forks. We found that PriA specifically recognizes the 3'-termini of arrested nascent DNA chains. Fluorescence correlation spectroscopy analyses showed that the N-terminal domain of E. coli PriA has almost the same affinity for oligonucleotides with each of the four 3'-terminal nucleotides, A, C, G, and T. We determined the crystal structures of the N-terminal domain (105 aa) of PriA in complexes with the oligonucleotides ApA, ApC, ApG, ApT, and CpCpC.
PubMed: 17464287

We built a hypothetical model of the complex between the N-terminal domain of PriA and an arrested fork-like DNA structure. One aspartate residue (Asp17) makes intimate contacts with the four bases in a manner that neither discriminates among them nor disturbs the base pairing, thereby achieving non-selective recognition of the 3'-end base of dsDNA.
PubMed: 20658707
(Collaboration with Prof. Hisao Masai, Tokyo Metropolitan Institute of Medical Science)