|
Refinement Tutorial

This is an introduction to the procedures for using Refmac5 to refine a crystal structure. See also the documentation for Refmac5 and Sketcher.

The Problem

This example is to refine the protein RnaSa, this is

The final structure (solved by Joseph Sevcik: J. Sevcik, Z. Dauter, V.S. Lamzin, K. Wilson, Acta Cryst. D52 (1996) p327-344) looks like this:
What are Restraints?

There are 1749 atoms in the asymmetric unit - for each atom there are four parameters, x, y, z and B (temperature factor), so that is 6996 parameters. In the experimental data there are 17991 reflections - this is a ratio of 17991/6996 = 2.57 observations per parameter. This is not very much data. The best way to get more data is to use the information that we know about the geometry of molecules - the bond lengths, bond angles etc. The refinement program will use restraints which say, for example, that two bonded atoms must be close to the correct bond length apart.

Refinement programs have geometry libraries which contain information about the correct bond lengths, bond angles, planar groups etc. for the chemical structures which are often found in crystallography (e.g. amino acids and nucleic acids). But if your structure contains an unusual molecule then it might not be in the monomer library. Often the most difficult thing in using refinement programs is making the geometry description, so we will make the geometry description for the ligand to show how it is done.

Outline of the Method

1) Create the geometry description of the ligand.
2) Use Refmac to find any more special restraints - for example the disulphide bonds. Check the results and make sure the special restraints are correct.
3) Run the Refmac program to refine the structure and look at the result.
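
The data-to-parameter ratio quoted in the restraints discussion above can be checked with a few lines of Python. This is only an arithmetic sketch using the atom and reflection counts given in this tutorial; the variable names are chosen here for illustration.

```python
# Counts quoted in the tutorial text.
n_atoms = 1749
n_reflections = 17991

# Each atom contributes four refined parameters: x, y, z and B.
params_per_atom = 4
n_parameters = n_atoms * params_per_atom

# Observations (reflections) available per refined parameter.
ratio = n_reflections / n_parameters

print(f"parameters: {n_parameters}")
print(f"observations per parameter: {ratio:.2f}")
```

Each geometric restraint acts like an extra observation, which is why adding restraints improves this ratio.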
The Data Files

Files are in directory $DATA

GMP.pdb - The coordinates of the ligand molecule
rnasa.pdb - The protein-ligand coordinates
rnasa_bad.pdb - Some bad coordinates for rnasa
rnasa.mtz - The experimental data

The output files are in the directory $RESULTS

review.log - The log file from running Refmac5 to review restraints
refmac.log - The log file from Refmac5 refinement
Stage 1) Create a Monomer Library Entry for the ligand GMP

There are three ways to get a geometry description:

a) There are over 2000 molecules in the Refmac library so your ligand may already be in the library. In fact the ligand in our example is in the library but we will not use it because that is too easy.
b) If you have coordinates for the ligand in a PDB file it is possible to make a monomer library entry very quickly - we will try doing this.

c) If you have no coordinates for the ligand then you need to draw the molecule; the programs will then make a geometry description and will also make a PDB file with coordinates. This can be made easier if there is a similar molecule in the library - you can get this molecule from the library and edit it - we will also try doing this.

b) Creating Monomer Library Entry from Coordinate File

1.1) Change to the Refinement module and select the Monomer Library Sketcher task. See a picture here

For the next steps see a picture here.

1.2) From the File menu at the top of the window select Read File and from the next menu select Read PDB file. Select the file: Go to directory DATA
A message will come up informing you that a geometry file already exists and asking if you wish to read the geometry information from this file. If you answer YES, go straight to step 1.7, skipping steps 1.3, 1.4, 1.5 and 1.6. If you choose NO, you will have to carry out the steps below.

1.3) You will see the molecule displayed. You can rotate it by holding down the left mouse button. On the right of the window is the list of atoms (the list has the element, the atom name and the oxidation state, which is the charge on the atom). Below the list of atoms is a list of the chiral centres found in the molecule. There is some information which is not in the PDB file which you must enter to give a full description of the molecule: the double bonds and the oxidation states.

1.4) The picture of GMP shows the correct double bonds - edit your molecule to have the correct double bonds. To change a bond to a double bond you must hold down the Shift key on the keyboard and click on the bond with the right mouse button.

1.5) The program has found three chiral centres in the molecule but it did not find the chiral centre at C3* because the atoms are nearly in a plane. We can add the extra chiral centre. Click on Add Row and enter:

4 C3* C4* C2* O3* -

1.6) Two of the oxygen atoms in the phosphate group are charged. Look at the table on the right of the window and find the atoms O31 and O32. Click on the menu in the oxidation state column and set the oxidation state to -1.

1.7) Now create the monomer library. From the File pull-down menu at the top of the window select Create Library Description. In the window enter the name of the ligand (this must be GMP, which is the name of the ligand in the PDB file which we will refine). So

Unique identifier GMP
Full name: guanosine 3 monophosphate

The names of files will be created automatically so you can then select Run.
1.8) You must wait a little time - a program called Libcheck is running. When it is finished the molecule is displayed again.

1.9) Now look at the new monomer library file. Look at the CCP4i main window and select the last job, which is called dictionary. Now select the View Files from Job menu (on the right side of the main window) and select the file GMP_mon_lib.cif

In this file you will see a list of the atoms..

loop_

Further down the file is the list of bonds

_chem_comp_bond.comp_id

The words at the top of the list tell you what is in each column:

comp_id - this is the compound id - this is always GMP.
And also there is information on bond angle, torsion angle, chirality and planar groups. The refinement program will try to make the ligand as defined in this file - you can edit the file if you need to.

c) Creating a Monomer Library Entry by Drawing the Molecule

This section is optional - you can go to Stage 2.

If you have no coordinates or other definition of the ligand then you must draw the molecule in the sketcher. Sometimes there may be a similar molecule in the library - you can start from this and edit it. There is a guanosine molecule in the library which we can use to make GMP.

1.10) Delete any molecule that you have displayed - from the Edit pull-down menu select Delete All Atoms.

1.11) From the File pull-down menu select Read File and then Load Monomer from Library. In the new window change the fifth line to

List RNA monomers

Now you will see a list of RNA monomers - click on the line:

Gr - Guanine

Then click on the Run Now button. You must wait a little while before the molecule of guanosine is displayed.

To make GMP you must delete the phosphate group on the O5* and draw a phosphate group on O3*. For the next steps see a picture here

1.12) Make sure that the Mouse Mode (on the left of the Sketcher window) is set to Edit Monomer.

1.13) From the edit tools on the left of the Sketcher window select the delete atom icon.

1.14) Hold down the Shift key and click with the left mouse button on the atoms O1P, O2P and P to delete them.

1.15) To add the new phosphate group select 'add one atom' from the edit tools. The atoms that you add will be carbon atoms - you will change them later to phosphorus and oxygen. Make O3* the active atom by holding down the Control key and clicking on it with the right mouse button. It is now the flashing, active atom. Now add another atom by holding down the Shift key and clicking with the left mouse button close to the O3* at the place where the new atom should go. You now have an atom called C21 and it is the active atom. Now click close to this atom with Shift - left mouse to make one more atom.
1.16) Now look at the end of the table on the right side of the Sketcher window - the new atoms are C21, C22, C23 and C24. You see that each atom has the element type C - change C21 to a P and the other three to O. The atom names are also wrong - change the names to P3, O31, O32 and O33.

1.17) The atoms O31 and O32 should have an oxidation state of -1, so change the number in the third column for O31 and O32 to -1.

1.18) Finally look at the molecule again - there should be a double bond between the atoms P3-O33. To change this hold down the Shift key and click on the bond P3-O33 with the right mouse button.

1.19) Now we create the monomer library. From the File pull-down menu at the top of the window select Create Library Description. In the window enter the name of the ligand (call this TEST so you do not overwrite the files you made before). So

Unique identifier TEST
Full name: guanosine 3 monophosphate

The names of files will be created automatically so you can then select Run.

1.20) Wait while the program runs to build the dictionary file. The molecule is drawn again. If necessary you can make corrections and run again. To close the Sketcher window open the File pull-down menu and select Close Sketcher.
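
The monomer library file inspected in step 1.9 is organised as loop_ blocks: a list of column names followed by rows of values. The sketch below shows one way such a block could be read in Python. It is a minimal illustration, not a full CIF parser: the function name read_loops is hypothetical, the sample rows are made up for illustration (only the tag _chem_comp_bond.comp_id is quoted from the tutorial), and it assumes that values contain no spaces or quotes.

```python
def read_loops(text):
    """Return a list of (tags, rows) pairs, one per loop_ block.

    Assumes simple whitespace-separated values with no quoting.
    """
    loops = []
    tags, rows = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "loop_":
            if tags:
                loops.append((tags, rows))
            tags, rows = [], []
        elif tags is None:
            continue  # not inside a loop_ block
        elif not line or line.startswith("#"):
            # A blank line or comment ends the current loop.
            if tags:
                loops.append((tags, rows))
            tags, rows = None, None
        elif line.startswith("_"):
            if rows:
                # A tag after data rows: the loop has ended.
                loops.append((tags, rows))
                tags, rows = None, None
            else:
                tags.append(line)
        else:
            rows.append(dict(zip(tags, line.split())))
    if tags:
        loops.append((tags, rows))
    return loops


# Hypothetical excerpt: the rows are invented for illustration only.
SAMPLE = """\
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.type
GMP P3 O33 double
GMP P3 O31 single
"""

loops = read_loops(SAMPLE)
for tags, rows in loops:
    print("columns:", [t.split(".")[1] for t in tags])
    for row in rows:
        print(row["_chem_comp_bond.atom_id_1"],
              row["_chem_comp_bond.atom_id_2"],
              row["_chem_comp_bond.type"])
```

Reading the file this way makes it easy to confirm, for example, that every row has the compound id GMP and that the P3-O33 bond was recorded as a double bond.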
Stage 2 Create and Review Special Restraints

So that you can see what happens we will use a file $DATA/rnasa_bad.pdb which I have edited to put in some bad things. The residue ARG B:63 is moved so that it is too close to ARG A:63.

We will use Refmac5 in Review Restraints mode. The program will look at the atom coordinates and decide where there are disulphide bonds, cis-peptides and D peptides. It will also calculate the distance between atoms and if they are very close then it will assume the atoms are bonded and will make a restraint to say 'these two atoms must stay close'. Of course this is not always right. It will also add any absent atoms - if a residue does not have the right atoms then it will make them. Refmac5 will help you by finding the disulphide bonds etc. automatically but you MUST check that they are correct.

2.1) From the Refinement menu in the CCP4i main window select Run Refmac5.

For the next steps see a picture here

2.2) At the top of the window set:

Do review restraints

2.3) Now select the input coordinate file:

PDB in DATA rnasa_bad.pdb
Library DATA GMP_mon_lib.cif
2.4) Look in the folder called Setup Restraints. In here you can decide what to look for in the PDB file. We will use the defaults - you do not need to change anything.

2.5) Now run the Refmac5 program. From the Run menu at the bottom of the window choose Run Now.

2.6) The job will take a little time. When it is finished it will say FAILED - this is OK. Look at the log file (click on the name of the job, refmac5, in the main window and use View Files from Job and View Log Files). Some interesting things in the log file:

WARNING : CIS peptide bond is found, angle = 6.49
WARNING : link:SS is found dist = 2.027 ideal_dist= 2.031

These things are correct: Refmac5 has checked the input protein molecule and found some cis peptide bonds and some disulphide bonds. But there is also:

WARNING : description of link:ARG-ARG not found in the dictionary.
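
For a long log file it can help to collect the WARNING lines automatically before reading them. The sketch below is a hypothetical helper, not part of Refmac5 or CCP4i: it separates warnings about links missing from the dictionary from all other warnings. The sample text is the three warning lines quoted in this tutorial; lines in a real log may carry extra fields such as residue names.

```python
# The three warning lines quoted in this tutorial.
LOG_EXCERPT = """\
WARNING : CIS peptide bond is found, angle = 6.49
WARNING : link:SS is found dist = 2.027 ideal_dist= 2.031
WARNING : description of link:ARG-ARG not found in the dictionary.
"""


def classify_warnings(log_text):
    """Split WARNING lines into (missing_link, other) lists."""
    missing_link, other = [], []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("WARNING"):
            continue
        if "not found in the dictionary" in line:
            missing_link.append(line)
        else:
            other.append(line)
    return missing_link, other


missing_link, other = classify_warnings(LOG_EXCERPT)
print("Links with no dictionary description:")
for line in missing_link:
    print("  ", line)
print("Other warnings (still check each one):")
for line in other:
    print("  ", line)
```

The split is only a reading aid: every warning, including the cis peptide and disulphide ones, should still be checked by hand.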
The ARG-ARG warning is bad: Refmac5 will try to make bonds between the residues which are too close.

2.7) Now look at the output PDB file: use View Files from Job and select rnasa_bad_refmac1.pdb. At the top of this file is new information:
LINK NH1 ARG A 40 . NE ARG B 63 ARG-ARG
LINK NH1 ARG A 40 . CZ ARG B 63 ARG-ARG1
LINK NH1 ARG A 40 . NH2 ARG B 63 ARG-ARG2
CISPEP 1 GLY A 26 PRO A 27 0.00
SSBOND 1 CYS A 96 CYS A 7
CISPEP 2 GLY B 26 PRO B 27 0.00
SSBOND 2 CYS B 96 CYS B 7

Now Quit from the window.

2.8) It is necessary to edit the PDB file to remove the bad LINK information. There is an easy way to do this. From the Refinement menu in the CCP4i main window select Edit Restraints in PDB.

2.9) Select the input file:

PDB in TEST rnasa_bad_refmac1.pdb

Wait while the program reads the file.

2.10) In the window you will now see:

The space group and the symmetry operators for the space group. (You may need this information to define disulphide bonds or links between molecules that are not in the same asymmetric unit.)
The MODRES Ids and LINK Modes - this is information from the dictionary. The MODRES are possible modifications that you can make to a residue and the LINKs are different ways to link two monomers (of amino acid, nucleic acid or sugar).
MODRES - Modified Residues. This is a way to define any non-standard residues. There are none in the rnasa protein.
SSBOND - Disulphide Bonds. The two disulphide bonds in rnasa are shown.
LINK - Inter-Residue Bonds. The three bad bonds are listed.
CISPEP - Cis peptides. The two cis-peptides in rnasa are listed.

2.11) You can delete the bad LINKs by clicking on the menu Edit Table and selecting Delete Last Row. Do this three times. See a picture here

It is also possible to add new things - try clicking on Add Row.

2.12) We will stop using rnasa_bad so you do not need to save changes - Close the window.

Stage 3 Refining the Molecule

Now we will use the monomer library description that we created to refine the rnasa molecule with the GMP ligand.

3.1) From the Refinement menu in the CCP4i main window select Run Refmac5.
For the next steps see a picture here

3.2) At the top of the window set:

Do restrained refinement using no prior phase information input

Also you will see:

Generate weighted difference maps in CCP4 format.

If you have a graphics program to look at the maps then click this on and select a map format.

3.3) Now select the input files - the experimental data:

MTZ in DATA rnasa.mtz

and make sure you have the correct data columns:

FP FGMP18 SIGFP SIGFGMP18

and the coordinate file:

PDB in DATA rnasa.pdb

To use the geometry description file which you have made:

Library test GMP_mon_lib.cif

3.5) Now run the Refmac5 program. From the Run menu at the bottom of the window choose Run Now.

3.6) The job will take a little time. When it is finished look at the log file (click on the name of the job, refmac5, in the main window and use View Files from Job and View Log Graphs). If you do not have a log file then click on View Any File and set

Go to directory RESULTS

and then select the file

File refmac.log

Go to the last table in the Tables in Files and click on:

Rfactor analysis, stats vs cycle

You will see a graph of the R factor and the Free R factor for the 6 cycles of refinement. The R factor is very good already but both go down a little.
FOM vs cycle

The FOM tells you how well the molecule matches the experimental data and the Geometry tells you how well the molecule obeys the geometry restraints.

Also go up the Tables in File list and select the last

Rfactor analysis, F distribution v resln

This is information about the last cycle of refinement. If you look at

<Rfactor> v. resln

The red line is the average R factor versus resolution for the data which is used and the green line is the Free R factor (for the 'free' data which is not used). This is similar at all resolutions - it does not go up for high resolution data. This is an example of what is good about maximum likelihood refinement compared with the old-fashioned least squares. Also look at the graph

<Fobs> and <Fc> v. resln
This is a graph of the average observed structure factors and calculated structure factors. You notice that at low resolution the observed (red) and the calculated (blue) are not the same. At low resolution the water atoms, which we cannot see in the crystal structure, are an important part of the structure factors. The refinement program tries to model the water atoms by solvent scaling but it is difficult for this data because some of the very low resolution data is missing. To close the loggraph window click on the File menu and select Exit.

3.7) Look at the header of the output MTZ file - click on View Files from Job and select the file rnasa_refmac1.mtz. In the file you will see:

* Column Labels :
H K L FNAT SIGFNAT FreeR_flag FC PHIC FWT PHWT DELFWT PHDELWT FOM

The new data in the file is:

FC & PHIC - the structure factors and phases calculated from the final coordinates

If you selected the option to create output maps then you can look at the maps created from the Refmac output.

...FWT.map - the 'best' weighted map
An example of these maps is shown below for a tyrosine residue which is in the wrong place.

Here you can see a large pink area of negative density where the tyrosine side chain is now. This is saying that the side chain should not be here. The large (brown) area of positive density is showing where the side chain should be. The FWT map is the weighted map and looks like this:

You can see a region of density to the left of the tyrosine which is where it should go.
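
The R factor and Free R factor plotted in step 3.6 both measure the disagreement between observed and calculated structure factor amplitudes, using the standard definition R = sum(|Fobs - Fcalc|) / sum(Fobs). The sketch below computes both for a handful of made-up reflections. The numbers are invented for illustration, the amplitudes are assumed to be on the same scale already, and treating flag 0 as the free set is an assumption about how the free reflections are marked.

```python
def r_factor(f_obs, f_calc):
    """R = sum(|Fobs - Fcalc|) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Made-up reflections: (Fobs, Fcalc, free flag).
# Assumption: flag 0 marks the 'free' reflections left out of refinement.
reflections = [
    (100.0, 90.0, 1),
    (50.0, 55.0, 1),
    (80.0, 70.0, 0),
    (20.0, 25.0, 0),
    (150.0, 140.0, 1),
]

work = [(fo, fc) for fo, fc, flag in reflections if flag != 0]
free = [(fo, fc) for fo, fc, flag in reflections if flag == 0]

r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free])

print(f"R factor:      {r_work:.3f}")
print(f"Free R factor: {r_free:.3f}")
```

Because the free reflections are never used to adjust the model, the Free R factor is an independent check on the refinement.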
To find out more

Refmac: http://www.ysbl.york.ac.uk/~garib/refmac
Libcheck: http://www.ysbl.york.ac.uk/~alexei/libcheck.html
CCP4: http://www.dl.ac.uk/CCP/CCP4

Prepared by Liz Potterton (lizp@ysbl.york.ac.uk) and Eleanor Dodson, July 2000 |