GPREL.TXT      Genetic Sequence Data Bank
               06-16-2008

               GenPept Release 166.0
               Translated Protein-coding Sequences

               5609606 loci containing 1714068683 residues


Table of Contents

1. INTRODUCTION
   1.1 Release 166.0
   1.2 Organization of This Document
   1.3 Important Changes in Release 166.0
   1.4 Recent Changes in the Data Bank
       1.4.1 New Record Types Added (Release 141.0)
       1.4.2 LOCUS Line Adjusted (Release 144.0)
       1.4.3 New Record Type Added (Release 146.0)
       1.4.4 ENV Division Added (Release 147.0)
   1.5 Upcoming Changes

2. ORGANIZATION OF FILES
   2.1 File Descriptions
   2.2 Entries by division

3. FILE FORMAT
   3.1 File Header Information
   3.2 Sequence Entry Files
       3.2.1 Entry Organization
       3.2.2 Sample Sequence Data File

4 TRADEMARKS, CITATIONS, ETC.
   4.1 Registered Trademark Notices
   4.2 Citing GenPept
   4.3 GenPept Distribution Format
   4.4 Disclaimer

APPENDIX A - IUPAC-IUB AMINO ACID CODES

List Of Examples and Tables

Example 1. Sample File Header
Example 2. Sample Sequence Data File


This document describes the GenPept data bank available via anonymous FTP
from the Advanced Biomedical Computing Center (ftp.ncifcrf.gov). GenPept is
produced by parsing the corresponding GenBank release for translated coding
regions of GenBank as defined in the FEATURES section of each sequence.

If you have any questions or comments about the data bank or this document,
please contact:

Gary Smythers  gws@ncifcrf.gov  301-846-5778
or
Bob Stephens   bobs@ncifcrf.gov

# -----------------------------------------------------------------
# | Gary W. Smythers                     | email: gws@ncifcrf.gov |
# | Programmer Analyst IV                |                        |
# | Advanced Biomedical Computing Center | Phone: (301) 846-5778  |
# | SAIC NCI-Frederick                   | FAX: (301) 846-5762    |
# | PO Box B, Bldg 430                   |                        |
# | Frederick, MD 21702-1201 USA         |                        |
# -----------------------------------------------------------------


1. INTRODUCTION

1.1 Release 166.0

GenPept Release 166.0 includes the translations of all protein coding
regions in GenBank Release 166.0.
GenPept Release 166.0 includes 5,609,606 loci representing 1,714,068,683
residues. Supplemental files of daily updates, both cumulative and
non-cumulative, are also available.

1.2 Organization of This Document

This introduction notes changes to the GenPept data bank since the last
release. The next section describes the contents of the files. The third
section illustrates the formats of the files.

1.3 Important Changes in Release 166.0

NONE

1.4 Recent Changes in the Data Bank

1.4.1 New Record Types Added (Release 141.0)

Format Enhancement - New Record Types Added

A number of new record types were added to enhance the data content of
GenPept.

New Types:

VERSION   - A compound identifier consisting of the GenPept Locus and a
            numeric version number associated with the current version of
            the sequence data in the record. This is followed by an integer
            key (a "GI") assigned to the peptide sequence.
            Mandatory keyword/exactly one record.

KEYWORDS  - Short phrases describing gene products and other information,
            taken directly from the corresponding GenBank entry.
            Mandatory keyword in all annotated entries/one or more records.

SOURCE    - Common name of the organism or the name most frequently used
            in the literature.
            Mandatory keyword in all annotated entries/one or more
            records/includes one subkeyword.

PI        - Isoelectric point.
            Mandatory keyword/exactly one record.

COMMENT
  /NucGI  - GI of corresponding nucleotide entry

The LOCUS line now contains additional information: number of amino acids,
GenBank division, and date.

1.4.2 LOCUS Line Adjusted (Release 144.0)

The LOCUS line has been adjusted to allow longer GenBank Locus names and
longer sequence lengths.
Detailed format for the LOCUS line:

Positions  Contents
---------  --------
01-05      'LOCUS'
06-12      spaces
13-25      GenPept Locus name
26-26      space
27-42      GenBank Locus name
43-43      space
44-49      Length of peptide sequence
50-50      space
51-52      'aa'
53-53      space
54-56      'PEP'
57-57      space
58-63      'linear'
64-64      space
65-67      GenBank division code
68-68      space
69-79      Date, in format dd-mmm-yyyy

1.4.3 New Record Type Added (Release 146.0)

COMMENT
  /locus_tag - feature tag assigned for tracking purposes

1.4.4 ENV Division Added (Release 147.0)

A new division for sequences obtained via environmental sampling methods
has been added. This new division contains sequences for which the source
organism is unknown, or can only be inferred by sequence comparison.

1.5 Upcoming Changes

NONE


2. ORGANIZATION OF FILES

2.1 File Descriptions

The GenPept release includes the following files:

/pub/genpept/gprel.txt.gz                  - Release Notes (this document).
/pub/genpept/gpdat.seq.gz                  - All GenPept entries.
/pub/genpept/gpdat.fasta.gz                - All GenPept entries (fasta format).
/pub/genpept/divisions/gpbct1.seq.gz       - Bacterial sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpbct29.seq.gz      - .
/pub/genpept/divisions/gpenv1.seq.gz       - Environmental sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpenv9.seq.gz       - .
/pub/genpept/divisions/gpest1.seq.gz       - Expressed sequence tags.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpest738.seq.gz     - .
/pub/genpept/divisions/gpgss1.seq.gz       - Genome Survey Sequence.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpgss290.seq.gz     - .
/pub/genpept/divisions/gphtc1.seq.gz       - High Throughput cDNA.
          .                                  .
          .                                  .
/pub/genpept/divisions/gphtc13.seq.gz      - .
/pub/genpept/divisions/gphtg1.seq.gz       - High Throughput Genome.
          .                                  .
          .                                  .
/pub/genpept/divisions/gphtg117.seq.gz     - .
/pub/genpept/divisions/gpinv1.seq.gz       - Invertebrate sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpinv12.seq.gz      - .
/pub/genpept/divisions/gpmam1.seq.gz       - Other mammalian sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpmam4.seq.gz       - .
/pub/genpept/divisions/gppat1.seq.gz       - Patent sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gppat42.seq.gz      - .
/pub/genpept/divisions/gpphg.seq.gz        - Phage sequences.
/pub/genpept/divisions/gppln1.seq.gz       - Plant sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gppln29.seq.gz      - .
/pub/genpept/divisions/gppri1.seq.gz       - Primate sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gppri36.seq.gz      - .
/pub/genpept/divisions/gprod1.seq.gz       - Rodent sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gprod26.seq.gz      - .
/pub/genpept/divisions/gpsts1.seq.gz       - STS sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpsts14.seq.gz      - .
/pub/genpept/divisions/gpsyn1.seq.gz       - Synthetic and chimeric sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpsyn2.seq.gz       - .
/pub/genpept/divisions/gpuna.seq.gz        - Unannotated sequences.
/pub/genpept/divisions/gpvrl1.seq.gz       - Viral sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpvrl9.seq.gz       - .
/pub/genpept/divisions/gpvrt1.seq.gz       - Other vertebrate sequences.
          .                                  .
          .                                  .
/pub/genpept/divisions/gpvrt15.seq.gz      - .
/pub/genpept/updates/gpseq_updates.dat.gz  - Daily cumulative updates.
/pub/genpept/updates/gpncMMDD.seq.gz       - Daily non-cumulative updates.
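
The naming pattern in the file list above can be sketched in a few lines of
Python. This is only an illustration: the helper names (division_files,
noncumulative_update) and the BASE constant are hypothetical, and reading
"MMDD" as the zero-padded month and day of the update is an assumption, not
something these notes state.

```python
from datetime import date
from typing import List, Optional

BASE = "/pub/genpept"  # directory prefix as it appears in section 2.1


def division_files(division: str, count: Optional[int] = None) -> List[str]:
    """Return the paths for one division.

    count=None models a division shipped as a single unnumbered file
    (gpphg, gpuna); otherwise files are numbered 1..count.
    """
    code = division.lower()
    if count is None:
        return [f"{BASE}/divisions/gp{code}.seq.gz"]
    return [f"{BASE}/divisions/gp{code}{i}.seq.gz" for i in range(1, count + 1)]


def noncumulative_update(day: date) -> str:
    """Path of a daily non-cumulative update file.

    Assumption: MMDD means zero-padded month followed by zero-padded day.
    """
    return f"{BASE}/updates/gpnc{day:%m%d}.seq.gz"
```

For example, division_files("BCT", 29) would list gpbct1.seq.gz through
gpbct29.seq.gz, matching the first and last bacterial files named above.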
2.2 Entries by division: Filename Loci Residues gpbct1 73309 22551097 gpbct10 90811 27029249 gpbct11 89049 28273847 gpbct12 102072 31318494 gpbct13 100228 32543054 gpbct14 103800 33769360 gpbct15 106977 34061041 gpbct16 102839 32679433 gpbct17 100104 33178014 gpbct18 102648 32565373 gpbct19 105272 32239143 gpbct2 102418 30781101 gpbct20 102804 32716512 gpbct21 99927 31392079 gpbct22 104980 32630190 gpbct23 104058 32503227 gpbct24 102984 32403757 gpbct25 100109 31351602 gpbct26 95497 30576040 gpbct27 64993 18546084 gpbct28 55648 13727219 gpbct29 65218 18736685 gpbct3 106203 32811595 gpbct4 88053 26549574 gpbct5 63503 17525309 gpbct6 83042 26004444 gpbct7 86222 26444602 gpbct8 101231 31210832 gpbct9 82292 23742930 gpenv1 18765 3885993 gpenv2 17725 3522557 gpenv3 19512 4374792 gpenv4 10475 1837954 gpenv5 5094 998648 gpenv6 2454 419512 gpenv7 8050 1439265 gpenv8 6209 1265599 gpenv9 2368 423867 gpest1 0 0 gpest10 0 0 gpest100 0 0 gpest101 0 0 gpest102 0 0 gpest103 0 0 gpest104 0 0 gpest105 0 0 gpest106 0 0 gpest107 0 0 gpest108 0 0 gpest109 0 0 gpest11 0 0 gpest110 0 0 gpest111 0 0 gpest112 0 0 gpest113 0 0 gpest114 0 0 gpest115 0 0 gpest116 0 0 gpest117 0 0 gpest118 0 0 gpest119 0 0 gpest12 0 0 gpest120 0 0 gpest121 0 0 gpest122 0 0 gpest123 0 0 gpest124 0 0 gpest125 0 0 gpest126 0 0 gpest127 0 0 gpest128 0 0 gpest129 0 0 gpest13 0 0 gpest130 0 0 gpest131 0 0 gpest132 0 0 gpest133 0 0 gpest134 0 0 gpest135 0 0 gpest136 0 0 gpest137 0 0 gpest138 0 0 gpest139 0 0 gpest14 0 0 gpest140 0 0 gpest141 0 0 gpest142 0 0 gpest143 0 0 gpest144 0 0 gpest145 0 0 gpest146 0 0 gpest147 0 0 gpest148 0 0 gpest149 0 0 gpest15 0 0 gpest150 0 0 gpest151 0 0 gpest152 0 0 gpest153 0 0 gpest154 0 0 gpest155 0 0 gpest156 0 0 gpest157 0 0 gpest158 0 0 gpest159 0 0 gpest16 0 0 gpest160 0 0 gpest161 0 0 gpest162 0 0 gpest163 0 0 gpest164 0 0 gpest165 0 0 gpest166 0 0 gpest167 0 0 gpest168 0 0 gpest169 0 0 gpest17 0 0 gpest170 0 0 gpest171 0 0 gpest172 0 0 gpest173 0 0 gpest174 0 0
gpest175 0 0 gpest176 0 0 gpest177 0 0 gpest178 0 0 gpest179 0 0 gpest18 0 0 gpest180 0 0 gpest181 0 0 gpest182 0 0 gpest183 0 0 gpest184 0 0 gpest185 0 0 gpest186 0 0 gpest187 0 0 gpest188 0 0 gpest189 0 0 gpest19 0 0 gpest190 0 0 gpest191 0 0 gpest192 0 0 gpest193 0 0 gpest194 0 0 gpest195 0 0 gpest196 0 0 gpest197 0 0 gpest198 0 0 gpest199 0 0 gpest2 0 0 gpest20 0 0 gpest200 0 0 gpest201 0 0 gpest202 0 0 gpest203 0 0 gpest204 0 0 gpest205 0 0 gpest206 0 0 gpest207 0 0 gpest208 0 0 gpest209 0 0 gpest21 0 0 gpest210 0 0 gpest211 0 0 gpest212 0 0 gpest213 0 0 gpest214 0 0 gpest215 0 0 gpest216 0 0 gpest217 0 0 gpest218 0 0 gpest219 0 0 gpest22 0 0 gpest220 0 0 gpest221 0 0 gpest222 0 0 gpest223 0 0 gpest224 0 0 gpest225 0 0 gpest226 0 0 gpest227 0 0 gpest228 0 0 gpest229 0 0 gpest23 0 0 gpest230 0 0 gpest231 0 0 gpest232 0 0 gpest233 0 0 gpest234 0 0 gpest235 0 0 gpest236 0 0 gpest237 0 0 gpest238 0 0 gpest239 0 0 gpest24 0 0 gpest240 0 0 gpest241 0 0 gpest242 0 0 gpest243 0 0 gpest244 0 0 gpest245 0 0 gpest246 0 0 gpest247 0 0 gpest248 0 0 gpest249 0 0 gpest25 0 0 gpest250 0 0 gpest251 0 0 gpest252 0 0 gpest253 0 0 gpest254 0 0 gpest255 0 0 gpest256 0 0 gpest257 0 0 gpest258 0 0 gpest259 0 0 gpest26 0 0 gpest260 0 0 gpest261 0 0 gpest262 0 0 gpest263 0 0 gpest264 0 0 gpest265 0 0 gpest266 0 0 gpest267 0 0 gpest268 0 0 gpest269 0 0 gpest27 0 0 gpest270 0 0 gpest271 0 0 gpest272 0 0 gpest273 0 0 gpest274 0 0 gpest275 0 0 gpest276 0 0 gpest277 0 0 gpest278 0 0 gpest279 0 0 gpest28 0 0 gpest280 0 0 gpest281 0 0 gpest282 0 0 gpest283 0 0 gpest284 0 0 gpest285 0 0 gpest286 0 0 gpest287 0 0 gpest288 0 0 gpest289 0 0 gpest29 0 0 gpest290 0 0 gpest291 0 0 gpest292 0 0 gpest293 0 0 gpest294 0 0 gpest295 0 0 gpest296 0 0 gpest297 0 0 gpest298 0 0 gpest299 0 0 gpest3 0 0 gpest30 0 0 gpest300 0 0 gpest301 0 0 gpest302 0 0 gpest303 0 0 gpest304 0 0 gpest305 0 0 gpest306 0 0 gpest307 0 0 gpest308 0 0 gpest309 0 0 gpest31 0 0 gpest310 0 0 gpest311 0 0 gpest312 0 0 gpest313 0 0 
gpest314 0 0 gpest315 0 0 gpest316 0 0 gpest317 0 0 gpest318 0 0 gpest319 0 0 gpest32 0 0 gpest320 0 0 gpest321 0 0 gpest322 0 0 gpest323 0 0 gpest324 0 0 gpest325 0 0 gpest326 0 0 gpest327 0 0 gpest328 0 0 gpest329 0 0 gpest33 0 0 gpest330 0 0 gpest331 0 0 gpest332 0 0 gpest333 0 0 gpest334 0 0 gpest335 0 0 gpest336 0 0 gpest337 0 0 gpest338 0 0 gpest339 0 0 gpest34 0 0 gpest340 0 0 gpest341 0 0 gpest342 0 0 gpest343 0 0 gpest344 0 0 gpest345 0 0 gpest346 0 0 gpest347 0 0 gpest348 0 0 gpest349 0 0 gpest35 0 0 gpest350 0 0 gpest351 0 0 gpest352 0 0 gpest353 0 0 gpest354 0 0 gpest355 0 0 gpest356 0 0 gpest357 0 0 gpest358 0 0 gpest359 0 0 gpest36 0 0 gpest360 0 0 gpest361 0 0 gpest362 0 0 gpest363 0 0 gpest364 0 0 gpest365 0 0 gpest366 0 0 gpest367 0 0 gpest368 0 0 gpest369 0 0 gpest37 0 0 gpest370 0 0 gpest371 0 0 gpest372 0 0 gpest373 0 0 gpest374 0 0 gpest375 0 0 gpest376 0 0 gpest377 0 0 gpest378 0 0 gpest379 0 0 gpest38 0 0 gpest380 0 0 gpest381 0 0 gpest382 0 0 gpest383 0 0 gpest384 0 0 gpest385 0 0 gpest386 0 0 gpest387 0 0 gpest388 0 0 gpest389 0 0 gpest39 0 0 gpest390 0 0 gpest391 0 0 gpest392 0 0 gpest393 0 0 gpest394 0 0 gpest395 0 0 gpest396 0 0 gpest397 0 0 gpest398 0 0 gpest399 0 0 gpest4 0 0 gpest40 0 0 gpest400 0 0 gpest401 0 0 gpest402 0 0 gpest403 0 0 gpest404 0 0 gpest405 0 0 gpest406 0 0 gpest407 0 0 gpest408 0 0 gpest409 0 0 gpest41 0 0 gpest410 0 0 gpest411 0 0 gpest412 0 0 gpest413 0 0 gpest414 0 0 gpest415 0 0 gpest416 0 0 gpest417 0 0 gpest418 0 0 gpest419 0 0 gpest42 0 0 gpest420 0 0 gpest421 0 0 gpest422 0 0 gpest423 0 0 gpest424 0 0 gpest425 0 0 gpest426 0 0 gpest427 0 0 gpest428 0 0 gpest429 0 0 gpest43 0 0 gpest430 0 0 gpest431 0 0 gpest432 0 0 gpest433 0 0 gpest434 0 0 gpest435 0 0 gpest436 0 0 gpest437 0 0 gpest438 0 0 gpest439 0 0 gpest44 0 0 gpest440 0 0 gpest441 0 0 gpest442 0 0 gpest443 0 0 gpest444 0 0 gpest445 0 0 gpest446 0 0 gpest447 0 0 gpest448 0 0 gpest449 0 0 gpest45 0 0 gpest450 0 0 gpest451 0 0 gpest452 0 0 gpest453 0 0 
gpest454 0 0 gpest455 0 0 gpest456 0 0 gpest457 0 0 gpest458 0 0 gpest459 0 0 gpest46 0 0 gpest460 0 0 gpest461 0 0 gpest462 0 0 gpest463 0 0 gpest464 0 0 gpest465 0 0 gpest466 0 0 gpest467 0 0 gpest468 0 0 gpest469 0 0 gpest47 0 0 gpest470 0 0 gpest471 0 0 gpest472 0 0 gpest473 0 0 gpest474 0 0 gpest475 0 0 gpest476 0 0 gpest477 0 0 gpest478 0 0 gpest479 0 0 gpest48 0 0 gpest480 0 0 gpest481 0 0 gpest482 0 0 gpest483 0 0 gpest484 0 0 gpest485 0 0 gpest486 0 0 gpest487 0 0 gpest488 0 0 gpest489 0 0 gpest49 0 0 gpest490 0 0 gpest491 0 0 gpest492 0 0 gpest493 0 0 gpest494 0 0 gpest495 0 0 gpest496 0 0 gpest497 0 0 gpest498 0 0 gpest499 0 0 gpest5 0 0 gpest50 0 0 gpest500 0 0 gpest501 0 0 gpest502 0 0 gpest503 0 0 gpest504 0 0 gpest505 0 0 gpest506 0 0 gpest507 0 0 gpest508 0 0 gpest509 0 0 gpest51 0 0 gpest510 0 0 gpest511 0 0 gpest512 0 0 gpest513 0 0 gpest514 0 0 gpest515 0 0 gpest516 0 0 gpest517 0 0 gpest518 0 0 gpest519 0 0 gpest52 0 0 gpest520 0 0 gpest521 0 0 gpest522 0 0 gpest523 0 0 gpest524 0 0 gpest525 0 0 gpest526 0 0 gpest527 0 0 gpest528 0 0 gpest529 0 0 gpest53 0 0 gpest530 0 0 gpest531 0 0 gpest532 0 0 gpest533 0 0 gpest534 0 0 gpest535 0 0 gpest536 0 0 gpest537 0 0 gpest538 0 0 gpest539 0 0 gpest54 0 0 gpest540 0 0 gpest541 0 0 gpest542 0 0 gpest543 0 0 gpest544 0 0 gpest545 0 0 gpest546 0 0 gpest547 0 0 gpest548 0 0 gpest549 0 0 gpest55 0 0 gpest550 0 0 gpest551 0 0 gpest552 0 0 gpest553 0 0 gpest554 0 0 gpest555 0 0 gpest556 0 0 gpest557 0 0 gpest558 0 0 gpest559 0 0 gpest56 0 0 gpest560 0 0 gpest561 0 0 gpest562 0 0 gpest563 0 0 gpest564 0 0 gpest565 0 0 gpest566 0 0 gpest567 0 0 gpest568 0 0 gpest569 0 0 gpest57 0 0 gpest570 0 0 gpest571 0 0 gpest572 0 0 gpest573 0 0 gpest574 0 0 gpest575 0 0 gpest576 0 0 gpest577 0 0 gpest578 0 0 gpest579 0 0 gpest58 0 0 gpest580 0 0 gpest581 0 0 gpest582 0 0 gpest583 0 0 gpest584 0 0 gpest585 0 0 gpest586 0 0 gpest587 0 0 gpest588 0 0 gpest589 0 0 gpest59 0 0 gpest590 0 0 gpest591 0 0 gpest592 0 0 gpest593 0 0 
gpest594 0 0 gpest595 0 0 gpest596 0 0 gpest597 0 0 gpest598 0 0 gpest599 0 0 gpest6 0 0 gpest60 0 0 gpest600 0 0 gpest601 0 0 gpest602 0 0 gpest603 0 0 gpest604 0 0 gpest605 0 0 gpest606 0 0 gpest607 0 0 gpest608 0 0 gpest609 0 0 gpest61 0 0 gpest610 0 0 gpest611 0 0 gpest612 0 0 gpest613 0 0 gpest614 0 0 gpest615 0 0 gpest616 0 0 gpest617 0 0 gpest618 0 0 gpest619 0 0 gpest62 0 0 gpest620 0 0 gpest621 0 0 gpest622 0 0 gpest623 0 0 gpest624 0 0 gpest625 0 0 gpest626 0 0 gpest627 0 0 gpest628 0 0 gpest629 0 0 gpest63 0 0 gpest630 0 0 gpest631 0 0 gpest632 0 0 gpest633 0 0 gpest634 0 0 gpest635 0 0 gpest636 0 0 gpest637 0 0 gpest638 0 0 gpest639 0 0 gpest64 0 0 gpest640 0 0 gpest641 0 0 gpest642 0 0 gpest643 0 0 gpest644 0 0 gpest645 0 0 gpest646 0 0 gpest647 0 0 gpest648 0 0 gpest649 0 0 gpest65 0 0 gpest650 0 0 gpest651 0 0 gpest652 0 0 gpest653 0 0 gpest654 0 0 gpest655 0 0 gpest656 0 0 gpest657 0 0 gpest658 0 0 gpest659 0 0 gpest66 0 0 gpest660 0 0 gpest661 0 0 gpest662 0 0 gpest663 0 0 gpest664 0 0 gpest665 0 0 gpest666 0 0 gpest667 0 0 gpest668 0 0 gpest669 0 0 gpest67 0 0 gpest670 0 0 gpest671 0 0 gpest672 0 0 gpest673 0 0 gpest674 0 0 gpest675 0 0 gpest676 0 0 gpest677 0 0 gpest678 0 0 gpest679 0 0 gpest68 0 0 gpest680 0 0 gpest681 0 0 gpest682 0 0 gpest683 0 0 gpest684 0 0 gpest685 0 0 gpest686 0 0 gpest687 0 0 gpest688 0 0 gpest689 0 0 gpest69 0 0 gpest690 0 0 gpest691 0 0 gpest692 0 0 gpest693 0 0 gpest694 0 0 gpest695 0 0 gpest696 0 0 gpest697 0 0 gpest698 0 0 gpest699 0 0 gpest7 0 0 gpest70 0 0 gpest700 0 0 gpest701 0 0 gpest702 0 0 gpest703 0 0 gpest704 0 0 gpest705 0 0 gpest706 0 0 gpest707 0 0 gpest708 0 0 gpest709 0 0 gpest71 0 0 gpest710 0 0 gpest711 0 0 gpest712 0 0 gpest713 0 0 gpest714 0 0 gpest715 0 0 gpest716 0 0 gpest717 0 0 gpest718 0 0 gpest719 0 0 gpest72 0 0 gpest720 0 0 gpest721 0 0 gpest722 0 0 gpest723 0 0 gpest724 0 0 gpest725 0 0 gpest726 0 0 gpest727 0 0 gpest728 0 0 gpest729 0 0 gpest73 0 0 gpest730 0 0 gpest731 0 0 gpest732 0 0 
gpest733 0 0 gpest734 0 0 gpest735 0 0 gpest736 0 0 gpest737 0 0 gpest738 0 0 gpest74 0 0 gpest75 0 0 gpest76 0 0 gpest77 0 0 gpest78 0 0 gpest79 0 0 gpest8 0 0 gpest80 0 0 gpest81 0 0 gpest82 0 0 gpest83 0 0 gpest84 0 0 gpest85 0 0 gpest86 0 0 gpest87 0 0 gpest88 0 0 gpest89 0 0 gpest9 0 0 gpest90 0 0 gpest91 0 0 gpest92 0 0 gpest93 0 0 gpest94 0 0 gpest95 0 0 gpest96 0 0 gpest97 0 0 gpest98 0 0 gpest99 0 0 gpgss1 0 0 gpgss10 0 0 gpgss100 0 0 gpgss101 0 0 gpgss102 0 0 gpgss103 0 0 gpgss104 0 0 gpgss105 0 0 gpgss106 0 0 gpgss107 0 0 gpgss108 0 0 gpgss109 0 0 gpgss11 0 0 gpgss110 0 0 gpgss111 0 0 gpgss112 0 0 gpgss113 0 0 gpgss114 0 0 gpgss115 0 0 gpgss116 0 0 gpgss117 0 0 gpgss118 0 0 gpgss119 0 0 gpgss12 0 0 gpgss120 0 0 gpgss121 0 0 gpgss122 0 0 gpgss123 0 0 gpgss124 0 0 gpgss125 0 0 gpgss126 0 0 gpgss127 0 0 gpgss128 0 0 gpgss129 0 0 gpgss13 0 0 gpgss130 0 0 gpgss131 0 0 gpgss132 0 0 gpgss133 0 0 gpgss134 0 0 gpgss135 0 0 gpgss136 0 0 gpgss137 0 0 gpgss138 0 0 gpgss139 0 0 gpgss14 0 0 gpgss140 0 0 gpgss141 0 0 gpgss142 0 0 gpgss143 0 0 gpgss144 0 0 gpgss145 0 0 gpgss146 0 0 gpgss147 0 0 gpgss148 0 0 gpgss149 0 0 gpgss15 0 0 gpgss150 0 0 gpgss151 0 0 gpgss152 0 0 gpgss153 0 0 gpgss154 0 0 gpgss155 0 0 gpgss156 0 0 gpgss157 0 0 gpgss158 0 0 gpgss159 0 0 gpgss16 0 0 gpgss160 0 0 gpgss161 0 0 gpgss162 0 0 gpgss163 0 0 gpgss164 0 0 gpgss165 0 0 gpgss166 0 0 gpgss167 0 0 gpgss168 0 0 gpgss169 0 0 gpgss17 0 0 gpgss170 0 0 gpgss171 0 0 gpgss172 0 0 gpgss173 0 0 gpgss174 0 0 gpgss175 0 0 gpgss176 0 0 gpgss177 0 0 gpgss178 0 0 gpgss179 0 0 gpgss18 0 0 gpgss180 0 0 gpgss181 0 0 gpgss182 0 0 gpgss183 0 0 gpgss184 0 0 gpgss185 0 0 gpgss186 0 0 gpgss187 0 0 gpgss188 0 0 gpgss189 0 0 gpgss19 0 0 gpgss190 0 0 gpgss191 0 0 gpgss192 0 0 gpgss193 0 0 gpgss194 0 0 gpgss195 0 0 gpgss196 0 0 gpgss197 0 0 gpgss198 0 0 gpgss199 0 0 gpgss2 0 0 gpgss20 0 0 gpgss200 0 0 gpgss201 0 0 gpgss202 0 0 gpgss203 0 0 gpgss204 0 0 gpgss205 0 0 gpgss206 0 0 gpgss207 0 0 gpgss208 0 0 gpgss209 0 0 
gpgss21 0 0 gpgss210 0 0 gpgss211 0 0 gpgss212 0 0 gpgss213 0 0 gpgss214 0 0 gpgss215 0 0 gpgss216 0 0 gpgss217 0 0 gpgss218 0 0 gpgss219 0 0 gpgss22 0 0 gpgss220 0 0 gpgss221 0 0 gpgss222 0 0 gpgss223 0 0 gpgss224 0 0 gpgss225 0 0 gpgss226 0 0 gpgss227 0 0 gpgss228 0 0 gpgss229 0 0 gpgss23 0 0 gpgss230 0 0 gpgss231 0 0 gpgss232 0 0 gpgss233 0 0 gpgss234 0 0 gpgss235 0 0 gpgss236 0 0 gpgss237 0 0 gpgss238 0 0 gpgss239 0 0 gpgss24 0 0 gpgss240 0 0 gpgss241 0 0 gpgss242 0 0 gpgss243 0 0 gpgss244 0 0 gpgss245 0 0 gpgss246 0 0 gpgss247 0 0 gpgss248 0 0 gpgss249 0 0 gpgss25 0 0 gpgss250 0 0 gpgss251 0 0 gpgss252 61 12916 gpgss253 0 0 gpgss254 0 0 gpgss255 0 0 gpgss256 0 0 gpgss257 0 0 gpgss258 0 0 gpgss259 0 0 gpgss26 0 0 gpgss260 0 0 gpgss261 0 0 gpgss262 0 0 gpgss263 0 0 gpgss264 0 0 gpgss265 0 0 gpgss266 0 0 gpgss267 0 0 gpgss268 0 0 gpgss269 0 0 gpgss27 0 0 gpgss270 0 0 gpgss271 0 0 gpgss272 0 0 gpgss273 0 0 gpgss274 0 0 gpgss275 0 0 gpgss276 0 0 gpgss277 0 0 gpgss278 0 0 gpgss279 0 0 gpgss28 0 0 gpgss280 0 0 gpgss281 0 0 gpgss282 0 0 gpgss283 0 0 gpgss284 0 0 gpgss285 0 0 gpgss286 0 0 gpgss287 0 0 gpgss288 0 0 gpgss289 0 0 gpgss29 0 0 gpgss290 0 0 gpgss3 0 0 gpgss30 0 0 gpgss31 0 0 gpgss32 0 0 gpgss33 0 0 gpgss34 0 0 gpgss35 0 0 gpgss36 0 0 gpgss37 0 0 gpgss38 0 0 gpgss39 0 0 gpgss4 0 0 gpgss40 0 0 gpgss41 0 0 gpgss42 0 0 gpgss43 0 0 gpgss44 0 0 gpgss45 0 0 gpgss46 0 0 gpgss47 0 0 gpgss48 0 0 gpgss49 0 0 gpgss5 0 0 gpgss50 0 0 gpgss51 0 0 gpgss52 0 0 gpgss53 0 0 gpgss54 0 0 gpgss55 0 0 gpgss56 0 0 gpgss57 0 0 gpgss58 0 0 gpgss59 0 0 gpgss6 0 0 gpgss60 0 0 gpgss61 0 0 gpgss62 0 0 gpgss63 0 0 gpgss64 0 0 gpgss65 0 0 gpgss66 0 0 gpgss67 0 0 gpgss68 0 0 gpgss69 0 0 gpgss7 0 0 gpgss70 0 0 gpgss71 0 0 gpgss72 0 0 gpgss73 0 0 gpgss74 0 0 gpgss75 0 0 gpgss76 0 0 gpgss77 0 0 gpgss78 0 0 gpgss79 0 0 gpgss8 0 0 gpgss80 0 0 gpgss81 0 0 gpgss82 0 0 gpgss83 0 0 gpgss84 0 0 gpgss85 0 0 gpgss86 0 0 gpgss87 0 0 gpgss88 0 0 gpgss89 0 0 gpgss9 0 0 gpgss90 0 0 gpgss91 0 0 gpgss92 0 0 
gpgss93 0 0 gpgss94 0 0 gpgss95 0 0 gpgss96 0 0 gpgss97 0 0 gpgss98 0 0 gpgss99 0 0 gphtc1 9622 2376172 gphtc10 713 352883 gphtc11 105 56038 gphtc12 8074 2905579 gphtc13 6682 1041106 gphtc2 6107 2409279 gphtc3 5881 2325978 gphtc4 5734 2190624 gphtc5 7441 3334462 gphtc6 9229 3742950 gphtc7 4274 1747938 gphtc8 12147 3620108 gphtc9 6963 1375269 gphtg1 1 712 gphtg10 0 0 gphtg100 0 0 gphtg101 91 50947 gphtg102 228 125633 gphtg103 1345 1035424 gphtg104 2562 689104 gphtg105 11 4597 gphtg106 19266 10107356 gphtg107 16 7139 gphtg108 0 0 gphtg109 4268 1240469 gphtg11 0 0 gphtg110 0 0 gphtg111 0 0 gphtg112 0 0 gphtg113 0 0 gphtg114 0 0 gphtg115 0 0 gphtg116 0 0 gphtg117 86 21890 gphtg12 0 0 gphtg13 0 0 gphtg14 0 0 gphtg15 7 3848 gphtg16 0 0 gphtg17 0 0 gphtg18 0 0 gphtg19 26 14030 gphtg2 0 0 gphtg20 0 0 gphtg21 0 0 gphtg22 0 0 gphtg23 0 0 gphtg24 0 0 gphtg25 0 0 gphtg26 10 3482 gphtg27 14 5360 gphtg28 19 15452 gphtg29 0 0 gphtg3 0 0 gphtg30 0 0 gphtg31 0 0 gphtg32 23 11441 gphtg33 0 0 gphtg34 0 0 gphtg35 0 0 gphtg36 0 0 gphtg37 0 0 gphtg38 0 0 gphtg39 0 0 gphtg4 0 0 gphtg40 0 0 gphtg41 0 0 gphtg42 0 0 gphtg43 0 0 gphtg44 0 0 gphtg45 0 0 gphtg46 33 14631 gphtg47 1 548 gphtg48 0 0 gphtg49 0 0 gphtg5 0 0 gphtg50 40 14183 gphtg51 0 0 gphtg52 0 0 gphtg53 0 0 gphtg54 1 243 gphtg55 0 0 gphtg56 0 0 gphtg57 0 0 gphtg58 0 0 gphtg59 0 0 gphtg6 0 0 gphtg60 0 0 gphtg61 0 0 gphtg62 0 0 gphtg63 0 0 gphtg64 0 0 gphtg65 0 0 gphtg66 0 0 gphtg67 0 0 gphtg68 0 0 gphtg69 0 0 gphtg7 0 0 gphtg70 316 99716 gphtg71 0 0 gphtg72 0 0 gphtg73 59 22011 gphtg74 0 0 gphtg75 0 0 gphtg76 71 22015 gphtg77 0 0 gphtg78 0 0 gphtg79 0 0 gphtg8 0 0 gphtg80 0 0 gphtg81 0 0 gphtg82 0 0 gphtg83 0 0 gphtg84 0 0 gphtg85 0 0 gphtg86 0 0 gphtg87 0 0 gphtg88 0 0 gphtg89 0 0 gphtg9 0 0 gphtg90 0 0 gphtg91 0 0 gphtg92 0 0 gphtg93 0 0 gphtg94 0 0 gphtg95 0 0 gphtg96 0 0 gphtg97 0 0 gphtg98 0 0 gphtg99 0 0 gpinv1 21486 6732146 gpinv10 47613 10738302 gpinv11 49763 11146365 gpinv12 31561 12917097 gpinv2 3377 1578896 gpinv3 21893 
12509170 gpinv4 41294 14215972 gpinv5 36458 9682072 gpinv6 42410 17453880 gpinv7 47680 12457548 gpinv8 45781 12565763 gpinv9 46887 11457933 gpmam1 5715 1486868 gpmam2 40707 9589683 gpmam3 22086 6676002 gpmam4 17645 4498575 gppat1 5126 1505189 gppat10 3802 1496173 gppat11 11555 2102327 gppat12 0 0 gppat13 0 0 gppat14 0 0 gppat15 0 0 gppat16 0 0 gppat17 0 0 gppat18 0 0 gppat19 1572 581256 gppat2 0 0 gppat20 3250 1217682 gppat21 3974 1283047 gppat22 5909 1976221 gppat23 3782 1503895 gppat24 192 63977 gppat25 0 0 gppat26 0 0 gppat27 0 0 gppat28 0 0 gppat29 0 0 gppat3 0 0 gppat30 0 0 gppat31 0 0 gppat32 0 0 gppat33 0 0 gppat34 0 0 gppat35 0 0 gppat36 0 0 gppat37 0 0 gppat38 0 0 gppat39 0 0 gppat4 0 0 gppat40 0 0 gppat41 0 0 gppat42 93 31001 gppat5 0 0 gppat6 5371 1766684 gppat7 4608 1631874 gppat8 3278 1321570 gppat9 4019 1475963 gpphg 40013 8321524 gppln1 34707 12166884 gppln10 25125 10236718 gppln11 7118 3432255 gppln12 6083 2897058 gppln13 6129 2940865 gppln14 16052 5400176 gppln15 24831 7067179 gppln16 27046 9899867 gppln17 13090 5199250 gppln18 10117 4024972 gppln19 25688 8405348 gppln2 21646 9468890 gppln20 31667 8953115 gppln21 31556 8158270 gppln22 43475 17001928 gppln23 19415 9266292 gppln24 33010 11377832 gppln25 33922 8489778 gppln26 28708 6788348 gppln27 34138 7617719 gppln28 32692 7717186 gppln29 36668 13976422 gppln3 8624 3365727 gppln4 1514 413909 gppln5 29700 12086307 gppln6 29841 8269407 gppln7 25297 7777860 gppln8 1945 601197 gppln9 22066 9338586 gppri1 21771 7853443 gppri10 165 52278 gppri11 155 45739 gppri12 129 49780 gppri13 88 27447 gppri14 34 9798 gppri15 8 1830 gppri16 0 0 gppri17 0 0 gppri18 0 0 gppri19 7236 2190616 gppri2 688 268986 gppri20 27109 7948318 gppri21 16714 4596635 gppri22 11046 4815697 gppri23 3234 1659863 gppri24 2780 1547111 gppri25 2596 1480340 gppri26 2823 1612953 gppri27 5241 2126890 gppri28 8473 1311261 gppri29 26101 7575392 gppri3 312 121019 gppri30 33986 8846510 gppri31 24354 10222690 gppri32 8363 3612540 gppri33 26579 
6525213 gppri34 46506 11893259 gppri35 24419 7007479 gppri36 493 237417 gppri4 189 76586 gppri5 319 127773 gppri6 82 32799 gppri7 33 10376 gppri8 261 91546 gppri9 105 33698 gprod1 9022 2984736 gprod10 0 0 gprod11 0 0 gprod12 0 0 gprod13 0 0 gprod14 0 0 gprod15 0 0 gprod16 0 0 gprod17 7498 2545675 gprod18 20296 8086936 gprod19 3748 1921562 gprod2 0 0 gprod20 3029 1588514 gprod21 11840 4334240 gprod22 30691 11941376 gprod23 17402 7454496 gprod24 2116 946364 gprod25 9376 2502123 gprod26 26614 7490601 gprod3 4 2148 gprod4 1 38 gprod5 1 120 gprod6 0 0 gprod7 0 0 gprod8 0 0 gprod9 0 0 gpsts1 1 66 gpsts10 0 0 gpsts11 0 0 gpsts12 0 0 gpsts13 4 432 gpsts14 4 314 gpsts2 0 0 gpsts3 0 0 gpsts4 0 0 gpsts5 0 0 gpsts6 0 0 gpsts7 0 0 gpsts8 0 0 gpsts9 0 0 gpsyn1 31277 12065109 gpsyn2 12998 4773091 gpuna 44 5771 gpvrl1 84073 19797167 gpvrl2 81736 19683696 gpvrl3 79230 19173552 gpvrl4 77474 20032954 gpvrl5 69119 24279840 gpvrl6 82851 22436354 gpvrl7 76020 20268807 gpvrl8 72159 23291276 gpvrl9 47242 11781408 gpvrt1 13931 4138011 gpvrt10 960 428298 gpvrt11 2585 966291 gpvrt12 2041 899388 gpvrt13 11027 2676963 gpvrt14 46281 10925394 gpvrt15 49955 11490060 gpvrt2 28801 6715510 gpvrt3 16905 4872384 gpvrt4 39564 9620299 gpvrt5 40910 12198431 gpvrt6 24264 9981590 gpvrt7 1448 680010 gpvrt8 1555 773655 gpvrt9 1877 944744 3. FILE FORMAT 3.1 File Header Information All of the files in the distribution begin with the same header, except for the first line, which contains the division name, the fifth line, which contains a description of the file contents, and the seventh line which contains the number of loci and residues in the file. The first line of the file contains the division name in character positions 1 to 9 and the full data bank name (Genetic Sequence Data Bank) starting in column 16. The brief names of the files in this release are listed in section 2.1. The second line contains the date of the current release in the form month-day-year, "MM-DD-YYYY". 
The fourth line contains the current GenPept release number. The release
number consists of two numbers separated by a decimal point. The number to
the left of the decimal is the major release number. The digit to the right
of the decimal indicates the version of the major release; it is zero for
the first version.

The fifth line contains a title for the file.

The seventh line lists the number of entries (loci) and the number of
residues in this release of GenPept.

1-------10--------20--------30--------40--------50--------60--------70------78
gpbct1         Genetic Sequence Data Bank
               06-16-2008

               GenPept Release 166.0
               Translated Protein-coding Sequences

               73309 loci containing 22551097 residues
1-------10--------20--------30--------40--------50--------60--------70------78

Example 1. Sample File Header

3.2 Sequence Entry Files

GenPept entries are derived from entries in the GenBank nucleotide sequence
data bank. They contain minimal annotation, primarily extracted from the
corresponding GenBank entries. For the complete annotations, refer to the
GenBank entry or entries referenced by the accession number(s) in the
GenPept entry.

3.2.1 Entry Organization

Each record (one record = one line) consists of two parts. The first part
is found in positions 1 to 10 and may contain:

1. A keyword, beginning in column 1 of the record (e.g., DEFINITION is a
   keyword).

2. Blank characters, indicating that this record is a continuation of the
   information under the keyword above it.

3. A number, ending in column 9 of the record. This number occurs in the
   portion of the entry containing the actual amino acid sequence and
   designates the numbering of sequence positions.

4. Two slashes (//) in positions 1 and 2, marking the end of an entry.

The second part of each sequence entry record (line) contains the
information appropriate to its keyword, in positions 13 to 80 for keywords
and positions 11 to 80 for the sequence.

The following is a brief description of each entry field.

LOCUS       The entry name. It consists of the accession number of the
            GenBank nucleotide sequence entry (or entries) from which this
            product was translated, followed by an underscore character
            ( _ ) and a number indicating which coding region (CDS) in the
            feature table of the original GenBank entry was used for this
            translation. The number is determined by assigning a number to
            each CDS according to its order of appearance in the original
            GenBank entry's feature table.

DEFINITION  This field contains the feature as it appeared in the original
            GenBank entry that was translated to produce the sequence in
            this GenPept entry. If the GenBank CDS had a "/note" qualifier,
            the text of this qualifier is placed on a continuation line as
            part of the GenPept DEFINITION record.

DATE        Entry date for the GenBank locus used to create this record.

ACCESSION   Primary accession numbers of all the GenBank entries from
            which this GenPept entry was created.

VERSION     A compound identifier consisting of the GenPept Locus and a
            numeric version number associated with the current version of
            the sequence data in the record. This is followed by an integer
            key (a "GI") assigned to the peptide sequence.
            Mandatory keyword/exactly one record.

KEYWORDS    Short phrases describing gene products and other information,
            taken directly from the corresponding GenBank entry.
            Mandatory keyword in all annotated entries/one or more records.

SOURCE      Common name of the organism or the name most frequently used
            in the literature.
            Mandatory keyword in all annotated entries/one or more
            records/includes one subkeyword.

ORGANISM    Source organism of the nucleic acid sequence.

COMMENT     This field identifies the coding regions translated to make
            this protein. It reproduces the relevant lines from the Feature
            tables of the GenBank data bank entries.

WEIGHT      Protein molecular weight calculated from the sequence.

PI          Isoelectric point.
            Mandatory keyword/exactly one record.

LENGTH      Protein length in amino acid residues.
ORIGIN      Indication of codon phase used in translation.

The sequence immediately follows the ORIGIN line. It uses the IUPAC-IUB
one-letter amino acid codes (see Appendix A). The first 9 columns in each
line are reserved for a right-justified integer representing the residue
number of the first amino acid on the line. Column 10 is blank and the
sequence begins in column 11. The sequence is presented with up to 60
residues per line, in groups of 10 residues separated by spaces. Note that
"?"s in GenBank entries' /translation qualifier sequences are converted to
"X"s in GenPept. Residues are in uppercase.

//          A double slash marks the end of each entry. The next entry
            begins on the following line.

3.2.2 Sample Sequence Data File

An example of a complete sequence entry follows.

1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS       Z31371_1      A7120FTSZ           428 aa PEP linear BCT 18-APR-2005
DEFINITION  Anabaena 7120 ftsZ and gsh-III genes.
DATE        18-APR-2005
ACCESSION   Z31371
VERSION     Z31371_1.1 GI:1100794
KEYWORDS    FtsZ protein.
SOURCE      Nostoc sp. PCC 7120 (Anabaena sp. PCC 7120)
  ORGANISM  Nostoc sp. PCC 7120
            Bacteria; Cyanobacteria; Nostocales; Nostocaceae; Nostoc.
COMMENT     CDS 385..1671
            /transl_table=11
            /product="FtsZ"
            /protein_id="CAA83241.1"
            /db_xref="GI:1100794"
            /db_xref="GOA:P45482"
            /db_xref="InterPro:IPR000158"
            /db_xref="InterPro:IPR003008"
            /db_xref="InterPro:IPR008280"
            /db_xref="UniProtKB/Swiss-Prot:P45482"
            /note="environmental isolate"
            /NucGI="1100793"
WEIGHT      44730.04
PI          5.12
LENGTH      428
ORIGIN      Translated using phase 1
        1 MTLDNNQELT YRNSQSLGQP GFSLAVNSSN PFNHSGLNFG QNNDSKKISV ENNRIGEIVP
       61 GRVANIKVIG VGGGGGNAVN RMIESDVSGV EFWSINTDAQ ALTLAGAPSR LQIGQKLTRG
      121 LGAGGNPAIG QKAAEESRDE IATALEGADL VFITAGMGGG TGTGAAPIVA EVAKEMGALT
      181 VGVVTRPFVF EGRRRTSQAE QGIEGLKSRV DTLIIIPNNK LLEVIPEQTP VQEAFRYADD
      241 VLRQGVQGIS DIITIPGLVN VDFADVRAVM ADAGSALMGI GVSSGKSRAR EAAIAAISSP
      301 LLECSIEGAR GVVFNITGGS DLTLHEVNAA AETIYEVVDP NANIIFGAVI DDRLQGEVRI
      361 TVIATGFTGE IQAAPQQNAA NARVVSAPPK RTPTQTPLTN SPAPTPEPKE KSGLDIPDFL
      421 QRRRPPKN
//
1-------10--------20--------30--------40--------50--------60--------70------78

Example 2. Sample Sequence Data File


4 Trademarks, citations, etc.

4.1 Registered Trademark Notices

GenBank (R) is a registered trademark of the U.S. Department of Health and
Human Services for the Genetic Sequence Data Bank.

GenPept (R) is a registered trademark of the U.S. Department of Health and
Human Services for the GenBank Gene Products Data Bank.

4.2 Citing GenPept

If you have used GenPept in your research, please include a reference to
the database in all publications related to that research. For instance:

1. GenPept (GenBank Gene Products) Database. Distributed on the Internet
   via anonymous FTP from ftp.ncifcrf.gov, under the auspices of the
   National Cancer Institute's Advanced Biomedical Computing Center.

When citing data in GenPept, it is appropriate to give the sequence name,
release number, and the publication in which the parent GenBank sequence
first appeared.

It is also appropriate to list a reference for GenBank itself, since
GenPept is derived from the GenBank data.
The following publication, which describes the GenBank data bank, should
be cited:

   Burks, C., Cassidy, M., Cinkosky, M.J., Cumella, K.E., Gilna, P.,
   Hayden, J.E-D., Keen, G.M., Kelley, T.A., Kelly, M., Kristofferson, D.,
   and Ryals, J. GenBank. Nucl. Acids Res. 19 (Suppl):2221-2225(1991)

4.3 GenPept Distribution Format

The GenPept data bank is available by anonymous FTP from ftp.ncifcrf.gov.

4.4 Disclaimer

Science Applications International Corp. and the United States Government
make no representations or warranties regarding the content or accuracy of
this information. Science Applications International Corp. and the United
States Government also make no representations or warranties of
merchantability or fitness for a particular purpose and accept no
responsibility for any consequences of the receipt or use of the
information.


APPENDIX A - IUPAC-IUB AMINO ACID CODES

Code  Amino Acid
----  ----------
A     Alanine (ala)
R     Arginine (arg)
N     Asparagine (asn)
D     Aspartic acid (asp)
C     Cysteine (cys)
Q     Glutamine (gln)
E     Glutamic acid (glu)
G     Glycine (gly)
H     Histidine (his)
I     Isoleucine (ile)
L     Leucine (leu)
K     Lysine (lys)
M     Methionine (met)
F     Phenylalanine (phe)
P     Proline (pro)
S     Serine (ser)
T     Threonine (thr)
U     Selenocysteine
W     Tryptophan (trp)
Y     Tyrosine (tyr)
V     Valine (val)
B     Aspartic acid or Asparagine (asx)
Z     Glutamic acid or Glutamine (glx)
X     Any amino acid (xxx)
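
As an illustration of how the fixed-column LOCUS layout (section 1.4.2), the
sequence line layout (section 3.2.1), and the code table above fit together,
here is a minimal Python sketch. It is not part of the GenPept distribution:
the function names parse_locus and parse_sequence are hypothetical, and the
sketch assumes the lines have kept their original column spacing.

```python
# One-letter codes exactly as listed in Appendix A.
AMINO_ACID_CODES = set("ARNDCQEGHILKMFPSTUWYVBZX")


def parse_locus(line):
    """Slice a LOCUS line using the 1-based column ranges of section 1.4.2."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    return {
        "genpept_locus": line[12:25].strip(),  # columns 13-25
        "genbank_locus": line[26:42].strip(),  # columns 27-42
        "length": int(line[43:49]),            # columns 44-49
        "division": line[64:67].strip(),       # columns 65-67
        "date": line[68:79].strip(),           # columns 69-79
    }


def parse_sequence(lines):
    """Join the lines that follow ORIGIN into one residue string.

    Stops at the '//' terminator and rejects any character that is not
    one of the Appendix A codes.
    """
    residues = []
    for line in lines:
        if line.startswith("//"):
            break
        # Columns 1-9 hold the residue number and column 10 is blank,
        # so the residues start at column 11 (index 10).
        chunk = line[10:].replace(" ", "").strip()
        unexpected = set(chunk) - AMINO_ACID_CODES
        if unexpected:
            raise ValueError(f"unexpected residue codes: {sorted(unexpected)}")
        residues.append(chunk)
    return "".join(residues)


# Build a LOCUS line field by field so the column positions are explicit.
# Right-justifying the length within columns 44-49 is an assumption.
sample_locus = (
    "LOCUS" + " " * 7
    + "Z31371_1".ljust(13) + " "
    + "A7120FTSZ".ljust(16) + " "
    + "428".rjust(6)
    + " aa PEP linear BCT 18-APR-2005"
)
locus_fields = parse_locus(sample_locus)
```

Slicing by column rather than splitting on whitespace is a deliberate
choice here: a split would misread an entry whose locus name fills its
field completely. A caller could additionally compare the length of the
string returned by parse_sequence with the LENGTH record as a consistency
check.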