/*---------------------------------------------------------------------------- Program reticulate.c (Compatibility matrix) Author Ingrid Jakobsen X11 code Hugh Fisher Copyright 1995 by Ingrid Jakobsen, Australian National University Permission is granted to use, copy, modify and distribute this program provided no fee is charged, and this copyright notice is not removed. No representations are made about the suitability of this software for any purpose. Notes on modifying the program code *********************************** The #define statements at the beginning of the program may need to be modified for your requirements before it is compiled. Apart from PostScript modifications, they assume familiarity with the program, at least as far as judging the need for modification. PostScript If you need to adjust the program's PostScript output for your local papersize, you need to change the following three #define. Here are the point values for some common papersizes: A4 US letter Folio Legal A3 A5 B4 B5 PAPERWIDTH 593 611 593 611 841 419 728 515 PAPERLENGTH 841 791 934 1007 1190 593 1031 728 Note that these may not be the values you've seen elsewhere (they are generally 1 point smaller). PAPEREDGE This specifies how many points around the edges of the piece of paper will be left blank as a minimum. Most laserprinters in my experience are not capable of printing much closer than 15 point from the edge, but PAPEREDGE can be increased if larger margins are desirable (for publication for example). NAMELEN The input file format allows each name line to be as long as you like (okay, 160 characters) however only NAMELEN characters are actually stored by the computer as the sequence name. MATXLIM Once the compatibility matrix has been generated, it is possible to create randomisations of the matrix. MATXLIM is number of matrices the program can store. It can generate as many as desired, they are however destroyed once the statistics have been squeezed out of them. REGIONS The user can define regions of the sequence to compare compatibility as observed to random matrices. This specifies the maximum number of regions the sequence can be divided into. *********************************** Summary : A compatibility matrix is a square matrix containing entries for all pairwise comparisons of parsimoniously meaningful sites in a DNA (or protein) sequence. The entry is white if the sites are compatible - i.e. they can be fitted on the same phylogenetic tree with a minimum of changes. The entry is black if they are not compatible: no tree can be constructed from the data at these two sites, without assuming multiple changes. The intention is that it makes it easy to detect visually which sites contain homoplasy (the same mutation has occurred in several lineages) and more importantly, I hope that recombination and gene conversion (segmental transfer, whatever) will become more easy to detect. A "parsimoniously informative site" must have at least two different states, each occurring in at least two sequences. For example, a point mutation in one sequence can support any tree and is not interesting for this analysis. The matrix is displayed on screen using the X11 graphics interface. Output is written as a postscript file, with the squares coloured white for compatibility marked with the letter "i", and black as "m". The postcript file itself can be read reasonably comfortably particularly for small matrices. Statistics for the matrix are calculated by comparsion to random shufflings of the sites making up the matrix. The Neighbour Similarity Score is the number of adjacent squares of the same colour (compatible or not compatible). Otherwise, the user divides the sequence into regions and the compatibility of the regions and between the regions are calculated, also the compatibility of the regions when the matrix is randomised. ----------------------------------------------------------------------------*/ #include #include #include #include #include /* only required for calculations of standard deviations, currently omitted #include */ /* The following three lines are for X11 graphic display */ #include #include #include "XPlot.h" #define NAMELEN 15 /* allow 15 character names */ #define MATXLIM 40 /* allow 40 matrices to be built */ #define REGIONS 10 /* allow 10 regions of sequence forming squares around the diagonal. used in subroutine Shuffle() only */ /* #define for postscript output - paper size. */ #define PAPERWIDTH 593 /* 593 = The width of A4 paper in points */ #define PAPERLENGTH 791 /* 791 = The length of US letter in points */ #define PAPEREDGE 15 /* 15 = Minimum edges not to print on */ /* Function Declartions */ /* Reading in the data */ FILE *GetNumandLen (int *Num, int *Len, char *Marker, int *gdeformat); int ReadFormat (FILE *input, int NumSeq, int SeqLen, char Marker, int gdeformat, char **names, char **Data); /* finding the parsimonious sites and constructing the matrix */ int FindSites(int Num, int Len, char **Data, int *codons); int MakeMatrix(int Num, char **Data, int codoncount, int *codons, char **matrix); int MaxButOne(int i, int *array); int Consistency(int max, char *pairs); /* Functions related to randomisation and Neighbour Similarity Score */ int Shuffle(int codonnum, int *codons, char **matrix, char **output); void Whiteness(char **data, int bits, int *starts, float *result); void OffWhite(char **data, int bits, int *starts, float (*result)[REGIONS]); double CombRand(int seed); float Neighbour(char **matrix, int size); /* Output */ int SiteOutput(int codoncount, int Num, int *codons, char **Data, char **names); int WritePostscript (int NumSeq, int SeqLen, int codonnum, int *codons, char **matrix); int PostscriptBlurb(FILE *Out, int font, int codonnum, int psstate); void DrawPSBox(FILE *Out, int lhs, int font, int max); /* Memory allocation */ void *MyMalloc (size_t size); char **MxMalloc (int size); float **FloatMalloc (int repeats, int boxes); char datasetname[101]; int main (void) { int NumSeq = 0, SeqLen = 0; int RetCode; char **seq_names; char **store_seq; char Marker = 'A'; int gdeform = 0; int *codon_pos; int codonnum; char **site_matrix[MATXLIM]; char Description[80] = "Original matrix prior to any randomisation"; int matx = 1; int into = 0; char option = 'd'; char optst[16]; int i, j; FILE *Data; /* Find out how many sequences and how long they are */ Data = GetNumandLen (&NumSeq, &SeqLen, &Marker, &gdeform); if (Data == 0) return(1); /* Try to set up matrices of the appropiate size. */ seq_names = (char **)MyMalloc( sizeof(char *) * NumSeq ); store_seq = (char **)MyMalloc( sizeof(char *) * NumSeq); for (i=0; i < NumSeq; i++) { seq_names[i] = (char *)MyMalloc(sizeof(char) * (NAMELEN+1) ); store_seq[i] = (char *)MyMalloc(sizeof(char) * SeqLen); } codon_pos = (int *)MyMalloc( sizeof(int) * (SeqLen+1) ); for (i=0; i <= SeqLen; i++) codon_pos[i] = 0; /* Read in the data for analysis */ RetCode = ReadFormat (Data, NumSeq, SeqLen, Marker, gdeform, seq_names, store_seq); if (RetCode) /* something went wrong reading the data */ return(RetCode); /* Get a description of the dataset to add to output */ printf("Please enter a short description of the dataset, 50 chars max. "); printf("Don't use ()\nFor example: Human mitochondrial D-loop sequences"); printf("\nDescription:> "); fgets(datasetname, 100, stdin); /* Find interesting sites, then allocate matrix of correct size. */ codonnum = FindSites(NumSeq, SeqLen, store_seq, codon_pos); if (codonnum < 2) { printf("%d site found. No further analysis attempted\n", codonnum); return(0); } site_matrix[into] = (char **)MyMalloc( sizeof(char *) * (codonnum) ); for (i=0; i < codonnum; i++) site_matrix[into][i] = (char *)MyMalloc(sizeof(char) * (codonnum) ); /* Build the matrix */ RetCode = MakeMatrix(NumSeq, store_seq, codonnum, codon_pos, site_matrix[into]); /* Ask the user if they want the interesting sites saved, and if so, do so */ RetCode = SiteOutput(codonnum, NumSeq, codon_pos, store_seq, seq_names); if (RetCode) /* something went wrong writing the data */ return(RetCode); /* We don't need the actual sequences anymore. free the space so we have more room to build matrices :-) */ for ( i = 0 ; i < NumSeq; i++) free(store_seq[i]); free( store_seq ); /* interactive section. You can display the matrix, save the matrix as postscript, and play around with randomisation of the matrix */ while (option != 'q') { if (option == 'd') /* Display the matrix on screen */ { printf("\n\nThe matrix will be displayed in a separate window."); printf(" When you have examined\nit, click the mouse once in the "); printf("window to continue the program.\n"); RetCode = ShowXPlot(NumSeq, SeqLen, codonnum, codon_pos, site_matrix[into]); if (RetCode) printf("Unable to display matrix. \n"); } else if (option == 'p') /* postscript output */ { RetCode = WritePostscript (NumSeq, SeqLen, codonnum, codon_pos, site_matrix[into]); } else if (option == 'c') /*list the matrices built so far */ { printf("\nRandom matrices built so far: %d\n\n", matx-1); printf("%c%d %s\n", (into == 0)?'*':' ',0, Description); for (i = 1; i < matx; i++) printf("%c%d Random matrix\n", (into == i)?'*':' ', i); printf("Enter the number of the matrix you wish to use > "); fgets(optst, 15, stdin); into = atoi(optst); printf("\nMatrix now set to number %d\n\n", into); } else if (option == 'r' && (matx < MATXLIM)) /* shuffling randomisation and neighbour sim */ { site_matrix[matx] = MxMalloc(codonnum); if (site_matrix[matx]) { RetCode = Shuffle(codonnum, codon_pos, site_matrix[into], site_matrix[matx]); if (RetCode == 0) { matx++; if (matx == MATXLIM) printf("That was the last matrix that can be built\n"); } } else printf("Unable to allocate memory for randomised matrix\n"); } else printf("\nThat is not an available option, try again\n"); printf("\nYou may now chose to:\n"); printf(" (d) Display the current matrix on screen (X display)\n"); printf(" (p) Save the current matrix as a postscript file\n"); if (matx < MATXLIM) { printf(" (r) Randomise order of sites and calculate neighbour "); printf("similarity scores\n Also calculate compatibility of"); printf(" defined regions\n"); } if (matx > 1) printf(" (c) Change current matrix\n"); printf(" (q) Quit the program\n\n"); printf("Enter your choice > "); fgets(optst, 5, stdin); option = tolower(optst[0]); } /* The end. Thank you, thank you. */ for (i = 0 ; i < matx ; i++) { for (j = 0; j < codonnum; j++) free( site_matrix[i][j]); free(site_matrix[i]); } free( codon_pos ); for (i=0; i < NumSeq; i++) free(seq_names[i]); free(seq_names); return(RetCode); } /* Function Name : GetNumandLen() Function Summary : This function will ask the user the name of the input file, and determine how many sequences will be read in, and how many bases are in each sequence. */ FILE *GetNumandLen (int *Num, int *Len, char *Marker, int *gde) { int TmpLen, MaxNum, MaxLen; char Filename[40]; int linebegin, seqline; FILE *InFile; char ch; int i; int offset = 0; printf("The sequences should be in FASTA format or variants like "); printf("gde flat file format.\nSequence names should be at most "); printf("%d characters long.\nUnknown or ambiguous characters ", NAMELEN); printf("should be marked with '?' or 'N' or 'X'\n"); /* Ask the user for the name of the data file */ printf ("Please enter the name of the input file > "); fgets(Filename, 40, stdin); for (i = 0; Filename[i] != '\n'; i++); Filename[i] = '\0'; /* Try to open the input file for reading. If the file cannot be opened, print an error message and return to main() with an error code */ if ((InFile = fopen (Filename, "r")) == NULL) { printf ("Unable to open input file %s\n", Filename); printf ("Check you are in the correct directory\n"); return (0); } MaxLen = 0; MaxNum = 0; TmpLen = 0; linebegin = 1; seqline = 0; while (!feof(InFile)) { while((ch = getc(InFile)) != '\n' && ch != EOF) { if (linebegin == 1) { if (*Marker == 'A') { if (ispunct(ch)) { *Marker = ch; MaxNum = 1; seqline = 0; /* Ask user if this is gde format and we need to look for offsets */ printf("Are any initial gaps marked with "); printf("a number in brackets (y/n) > "); fgets(Filename, 40, stdin); if (toupper(Filename[0]) == 'Y') *gde = 1; } else { printf("Marker character not found!\n"); printf("The first character in the file should "); printf("be a marker character e.g. > # ; \n"); return(0); } } else if (ispunct(ch) && ch == *Marker) { MaxNum++; TmpLen = 0; seqline = 0; } else seqline = 1; } linebegin = 0; /* if we are on a line with sequence, count in the sequence */ if (seqline) { if (isalnum(ch) || ch == '-' || ch == '?') TmpLen++; } /* if we are in gde format look for offsets on the nameline. */ else if (*gde) { if ( ch == ')' ) offset = 0; if ( offset == 1 ) TmpLen = TmpLen * 10 + (ch - '0'); if ( ch == '(' ) offset = 1; } } if (TmpLen > MaxLen) MaxLen = TmpLen; linebegin = 1; } /* Once the values have been stuck in, copy them into the appropriate variables */ *Num = MaxNum; *Len = MaxLen; printf("Analysing %d sequences %d bases long\n", MaxNum, MaxLen); /* Reset the file to the beginning for the sequence reading program. rewind() is meant to reset the pointer to the beginning of the file and clear any error messages floating around */ rewind(InFile); return(InFile); } /* Function Name : ReadFormat() Function Summary : This function wil read the data when it arrives in this format: all the sequences are in blocks in one file and the names of the sequences are delimited by a marker character such as '#' or '%' It now also handles if there is an offset eg (13) instead of ------------- as in the new gde format. */ int ReadFormat (FILE *InFile, int NumSeq, int SeqLen, char Marker, int gde, char **names, char **Data) { char ch, nameholder[161]; int offset, i, j; int unknown = 0; char Answer[5]; char unkch = '?'; /* This is the main loop... while there haven't been enough sequences read in yet, keep trying to read another one... */ for (i = 0; i < NumSeq; i++) { /* Look for the marker character. If we can't find it, print an error message and return to main() */ while ((ch = fgetc (InFile)) == ' ' || ch == '\n') ; if (feof (InFile)) { printf ("Unexpected end-of-file encountered in file\n"); printf ("It appears to be in sequence %d\n", i+1); return (1); } else if (ch != Marker) { printf ("Error in input file - Marker character %c was " "expected and not found\n", Marker); printf ("Missing for sequence %d\n", i+1); return (1); } /* read in the sequence name - it should be the next thing in the file. If it's gde there may be a number in brackets corresponding to the offset. */ fgets (nameholder, 160, InFile); for (j = 0; nameholder[j] != '\n'; j++) ; nameholder[j] = '\0'; offset = 0; if (gde) { j = 0; while ((nameholder[j] != '(' ) && j < 159) j++; if (nameholder[j] == '(' ) { offset = atoi(nameholder+j+1); nameholder[j] = '\0'; } } strncpy(names[i], nameholder, NAMELEN); names[i][NAMELEN] = '\0'; if (feof (InFile)) { printf ("End of File encountered in input file\n"); return (1); } /* Fill in the front gaps if there are any */ if (offset) printf("Gap at front of sequence %s filled with '-' \n", names[i]); for (j = 0; j < offset ; j++) Data[i][j] = '-'; /* Try to read in the sequence */ while ( !feof (InFile) && j < SeqLen) { ch = fgetc (InFile); if ( ch == '-' || isalnum(ch) || ch == '?' ) { if (unknown%2 == 0 && (toupper(ch) == 'N')) { printf("Would you like 'N's to be recorded as unknown? "); fgets(Answer, 5, stdin); if (toupper(Answer[0]) == 'Y') { ch = '?'; unkch = 'N'; } unknown += 1; } else if (unknown < 2 && (toupper(ch) == 'X')) { printf("Would you like 'X's to be recorded as unknown? "); fgets(Answer, 5, stdin); if (toupper(Answer[0]) == 'Y') { ch = '?'; unkch = 'X'; } unknown += 2; } if (toupper(ch) == unkch) Data[i][j] = '?'; else Data[i][j] = toupper(ch); j++; } /* if the character is not sequence, make sure it's not the marker of the next sequence. */ else if ( ch == Marker ) { printf ("Sequence %s shorter than expected\n", names[i]); ungetc(ch, InFile); /* fill in the rest of the sequence with '-'s */ printf("Remaining %d sites filled with '-' \n", SeqLen -j); for( ;j < SeqLen; j++) Data[i][j] = '-'; break; } } /* Check the last sequence entered is long enough */ if ( j < SeqLen) { printf ("Sequence %s shorter than expected\n", names[i]); printf("Remaining %d sites filled with gap chars \n", SeqLen -j); for( ;j < SeqLen; j++) Data[i][j] = '-'; } } /* close the input file and return to main() */ fclose (InFile); return (0); } /* Function name : FindSites Summary : Find the interesting sites for compatibility analysis. they have to have at least two distinct characters appearing at least twice each. Gaps and unknowns (`?`) are ignored. Sites with more than two character states are treated separately, depending on user requests. */ int FindSites(int Num, int Len, char **data, int *codon) { int site, i, j, k; char *state; int *count; int flag = 0; int codoncount = 0; int option; char purstate[5] = "RAG"; char pystate[5] = "YCTU"; int purcount, pycount; int newsize; char *reply, *one, *other; int onecount, othercount; /* First, do a bit of mallocing */ state = (char *)MyMalloc(sizeof(char) * Num); count = (int *)MyMalloc(sizeof(int) * Num); reply = (char *)MyMalloc(sizeof(char) * (Num + 10)); one = (char *)MyMalloc(sizeof(char) * Num); other = (char *)MyMalloc(sizeof(char) * Num); /* Determine how to treat sites with several characters */ printf("How would you like sites with more than two characters treated\n"); printf(" 1 Ignore these sites, use binary sites only\n"); printf(" 2 As transversion sites (DNA/RNA only)\n"); printf(" 3 As sites with more than two characters\n"); printf(" 4 Make a decision about each site in turn\n"); printf("Enter your choice > "); fgets(reply, 10, stdin); option = atoi(reply); if (option < 1 || option > 4) option = 4; /* Go through and discover which are the interesting codons to do the analysis on. */ for (site = 0; site < Len; site++) { i = 0; /* leaf through all the sequences, add each one's character to the tally or create a new tally for it. Gaps ('-') or unknowns ('?') are ignored */ for (j = 0; j < Num; j++) { if ((data[j][site] != '-') && (data[j][site] != '?')) { flag = 0; for (k = 0; k < i; k++) { if (state[k] == data[j][site]) { count[k]++; flag = 1; } } if (flag == 0) { state[i] = data[j][site]; count[i] = 1; i++; } } } /* If there are more than 2 states at a site allow the user to decide how to split it. multi == 3 means leave as multiple states. */ if (i > 2) { if (option == 1) /* drop like a hot potato */ i = 1; else if (option == 2) /* ct vs ag */ { pycount = 0; purcount = 0; for (k = 0; k < i; k++) { if (strchr(pystate, state[k])) { pycount += count[k]; count[k] = 0; } else if (strchr(purstate, state[k])) { purcount += count[k]; count[k] = 0; } else printf("WARNING: non-DNA/RNA character: %c\n",state[k]); } for (j = 0; j < Num; j++) { if (strchr(pystate, data[j][site])) data[j][site] = 'Y'; else if (strchr(purstate, data[j][site])) data[j][site] = 'R'; } for (k = 0; k < i; k++) { if (count[k] == 0) { if (pycount) { count[k] = pycount; state[k] = 'Y'; pycount = 0; } else if (purcount) { count[k] = purcount; state[k] = 'R'; purcount = 0; } else { for (j = i-1; j > k; j--) if (count[j]) { count[k] = count[j]; state[k] = state[j]; count[j] = 0; j = -3; } if (j == k) /* no count[j] to be found */ i = k; } } } } if ((option == 4) || (option == 2 && i > 2)) { printf("Site %d has ", site+1); for (k = 0; k < i; k++) printf("%d %c ", count[k], state[k]); printf("\nThe options are:\n"); printf("(1) Ignore this site - treat as non-polymorphic\n"); printf("(2) Convert to a binary site\n"); printf("(3) Leave data in %d groups\n", i); printf("Enter your choice > "); fgets(reply, 10, stdin); newsize = atoi(reply); if (newsize < 2) i = 1; else if (newsize == 2) { onecount = 0; othercount = 0; one[0] = '\0'; other[0] = '\0'; printf("Please enter the characters that will form "); printf("the first group > "); fgets(reply, Num+9, stdin); for (j = 0 ; reply[j] != '\0' ; j++) reply[j] = toupper(reply[j]); for (k = 0; k < i; k++) { if (strchr(reply, state[k])) { onecount += count[k]; count[k] = 0; one = strncat(one, (state +k), 1); } else { othercount += count[k]; count[k] = 0; other = strncat(other, (state + k), 1); } } printf("%d %s %d %s\n", onecount, one, othercount,other); /* reset the data according ly */ for (j = 0; j < Num; j++) { if (strchr(one, data[j][site])) data[j][site] = one[0]; else if (strchr(other, data[j][site])) data[j][site] = other[0]; else if (data[j][site] == '-' || data[j][site] == '?') /* leave gaps or unknowns alone */; else { printf("ERROR: while converting data to binary\n"); return(0); } } if (onecount) { count[0] = onecount; state[0] = one[0]; if(othercount) { count[1] = othercount; state[1] = other[0]; i = 2; } else i = 1; } else { if (othercount) { count[1] = othercount; state[1] = other[0]; i = 1; } else i = 0; /* unlikely but it can't hurt */ } } } /* end of option 4 or 2 when multi > 2 */ } /* i > 2 */ /* this is a rough check on which are the interesting sites. They get remembered by sticking them into the codon matrix */ if ((i > 1) && (MaxButOne(i, count) > 1)) { codon[codoncount] = site; codoncount++; } } /* end of the "site" for loop */ /* free up used memory */ free(state); free(count); free(reply); free(one); free(other); return(codoncount); } /* Function name : MakeMatrix Summary : this has to make the consistency matrix for the dataset read in, surprise surprise. */ int MakeMatrix(int Num, char **data, int codoncount, int *codon, char **matrix) { int site, sitej, i, j, k; int flag1, flag = 0; char *pair; /* First, do a bit of mallocing */ pair = (char *)MyMalloc(sizeof(char) * Num * 2); /* Okay, we should have a array (codon) with numbers corresponding to the interesting sites we want to do all the pairwise comparisons on */ if (codoncount < 2) return(1); for ( site = 0; site < codoncount; site++) { for (sitej = 0; sitej < site; sitej++) { i = 0; /* Find out for the two sites, which are the distinct pairs */ for (j = 0; j < Num; j++) { flag = 0; for (k = 0; k < i; k++) { if ((pair[2*k] == data[j][codon[site]]) && (pair[2*k+1] == data[j][codon[sitej]])) flag = 1; } if (flag == 0) { pair[2*i] = data[j][codon[site]]; pair[2*i +1] = data[j][codon[sitej]]; i++; } } /* Check to see if the pairs are consistent or not, enter into matrix */ flag1 = Consistency(i, pair); if (flag1) { matrix[site][sitej] = 'i'; matrix[sitej][site] = 'i'; } else { matrix[site][sitej] = 'm'; matrix[sitej][site] = 'm'; } } } free( pair); /* Fill in the diagonal neatly */ for (i = 0; i < codoncount; i++) matrix[i][i] = 'x'; return(0); } /* Function name : MaxButOne Summary : finds the next-largest number in an array of numbers. Used to find parsimonious sites, i.e. at least two characters have to be present in at least two sequences. */ int MaxButOne(int i, int *count) { int j; int max = 1; int next = 1; for (j = 0; j < i; j++) { if (count[j] > next) { if (count[j] > max) { next = max; max = count[j]; } else next = count[j]; } } return(next); } /* Function name : Consistency Summary : Given all the possible pairs at two sites, decide if they form a compatible tree. Return 1 if yes, 0 if not. The pairs with '-' (gaps) or '?' (unknown) are thrown out first. */ int Consistency(int max, char *pairs) { int i, j; int *live; int remain = max; int change = 1; int flag; live = (int *)MyMalloc(sizeof(int) * max); /* Set up the initial conditions. We can kill pairs with gaps or unknowns immediately - they are uninformative */ for (i = 0; i < max; i++) { if (pairs[2*i] == '-' || pairs[2*i] == '?' || pairs[2*i+1] == '-' || pairs[2*i+1] == '?') { live[i] = 0; remain--; } else live[i] = 1; } /* In the occasional instance that that killed all the pairs off, we need to return a compatible result, rather than entering the loop below */ if (remain == 0) { free (live); return(1); } /* Test the remaining living pairs, kill them if one of the parts of the pair is unique. Keep going until there are two or less pairs left - this is a consistent pair OR until we go through a round where we haven't been able to kill anything (change becomes 0) - this is an inconsistent pair */ while (remain && change) { change = 0; for ( i = 0; i < max ; i++) { if (live[i]) { flag = 0; for( j = 0; j < max ; j++) if ((i != j) && live[j]) if (pairs[2*i] == pairs[2*j]) flag = 1; if (flag == 0) { live[i] = 0; remain--; change++; } else { flag = 0; for(j= 0; j < max ; j++) if ((i != j) && live[j]) if (pairs[2*i+1] == pairs[2*j+1]) flag = 1; if (flag == 0) { live[i] = 0; remain--; change++; } } } } if (remain <= 2) { free(live); return(1); } } free(live); return(0); } /* Function Name : SiteOutput() Function Summary : Gives the user the option of printing out the interesting sites in a text file with site number and nucleotide for each sequence. Also includes fasta type format at the bottom, following Lars' suggestion. */ int SiteOutput(int sites, int numseq, int *codons, char **Data, char **names) { FILE *OutFile; char Reply[10]; char Filename[120]; char *sitend; int i, j, k; int page, endpage; printf("There are %d informative sites in the %d sequences.\n", sites, numseq); printf("Do you want the informative sites saved to a file? (y/n) "); fgets(Reply, 10, stdin); if (toupper(Reply[0]) == 'N') return(0); /* Prompt the user for the name of the output file... */ printf ("\nEnter the name of the output file to be used (.sit) > "); fgets(Filename, 120, stdin); for (i = 0; Filename[i] != '\n'; i++); Filename[i] = '\0'; sitend = strstr(Filename, ".sit"); if (sitend == 0) strcat(Filename, ".sit"); /* try to open the output file. If no can do, return 1 to Main. */ if ((OutFile = fopen (Filename, "w")) == NULL) { printf ("Unable to open Output file - %s\n", Filename); return (1); } /* add a few headers and stuff */ fprintf(OutFile, "Parsimoniously informative sites listing for\n"); fprintf(OutFile, "%s\n\n", datasetname); fprintf(OutFile, "There are %d informative sites in %d sequences:\n", sites, numseq); for (i = 0; i < numseq; i++) { fprintf(OutFile, "%d %s ", i+1, names[i]); if (i%4 == 3) fprintf(OutFile, "\n"); } fprintf(OutFile, "\n\n"); /* Only put 50 sequences on each line... */ for( page = 0 ; page < numseq ; page = page + 50) { endpage = page + 50; if (endpage > numseq) endpage = numseq; fprintf(OutFile, "Sequence "); if (page > 99) { for (i = page ; i < endpage ; i++) { fprintf(OutFile, "%d", ((i+1)/100)%10 ); if ((i+1)%5 == 0 ) fprintf(OutFile, " "); } fprintf(OutFile, "\n "); } if (endpage > 9) { for (i = page ; i < endpage ; i++) { fprintf(OutFile, "%d", ((i+1)/10)%10 ); if ((i+1)%5 == 0 ) fprintf(OutFile, " "); } fprintf(OutFile, "\n "); } for (i = page ; i < endpage ; i++) { fprintf(OutFile, "%d", (i+1)%10 ); if ((i+1)%5 == 0 ) fprintf(OutFile, " "); } fprintf(OutFile, "\n\n"); /* write out each site number followed by the appropriate residues */ for ( j = 0 ; j < sites ; j++) { fprintf(OutFile, "%8d ", codons[j]+1); for ( k = page ; k < endpage ; k++) { fprintf(OutFile, "%c", Data[k][codons[j]]); if ( (k+1)%5 == 0 ) fprintf(OutFile, " "); } fprintf(OutFile, "\n"); } fprintf(OutFile, "\n\n"); } /* Write the parsimonious sites out as a fasta-type file below this */ fprintf(OutFile, "Parsimoniously informative sites in fasta format:\n\n"); for (i = 0; i < numseq; i++) { fprintf(OutFile, ">%s informative sites\n", names[i]); for (j = 0; j < sites; j++) { fprintf(OutFile, "%c", Data[i][codons[j]]); if ( (j+1) != sites && (j+1)%60 == 0) fprintf(OutFile, "\n"); } fprintf(OutFile, "\n"); } /* Tidy up, close the file, and return */ fflush(OutFile); fclose(OutFile); return(0); } /* SiteOutput() */ /* Function : Shuffle Summary : Re-arrange entries to create matrices with the same overall amount of black and white but breaking up the structure imposed by the ordering of characters by sites. Then calculates some statistics to let you figure out if what you're seeing looks random. These are neighbour similarity scores for the matrix as a whole and the whiteness of user-defined regions around the diagonal. REGIONS is the number of regions you can divide the matrix into. */ int Shuffle(int codonnum, int *codons, char **matrix, char **random) { int regions = 0; /* setting up regions and boundaries */ int bounds[REGIONS]; int newbd; char happy; float overall[1]; /* variables for storing whiteness values */ float observe[REGIONS]; int relative[REGIONS]; /* is the observed box darker or lighter? */ float offdiag[REGIONS][REGIONS]; int repeats = 0; /* number of repeats of shuffling to do */ float obsnb; /* variables for storing neighbour similarity scores */ float *shufnb; float avgnb; float **shuffle; float summary[REGIONS]; int count[REGIONS]; int allcount = 0; time_t now; int seed; int *ordering; /* shuffling of sites */ double newrand; int intran; int temp; FILE *Out; char Reply[10]; char Filename[120]; double fraction; char updown; int i, j, k; /* variables used for calculating standard deviations of shuffled results ** float floattemp; ** float stddnb; ** float stddev[REGIONS]; */ /* Ask the user how many regions they want the matrix divided up into */ while (regions < 1) { printf("You can calculate neighbour similarity scores and compatibili"); printf("ty of defined\nregions. If you only want to do neighbour sim"); printf("ilarity scores, \nenter 1 as number of regions to test \n"); printf("How many regions do you want to test (max %d) > ", REGIONS); fgets(Reply, 10, stdin); regions = atoi(Reply); if (regions > REGIONS ) { printf("That's too many, sorry. Try again\n"); regions = 0; } } if (regions > 1) printf("There is a total of %d sites to divide up between %d regions\n", codonnum, regions); /* Ask for the boundaries where each region starts and stops. */ for (i = 0; i < regions - 1; i++) { happy = 'n'; while (happy == 'n') { printf("Region number %d starts at site number %d and ends at > ", i+1, (i==0)?1:bounds[i-1]+1); fgets(Reply, 10, stdin); newbd = atoi(Reply); if ((i == 0) && (newbd > 1) && (codonnum - 2*regions + 2*i > newbd)) { printf("Region %d : %d sites from %d to %d\n", i+1, newbd, codons[0] +1, codons[newbd-1]+1); printf("Is this what you want to use (y/n) > "); fgets(Reply, 10, stdin); if (toupper(Reply[0]) == 'Y') happy = 'y'; } else if ((newbd > bounds[i-1]+1) && (codonnum - 2*regions + 2*i > newbd)) { printf("Region %d : %d sites from %d to %d\n", i+1, newbd - bounds[i-1], codons[bounds[i-1]]+1, codons[newbd-1]+1); printf("Is this what you want to use (y/n) > "); fgets(Reply, 10, stdin); if (toupper(Reply[0]) == 'Y') happy = 'y'; } else printf("That is not a suitable value, try again \n "); } bounds[i] = newbd; } /* Set up the last region (which must end at the end of the matrix) */ i = regions - 1; bounds[i] = codonnum; if (regions > 1) printf("Region %d : %d sites from %d to %d\n", i+1, bounds[i]- bounds[i-1], codons[bounds[i-1]]+1, codons[bounds[i]-1]+1); else { printf ("Entire square is one region. Shuffling will not change "); printf ("compatibility level.\nThe 'neighbour similarity score' will "); printf ("be determined over randomisation\n"); } /* Calculate the compatibility of the actual matrix and give the user the option of quitting if it looks like the values aren't very different. bounds+i is passed as a pointer is required by the Whiteness function there. */ Whiteness(matrix, 1, bounds+i, overall); Whiteness(matrix, regions, bounds, observe); OffWhite(matrix, regions, bounds, offdiag); obsnb = Neighbour(matrix, codonnum); printf("The overall compatibility of the observed matrix is %f\n", overall[0]); if (regions > 1) { printf("The individual observed regions are "); for (i = 0; i < regions; i++) { if (observe[i] > overall[0]) relative[i] = 1; else relative[i] = -1; printf(" %c %f ", (relative[i]>0)?'*':'o' , observe[i]); } printf("\n'*' indicates higher, 'o' lower than overall compatibility"); } printf("\nThe observed 'neighbour similarity score' is %f\n", obsnb); printf("Do you wish to go ahead with randomisation? (y/n) "); fgets(Reply, 10, stdin); if (toupper(Reply[0]) == 'N') return(1); ordering = (int *)MyMalloc(sizeof(int) * codonnum); while (repeats < 1) { printf("How many random matrices would you like generated > "); fgets(Reply, 10, stdin); repeats = atoi(Reply); if (repeats > 0) { shufnb = (float *)MyMalloc(sizeof(float) * repeats); if (regions > 1) { shuffle = FloatMalloc(repeats, regions); if (shuffle == 0) { printf("Unable to do that many calculations. Try again\n"); repeats = 0; free(shufnb); } } } } /* seed random number routine - allow the user to enter a seed */ printf("Please enter a seed value for the random number generator.\n"); printf("If the value is 0, system time will be used as a seed > "); fgets(Reply, 10, stdin); seed = atoi(Reply); if (seed <= 0) { time(&now); seed = now%2000 +1; } newrand = CombRand(-seed); for (j = 0; j < codonnum; j++) ordering[j] = j; /* Finally this subroutine can get on with its job! First generate random numbers and from it create a shuffled order for the sites */ for ( i = 0; i < repeats; i++) { for ( j = codonnum ; j > 1 ; j--) { newrand = CombRand(0); intran = (int)(j * newrand); temp = ordering[j-1]; ordering[j-1] = ordering[intran]; ordering[intran] = temp; } /* Now build the matrix using the ordering just generated */ for (j = 0; j < codonnum; j++) for(k = j; k < codonnum; k++) { random[ordering[j]][ordering[k]] = matrix[j][k]; random[ordering[k]][ordering[j]] = matrix[k][j]; } /* Calculate neighbour similarity score of the random matrix */ shufnb[i] = Neighbour(random, codonnum); /* Then calculate the appropriate whitenesses */ if (regions > 1) Whiteness(random, regions, bounds, shuffle[i]); } /* Time to create some printed output of all this stuff. */ printf ("\nEnter the name of a file to print the results into > "); fgets(Filename, 120, stdin); for (i = 0; Filename[i] != '\n'; i++); Filename[i] = '\0'; if ((Out = fopen (Filename, "w")) == NULL) { printf ("Unable to open Output file - %s\n", Filename); return (1); } fprintf(Out, "\nSummary statistics for non-randomness of compatibility "); fprintf(Out, "matrix for\n%s\n\n", datasetname); fprintf(Out, "The matrix had %d sites with overall compatibility %f\n", codonnum, overall[0]); fprintf(Out, "The neighbour similarity score of the matrix was %f\n", obsnb); fprintf(Out, "\nStatistics for %d random matrices:\n", repeats); avgnb = 0; temp = 0; for (i = 0; i < repeats; i++) { avgnb += shufnb[i]; if (shufnb[i] >= obsnb) temp++; } /* calculate average neighbour similarity of shufflings */ avgnb = avgnb / repeats; /* standard deviations calculations omitted : ** stddnb = 0; ** for (i = 0; i < repeats; i++) ** { ** floattemp = shufnb[i] - avgnb; ** floattemp = floattemp * floattemp; ** stddnb += floattemp; ** } ** stddnb = stddnb/(repeats -1); ** stddnb = sqrt(stddnb); */ fprintf(Out, "\nMean Neighbour similarity score for random matrices"); fprintf(Out, " %f \n", avgnb); fprintf(Out, "%d random matrices equalled or exceeded the observed\n\n", temp); fprintf(Out, "The P value is %f\n\n", (float)(temp+1)/(repeats+1)); free(shufnb); if (regions == 1) { fflush(Out); fclose(Out); return(0); } /* Calculate summary statistics for all the regions.... Initialise first */ for ( j = 0; j < regions; j++) { summary[j] = 0; count[j] = 0; /* Standard deviation calculations omitted ** stddev[j] = 0; */ } /* Find the average and the number of random regions more extreme than observed, and count how many shufflings have all regions the same or more extreme. */ for ( i = 0; i < repeats; i++) { temp = 0; for (j = 0; j < regions; j++) { summary[j] += shuffle[i][j]; if (relative[j] > 0) { if (shuffle[i][j] >= observe[j]) { count[j]++; temp++; } } else { if (shuffle[i][j] <= observe[j]) { count[j]++; temp++; } } } if (temp == regions) allcount++; } /* calculate actual average */ for ( j = 0; j < regions; j++) { summary[j] = summary[j] / repeats; } /* standard deviation calculations omitted : ** for (i = 0; i < repeats; i++) ** for (j = 0; j < regions; j++) ** { ** floattemp = shuffle[i][j] - summary[j]; ** floattemp = floattemp * floattemp; ** stddev[j] += floattemp; ** } ** ** for ( j = 0; j < regions; j++) ** { ** stddev[j] = stddev[j]/(repeats -1); ** stddev[j] = sqrt(stddev[j]); ** } */ fprintf(Out, "The individual regions analysed were : \n"); for (j = 0; j < regions ; j++) { if (j > 0) fprintf(Out, " %d: %d to %d %d sites from %d to %d\n", j+1, bounds[j-1]+1, bounds[j], bounds[j]-bounds[j-1], codons[bounds[j-1]] +1, codons[bounds[j] -1] +1); else fprintf(Out, " %d: %d to %d %d sites from %d to %d\n", j+1, 1, bounds[j], bounds[j], codons[0] +1, codons[bounds[j] -1] +1); } fprintf(Out, "\nCompatibility of each region compared to random matrices"); fprintf(Out, "\n Shuffled Fraction \n"); fprintf(Out, " Observed Average exceeding \n"); for (j= 0; j < regions; j++) { fraction = ((float)count[j])/repeats; if (relative[j] > 0) updown = '+'; else updown = '-'; fprintf(Out, "%d %f %f %.4f %c\n", j+1, observe[j], summary[j], fraction, updown); } fprintf(Out, "\n'Fraction exceeding' is the fraction of random matrices "); fprintf(Out, "with equal or\nhigher '+' or lower '-' values than observed"); fraction = ((float)allcount)/repeats; fprintf(Out, " in that region.\nOverall, %.4f", fraction); fprintf(Out, " exceeded observed for all regions at once.\n\n"); fprintf(Out, "Observed compatibility of the off-diagonal boxes:\n"); for ( j = 0 ; j < regions; j++) for ( i = j +1; i < regions; i++) fprintf(Out, "Region %d vs %d: %f for %.0f squares\n", j+1, i+1, offdiag[j][i], offdiag[i][j]); for (j = 0; j < repeats; j++) free(shuffle[j]); free(shuffle); fflush(Out); fclose(Out); return(0); } /* Function Name : Whiteness Summary : Given a matrix, a number of regions, an array of starting postions for those regions and an array for putting floats into, calculate the whiteness of the regions around the diagonal specified and put the values (white = 1, black = 0) into the floating point array. To calculate just the matrix itself, pass it, bits = 1, array[0] = matrixsize, and a float. (which should be defined as part of an array so it'll return properly....) */ void Whiteness(char **data, int bits, int *starts, float *result) { int size; int total, count; int i, j, k; for (i = 0; i < bits; i++) { if (i == 0) { size = starts[i]; j = 0; } else { size = starts[i] - starts[i-1]; j = starts[i-1]; } total = (size * (size-1)) /2; count = 0; for ( ; j < starts[i]; j++) for ( k = j+1; k < starts[i]; k++) if (data[j][k] == 'i') count++; result[i] = (double)count/ total; } return; } /* Function Name : OffWhite Summary : Given a matrix, a number of regions, an array of starting postions for those regions and an array for putting floats into, calculate the whiteness of the regions off the diagonal, defined by the region boundaries. in the output matrix (result) the result[i][j] entries (i < j) contain the whiteness value (white = 1, black = 0) while the result[j][i] entry contains the size of the region. */ void OffWhite(char **data, int bits, int *starts, float (*result)[REGIONS]) { int size, side; int total, count; int i, j, ix, jx; for (i = 0; i < bits; i++) { for (j = i+1 ; j < bits; j++) { if (i == 0) { size = starts[i]; ix = 0; } else { size = starts[i] - starts[i-1]; ix = starts[i-1]; } side = starts[j] - starts[j-1]; total = size * side; result[j][i] = total; count = 0; /* the next 'for ( ; ' is blank because starting value was set above */ for ( ; ix < starts[i]; ix++) for ( jx = starts[j-1] ; jx < starts[j]; jx++) if (data[ix][jx] == 'i') count++; result[i][j] = (double)count/ total; } } return; } /* Function Name : CombRand() Summary : Produces a random number between 0 and 1, but only has 714025 distinct values in that range, (0 is a possible value but 1 isn't) so it should only be used when the desired range of values is smaller - for example, for a number between 1 and 100. In fact it's probably not much good beyond 0 to 1000. It should be portable, AND produce decent numbers This function is called RAN2 in Numerical Recipes (Press, Flannery, Teukolsky, Vetterling) Initialise with a negative seed value, if you just want the next random number pass 0 or any postive value, it'll ignore them :-) */ double CombRand(int seed) { int m_ran = 714025; int a_ran = 1366; int c_ran = 150889; static int array[97]; static int initial = 0; static int current; static int feedin; double realm = 1.0/m_ran; double out; int i, j; /* This is where the random number function is initialised */ if ((seed < 0) || (initial == 0)) { initial = 1; seed = (c_ran - seed)%m_ran; for (i = 0; i < 97; i++) /* initialise table */ { seed = (a_ran * seed + c_ran)%m_ran; array[i] = seed; } current = (a_ran * seed + c_ran)%m_ran; feedin = current; } /* This is where the random number function actually does its job */ j = (int)(97 * (current * realm)); current = array[j]; out = current *realm; feedin = (a_ran * feedin + c_ran)%m_ran; array[j] = feedin; return (out); } /* CombRand */ /* Function :Neighbour() Summary :Given a square matrix, count the number of times adjacent squares are the same colour, counting diagonals as white */ float Neighbour(char **matrix, int size) { int count = 0; int total = 0; float normalised; int i, j; char one, other; for (i = 1; i < size; i++) { for (j = 0; j < size; j++) { one = matrix[j][i-1]; other = matrix[j][i]; if (one == 'x') one = 'i'; if (other == 'x') other = 'i'; if (one == other) count++; } total += size; } normalised = count / (float) total; return (normalised); }/* Neighbour */ /* Function Name : WritePostscript() Function Summary : This function writes the current matrix as a postscript file */ int WritePostscript (int numseq, int seqlen, int max, int *codon, char **matrix) { FILE *OutFile; char Reply[10]; char Filename[120]; int i, j, k; int font = 4; int lhs; char *pschk; int eps = 1; char psend[5] = ".eps"; int textspace; /* Do lots of checking of options with the user */ printf("'e' for encapsulated postscript - to import into other "); printf("applications \n'p' for straight to the printer (e/p) "); fgets(Reply, 10, stdin); if (toupper(Reply[0]) == 'P') { eps = 0; strcpy(psend , ".ps"); } while ( (max+5) * font + 4 > PAPERWIDTH - PAPEREDGE * 2) font = font/2; if (font < 1) { printf("Too many sites (%d) to fit nicely on one page.\n", max); max = PAPERWIDTH - PAPEREDGE *2 - 9; /* 9 = 4 (edge) + 5 (numbers) */ font = 1; printf("The %d first sites can be printed, do you wish to\n", max); printf("do this (y/n) > "); fgets(Reply, 10, stdin); if (toupper(Reply[0]) == 'N') return(0); } else if (font > 1) { printf("The font size is currently %d. If you would like it ", font); printf("smaller,\n enter the desired size > "); fgets(Reply, 10, stdin); if (atoi(Reply) > 0) if (atoi(Reply) < font) { font = atoi(Reply); printf("Font size now %d\n", font); } } /* Prompt the user for the name of the output file... */ printf ("\nEnter the name of the output file (%s) > ", psend); fgets(Filename, 120, stdin); for (i = 0; Filename[i] != '\n'; i++); Filename[i] = '\0'; pschk = strstr(Filename, psend); if (pschk == 0) strcat(Filename, psend); /* try to open the output file. If no can do, return 1 to main(). */ if ((OutFile = fopen (Filename, "w")) == NULL) { printf ("Unable to open Output file - %s\n", Filename); return (1); } /* Stick all the boring postscript headers, fonts etc into the file */ lhs = PostscriptBlurb(OutFile, font, max, eps); /* Setup page */ fprintf(OutFile, "%%%%Page: ? 1\n"); fprintf(OutFile, "/savepage save def\n\n"); if (!eps) { textspace = (PAPERLENGTH - PAPERWIDTH - PAPEREDGE)/3 - 6; fprintf(OutFile, "/Helvetica findfont 12 scalefont setfont\n"); fprintf(OutFile, "(Pairwise site comparison matrix of %s) %d %d ms\n", datasetname, PAPEREDGE + 20, PAPERWIDTH + textspace*2); fprintf(OutFile, "(There were %d parsimoniously informative ", max); fprintf(OutFile, "sites in %d bases from %d sequences) %d %d ms\n\n", seqlen, numseq, PAPEREDGE + 20, PAPERWIDTH + textspace); } /* Print numbers representing the interesting sites along the side. Note that the numbers appear slightly "lifted" in order to line up nicely with the squares i.e. PAPERWIDTH + 1 */ fprintf(OutFile, "/Courier findfont %d scalefont setfont\n",font); for (i = 0 ; i < max ; i++) fprintf(OutFile, "(%6d) %d %d ms \n", codon[i]+1, lhs-2, (PAPERWIDTH +1 - font*i)); /* Print the actual matrix */ fprintf(OutFile, "\n/Ingrid-Blocks findfont %d scalefont setfont\n", font); i = lhs + 5*font; for (j = 0 ; j < max ; j++) { fprintf(OutFile, "("); for (k = 0 ; k < max ; k++) { fputc (matrix[k][j], OutFile); if (((k+1)%70) == 0) fprintf(OutFile, "\\\n"); } fprintf(OutFile, ") %d %d ms\n", i, (PAPERWIDTH - font*j)); } /* Add a border around the matrix */ DrawPSBox(OutFile, lhs, font, max); /* Now for the final bits at the bottom of the postscript file */ fprintf(OutFile, "\n\nsavepage restore\nshowpage\n%%%%PageTrailer\n"); fprintf(OutFile, "%%%%Trailer\n%%%%EOF\n\n"); /* Tidy up, close the file, and return */ fflush(OutFile); fclose(OutFile); return(0); } /* Function :DrawPSBox Summary : Draws a postscript box of width 2 around the matrix located between lhs and lhs +(max*font)) The numbers are the same as for Postscript blurb (or rather, 1 point smaller, so the drawn line fits inside the boundary box) */ void DrawPSBox(FILE *Out, int lhs, int font, int max) { int llx, lly, urx, ury; llx = lhs + (font * 5) - 1; urx = llx + (max * font) + 2; lly = PAPERWIDTH - ( (max - 1) * font) - 1; ury = PAPERWIDTH + font + 1; fprintf(Out, "\n%d %d moveto", llx , lly); fprintf(Out, "\n%d %d lineto", urx , lly); fprintf(Out, "\n%d %d lineto", urx , ury); fprintf(Out, "\n%d %d lineto", llx , ury); fprintf(Out, "\nclosepath\n2 setlinewidth\nstroke\n"); return; } /* Function :PostscriptBlurb Summary :Writes all the horrid stuff that has to go first in the postscript file, particularly the custom font, "Ingrid-Blocks" which draws blocks... It follows the example font on page 282-3 of The Postscript Reference Manual (2nd ed) closely. There are lots of Copyright and Trademark restrictions on the use of Postscript but as far as I can tell what I'm doing here is totally legitimate usage. Ingrid-Blocks is a font I designed which draws white and black squares, and white squares with a line through the diagonal. The characters all fill the allocated space so that the result looks like graphics rather than text. */ int PostscriptBlurb(FILE *Out, int font, int sites, int encap) { time_t now; char username[120]; char descrip[10] = "EPSF-3.0"; int llx, lly, urx, ury; int i = 0; printf("Please enter your name > "); fgets(username, 120, stdin); while(username[i] != '\n') i++; username[i] = '\0'; time (&now); if (!encap) strcpy(descrip , ""); /* The left hand side of the matrix is based on the centre of the piece of paper, the number of sites plus 5 for room to write position numbers down the side, the font size, all halved (and rounded according to int rules) to give a good starting place for drawing the matrix. */ llx = (PAPERWIDTH/2) - ( ( (sites + 5) / 2 ) * font); urx = llx + ( (sites + 5) * font); lly = PAPERWIDTH - ( (sites - 1) * font); ury = PAPERWIDTH + font; fprintf(Out, "%%!PS-Adobe-3.0 %s\n", descrip); if (encap) fprintf(Out, "%%%%BoundingBox: %d %d %d %d\n", llx-2 +font*5, lly-2, urx+2, ury+2); else fprintf(Out, "%%%%BoundingBox: %d %d %d %d \n", PAPEREDGE, PAPEREDGE, PAPERWIDTH - PAPEREDGE, PAPERLENGTH - PAPEREDGE); fprintf(Out, "%%%%Title: Compatibility Matrix of %s\n", datasetname); fprintf(Out, "%%%%Creator: reticulate.c \n"); fprintf(Out, "%%%%For: %s\n", username); fprintf(Out, "%%%%CreationDate: %.24s\n", ctime(&now)); fprintf(Out, "%%%%Pages: 1 \n"); fprintf(Out, "%%%%DocumentNeededResources: font Courier Helvetica\n"); fprintf(Out, "%%%%DocumentSuppliedResources: font Ingrid-Blocks \n"); fprintf(Out, "%%%%+ procset Ingrid-PWSCM-Abbrev 0 0\n"); fprintf(Out, "%%%%EndComments \n\n"); fprintf(Out, "%%%%BeginProlog \n"); fprintf(Out, "%%%%IncludeResource: font Courier Helvetica\n"); fprintf(Out, "%%%%BeginResource: font Ingrid-Blocks \n"); fprintf(Out, "9 dict begin \n"); fprintf(Out, "/FontType 3 def \n"); fprintf(Out, "/FontMatrix [.001 0 0 .001 0 0] def \n"); fprintf(Out, "/FontBBox [0 0 1000 1000] def \n"); fprintf(Out, "/Encoding 256 array def \n"); fprintf(Out, "0 1 255 {Encoding exch /.notdef put} for \n"); fprintf(Out, "Encoding 109 /black put %% m = 109 \n"); fprintf(Out, "Encoding 105 /white put %% i = 105 \n"); fprintf(Out, "Encoding 120 /diagon put %% x = 120 \n"); fprintf(Out, "/CharProcs 4 dict def \n"); fprintf(Out, "CharProcs begin \n"); fprintf(Out, " /.notdef { } def \n"); fprintf(Out, " /black \n"); fprintf(Out, " {0 0 moveto 1000 0 lineto 1000 1000 lineto \n"); fprintf(Out, " 0 1000 lineto closepath fill} bind def \n"); /* For small font sizes and encapsulated postscript, it seems to be best to leave out the black borders around white squares. The easiest way to get postscript to draw a totally white box is simply to omit the defintion. Horrid, I know. */ if ((font >= 2) && (!encap)) { fprintf(Out, " /white \n"); fprintf(Out, " {0 0 moveto 1000 0 lineto 1000 1000 lineto 0 1000 "); fprintf(Out, "lineto \n 0 0 lineto 50 50 lineto 50 950 lineto "); fprintf(Out, "950 950 lineto\n 950 50 lineto 50 50 lineto "); fprintf(Out, "closepath fill} bind def\n"); } fprintf(Out, " /diagon \n {0 900 moveto 0 1000 lineto 100 1000 lineto "); fprintf(Out, "1000 100 lineto \n 1000 0 lineto 900 0 lineto "); fprintf(Out, "closepath fill} bind def \n"); fprintf(Out, "end %% of CharProcs \n"); fprintf(Out, "/BuildGlyph { \n 1000 0\n 0 0 1000 1000\n"); fprintf(Out, " setcachedevice \n exch /CharProcs get exch \n"); fprintf(Out, " 2 copy known not {pop /.notdef} if \n"); fprintf(Out, " get exec \n} bind def\n"); fprintf(Out, "/BuildChar { \n 1 index /Encoding get exch get\n"); fprintf(Out, " 1 index /BuildGlyph get exec \n} bind def\n"); fprintf(Out, "currentdict \nend %%of font dictionary\n"); fprintf(Out, "/Ingrid-Blocks exch definefont pop \n"); fprintf(Out, "%%%%EndResource\n\n"); fprintf(Out, "%%%%BeginResource: procset Ingrid-PWSCM-Abbrev 0 0 \n"); fprintf(Out, "/ms {moveto show} bind def\n"); fprintf(Out, "%%%%EndResource\n"); fprintf(Out, "%%%%EndProlog\n\n"); return(llx); } /* * Function: MyMalloc * Summary : malloc with error checking like everyone else has to have. * If the system can't allocate memory of the appropriate size, * it breaks out of the program altogether. */ void *MyMalloc (size_t size) { void *temp; temp = malloc(size); if (temp == 0) { fprintf(stderr, "\nUnable to allocate %d bytes from the system \n", size); fprintf(stderr, "Try running the program with a smaller dataset\n"); exit(1); } return(temp); } /* MyMalloc */ /* * Function: MxMalloc * Summary : Specialized malloc with error checking for building extra * matrices without breaking out if it fails. (it just tells * you politely no more matrices can be made in the main * program ). */ char **MxMalloc (int size) { char **temp; int stuffup = 0; int i, j; temp = (char **)malloc( sizeof(char *) * (size) ); for (i=0; i < size; i++) { temp[i] = (char *)malloc(sizeof(char) * (size) ); if (temp[i] == 0) { stuffup = i; i = size +2; } } if (i > size) { for (j = 0; j < stuffup; j++) free(temp[j]); free(temp); temp = 0; } return (temp); } /* Function : FloatMalloc Summary : Specialized Malloc for allocating space for floats in whiteness calculating function. It also doesn't break out if it doesn't succeed but gives the user a chance to try with less numbers in the main body of the function. */ float **FloatMalloc (int repeats, int boxes) { float **temp; int stuffup = 0; int i, j; temp = (float **)malloc(sizeof(float *) * repeats ); if (temp) { for (i = 0 ; i < repeats; i++) { temp[i] = (float *)malloc(sizeof(float) * boxes); /* if we try to allocate space but can't do it, we have to unravel everything done so far.... */ if (temp[i] == 0) { stuffup = i; i = repeats+2; } } if (i > repeats) { for(j = 0; j < stuffup; j++) free(temp[j]); free(temp); temp = 0; } } return(temp); }