Mercurial > repo
comparison interps/c-intercal/pit/lib/floatlib.doc @ 996:859f9b4339e6
<Gregor> tar xf egobot.tar.xz
author | HackBot |
---|---|
date | Sun, 09 Dec 2012 19:30:08 +0000 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
995:6883f5911eb7 | 996:859f9b4339e6 |
---|---|
1 * Summary of routines in floatlib.i: | |
2 | |
3 (5000) :3 <- :1 plus :2 | |
4 (5010) :3 <- :1 minus :2 | |
5 (5020) :2 <- the integer part of :1 | |
6 :3 <- the fractional part of :1 | |
7 (5030) :3 <- :1 times :2 | |
8 (5040) :3 <- :1 divided by :2 | |
9 (5050) :3 <- :1 modulo :2 | |
10 | |
11 (5060) :2 <- :1 cast from a two's-complement integer into a | |
12 floating-point number | |
13 (5070) :2 <- :1 cast from a floating-point number into the nearest | |
14 two's-complement integer | |
15 (5080) :2 <- :1 cast from a floating-point number into a decimal | |
16 representation in scientific notation | |
17 (5090) :2 <- :1 cast from a decimal representation in scientific | |
18 notation into a floating-point number | |
19 | |
20 (5100) :2 <- the square root of :1 | |
21 (5110) :2 <- the natural logarithm of :1 | |
22 (5120) :2 <- e to the power of :1 (the exponential function) | |
23 (5130) :3 <- :1 to the power of :2 | |
24 | |
25 (5200) :2 <- sin :1 | |
26 (5210) :2 <- cos :1 | |
27 (5220) :2 <- tan :1 | |
28 | |
29 (5400) :1 <- uniform random number between zero and one exclusive | |
30 (5410) :2 <- :1 times phi | |
31 (5419) :2 <- :1 divided by phi | |
32 | |
33 Note: All of the above routines except (5020), (5060), (5080), (5200), | |
34 (5210), and (5400) also modify .5 as follows: .5 will contain #3 if | |
35 the result overflowed or if the arguments were out of domain, #2 if | |
36 the result underflowed, #1 otherwise. (See below.) | |
37 | |
38 * Description of floatlib.i: | |
39 | |
40 The INTERCAL floating-point library uses the IEEE format for 32-bit | |
41 floating-point numbers, which uses bit 31 as a sign bit (1 being | |
42 negative), bits 30 through 23 hold the exponent with a bias of 127, | |
43 and bits 22 through 0 contain the fractional part of the mantissa with | |
44 an implied leading 1. In mathematical notation: | |
45 | |
46 N = (1.0 + fraction) * 2^(exponent - 127) * -1^sign | |
47 | |
48 Thus the range of floating-point magnitudes is, roughly, from | |
49 5.877472*10^-39 up to 6.805647*10^38, positive and negative. Zero is | |
50 specially defined as all bits 0. (Actually, to be precise, zero is | |
51 defined as bits 30 through 0 as being 0. Bit 31 can be 1 to represent | |
52 negative zero, which the library generally treats as equivalent to | |
53 zero, though don't hold me to that.) | |
54 | |
55 Note that, contrary to the IEEE standard, exponents 0 and 255 are not | |
56 given special treatment (besides the representation for zero). Thus | |
57 there is no representation for infinity or not-a-numbers, and there is | |
58 no gradual underflow capability. Conformance with widely-accepted | |
59 standards was not considered to be a priority for an INTERCAL library. | |
60 (The fact that the general format conforms to IEEE at all is due to | |
61 sheer pragmatism.) | |
62 | |
63 Here, for easy reference, are some common values as they would be | |
64 directly represented within an INTERCAL program: | |
65 | |
66 Zero: #0 | |
67 One: #30720$#28672 | |
68 Two: #0$#32768 | |
69 One million: #9280$#40480 | |
70 One-half: #28672$#28672 | |
71 Square root of two: #31757$#30509 | |
72 e: #1760$#33742 | |
73 Pi: #571$#35133 | |
74 | |
75 However, a more human-accessible representation of floating-point | |
76 numbers is made possible by the routines (5080) and (5090). For this | |
77 representation, scientific notation is used with six digits of | |
78 precision after the decimal point. The seven digits of the mantissa | |
79 are suffixed with the two digits of the exponent. If the number is | |
80 negative, a one is prefixed (in the billions' place), so there can be | |
81 ten decimal digits in all. Finally, if the exponent is negative, fifty | |
82 is added. As is the usually the case with scientific notation, the | |
83 digit to the left of the decimal point must be nonzero except for the | |
84 case of zero itself. | |
85 | |
86 The above may sound convoluted, but it is not nearly as confusing as | |
87 it perhaps should be. As an example, if you wished to enter the value | |
88 of the speed of light measured in centimeters per seconds squared, you | |
89 would type TWO NINE NINE SEVEN NINE TWO FIVE ONE OH. The INTERCAL | |
90 program would then call (5080) to transform this into the | |
91 floating-point number 2.997925e+10. The same value printed out with | |
92 _____ | |
93 the help of (5090) would appear as ccxcixDCCXCMMDX. Similarly, the | |
94 value -1.602189e-19 (the charge of an electron measured in Coulombs) | |
95 would be represented respectively as ONE ONE SIX ZERO TWO ONE EIGHT | |
96 _______ | |
97 NINE SIX NINE and mclxCCXVIIICMLXIX. Note that the negative sign on | |
98 the exponent always translates to an L between the fraction and the | |
99 exponent when output. | |
100 | |
101 The 16-bit variable .5 is used by the floating-point library as an | |
102 error indicator. Upon return from most of the functions, .5 will | |
103 normally be set to #1 if the return value is valid. If .5 is set to | |
104 #2, this indicates that the result of the function underflowed (that | |
105 is, the magnitude of the result is less than 2^-127). If .5 is set to | |
106 #3, this indicates either that the result overflowed (magnitude | |
107 greater than 2^128), or that the arguments to the function were | |
108 illegal. The latter can occur for the following situations: division | |
109 by zero (either via division or modulus), negative root (either via | |
110 square root or a fractional power), and non-positive logarithm. Also, | |
111 (5070) will set .5 to #3 if the magnitude of the argument is greater | |
112 than 2^31, and (5080) will do likewise if given a bad number (e.g., if | |
113 it is greater than two billion). | |
114 | |
115 It may be worth nothing that there are some cases in which an under- | |
116 or overflow can be recovered from with tolerable grace. The sign and | |
117 fraction bits of such a value will usually still be correct, and the | |
118 exponent bits will just be the lower eight bits of the true exponent: | |
119 that is, the true exponent plus or minus 256 as appropriate. If such a | |
120 value is passed to another function as is, and the return value from | |
121 that over- or underflows in the opposite direction, it is likely that | |
122 the final result can be trusted. Of course, this depends entirely on | |
123 the nature of the operations involved, and this paragraph should not | |
124 be taken as advice to pursue such approaches in general, just as this | |
125 document should not be taken as advice to make use of this library at | |
126 all. Note also that in the case of (5110), the exponential function, | |
127 and (5130), the power function, it is quite easy to request a | |
128 massively under- and/or overflowed result. In these cases the | |
129 functions in question exit early, and attempting to salvage something | |
130 from such results, with the possible exception of the sign bit, is | |
131 pretty much guaranteed to be fruitless. | |
132 | |
133 (5020) is the integral partition function (referred to as modf in the | |
134 C library). Both return values are floating-point numbers, and both | |
135 have the same sign as the argument, such that :2 + :3 = :1 while | |
136 keeping the magnitude of :3 less than 1. | |
137 | |
138 (5060) and (5070) are the "type-casting" functions. They translate | |
139 values to and from 32-bit two's-complement integers. (5070) truncates | |
140 any fractional amounts from its argument, and will signal an error if | |
141 the magnitude of the value requires more than 32 bits. | |
142 | |
143 The trigonometric functions (5200), (5210), and (5220) will return | |
144 erroneous results when given very large arguments. At magnitudes | |
145 around 8 million or so, the result is occasionally off by one bit. | |
146 The errors get progressively worse, so that with magnitudes in the | |
147 neighborhood of the quintillions and higher, the number is wholly | |
148 inaccurate. However, at magnitudes around 8 million, there is already | |
149 a difference of over 20pi between consecutive floating-point numbers, | |
150 so it is hard to conceive of a purpose for which improved accuracy | |
151 would be useful. | |
152 | |
153 In the descriptions for (5410) and (5419), phi refers to the golden | |
154 ratio. | |
155 | |
156 * Internal routines in floatlib.i: | |
157 | |
158 A floating-point library involves a fair amount of mathematical | |
159 functionality to begin with, something that INTERCAL is not generally | |
160 mistaken for having. The floating-point library therefore has a number | |
161 of internal functions to provide it with some 64-bit arithmetic, among | |
162 other things. While these routines were designed specifically for | |
163 internal use, and in many cases one would be hard-pressed to find | |
164 other uses for them in their current form, it was felt that they were | |
165 nonetheless worth documenting. Besides providing a guide for anyone so | |
166 foolish as to examine the actual code, it provides a sort of "itemized | |
167 bill" that helps to justify the inordinate size of the library. | |
168 | |
169 In the following list, the notation :1:2 simply indicates that two | |
170 32-bit variables, in this example :1 and :2, are being used to hold a | |
171 single 64-bit integer value. The notation :1:2.1, on the other hand, | |
172 indicates a kind of double-precision floating-point value, a format | |
173 that the library uses frequently when storing intermediate values. The | |
174 two 32-bit variables hold the fraction bits, with no implied bits, and | |
175 ideally with the highest 1 bit at bit 55. The 16-bit variable holds | |
176 the exponent, but with a bias of 639 instead of 127. This is done so | |
177 that underflows do not affect the sign, which is also stored in the | |
178 16-bit variable in bit 10. These intermediate values are rounded (to | |
179 even), truncated, and stored in a single 32-bit variable by the | |
180 floating-point normalization function, (5600). | |
181 | |
182 Finally, note that the numbers in the tables accessed by routines | |
183 (5690) through (5693) are generally tailored for the function that | |
184 applies them, and all of them use slightly different representations | |
185 for the values they encode. | |
186 | |
187 Here are brief descriptions of the internal routines: | |
188 | |
189 (5500) :1:2 <- :1:2 plus :3:4 | |
190 .5 <- #0 if no overflow, #1 otherwise | |
191 (5510) :1 <- :1 plus :6, where :6 has at most one 1 bit | |
192 :6 <- #0 on overflow, nonzero otherwise | |
193 (5519) :1 <- :1 minus :6, where :6 has at most one 1 bit | |
194 :6 <- #0 on underflow, nonzero otherwise | |
195 (5520) .1 <- .1 minus #1 | |
196 (5530) :1:2:3:4 <- :1:2 times :3:4 | |
197 (5540) :1 <- :1:2 divided by :3:4, with :1:2 aligned on bit 63, :3:4 | |
198 aligned on bit 62, and the result aligned on bit 31 | |
199 .1 <- the exponent of the quotient plus #126 | |
200 .5 <- #1 if the result should be rounded up, #0 otherwise | |
201 (5550) :1:2.1 <- :1:2.1 modulo :3:4.2 | |
202 (5560) :1:2 <- two's complement of :1:2 | |
203 (5570) :1:2 <- :1:2 left-shifted logically so that bit 63 is 1 | |
204 .1 <- .1 minus the no. of bits shifted out | |
205 (5580) :3:4 <- :3:4 shifted right arithmetically .3 bits | |
206 (5590) :1:2 <- :1:2 shifted so that bit 55 is the highest 1 bit | |
207 .1 <- .1 plus or minus the no. of bits shifted | |
208 .5 <- the bits right-shifted out of :1:2, if any | |
209 (5600) :1 <- the normalization of :1:2.1 | |
210 (5610) :1:2.1 <- the natural logarithm of :1:2.1 | |
211 (5620) :1:2.1 <- e to the power of :1:2.1 | |
212 (5640) :1:2.1 <- :1:2.1 to the power of the two's-complement integer | |
213 in :3 | |
214 (5650) :1:2.1 <- :1 to the power of :2 | |
215 .5 <- #1 if the power is not real, #2 otherwise | |
216 (5670) :1:2 <- an angle between -pi/2 and pi/2 that has the same | |
217 value for sine as :1, and stored in two's-complement, | |
218 fixed-point form, with the one's position at bit 62 | |
219 .1 <- #1024 if the cosine of :1:2 will have the opposite sign | |
220 as the cosine of :1, #2048 otherwise | |
221 (5680) :1:2 <- the sine of :1:2 in two's-complement, fixed-point | |
222 :3:4 <- the cosine of :1:2 | |
223 (5690) ;1 <- a table of 32 significant bits of the numbers 10^i, | |
224 where i ranges from 1 to 39 | |
225 ,1 <- a table of exponents for the above powers of ten | |
226 (5691) ;1 <- a table of bit patterns representing ln(1 + 1/(2^i - 1)), | |
227 where i ranges from -1 to -26 | |
228 (5692) ;1 <- a table of bit patterns representing -ln(1 + 2^i), where | |
229 i ranges from -1 to -32 | |
230 (5693) ;1 <- a table of bit patterns representing arctan(2^i), where | |
231 i ranges from 0 to -30 | |
232 | |
233 * Bibliography: | |
234 | |
235 Knuth, Donald E. "The Art of Computer Programming: Vol. 2 | |
236 (Seminumerical Algorithms)", 2nd edition. Addison-Wesley, 1981. | |
237 This book provided the essential background as well as algorithms | |
238 for the basic arithmetic operations. Mention should also be made | |
239 of Vol. 1, which provided an algorithm for the natural logarithm | |
240 as a side note. | |
241 | |
242 Koren, Israel. "Computer Arithmetic Algorithms". Prentice-Hall, 1993. | |
243 In addition to information on the IEEE standard, this book | |
244 provided the algorithms used for the exponential and trigonometric | |
245 functions. | |
246 | |
247 Plauger, P. J. "The Standard C Library". Prentice-Hall, 1992. | |
248 This book provided assistance in designing the modulus and power | |
249 functions, as well as other miscellaneous contributions. |