
Floating Point Math Functions (Microchip AN660)

© 1997 Microchip Technology Inc.  DS00660A-page 1

INTRODUCTION

This application note presents implementations of the following math routines for the Microchip PICmicro microcontroller family:

  - square root function, sqrt(x)
  - exponential function, exp(x) = e^x
  - base 10 exponential function, exp10(x) = 10^x
  - natural log function, log(x) = ln x
  - common log function, log10(x)
  - trigonometric sine function, sin(x)
  - trigonometric cosine function, cos(x)
  - trigonometric sine and cosine functions, sincos(x)
  - power function, pow(x,y) = x^y
  - floor function, floor(x): largest integer not greater than x, as float
  - floating point logical comparison tests
  - integer random number generator, rand(x)

Routines for the PIC16CXXX and PIC17CXXX families are provided in a modified IEEE 754 32-bit format, together with versions in a 24-bit reduced format. The techniques and methods of approximation presented here attempt to balance the usually conflicting goals of execution speed versus memory consumption, while still achieving full machine precision. 32-bit arithmetic routines are available and constitute extended precision for the 24-bit versions.



No extended precision routines are currently supported for use in the 32-bit routines, thereby requiring more sophisticated error control algorithms for full or nearly full machine precision function estimation. Differences in algorithms used for the PIC16CXXX and PIC17CXXX families are a result of performance and memory considerations, and reflect the significant platform dependence in algorithm design.

Author: Frank J. Testa, FJT Consulting

MATHEMATICAL FUNCTION EVALUATION

Evaluation of elementary and mathematical functions is an important part of scientific and engineering computing. Although straightforward Taylor series approximations for many functions of interest are well known, they are generally not optimal for high performance function evaluation. Many other approaches are available, and the proper choice is based on the relative speeds of floating point and fixed point arithmetic operations, and is therefore heavily implementation dependent.

Although the precision of fixed point arithmetic is usually discussed in terms of absolute error, floating point calculations are typically analyzed using relative error. For example, given a function f and approximation p, absolute error and relative error are defined by

  abs error = |p - f|,    rel error = |p - f| / |f|

In binary arithmetic, an absolute error criterion reflects the number of correct bits to the right of the binary point, while a relative error standard determines the number of significant bits in a binary representation and is in the form of a percentage. In the 24-bit reduced format case, the availability of extended precision arithmetic routines permits 1/2*ulp, or one-half Unit in the Last Position, accuracy, reflecting a relative error standard that is typical of most floating point operations. The 32-bit versions cannot meet this in all cases: the absence of extended precision arithmetic requires more time consuming pseudo extended precision techniques to only approach this standard.
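As a concrete illustration of the two error measures (a Python sketch, not part of the PIC implementation), consider the two-term Taylor approximation of e^x evaluated at x = 1:

```python
import math

def abs_error(p, f):
    """Absolute error of an approximation p to the true value f."""
    return abs(p - f)

def rel_error(p, f):
    """Relative error of p; meaningful when f is nonzero."""
    return abs(p - f) / abs(f)

# Approximating e^1 by the two-term Taylor polynomial 1 + x at x = 1:
p, f = 1.0 + 1.0, math.e
print(abs_error(p, f))   # ~0.718: measures correct bits after the binary point
print(rel_error(p, f))   # ~0.264: measures significant bits retained
```

The relative figure is the one used throughout the rest of this note.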

Although noticeably smaller in most cases, the worst case relative error is usually less than 1*ulp for the 32-bit format. Most of the approximations presented here for the PIC16CXXX and PIC17CXXX processors utilize minimax polynomial or minimax rational approximations, together with range reduction and some segmentation of the interval on the transformed argument. Such segmentation is employed only when it occurs naturally from the range reduction, or when the gain in performance is worth the increased consumption of program memory.

RANGE REDUCTION

Since most functions of scientific interest have large domains, function identities are typically used to map the argument to a considerably smaller region where accurate approximations require a reasonable effort.
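The 1/2 ulp standard has a simple restatement as a relative error bound. The significand widths of the 24-bit and 32-bit PIC formats are not restated in this excerpt, so the sketch below uses IEEE 754 double (53-bit significand) purely to illustrate the relationship; `math.ulp` requires Python 3.9+:

```python
import math

def half_ulp_rel_bound(p_bits):
    # Rounding a real number to the nearest value in a binary format with a
    # p-bit significand introduces at most 1/2 ulp of error, i.e. a relative
    # error of at most 2**-p_bits.
    return 2.0 ** (-p_bits)

# IEEE 754 double has a 53-bit significand; math.ulp measures one ulp directly.
print(math.ulp(1.0) / 2)          # 2**-53: half an ulp at x = 1.0
print(half_ulp_rel_bound(53))     # the same bound
```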

In most cases range reduction must be performed carefully in order to prevent the introduction of cancellation error into the approximation. Although this process can be straightforward when extended precision routines are available, their unavailability requires more complex pseudo extended precision methods[3,4]. The resulting interval on the transformed argument sometimes naturally suggests a segmented representation where dedicated approximations are employed in each subinterval. In the case of the trigonometric functions sin(x) and cos(x), reduction of the infinite natural domain to a region small enough to effectively employ approximation cannot be performed accurately for an arbitrarily large x using finite precision arithmetic, resulting in a threshold in |x| beyond which a loss of precision occurs. The magnitude of this threshold is implementation dependent.
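The cancellation hazard for large trigonometric arguments can be demonstrated with a deliberately naive reduction (a Python sketch; the comparison value relies on the platform libm performing its own careful reduction inside `math.sin`):

```python
import math

def sin_naive_reduction(x):
    # Naive range reduction: fold x into roughly [-pi, pi] using only the
    # double-precision value of 2*pi. The subtraction cancels the leading
    # bits of x, so the tiny representation error in 2*pi is magnified in
    # proportion to |x|.
    two_pi = 2.0 * math.pi
    r = x - round(x / two_pi) * two_pi
    return math.sin(r)

for x in (1.0, 1e3, 1e10, 1e16):
    print(x, abs(sin_naive_reduction(x) - math.sin(x)))
# The discrepancy grows with |x|; the threshold beyond which precision is
# lost depends on how accurately the reduction constant is carried.
```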

MINIMAX APPROXIMATION

Although series expansions for the elementary functions are well known, their convergence is frequently slow and they usually do not constitute the most computationally efficient method of approximation. For example, the exponential function has the Maclaurin series expansion given by

  e^x = sum_{j=0}^{inf} x^j/j! = 1 + x + x^2/2! + x^3/3! + ...

To estimate the function on the interval [0,1], truncation of the series to the first two terms yields the linear approximation

  e^x ≈ 1 + x,

a straight line tangent to the graph of the exponential function at x = 0. On the interval [0,1], this approximation has a minimum relative error of zero at x = 0, and a maximum relative error of |2 - e|/e ≈ 0.264 at x = 1, underestimating the function throughout the interval. Observing that this undesirable situation is in part caused by using a tangent line approximation at one of the endpoints, an improvement could be made by using a tangent line approximation at, for example, the midpoint x = 1/2, yielding the linear function

  e^x ≈ e^(1/2) * (x + 1/2),

with a minimum relative error of zero at x = 1/2, a maximum relative error of about 0.176 at x = 0, and a relative error of about 0.090 at x = 1, again underestimating the function throughout the interval.
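These error figures are easy to verify numerically (a Python check, not part of the PIC implementation):

```python
import math

def rel_err_exp(p, x):
    """Relative error of approximating exp(x) by the value p."""
    return abs(p - math.exp(x)) / math.exp(x)

# Tangent line at x = 0: e^x ~ 1 + x; worst case is at x = 1.
print(rel_err_exp(1.0 + 1.0, 1.0))        # |2 - e|/e ~ 0.264

# Tangent line at the midpoint x = 1/2: e^x ~ e^(1/2) * (x + 1/2)
mid = lambda x: math.exp(0.5) * (x + 0.5)
print(rel_err_exp(mid(0.0), 0.0))         # ~ 0.176 at x = 0
print(rel_err_exp(mid(1.0), 1.0))         # ~ 0.090 at x = 1
```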

We could reduce the maximum error even further by adjusting the intercept of the above approximation, producing subintervals of both positive and negative error, together with possibly equalizing the values of maximum error at each occurrence by manipulating both the slope and intercept of the linear approximation. This is a simple example of a very powerful result in approximation theory known as minimax approximation, whereby a polynomial approximation of degree n to a continuous function can always be found such that the maximum error is a minimum, and such that the maximum error must occur at least at n + 2 points with alternating sign within the interval of approximation. It is important to note that the resulting minimax approximation depends on the choice of a relative or absolute error criterion. The evaluation of the minimax coefficients is difficult, usually requiring an iterative procedure known as Remes method, and historically accounting for the attention given to near-minimax approximations such as Chebyshev polynomials because of their greater ease of computation.

With the advances in computing power, Remes method has become much more tractable, resulting in iterative procedures for minimax coefficient evaluation[3]. Remarkably, this theory can be generalized to rational functions, offering a richer set of approximation methods in cases where division is not too slow. In the above simple example, the minimax linear approximation on the interval [0,1] attains its maximum relative error with alternating signs at the n + 2 = 3 points x = 0, an interior point, and x = 1 (the coefficients and error value were lost in this transcription). Occasionally, constrained minimax approximation[2] can be useful in that some coefficients can be required to take on specific values because of other considerations, leading to effectively near-minimax approximations. The great advantage in using minimax approximations lies in the fact that minimizing the maximum error leads to the fewest number of terms required to meet a given precision.
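For the running example, a toy version of the exchange procedure can be sketched: the minimax relative-error line for e^x on [0,1] has its interior error extremum in closed form, which makes the exchange step trivial here (illustrative Python only, not the reference Remes implementation of [3]):

```python
import math

def remez_linear_exp(iters=20):
    """Remez exchange for the minimax relative-error line a + b*x to exp(x)
    on [0,1]. The reference set is the two endpoints plus one interior point."""
    t = [0.0, 0.5, 1.0]
    a = b = E = 0.0
    for _ in range(iters):
        # Solve the 3x3 linear system (a + b*t_i)/exp(t_i) - 1 = (-1)^i * E
        # for (a, b, E) by Gauss-Jordan elimination on an augmented matrix.
        m = []
        for i, ti in enumerate(t):
            w = math.exp(-ti)
            m.append([w, ti * w, -((-1.0) ** i), 1.0])
        for c in range(3):
            piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
            m[c], m[piv] = m[piv], m[c]
            for r in range(3):
                if r != c:
                    f = m[r][c] / m[c][c]
                    m[r] = [u - f * v for u, v in zip(m[r], m[c])]
        a, b, E = (m[i][3] / m[i][i] for i in range(3))
        # The interior extremum of r(x) = (a + b*x)/exp(x) - 1 satisfies
        # r'(x) = 0, i.e. x = (b - a)/b; exchange it into the reference set.
        t[1] = (b - a) / b
    return a, b, abs(E)

a, b, E = remez_linear_exp()
print(a, b, E)   # E equioscillates and is well below the 0.176 tangent error
```

At convergence the relative error takes the value ±E with alternating sign at x = 0, the interior point, and x = 1, exactly the n + 2 = 3 point pattern described above.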

The number of terms is also dramatically affected by the size of the interval of approximation[1], leading to the concept of segmented representations, where the interval of approximation is split into subintervals, each with a dedicated minimax approximation. For the above example, the interval [0,1] can be split into the subintervals [0,1/2] and [1/2,1], each with its own linear minimax approximation. Since the subintervals were selected for convenience, the maximum relative error is different for the two subintervals, but nevertheless represents a significant improvement over a single approximation on the interval [0,1], with the maximum error reduced by a factor greater than three. Although a better choice for the split, equalizing the maximum error over the two subintervals, can be found, the overhead in finding the correct subinterval for a given argument would be much greater than that for the convenient choice used here. The minimax approximations used in the implementations for the PIC16CXXX and PIC17CXXX device families presented here have been produced by applying Remes method to the specific intervals in question[3].
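The benefit of splitting the interval can be checked numerically with the midpoint-tangent construction from above (a Python sketch standing in for the true minimax lines, so the exact ratio differs from the one in the original tables):

```python
import math

def tangent(c):
    """Tangent line to exp at x = c: p(x) = e^c * (1 + (x - c))."""
    return lambda x: math.exp(c) * (1.0 + (x - c))

def max_rel_err(p, lo, hi, n=10001):
    """Worst relative error of p against exp on [lo, hi], sampled on a grid."""
    xs = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return max(abs(p(x) - math.exp(x)) / math.exp(x) for x in xs)

# One line on [0,1] versus dedicated lines on [0,1/2] and [1/2,1]:
single = max_rel_err(tangent(0.5), 0.0, 1.0)
split = max(max_rel_err(tangent(0.25), 0.0, 0.5),
            max_rel_err(tangent(0.75), 0.5, 1.0))
print(single, split, single / split)   # the split cuts the worst error ~4-5x
```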

USAGE

For the unary operations, input argument and result are in AARG, with the exception of the sincos routines, where the cosine is returned in AARG and the sine in BARG. The power function requires input arguments in AARG and BARG, and produces the result in AARG. The logical test routines also require input arguments in AARG and BARG; the result is returned in the W register.

SQUARE ROOT FUNCTION

The natural domain of the square root function is all nonnegative numbers, leading to the effective domain [0,MAXNUM] for the given floating point representation. All routines begin with a domain test on the argument, returning a domain error if outside the domain. On the PIC17CXXX, the greater abundance of program memory, together with improved floating point division using the hardware multiply, permits a standard Newton-Raphson iterative approach for square root evaluation[1].
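A minimal sketch of the Newton-Raphson square root iteration follows (Python standing in for the PIC17CXXX floating point routines; the frexp/ldexp seed is an assumption for illustration, not necessarily the seed used in AN660):

```python
import math

def newton_sqrt(x, iters=5):
    """Newton-Raphson iteration y <- (y + x/y)/2 for sqrt(x).
    Each step roughly doubles the number of correct bits."""
    if x < 0.0:
        raise ValueError("domain error: argument outside [0, MAXNUM]")
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)             # x = m * 2**e with m in [0.5, 1)
    y = math.ldexp(1.0, e // 2)      # seed 2**(e//2), within ~41% of sqrt(x)
    for _ in range(iters):
        y = 0.5 * (y + x / y)
    return y

print(newton_sqrt(2.0))   # ~1.4142135623730951
```

With a seed this close, five iterations suffice for full double precision; a fixed-point seed table plays the same role on the PIC.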

