3. A simple factorization algorithm for univariate polynomials

This section discusses factoring polynomials using arithmetic modulo prime numbers. Information was used from D. Knuth, The Art of Computer Programming, Volume 2, Seminumerical Algorithms and J.H. Davenport et. al., Computer Algebra, SYSTEMS AND ALGORITHMS FOR ALGEBRAIC COMPUTATION.

A simple factorization algorithm is developed for univariate polynomials. This algorithm is implemented as the function BinaryFactors. The algorithm was named the binary factoring algorithm since it determines factors to a polynomial modulo 2^n for successive values of n, effectively adding one binary digit to the solution in each iteration. No reference to this algorithm has been found so far in literature.

Berlekamp showed that polynomials can be efficiently factored when arithmetic is done modulo a prime. The Berlekamp algorithm is only efficient for small primes, but after that Hensel lifting can be used to determine the factors modulo larger numbers.

The algorithm presented here is similar in approach to applying the Berlekamp algorithm to factor modulo a small prime, and then factoring modulo powers of this prime (using the solutions found modulo the small prime by the Berlekamp algorithm) by applying Hensel lifting. However it is simpler in set up. It factors modulo 2, by trying all possible factors modulo 2 (two possibilities, if the polynomial is monic). This performs the same action usually left to the Berlekamp step. After that, given a solution modulo 2^n, it will test for a solution f _i modulo 2^n if f _i or f _i+2^n are a solution modulo 2^(n+1).

This scheme raises the precision of the solution with one digit in binary representation. This is similar to the linear Hensel lifting algorithm, which factors modulo p^n for some prime p, where n increases by one after each iteration. There is also a quadratic version of Hensel lifting which factors modulo p^2^n, in effect doubling the number of digits (in p-adic expansion) of the solution after each iteration. However, according to "Davenport", the quadratic algorithm is not necessarily faster.

The algorithm here thus should be equivalent in complexity to Hensel lifting linear version. This has not been verified yet.


3.1 Modular arithmetic

This section copies some definitions and rules from The Art of Computer Programming, Volume 1, Fundamental Algorithms regarding arithmetic modulo an integer.

Arithmetic modulo an integer p requires performing the arithmetic operation and afterwards determining that integer modulo p. A number x can be written as

x=q*p+r

where q is called the quotient, and r remainder. There is some liberty in the range one chooses r to be in. If r is an integer in the range $0,1, ... ,(p-1)$ then it is the modulo, r=Mod(x,p).

When Mod(x,p)=Mod(y,p), the notation Mod(x=y,p) is used. All arithmetic calculations are done modulo an integer p in that case.

For calculations modulo some p the following rules hold:

For polynomials v _1(x) and v _2(x) it further holds that

Mod((v _1(x)+v _2(x))^p=v _1(x)^p+v _2(x)^p,p)

This follows by writing out the expression, noting that the binomial coefficients that result are multiples of p, and thus their value modulo p is zero ( p divides these coefficients), so only the two terms on the right hand side remain.


Some corollaries

One corollary of the rules for calculations modulo an integer is Fermat's theorem, 1640 : if p is a prime number then

Mod(a^p=a,p)

for all integers a (for a proof, see Knuth).

An interesting corollary to this is that, for some prime integer p:

Mod(v(x)^p=v(x^p),p).

This follows from writing it out and using Fermat's theorem to replace a^p with a where appropriate (the coefficients to the polynomial when written out, on the left hand side).


3.2 Factoring using modular arithmetic

The task is to factor a polynomial

p(x)=a _n*x^n+...+a _0

into a form

p(x)=C*g(x)*f _1(x)^p _1*f _2(x)^p _2*...*f _m(x)^p _m

Where f _i(x) are irreducible polynomials of the form:

f _i(x)=x+c _i

The part that could not be factorized is returned as g(x), with a possible constant factor C.

The factors f _i(x) and g(x) are determined uniquely by requiring them to be monic. The constant C accounts for a common factor.

The c _i constants in the resulting solutions f _i(x) can be rational numbers (or even complex numbers, if Gaussian integers are used).


3.3 Preparing the polynomial for factorization

The final factoring algorithm needs the input polynomial to be monic with integer coefficients (a polynomial is monic if its leading coefficient is one). Given a non-monic polynomial with rational coefficients, the following steps are performed:


Convert polynomial with rational coefficients to polynomial with integer coefficients

First the least common multiple lcm of the denominators of the coefficients p(x) has to be found, and the polynomial is multiplied by this number. Afterwards, the C constant in the result should have a factor 1/lcm.

The polynomial now only has integer coefficients.


Convert polynomial to a monic polynomial

The next step is to convert the polynomial to one where the leading coefficient is one. In order to do so, following "Davenport", the following steps have to be taken:

The polynomial is now a monic polynomial in y.

After factoring, the irreducible factors of p(x) can be obtained by multiplying C with 1/a _n^(n-1), and replacing y with a _n*x. The irreducible solutions a _n*x+c _i can be replaced by x+c _i/a _i after multiplying C by a _n, converting the factors to monic factors.

After the steps described here the polynomial is now monic with integer coefficients, and the factorization of this polynomial can be used to determine the factors of the original polynomial p(x).


3.4 Definition of division of polynomials

To factor a polynomial a division operation for polynomials modulo some integer is needed. This algorithm needs to return a quotient q(x) and remainder r(x) such that:

Mod(p(x)=q(r)*d(x)+r(x),p)

for some polymomial d(x) to be divided by, modulo some integer p. d(x) is said to divide p(x) (modulo p) if r(x) is zero. It is then a factor modulo p.

For binary factoring algorithm it is important that if some monic d(x) divides p(x), then it also divides p(x) modulo some integer p.

Define deg(f(x)) to be the degree of f(x) and lc(f(x)) to be the leading coefficient of f(x). Then, if deg(p(x))>=deg(d(x)), one can compute an integer s such that

Mod(lc(d(x))*s=lc(p(x)),p)

If p is prime, then

s=Mod(lc(p(x))*lc(d(x))^(p-2),p)

Because Mod(a^(p-1)=1,p) for any a. If p is not prime but d(x) is monic (and thus lc(d(x))=1),

s=lc(p(x))

This identity can also be used when dividing in general (not modulo some integer), since the divisor is monic.

The quotient can then be updated by adding a term:

term=s*x^(deg(p(x))-deg(d(x)))

and updating the polynomial to be divided, p(x), by subtracting d(x)*term. The resulting polynomial to be divided now has a degree one smaller than the previous.

When the degree of p(x) is less than the degree of d(x) it is returned as the remainder.

A full division algorithm for arbitrary integer p>1 with lc(d(x))=1 would thus look like:

divide(p(x),d(x),p)
   q(x) = 0
   r(x) = p(x)
   while (deg(r(x)) >= deg(d(x)))
      s = lc(r(x))
      term = s*x^(deg(r(x))-deg(d(x)))
      q(x) = q(x) + term
      r(x) = r(x) - term*d(x) mod p
   return {q(x),r(x)}

The reason we can get away with factoring modulo 2^n as opposed to factoring modulo some prime p in later sections is that the divisor d(x) is monic. Its leading coefficient is one and thus q(x) and r(x) can be uniquely determined. If p is not prime and lc(d(x)) is not equal to one, there might be multiple combinations for which p(x)=q(x)*d(x)+r(x), and we are interested in the combinations where r(x) is zero. This can be costly to determine unless q(x),r(x) is unique. This is the case here because we are factoring a monic polynomial, and are thus only interested in cases where lc(d(x))=1.


3.5 Determining possible factors modulo 2

We start with a polynomial p(x) which is monic and has integer coefficients.

It will be factored into a form:

p(x)=g(x)*f _1(x)^p _1*f _2(x)^p _2*...*f _m(x)^p _m

where all factors f _i(x) are monic also.

The algorithm starts by setting up a test polynomial, p _test(x) which divides p(x), but has the property that

p _test(x)=g(x)*f _1(x)*f _2(x)*...*f _m(x)

Such a polynomial is said to be square-free. It has the same factors as the original polynomial, but the original might have multiple of each factor, where p _test(x) does not.

The square-free part of a polynomial can be obtained as follows:

p _test(x)=p(x)/Gcd(p(x),D(x)p(x))

It can be seen by simply writing this out that p(x) and D(x)p(x) will have factors f _i(x)^(p _i-1) in common. these can thus be divided out.

It is not a requirement of the algorithm that the algorithm being worked with is square-free, but it speeds up computations to work with the square-free part of the polynomial if the only thing sought after is the set of factors. The multiplicity of the factors can be determined using the original p(x).

Binary factoring then proceeds by trying to find potential solutions modulo p=2 first. There can only be two such solutions: x+0 and x+1.

A list of possible solutions L is set up with potential solutions.


3.6 Determining factors modulo 2^n given a factorization modulo 2

At this point there is a list L with solutions modulo 2^n for some n. The solutions will be of the form: x+a. The first step is to determine if any of the elements in L divides p(x) (not modulo any integer). Since x+a divides p _test(x) modulo 2^n, both x+a and x+a-2^n have to be checked.

If an element in L divides p _test(x), p _test(x) is divided by it, and a loop is entered to test how often it divides p(x) to determine the multiplicity p _i of the factor. The found factor f _i(x)=x+c _i is added as a combination ( x+c _i, p _i). p(x) is divided by f _i(x)^p _i.

At this point there is a list L of factors that divide p _test(x) modulo 2^n. This implies that for each of the elements u in L, either u or u+2^n should divide p _test(x) modulo 2^(n+1). The following step is thus to set up a new list with new elements that divide p _test(x) modulo 2^(n+1).

The loop is re-entered, this time doing the calculation modulo 2^(n+1) instead of modulo 2^n.

The loop is terminated if the number of factors found equals deg(p _test(x)), or if 2^n is larger than the smallest non-zero coefficient of p _test(x) as this smallest non-zero coefficient is the product of all the smallest non-zero coefficients of the factors, or if the list of potential factors is zero.

The polynomial p(x) can not be factored any further, and is added as a factor (p(x), 1).

The function BinaryFactors, when implemented, yields the following interaction in Yacas:

In> BinaryFactors((x+1)^4*(x-3)^2)
Out> {{x-3,2},{x+1,4}}
In> BinaryFactors((x-1/5)*(2*x+1/3))
Out> {{2,1},{x-1/5,1},{x+1/6,1}}
In> BinaryFactors((x-1123125)*(2*x+123233))
Out> {{2,1},{x-1123125,1},{x+123233/2,1}}

The binary factoring algorithm starts with a factorization modulo 2, and then each time tries to guess the next bit of the solution, maintaining a list of potential solutions. This list can grow exponentially in certain instances. For instance, factoring (x-a)*(x-2*a)*(x-3*a)*... implies a that the roots have common factors. There are inputs where the number of potential solutions (almost) doubles with each iteration. For these inputs the algorithm becomes exponential. The worst-case performance is therefore exponential. The list of potential solutions while iterating will contain a lot of false roots in that case.


3.7 Efficiently deciding if a polynomial divides another

Given the polynomial p(x), and a potential divisor

f _i(x)=x-p

modulo some q=2^n an expression for the remainder after division is

rem(p)=Sum(i,0,n,a _i*p^i)

For the initial solutions modulo 2, where the possible solutions are x and x-1. For p=0, rem(0)=a _0. For p=1, rem(1)=Sum(i,0,n,a _i) .

Given a solution x-p modulo q=2^n, we consider the possible solutions Mod(x-p,2^(n+1)) and Mod(x-(p+2^n),2^n+1).

x-p is a possible solution if Mod(rem(p),2^(n+1))=0.

x-(p+q) is a possible solution if Mod(rem(p+q),2^(n+1))=0. Expanding Mod(rem(p+q),2*q) yields:

Mod(rem(p+q),2*q)=Mod(rem(p)+extra(p,q),2*q)

When expanding this expression, some terms grouped under extra(p,q) have factors like 2*q or q^2. Since q=2^n, these terms vanish if the calculation is done modulo 2^(n+1).

The expression for extra(p,q) then becomes

extra(p,q)=q*Sum(i,1,n/2,(2*i-1)*a(2*i)*p^(2*i-2))

An efficient approach to determining if x-p or x-(p+q) divides p(x) modulo 2^(n+1) is then to first calculate Mod(rem(p),2*q). If this is zero, x-p divides p(x). In addition, if Mod(rem(p)+extra(p,q),2*q) is zero, x-(p+q) is a potential candidate.

Other efficiencies are derived from the fact that the operations are done in binary. Eg. if q=2^n, then q _next=2^(n+1)=2*q=q<<1 is used in the next iteration. Also, calculations modulo 2^n are equivalent to performing a bitwise and with 2^n-1. These operations can in general be performed efficiently on todays hardware which is based on binary representations.


3.8 Extending the algorithm

Only univariate polynomials with rational coefficients have been considered so far. This could be extended to allow for roots that are complex numbers a+I*b where both a and b are rational numbers.

For this to work the division algorithm would have to be extended to handle complex numbers with integer a and b modulo some integer, and the initial setup of the potential solutions would have to be extended to try x+1+I and x+I also. The step where new potential solutions modulo 2^(n+1) are determined should then also test for x+I*2^n and x+2^n+I*2^n.

The same extension could be made for multivariate polynomials, although setting up the initial irreducible polynomials that divide p _test(x) modulo 2 might become expensive if done on a polynomial with many variables (2^(2^m-1) trials for m variables).

Lastly, polynomials with real-valued coefficients could be factored, if the coefficients were first converted to rational numbers. However, for real-valued coefficients there exist other methods (Sturm sequences).


3.9 Newton iteration

What the BinaryFactor algorithm effectively does is finding a set of potential solutions modulo 2^(n+1) when given a set of potential solutions modulo 2^n. There is a better algorithm that does something similar: Hensel lifting. Hensel lifting is a generalized form of Newton iteration, where given a factorization modulo p, each iteration returns a factorization modulo p^2.

Newton iteration is based on the following idea: when one takes a Taylor series expansion of a function:

f(x[0]+dx):=f(x[0])+(D(x)f(x[0]))*dx+...

Newton iteration then proceeds by taking only the first two terms in this series, the constant plus the constant times dx. Given some good initial value x _0, the function will is assumed to be close to a root, and the function is assumed to be almost linear, hence this approximation. Under these assumptions, if we want f(x _0+dx) to be zero,

f(x[0]+dx)=f(x[0])+(D(x)f(x[0]))*dx=0

This yields:

dx:= -f(x[0])/(D(x)f(x[0]))=0

And thus a next, better, approximation for the root is x[1]:=x _0-f(x[0])/(D(x)f(x[0])), or more general:

x[n+1]=x[n]-f(x[n])/(D(x)f(x[n]))

If the root has multiplicity one, a Newton iteration can converge quadratically, meaning the number of decimals precision for each iteration doubles.

As an example, we can try to find a root of Sin(x) near 3, which should converge to Pi.

Setting precision to 30 digits,
In> Builtin'Precision'Set(30)
Out> True;

We first set up a function dx(x):
In> dx(x):=Eval(-Sin(x)/(D(x)Sin(x)))
Out> True;

And we start with a good initial approximation to Pi, namely 3. Note we should set x after we set dx(x), as the right hand side of the function definition is evaluated. We could also have used a different parameter name for the definition of the function dx(x).

In> x:=3
Out> 3;

We can now start the iteration:

In> x:=N(x+dx(x))
Out> 3.142546543074277805295635410534;
In> x:=N(x+dx(x))
Out> 3.14159265330047681544988577172;
In> x:=N(x+dx(x))
Out> 3.141592653589793238462643383287;
In> x:=N(x+dx(x))
Out> 3.14159265358979323846264338328;
In> x:=N(x+dx(x))
Out> 3.14159265358979323846264338328;

As shown, in this example the iteration converges quite quickly.


Finding roots of multiple equations in multiple variables using Newton iteration

One generalization, mentioned in W.H. Press et al., NUMERICAL RECIPES in C, The Art of Scientific computing is finding roots for multiple functions in multiple variables.

Given N functions in N variables, we want to solve

f _i(x[1],...,x[N])=0

for i=1..N. If de denote by X the vector $$ X := x[1],x[2],...,x[N] $$

and by dX the delta vector, then one can write

f _i(X+dX):=f _i(X)+Sum(j,1,N,D(x _j)f _i(X))*dx[j]

Setting f _i(X+dX) to zero, one obtains

Sum(j,1,N,a[i][j]*dx _j)=b[i]

where

a[i][j]:=(D(x _j)f _i(X))

and

b _i:= -f _i(X)

So the generalization is to first initialize X to a good initial value, calculate the matrix elements a[i][j] and the vector b[i], and then to proceed to calculate dX by solving the matrix equation, and calculating

X[i+1]:=X[i]+dX[i]

In the case of one function with one variable, the summation reduces to one term, so this linear set of equations was a lot simpler in that case. In this case we will have to solve this set of linear equations in each iteration.

As an example, suppose we want to find the zeroes for the following two functions:

f _1(a,x):=Sin(a*x)

and

f _2(a,x):=a-2

It is clear that the solution to this is a=2 and x:=N*Pi/2 for any integer value N.

We will do calculations with precision 30:
In> Builtin'Precision'Set(30)
Out> True;

And set up a vector of functions $f_1(X),f_2(X)$ where $X:=a,x$
In> f(a,x):={Sin(a*x),a-2}
Out> True;

Now we set up a function matrix(a,x) which returns the matrix a[i][j]:

In> matrix(a,x):=Eval({D(a)f(a,x),D(x)f(a,x)})
Out> True;

We now set up some initial values:

In> {a,x}:={1.5,1.5}
Out> {1.5,1.5};

The iteration converges a lot slower for this example, so we will loop 100 times:

In> For(ii:=1,ii<100,ii++)[{a,x}:={a,x}+\
   N(SolveMatrix(matrix(a,x),-f(a,x)));]
Out> True;
In> {a,x}
Out> {2.,0.059667311457823162437151576236};

The value for a has already been found. Iterating a few more times:

In> For(ii:=1,ii<100,ii++)[{a,x}:={a,x}+\
   N(SolveMatrix(matrix(a,x),-f(a,x)));]
Out> True;
In> {a,x}
Out> {2.,-0.042792753588155918852832259721};
In> For(ii:=1,ii<100,ii++)[{a,x}:={a,x}+\
   N(SolveMatrix(matrix(a,x),-f(a,x)));]
Out> True;
In> {a,x}
Out> {2.,0.035119151349413516969586788023};

the value for x converges a lot slower this time, and to the uninteresting value of zero (a rather trivial zero of this set of functions). In fact for all integer values N the value N*Pi/2 is a solution. Trying various initial values will find them.


Newton iteration on polynomials

von zur Gathen et al., Modern Computer algebra discusses taking the inverse of a polynomial using Newton iteration. The task is, given a polynomial f(x), to find a polynomial g(x) such that f(x)=1/g(x), modulo some power in x. This implies that we want to find a polynom g for which:

h(g)=1/g-f=0

Applying a Newton iteration step g[i+1]=g[i]-h(g[i])/(D(g)h(g[i])) to this expression yields:

g[i+1]=2*g[i]-f*g[i]^2

von zur Gathen then proves by induction that for f(x) monic, and thus f(0)=1, given initial value g _0(x)=1, that

Mod(f*g _i=1,x^2^i)

Example:

suppose we want to find the polynomial g(x) up to the 7th degree for which Mod(f(x)*g(x)=1,x^8), for the function

f(x):=1+x+x^2/2+x^3/6+x^4/24

First we define the function f:

In> f:=1+x+x^2/2+x^3/6+x^4/24
Out> x+x^2/2+x^3/6+x^4/24+1;

And initialize g and i.

In> g:=1
Out> 1;
In> i:=0
Out> 0;

Now we iterate, increasing i, and replacing g with the new value for g:

In> [i++;g:=BigOh(2*g-f*g^2,x,2^i);]
Out> 1-x;
In> [i++;g:=BigOh(2*g-f*g^2,x,2^i);]
Out> x^2/2-x^3/6-x+1;
In> [i++;g:=BigOh(2*g-f*g^2,x,2^i);]
Out> x^7/72-x^6/72+x^4/24-x^3/6+x^2/2-x+1;

The resulting expression must thus be:

g(x):=x^7/72-x^6/72+x^4/24-x^3/6+x^2/2-x+1

We can easily verify this:

In> Expand(f*g)
Out> x^11/1728+x^10/576+x^9/216+(5*x^8)/576+1;

This expression is 1 modulo x^8, as can easily be shown:

In> BigOh(%,x,8)
Out> 1;