Assignment #7: Neural Networks

  • Name:Lan Vu

  • Date:12/5/2007

  • Course ID: CSC 5682

  • Semester:Fall 2007

OFFLINE HANDWRITING CHARACTER RECOGNITION

INTRODUCTION

 My project is an Offline Handwriting Character Recognition (OHCR) System. This system allows users creating their own recognition networks for being used in other specific applications. OHCR problem requires solving two main smaller problems: the image processing problem and recognition problem.

For the first problem, some image processing algorithms are introduced. For the second one, I choose the way of using neural network.

What is Offline Handwriting Character Recognition (OHCR) System?

OHCR is a method of converting visually readable characters into computer readable characters. OHCR programs are used with scanners to enter text into the computer from a hardcopy version. We need OHCR because, even though we can read characters, computers can’t. They use binary codes (ASCII) to represent characters. When a document is scanned, the scanner provides us the image of the document (is can be color image or binary image). OHCR applications will convert the image to text. Then a word processing program can correct the mistake.

 

The Processes of Handwriting Character Recognition

    1. Preprocessing: the hardcopy is scanned and the handwritten parts are found, separated and transformed into matrices with binary values (0s for the black pixels and 1s for the white ones).
    2. Character segmentation: looking for the area of each character in the matrix.
    3. Feature extraction: each character is divided vertically so that its primitive features are shown. Each primitive feature is represented by a vector.
    4. Recognition procedure: follows which is based on a recognition model.

 The results of each process

My system isn’t a specific application in any field. Instead, it provides a tool to create a recognition engine for using in other recognition application.  In this system, I concern to only two of above processes: Preprocessing and Recognition.

PREPROCESSING

Preprocessing in my project is used to process raw patterns archived from scanned images. After several preprocessing steps, the result is the normalizing patterns. These patterns then can be used for training or testing. There are several algorithms used by the application to convert the raw image into proper image. The purpose of these transformation algorithms is to provide the best possible input data for the recognition algorithms.

The problems of scanning image

 Some used Algorithms

 Binary Image Making Algorithm: to convert original image (color or gray scale) into binary image (0s for the black pixels and 1s for the white ones).

 Noise filtering algorithm:

Straightening algorithm

Some other normalized algorithms: skeletonize, despace, deline…

RECOGNITION

Recognition model

Recognition can be occurred by different methods such as:

ˇ         Neural Network (FNN, ANN, Combined NN...)

ˇ         Hidden Markov model

ˇ         Wavelet packet transform and Neuron-Fuzzy

ˇ         Genetic-Algorithm

ˇ         Adaptive Statistical Similarity (using Statistical tools such as Bayesian and Gaussian)

ˇ         Hybrid methods

Among these above methods, application of neural network is the most popular one and I also choose this solution for my project. The following image shows the model of my recognition network:

                             

The training set

The training set is a set of patterns and text codes (ASCII) standing for characters. A pattern is the input of system and text code is the expected output character.

Each pattern is a binary image standing for character. This image results from a raw image after being processed through some image processing algorithms introduced in previous section. The default size of recognition image is 24x24. Because of learning by pattern, the recognition neural network has 576 input nodes (=24*24). Each node will receive value of a pixel.

In my project, training set can be created by users or get from other resource such as scanned patterns or NIST Database.

The training process

A new recognition neural network is created with random weights of arcs. I don’t set these values to 0 because I find that use random values, the training time will be shorten.

Before training, a training set is loaded and some of training information should be set such as training times, min error, and momentum.

ˇ         Training times: the time that network be trained. It’s not depending on the size of training set.

ˇ         Min error: The difference between output value and 1. It makes a threshold to specify whether revise the weights of arcs of network or not.

ˇ         Momentum: this parameter used to specify that the weights of arcs of network are revised much or less after each learning time.

In the training process, a pattern in the training set is converted to input vector with 576 elements and the text code of that pattern is converted to expected output vector. In this input vector, the value of elements standing for black pixels is 1, otherwise the value is 0. The Propation method of the neural network will calculate an output vector depending on this input vector. Then the output vector and the expected output vector are used in the Backpropation method to revise the weights of arcs in the neural network. Two parameters Min error and Momentum are also used in this method to have a suitable learning way. The learning process continues with next pattern until the training times is reached.

TESTING RESULT

This section will show you the effectiveness (%correct on the training set) of the system after it has been trained and how well the system generalized to data that was not in the training set.

For testing purpose, I have created a digit recognition neural network which allow to recognize ten digits (0, 1, 2, 3,… 9) with the structure:

             Input layer: 576 nodes (image matrix 24x24)

             Hidden layer 1: 256 nodes

             Hidden layer 2: 256 nodes

            Output layer: 256 nodes (only ten nodes standing for 10 digits are activated, the other nodes are masked and not work).

The training data is a part of NIST Database with 2000 digit patterns and the testing data is also a part of NIST Database with other 2000 digit patterns. The result of training and testing on this data set is shown in the below table.

Training times

Training times/pattern

% Correct on Training data set

% Correct on Testing data set

0

0

12.6%

8.9%

2000

1

8.8%

13.5%

4000

2

22.9%

13.5%

8000

4

30.6%

30.1%

16000

8

37.7%

36.3%

32000

16

46.8%

44.9%

64000

32

61.4%

58.3%

You can find in the source code package the file TrainData.dat containing the training data set the file TestData.dat containing the testing data set. The file DigitNet.net saves information of the recognition neural network after training.

IMPLEMENT

The computer simulation of Offline Handwriting Character Recognition System is shown in below interface. It is implemented in C++ language NET Framework 2.0. I choose C++ because of it conveniences in working with memory for the image processing need.

Click here for full source code of this program.

This program consists of some main parts:       

Steps to build a recognition neural network for a specific purpose:

1.    Create new neural network:

-       Click button Network Info

 

-       Set information of neural network that you want to create

-       Click OK

2.    Prepare the training data set in the Training data section

3.    Set parameters in training screen section and start to train until the percent of correction is accepted.

Some main class in source code:

 CMLNet class: This is general class of multi layer neural network. It can be used for any purpose

#define     NUM_OF_LAYERS 50 

 

class CMLNet

{

public:

      CMLNet(void);

      ~CMLNet(void);

 

      enum modeOutput  { OUT_ON  = 0, OUT_OFF = 1};

 

      CVecto   m_OutputMask;

      CVecto   m_errors;

      CVecto   m_deltas;        

      CVecto   m_weights;

      CVecto   m_layers;

      double   m_alpha;

      double   m_bias;

 

      void Initial(int LayerCount,int* pNodes);

 

      double FN(double f) {return double(1.0 / (1.0 + exp(-f)));}

      double FD(double f) {return double(f *  (1.0 - f));}   

 

      // warning: will change a parameters of in_synapsys and  

      //outValss vectors (size, capacity, doubles)     

      bool  Propagate(CVecto& inVals, CVecto& outVals);

      bool  LearnPropagate(CVecto& inVals, CVecto& outVals);

      void  BackPropagate(CVecto& outVals, CVecto&         

                          idealVals,double N,double M);

      void  GetErrorWeight(CVecto& eVals,CVecto& wVal);

 

      bool  Serialize(CArchive& ar);

      void  RandomizeWeigths(double, double);

      bool  CheckConvergence(double);

      void  SetOutputMask(CString strMask);

      CString GetOutputMask(void);

 

      double GetErrorVal(void) { return m_errors.GetTotalVal() ;};

 

      void  SetAlpha(double alpha)  {m_alpha = alpha;}

      void  SetBias(double bias)    {m_bias  = bias   ;}

 

      int   GetInputCount() { return (int)m_layers[0] ;};

      int   GetOutputCount()

                 {return (int)m_layers[m_layers.size()-1];};

      int   GetLayerCount() { return m_layers.size();};

      int   GetNodeCount(int nIndex)

                 { return (int)m_layers[nIndex];};

 

      bool Propagate()  {return Propagate(*m_inVal, *m_outVal);};

      bool LearnPropagate()

           {return LearnPropagate(*m_inVal, *m_outVal);};

 

 

 private:

      //used only for mutil nets structure . Ex : modular net

      CVecto*                 m_inVal     ;    

      CVecto*                 m_outVal;

 

      double Delta (double N, double M,

                    double d, double e , double y)

      { return  double( N * ((M * d) + ((1.0 - M) * e * y)));     }          

};

//******************************************************

void  CMLNet::Initial(int LayerCount,int* pNodes)

{     SetBias (1.0);

      SetAlpha(1.0);

      int numWeights = 0;

      int numNeurons = 0;

      int t = 0;

 

      m_layers.resize(LayerCount);

     

      for (int i = 0; i < LayerCount ; ++i)

      {     m_layers[i] = pNodes[i];

            numWeights +=   (int)(t * m_layers[i]);

            t                 =     (int)m_layers[i] + 1    ;     // add bias

      }

      for (int l = 1 ; l <LayerCount; l++)

            numNeurons += (int)m_layers[l];

 

      TRY

      {     m_weights.release();

            m_deltas.release() ;

            m_errors.release() ;

            m_OutputMask.release();

            m_weights.resize(numWeights);

            m_deltas.resize(numWeights);

            m_errors.resize(numNeurons);

            m_OutputMask.resize((int)m_layers[m_layers.size()-1]);  

      }

      CATCH_ALL(e)

      {     e->ReportError();

            e->Delete();

            return;

      }

      END_CATCH_ALL

 

 

}

//******************************************************

bool CMLNet::Propagate(CVecto& inVals, CVecto& outVals)

{     int         w           = 0         ;

      int         max     = (int)m_layers[0] ;

      double*     inVal = NULL      ;

      double*     outVal      = NULL      ;

     

      if (inVals.size() != (int)m_layers[0])  return false;

      //get num of mode in biggest layer

      for (int  i = 1 ; i < m_layers.size(); i++ )

            if ((int)m_layers[i] > max) max = (int)m_layers[i];

 

      inVal   = new double[max];

      outVal  = new double[max];

     

      //set input val

      for (int  i = 0 ; i < inVals.size() ; i++ ) {

            inVal[i] = inVals[i];

      }

           

      //Propagate

      for (int  l = 1 ; l < m_layers.size(); l++ )

      {    

            //for each out layer

            for (int i = 0; i < (int)m_layers[l] ; i++)

            {    

                  if (l==m_layers.size()-1 && m_OutputMask[i] ==

                                                     OUT_OFF)

                  {     outVal[i] = 0.0 ; }

                  else

                  {     double v = (m_weights[w] * m_bias); w++;

                        //for each in layer

                        for (int j = 0 ; j <(int)m_layers[l-1];

                                                    j++, w++ )        

                        {     v = v + (inVal[j] * m_weights[w]);

                        }

                        outVal[i] = FN(v);

                                  // transfer Function         

                  };

            }

            double* tmp = inVal     ;

            inVal       = outVal;

            outVal            = tmp ;

      }

      double* tmp = inVal     ;

      inVal       = outVal;

      outVal            = tmp ;

 

      //set output val

      for (int  i = 0 ; i < outVals.size() ; i++ )

            outVals[i] = outVal[i];

 

      delete [] inVal  ;

      delete [] outVal ;

      return true;

}

//******************************************************

//return true if success

//outVals must be = NULL

bool CMLNet::LearnPropagate(CVecto& inVals, CVecto& outVals)

{     int   w           = 0                                 ;

      int max     = 0                                 ;

      int numNode = 0                                 ;    

      int   inPos = 0                                 ;

      int   outPos  = (int)m_layers[0];

      double*     inVal = NULL            ;

     

      //get num of node of layers

      for (int  i = 0 ; i < m_layers.size(); i++)

      {     numNode += (int)m_layers[i];

      }

     

      outVals.resize(numNode);

      //set input val to the fisrt

      for (int  i = 0 ; i < inVals.size() ; i++ )

            {outVals[i] = inVals[i];}

 

      //Propagate

      for (int  l = 1 ; l < m_layers.size(); l++ )

      {     //for each out layer

            for (int i = 0; i < (int)m_layers[l] ; i++)

            {     if (l==m_layers.size()-1 && m_OutputMask[i] ==

                                                OUT_OFF)

                  {   outVals[outPos + i] = 0.0;}

                  else

                  {     double v = (m_weights[w] * m_bias); w++;

                        //for each in layer

                        for (int j = 0 ; j < (int)m_layers[l-1];

                                                  j++, w++ )        

                        {     v = v + outVals[inPos + j] *

                                       m_weights[w];

                        }          

                        outVals[outPos + i] = FN(v);                    // transfer function         

                  }

                 

            }

            inPos   = outPos;

            outPos += (int)m_layers[l];

      }

      return true;

}

//******************************************************

//back propagate

void CMLNet::BackPropagate(CVecto& outVals, CVecto& idealVals, double N, double M)

{     int o       = outVals.size()       ;

      int e       = m_errors.size()      ;

      int w       = m_weights.size() ;

 

      for ( int l  = m_layers.size()-1; l>0; l-- )

      {     int Nodes      = (int)m_layers[l];

            int prevNodes  = (int)m_layers[l-1];

            int aw     = w ;//pos of matrix weight after layer

            int ae     = e ;//used when caculate non last layers

            e         -= Nodes;//go to fisrt node in this layer

            o             -= Nodes              ;

            w             -= (prevNodes+1)*Nodes;

            int po         = o - prevNodes      ;

            int nw         = w                        ;    

 

            for (int i = 0 ; i<Nodes ; i++)

            {     if (l==m_layers.size()-1)

                  {     if (m_OutputMask[i] == OUT_OFF ) continue;

                        m_errors[e + i] = (idealVals[i] - outVals

                                   [o+i])* FD(outVals[o+i]);     

                  }

                  else 

                  {     //compute error for each node

                        //of non last layers

                        double      dsum = 0.0;

                        int         dw   = aw+i+1;

                        for (int j=0 ; j<(int)m_layers[l+1]; j++)

                        {     dsum += (m_errors[ae + j] *

                                       m_weights[dw]);

                              dw    += (Nodes + 1); // skip bias      

                        }

                        m_errors[e + i] = dsum * FD(outVals[o+i]);

                        //m_errors.SetVal(e + i,dsum );

                  }

 

                  if (m_errors[e+i] == 0) continue;

                  // bias

                  m_deltas[nw] = Delta(N, M, m_deltas[nw],

                                 m_errors[e+i], m_bias);

                  m_weights[nw]= m_weights[nw] + m_deltas[nw];

                  nw++;

                  for(int j = 0 ; j< prevNodes ;j++,nw++)

                  {     m_deltas[nw] = Delta(N, M, m_deltas[nw],

                                     m_errors[e+i],outVals[po+j]);

                        m_weights[nw]= m_weights[nw]+ m_deltas[nw];

                  }

            }

      }

}

 

//******************************************************

void CMLNet::GetErrorWeight(CVecto& eVals,CVecto& wVals)

{     int i,j,k;

      eVals.resize((int)m_layers[1]);

      wVals.resize((int)m_layers[0] * (int)m_layers[1]);

      for (i = 0 ; i< eVals.size() ;i++)

            eVals[i] = m_errors[i]; 

     

      for (i=0,k=0 ; i< (int)(m_layers[1]) ;i++)

      {     for(j = 1 ; j<= (int)m_layers[0] ;j++)

                  wVals[k++] = m_weights[i*(int)m_layers[0] + j]; 

 

      }

}

//******************************************************

void CMLNet::RandomizeWeigths(double lo, double hi)

{     for (int i = 0 ; i < m_weights.size(); i++)

      {     m_weights[i] = lo + ((double)rand() / RAND_MAX) * (hi - lo);}    

}

….

 CRecognitionNetwork class: this class inherit from the CMLNet class and is only used for the character recognition purpose.

#define     HEADER_DATA_FILE_SIZE   20   

#define     INPUT_NUM_POS                 0    

#define     OUTPUT_NUM_POS                1    

#define     INPUT_SIZE_POS                2    

#define     OUTPUT_SIZE_POS               3    

#define     RECORD_NUM_POS                4    

 

class CRecognitionNetwork :public CMLNet

{

     

public:

      int         m_InputRow        ;

      int         m_InputCol        ;

      int         m_OutputRow       ;

      int         m_OutputCol       ;

 

      enum  modeStatus  { modeBusy =0 , modeReady = 1};

      char  m_Status;

 

      bool  Get(unsigned, unsigned z, CVecto& inVal,

                CVecto& idealVal, CImageMatrix* Inputs, CImageMatrix* Outputs);

      bool  LoadDataFile(CString filename,CImageMatrix* Inputs,CImageMatrix* Outputs);

      bool  SaveDataFile(CString filename,CImageMatrix* Inputs,CImageMatrix* Outputs);

      void  SetInputOutput(int InputRow,int InputCol,int OutputRow,int OutputCol);

      bool  Serialize(CArchive& ar);

            CRecognitionNetwork(void);

            ~CRecognitionNetwork(void);

};

 

//---------------------------------------------------------------

CRecognitionNetwork::CRecognitionNetwork():CMLNet()

 {m_Status        =     modeReady; }

//---------------------------------------------------------------CRecognitionNetwork::~CRecognitionNetwork(void){}

//---------------------------------------------------------------

void CRecognitionNetwork::SetInputOutput(int InputRow,int InputCol

                                          ,int OutputRow,int OutputCol)

{     m_InputRow  =     InputRow    ;

      m_InputCol  =     InputCol    ;

      m_OutputRow =     OutputRow   ;

      m_OutputCol =     OutputCol   ;

}

//-----------------------------------------------------------

bool CRecognitionNetwork::LoadDataFile(CString filename,

                      CImageMatrix* Inputs,CImageMatrix* Outputs)

{     bool  nFlag = true;

      CFile* file = new CFile();

      //file couldn't read

      if (!file->Open(filename, CFile::modeRead)) return false;

 

      //HEADER_DATA_FILE_SIZE = number of bytes used to store header file

      WORD* header      = new WORD[HEADER_DATA_FILE_SIZE/2];

 

      //incorrect file

      if (file->Read(header,HEADER_DATA_FILE_SIZE)< HEADER_DATA_FILE_SIZE )

      {     file->Close();

            return false;

      }

 

      if (Inputs->GetMatrixSize() != (int)*(header + INPUT_NUM_POS ) ||

            Outputs->GetMatrixSize() != (int)*(header + OUTPUT_NUM_POS) )

      {     file->Close();

            return false;

      }

 

      int inRecSize     = (int)*(header + INPUT_SIZE_POS) *

                          (int)*(header + INPUT_NUM_POS );

      int outRecSize    = (int)*(header + OUTPUT_SIZE_POS)*

                          (int)*(header + OUTPUT_NUM_POS);

      int numRecords    = (int)*(header + RECORD_NUM_POS);

 

      BYTE* InputBuf = new BYTE[inRecSize];

      BYTE* OutputBuf= new BYTE[outRecSize];

      WORD  val;

 

      for(int i=0 ; i< numRecords ;i++)

      {    

            if ((file->Read(InputBuf,inRecSize))   < (UINT)inRecSize ||

                  (file->Read(&val,sizeof(WORD))) < (UINT)sizeof(WORD) )

                 break;

 

            memset(OutputBuf,0,outRecSize);

            OutputBuf[val] = 1;//ON_OUTPUTVAL;

 

            if (!Inputs->AddImageMatrix(InputBuf)) break;

 

            if (!Outputs->AddImageMatrix(OutputBuf))

            {     Inputs->RemoveAt(Inputs->GetCurrentImage());

                  break;

            }

      }

 

      file->Close();

 

      delete        file            ;    

      delete [] header  ;

      delete [] InputBuf      ;

      delete [] OutputBuf     ;

      return true;

}

//-----------------------------------------------------------------------------------

bool CRecognitionNetwork::SaveDataFile(CString filename,

              CImageMatrix* Inputs,CImageMatrix* Outputs)

{

      CFile* file = new CFile();

      //file couldn't read

      if (!file->Open(filename, CFile::modeCreate |

                                CFile::modeReadWrite|

                                CFile::shareExclusive)) return false;

 

      //HEADER_DATA_FILE_SIZE = number of bytes used to store header file

      WORD* header = new WORD[HEADER_DATA_FILE_SIZE/2];

      memset(header,0,HEADER_DATA_FILE_SIZE);

 

      int inRecSize     = Inputs->GetMatrixSize()  ;

      int outRecSize    = Outputs->GetMatrixSize() ;

 

      *(header + INPUT_NUM_POS      )     = (WORD)inRecSize;

      *(header + OUTPUT_NUM_POS     )     = (WORD)outRecSize;

      *(header + INPUT_SIZE_POS     )     = (WORD)1;             

      *(header + OUTPUT_SIZE_POS    )     = (WORD)1;

      *(header + RECORD_NUM_POS     )     = (WORD)Inputs->GetImageCount();

 

      file->Write(header,HEADER_DATA_FILE_SIZE); 

      WORD  val;

      BYTE  *pOut;

 

      for(int i=0 ; i< Inputs->GetImageCount();i++)

      {

            Inputs->SetCurrentImage(i);

            Outputs->SetCurrentImage(i);

           

            val  = 0;

            pOut = Outputs->GetCurrentBuf();

           

            for(int j=0; j<outRecSize;j++){

                  if (*pOut++ == 1){//ON_OUTPUTVAL){

                        val = j;

                        break;

                  }

            }

 

            file->Write(Inputs->GetCurrentBuf(),inRecSize); 

            file->Write(&val,sizeof(WORD)); 

      }

      file->Close();

      delete file;           

      delete [] header;

      return true;

}

//-----------------------------------------------------------------------------------

bool CRecognitionNetwork::Get(unsigned, unsigned z,

                   CVecto& inVal, CVecto& idealVal,

                   CImageMatrix* Inputs, CImageMatrix* Outputs)

{

      static int pRecPos      = 0   ;     //pointer to pos of Rec to read

     

      if (Inputs->GetImageCount()   == 0 ||

            Outputs->GetImageCount()== 0 )      return false;

 

      if (pRecPos >= Inputs->GetImageCount()) pRecPos = 0   ;

           

      Inputs->SetCurrentImage(pRecPos)                ;    

      Outputs->SetCurrentImage(pRecPos)               ;

      BYTE* InputBuf  = Inputs->GetCurrentBuf() ;

      BYTE* OutputBuf = Outputs->GetCurrentBuf()      ;

 

      //at present m_pDataBuf have type if byte , but it can change

      //when the size of each input value or out input value change

      for (int i  = 0 ;i  < inVal.size() ; i++ )

            {

                  inVal[i] = (InputBuf[i] == Inputs->PIXEL_ON)? 1 : 0;

            }

      for (int id = 0; id < idealVal.size(); id++)

            {    

                  idealVal[id] = OutputBuf[id];

            }

 

      pRecPos++;

      return true;

}

….

CImageMatrix class: manage the training data set. See the source code for more detail.

SOME APPLICATIONS

Address readers

Form readers

Check readers

A check reader captures check images and recognizes courtesy amounts and account information on the checks. Some readers also recognize the legal amount on checks and use the information in both fields to cross-check the recognition results. An operator can correct misclassified characters by cross-validating the recognition results with the check image that appears on a system console.

REFERENCE

  1. Links on the Links page (Feedforward Backpropagation Notes).

  2.  "An Introduction to Computing with Neural Nets" by R. P. Lippmann, IEEE ASSP Magazine, April 1987

  3. Neural Networks at http://www.willamette.edu/~gorr/classes/cs449/intro.html

  4. Off-line Handwriting Recognition Using Artificial Neural Networks, Andrew T. Wilson, University of Minnesota, Morris Morris, MN 56267

  5. Hierarchical Character Recognition And Its Use In Handwrittenword/Phrase Recognition by Jaehwa Park, A dissertation submitted to the Faculty of the Graduate School of the State University of New York at Buffalo

  6. Handwritten Numeral Recognition: CARE - Computer Aided Recognition, László Török1, Tibor Katona, József Vörös and Endre Jóföldi Kandó Polytechnic of Technology, H-1431 Budapest, P.O.B. 112, Hungary

  7. Implementation of an Algorithm for the Automatically Number Recognition of Bus Numbers by Murat Cetin Matriculation Number: 143201

  8. Representation and Recognition of Handwritten Digits Using Deformable Templates, Anil K. Jain, Fellow, IEEE, and Douglas Zongker

  9. A Neural Network for Real-World Postal Address Recognition, by Michael Blumenstein and Brijesh Verma, School of Information Technology, Faculty of Engineering and Applied Science Griffith University, Australia

  10. Methods for Enhancing Neural Network Handwritten Character Recognition, by M. D. Garris, R. A. Wilkinson, and C. L. Wilson, International Joint Conference on Neural Networks, Volume I IEEE, Seattle, July 1991

  11. Some other internet resource.