Reviews & Opinions
Independent and trusted. Read before you buy the Team International RC5!

Team International RC5



 

About Team International RC5
Here you can find information about the Team International RC5, such as the manual and user reviews.

Team International RC5 manual (user guide) is ready to download for free.

At the bottom of the page, users can write a review. If you own a Team International RC5, please write about it to help other people.

 

 

Manual


Download (French)
Team International RC5, size: 17.8 MB
Download (English)
Check whether your language version is available.
Most manuals are available in many languages.

 


 

User reviews and opinions


Comments to date: 9. Page 1 of 1. Average Rating:
lstrohe 9:44am on Saturday, October 9th, 2010 
According to HTC executive, Cheng Hui-ming, HTC Touch Diamond is the most important product for HTC this year. I am completely satisfied with a Windows OS. The phone operates through 2-3.5G connections, has wi-fi, bluetooth, GPS.
Johnny-B 8:46pm on Monday, September 20th, 2010 
This phone is really a mixed bag! I purchased this phone a year ago from Sprint because my two-year agreement with my Palm Centro was up.
mp.green 3:51am on Tuesday, August 17th, 2010 
Having used both this and the iPhone 3G (I switched from AT&T to T-Mobile for the G1), I have to say that I miss the iPhone. The HTC touch Diamond is a new phone by HTC. I have used this phone for awhile, but decided to ditch it for the HTC Touch Pro.
diegofavaro 11:57pm on Thursday, June 24th, 2010 
I am extremely happy with phone and some of the problems mentioned above I feel are just getting used to using the phone... The worst phone I have ever had. Once you pass over the frustrations and you adapt to it, the gadget becomes quite funny.
rok2774 2:27pm on Thursday, May 27th, 2010 
The HTC touch Diamond is a new phone by HTC. I have used this phone for awhile, but decided to ditch it for the HTC Touch Pro.
Muz 2:08pm on Thursday, May 6th, 2010 
i want to like this phone. i played with one in the T-mobile store. i love google and expect android to be awesome. Presents a modern 2.8-inch touch screen housed in an impressive body of brushed steel and impeccably faceted edges.
lypxue 5:51am on Thursday, March 18th, 2010 
Not for heavy use unless you have spare battery or you can keep it plugged in. Jack of all trades but definitely masters none well.
rabscuttle 7:08pm on Monday, March 15th, 2010 
I notice in the specifications page that it says that this phone is on the 900/1900/2100MHz GSM bands, and the 850/2100MHz UMTS bands. Overall the Diamond has grown on me to become a very strong device. It functions as a solid phone, browser, messenger. I love this phone. I am a previous Blackberry owner and I am happy that I got this instead of any of the new BB out there.
openoffice2006 6:46pm on Thursday, March 11th, 2010 
I do love this phone, despite a few shortfalls. Some of the reviews are misleading. Overall I believe this phone is worthwhile getting if you can look past the minor flaws that this phone contains.

Comments posted on www.ps2netdrivers.net are solely the views and opinions of the people posting them and do not necessarily reflect the views or opinions of us.

 

Documents


Issue 1, Volume 2, 2008

Design and Analysis of Various Models of RC5-192 Embedded Information Security Algorithm

Omar Elkeelany

Abstract: This article presents the design and analysis of various hardware reconfigurable models of the RC5 encryption algorithm. The original contribution herein is to determine the effects of the loop-unrolling design concept on encryption performance. We show how we determined the optimal number of unrolled loops for implementing the RC5 algorithm with a 192-bit encryption key. The models tested were based on a single-custom processor with no loop unrolling and with various sizes of loop-unrolled implementations. Several performance measures were considered, namely the maximum frequency of operation, circuit size, throughput, and energy consumption. To achieve a proper comparison, all models were implemented in the same hardware reconfigurable chip, a Field Programmable Gate Array (FPGA). The performance metrics of each model were evaluated to determine the best hardware model. Verilog hardware description language was used to model and test all implementations. Results revealed that while no-loop-unrolling provided the smallest circuit size, the 3-loop-unrolled approach provided the highest encryption throughput. Further, a throughput speed-up of 24% was achieved compared to a reference system implemented in a similar target device from a Xilinx FPGA family. Comparing our implementations on the same Altera FPGA family, a maximum throughput speed-up of 50% was achieved. These results provide a much better ground for applications involving high-performance embedded data security, such as military communications, nuclear digital instrumentation and control, and portable biomedical devices.

Keywords: Cryptography, FPGA design and analysis, RC5 encryption, Loop-unrolling

Manuscript received October 30, 2008; revised version received December 16, 2008. This work was supported in part by the Center of Manufacturing Research and the Department of Electrical and Computer Engineering at Tennessee Technological University. Omar Elkeelany is Assistant Professor of Electrical and Computer Engineering at Tennessee Technological University, Cookeville, TN 38505 USA (phone: 931-372-3677; fax: 931-372-3436; e-mail: OElkeelany@TnTech.edu).

I. INTRODUCTION
Secure data transmission over an unreliable medium is continuously gaining importance. It demands improvements in the performance of existing encryption algorithms. This is particularly needed in a wide range of applications including virtual enterprise security [1] [2], portable network devices [2], and visual cryptography [3]. Even though the software implementation of encryption algorithms has the advantages of portability, flexibility, and ease of use, it provides limited physical security and agility
compared to hardware implementations. Major advantages that lead to hardware implementation include lower power consumption, small circuit size, hardware reconfigurability, cost efficiency, high operating speed and security. Symmetric cryptosystems are based on algorithms in which identical keys are used for encryption and decryption [4]. The secret key used for encryption and decryption should be known only to the legitimate senders and receivers in order to protect data. Symmetric key algorithms can be further divided into block ciphers, which apply fixed transformations to plain-text data, and stream ciphers, which apply time-varying transformations. Block ciphers are the most basic type of cipher and operate on the principle of encrypting and decrypting fixed-size blocks. The size of the block is algorithm specific. For example, the Advanced Encryption Standard (AES) operates on a block size of 128 bits, and the Data Encryption Standard (DES) [5] works on a block size of 64 bits. Rivest Cipher 5 (RC5), however, does not have a fixed block size; it can be 32, 64, or 128 bits. That is why the Wireless Application Protocol (WAP) forum, for example, specifies RC5 as the encryption algorithm for its Wireless Transport Level Security (WTLS) clients and servers [6]. Any particular RC5 algorithm is represented with the notation RC5-w/r/b, where w, r and b are reconfigurable parameters: w is the word size in bits, r is the number of rounds, and b is the number of bytes in the secret key. The parameters w/r/b are configured such that the algorithm gives maximum security [7]. RC5 was originally developed for software implementation, but fits hardware implementation as well [8]. In general, ciphers implemented in software are not efficient when compared to their implementations in hardware [9]. The hardware implementation of the RC5 algorithm can be done using different approaches.
Various research efforts have exploited System on Chip (SoC) or hardware implementations of the RC5 algorithm to provide an enhanced security architecture compared with the conventional architecture [10] [11]. Enhanced performance measures include lower power consumption, allocation of resources, re-configurability, architecture efficiency and cost efficiency [12]. A key feature of Field Programmable Gate Array (FPGA) implementation is that it allows SoC modeling. The performance evaluation of different hardware models of the RC5 algorithm is therefore done with the FPGA as the target technology. As stated before, research has exploited RC5 hardware implementations for many advantages such as less power

INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND INFORMATICS
consumption, reduced circuit size, reconfigurability, cost efficiency, improvement in operating speed, and extra security [13], [14]. The work of Sklavos [8], [15], [9] proposed a hardware architecture of the RC5 core in an FPGA device that uses fewer resources than the conventional implementation by introducing shared resources. However, the encryption throughput it reported was lower than that of the conventional architecture [8]. In cases where speed is more desirable than circuit size, more research is needed to investigate design choices that achieve the best possible encryption rate without demanding a huge increase in circuit size. Specifically, it is not obvious how the loop-unrolling design choice affects the performance of the algorithm or the circuit size. This paper studies the effect of the loop-unrolling technique on implementing RC5 on reconfigurable hardware. Loop unrolling is a technique generally used in system design to improve throughput and optimize critical parts of the system by duplicating hardware components [16]. The rest of the paper is organized as follows. Section 2 provides an overview of the RC5 encryption algorithm. Section 3 presents design methodologies with a focus on the loop-unrolling technique. Section 4 provides synthesis results and analysis. The paper then concludes in section 5, with insight into future work.

II. RC5 ALGORITHM OVERVIEW AND CONVENTIONAL ARCHITECTURE
As mentioned briefly in section 1, RC5 is a parameterized symmetric encryption algorithm. RC stands for Rivest Cipher, or alternatively, Ron's Code [7]. The RC5 parameters are a variable block size, a variable key size and a variable number of rounds. Allowable choices for the block size are 32, 64 and 128 bits. The number of rounds can range from 0 to 255, while the key size can range from 0 bits to 2040 bits. RC5 has three modules: key expansion, encryption and decryption.
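The three modules named above can be illustrated with a minimal Python sketch for a 32-bit word size. This is not the Verilog design evaluated in the paper; it follows the commonly published RC5 description, and the function names and the constants P32 and Q32 are assumptions taken from that description rather than from this text.

```python
# Minimal RC5 sketch for w = 32 (assumption: standard published RC5 description).
W = 32                     # word size in bits
MASK = (1 << W) - 1        # keeps every result modulo 2^w
P32 = 0xB7E15163           # magic constants assumed from the RC5 specification
Q32 = 0x9E3779B9


def rotl(x, y):
    """Rotate the w-bit word x left by y bits (only y mod w matters)."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr(x, y):
    """Rotate the w-bit word x right by y bits (inverse of rotl)."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def expand_key(key, rounds):
    """Key expansion: build the table S of 2(r+1) words from the secret key."""
    u = W // 8                                   # bytes per word
    c = max(1, (len(key) + u - 1) // u)          # number of key words
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):        # little-endian key loading
        L[i // u] = ((L[i // u] << 8) + key[i]) & MASK
    t = 2 * (rounds + 1)
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):               # mix the key into S
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % t
        j = (j + 1) % c
    return S


def encrypt_block(S, a, b, rounds):
    """Encrypt one two-word block (a, b) with the expanded key S."""
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for i in range(1, rounds + 1):
        a = (rotl(a ^ b, b) + S[2 * i]) & MASK
        b = (rotl(b ^ a, a) + S[2 * i + 1]) & MASK
    return a, b


def decrypt_block(S, a, b, rounds):
    """Decrypt one two-word block by running the rounds in reverse."""
    for i in range(rounds, 0, -1):
        b = rotr((b - S[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * i]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK


# Usage: RC5-32/15/16 with an all-zero key (the key is a placeholder).
S = expand_key(bytes(16), 15)
cipher = encrypt_block(S, 0x98721827, 0xBE7B1E6F, 15)
plain = decrypt_block(S, cipher[0], cipher[1], 15)
```

Each loop iteration in `encrypt_block` corresponds to one round, which is the unit that a hardware design can either reuse over r clock cycles or duplicate when unrolling.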
It is the latest in a family of secret key cryptographic methods; RC5 is more secure than RC4 [17] but is slower. Generally, implementing ciphers in software is not efficient in terms of computation speed, and hence the use of hardware devices is an alternative [9]. The RC5 algorithm uses three primitive operations and their inverses. These are: (1) Addition/subtraction of words modulo 2^w, where w is the word size. (2) Bit-wise exclusive-OR of words, denoted by XOR. (3) Rotation: the rotation of word x left by y bits is denoted x<<<y. The inverse operation is the rotation of word x right by y bits, denoted by x>>>y. In the key expansion module, the password key K is expanded to a much larger size in a key table S. The size of table S is 2(r+1) words, where r is the number of rounds [7]. The encryption process takes a plaintext input and produces a cipher-text as the output. The key-expansion process must already have been performed before this process. The

decryption process takes a cipher-text as the input and produces a plaintext as the output. In general, the same plaintext block will always encrypt to the same cipher-text when using the same key in a block cipher, whereas the same plaintext will encrypt to different cipher-text in a stream cipher [18]. Both processes use the expanded key along with segments of the input message to produce their outputs. The conventional architecture of RC5, shown in fig. 1, performs the encryption and decryption tasks in two separate cores. As shown in the figure, the RC5 Core needs to read the expanded key sequentially in order to encrypt the plain text given to the core. This gives unauthorized users who have access to the system a chance to tap and record the memory contents. Later, at another site, they can use their recordings to decrypt the ciphered text. What is worse, if they have physical access to the system they will also have access to the user secret key. A basic way to avoid such an attack is to physically secure the system, which might not be enough, especially if the user does not intend to change the secret key very often. Our system architecture avoids the damage caused by this type of attack and also gives improved throughput by making use of a three loop-unrolling technique in its design, as discussed in section 3. One can use the following equation to calculate encryption throughput:

$$ Th(\mathrm{Mbps}) = \frac{b}{I \cdot C} = \frac{b \cdot \mathrm{MIPS}}{I} \qquad (1) $$
where b is the block size in bits, I is the number of software instructions needed per block and C is the cycle time (in microseconds). It is noteworthy that, using this equation, one finds that other encryption algorithms (e.g., the 3DES algorithm) may have an encryption throughput of 4 Mbps at a rate of 500 MIPS (million instructions per second) [19]. Table 1 lists all RC5 basic operations and their equivalent simple ones, using the same virtual machine used before. In this table, one finds that RC5-32/16/16 needs only 160 instructions to run 16 rounds. Figure 4 shows the computed encryption time in microseconds as a function of input packet size in blocks of 64 bits. Using eq. (1), this yields a throughput of 200 Mbps (Th = 64*500/160). It is clear from this comparison that RC5 outperforms more complicated encryption algorithms (of equal key length) operating at the software level. In section 3, we will show that the use of a three loop-unrolling technique at the hardware level can improve this RC5 throughput at much slower processing rates.

Fig. 1 The Conventional Architecture of the RC5

III. DESIGN METHODOLOGIES
Schematic-based design is a conventional hardware design approach that has recently been replaced by hardware description language (HDL) based methods. As architectural complexity increases, schematic design is no longer a feasible technique [12]. Language-based design depends entirely on synthesis tools, and the rapid improvement of those tools has led to the choice of HDL-based design methodology. In this research, Verilog HDL is used to model the various RC5 design methods. Herein, the two investigated hardware models are:
1. Soft-core general-purpose processor based, and
2. Single-custom processor based.
The soft-core general-purpose processor based model yields a programmable device that can be used in a variety of applications. Hence, it offers high flexibility, but typically at the cost of design size and power consumption. The single-custom processor based model, on the other hand, executes a single program, or has custom hardware to perform a single function. It is used to achieve low circuit size, low power, and high performance designs. It does not have a programming memory since its function is inherently

integrated in the design; hence it is called single-functioned. In order to evaluate the performance of the two hardware models under investigation, certain parameters must be measured in each model. The parameters best suited to evaluating the performance of these hardware models are: maximum frequency of operation, resource utilization, power/energy consumption, encryption throughput, and cost efficiency. After acquiring all the parameters, the two models were compared to determine the best hardware implementation of the RC5 algorithm.

A. Design Implementations
The first model is designed using a soft-core general-purpose processor. The Nios II soft processor is selected due to its ample features and flexibility, and because it is the most popular soft microprocessor available in the market today [20] [21]. As Nios II is a soft processor defined in HDL, it can be implemented in the FPGA by using the Quartus II design tool [21]. The System-on-a-programmable-chip (SOPC) Builder software [22] was used to add the necessary functional units such as memories, I/O interfaces and timers to the Nios II processor. It is well known that a general-purpose processor by itself won't be a useful system [22]. Writing software for this processor is similar to writing software for other microcontroller families. The Nios II integrated development environment (IDE) is used to write the software for this processor. The hardware abstraction layer (HAL) of the Nios II software serves as a board-support package for Nios II processor systems. The tight integration between SOPC Builder and the Nios II IDE allows the HAL system library to be generated automatically. The HAL system library provides a hosted C runtime environment based on the ANSI C standard libraries. It also provides generic I/O devices, allowing the user to write programs that access hardware using the C standard library routines.
After SOPC Builder generates a hardware system, the Nios II IDE can generate a custom HAL system library to match the hardware configuration [21]. An RC5 implementation available in C was ported to the Nios II, with modifications made to suit the Nios II processor environment. The second model is implemented using a bottom-up design methodology in Verilog HDL with Quartus II as the synthesis tool. The model was exhaustively tested using ModelSim simulations. The results obtained from the simulation are compared with the results of the RC5 algorithm written in C. For both models, the registration of the secret key into the system was done with a sequence of short pulses or with a parallel feed, depending on the device support.

B. RC5 Loop Unrolling Discussion in Single Custom Processor
As mentioned earlier, loop unrolling is a technique used in system design to improve throughput and optimize critical parts. In computer programming languages, loop unrolling is applied to instructions that are called in

Table 1 RC5 operations per block
Columns: Step #; Block Operation; # of rounds; Equivalent Operations (# per round, Total (T)).
Block operations listed: 32-bit XOR; 32-bit data dependent shift left; 32-bit modulo addition; 32-bit data dependent shift right; 32-bit one dimensional table lookup.
Total row: 32, 160.
multiple iterations of the loop by combining them into a single iteration [16]. We propose the use of this technique to improve the throughput of the RC5 core. Specifically, we use this technique to unroll the r rounds of the encryption task into duplicate hardware components. In the no-unrolling approach, every block of data requires a number of clock cycles (Cc) to complete the entire encryption process. If one round of the encryption task is done in every clock cycle, a circuit needs r clock cycles to perform r rounds. Instead of performing one round per clock cycle, we can use the loop-unrolling approach to cut the number of cycles required. The unrolling of r rounds can be done in multiple ways by considering the integer factors of r. When r=15, we can have a total of 5 cycles with 3 loops unrolled and performed per clock cycle. Or, we can have 3 cycles with 5 loops unrolled per cycle. Finally, we can have all 15 loops of encryption done in a single clock cycle. Compared to the no-loop-unrolling approach, the first approach requires only five clock cycles, which cuts the number of clock cycles required to one third. Similarly, the second approach requires five times fewer clock cycles, and the third approach requires r times fewer clock cycles for encrypting a block of data. Fig. 2 shows a pictorial view of the loop-unrolling technique for r=15 rounds. The penalty for this approach is an increase in chip size. This is because, when loop unrolling is implemented in hardware, the unrolled loop of the task is duplicated as extra hardware on the chip. So as the number of unrolled loops increases, the size of the hardware increases. Loop unrolling requires more logic elements than no-loop-unrolling. This will be seen in detail in the next section.

IV. VERIFICATION, SYNTHESIS AND ANALYSIS RESULTS
Both investigated models were designed, simulated and synthesized using the Altera Quartus II development tool. The simulation results were checked for the functional correctness of the algorithm. Fig. 3 shows the simulation result of the single-custom model of RC5. After functional verification, the code was synthesized, placed and routed, and re-simulated to check whether the implementation was successful. The two models were implemented using the same hardware target [12]. The selection of a target device depended on several requirements such as available clock speeds, number of I/O pins, on-chip memory, and display interfaces (LEDs, LCD). The DE2 development board, which has the Cyclone II EP2C35F672C6 FPGA and the EPCS16 serial configuration device, is selected as the target device. The Cyclone II device family features a high-density architecture with 4,608 to 68,416 logic elements, embedded multipliers, advanced I/O support, flexible clock management circuitry, etc. [23]. We used Verilog HDL to implement both the conventional architecture (for comparison) and the proposed architectures. Various code implementations were synthesized using Altera FPGA development tools. The result obtained after modeling the

algorithm tallies with the result obtained from the original code in C, which models the conventional architecture.

Fig. 2 Mechanism of Loop Unrolling (no unrolling, 3 loops unrolled, 5 loops unrolled). Total number of loops = 15.

Fig. 4 shows the circuit used for providing plain-text input, and Fig. 5 shows the target FPGA board with the input test circuit. Fig. 6 shows two samples of in-circuit verification for plain-text input 0x98721827-BE7B1E6F provided to RC5-32/15/16. The output is verified on the seven-segment display, which shows the valid values 0xD56280F6 as the first word and 0xE836330E as the second.

A. Throughput Analysis of Embedded General Purpose Processor
For throughput analysis, we started with a general-purpose processor model embedded on the FPGA and implemented RC5 in a high-level programming language (C code). An embedded processor (NIOS II/s) system was used to execute the C code, which was almost the same for both the RC5-32/15/16 and RC5-32/15/24 versions. We then inserted timing calculation functions to measure throughput. The on-board system clock was 50 MHz, so the maximum operating frequency could not exceed 50 MHz. Table 2 summarizes the resource utilization of the general-purpose processor model. This model utilizes multipliers, random access memory (RAM), and PLL blocks. The embedded multipliers are used to speed up multiplier-intensive applications. The processor's need for memory is satisfied by employing embedded memories, which consist of M4K (4 Kbit) blocks that can be configured to provide various memory functions such as RAM, first-in-first-out (FIFO), or read-only memory. In addition, a potential timing problem can arise from clock skew, which leaves external memory (i.e., SDRAM) inaccessible. Hence, a Phase Lock Loop (PLL) module was used.

Fig. 3: RC5 Simulation waveforms using blank plaintext for known correct cipher output of 0xb7c4b44a-9faa44d8

Throughput calculation for this model involves counting the number of cycles required to perform the encryption operation. High-resolution Altera timing functions were used. Of the functions available, alt_timestamp_start() can be used to start the counter, and a call to alt_timestamp() provides the value of the timestamp counter. These two functions were used to obtain the number of clock pulses required to complete the encryption process. The number of cycles required was found to be 1529 for both versions of RC5. Using an Fmax of 50 MHz, throughput was calculated using:

$$ \mathrm{Throughput} = F_{max} \cdot \frac{64}{C_c} \qquad (2) $$
Fmax is the maximum frequency of operation, 64 is the block size in bits, and Cc is the number of clock cycles required to encrypt one block. So, Throughput = 50 x 10^6 * (64/1529) ≈ 2.1 Mbps.
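The two throughput formulas, eq. (1) for software and eq. (2) for hardware, can be checked numerically with a short Python sketch. The function and variable names below are illustrative assumptions; only the numeric inputs (64-bit block, 160 instructions, 500 MIPS, 50 MHz, 1529 cycles) come from the text.

```python
def software_throughput_mbps(block_bits, instructions_per_block, mips):
    """Eq. (1): Th = b * MIPS / I, in Mbps."""
    return block_bits * mips / instructions_per_block


def hardware_throughput_mbps(fmax_mhz, cycles_per_block, block_bits=64):
    """Eq. (2): Throughput = Fmax * (block size / Cc), in Mbps when Fmax is in MHz."""
    return fmax_mhz * block_bits / cycles_per_block


# RC5-32/16/16 in software: 160 instructions per 64-bit block at 500 MIPS.
rc5_software = software_throughput_mbps(64, 160, 500)   # 200.0 Mbps

# Soft-core processor model: 1529 cycles per block at 50 MHz.
soft_core = hardware_throughput_mbps(50, 1529)           # about 2.09 Mbps

print(f"RC5 software model: {rc5_software:.1f} Mbps")
print(f"Soft-core processor: {soft_core:.2f} Mbps")
```

Keeping Fmax in MHz means the result is already in Mbps, which matches the units used in the text.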
Fig. 4: Parallel input test circuit for in-circuit verification
Fig 5: Target FPGA board, with input test circuit
When we compare this result with the discussion presented in section 2, we find that this embedded processor runs at roughly 5 MIPS, which is too slow for high-performance applications. However, this solution offers the fastest time to market and minimal design cost. The extra resources such as multipliers, RAM and PLLs are not major contributors to cost, especially in bulk production. The bottom line is that performance is poor, in exchange for low cost.

B. Throughput Analysis of Single Custom Processor
As explained in section 2, RC5 is a block cipher that works on two-word input and two-word output blocks. The number of clock cycles required for encryption depends on the parameter r. The no-loop-unrolling (conventional) implementation needs 15 clock cycles to encrypt a data block of 64 bits (two words). The throughput is calculated by (2). As the number of unrolled loops increases, it is intuitive that the longest path delay of the circuit increases, as more components are connected in sequence and fewer resources are shared iteratively. Consequently, the maximum clock frequency of operation decreases. At the same time, circuit size increases, since more components are needed. But while the maximum clock frequency decreases, the total number of cycles needed is also reduced. Hence, it is not obvious how the size of unrolling (the number of unrolled loops) affects the overall throughput of the system. There exists an optimal value that achieves the highest rate of encryption (i.e., the highest throughput). In this section we study the effect of loop unrolling on RC5 by developing multiple Verilog HDL implementations for the same target, the Cyclone II EP2C35F672C6 FPGA device (for fair comparison). Fig. 7 summarizes the throughput for RC5 implementations using no-unrolling and the loop-unrolled implementations, and is used in calculating Throughput per Logic Element (TPLE).
TPLE is calculated by dividing the throughput by the number of Logic Elements (LEs) required for each implementation. The total cost of an implementation is proportional to the number of LEs, because if the number of LEs required increases, the total area occupied by the circuit increases, resulting in higher production costs. So we have to select an implementation that gives optimum performance and requires the least number of LEs. TPLE is a measure of the circuit's cost efficiency. A higher TPLE indicates that the logic elements of the implementation are used efficiently to increase the throughput. Table 3 compares the FPGA resource utilization of the two RC5 models based on the single-custom processor and the general-purpose processor.
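The unrolling choices and the TPLE metric discussed in this section can be sketched in Python as follows. This is a minimal illustration, assuming that an unrolling factor must divide the round count r exactly; the function names are hypothetical and no logic-element counts are supplied, since the per-implementation LE figures are not listed in the text.

```python
def unroll_options(rounds):
    """Return (loops unrolled per cycle, clock cycles per block) for each integer factor of r."""
    return [(k, rounds // k) for k in range(1, rounds + 1) if rounds % k == 0]


def tple(throughput_mbps, logic_elements):
    """Throughput per Logic Element: throughput divided by the LEs used."""
    return throughput_mbps / logic_elements


# For r = 15 the factors are 1, 3, 5 and 15, giving Cc = 15, 5, 3 and 1.
options = unroll_options(15)
for loops, cycles in options:
    print(f"{loops:2d} loops unrolled -> {cycles:2d} clock cycles per block")
```

A prime round count would leave only the two extreme options (no unrolling and full unrolling), which is one reason the choice r=15 gives a useful range of designs to compare.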

Fig. 6: Correct in-circuit verification for plain-text input 0x98721827-BE7B1E6F provided to RC5-32/15/16. The output is displayed sequentially: (a) first word, (b) second word.
FPGA Resource Logic elements Registers Combinational Elements Logic array blocks I/O pins Clock pins Total fan-out Average fan-out Embedded multipliers PLLs M4Ks Total RAM block bits
Utilization 3,331 / 33,216 (10 %) 319 / 2,/ 475 8/3./ 70 1// 105 (14 %) 69,120 / 483,840
Table 2: Resource utilization for general-purpose processor
In other words, the higher the TPLE, the higher the cost efficiency will be. Even though a circuit is cost efficient, there is no guarantee that its performance will be good. So, we should take both performance and cost efficiency into consideration when choosing an approach to implement in hardware. Fig. 7 indicates that the 15-loops unrolled approach has the least TPLE, which means that its cost will be high, whereas the no-unrolling approach and the 3-loops unrolled approach indicate less cost among the listed implementations. Obviously, the cost of manufacturing the no-unrolling design is less due to its small circuit size. However, the no-unrolling implementation provides lower throughput when compared with the 3-loops unrolled approach. So, the 3-loops unrolled approach provides a better alternative, as it meets the throughput requirements at an affordable cost.

Fig. 7 Throughput per Logic Elements for various RC5 implementations

Fig. 8 shows the throughput vs. the number of loops unrolled for RC5 with 16 and 24 byte key sizes. As shown in the figure, the best (highest) throughput is associated with the rightmost horizontal side of the graph. It is clearly shown that 3-loops unrolled outperforms 5 and 15-loops unrolled. The reason is that the extra propagation delay of the more heavily unrolled designs lowers Fmax at a rate greater than the improvement gained from the reduced number of required clock cycles Cc, see (2). To illustrate this effect, Table 4 shows the longest path delay and the respective value of Fmax for the four loop-unrolling models, for both versions of RC5 encryption, 32/15/16 and 32/15/24. The values of the Cc parameter required for these models are 15, 5, 3 and 1 respectively. In addition, the achieved encryption throughput of 3-loops unrolled is higher than the 207 Mbps reported in the related work using a Xilinx VirtexII-1000 FPGA device [8]. Hence, 3-loops unrolled offered a maximum throughput improvement of 24% compared to the related work. Comparing implementations on the same Altera Cyclone II EP2C35F672C6 FPGA, a maximum throughput speed-up of 50% is achieved for the 192-bit key (i.e., comparing the throughput of 3-loops unrolled to no-unrolling in Fig. 7).

FPGA Resources                                  Single-Custom Processor   Soft-core Processor
Total memory bits                               0/483,840                 46592/483,840
Logic array blocks                              330/2076                  319/2076
I/O pins                                        307/475                   365/475
Clock pins / Total fan-out (fused in source)    5/24520                   8/20469
Average fan-out                                 3.68                      3.77
Embedded multipliers                            0/70                      4/70
Logic elements                                  5050/33216                3184/33216
(The Registers, PLLs and Maximum fan-out rows carry no legible values.)

Table 3: Resource utilization comparison of single purpose (SCP) and soft-core general-purpose processor (GPP)

Fig. 8 Throughput vs. Number of Rounds Unrolled in RC5 implementations using 128 and 192 bit key
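The throughput comparison above can be reproduced from eq. (2) with a short Python sketch, using the Fmax values reported in Table 4 and the cycle counts Cc of 15, 5, 3 and 1. The dictionary layout and names are illustrative assumptions; the 207 Mbps reference figure is the one quoted from [8].

```python
# Fmax in MHz per number of loops unrolled, as reported in Table 4.
FMAX_MHZ = {
    "RC5-32/15/16": {1: 42.15, 3: 20.03, 5: 10.48, 15: 3.22},
    "RC5-32/15/24": {1: 40.09, 3: 20.03, 5: 10.38, 15: 3.19},
}
ROUNDS = 15
BLOCK_BITS = 64
REFERENCE_MBPS = 207.0   # related work on a Xilinx VirtexII-1000 device [8]


def throughput_mbps(fmax_mhz, loops_unrolled):
    """Eq. (2) with Cc = r / (loops unrolled per clock cycle)."""
    cycles = ROUNDS // loops_unrolled
    return fmax_mhz * BLOCK_BITS / cycles


throughput = {
    version: {k: throughput_mbps(f, k) for k, f in fmax.items()}
    for version, fmax in FMAX_MHZ.items()
}
best = {version: max(t, key=t.get) for version, t in throughput.items()}

# Speed-up of 3-loops unrolled over the reference design and over no-unrolling.
gain_vs_reference = throughput["RC5-32/15/16"][3] / REFERENCE_MBPS - 1
gain_vs_no_unroll = throughput["RC5-32/15/24"][3] / throughput["RC5-32/15/24"][1] - 1

for version, t in throughput.items():
    for k, mbps in t.items():
        print(f"{version}, {k:2d} loops unrolled: {mbps:6.1f} Mbps")
print(f"Gain vs. reference: {gain_vs_reference:.0%}")
print(f"Gain vs. no-unrolling (192-bit key): {gain_vs_no_unroll:.0%}")
```

Under these inputs the 3-loops unrolled design comes out highest for both key sizes, because the fall in Fmax from 1 to 3 unrolled loops is smaller than the threefold reduction in Cc, while further unrolling loses more in Fmax than it gains in cycles.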
Model                        Parameter                                RC5-32/15/16   RC5-32/15/24
(a) No loop unrolling        Fmax                                     42.15 MHz      40.09 MHz
                             Longest path delay (or clock period)     23.72 ns       24.941 ns
                             Worst case setup time (tsu)              8.519 ns       9.605 ns
                             Worst case hold time (th)                4.206 ns       4.198 ns
(b) 3 loops unrolled         Fmax                                     20.03 MHz      20.03 MHz
                             Longest path delay (or clock period)     49.92 ns       49.92 ns
                             Worst case setup time (tsu)              9.920 ns       9.920 ns
                             Worst case hold time (th)                4.104 ns       4.104 ns
(c) 5 loops unrolled         Fmax                                     10.48 MHz      10.38 MHz
                             Longest path delay (or clock period)     95.41 ns       96.33 ns
                             Worst case setup time (tsu)              9.556 ns       9.110 ns
                             Worst case hold time (th)                4.616 ns       4.695 ns
(d) 15 loops unrolled        Fmax                                     3.22 MHz       3.19 MHz
                             Longest path delay (or clock period)     310.55 ns      313.47 ns
                             Worst case setup time (tsu)              13.224 ns      13.424 ns
                             Worst case hold time (th)                4.815 ns       4.739 ns
Table 4 Timing Analysis of (a) no loop unrolling (b) 3, (c) 5, (d) 15 loop unrolling respectively V. CONCLUSION We presented two hardware models of the RC5 algorithm and from the summary of the results we can observe both the models have their advantages and disadvantages. In this paper, various hardware implementations of RC5 algorithm were presented using variable sizes of loop unrolling technique, and implemented on FPGA. The performance evaluation suite involved calculating maximum frequency of operation, circuit size (in terms of the number of Logic Elements, power consumption, throughput computation, and cost efficiency. With the aid of performance evaluation Approach No-unrolling 3 loop-unrolling 5 loop-unrolling 15 loop-unrolling Power (mW) 138.27 140.0 141.79 145.82 Energy (nJ) 51.728 34.944 40.975 45.71
C. Energy Evaluations of Single Custom Processor An estimate of power consumption can be made with FPGA development tools after the synthesis and place-and-route phases. For Altera FPGA devices, the PowerPlay power analyzer tool performs after synthesis power estimation. It calculates a close estimate for power consumption by using inputs from the resource utilization phase (Fitter report), signal activities from the functional simulation and operating condition of the design (junction temperature and board cooling solution settings). One of these features is the toggle rate, which is how often the output changes with respect to the input clock signal [24]. Table 5 summarizes the power and energy consumption generated with a toggle rate of 25% (assuming that toggle rate is same for all implementations). It is shown from the table that the no-loop-unrolling consumes the most energy and that the 3-loops unrolled consume the least energy. This is due to the fact that it experiences the minimum encryption time per block (or highest throughput).

Table 5 Power and Energy consumption for various RC5 models under test
tools, it is concluded that, of all the implemented models, the 3-loops unrolled approach can be selected to achieve the highest throughput. However, if the user requirements are such that the model should fit in the least circuit size with moderate throughput, the no-loop-unrolled approach can be used. The achieved encryption throughput of the 3-loops unrolled approach is higher than the 207 Mbps reported in related work using a Xilinx Virtex-II 1000 FPGA device [8]. Hence, the 3-loops unrolled approach offered a maximum throughput improvement of 24% compared to the related work. Comparing implementations on the same Altera Cyclone II EP2C35F672C6 FPGA, a maximum throughput speed-up of 50% is achieved for the 192-bit key. It was shown that, when we integrated the maximum operating frequency (Fmax) into the required number of cycles to operate, the soft-core general-purpose processor was found to be less than 100 times slower than the single custom processor (although it operates at a higher frequency). Even though the second model is faster, there is a price paid for it: it incurs extra design time and a longer time to market. However, when we compare the resource utilization of the two models, we show that the single custom processor generally uses far fewer resources in terms of multipliers, memory, PLLs, registers, etc. So, in high volume production, it can also be cheaper than the first model, since its lower resource utilization reduces the manufacturing cost. In future work, the author would like to investigate the effect of loop unrolling on other encryption algorithms such as RC6 [25] and AES.

ACKNOWLEDGMENT
The author would like to acknowledge the Masters students Suman Nimmagadda and Adegoke Olabisi for their thorough research work related to this paper. The author also acknowledges the Center for Energy Systems Research and the Department of Electrical and Computer Engineering at Tennessee Technological University for supporting this research.

REFERENCES
[1] M. M. Zanjireh, A. Kargarnejad and M. A. Tayebi. Virtual Enterprise Security: Importance, Challenges and Solutions. WSEAS Transactions on Information Science & Applications, volume 4, no. 4, pages 879-884, 2007.
[2] Lee, Tsang-Yean and Lee, Huey-Ming. Encryption and Decryption Algorithm of Data Transmission in Network Security. WSEAS Transactions on Information Science & Applications, volume 3, no. 12, pages 2557-2562, 2006.
[3] Chao, Kun-Yuan and Lin, Ja-Chen. Fault-Tolerant and Non-Expanded Visual Cryptography for Color Images. WSEAS Transactions on Information Science & Applications, volume 3, no. 11, pages 2184-2191, 2006.
[4] Schubert, A. and Anheier, W. Efficient VLSI Implementation of Modern Symmetric Block Ciphers. In The ICECS99, 1999.
[5] G. Rouvroy, J.-J. Quisquater, F.-X. Standaert and J.-D. Legat. Efficient uses of FPGAs for implementations of DES and its experimental linear cryptanalysis. IEEE Transactions on Computers, volume 52, no. 4, pages 473-482, 2003.
[6] RSA Security. RSA Security Algorithm. URL http://www.rsasecurity.com/press release.asp?doc-id=172&id=1034 (Accessed: April 23, 2006)
[7] Rivest, R. L. The RC5 Encryption Algorithm. In The 1994 Leuven Workshop on Fast Software Encryption (Springer 1995), pages 86-96, 1994.
[8] N. Sklavos, C. Machas and O. Koufopavlou. Area Optimized Architecture and VLSI Implementation of RC5 Encryption Algorithm. In The IEEE ICECS 2003, volume 1, December 2003.
[9] Sklavos, N. and Koufopavlou, O. Mobile Communications World: Security Implementation Aspects - A State of the Art. Computer Science Journal of Moldova, Institute of Mathematics and Computer Science, volume 11, no. 2, 2003.
[10] Olabisi, A. System on Chip Architecture for RC5 with Enhanced Security. Master's thesis, May 2006.
[11] N. Sklavos, K. Touliou and C. Efstathiou. Security & Privacy Architectural Models: On the Hardware & Software Integration Platforms. WSEAS Transactions on Information Science & Applications, volume 3, no. 5, pages 965-971, 2006.
[12] A. J. Elbirt, B. Chetwynd, W. Yip and C. Paar. An FPGA Implementation and Performance Evaluation of the AES Block Cipher Candidate Algorithm Finalists. In The AES Candidate Conference 2000, pages 13-27, 2000.
[13] Olabisi, A. and Elkeelany, O. Integrated design of RC5 algorithm. In The IEEE 39th Southeastern Symposium on System Theory, 2007.
[14] Nimmagadda, S. and Elkeelany, O. Performance evaluation of different hardware models of RC5 algorithm. In The IEEE 39th Southeastern Symposium on System Theory, 2007.
[15] N. Sklavos, A. P. Fournaris and O. Koufopavlou. WAP Security: Implementation Cost and Performance Evaluation of a Scalable Architecture for RC5 Parameterized Block Cipher. In The IEEE Mediterranean Electrotechnical Conference (IEEE MELECON04), May 2004.
[16] Kennedy, K. and Allen, R. Optimizing Compilers for Modern Architectures: A Dependence-based Approach (loop unrolling technique and its advantages and disadvantages). Morgan Kaufmann, 2001.
[17] S. Fluhrer, I. Mantin and A. Shamir. Weaknesses in the Key Scheduling Algorithm of RC4. In The 8th Annual Workshop on SAC, August 2001.
[18] Sessions, J. B. Fast Software Implementations of Block Ciphers. Master's thesis, 1998.
[19] Elkeelany, O. and Olabisi, A. Performance Comparisons, Design and Implementation of RC5 Symmetric Encryption Core. Journal of Computers, no. 1, pages 48-55, 2008.
[20] Altera Corporation. Nios Processor. URL http://www.altera.com/products/ip/processors/nios2/ni2-index.html (Accessed: April 23, 2006)
[21] Altera Corporation. Nios II Processor. URL http://www.altera.com/literature/hb/nios2/n2cpu nii51004.pdf# (Accessed: April 23, 2006)
[22] Altera Corporation. SOPC Design Tool. URL http://www.altera.com/education/univ/materials/manual/labs/tut sopc introduction verilog.pdf (Accessed: April 23, 2006)
[23] Altera Corporation. DE2 user manual, 2006. URL http://www.altera.com/
[24] Xilinx Inc. How to calculate toggle rate. URL http://www.xilinx.com/ise/powertools/wpt help/app docs/calculating toggle rates.htm (Accessed: April 23, 2006)
[25] Riaz, M. and Heys, H. M. The FPGA Implementation of the RC6 and CAST-256 Encryption Algorithms. In The IEEE Canadian Conference on Electrical and Computer Engineering, pages 367-372, 1999.

Omar S. Elkeelany received the B.Sc. and M.Sc. degrees in Computer Science and Automatic Control from the University of Alexandria, Egypt, in 1992 and 1998 respectively. In 2004, he received the Ph.D. degree from the University of Missouri-Kansas City (UMKC) in the Engineering and Networking disciplines. While at UMKC, he served as adjunct faculty in the electrical engineering department. In May 2004, after he received the Ph.D. degree, he joined the research team of Wideband Corporation, where he worked on the design and development of layer 3 network routers. In August 2005, he joined Tennessee Technological University, Cookeville, TN, USA, as an Assistant Professor. Dr. Elkeelany is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the Eta Kappa Nu honorary society. He has a distinguished educational record, being the recipient of the UMKC Outstanding Doctoral Interdisciplinary Ph.D. Student Award in 2004, the UMKC Chancellor's Interdisciplinary Ph.D. Merit Award in 2001-2002 and the UMKC Outstanding Graduate Student Award from the School of Engineering during 1999, 2000 and 2002. He received his B.Sc. degree with Distinction and degree of honor. In May 2005, Dr. Elkeelany received the Doctor of Research degree from the International Institute of Science and Technology. In 2008, Dr. Elkeelany was recognized as a lifetime member of The Strathmore's Who's Who.


JOURNAL OF COMPUTERS, VOL. 3, NO. 3, MARCH 2008
Performance Comparisons, Design, and Implementation of RC5 Symmetric Encryption Core using Reconfigurable Hardware

Omar Elkeelany1

Tennessee Technological University/Electrical and Computer Engineering, Cookeville, TN Email: oelkeelany@tntech.edu

Adegoke Olabisi

Tennessee Technological University/Electrical and Computer Engineering, Cookeville, TN Email: aoolabisi21@tntech.edu
Abstract: With wireless communications coming to homes and offices, the need for secure data transmission is of utmost importance. Today, it is important that information is sent confidentially over the network without fear of hackers or unauthorized access to it. This makes security implementation in networks a crucial demand. Symmetric Encryption Cores provide data protection via the use of a secret key known only to the encryption and decryption ends of the communication path. In this paper, first, an overview of two well known symmetric encryption cores is presented, namely 3DES and RC5. Then the performance of their computer-based implementations is compared to demonstrate the superior performance of RC5. The conventional hardware architecture of the RC5 core is presented and investigated. A hardware system design is proposed to improve its performance. Using a three-stage pipeline technique, the proposed architecture achieves an increased encryption throughput compared to related work. By exploiting modern features in Field Programmable Gate Arrays (FPGA), which allow the modeling of a System-on-Programmable-Chip (SoPC), this paper proposes a model for symmetric encryption algorithms (e.g., RC5). Structural system analysis of the proposed model shows that it offers extra security against the single-site physical access attack to which other implementations are vulnerable. By evaluating the performance of this proposed SoPC model, one finds that it raises the encryption throughput to 300 Mbps. Hence, we report an increase of over 80% in the encryption throughput compared to related work. Moreover, our work lowers the implementation cost due to the integration of all system parts into one chip.
Index Terms: Cryptography, Systems-on-Programmable Chips, analysis and simulation, Hardware Description Language
I. INTRODUCTION
The efficiency and success of e-commerce and business were fueled by the underlying growth of available network bandwidth. Over the past few years,

1: Corresponding author

internet-enabled business or e-business has drastically improved the revenue and efficiency of large scale organizations. It has enabled organizations to lower operating costs and improve customer satisfaction. Such applications require networks which accommodate voice, video and protected data [1, 2]. Obviously, privacy must be protected to maintain the growth of these applications. However, as today's networks make more and more applications available to users, they become vulnerable to a wider range of security threats. To prevent those threats and ensure that networks are not compromised, security and privacy must be integrated into the network backbone (i.e. switches, routers, servers, etc.). There is a dire need for integrity and confidentiality of information passed across large scale networks. Encryption algorithms play a great role in achieving these needs [1]. Cryptosystems are of two types: symmetric and asymmetric. Symmetric cryptosystems [2-4] use the same key (the secret key) to encrypt and decrypt a message, whereas asymmetric cryptosystems use one key (the public key) to encrypt a message and a different key (the private key) to decrypt it. Asymmetric cryptosystems are also called public key cryptosystems. It has, however, been shown that asymmetric systems are too slow to support bulk data encryption [5-7]. The Wireless Application Protocol (WAP) forum specifies RC5 [8-11] as the encryption algorithm for its Wireless Transport Level Security (WTLS) clients and servers [12]. The work of [13] proposed an area optimized hardware architecture that fits the RC5 core into a Field Programmable Gate Array (FPGA) device with fewer resources than the conventional one. However, its encryption throughput was found to be less than that of the conventional architecture, and it did not propose any modifications to the conventional system architecture. This paper presents integrated RC5 encryption and decryption cores with memory elements and key-expansion units in one chip. It presents a case study and undertakes a performance comparison between RC5 and 3DES, the latter taken as the low end of the symmetric encryption family of algorithms. Various design methodologies will be presented to achieve the

2008 ACADEMY PUBLISHER

optimal utilization of the target chip. Simulation and prototype synthesis results are summarized and discussed in detail. A Hardware Description Language is used to model both the conventional and proposed architectures. On top of the added security gained by avoiding external memory, an important test measure is the speed of operation. Simulation tools are used to verify this speed improvement.

II. OVERVIEW OF TWO SYMMETRIC ENCRYPTION ALGORITHMS: 3DES & RC5
Symmetric Encryption cores provide security to data by using a secret key for both the encryption and decryption processes. Historically, the Triple Data Encryption Standard (3DES) in Cipher Block Chaining (CBC) mode was proposed in IPSec ESP network encryption [14] as a symmetric encryption algorithm. Encryption using the DES algorithm is the most time consuming process. DES uses a short 56-bit key and a block size of 64 bits. The algorithm has 19 distinct steps, as shown in figure 1. The first step is a key independent transposition on the 64-bit input block, using a fixed 64-bit permutation table to change bit locations in the input block. The last step is the exact inverse of this transposition. In the pre-output 18th step, a 32-bit SWAP operation is performed. The remaining steps (2 to 17) are functionally identical and depend on different portions of the input key. Each of these 16 steps (or 16 rounds) takes two 32-bit inputs and produces two 32-bit outputs. The left output is a COPY of the right input. The right output is the exclusive-or (XOR) of the left input and a function f of the right input and the step key (Ki, i=1:16). The function f consists of 4 operations. First, a 32:48 bit transposition/expansion of the right input is applied; second, a 48-bit XOR of the output with the step key. Then a group mapping is performed to reduce the output size from 48 to 32 bits using a two dimensional look-up table of 4 rows and 16 columns for all possible 64 inputs. 
Lookup tables (Si, i=1:8) are indexed using row and column numbers given by 2 and 4 bits of the 6-bit input, respectively. The table entry is a 4-bit value. Finally, a 32-bit transposition is performed. Each of the 16 steps has a special key (Ki), which is derived from the 56-bit secret key using a special
TABLE I. DES BASIC OPERATIONS Basic Operation b bit transposition Equivalent simple operations # Times Type Space needed needed One dimensional b b table look-up Multiply Add One dimensional table look-up rows x 16 cols.
Figure 1: DES encryption steps
function. In this function, an initial 56-bit transposition is performed. Before each step, the key is divided into two 28-bit sub-keys, each of which is rotated left using a LEFT SHIFT operation. Finally, a 56:48 bit transposition/reduction is performed. Table 1 lists all DES basic operations along with a set of equivalent simplified instructions for further analysis and comparison. These simplified instructions are derived from a virtual instruction set of a 32-bit machine capable of performing simple two-operand instructions. We assume all instructions of this machine execute in one cycle (i.e. all instructions are of similar complexity).
More recently, the RC5 algorithm was developed by Ronald Rivest in 1995 [2] as a parameterized symmetric encryption core. RC stands for "Rivest Cipher", or alternatively, "Ron's Code". The RC5 parameters are a variable word size (w), a variable number of rounds (r)

TABLE II DES OPERATIONS IN ONE BLOCK ENCRYPTION Step # 1,19 2-17 2-17 2-17 2-17 2-17 2-2-17 2-Operation 64 bit transposition 32 bit COPY 32 bit XOR 48 bit transposition 48 bit XOR 6:4 bit Two dimensional Table Mapping 32 bit transposition 56 bit transposition 28 bit LEFT SHIFT 48 bit transposition 32 bit SWAP # Times 16 8x1 2x1 Equiv. Total* 64x16 48x3x128 32x32 48xNotes 16 steps f function

KS function

Two dimensional table map (for 6:4 bit map)
Pre-output DES Total 3DES Total
* Using equivalent simple substitutions from table II
and a variable key size (k). Allowable choices for the block size are 32, 64 and 128 bits, i.e. two words of w = 16, 32 or 64 bits each. The number of rounds can range from 0 to 255, while the key size can range from 0 bits to 2040 bits. RC5 has three modules: key-expansion, encryption and decryption units. RC5 is more secure than RC4 [15] but is slower in operation. Generally, implementing ciphers in software is not efficient in terms of computation speed, and hence the use of hardware devices is an alternative [16, 17].
The RC5 algorithm uses three primitive operations and their inverses. (1) Addition/subtraction of words modulo 2^w, where w is the word size. (2) Bit-wise exclusive-or, denoted by XOR. (3) Rotation: the rotation of word x left by y bits is denoted by x<<<y. The inverse operation is the rotation of word x right by y bits, denoted by x>>>y.
In the key expansion module, the password key K is expanded to a much larger size using an expansion table (S). The size of table S is 2(r+1) words, where r is the number of rounds [8]. The key-expansion process must be performed before the encryption or decryption processes. The encryption process takes a plain text input and produces a cipher text as the output. The decryption process takes a cipher text as the input and produces a plain text as the output. In general, the same plaintext block will always encrypt to the same ciphertext when using the same key in a block cipher, whereas the same plaintext will encrypt to different ciphertext in a stream cipher [18]. Both processes use the expanded key along with segments of the input message to produce their outputs.
The conventional architecture of RC5, shown in figure 2, performs the encryption and decryption processes in two separate cores. As shown in figure 2, the RC5 core needs to read the expanded key sequentially in order to encrypt the plain text. This gives unauthorized users, if they have access to the system, a chance to tap and record the memory contents. At another site, they can use their recordings to decrypt the ciphered text. What is worse, if they have physical access to the system they will also have access to the user secret key. A basic way to avoid such an attack is to physically secure the system, which might not be enough, especially if the authorized user does not intend to change the secret key very often. The proposed system architecture avoids the damage caused by this type of attack and also gives improved performance in terms of throughput by making use of a three stage pipelining technique in its design.
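The three primitive operations and the key-expansion step described above can be sketched in a few lines of Python. This is an illustrative software model only, not the Verilog design evaluated in this paper: the function names (rotl, rotr, expand_key, encrypt_block, decrypt_block) are hypothetical, the word size is fixed at w = 32, and the constants P32 and Q32 are the magic constants from Rivest's RC5 specification.

```python
W = 32                                   # word size in bits (assumed fixed here)
MASK = (1 << W) - 1                      # keeps results modulo 2^w
P32, Q32 = 0xB7E15163, 0x9E3779B9        # RC5 magic constants for w = 32


def rotl(x, y):
    """Rotate word x left by y bits (x <<< y); only y mod w matters."""
    y %= W
    return ((x << y) | (x >> (W - y))) & MASK


def rotr(x, y):
    """Rotate word x right by y bits (x >>> y), the inverse of rotl."""
    y %= W
    return ((x >> y) | (x << (W - y))) & MASK


def expand_key(key, rounds):
    """Expand the secret key bytes into the table S of 2(r+1) words."""
    u = W // 8                           # bytes per word
    c = max(1, (len(key) + u - 1) // u)  # number of key words
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):
        L[i // u] = ((L[i // u] << 8) + key[i]) & MASK
    t = 2 * (rounds + 1)
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):       # mixing loop
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % t
        j = (j + 1) % c
    return S


def encrypt_block(A, B, S, rounds):
    """Encrypt one block given as two w-bit words (A, B)."""
    A = (A + S[0]) & MASK
    B = (B + S[1]) & MASK
    for i in range(1, rounds + 1):
        A = (rotl(A ^ B, B) + S[2 * i]) & MASK
        B = (rotl(B ^ A, A) + S[2 * i + 1]) & MASK
    return A, B


def decrypt_block(A, B, S, rounds):
    """Invert encrypt_block using subtraction, right rotation and XOR."""
    for i in range(rounds, 0, -1):
        B = rotr((B - S[2 * i + 1]) & MASK, A) ^ A
        A = rotr((A - S[2 * i]) & MASK, B) ^ B
    return (A - S[0]) & MASK, (B - S[1]) & MASK
```

For example, with a 16-byte key and 15 rounds, `S = expand_key(bytes(range(16)), 15)` yields a 32-word table, and `decrypt_block(*encrypt_block(0, 0, S, 15), S, 15)` returns the original pair of words. The key bytes used here are arbitrary placeholders.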
III. PERFORMANCE EVALUATIONS OF 3DES AND EXISTING RC5 MODELS
Although 3DES has a short key (56-bits), it is used here as a low-end security algorithm for the sake of comparison. RC5, with its parameterized feature and variable key lengths, is more secure and efficient. It is desirable to compare the performance of these two algorithms, to understand how much effort is needed to achieve the higher levels of security offered by RC5. This section presents a performance comparison of the two algorithms using the virtual instruction set of a 32-bit machine, which consists of one-cycle, homogeneous, two-operand, reduced simple instructions. Consider, for example, replacing DES with its RC5 equivalent. One reasonable choice of parameters for such a replacement is RC5-32/16/7. The input/output blocks will be 64 bits long (two 32-bit words), just as in DES. The number of rounds is also the same (16), although each RC5 round is more like two DES rounds since all data registers, rather than just half of them, are updated in one RC5 round. DES and RC5-32/16/7 each have 56-bit (7-byte) secret keys. Unlike DES, which has no parameterization and hence no flexibility, RC5 permits security upgrades as necessary. For example, one can upgrade the above choice for a DES replacement to a 128-bit key by upgrading to RC5-32/16/16. Extra bits in the RC5 secret key make it less vulnerable to exhaustive search attacks.
A. Triple DES Performance Evaluation
Table 2 lists the DES and 3DES computations of the total number of simple operations needed. 3DES is the chained form of DES with a chain size of 3. For simplicity, 3DES is assumed to need exactly 3 times the number of operations of DES. More accurately, 3DES requires an extra random Initialization Vector of 8 bytes, which is omitted here. For complete details of the DES and 3DES algorithms see [19].
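The RC5-w/r/b naming used above can be unpacked mechanically. The following is a small sketch, assuming the usual reading of the name as word size in bits, number of rounds, and key length in bytes; the helper name parse_rc5_name is hypothetical.

```python
def parse_rc5_name(name):
    """Split a name such as 'RC5-32/16/7' into its parameters.

    Assumes the form RC5-w/r/b: word size in bits, rounds, key bytes.
    """
    prefix, params = name.split("-", 1)
    if prefix != "RC5":
        raise ValueError("expected a name of the form RC5-w/r/b")
    word_bits, rounds, key_bytes = (int(p) for p in params.split("/"))
    return {
        "word_bits": word_bits,
        "rounds": rounds,
        "key_bytes": key_bytes,
        "block_bits": 2 * word_bits,   # a block is two words
        "key_bits": 8 * key_bytes,
    }


des_like = parse_rc5_name("RC5-32/16/7")    # 56-bit key, 64-bit block
upgraded = parse_rc5_name("RC5-32/16/16")   # 128-bit key, 64-bit block

# Growth of the exhaustive-search space when moving from 56 to 128 key bits.
keyspace_growth = 2 ** (upgraded["key_bits"] - des_like["key_bits"])
```

Under these assumptions the upgrade from RC5-32/16/7 to RC5-32/16/16 leaves the block size and round count unchanged and multiplies the number of candidate keys by 2^72.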
[Figure 2 block diagram: KEY_EXP UNIT, RAM (26 x 32 bits) and RC5 CORE; signals Key_In, S_out, addr_out, S(1), S(2i), S(2i+1), Plain (64 bits), Cipher]
Figure 2: Conventional Architecture of RC5 Encryption System
Figure 3: 3DES Encryption time vs. # of input blocks and processing power (axes: encryption time in microseconds vs. n, the input size in 64-bit blocks)
As shown in table 2, the total number of operations per 64-bit block is 8091. Given a message size of N bits, the number of blocks (n) is given by

$n = \lceil N / 64 \rceil$    (1)

where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to the operand. The 3DES time complexity is linear in n, i.e. O(n). Figure 3 shows the computed encryption time in microseconds as a function of n, in blocks of 64 bits. The results are plotted over a range of operating frequencies from 100 to 500 MHz. Since the machine executes one instruction per cycle, the operating frequency is equivalent to millions of instructions per second (MIPS). One may use this general equation for the encryption throughput:

$Th(\mathrm{Mbps}) = \dfrac{b}{I \cdot C} = \dfrac{b \cdot \mathrm{MIPS}}{I}$    (2)

where b is the block size in bits, I is the number of virtual machine instructions needed per block and C is the cycle time (in microseconds). Consequently, one finds that a 3DES algorithm running on this virtual machine will have an encryption throughput of only 4 Mbps at a 500 MHz operating frequency.
B. RC5 Performance Evaluation of Existing Models
Table 3 lists all RC5 basic operations and their equivalent simple ones, using the same virtual machine as before. In this table, one finds that RC5-32/16/16 needs only 160 instructions to run 16 rounds. Figure 4 shows the computed encryption time in microseconds as a function of the input packet size in blocks of 64 bits. Using eq. (2), this predicts a throughput of 200 Mbps (Th = 64*500/160). Hence, the RC5 algorithm is much faster than 3DES, and it is simple to implement both in software and in hardware. In section 4, we will present a hardware implementation of RC5 and show that it delivers a better throughput of 300 Mbps using much slower processing rates.
Figure 4: RC5 Encryption time vs. # of input blocks and processing power (axes: encryption time in microseconds vs. n, the input size in 64-bit blocks)
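The block count and throughput relations quoted above are easy to check numerically. The sketch below uses only figures stated in the text (8091 instructions per block for 3DES, 160 for RC5-32/16/16, 64-bit blocks, 500 MIPS); the function names are hypothetical and the one-instruction-per-cycle machine is the paper's own assumption.

```python
def num_blocks(message_bits, block_bits=64):
    """Eq. (1): smallest integer >= message_bits / block_bits."""
    return -(-message_bits // block_bits)   # integer ceiling division


def encryption_time_us(message_bits, instr_per_block, mips, block_bits=64):
    """Time in microseconds on a machine executing `mips` million instr/s."""
    return num_blocks(message_bits, block_bits) * instr_per_block / mips


def throughput_mbps(block_bits, instr_per_block, mips):
    """Eq. (2): Th = b * MIPS / I."""
    return block_bits * mips / instr_per_block


th_3des = throughput_mbps(64, 8091, 500)    # close to the quoted 4 Mbps
th_rc5 = throughput_mbps(64, 160, 500)      # the quoted 200 Mbps

# Ratio of RC5 to 3DES encryption time for 16 input blocks (1024 bits).
ratio = encryption_time_us(1024, 160, 500) / encryption_time_us(1024, 8091, 500)
```

With these inputs the RC5-to-3DES time ratio is 160/8091, which is consistent with the statement that RC5 needs roughly 2% of the 3DES computation time.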

Figure 5: Encryption Algorithm comparison; Throughput vs. processing power (axes: throughput in Mbps vs. processing power in MIPS)
It is clear from figures 3 and 4 that the computation time for 3DES is much larger than that of RC5 for the same input size. For example, for 16 input blocks, the comparison shows that RC5 takes only 2% of the computation time consumed by 3DES. This is further illustrated in figure 5. The conventional RC5 algorithm implemented in hardware reported 247 Mbps, while an area optimized hardware realization of RC5 [13] reported a net 207 Mbps. It is well known that the key size of 3DES is too short and there is no easy way to increase it, and hence it is only presented here as a benchmark for comparison.

IV. PROPOSED RC5 SYSTEM-ON-CHIP MODEL
The proposed architecture, namely the System-on-Programmable-Chip (SoPC) model, has three main components: the key_exp unit, where the user secret key is recorded and expanded in size; the internal memory unit, where the expanded keys are stored; and the RC5 core, where the encryption and decryption processes are performed. In this architecture, the memory is part of the chip to avoid memory tapping attacks. This is made feasible by modern FPGA technology with on-chip memories. It not only has the advantage of added security but also increases the speed of operation, as it decreases the data transaction latency between the core and the memory. The proposed system uses the RC5-32/15/16 parameters. This means two 32-bit word inputs and outputs, 15 rounds and a 16-byte (128-bit) secret key.
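The sizes implied by a parameter choice such as RC5-32/15/16 follow from the relations given in Section II. The sketch below derives them; the function name and dictionary keys are hypothetical, and the mixing-loop count 3*max(t, c) is taken from Rivest's RC5 key schedule rather than from this paper.

```python
def rc5_derived_sizes(word_bits, rounds, key_bytes):
    """Derive block, key and expanded-table sizes for RC5-w/r/b."""
    bytes_per_word = word_bits // 8
    key_words = max(1, -(-key_bytes // bytes_per_word))   # ceiling division
    table_words = 2 * (rounds + 1)                        # size of table S
    return {
        "block_bits": 2 * word_bits,
        "key_bits": 8 * key_bytes,
        "key_words": key_words,
        "table_words": table_words,
        "table_bits": table_words * word_bits,            # on-chip memory for S
        "mix_iterations": 3 * max(table_words, key_words),
    }


proposed = rc5_derived_sizes(32, 15, 16)
```

For the proposed parameters this gives a 64-bit block, a 32-word (1024-bit) expanded key table and 96 mixing iterations, the same count as the 96 setup iterations mentioned in Section V. With 12 rounds the table would hold 26 words, which matches the 26 x 32 bit RAM label in Figure 2.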

TABLE III RC5 OPERATIONS PER BLOCK Equivalent Operations# per round 2 Total (T)
Step # 1&3&3&6
Block Operation 32-bit XOR 32-bit data dependent shift left 32-bit modulo Addition 32-bit data dependent shift right 32-bit one dimensional table lookup

# of rounds 16

Tota l 32 160
The proposed model is made flexible by providing these parameters as inputs to the circuit. Hence they can be modified to suit one's goal. The choice of r, for instance, affects both encryption speed and security. For some applications, high speed may be the most critical requirement, and one could thus choose a small value of r. In other applications, such as credit card transactions, security is the primary concern and speed is relatively unimportant, so one could choose a larger value of r. This makes the algorithm flexible. The word size also affects speed and security. For example, choosing a word size w larger than the register size of the CPU can degrade the encryption speed. It is also unusual and risky to have a fixed set of parameters. Figure 6 below shows the proposed architecture of RC5-SoPC. This architecture requires registration of the user secret key into the system at a particular timing of a SetKey signal. This registration is done with a sequence of short pulses. However, one-step registration may be possible, assuming enough parallel lines for providing the secret key. In any case, the secret key does not stay as an input and can change only in the presence of the SetKey pulse. In regular (i.e. encrypt) operation, only Plaintext input blocks are processed, and the RC5 core reads the internal memory logic module for the successful generation of the Ciphertext.

V. DESIGN IMPLEMENTATIONS
We used Verilog HDL to implement both the conventional (for comparison) and the proposed architectures in FPGA. The results obtained after modeling the algorithm tally with the results obtained from the original code in C, which models the conventional architecture.
Various code implementations were tested using Xilinx design tools. The target hardware is a Xilinx Virtex-II 1000 FPGA. The resource utilizations of the various design methodologies are summarized in Table 4. The conventional iterative RC5 C-based code was transformed into Verilog HDL code. However, with the selected target FPGA, the design required 10,603 slices of the FPGA, where only 5,120 are available. In order to minimize FPGA resource utilization, various design optimizations were researched. The main methodologies followed are listed below:
A. Fixed Key Synchronous Encryption
Here, each round is performed via a pulse given at the Start signal. Since the number of rounds is 15, fifteen Start pulses were given. The synthesis results showed that this design fitted into the Virtex-II FPGA. Although the number of registers increased, the number of adders/subtractors and logic shifters decreased considerably. The number of required slices of the FPGA was 609. The use of the Start pulse also decreased the synthesis time.
B. Variable Secret Key Asynchronous Encryption
Since the use of a fixed secret key does not allow for flexibility, the code was modified to accept a variable secret key. The signal Setkey was introduced to the design. At the positive edge of the Setkey signal, 8 bits of the user secret key are registered. Since the secret key is 128 bits (16 bytes), sixteen pulses of the Setkey signal were needed to fully register the key. Once these pulses are provided, the encryption process follows, provided that an active high level of the Start signal is present. The total number of required slices of the FPGA was 31,791, and hence the design could not be synthesized as it was 620% over mapped. An increase in the number of logic components resulted from making the key a variable input as opposed

Figure 6: Integrating Key expansion and memory logic in one chip with the encryption core
to the fixed key code. Clearly, further code optimizations were needed to offset the drastic increase in the number of required components.

C. Variable Secret Key and Synchronous Encryption

A synchronous encryption feature was added to the variable secret key design. In essence, the user's secret key is retrieved before encryption begins, and the encryption rounds are then performed at the edges of consecutive pulses of the Start signal. With the secret key as a variable input and all encryption rounds performed afterwards, the synthesis results showed that the Virtex-II chip was over-utilized, so the design could not fit into the chip. The number of required FPGA slices was 27,368 (i.e., over-mapped at 534%). The design therefore needed further optimization before it could be implemented on the target hardware.

D. Partial Serial Variable Key Registration (Phase I)

The setup task of the algorithm was modified to test the effect of partial serial variable key registration. Conventionally, the setup task iterates 96 times in a for-loop. It was modified to perform 32 iterations for each pulse of a newly introduced Setkey signal, so that 3 Setkey pulses yield the required 96 iterations. With this serial unwinding of the for-loop, the number of required FPGA slices dropped to 9,953. The design still did not fit, as it was over-mapped at 194%, and further optimization was needed to fit it into the target FPGA.

E. Full Serial Variable Key Registration (Phase II)

The design was further optimized by fully unwinding the for-loop in the setup task, using the Setkey pulse introduced in Phase I. Instead of performing 32 iterations for each Setkey pulse, 96 pulses were used, each corresponding to a single iteration of the setup task. With this full serial unwinding of the for-loop, the number of required FPGA slices dropped to 3,053, which fits the target FPGA and occupies only 59% of its slices.

F. Synchronizing the Inputs with External Clock, and Using Variable Key and Single Stage Encryption

To perform the RC5 tasks efficiently, a dedicated input clock signal must be used. For testing purposes, the input clock was divided internally for slower operation, so that the progress of the design's states could be monitored visually. Unifying all sequential input pulses into one global input clock, and using internal counters to advance the design states properly, reduced the total number of required slices by 4%.

G. Multiple Stage Pipelines with Synchronous Encryption

As the target FPGA was under-utilized by the previous design, the encryption part of the design was optimized further to achieve faster performance. To the extent possible, the previous implementation was improved by taking advantage of CPU architectural advances such as pipelining [20]. A three-stage pipeline is incorporated in the design, where three rounds of encryption are performed in sequence for each input clock pulse, provided that the Start signal is active. The total number of required FPGA slices reached 3,618, which is a 15% increase over the single-stage encryption method (i.e., roughly a 7.5% increase per extra stage).

TABLE IV. RESOURCE UTILIZATION AND MAXIMUM DELAY FOR SOME RC5 DESIGN METHODOLOGIES

Design Methodology | FPGA Utilization | Longest Path Delay*
Conventional RC5 | 207% | N.A.
A. Fixed Key Synchronous Encryption | 11% | 21.21 ns
E. Full Serial Variable Key Registration (Phase II) | 59% | 21.087 ns
F. Synchronizing the Inputs with External Clock, and Using Variable Key and Single Stage Encryption | 55% | 28.539 ns
G. Multiple Stage Pipelines with Synchronous Encryption | 70% | 66.303 ns

*Not necessarily the critical or dominant path.

H. Integrating Design for Test

The design presented so far, though it fits the target FPGA, provides no means of tracking the number of pulses given to the encryption process. Two 7-segment displays, internal hexadecimal counters, and encoders were incorporated into the design to improve its testability. This required a slight increase in FPGA slices (41 slices, i.e., less than 1% extra). The design was tested successfully in simulation before it was placed and routed on the target FPGA prototype.
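The serial unwinding of the setup task described in Phases I and II can be illustrated with a small software model. The sketch below is not the authors' hardware description; it assumes the standard RC5 key expansion for 32-bit words, in which the mixing loop runs 3 x max(t, c) times with t = 2(r + 1) subkey words, so that r = 15 rounds gives the 96 iterations quoted above. The class name `KeySetup`, the method `pulse`, the helper `expand`, and the 16-byte demo key are hypothetical names and values chosen for illustration only.

```python
W = 32                                  # word size in bits
MASK = (1 << W) - 1
P32, Q32 = 0xB7E15163, 0x9E3779B9       # standard RC5 constants for w = 32
ROUNDS = 15                             # round count quoted in the text
T = 2 * (ROUNDS + 1)                    # 32 subkey words


def rotl(x, s):
    """Rotate the W-bit word x left by s bits (s taken modulo W)."""
    x &= MASK
    s %= W
    return ((x << s) | (x >> (W - s))) & MASK


class KeySetup:
    """Software model of the setup task, advanced by Setkey 'pulses'."""

    def __init__(self, key):
        c = max(1, (len(key) + 3) // 4)          # key length in words
        self.L = [0] * c
        for idx in range(len(key) - 1, -1, -1):  # little-endian byte packing
            self.L[idx // 4] = ((self.L[idx // 4] << 8) + key[idx]) & MASK
        self.S = [(P32 + k * Q32) & MASK for k in range(T)]
        self.i = self.j = self.A = self.B = 0
        self.remaining = 3 * max(T, c)           # 96 whenever c <= 32

    def pulse(self, iterations):
        """Run up to `iterations` mixing steps, as one Setkey pulse would."""
        for _ in range(min(iterations, self.remaining)):
            self.A = self.S[self.i] = rotl(self.S[self.i] + self.A + self.B, 3)
            self.B = self.L[self.j] = rotl(
                self.L[self.j] + self.A + self.B, self.A + self.B)
            self.i = (self.i + 1) % T
            self.j = (self.j + 1) % len(self.L)
            self.remaining -= 1


def expand(key, iterations_per_pulse):
    """Return (subkey table, pulses used) for a given number of steps per pulse."""
    ks = KeySetup(key)
    pulses = 0
    while ks.remaining:
        ks.pulse(iterations_per_pulse)
        pulses += 1
    return ks.S, pulses


if __name__ == "__main__":
    key = bytes(range(16))                       # arbitrary 128-bit demo key
    for per_pulse in (96, 32, 1):                # conventional, Phase I, Phase II
        table, pulses = expand(key, per_pulse)
        print(per_pulse, pulses, hex(table[0]))
```

All three pulse sizes produce the same subkey table; only the amount of work done per pulse changes. In hardware terms, this presumably corresponds to instantiating less mixing logic at once in exchange for more Setkey pulses, which is the trade-off reflected in the slice counts reported for Phases I and II.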
VI. IMPLEMENTATION AND EVALUATION

Figure 7 shows the simulation results for a null plaintext with a non-zero secret key (keystr) and the known, matching ciphertext B7C4B44A9FAA44D8 after the 15 rounds (the first word of the ciphertext output is highlighted in the figure). The figure also shows the duration of the Start signal, which allows only 5 changes of the ciphertext, because 3 rounds are executed per clock edge in the three-stage pipeline. Also shown are the test signals that drive the seven-segment displays and illustrate the internal state changes of the design.

The final design was implemented and mapped to the Virtex-II 1000 FPGA device (XC2V1000-4FG456C) in the 456-ball fine-pitch ball grid array package (see Figure 8). The power consumption of the final design was found to be 351.3 mW. The maximum delay of the designs that fit into the Virtex-II 1000 FPGA is shown in Table IV. The operating frequency is 24 MHz, provided by the Virtex-II prototype board (a clock period of about 42 ns). This clock period is considered suitable: although the longest path delay of the final design methodology in Table IV is 66.303 ns, 99% of the paths are shorter than half the clock period. Again, in the final design, one ciphertext block is produced every 5 clock cycles while the Start signal is active. Since the block size is 64 bits, the encryption throughput can be calculated as in (3):
\[
\text{Throughput} = \frac{64\ \text{bits/block}}{(5\ \text{cycles/block}) \times \dfrac{1}{24 \times 10^{6}\ \text{cycles/sec}}} \approx 300\ \text{Mbps} \tag{3}
\]
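As a quick check of the arithmetic in (3), the calculation can be written as a short function. This is only a sketch: the function name `throughput_mbps` and its parameters are hypothetical, and it assumes that the 5 cycles per block follow from 15 rounds executed 3 per clock cycle, as described for the three-stage pipeline. The exact quotient at 24 MHz is 307.2 Mbps, which the text rounds to 300 Mbps.

```python
def throughput_mbps(clock_hz, block_bits=64, rounds=15, rounds_per_cycle=3):
    """Encryption throughput in Mbps for one block every rounds/rounds_per_cycle cycles."""
    cycles_per_block = rounds // rounds_per_cycle    # 15 / 3 = 5 cycles per block
    bits_per_second = block_bits * clock_hz / cycles_per_block
    return bits_per_second / 1e6


if __name__ == "__main__":
    # 24 MHz is the prototype board clock; 35 MHz is the higher clock
    # frequency considered in this section.
    for clock_hz in (24_000_000, 35_000_000):
        print(clock_hz, throughput_mbps(clock_hz))
```

Keeping the clock frequency as a parameter makes it easy to see how the throughput scales linearly with the clock while the number of cycles per block stays fixed.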

This encryption throughput is higher than that of the conventional architecture and the related work in [3]. It was reported that 207 to 247 Mbps were achievable using the same FPGA family [3, 4]. Here, the faster encryption throughput is believed to result from integrating the memory logic into the design and from using the internal three-stage pipeline. Doing away with external interfaces improves overall performance. Moreover, by increasing the clock frequency of the proposed design to 35 MHz, the encryption throughput can reach 450 Mbps using equation (3). Thus, this work can yield up to an 80% increase over the encryption throughput of the conventional architecture and related work.

Figure 7: Simulation waveforms using blank plaintext for correct cipher output

Figure 8: Hardware implementation of RC5 Encryption Algorithm

VII. CONCLUSIONS

We have presented a high-performance integrated RC5 architecture with variable key registration, enhanced security, and improved encryption throughput. The proposed architecture was synthesized to an FPGA device from the same family used in the related work, for comparison. It shows an improvement in operating speed over the conventional architecture and related work. Moreover, compared with the conventional architecture, we show that it avoids the damage caused by single-site physical access attacks. The deliverable encryption throughput of the proposed RC5-SoPC design ranges from 300 Mbps to 450 Mbps, depending on the choice of clock frequency (i.e., 24 MHz or 35 MHz). This makes it suitable for high-speed networks (e.g., Fast Ethernet). Compared with the conventional RC5 encryption throughput, we have shown an 80% increase in the achievable encryption throughput.

REFERENCES

[1] Andrew Mason, Network Security and Virtual Private Networks Technologies, Cisco Press, 2004.
[2] K. Hausman, N. Alston, M. Chapple, Kalani K. Hausman, Protecting Your Network from Security Threats, Addison Wesley Professional, Nov. 2005.
[3] Brassard, G., Modern Cryptology, Springer-Verlag, 1988.
[4] Feistel, H., Cryptography and Computer Privacy, Scientific American, vol. 228, no. 5, pp. 15-23, 1973.
[5] Coppersmith, D., Cryptography, IBM Journal of Research and Development, vol. 31, pp. 244-248, 1987.
[6] Gary C. Kessler, An Overview of Cryptography, http://www.garykessler.net/library/crypto.html#intro, 2006.
[7] C. Meyer and S. Matyas, Cryptography: A New Dimension in Computer Data Security, John Wiley, New York, 1982.
[8] Ronald L. Rivest, The RC5 Encryption Algorithm, Proceedings of the 1994 Leuven Workshop on Fast Software Encryption (Springer 1995), pp. 86-96.
[9] Ronald L. Rivest, A Description of the RC2 Encryption Algorithm, RFC 2268, 1998.
[10] Burton S. Kaliski Jr., Yiqun Lisa Yin, On the Security of the RC5 Encryption Algorithm, RSA Laboratories Technical Report, September 1998.
[11] R. Baldwin and R. Rivest, The RC5, RC5-CBC, RC5-CBC-Pad, and RC5-CTS Algorithms, RFC 2040, October 1996.
[12] A. Schubert and W. Anheier, Efficient VLSI Implementation of Modern Symmetric Block Ciphers, In Proceedings of the ICECS'99, Cyprus, 1999.
[13] N. Sklavos, C. Machas, O. Koufopavlou, Area Optimized Architecture and VLSI Implementation of RC5 Encryption Algorithm, In Proceedings of the IEEE ICECS 2003, vol. 1, Dec. 2003.
[14] C. Madson and N. Doraswamy, The ESP DES-CBC Cipher Algorithm with Explicit IV, RFC 2405, 1998.
[15] S. Fluhrer, I. Mantin, A. Shamir, Weaknesses in the Key Scheduling Algorithm of RC4, In Proceedings of the 8th Annual Workshop on SAC, August 2001.
[16] N. Sklavos and O. Koufopavlou, Mobile Communications World: Security Implementation Aspects - A State of the Art, Computer Science Journal of Moldova, Institute of Mathematics & Computer Science, vol. 11 (2), 2003.
[17] Menezes, A., Van Oorschot, P.C., Vanstone, S.A., Handbook of Applied Cryptography, CRC Press, 1997.
[18] J. B. Sessions, Fast Software Implementations of Block Ciphers, M.S. Thesis, Department of Electrical & Computer Engineering, Oregon State University, 1998.
[19] U.S. National Bureau of Standards, Data Encryption Standard, Federal Information Processing Standard (FIPS) publication 46-2, December 1993, http://www.itl.nist.gov/fipspubs/fip46-2.htm
[20] Patterson D. and Hennessy J., Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann Publishers, 1994.

http://www.rsasecurity.com/press_release.asp?doc_id=172&id=(accessed 04/23/06)

Omar S. Elkeelany received the B.Sc. and M.Sc. degrees in Computer Science and Automatic Control from the University of Alexandria, Egypt, in 1992 and 1998, respectively. In 2004, he received the Ph.D. degree from the University of Missouri-Kansas City (UMKC) in the Engineering and Networking disciplines. While at UMKC, he served as an adjunct faculty member of the electrical engineering department. In May 2004, after receiving the Ph.D. degree, he joined the research team of Wideband Corporation, where he worked on the design and development of layer 3 network routers. In August 2005, he joined Tennessee Tech University as an Assistant Professor. Dr. Elkeelany is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the Eta Kappa Nu honor society. He has a distinguished educational record, being the recipient of the UMKC Outstanding Doctoral Interdisciplinary Ph.D. Student Award in 2004, the UMKC Chancellor's Interdisciplinary Ph.D. Merit Award in 2001-2002, and the UMKC Outstanding Graduate Student Award from the School of Engineering in 1999, 2000 and 2002. He received his B.Sc. degree with Distinction and Degree of Honor. In May 2005, Dr. Elkeelany received the Doctor of Research degree from the International Institute of Science and Technology.

 
