ATI Radeon X1900


Apple 661-4335 Card, Video, ATI Radeon X1900 XT, 512 MB, Ver. 2

DVI - Dual-DVI - AMD Radeon - 512 MB

Brand: Apple
Part Number: 661-4335

User reviews and opinions

Comments to date: 1. Average Rating:
cclarkecapo 7:55pm on Saturday, April 10th, 2010 
ASUS P5K-VM, Intel C2D E6550, Zero-Therm CF900 cooler, 2 GB of G-SKILL PC2 8500, Rosewill Stallion Series RD500-2SB ATX12V v2.2 500W Power Supply.

ATI Radeon X1900 XTX Maintenance
Like all graphics cards, the Radeon X1900 XTX occasionally needs to be opened up and cleaned in order to function properly.

Step 1: Start by removing the card from the PC if you haven't already.

A Performance-Oriented Data Parallel Virtual Machine for GPUs
Mark Peercy, Mark Segal, Derek Gerstmann

Problem Statement

"... significant barriers still exist for the developer who wishes to use the inexpensive power of commodity graphics hardware, whether for in-game simulation of physics or for conventional computational science. These chips are designed for and driven by video game development; the programming model is unusual, the programming environment is tightly constrained, and the underlying architectures are largely secret. The GPU developer must be an expert in computer graphics and its computational idioms to make effective use of the hardware, and still pitfalls abound."
Course Description, SIGGRAPH 2005 GPGPU Course

GPU as Compute Device
Interest in using the GPU for compute:
- Linear Algebra
- Physics
- Convolution
- Sorting
- FFT
- Rendering (final-frame)

Exercises a small fraction of the features in graphics hardware

Current GPU Abstraction
Rendering Pipeline (OpenGL, Direct3D)
- Graphics-centric programming model
- Forced to manage graphics state
- Even for non-visual computation(!)

Implemented through graphics driver
- Mechanism designed to hide hardware
- Imposes critical policy decisions
- How + when + where data resides
- Optimized for graphics and games
- Driver updates exhibit different behavior

A Data Parallel Approach

The Data Parallel Virtual Machine (DPVM)

Expose relevant parts of the GPU as they really are
- Command Processor
- Data parallel processor arrays
- Memory controller

Hide all other graphics-specific features

Provide direct communication to device
- Eliminate driver implemented procedural API
- Push policy decisions back to application

The Data Parallel VM

[Figure: DPVM block diagram. Labels: openManagedConnection(), closeManagedConnection(), submitCommandBuffer(), commandBufferConsumed(); Host Memory; Commands, Instructions, Constants, Inputs, Outputs; Command Processor; Data Parallel Processor Array; Memory Controller; GPU Memory.]

Command Processor

Abstracts communication from architecture
- Commands are architecturally independent

Accepts command buffers (CBs) in memory
- Interprets commands in buffer
- Distributes work to processor array

Application manages command buffers
- Application fills and submits CB
- Application handles synchronization
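
The fill, submit, and synchronize cycle described above can be sketched as a toy simulation in Python. This is not the CTM API: the class, the numeric opcodes, and the argument layout are invented for illustration, and only the command names (set_domain, start_program) and the idea of polling for a consumed buffer come from the slides.

```python
from collections import deque

# Hypothetical encodings: the command names appear on the slides, but these
# opcode numbers and argument layouts are assumptions made for this sketch.
SET_DOMAIN = 1      # args: x0, y0, x1, y1 (inclusive rectangle)
START_PROGRAM = 2   # no args


class ToyCommandProcessor:
    """Stand-in for the device side: interprets buffers in submission order."""

    def __init__(self, program):
        self.program = program   # callable(x, y), run once per domain element
        self.pending = deque()   # submitted but not yet consumed buffers
        self.consumed = set()    # ids of buffers that have been interpreted
        self.next_id = 0
        self.domain = None
        self.output = {}

    def submit_command_buffer(self, words):
        """Queue a packed buffer and return an id the application can poll."""
        cb_id = self.next_id
        self.next_id += 1
        self.pending.append((cb_id, list(words)))
        return cb_id

    def command_buffer_consumed(self, cb_id):
        return cb_id in self.consumed

    def step(self):
        """Interpret one pending buffer (a real device would run on its own)."""
        if not self.pending:
            return
        cb_id, words = self.pending.popleft()
        i = 0
        while i < len(words):
            op = words[i]
            if op == SET_DOMAIN:
                self.domain = tuple(words[i + 1:i + 5])
                i += 5
            elif op == START_PROGRAM:
                x0, y0, x1, y1 = self.domain
                # Distribute work: the same program runs on every element.
                for y in range(y0, y1 + 1):
                    for x in range(x0, x1 + 1):
                        self.output[(x, y)] = self.program(x, y)
                i += 1
            else:
                raise ValueError(f"unknown opcode {op}")
        self.consumed.add(cb_id)


# Application side: fill a buffer, submit it, then synchronize explicitly.
device = ToyCommandProcessor(program=lambda x, y: x + 10 * y)
cb = device.submit_command_buffer([SET_DOMAIN, 0, 0, 3, 1, START_PROGRAM])
while not device.command_buffer_consumed(cb):
    device.step()
print(device.output[(3, 1)])
```

In this sketch the application, not a driver, decides when to submit and when to wait, which is the policy shift the slide describes.
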
Complete List of Data Parallel Commands
Program Execution: set_cond_val, set_domain, start_program, set_out_mask, set_cond_out_mask, set_cond_test, set_cond_loc
Cache Control: inv_inst_cache, inv_constf_cache, inv_consti_cache, inv_constb_cache, inv_cond_out_cache, inv_inp_cache, flush_out_cache, flush_cond_out_cache
Performance Counters: init_perf_counters, start_perf_counters, stop_perf_counters, read_perf_counters
Memory Layout: set_inst_fmt, set_inp_fmt, set_out_fmt, set_cond_out_fmt, set_constf_fmt, set_consti_fmt, set_constb_fmt

Data Parallel Processors

Performs floating-point computations

Accepts binary executable (ELF)
- Formal application binary interface (ABI)
- Uses native instruction set architecture (ISA) of processors

ISA is architecturally dependent
- Only ISA needs to be updated for new architectures

Application submits compiled binary
- Executable is immune to driver changes

Memory Controller

Services GPU requests to read/write memory

Exports graphics memory directly
- GPU memory (accessible by GPU only)
- Host memory (accessible by CPU + GPU)

Application manages memory
- Specifies locations and formats
- Can cast between formats (w/o copying)
- Controls cache invalidation
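
As a loose host-side analogy for casting between formats without copying, the Python sketch below reinterprets one allocation through two typed views. It uses only the standard library `memoryview.cast`; it says nothing about how the DPVM itself describes formats, and it assumes a platform where the native `float` and `unsigned int` are both 32 bits.

```python
import struct

# One allocation, standing in for a block of GPU or host memory.
memory = bytearray(16)
raw = memoryview(memory)

# Two typed views over the same bytes; neither cast copies any data.
as_float = raw.cast("f")   # four 32-bit floats
as_uint = raw.cast("I")    # four 32-bit unsigned integers

# A write through one view is immediately visible through the other.
as_float[0] = 1.0
print(hex(as_uint[0]))     # the IEEE 754 bit pattern of 1.0

# The underlying bytes match what struct would pack for the same float.
assert bytes(memory[:4]) == struct.pack("f", 1.0)
```
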


Implementation (CTM - Close to the Metal)

Radeon X1k architecture (e.g., X1900)
- Exposes hardware resources (SM3.0 DX9+)
- Native ISA (ASM Text + Binary Formats)

Runtime library
- Low-level driver components

Support libraries
- Command buffer packing
- Memory allocation
- Assembler/Disassembler

Processor Resources (ATI Radeon X1900)
- x16 Inputs (textures): float1/2/4
- x4 Outputs (MRT): float1/2/4, assigned (x,y)
- or xINF Outputs (scatter): float1, arbitrary (x,y)
- x512 Instructions: any combination
- x256 Float Constants: float4
- x32 Integer Constants: int4
- x32 Boolean Constants: bool1
- x128 Registers (GPR): float4
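
Because the application, not a driver, is responsible for staying within these limits, a small budget check can be useful. The Python sketch below is a hypothetical helper: the limit values are taken from the list above, while the `ProgramUsage` type and `over_budget` function are invented for illustration and are not part of CTM.

```python
from dataclasses import dataclass

# Per-program limits as listed for the Radeon X1900.
X1900_LIMITS = {
    "inputs": 16,
    "outputs": 4,            # MRT outputs; scatter outputs are not counted here
    "instructions": 512,
    "float_constants": 256,
    "int_constants": 32,
    "bool_constants": 32,
    "registers": 128,
}


@dataclass
class ProgramUsage:
    """Hypothetical summary of what one compiled program consumes."""
    inputs: int = 0
    outputs: int = 0
    instructions: int = 0
    float_constants: int = 0
    int_constants: int = 0
    bool_constants: int = 0
    registers: int = 0


def over_budget(usage, limits=X1900_LIMITS):
    """Return {resource: (used, limit)} for each resource that exceeds its limit."""
    return {
        name: (getattr(usage, name), limit)
        for name, limit in limits.items()
        if getattr(usage, name) > limit
    }


kernel = ProgramUsage(inputs=2, outputs=1, instructions=600, registers=40)
print(over_budget(kernel))
```
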

Additional Features (beyond SM3.0)
- Scatter (output float1 values to arbitrary locations)
- Tiled memory formats
- Fetch4 (retrieve x4 float1 values in a single clock)
- ABI w/native ISA allows hand-optimizations
- Ability to read/write directly to/from host memory
- Avoid non-IEEE floating point optimizations
- Application dictates granularity of CB submission
- Unlimited application execution time (arbitrary CB)
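
To make the scatter feature concrete, the Python sketch below contrasts it with gather, the access pattern a conventional pixel shader already has. The function names are illustrative only, and the handling of two writes to the same address (the last one wins here) is an assumption of this sketch, not something the slides specify.

```python
def gather(src, indices):
    """Gather: each output slot is fixed; the kernel chooses where to READ."""
    return [src[j] for j in indices]


def scatter(values, addresses, size, fill=0.0):
    """Scatter: the kernel chooses where each value is WRITTEN."""
    out = [fill] * size
    for value, address in zip(values, addresses):
        out[address] = value   # assumption: a later write overwrites an earlier one
    return out


values = [1.0, 2.0, 3.0]
addresses = [2, 0, 1]

scattered = scatter(values, addresses, size=3)
gathered = gather(values, addresses)
print(scattered, gathered)
```

With the same index list the two operations produce different results: scatter places `values[i]` at `addresses[i]`, while gather reads `values[addresses[i]]` into slot `i`. Algorithms whose output location depends on the data, such as binning or sorting passes, map directly onto scatter.
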


DPVM Applications

App                      Benefit   Features
Matrix-Matrix Multiply   x10       CB, ISA, mem-formats, mem-offsets, interleaving, fetch4
FFT                      x2        CB, ISA, interleaving
GPURay                   x2        CB, mem-formats
QJulia                   x2        CB, mem-formats


Benefits of the Data Parallel Approach
- Straightforward programming model
- Allows hand-tuned optimizations
- Exposes actual hardware device
- Direct control over memory + processors
- Application binary interface + native ISA
- Application is responsible for all policy decisions
- Allows consistent performance for compute

Future Work

Other things to explore:
- Open area for tool development
- Debuggers(!)
- Statistical runtime profilers
- New opportunities for compiler research
- Support for high-level languages
- Non-graphics optimizations

Special Thanks

ATI Research, Inc.: Mark Peercy, Mark Segal, Alpana Kaulgud, Raja Koduri, and everyone else.
Stanford University: Mike Houston, Daniel Horn.





