Floating point and Double Data Type

Whole numbers can be represented using the integer data types in C++. Real numbers, however, cannot be stored as integers because they have a fractional (decimal) part.

The C++ programming language provides single-precision (float) and double-precision (double) floating-point data types to represent real numbers.

A real number is declared using the keyword float or double. The main difference between float and double is the size. On most platforms a float is 4 bytes (32 bits), whereas a double is 8 bytes (64 bits). There is also a long version of the double data type, long double, whose size depends on the compiler and platform: 12 or 16 bytes is common, and some compilers make it the same 8 bytes as double.

Data Type      Byte Size    Bit Size (1 byte = 8 bits)
float          4            32
double         8            64
long double    12 or 16     96 or 128
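Because these sizes are typical rather than guaranteed, it can help to check them on your own compiler. The following is a minimal sketch using the sizeof operator; the function name size_report is only an illustrative choice, not part of any library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a short report of the storage size of each floating-point type
// for the platform this code is compiled on.
// size_report is a hypothetical helper name used for illustration.
std::string size_report() {
    // Sanity check: on mainstream platforms double is never smaller than float.
    assert(sizeof(float) <= sizeof(double));

    std::ostringstream out;
    out << "float:       " << sizeof(float) << " bytes\n"
        << "double:      " << sizeof(double) << " bytes\n"
        << "long double: " << sizeof(long double) << " bytes\n";
    return out.str();
}
```

Printing the returned string from main with std::cout shows the sizes for your build. The long double line is the one most likely to differ between compilers.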

Computer Representation of Floating Point

The real numbers are represented in scientific notation (or exponential notation) because it is easier to perform arithmetic involving real values.

The Exponential Notation

The exponential notation has two parts: a mantissa and an exponent. The general form of a floating point number in exponential notation is shown below.

\pm M \times 10^{E}, where 1 \leq M < 10

For example, if you want to represent 20000 in exponential notation, it becomes

2.0 \times 10^{4}, where 1 \leq 2.0 < 10

If you want to represent 133 in scientific notation, then

1.33 \times 10^{2}, where 1 \leq 1.33 < 10

The number 0.00005454 can be represented as

5.454 \times 10^{-5}, where 1 \leq 5.454 < 10

The table isolates the different parts of the examples given above.

Mantissa    Exponent    E-notation
2.0         4           2.0E4
1.33        2           1.33E2
5.454       -5          5.454E-5
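C++ accepts E-notation directly in floating point literals, so the three examples can be checked in code. The sketch below is only illustrative: nearly_equal, from_scientific, and check_examples are hypothetical helper names, and the comparison uses a small tolerance because floating point arithmetic is not exact.

```cpp
#include <cassert>
#include <cmath>

// Returns true when two doubles agree to within a small relative tolerance.
// (Hypothetical helper for this example.)
bool nearly_equal(double a, double b, double rel_tol = 1e-12) {
    return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

// Recombines a decimal mantissa M and exponent E into the value M * 10^E.
double from_scientific(double mantissa, int exponent) {
    return mantissa * std::pow(10.0, exponent);
}

// Checks the three worked examples: 20000, 133, and 0.00005454.
void check_examples() {
    assert(nearly_equal(2.0E4, 20000.0));
    assert(nearly_equal(1.33E2, 133.0));
    assert(nearly_equal(5.454E-5, 0.00005454));
    assert(nearly_equal(from_scientific(1.33, 2), 133.0));
}
```

Calling check_examples() from main passes silently when the mantissa and exponent pairs match the original numbers.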

The above notation is suitable for humans, but the computer needs a binary representation of floating point numbers, and that representation is also in exponential format.

We already know that 4 bytes (32 bits) are required to store a float in a computer. These 32 bits are divided into 3 parts: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.

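A sign bit of 0 means a positive number and 1 means a negative number.

The 8-bit exponent is stored with a bias of 127. In the common IEEE 754 format, ordinary (normal) numbers use exponents from -126 to 127; the two remaining bit patterns are reserved for zero and very small numbers, and for infinity and NaN.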

The computer representation of exponential notation is:

(b_0.b_1 b_2 b_3 \ldots)_2 \times 2^{E}, where b_0 = 1

Computer Representation of Floating Point Number
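To make the three fields concrete, the sketch below copies the bits of a float into a 32-bit integer and separates the sign, exponent, and mantissa. It assumes the platform uses the IEEE 754 32-bit layout described here, which is true of most compilers but not required by the C++ standard. FloatParts, decompose, and unbiased_exponent are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The three fields of a 32-bit float (assumed IEEE 754 layout).
struct FloatParts {
    std::uint32_t sign;      // 1 bit: 0 = positive, 1 = negative
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;  // 23 bits: the digits after the leading 1
};

// Splits a float into its sign, exponent, and mantissa fields.
FloatParts decompose(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bits

    FloatParts parts;
    parts.sign = bits >> 31;
    parts.exponent = (bits >> 23) & 0xFFu;
    parts.mantissa = bits & 0x7FFFFFu;
    return parts;
}

// Returns E from 1.b1b2b3... * 2^E. Only meaningful for normal numbers,
// so the reserved exponent patterns 0 and 255 are rejected.
int unbiased_exponent(const FloatParts& parts) {
    assert(parts.exponent != 0 && parts.exponent != 255);
    return static_cast<int>(parts.exponent) - 127;
}
```

For instance, -6.5 is 1.101 in binary times 2^2, so decompose(-6.5f) gives a sign of 1, a stored exponent of 129 (2 + 127), and a mantissa whose top three bits are 101.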

Declaring Floating Type and Double Type

Declaring a float variable and a double variable in a C++ program is similar.

float PI = 3.14f;
double radius = 5.33;
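As a small illustration of using such variables together, the sketch below computes the area of a circle from a float value of pi and a double radius. The function name circle_area is a hypothetical choice for this example. When a float and a double meet in one expression, the float is converted to double before the arithmetic is done.

```cpp
#include <cassert>

// Area of a circle. The float argument is promoted to double in the
// multiplication, so the result is computed in double precision.
// circle_area is a hypothetical helper used for illustration.
double circle_area(float pi, double radius) {
    assert(radius >= 0.0);  // a negative radius is not meaningful here
    return pi * radius * radius;
}
```

With the two values declared above, circle_area(3.14f, 5.33) evaluates to roughly 89.2.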

There is little difference between float and double in how they are represented in a computer: both use a sign, an exponent, and a mantissa. The difference is precision. A double has a longer mantissa, so it can hold more significant digits of a real number (roughly 15 to 17 decimal digits, compared with about 6 to 9 for a float).

3.244440 (float)

3.244440000000000 (double holds more digits)
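The difference in precision can be observed by printing the same value stored in each type with many digits. This is a minimal sketch; show is a hypothetical helper name, and the exact digits printed assume the usual IEEE 754 float and double formats.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Formats a floating point value with the requested number of
// significant digits. (Hypothetical helper for this example.)
template <typename T>
std::string show(T value, int digits) {
    assert(digits > 0);
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return out.str();
}

// Decimal digits each type can always store and return unchanged.
const int float_digits = std::numeric_limits<float>::digits10;
const int double_digits = std::numeric_limits<double>::digits10;
```

Comparing show(1.0f / 3.0f, 17) with show(1.0 / 3.0, 17) shows the float result drifting away from 0.333... after about seven digits, while the double result stays accurate for about sixteen.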
