Data File Handling
Introduction:
Files: A file (or data file) is a stream or sequence of characters/data occupying a named place on the disk, where a sequence of related data is stored.
The programs we have seen so far are transient, i.e., they run for a short time and produce some output, but when they end, their data disappears. If you run the program again, it starts with a clean slate. This happens because the data entered is held in primary memory (RAM), which is volatile (temporary) in nature.
Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (for example, a hard drive); and if they shut down and restart, they pick up where they left off. Examples of persistent programs are operating systems, which run whenever a computer is switched on, and web servers, which run all the time, waiting for requests to come in on the network.
One of the simplest ways for programs to maintain their data is by reading and writing text files.
Programs that are used in day-to-day business operations rely extensively on files. Payroll programs keep employee data in files, inventory programs keep data about a company's products in files, accounting systems keep data about a company's financial operations in files, and so on.
Thus, a file is a document of data stored on a permanent storage device, which can be read, written or rewritten according to requirement. In other words, data is packaged up on the storage device as data structures called files. All files are assigned a name that is used for identification purposes by the operating system and the user.
File I/O (Input-Output) means the transfer of data from secondary memory (hard disk) to main memory, or vice versa. As shown in the figure, when we are working on a computer system, the file or document stored on the hard disk is brought into RAM (Random Access Memory) for processing, and vice versa.
Need of Files:
Data maintained inside files is termed persistent data, i.e., it is permanent in nature. Python allows us to read data from and save data to external text files permanently on secondary storage media.
We usually use files to store data permanently while working in word processing applications, spreadsheets, presentation applications, etc. All of these operations require data to be stored in files so that it may be used later.
Thus, files provide a means of communication between the program and the outside world.
In a nutshell, a file is a stream of bytes, comprising data of interest.
Data file operations
Before we start working with a file, we first need to open it. After performing the desired operation, the file needs to be closed so that the resources tied to it are freed.
Thus, Python file handling takes place in the following order:
- Opening a file.
- Performing operations (read, write etc.) or Processing Data
- Closing the file.
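The three steps above can be sketched as follows. This is a minimal illustration only: the file name demo.txt and the temporary folder are assumptions made so that the sketch does not touch any real file.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary folder (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")          # 1. open the file (here, for writing)
f.write("Hello, file!\n")    # 2. perform an operation
f.close()                    # 3. close the file

f = open(path, "r")          # open it again, this time for reading
content = f.read()           # process the data
f.close()

print(content, end="")
```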
Using the above-mentioned basic operations, we can process a file in several ways, such as:
- Creating a file
- Traversing a file for displaying data on screen
- Appending data in a file
- Inserting data in a file
- Deleting data from a file
- Creating a copy of a file
- Updating data in a file, etc.
File types
Python allows us to create and manage two types of files:
- Text File
- Binary File
Text file: A text file consists of a sequence of lines. A line is a sequence of characters (ASCII or Unicode), stored on permanent storage media. In Python 3, strings are Unicode by default, so the 'u' prefix that Python 2 needed for Unicode strings is no longer required. In a text file, each line is terminated by a special character, known as End of Line (EOL). By default, this EOL character is the newline character ('\n'). At the lowest level, a text file is still a collection of bytes. Text files are stored in human-readable form and can be created using any text editor.
Binary file: In a binary file, there is no delimiter for a line. Also, no character translation is carried out in a binary file; as a result, reading and writing a binary file is generally faster than reading and writing a text file.
It is perfectly possible to interpret a stream of bytes originally written as a string as a numeric value, but that will be an incorrect interpretation of the data, and we may not get the desired output after the file processing activity. So, in the case of a binary file, it is extremely important that we interpret the data with the correct data type while reading the file. Python provides special module(s) for encoding and decoding data for binary files.
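The difference between the two file types can be seen by reading the same file in both ways. The sketch below is illustrative only; the file name sample.txt, the temporary folder and the UTF-8 encoding are assumptions chosen for the example.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write one line of text; "\u00e9" is the accented letter e.
f = open(path, "w", encoding="utf-8")
f.write("caf\u00e9\n")
f.close()

# Text mode decodes the stored bytes and returns a str.
f = open(path, "r", encoding="utf-8")
as_text = f.read()
f.close()

# Binary mode returns the raw bytes exactly as they are stored on disk.
f = open(path, "rb")
as_bytes = f.read()
f.close()

print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```

The text version is a string of characters, while the binary version is a bytes object in which the accented letter occupies more than one byte.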
Opening and closing files
To handle data files in Python, we need a file variable, also called a file object or file handle. To work on a file, the first thing we do is open it. This is done by using the built-in function open(). This function creates a file object, which is then used for accessing the various methods available for file manipulation. (In Python 2, the file() function could be used in the same way; it does not exist in Python 3.)
open() – opening a file
When we want to read or write a file, we must first open the file. Opening the file communicates with the operating system, which knows where the data for each file is stored. When we open a file, we are asking the operating system to find the file by name and make sure the file exists.
In the example given below, we open the file test.txt, which should be stored in the same folder that we are in when we start Python.
The open() function takes the name of the file as the first argument. The second argument indicates the mode of accessing the file. The syntax for open() is:
Syntax:
<file variable> / <file object or handle> = open(file_name, access_mode)
Here, the first argument to open() is the name of the file to be opened and the second argument describes the mode, i.e., how the file will be used throughout the program. The mode is an optional parameter, as the default mode is the read mode (reading).
Modes for opening a file:
- Read (r): to read the file
- Write (w): to write to the file
- Append (a): to write at the end of the file
open() returns an object of file type, which we use to manipulate the file in our program. When we work with file(s), a buffer (an area in memory where data is temporarily stored before being written to the file) is automatically associated with the file when we open it. While writing content to a file, the content first goes to the buffer, and once the buffer is full, the data is written to the file.
Also, when the file is closed, any unsaved data is transferred to the file.
The flush() method is used to force the transfer of data from the buffer to the file.
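The effect of the buffer can be observed by checking the size of the file on disk after flush() and again after close(). This is a small sketch under assumptions: the file name buffered.txt is made up, and the size is checked with os.path.getsize().

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

f = open(path, "w")
f.write("first line\n")      # usually still held in the buffer at this point
f.flush()                    # force the buffered data out to the file
size_after_flush = os.path.getsize(path)

f.write("second line\n")
f.close()                    # close() flushes whatever is left in the buffer
size_after_close = os.path.getsize(path)

print(size_after_flush, size_after_close)
```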
If the opening is successful, the operating system returns us a file handle. The file handle is not the actual data contained in the file; instead, it is a "handle" that we can use to read the data. We are given a handle if the requested file exists and we have the proper permission to read the file.
Example 1
>>> f = open('test.txt')
>>> print(f)
In the above example, the file opens in read mode by default, and 'f', the file object (also called the file variable or file handle), is what open() returns.
Note: when you open a file in read mode, the given file must exist in the folder; otherwise Python will raise FileNotFoundError.
In that situation, open() fails with a traceback and we do not get a handle to access the contents of the file.
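A program can catch this error instead of stopping with a traceback. The following is a minimal sketch; the file name no_such_file.txt is made up and is assumed not to exist, since it sits in a freshly created temporary folder.

```python
import os
import tempfile

# A path assumed not to exist: a made-up name inside a fresh temporary folder.
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

try:
    f = open(missing)                # read mode by default
except FileNotFoundError as err:
    handled = True
    print("Could not open file:", err.filename)
else:
    handled = False
    f.close()
```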
File Modes
The second parameter of the open() function corresponds to a mode, which is either read ('r'), write ('w'), or append ('a'). The file mode defines how the file will be accessed. The modes suffixed with 'b' represent binary files.
mode | Description
r | Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode. If the specified file does not exist, it raises FileNotFoundError.
rb | Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.
r+ | Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.
rb+ | Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file.
w | Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, it creates a new file for writing.
wb | Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, it creates a new file for writing.
w+ | Opens a file for both reading and writing. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing.
wb+ | Opens a file for both reading and writing in binary format. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing.
a | Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
ab | Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
a+ | Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file is in the append mode. If the file does not exist, it creates a new file for reading and writing.
ab+ | Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file is in the append mode. If the file does not exist, it creates a new file for reading and writing.
Example 2:
>>> file = open("test.txt", "r+")
This will open the file 'test.txt' for both reading and writing. Here, the name of the file (by which it exists on secondary storage media) is given as a constant.
We can use a variable instead of a constant as the name of the file. If the file already exists, it has to be in the same folder where we are working now; otherwise we have to specify the complete path.
It is not mandatory to have a file name with an extension. In the above example, the .txt extension is used for convenience, as it makes it easy to identify the file as a text file. Similarly, for binary files we will use the .dat extension.
In Python 2, a file could also be created with the file() function, whose syntax and usage were the same as open(). Python 3 does not provide file(), so open() should be used.
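The behaviour of the main modes can be checked with a small experiment. The sketch below is only an illustration; the file name modes.txt and the temporary folder are assumptions, and only the 'w', 'a' and 'r+' modes are exercised.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

f = open(path, "w")      # 'w' creates the file because it does not exist yet
f.write("one\n")
f.close()

f = open(path, "w")      # 'w' again: the old contents are erased
f.write("two\n")
f.close()

f = open(path, "a")      # 'a' keeps the contents and writes at the end
f.write("three\n")
f.close()

f = open(path, "r+")     # 'r+' reads and writes, starting at the beginning
first_line = f.readline()
f.seek(0)                # go back to the start of the file
f.write("TWO")           # overwrites the first three characters in place
f.close()

f = open(path)           # default mode is 'r'
final = f.read()
f.close()

print(repr(first_line))
print(repr(final))
```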
close() – closing a file
The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done. Python may close a file automatically when the variable that refers to it is reassigned to another file, but it is good practice to call the close() method explicitly.
Syntax:
<file object>.close()
The close() method breaks the link between the file object and the file on the disk.
After closing a file (using close()), no tasks can be performed on that file through the file object.
Example 3:
>>> f = open('test.txt')
>>> print("The name of the file to be closed is:", f.name)
The name of the file to be closed is: test.txt
>>> f.close()
In the above example, we have used the 'name' property of the file object 'f' along with the print() statement, which displays the name of the currently used file, 'test.txt' in this case. Let us discuss these properties of the file object.
Various properties of File Object:
Once open() is successful and the file object gets created, we can retrieve various details related to that file using its associated properties.
1. name: Name of the opened file.
2. mode: Mode in which the file was opened.
3. closed: Boolean value that indicates whether the file is closed or not.
4. readable(): Method that returns a Boolean value indicating whether the file can be read or not.
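These properties can be inspected directly on a file object. The following sketch assumes a made-up file name props.txt in a temporary folder; note that name, mode and closed are attributes, while readable() and writable() are methods and need parentheses.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "props.txt")

f = open(path, "w")
info_open = (
    os.path.basename(f.name),  # name of the opened file (without the folder)
    f.mode,                    # mode in which it was opened
    f.closed,                  # False while the file is open
    f.readable(),              # False: the file was opened for writing only
    f.writable(),              # True
)
f.close()
info_closed = f.closed         # True after close()

print(info_open)
print(info_closed)
```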
Reading from a file
Python provides various methods for reading data from a file. We can read character data from a text file by using the following read methods:
a) read(): To read the entire data from the file; starts reading from the cursor up to the end of the file.
b) read(n): To read 'n' characters from the file, starting from the cursor; if the file holds fewer than 'n' characters, it will read until the end of the file.
c) readline(): To read only one line from the file; starts reading from the cursor up to, and including, the end-of-line character.
d) readlines(): To read all lines from the file into a list; starts reading from the cursor up to the end of the file and returns a list of lines.
Let us understand these methods with the help of suitable examples using a text file 'test.txt'.
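Example1: read() for reading the entire data from a file (test.txt)
f = open('test.txt')
data = f.read()
print(data)
f.close()
Example2: read(n) for reading the first 10 characters from a file (test.txt)
f = open('test.txt')
data = f.read(10)
print(data)
f.close()
Example3: to read data line by line using readline() method
f = open('test.txt')
line1 = f.readline()
line2 = f.readline()
print(line1, end='')
print(line2, end='')
f.close()
Example4: to read all lines into a list using readlines() method
f = open('test.txt')
lines = f.readlines()
for line in lines:
    print(line, end='')
f.close()
Example5: to read the first 3 characters and then the remaining data
f = open('test.txt')
print(f.read(3))
print("Remaining data")
print(f.readlines())
f.close()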
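The four read methods can also be compared on one small file, which shows how each call continues from where the cursor was left. This is a self-contained sketch; the file name lines.txt and its three lines of content are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical three-line file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
f = open(path, "w")
f.write("line one\nline two\nline three\n")
f.close()

f = open(path)
first_four = f.read(4)        # first 4 characters
rest_of_line = f.readline()   # from the cursor to the end of the first line
remaining = f.readlines()     # every remaining line, as a list
at_end = f.read()             # nothing is left, so an empty string
f.close()

print(repr(first_four))
print(repr(rest_of_line))
print(remaining)
print(repr(at_end))
```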
Writing to File
We can write character data into a file in Python by using the following two methods:
1. write(string)
2. writelines(sequence of lines)
1. write(): The write() method takes a string (as parameter) and writes it to the file. To store data with an end-of-line character, we have to add the '\n' character to the end of the string. Notice the addition of '\n' at the end of every string written in the example. As the argument to the function has to be a string, to store a numeric value we have to convert it to a string.
Syntax:
<file object>.write(string)
Example:
#Program to write data to the file
f = open("test2.txt", "w")
f.write("We are writing \n")
f.write("data to a\n")
f.write("text file\n")
print("Data written to the file successfully")
f.close()
Contents of test2.txt:
We are writing
data to a
text file
#Program to write numeric data to a file
f = open("newtest.txt", "w")
x = 100
f.write("Hello World \n")
f.write(str(x)) #Numeric value is converted into string
f.close()
2. writelines(): The writelines() method takes a sequence of strings (such as a list) and writes them to the file one after another.
Syntax:
<file object>.writelines(sequence)
So, whenever we have to write a sequence of strings, we should use writelines() instead of calling write() for each one.
#Program to illustrate writelines() method
#for writing list into the file
f = open("test4.txt", "w")
lines = ["Computer Science\n", "Physics\n", "Chemistry\n", "Maths"]
f.writelines(lines)
print("List of lines written to the file successfully")
f.close()
Output:
List of lines written to the file successfully
To see the output, open the test4.txt file in Notepad.
Note:
When a file is opened in read or write mode, the cursor starts at the beginning of the file; in append mode, it starts at the end.
Also note that the writelines() method does not add any EOL character to the end of each string. We have to do it ourselves. So, in the program, we have added the '\n' newline character at the end of the list items that should end a line.
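Both writing methods can be combined in one short program and the result checked by reading the file back. The sketch below is illustrative; the file name subjects.txt and the sample data are assumptions, and the newline is added to each item while writing.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "subjects.txt")
subjects = ["Computer Science", "Physics", "Chemistry"]

f = open(path, "w")
f.write("Marks: " + str(100) + "\n")       # a number must be converted with str()
f.writelines(s + "\n" for s in subjects)   # writelines() adds no newline by itself
f.close()

f = open(path)
lines = f.readlines()
f.close()

print(lines)
```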
With statement
Apart from calling open() and close() directly, the with statement can also be used to work with a file. We can use this statement to group file operation statements within a block. Using with ensures that all the resources allocated to the file object get deallocated automatically once we stop using the file. Even if an exception occurs, we are not required to close the file explicitly when using the with statement. Its syntax is:
with open(<filename>, <mode>) as <file object>:
    file manipulation statements
Example:
with open("test1.txt", "w") as f:
    f.write("Python\n")
    f.write("is an easy\n")
    f.write("language\n")
    f.write("to work with\n")
    print("Is file closed: ", f.closed) #False: still inside the with block
print("Is file closed: ", f.closed) #True: the file is closed when the block ends
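The claim that with closes the file even when an exception occurs can be checked with a short experiment. In this sketch the file name safe.txt is made up and the ValueError is raised on purpose, purely to simulate a failure inside the block.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "safe.txt")

with open(path, "w") as f:
    f.write("safe\n")
closed_after_block = f.closed          # the file is closed once the block ends

try:
    with open(path) as g:
        g.read()
        raise ValueError("simulated failure")   # deliberate error for the demo
except ValueError:
    pass
closed_after_error = g.closed          # closed even though the block failed

print(closed_after_block, closed_after_error)
```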
APPENDING TO FILE
Append means 'to add to'; so if we want to add more data to a file which already has some data in it, we will be appending data. In such a case, use the access mode 'a', which means:
'open for writing, and if the file exists, then append data to the end of the file'.
In Python, we can use the 'a' mode to open an output file in append mode. This means that:
- If the file already exists, it will not be erased. If the file does not exist, it will be created.
- When data is written to the file, it will be written at the end of the file's current contents.
Syntax:
<file_object> = open(<filename>, 'a')
Here, 'a' stands for append mode, which allows us to add data to the existing data instead of overwriting it.
For example:
>>> f = open("test1.txt", "a")
#Program to add data to existing data in the file
f = open("test.txt", 'a') #opening file in append mode
f.write("simple syntax of the language\n")
f.write("makes Python programs easy to read and write")
print("More data appended to the file")
f.close()
Output:
More data appended to the file
Contents of the text file "test.txt":
Hello user
you are working with
python
files
simple syntax of the language
makes Python programs easy to read and write
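Both properties of append mode, creating a missing file and preserving earlier contents, can be verified with a small sketch. The file name log.txt and the two entries are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical file in a temporary folder; it does not exist at first.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
existed_before = os.path.exists(path)

# Open, write and close twice, as two separate runs of a program would do.
for entry in ["first run\n", "second run\n"]:
    f = open(path, "a")      # creates the file the first time, appends afterwards
    f.write(entry)
    f.close()

f = open(path)
contents = f.read()
f.close()

print(existed_before)
print(contents, end="")
```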
BINARY FILE OPERATIONS
If we wish to write a structure such as a list or dictionary to a file and read it subsequently, we need to use the Python module pickle.
Pickling refers to the process of converting the structure to a byte stream before writing it to the file.
While reading the contents of the file, a reverse process called unpickling is used to convert the byte stream back to the original structure.
We know that the methods provided in Python for writing/reading a text file work with string parameters. So when we want to work on a binary file, conversion of data at the time of reading as well as writing is required.
The pickle module can be used to store most kinds of Python objects in a file, as it allows us to store the objects along with their structure. So for storing data in binary format, we will use the pickle module.
First we need to import the module.
It provides two methods for the purpose: dump() and load().
For creation of a binary file, we will use pickle.dump() to write the object to a file, which is opened in binary access mode.
Syntax of dump() method is:
pickle.dump(object, fileobject)
#Program to write list sequence in a binary file
def foperation():
    import pickle
    list1 = [10, 20, 30, 40, 100]
    f = open('list.data', 'wb') #'b' in access mode represents binary file
    pickle.dump(list1, f) #writing contents to binary file
    print("List added to the binary file")
    f.close()
foperation()
#Program to write dictionary to a binary file
import pickle
dict1 = {'Python': 90, 'Java': 95, 'C++': 85}
f = open('Bin_file.dat', 'wb')
pickle.dump(dict1, f)
f.close()
Once data is stored using dump(), it can then be read back. For reading data from a file, we have to use pickle.load() to read the object from the pickle file.
Syntax of load() is:
object = pickle.load(fileobject)
Note: load() must be called once for each dump() call that wrote to the file, since every load() reads back one object.
#Program to read Python dictionary contents back from the file
import pickle
f = open('Bin_file.dat', 'rb')
dict1 = pickle.load(f) #reading data from binary file
f.close()
print(dict1)
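The pairing of dump() and load() calls can be seen when two objects are written to the same file. The following is a minimal sketch; the file name two_objects.dat and the sample objects are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical binary file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "two_objects.dat")

marks = {"Python": 90, "Java": 95}
numbers = [10, 20, 30]

with open(path, "wb") as f:
    pickle.dump(marks, f)      # first object written
    pickle.dump(numbers, f)    # second object written after it

with open(path, "rb") as f:
    first = pickle.load(f)     # objects come back in the order they were dumped
    second = pickle.load(f)

print(first)
print(second)
```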
Most of the files that we see in our computer system are binary files.
Example:
- Document files: .pdf, .doc, .xls, etc.
- Image files: .png, .jpg, .gif, .bmp, etc.
- Video files: .mp4, .3gp, .mkv, .avi, etc.
- Audio files: .mp3, .wav, .mka, .aac, etc.
- Database files: .mdb, .accde, .frm, .sqlite, etc.
- Archive files: .zip, .rar, .iso, .7z, etc.
- Executable files: .exe, .dll, .class, etc.
All binary files follow a specific format. We can open some binary files in a normal text editor, but we cannot read the content present inside the file. This is because binary files are encoded in a format that is meant to be interpreted by a program rather than read by a person.
In binary files, there is no delimiter to end a line. Since the data is stored directly in binary form, there is no need to translate it. That is why these files are easy to work with and fast.
The four major operations performed using a binary file are:
1. Inserting/Appending a record in a binary file
2. Reading records from a binary file
3. Searching a record in a binary file
4. Updating a record in a binary file
Assume that we have a "student" file with the fields Roll_no, name and marks.
Inserting/Appending a record in a binary file
Inserting or adding (appending) a record into a binary file requires importing pickle module into your program followed by dump() method to write onto the file.
#Program to insert/append records in a binary file - "student"
import pickle
record = []
while True:
    roll_no = int(input("Enter student Roll no.: "))
    name = input("Enter student name: ")
    marks = int(input("Enter the marks obtained: "))
    data = [roll_no, name, marks]
    record.append(data)
    choice = input("Wish to enter more records (Y/N)?: ")
    if choice.upper() == 'N':
        break
f = open("student", "wb")
pickle.dump(record, f)
print("Record Added")
f.close()
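Because the 'wb' mode overwrites an existing file, records can instead be added to it by opening the file in 'ab' mode and dumping one record at a time. The sketch below is one possible approach and not the only one; the helper names append_record and read_all, the file name student_demo.dat and the sample records are all hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical binary file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "student_demo.dat")

def append_record(file_path, record):
    """Add one record to the end of the file without erasing earlier ones."""
    with open(file_path, "ab") as f:
        pickle.dump(record, f)

def read_all(file_path):
    """Load records one by one until the end of the file is reached."""
    records = []
    with open(file_path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:       # raised by load() when nothing is left to read
                break
    return records

# Made-up sample data: [roll_no, name, marks]
append_record(path, [1, "Asha", 88])
append_record(path, [2, "Ravi", 92])
all_records = read_all(path)

print(all_records)
```

With this layout each record is a separate pickled object, so reading needs one load() per record, unlike the program above, which stores the whole list as a single object.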
#Program to read records from the binary file - "student"
import pickle
f = open("student", "rb")
stud_rec = pickle.load(f) #To read the object from the opened file
print("Contents of student file are:")
#reading the fields from the file
for R in stud_rec:
    roll_no = R[0]
    name = R[1]
    marks = R[2]
    print(roll_no, name, marks)
f.close()
#Program to search a record from the binary file - "student"
import pickle
f = open("student", "rb")
stud_rec = pickle.load(f) #To read the object from the opened file
found = 0
rno = int(input("Enter the roll number to search: "))
for R in stud_rec:
    if R[0] == rno:
        print("Successful search", R[1], "Found!")
        found = 1
        break
if found == 0:
    print("Sorry, record not found")
f.close()
#Program to update the name of a student in the binary file - "student"
import pickle
f = open("student", "rb+")
stud_rec = pickle.load(f) #To read the object from the opened file
found = 0
rollno = int(input("Enter the roll number to search: "))
for R in stud_rec:
    rno = R[0]
    if rno == rollno:
        print("Current name is", R[1])
        R[1] = input("New Name: ")
        found = 1
        break
if found == 1:
    f.seek(0) #Taking the file pointer to the beginning of the file
    pickle.dump(stud_rec, f)
    f.truncate() #Discarding leftover bytes if the new data is shorter
    print("Name Updated ! ! !")
f.close()
RELATIVE AND ABSOLUTE PATHS
Files are organized into directories (also called "folders"). Every running program has a "current directory", which is the default directory for most operations.
For example, while opening a file for reading, Python looks for it in the current directory.
The os module provides functions for working with files and directories ("os" stands for "operating system"). os.getcwd() returns the name of the current directory:
>>> import os
>>> cwd = os.getcwd()
>>> print(cwd)
When only a file name is given, the file is created in and read from the current folder/directory by default.
To use these functions, we have to import the os module in our program.
cwd stands for "current working directory".
A string like cwd that identifies a file or directory is called a path.
A relative path starts from the current directory, whereas an absolute path starts from the topmost directory in the file system.
For example, the text file we created in the previous programs is stored in the current directory, so its absolute path is the current directory followed by the file name. The current directory can be displayed as follows:
>>> import os
>>> cwd = os.getcwd()
>>> print(cwd)
C:\user\KVD\AppData\Local\Programs\Python\Python36-32
Alternatively,
>>> f = open("test.txt")
Here, test.txt is the relative file path.
Note: The Python program and the external file must be in the same directory; otherwise we need to enter the entire file path.
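The relationship between a relative path and its absolute form can be checked with functions from os.path. This is a small sketch; the file name test.txt is used only as a string here, and the file does not need to exist for these calls.

```python
import os

cwd = os.getcwd()                       # current working directory
relative = "test.txt"                   # relative path: no folder given
absolute = os.path.abspath(relative)    # the same file, written from the top folder
joined = os.path.join(cwd, relative)    # building the path by hand

print(cwd)
print(absolute)
print(os.path.isabs(relative))          # False
print(os.path.isabs(absolute))          # True
```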
STANDARD FILE STREAMS
We use file object(s) to work with data files; similarly, input/output from standard I/O devices is also performed using standard I/O stream objects. Since we use high-level functions for performing input/output through the keyboard and monitor, such as eval(), input() and print(), we are not required to explicitly use the I/O stream objects.
The standard streams available in Python are:
- Standard input stream,
- Standard output stream, and
- Standard error stream.
These standard streams are nothing but file objects, which get automatically connected to your program's standard device(s) when we start Python. In order to work with the standard I/O streams, we need to import the sys module. The methods available for I/O operations on them are read(), for reading data from the keyboard, and write(), for writing data on the console, i.e., the monitor.
The three standard streams are described as follows:
1. sys.stdin: the stream from which a program reads standard input, typically the keyboard.
2. sys.stdout: data written to sys.stdout typically appears on your screen, but it can be linked to the standard input of another program with a pipe.
3. sys.stderr: error messages are written to sys.stderr.
#Program to implement standard streams
import sys
f = open(r"test.txt")
line1 = f.readline()
line2 = f.readline()
line3 = f.readline()
sys.stdout.write(line1)
sys.stdout.write(line2)
sys.stdout.write(line3)
f.close()
The lines containing the method sys.stdout.write() write the respective lines (from the file 'test.txt') to the device/file associated with sys.stdout, which is the monitor.
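Since the standard streams are ordinary file objects, a function that writes to them can also be pointed at any other file-like object. The sketch below illustrates this with io.StringIO standing in for the screen; the function name report and its messages are hypothetical.

```python
import io
import sys

def report(message, out=sys.stdout, err=sys.stderr):
    """Write a normal message to 'out' and a warning to 'err'."""
    out.write(message + "\n")                  # write() adds no newline itself
    print("warning: demo only", file=err)      # print() can target any stream

# Normal use: the text goes to the screen through the standard streams.
report("normal output")

# The same function with in-memory streams instead of the screen.
fake_out = io.StringIO()
fake_err = io.StringIO()
report("captured", out=fake_out, err=fake_err)
captured = (fake_out.getvalue(), fake_err.getvalue())

print(captured)
```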
#Program to copy the contents of a file to another file
def fileCopy(file1, file2):
    f1 = open(file1, 'r')
    f2 = open(file2, 'w')
    line = f1.readline()
    while line != '': #readline() returns an empty string at end of file
        f2.write(line) #write the line read from f1, newline included
        line = f1.readline()
    f1.close()
    f2.close()
def main():
    fileName1 = input('Enter the source file name: ')
    fileName2 = input('Enter the destination file name: ')
    fileCopy(fileName1, fileName2)
if __name__ == '__main__':
    main()
RANDOM ACCESS IN FILE USING TELL() AND SEEK()
- Absolute positioning
- Relative Positioning
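A short sketch of both kinds of positioning follows, assuming that "absolute" means an offset counted from the start of the file and "relative" means an offset counted from the current position or from the end. The file name digits.dat is made up. The file is opened in binary mode because, in text mode, Python only allows non-zero offsets that are counted from the beginning.

```python
import os
import tempfile

# Hypothetical ten-byte file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "digits.dat")
f = open(path, "wb")
f.write(b"0123456789")
f.close()

f = open(path, "rb")
start = f.tell()             # tell() gives the current position: 0 after opening

f.seek(4)                    # absolute: 4 bytes from the beginning of the file
two_bytes = f.read(2)        # reads b"45"
position = f.tell()          # the cursor has moved to 6

f.seek(2, os.SEEK_CUR)       # relative: 2 bytes forward from the current position
one_byte = f.read(1)         # reads b"8"

f.seek(-3, os.SEEK_END)      # relative to the end: 3 bytes before the end
tail = f.read()              # reads b"789"
f.close()

print(start, two_bytes, position, one_byte, tail)
```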
Files in the CSV format can be imported to and exported from programs that store data in tables, such as Microsoft Excel or OpenOffice Calc.
CSV stands for "comma separated values". Thus, we can say that a comma separated file is a delimited text file that uses a comma to separate values.
Each line in the file is known as a record. Each record consists of one or more fields, separated by commas (also known as delimiters). Tabular data is stored as text in a CSV file. The use of the comma as a field separator is the source of the name for this file format. CSV files are commonly used to move data into and out of a spreadsheet or a database.
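Python's standard csv module can write and read such files. The following is a minimal sketch; the file name students.csv and the sample rows are assumptions. Note that every field comes back as a string when the file is read.

```python
import csv
import os
import tempfile

# Hypothetical CSV file in a temporary folder (assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "students.csv")

# Made-up sample data: a header record followed by two student records.
rows = [["Roll_no", "Name", "Marks"], [1, "Asha", 88], [2, "Ravi", 92]]

# newline="" lets the csv module control the line endings itself.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)           # one record per line, fields joined by commas

with open(path, newline="") as f:
    read_back = list(csv.reader(f))  # each record becomes a list of strings

print(read_back)
```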
Question 1 : Write a statement in Python to perform the following operations:
- To open a text file "MYPET.TXT" in write mode
- To open a text file "MYPET.TXT" in read mode
- f1=open("MYPET.TXT",'w')
- f2=open("MYPET.TXT",'r')