forked from atulkumarpccs/Linux-Learning-By-Doing
Linux_file_System
1. in Linux, the following file systems (fs) are commonly used:
- ext2 fs
- ext3 fs
- ext4 fs
- xfs
- vfat
- reiserfs
2. file system managers - these are the subsystems in the operating
system kernel that actually manage active file systems and active
files. active file systems are file systems that are currently
in use and can be accessed - active files are files that are
currently in use and can be accessed. contrast this with
passive file systems and passive files - these are resident
on the disk, but cannot be accessed or manipulated until they
are activated.
3.
- if there are several file system layouts active in a
computing system, the operating system kernel must support
as many different file system managers
- in addition, there will be a logical file system manager
in the system that manages all the physical file system
managers and active file system layouts.
4. physical file system
- boot block - 2 sectors (1024 bytes) are reserved and not
used by the file system, but are accounted
as part of the file system
- file system descriptor / super block - 2 sectors
(1024 bytes) are allocated for this - the file system
descriptor is the meta data that manages file system
related information:
- no. of file system blocks
- no. of free file system blocks
- no. of free files (unused file descriptors)
- no. of used files (file descriptors in use)
- file system block size
- other meta data related information
- file descriptors table - contains several file descriptors
- each file descriptor manages meta data for a specific
file - these descriptors are of fixed size, say,
128 bytes or 256 bytes
- after these meta data sectors, a file system contains
several sectors used for storing file data, that is, real
user/application data.
- using dumpe2fs or debugfs, you can inspect the layout
of a modern Linux file system
- by running dumpe2fs on /dev/sda11 of a test system,
we mapped the layout and described the meta data
blocks and data blocks - the file system used in this
case is of ext2 type - the same can be done on any
ext2/ext3/ext4 file system of Linux
- one important caution - do not experiment with these tools on
active file systems or on file systems that contain sensitive
data
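Note: the sketch below is one hedged way to picture the layout described above. it decodes a few super block fields from an ext2-style image held in memory, assuming the usual ext2 layout (1024 reserved bytes, then a 1024-byte super block with little-endian fields). parse_superblock and make_demo_image are hypothetical helper names, and the demo image contains made-up values, not output from /dev/sda11.

```python
import struct

BOOT_AREA_SIZE = 1024      # reserved boot block: 2 sectors of 512 bytes
SUPERBLOCK_SIZE = 1024     # file system descriptor / super block
EXT2_MAGIC = 0xEF53        # magic number identifying ext2/ext3/ext4


def parse_superblock(image: bytes) -> dict:
    """Decode a few super block fields from the start of an ext2 image."""
    sb = image[BOOT_AREA_SIZE:BOOT_AREA_SIZE + SUPERBLOCK_SIZE]
    # First seven little-endian 32-bit fields of the ext2 super block.
    (inodes, blocks, _reserved, free_blocks, free_inodes,
     _first_data_block, log_block_size) = struct.unpack_from("<7I", sb, 0)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2 super block found at byte offset 1024")
    return {
        "blocks": blocks,
        "free_blocks": free_blocks,
        "inodes": inodes,
        "free_inodes": free_inodes,
        "used_inodes": inodes - free_inodes,
        "block_size": 1024 << log_block_size,
    }


def make_demo_image() -> bytes:
    """Build a synthetic 2 KiB image (hypothetical values, not real data)."""
    sb = bytearray(SUPERBLOCK_SIZE)
    struct.pack_into("<7I", sb, 0, 128, 1024, 51, 900, 117, 1, 0)
    struct.pack_into("<H", sb, 56, EXT2_MAGIC)
    return bytes(BOOT_AREA_SIZE) + bytes(sb)
```

usage: parse_superblock(make_demo_image()) returns a dictionary with the block and file (inode) counts and the block size. the same function could be given the first 2048 bytes of a real ext2 image file, read with ordinary file I/O.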
5. logical file system
- a file is maintained by the file system using
meta data - typically this meta data is of fixed
size and may be extended with additional, dynamic meta data
- such meta data entities used
to manage files are known as inodes (index nodes) -
these are the same as the file descriptors described above
- a typical inode is of size 128 or 256 bytes, with
additional meta data associated as needed
- inode information can be divided into 2 parts -
one part contains attributes of the file/inode
- another part contains addresses of the data blocks
that contain the real data of the file
- attributes may contain the following:
- size
- credentials (certain ids)
- access permissions
- no. of blocks
- link count
- file type
- each inode is assigned a unique no. > 0
- the system uses inodes and inode nos. to access and
manage files - interfaces use names - typically,
system call APIs and utilities use names.
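Note: most of the inode attributes listed above can be observed from user space. the sketch below is a minimal illustration using Python's os.lstat(), which wraps the lstat() system call API. inode_attributes and demo are hypothetical helper names, and the file is created in a temporary directory only for the demonstration.

```python
import os
import stat
import tempfile


def inode_attributes(path: str) -> dict:
    """Collect the attribute part of the inode behind a file name."""
    st = os.lstat(path)   # lstat() does not follow symbolic links
    return {
        "inode_no": st.st_ino,
        "size": st.st_size,
        "uid": st.st_uid,                         # credentials
        "gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
        "blocks_512": st.st_blocks,               # counted in 512-byte units
        "link_count": st.st_nlink,
        "is_dir": stat.S_ISDIR(st.st_mode),       # file type
        "is_regular": stat.S_ISREG(st.st_mode),
    }


def demo() -> dict:
    """Create a 5-byte regular file and return its inode attributes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"hello")
        return inode_attributes(path)
```

the addresses of the data blocks, the second part of the inode, are not exposed by stat(); tools such as debugfs are needed for that.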
- how does the system map a given file name to an inode no. ?
- special tables are managed by the file system
to map file names to inode nos.
- these tables are used to maintain a logical
hierarchy on behalf of the user / application
- in addition, there are several special files
created in the system to hold specific
sets of directory tables
- the best way to start is with the first file of the
file system that is exposed to the user/application/
system.
- can you spot the first file of the file system in
a Unix/Linux system ?
- inode no. 2 is hard coded in a unix/linux
system (for example, in ext2/ext3/ext4) to be treated
as the first file of the file system
- in addition, this file has the name /
(this is known as root, but the name is /)
- this file is not treated as a regular file -
it is treated as a directory file.
- meaning, every inode has a file type field and,
for this file, it is set to directory type.
- such a directory file is allocated an inode
and one or more data blocks.
- these data blocks maintain directory
entries, not user/application data.
- a directory entry contains the following info:
- name of the file (it can be of any type)
- inode no of the file
- file type
- name length
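Note: the fields of a directory entry can be seen from user space. the sketch below is a minimal illustration using Python's os.scandir(). directory_entries and demo are hypothetical helper names, and the directory contents are created only for the demonstration. the . and .. entries are not reported by os.scandir().

```python
import os
import tempfile


def directory_entries(dirpath: str) -> list:
    """Return (name, name length, inode no., file type) per entry."""
    rows = []
    with os.scandir(dirpath) as it:
        for entry in it:
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "directory"
            else:
                kind = "regular/other"
            name_length = len(os.fsencode(entry.name))   # length in bytes
            rows.append((entry.name, name_length, entry.inode(), kind))
    return sorted(rows)


def demo() -> list:
    """List a directory holding one sub-directory and one regular file."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "day10_11"))
        with open(os.path.join(tmp, "notes.txt"), "w") as f:
            f.write("x")
        return directory_entries(tmp)
```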
example pathname: /root/os_linux/day10_11/filesystem_class1.txt
Note: try to visualize how this file is stored in the file system
with the help of directories, inodes and other meta data.
- in addition, try to visualize how a file system manager
will access inodes, directory tables and data blocks,
as needed, from the file system layout.
- it is assumed that the current
file belongs to an active file system
- start with inode no. 2 (/) and load the corresponding
file system blocks containing its directory entries - the
file system blocks are loaded into a
system space cache - such a cache is
known as the file system cache.
- there will be one directory entry describing the
"root" directory file (that is, /root, not /) in this directory.
- using that, we can get the inode no. of the
"root" directory file
- the "root" directory file will have an associated
inode no., inode and a set of data blocks
- in these data blocks, there will be directory
entries - they are located using the same techniques
described for inode no. 2.
- one such entry will be maintained for "os_linux"
- corresponding to "os_linux", there will be an
inode and data blocks - the data blocks will contain
directory entries - one such directory entry
is "day10_11" - this continues and
eventually, a regular file is encountered -
this will also have a directory entry in the
data block of the corresponding directory file.
- using the inode no. and inode of the regular file,
data blocks of the regular file can be accessed
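Note: the pathname translation steps above can be sketched with a toy, in-memory model. this is not how a kernel stores inodes. INODES is a hypothetical table, and every inode no. other than 2 is made up for illustration.

```python
ROOT_INODE_NO = 2  # first file of the file system, named "/"

# Toy inode table: inode no. -> inode. Directory inodes carry their
# directory entries (name -> inode no.); regular inodes carry data.
# All inode nos. other than 2 are hypothetical.
INODES = {
    2:  {"type": "dir", "entries": {"root": 11}},
    11: {"type": "dir", "entries": {"os_linux": 12}},
    12: {"type": "dir", "entries": {"day10_11": 13}},
    13: {"type": "dir", "entries": {"filesystem_class1.txt": 14}},
    14: {"type": "regular", "data": b"class notes"},
}


def resolve(pathname: str) -> int:
    """Translate an absolute pathname into an inode no., one name at a time."""
    inode_no = ROOT_INODE_NO
    for name in pathname.strip("/").split("/"):
        if not name:
            continue                      # pathname was just "/"
        inode = INODES[inode_no]
        if inode["type"] != "dir":
            raise NotADirectoryError(pathname)
        if name not in inode["entries"]:
            raise FileNotFoundError(pathname)
        inode_no = inode["entries"][name]  # follow the directory entry
    return inode_no
```

usage: resolve("/root/os_linux/day10_11/filesystem_class1.txt") walks four directory tables and returns the inode no. of the regular file, whose data can then be reached through INODES.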
Note: every file system is identified using a device no. or
device id - this information is maintained in the
run-time meta data of the file system, when the
file system is active - it is closely tied to the
I/O subsystem, and is not part of the on-disk file system layout.
6. whenever a file is accessed, an active file instance of the file
is created in system space - in most cases, a
process is involved.
- to create an active file, a system call API is needed
- in a Linux system, this API is the open() system call API.
- fd = open("/root/os_linux/file1", O_RDWR);
- open() completes pathname translation and
gets access to a copy of the inode in memory -
such an inode copy is known as an in-memory
inode.
- fd will contain an index into the open
file descriptor table of the current process
ret = read(fd, buffer, n);
where fd is the index that was obtained in the
previous step, buffer is the pointer to the user-space
buffer and n is the no. of bytes that we are interested
in reading from system space
- in this read() system call API, we are interested in
reading n bytes from the file and copying them
into the user-space area pointed to by buffer.
- in this context, ret is -1 on error or a non-negative
number - when a +ve number is returned, it is the
number of bytes successfully copied into the user-space
buffer - there is also a special return value,
0 - when does this occur ? this is the
condition that signifies end of file data - that
is, there is no more data to be read from the file.
- there is also a possibility of the ret value
being +ve and less than n - when does this occur ?
- the available amount of data in the file is
less than the requested amount of data.
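Note: the three read() outcomes described above (n bytes, fewer than n bytes, 0 bytes) can be observed with a short loop. this is a minimal sketch using Python's os.open() and os.read(), which wrap the system call APIs. in Python, end of file data shows up as an empty bytes object and errors are raised as OSError instead of being returned as -1. read_sizes and demo are hypothetical helper names.

```python
import os
import tempfile


def read_sizes(path: str, n: int) -> list:
    """Call read() repeatedly and record how many bytes each call returns."""
    sizes = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, n)       # at most n bytes
            sizes.append(len(chunk))
            if len(chunk) == 0:          # 0 bytes: end of file data
                break
    finally:
        os.close(fd)
    return sizes


def demo() -> list:
    """Read a 10-byte regular file in requests of 4 bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"0123456789")       # 10 bytes of file data
        return read_sizes(path, 4)
```

for the 10-byte file, the recorded sizes are 4, 4, 2 and finally 0: two full reads, one short read and the end of file condition.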
- when we are reading data from a regular file, all
requests are expressed in terms of logical bytes.
it is the responsibility
of the logical file system and physical file system
managers to translate logical byte requests into
logical block requests and pass them to the low
level I/O subsystems.
- for a given active file, when a process is reading
or writing, the system must remember the no. of logical
bytes already read or written by the process with respect
to that active file - how is this done ? meaning, if a
process has read 2000 bytes from an active file,
how does the system remember that the next read
system call API must start from byte offset 2000 ?
- there is an offset field in the open file table
entry corresponding to every active file of
a process.
- this offset field represents the file byte no.
currently being accessed via this active file -
this file byte no. is initially set to 0, when
an active file is set up using the open() system
call API.
- it is updated following any read/write system
call API and is also used during subsequent
read/write system call APIs.
- internally, system calls such as read()
use this file byte no. and pass it to certain
internal system routines.
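Note: the file byte offset kept for an active file can be observed from user space. the sketch below is a minimal illustration that uses lseek(fd, 0, SEEK_CUR), through Python's os.lseek(), to report the offset without moving it. offsets_after_reads and demo are hypothetical helper names.

```python
import os
import tempfile


def offsets_after_reads(path: str, read_sizes: list) -> list:
    """Return the file byte offset after open() and after each read()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek(fd, 0, SEEK_CUR) reports the offset without moving it.
        offsets = [os.lseek(fd, 0, os.SEEK_CUR)]
        for n in read_sizes:
            os.read(fd, n)
            offsets.append(os.lseek(fd, 0, os.SEEK_CUR))
    finally:
        os.close(fd)
    return offsets


def demo() -> list:
    """Read 2000 and then 1000 bytes from a 5000-byte regular file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"a" * 5000)
        return offsets_after_reads(path, [2000, 1000])
```

the offsets reported are 0 after open(), 2000 after the first read() and 3000 after the second.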
- in addition, file system/file system managers
deal with blocks of data, not bytes of data -
however, system call APIs and other library APIs
access files in byte units - it is the responsibility
of file system managers to translate byte units to
block units for access - this is done using the
file byte no. of interest and the file system block size -
the file system block size is also stored inside the
in-memory inode, when a file is activated - actually,
this information is retrieved from the file system meta
data managed for the current active file system.
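Note: the byte-to-block translation described above is plain integer arithmetic. the sketch below shows it under the assumption of a fixed file system block size. byte_to_block and blocks_for_request are hypothetical helper names, not kernel routines.

```python
def byte_to_block(file_byte_no: int, block_size: int) -> tuple:
    """Map a file byte no. to (logical block no., offset inside the block)."""
    return divmod(file_byte_no, block_size)


def blocks_for_request(file_byte_no: int, n: int, block_size: int) -> list:
    """Logical block nos. touched by a request for n bytes."""
    if n <= 0:
        return []
    first = file_byte_no // block_size
    last = (file_byte_no + n - 1) // block_size   # block of the last byte
    return list(range(first, last + 1))
```

for example, with a block size of 1024 bytes, file byte no. 2000 lies in logical block 1 at offset 976, and a request for 3000 bytes starting there touches logical blocks 1 to 4.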
- how does the read system call API know that,
for the current file, end of file data is reached -
meaning, all the data of the file has been read ?
- in a typical unix/linux file system,
no special marker characters are added
by the operating system - meaning, there
are no special characters stored at the beginning
or at the end of a regular file.
- each on-disk and in-memory inode contains
a size field - this holds the total size
of the data stored in the file, in bytes.
- given the size field in the in-memory inode
and the file byte offset field of the open
file table entry, the system can easily detect
end of file scenarios and determine how many more bytes
may be read from the active file.
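Note: the end of file reasoning above (size field minus file byte offset) can be reproduced from user space. this is a minimal sketch using fstat() and lseek() through Python's os module. bytes_left and demo are hypothetical helper names, and the kernel performs the equivalent check internally rather than through these calls.

```python
import os
import tempfile


def bytes_left(fd: int) -> int:
    """Bytes between the current file byte offset and end of file data."""
    size = os.fstat(fd).st_size              # size field of the inode
    offset = os.lseek(fd, 0, os.SEEK_CUR)    # offset field of the open file
    return max(0, size - offset)


def demo() -> list:
    """Track the remaining bytes of a 10-byte file across two reads."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        fd = os.open(path, os.O_RDONLY)
        try:
            left = [bytes_left(fd)]
            os.read(fd, 4)
            left.append(bytes_left(fd))
            os.read(fd, 100)                 # only 6 bytes remain
            left.append(bytes_left(fd))
        finally:
            os.close(fd)
        return left
```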
- every regular file/inode contains credentials
and permissions - you can get this info using the
stat utility or the ls utility (ls -l) - you can also get
this info using the stat() system call API or the lstat()
system call API.
- credentials typically include user id and
group id
- permissions are a set of bits which have specific
meanings in terms of the access allowed for certain
credentials of the current process that
is using an active file instance of the given
file/inode.
- using ls or stat, extract specific credentials
and permissions of regular files and try to
match and map them.
- every file has 3 permission bits for each of 3 sets -
the first set is used to provide permissions to
processes that execute with euid (uid) equal to
the uid of the regular file - the second
set is used to provide permissions
to processes that execute with egid (gid) equal to
the gid of the regular file - meaning, the corresponding
credentials of a given process and the active file
match - accordingly, permission bits are
used and interpreted - all other processes,
whose euid as well as egid do not match
any of the credentials of the file, are provided
access permissions based on the last 3 bits.
- based on the above, try to interpret the credentials
of processes and files with respect to permissions.
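Note: the selection of one of the 3 permission sets can be sketched as below. this is a simplified illustration of the rule described above. it deliberately ignores supplementary groups, the superuser and access control lists, and permission_bits is a hypothetical helper name.

```python
import stat


def permission_bits(mode: int, file_uid: int, file_gid: int,
                    euid: int, egid: int) -> str:
    """Pick the rwx set that applies to a process and return e.g. 'rw-'."""
    if euid == file_uid:
        shift = 6          # first set: process euid matches the file's uid
    elif egid == file_gid:
        shift = 3          # second set: process egid matches the file's gid
    else:
        shift = 0          # last 3 bits: all other processes
    bits = (stat.S_IMODE(mode) >> shift) & 0o7
    return "".join(ch if bits & mask else "-"
                   for ch, mask in (("r", 4), ("w", 2), ("x", 1)))
```

for a file with permissions 0o640 (rw-r-----), uid 1000 and gid 100: a process with euid 1000 is given rw-, a process with another euid but egid 100 is given r--, and any other process is given ---.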
- if a process wishes to access a file, it must
be able to match its credentials with the credentials
of all directories in the hierarchy of the file's
pathname and satisfy appropriate access permissions
(search/execute permission on each directory).
if so, the process will be allowed to match its credentials
with the credentials of the actual file/regular
file and further active file setup will be completed -
if at any directory, credentials do not match or
appropriate permissions are not granted, active file
setup will be aborted.
- consider the permissions of directories
in the path name of a file - if all permissions
are satisfied, a process may be allowed to create
an active file instance of a given file with
the access permissions requested via
flags in the open() system call API
- for instance, if a process uses O_RDWR
in the access request flags of the open()
system call API, what does it mean ?
- the process wishes to use read() and
write() system calls on the specific
file, if permissions are granted.
- to find the other possible permission
request flags, refer to man 2 open
- how will the system validate the
requested permissions against the allowed
permissions ?
- credentials of the process and the active
file are matched - based on the matched
credentials, the corresponding permissions
are extracted and compared against the
requested permissions - if the requested
permissions are allowed, the requested
access permissions are
granted - the active file is created and the
open file descriptor table index is
returned by the open() system call API.
- what will be the consequence if the requested
permissions are granted by the open() system
call API ?
- the active file is created - the requested
permissions are granted and these
requested permissions are stored in
another field of the open file table
entry corresponding to this active file.
- these stored permissions are subsequently
checked by the read() and write() system
call APIs.
- what will be the consequence if the requested
permissions are not granted by the open()
system call API ?
- open() will return an error (-1).
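Note: the point that read() and write() check the access permissions stored at open() time can be observed directly. in this minimal sketch a file is opened with O_RDONLY and a write() is then attempted on the same index. on Linux this is expected to fail with EBADF, whatever the permission bits of the file are. write_on_readonly_fd and demo are hypothetical helper names.

```python
import errno
import os
import tempfile


def write_on_readonly_fd(path: str) -> int:
    """Open with O_RDONLY, attempt write(), and return the errno seen."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.write(fd, b"x")
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
    return 0                       # write() unexpectedly succeeded


def demo() -> tuple:
    """Return (errno value, errno name) for the rejected write()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"data")
        code = write_on_readonly_fd(path)
        return code, errno.errorcode.get(code, "OK")
```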
- active files are maintained via the open file descriptor
table
- when a fork() system call API is invoked, the child process
is given a duplicate open file descriptor table -
what is the consequence of this duplication on
active files currently managed by the parent process ?
- since child processes inherit the credentials of the
parent process, permissions are not an issue - focus
more on the active files and how they may be inherited
by child processes.
- active files of the parent process are also inherited
by the child process, but must be used judiciously -
meaning, if the parent writes and the child also attempts to
write, they see the same logical byte offset,
as they share the same open file table entry -
such peculiarities are very common in unix/linux
mechanisms - if you understand and use them
appropriately, there should not be any problem.
- if an active file should not be shared between the
parent and child, the parent or the child may use the
close(index) system call API to destroy its own
reference to the active file - this affects only the
process that calls close(), not both.
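Note: the sharing of one open file table entry between parent and child can be observed with a small experiment. in this minimal sketch the child reads from the inherited index and closes it, and the parent then inspects its own offset. parent_offset_after_child_read and demo are hypothetical helper names.

```python
import os
import tempfile


def parent_offset_after_child_read(path: str, n: int) -> int:
    """Fork, let the child read n bytes, return the parent's file offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: uses the inherited index, then destroys only its own
            # reference to the active file.
            try:
                os.read(fd, n)
                os.close(fd)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # Parent: its index is still valid, and the shared offset has moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)


def demo() -> int:
    """Child reads 4 bytes of a 10-byte file; parent reports the offset."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        return parent_offset_after_child_read(path, 4)
```

the parent never calls read(), yet its offset is 4 after the child has read 4 bytes, and the close() in the child does not invalidate the parent's index.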
- refer to the diagram used for pipe IPC discussion,
below .
- after an execl() or related system call API, the
open file descriptor table continues to be the same -
the developer may use the existing open files or
use a new set of open files as needed
- in most of the above cases, the developer can control
certain behaviour with special flags and other
techniques - explore the system call APIs of
files for more details
- close() system call
- dup() system call
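Note: as a starting point for exploring dup() and close(), the sketch below shows that an index created with dup() refers to the same active file (same offset), and that closing one index leaves the other usable. this is a minimal illustration through Python's os module, and demo is a hypothetical helper name.

```python
import os
import tempfile


def demo() -> tuple:
    """Return (offset seen via the dup()ed index, data read through it)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        fd = os.open(path, os.O_RDONLY)
        fd2 = os.dup(fd)                       # second index, same active file
        try:
            os.read(fd, 4)                     # advance the offset via fd
            offset_via_fd2 = os.lseek(fd2, 0, os.SEEK_CUR)
            os.close(fd)                       # destroy only the first index
            data = os.read(fd2, 3)             # fd2 continues from offset 4
        finally:
            os.close(fd2)
        return offset_via_fd2, data
```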
7. links and their uses in a unix/linux system :
- there are 2 types of links in unix/linux - hard links
and soft links (symbolic links)
- hard links are to be understood first :
- a hard link is nothing but a directory
entry of a file - meaning, a mapping between a file name
and the corresponding inode no. - in this context, a file
is really represented by its inode no. and corresponding
inode
- for a given inode no., unix/linux allows more than
one directory entry - each such directory entry
is known as a hard link of the file - this is
useful for convenience of naming.
- in short, each hard link of a file represents the
same file, but with a different name - such naming
conventions are popular and useful in unix/linux
systems.
- one such example is the . entry in every directory
- one more example is the .. entry in every directory
- if a file has several hard links associated with
it, the link count maintained in the inode of the file
will be > 1 - otherwise,
it will be just 1 - a file must have at least one hard link.
- you are free to create hard links for a file using the
ln utility or the link() system call API - as more hard links
are created, the link count is incremented - when each hard link
is deleted, the link count is decremented and eventually,
when it reaches 0 (and no process holds the file open), the
inode and data blocks are freed - this is when we say that
a file is deleted.
- look into the man pages of ln and link() for usage.
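Note: the link count behaviour described above can be tried out safely in a temporary directory. this is a minimal sketch using Python's os.link() and os.unlink(), which wrap the link() and unlink() system call APIs. the file names and demo are hypothetical.

```python
import os
import tempfile


def demo() -> dict:
    """Create a file, add a hard link, remove the first name, and report."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "file1")
        second = os.path.join(tmp, "file1_alias")
        with open(first, "wb") as f:
            f.write(b"data")
        count_before = os.stat(first).st_nlink
        os.link(first, second)                   # like: ln file1 file1_alias
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        count_after_link = os.stat(first).st_nlink
        os.unlink(first)                         # remove one directory entry
        count_after_unlink = os.stat(second).st_nlink
        with open(second, "rb") as f:
            survived = f.read()                  # inode and data still exist
        return {
            "count_before": count_before,
            "count_after_link": count_after_link,
            "same_inode": same_inode,
            "count_after_unlink": count_after_unlink,
            "survived": survived,
        }
```

both names map to the same inode no., the link count goes 1, 2, 1, and the data remains readable through the remaining name.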
- however, hard links are only allowed within a given
file system layout instance of a unix file system type -
meaning, hard links (directory entries) cannot span
file system layout instances - this is not allowed.
find out why ?
- explore the layout of a file system and
how it manages inode tables and directory
entries
- also try to activate 2 file system layout instances
using the mount command - try to create a hard link
for a file of one file system layout instance
in another file system layout instance - of course,
both must be of linux/unix file system types -
this exercise is not meant for file systems
of the windows operating system.
- soft links are explained below:
- in the case of a soft link, a new inode is created
and this inode does not contain normal data - the inode
is made to store the pathname of a target file.
- the best way to understand a symbolic link or soft
link is to create one using ln -s and list it using the
stat or ls -l commands.
- for example, let us see a few standard soft links
in our system :
- which gcc, then list the resulting pathname using ls -l
- which cc, then ls -l as above
- which bash, then ls -l as above
- which sh, then ls -l as above
- symbolic links have the same usage and utility
as hard links - but they are more flexible -
they can be created for regular files and
directory files - they can be created across
unix/linux file systems - due to these reasons,
they are more popular than hard links - in particular,
they are useful when it comes to directories,
where soft links can save a lot of space and
provide convenience.
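Note: a soft link to a directory can be created and inspected as below. this is a minimal sketch using Python's os.symlink(), os.readlink(), os.lstat() and os.stat(). the names target_dir, shortcut and demo are hypothetical.

```python
import os
import stat
import tempfile


def demo() -> dict:
    """Create a soft link to a directory and inspect the link's own inode."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target_dir")
        link = os.path.join(tmp, "shortcut")
        os.mkdir(target)
        os.symlink(target, link)            # like: ln -s target_dir shortcut
        link_inode = os.lstat(link)         # inode of the link itself
        return {
            "is_symlink": stat.S_ISLNK(link_inode.st_mode),
            "stores_pathname": os.readlink(link) == target,
            "own_inode": link_inode.st_ino != os.stat(target).st_ino,
            "follows_to_dir": stat.S_ISDIR(os.stat(link).st_mode),
        }
```

lstat() describes the link's own inode, which holds the target pathname, while stat() follows the link and describes the target directory.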