chandans' ejournal

The EZone where I post information and resources that I find interesting and useful, for myself and for like-minded people. I am interested in computers, PC games, open source software, the latest tech applications, and information resources on the web.

Location: Halisahar, West Bengal, India

go http://chandansaha.tripod.com/personal/

Wednesday, November 01, 2006

ISSUES INVOLVED IN SETTING UP AN INSTITUTIONAL E-PRINT REPOSITORY WITH SPECIAL REFERENCE TO THE UNIVERSITY OF KALYANI

Dissertation on
ISSUES INVOLVED IN SETTING UP AN INSTITUTIONAL E-PRINT REPOSITORY WITH SPECIAL REFERENCE TO THE UNIVERSITY OF KALYANI

Submitted by :
Chandan Saha
Course : M.L.I.Sc
Session : 2005
Examination Roll: 99/MLI No.050009
Department of Library and Information Science
The University of Kalyani
Kalyani, Nadia

GUIDE: MR. ARUP ROYCHOWDHURY

ABSTRACT
Scholarly publications are becoming costlier by the day, and no library or information center can afford to collect them all. Other factors also restrict access to scholarly publications. E-print repositories are the need of the hour to overcome these barriers and make access easier and barrier-free. Many issues affect the setup of an institutional e-print repository, and the most important of them is its policies: a well-thought-out policy can make a repository successful and acceptable to all concerned. This dissertation discusses the policies and issues relating to institutional repositories with special reference to KU and makes recommendations for setting up an e-print repository. Other key factors that govern the setup of institutional repositories are covered, and advocacy methods are discussed in brief. Some problems relating to setup, such as legal issues, the role of the University or parent body, metadata issues, control over archives, selection criteria, administration, submissions, and the impact on staff, are explained in brief, and some possible solutions are indicated. An attempt has been made to formulate a model guideline for the University of Kalyani to set up an institutional e-print repository. Technical aspects, including LAN and software selection, are also dealt with.

PREFACE


Researchers publish their work so that every interested person can learn of their findings. But publishers have created a barrier to access by demanding high access tolls, and there are other types of barriers as well. Meanwhile, information and communication technology has developed to a great extent. The Internet has become a popular medium of communication, and it makes the exchange of views among peer groups very easy. A worldwide movement has begun to free research output from the grip of publishers and make access barrier-free, with the Internet as its communication medium.
Open source software came into play to free computing from the monopoly of commercial organizations. The result is a wide range of open source software, including open source operating systems. These can be downloaded free of cost and, being open source, can be customized to requirements. This gave advocates of the open access movement the means to set up e-print repositories where authors can archive an electronic copy of their research output for toll-free access. From this grew the concept of institutional e-print repositories.
Institutional e-print repositories are repositories set up by an institution to archive its research output. Various software packages that can be used to set up an institutional repository are freely downloadable from the Internet. But the most important part of a repository is its policies: only a foolproof, well-thought-out policy can sustain an institutional e-print repository over the long run. In this dissertation, these policies and the factors that can affect them are discussed on the basis of the existing literature on open access archives and repositories. Technical and software-related issues are also taken into consideration. An attempt has been made to prepare a brief guideline on policy issues for setting up an institutional repository for the University of Kalyani, and to install the DSpace software to show that it may serve the purpose.
This work was done in partial fulfillment of the M.L.I.Sc course. It is a vast job, and considering all the factors requires a long time. It is next to impossible to enumerate all aspects of the policies and all related issues in a time-limited project. Forty-five days are not even enough to consider all the technical issues, so I had to restrict myself to installation alone. I could not find time to customize the software for the requirements of a KU repository. Much scope remains for further development on these issues and for finding problems and their solutions.

ACKNOWLEDGEMENT

I took up my dissertation on 'Issues involved in setting up an Institutional E-print Repository with special reference to the University of Kalyani', which was a vast job for me, especially within the very short period of 45 days. My guide during its preparation was Mr. Arup Roychowdhury, Deputy Librarian of the Information and Documentation Center, ISI, B.T. Road, Calcutta. He not only guided me but also helped me complete this vast job within a very short period. He helped me the most, and I will remain ever thankful to him for his kind guidance and for the hard labor he put in on my behalf.
I want to express my cordial thanks to Dr. A.R.D. Prasad, Associate Professor, DRTC, ISI (Bangalore). Without his help, technical problems could have stopped me from installing the DSpace software. I thank him for kindly directing me to the package they built for ease of installation.
I want to thank my teachers of the MLISc course, Mr. Dibyendu Paul, Mr. Sabuj Dasgupta (former Head of the Dept.) and Mr. Bidhan Chandra Biswas (Head of the Dept.), for their kind cooperation during the course of study. I also want to thank Mr. Swapan Kumar Roy and the other staff of the department for their cooperation during the course of study.

I have to thank Mr. Mriganka Mondal, Assistant Librarian (Library in-charge), and Mr. Swapan Dasgupta of the University Internet Center for kindly permitting the use of personal information resources during the project. I also want to use this opportunity to thank Mr. Joydip Chandra, our senior friend, and my other classmates, who encouraged me at different times during the course of study.

Name of the student


(Chandan Saha)
Course: M.L.I.Sc
Session: 2004-05
Examination Roll. 99/MLI No.050009




LIST OF CONTENTS
List of Abbreviations Used
Abbreviation : Full form
Archive : e-print archive (in this work)
DCMES : Dublin Core Metadata Element Set
DSpace : DSpace digital repository software
EPrints : EPrints software
e.g. : for example
etc. : et cetera
HDD : Hard Disk Drive
H/W : Hardware
Internet : Inter-network
IRs : Institutional Repositories
KU : The University of Kalyani
OA : Open Access
OAI : Open Archives Initiative
OAI-PMH : Open Archives Initiative Protocol for Metadata Harvesting
OS : Operating System
RAM : Random Access Memory
ROM : Read Only Memory
S/W : Software

GLOSSARY
DSpace: Free software for producing an archive of eprints. Provided by http://sourceforge.net/projects/dspace/
eprint: An electronically published research paper (or other literary item).
EPrints: Free software for producing an archive of eprints. Provided by www.eprints.org/
eprint archive: An online archive of preprints and post prints. It may or may not be running on EPrints software.
OA: Open Access. Restriction-free access to documents for academic purposes (here, in electronic archives).
OAI: Open Archives Initiative. From their mission statement: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting. A way for an archive to share its metadata with harvesters, which offer searches across the data of many OAI-compliant archives.
OAI compliant: An archive which has correctly implemented the OAI protocol.
Post print: The digital text of an article that has been peer-reviewed and accepted for publication by a journal.
Preprint: The digital text of a paper that has not yet been peer-reviewed and accepted for publication by a journal.
INTRODUCTION

Writing is a method devised by human beings to preserve their intellect and carry it to the next generation. From the ancient past to the present era of artificial computing, writing has been a proven way to spread one's experiences and share knowledge with others. By writing up their works and experiences, authors also hope to gain fame. With the invention of new technology, spreading knowledge became easier, and the publication of journals for scholarly communication began. Its aim was to disseminate research information and intellectual works and to share knowledge among peer groups. But many barriers stand in the way of this noble purpose: geographical distance, communication gaps, lack of access, and lack of awareness of previous works are some of them. To overcome some of these, publishers started the commercial production and dissemination of scholarly journals all over the world. This created another barrier: access tolls. Nowadays, most scholarly publications are controlled by commercial organizations for profit. This restricts access to scholarly publications for most people, leading to duplication of work and wastage of time, money, and energy.
During the last decade of the 20th century, the Internet became a popular medium of communication and a new platform for publication. With the advent of information and communication technology and the tremendous development of computing, the Internet became very popular and affordable to most people, even in developing countries. Simultaneously, the world of publishing has undergone many changes. Purely paper-based publication is shifting towards electronic publication for extra advantages such as ease of retrieval and worldwide accessibility, and publishers have started e-copy services for patrons. Authors write up their research outputs and findings in articles, and government and other agencies fund the research. Authors write for fame; generally they get no money from publishers for their writing. It is the publishers who make money from these articles: they control access to the publications and force scholars to pay for it. Libraries and information centers are the agencies responsible for providing information services to scholars. To serve that purpose, they have to collect articles and pay vendors very large sums as access tolls. The increasing cost of journals forces libraries and information centers to cut their lists of preferred journals to stay within budget. This in turn creates barriers to access to the world of scholarly communication.
Open access is the only probable solution to this problem. Renowned persons worldwide have spoken in favor of it. Technology has made it possible to set up electronic document archives that follow internationally accepted open standards and use open source software. Results so far show that open access articles are cited more and read more, so the impact of open access articles on scholarly communication is increasing. Though the open access movement has started worldwide, it is still very limited in number and volume. Many researchers are still unaware of it, and so they continue to suffer from access problems. Libraries have an important role in making the activity popular. Institutions such as universities and research organizations can also play a very important role: they can set up institutional repositories, preserve their research output, and make it freely accessible worldwide, which in turn presents their activities to the world. Such repositories can act as active components of a worldwide scholarly storage area network and will in future remove access barriers to scholarly communication to a large extent.
The most important part of starting an institutional repository is its policies and issues. Well-planned policies can avoid many unwanted problems that would otherwise arise during implementation. Moreover, policies should be made with an eye on the future of the repository. Taking advantage of the latest technology and of open source software may be an important way to reduce costs and to start quickly.
Scope and coverage of the work:
In this dissertation, an attempt has been made to discuss the different policies and issues relating to setting up an institutional repository, including a brief look at its technological aspects. After that, a brief guideline has been prepared on policies for setting up an institutional repository at 'The University of Kalyani'.
Research problems:
To prepare "a guideline for setting up an institutional repository for 'The University of Kalyani'". This includes a discussion of the different policies and issues, including technical aspects, with the aim that the University will not need to make major changes to the guideline when it sets up the repository.
Literature Study:
Literature available on the Internet on e-print repositories and institutional repositories, their various issues and policies, and the experiences gained by practitioners was studied, including some forum newsletters.

Relevance of the Study:
Institutional repositories are the need of the hour. The authority of the University of Kalyani is thinking of setting one up, and will have to deliberate on the various policies and issues relating to it. The dissertation has been prepared with this in mind. If a model guideline can be made for the University, it may adopt the guideline without major changes. This will help the University avoid many unnecessary problems in future and save a lot of time, too. The dissertation also includes some general discussion of various issues and policies, which will help the University develop a basic idea of those issues. It may also be helpful for any other university or institution planning to set up an institutional e-print repository.
Objective:
To prepare a model guideline for setting up an institutional e-print repository for 'The University of Kalyani'.
Hypothesis:
A model guideline for setting up an institutional repository can be prepared by studying the existing literature on the experiences of others.
Area of Study:
Open access, open source, e-print archives, institutional repositories, the standards related to them (OAI-PMH), their legal issues, administrative structure and policies, technicalities, open archive software, hardware requirements, etc. are studied. Their scope of application in the University of Kalyani is then presented as a model guideline.
Tools and Techniques of data collection:
Data were collected by searching the Internet with the Google search engine. Links from different institutional repositories and articles available in some open access e-journals were also used for data collection and software download. Personal resources were also used.

Open access:
Open access is where electronic versions of scholarly materials are available free at the point of use to anyone who wants to read them. Open access basically calls for scholarly publications to be made freely available to libraries and end users. This can be done in two ways [Oppenheim, Charles. 2005. Open access and UK Science and Technology select committee report: free for all?. Journal of librarianship and information science. 37,1. p4]:
Ø Publishing in an open access journal, or
Ø Depositing in an electronic repository that can be searched from remote locations without any restriction on access, and whose resources can be used for academic purposes free of cost.
In 1989, the first open access (i.e. no subscription price), fully peer-reviewed electronic journal, 'Psycoloquy', was launched. Stevan Harnad was the editor of the journal. At present, open access journals on the web number in the thousands.
At present, Stevan Harnad is one of the leading advocates of open access e-print repositories. Repositories are a good alternative to open access e-journals. E. M. Corrado [Corrado, E M. 2005. The importance of open access, open source, and open standards for libraries. Issues in science and technology librarianship. Available at http://www.istl.org/05-spring/article2.html ] said that J. Willinsky has identified nine aspects of open access, as follows:
a) E-print archives (author’s self-archiving pre or post prints);
b) Unqualified (immediate and full open access versions of a journal);
c) Dual mode (both print subscription and open access version of a journal);
d) Delayed open access (open access is available after a certain period);
e) Author fee (authors pay a fee to support open access);
f) Partial open access (some articles of a journal are available via open access);
g) Per-capita (open access available to countries based on per-capita income);
h) Abstract (only abstracts and table of contents are available for open access);
i) Co-operational (institutional members support open access journals).
The advantages of open access are:
Ø A moral/ethical argument: it allows people all over the world to gain access at no cost;
Ø The argument that the article is seen by more people and therefore has a greater impact;
Ø It ensures long-term access to scholarly articles. Libraries and others can create local copies and repositories of such literature, and can ensure continued access through their repositories in the distant future;
Ø Its message is diffused more widely than through subscription-based journals.
It is observed that articles available online free of cost are cited many times more than those that are not.
Open access initiative
The open access movement is the worldwide effort to provide free online access to scientific and scholarly literature, especially peer-reviewed articles and their preprints (http://www.earlham.edu/~peters/fos/timeline.htm ). The concept was not very new, but the movement started in 1990 and gained strength at the dawn of the 21st century. Stevan Harnad, a renowned professor of Philosophy and the first editor of an open access journal, is its strongest advocate. The Los Alamos arXiv database, an archive of preprints and post prints in Physics, is the oldest such archive and has been running successfully for more than 10 years.
With the advent of the Internet and telecommunication technology, channels of communication among scholars worldwide have opened. So the demand for access to all scholarly publications can be met by establishing e-print archives. Papers archived by authors in their institutional archives, together with a cross-search facility among such archives, will give scholars access to the world of scholarly publications irrespective of their actual location. Institutional repositories will be interlinked to produce a global database of scholarly publications. Different archiving software packages are available to build such databases. In building a database, the most important thing is the incorporation of metadata. Metadata are fields that are indexed and can be searched.
Metadata:
Metadata is ‘data about data’. According to the Association of American Publishers, “Metadata is information that describes content. An every day example is a card catalog in a library, an entry in a book catalog, or the information in an online index” [MICI Metadata Clearinghouse (Interactiv) (homepage). Available at http://www.metadatainformation.org/ ]. W3C, however, describes it as “Metadata is machine understandable information for the web” [Metadata and Resource description. Available at http://www.w3c.org/metadata/ ]. Though this does not convey the full meaning, it serves our purpose here.
At present, different metadata schemes are in use. The most popular of them is ‘The Dublin Core Metadata Element Set (DCMES)’. Others include ‘The Visual Resource Association Core Categories (VRA Core)’ and ‘The Encoded Archival Description (EAD)’. DCMES is a simple set of descriptive data elements intended to be generally applicable to all types of resources. It is developed by the Dublin Core Metadata Initiative and includes some qualifiers to widen its scope of application. But it is still not sufficient on its own to describe all types of bibliographic items with all the necessary fields, so it cannot serve every purpose of e-print archives and repositories. Local variations cannot be recommended either, for the sake of international data search and interoperability.
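As a minimal sketch of what a DCMES description looks like in practice, the Python fragment below holds one e-print record as a dictionary keyed by the fifteen unqualified Dublin Core element names and reports any locally invented field. The sample values, the local field department_code, and the function name find_local_fields are illustrative assumptions; they are not taken from DSpace, EPrints, or any other repository software.

```python
# The fifteen elements of the unqualified Dublin Core Metadata Element Set.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def find_local_fields(record):
    """Return, sorted, the field names in `record` that are not DCMES elements."""
    return sorted(name for name in record if name not in DCMES_ELEMENTS)


# Hypothetical e-print description; every value is made up for illustration.
sample_record = {
    "title": "An example working paper",
    "creator": ["A. Author"],
    "date": "2005",
    "type": "Preprint",
    "language": "en",
    "department_code": "LIS",  # a local variation, not part of DCMES
}

print(find_local_fields(sample_record))  # ['department_code']
```

A harvester that knows only the fifteen standard elements would have no agreed way to interpret a field such as department_code, which is the interoperability concern raised above.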
OAI-PMH:
For the repository to provide access to the broader research community, users outside the institution must be able to find and retrieve information from the repository. Therefore, systems must be able to support interoperability in order to provide access via multiple search engines and other discovery tools. An institution does not necessarily need to implement searching and indexing functionality to satisfy this demand [Crow, Raym. SPARC institutional repository checklist & resource guide. Available at http://www.arl.org/sparc/IR/%20IR_Guide.html ]. It could simply maintain and expose metadata, allowing other services to harvest and search the content. This simplicity lowers the barrier to repository operation for many institutions, as it only requires a file system to hold the content and the ability to create and share metadata with external systems.
Interoperability requires persistent naming, standardized metadata formats, and a metadata harvesting protocol. The metadata harvesting protocol allows third-party services to gather the metadata from distributed repositories and conduct searches against the assembled metadata to identify and ultimately retrieve documents. These mechanisms can be applied to any type of compliant e-print repositories & digital library, creating a global network of digital research materials.
The Open Archives movement spawned the Open Archives Initiative (OAI), which was established to develop and promote interoperability solutions to facilitate the dissemination of content. The OAI is a collaborative effort to develop interoperability mechanisms that facilitate access to distributed digital content in the academic environment. The OAI provides the framework for facilitating the discovery of content in distributed repositories.
The OAI developed a set of interoperability standards called the OAI Protocol for Metadata Harvesting (OAI-PMH), which allows repositories to create metadata to describe content stored in the repository and make it available to others who wish to use it. The OAI-PMH supports the interoperability of digital repositories irrespective of type (institutional, discipline-specific, commercial, etc.) or content.
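In OAI-PMH version 2.0, a harvester talks to a repository through ordinary HTTP requests that carry one of six request types ("verbs"). The sketch below only builds such request URLs and sends nothing over the network. The base URL is a hypothetical placeholder, and build_request is an illustrative helper, not part of any repository package.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Return the URL of an OAI-PMH request for `verb` with optional arguments."""
    if verb not in OAI_VERBS:
        raise ValueError("not an OAI-PMH verb: " + verb)
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai/request"

# Ask for all records in unqualified Dublin Core (metadata prefix "oai_dc").
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Because every compliant repository answers the same requests, a harvester written once can be pointed at any number of repositories by changing only the base URL.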
Making repositories OAI compliant:
The OAI maintains a list of OAI-compliant repositories from which OAI Service Providers can harvest metadata. To participate in this process, a repository must register with the OAI, once the institution's repository infrastructure is in place. The OAI certifies that a repository is fully compliant by validating the repository's metadata using a program that issues periodic OAI queries. Once these checks are complete, the OAI confirms the registration with the repository and adds the repository to the list of data providers.
The OAI protocol requires that repositories offer the 15 metadata elements employed in unqualified Dublin Core metadata. However, the OAI protocol supports parallel metadata sets, allowing repositories to expose additional metadata specific to the repository's needs. Repositories that add domain-specific metadata sets to the Dublin Core should do so in consultation with other repositories to ensure a standardized presentation of these extended metadata sets.
What is metadata harvesting
Metadata harvesting means gathering metadata. Data providers collect metadata from their archived e-prints. Service providers in turn collect these metadata to build a large, combined, user-friendly search interface. But they can gather metadata only if the archives are OAI-compliant. This whole process is popularly known as metadata harvesting.
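The harvesting step can be sketched with Python's standard XML parser. The response text below is a hand-written, simplified stand-in for what a repository might return to a ListRecords request (a real response also carries elements such as responseDate and request); the identifier, title, and creator in it are made up. The function name harvest is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified, hand-written sample response; all values are hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2005-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
          <dc:creator>A. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
            "creators": [e.text for e in rec.findall(".//dc:creator", NS)],
        })
    return records


for record in harvest(SAMPLE_RESPONSE):
    print(record["identifier"], record["titles"], record["creators"])
```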
Data providers and service providers
The OAI framework posits a publishing model that separates data providers (including institutional repositories) from service providers (metadata harvesters, search/retrieval, and other value-added access tools). Institutional repositories may serve both roles. Data providers provide metadata for harvesting. Service providers gather all those metadata together and build services on them, such as a search facility for users. The efficiency of a service provider thus depends on the data providers as well, so it is the data providers’ responsibility to make their archives OAI-PMH compliant. Together, data providers and service providers play a crucial role in serving users.
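The division of labour between the two roles can be illustrated with a toy service provider that merges already-harvested records from several data providers and offers one keyword search over all of them. The repository names, titles, and the functions build_index and search are hypothetical; a real service provider would obtain its records through OAI-PMH requests rather than from a literal dictionary.

```python
def build_index(providers):
    """Merge the records of all data providers, tagging each with its provider."""
    merged = []
    for provider_name, records in providers.items():
        for record in records:
            merged.append({"provider": provider_name, "title": record["title"]})
    return merged


def search(index, keyword):
    """Return the merged records whose title contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [record for record in index if keyword in record["title"].lower()]


# Hypothetical harvested metadata from two data providers.
providers = {
    "Repository A": [{"title": "Open access and libraries"}],
    "Repository B": [
        {"title": "Metadata harvesting in practice"},
        {"title": "Open source repository software"},
    ],
}

index = build_index(providers)
hits = search(index, "open")
for hit in hits:
    print(hit["provider"], "-", hit["title"])
```

The user searches once and receives results from both repositories, which works only as long as each data provider exposes its metadata in the agreed form.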

What are e-prints
The term eprint/e-print means different things to different people. The EPrints glossary at http://www.eprint.org/glossary defines an e-print as “An electronically published research paper (or other literary item).” They are electronic copies of academic research papers. The Budapest Open Access Initiative FAQ says e-prints are the digital texts of peer-reviewed research articles, before and after refereeing. E-prints are divided into two categories, pre-prints and post prints, which Eprint.org defines as follows:
Pre-prints: The digital text of a paper that has not yet been peer-reviewed and accepted for publication by a journal.
Post prints: The digital text of an article that has been peer-reviewed and accepted for publication by a journal. This includes the author's own final, revised, accepted digital draft, the publisher's edited, marked-up version, possibly in PDF, and any subsequent revised, corrected updates of the peer-reviewed final draft. The watershed separating preprints from post prints is whether they are before or after peer-review and acceptance for publication.
E-prints include both preprints and post prints.
They may be:
· Journal articles,
· Conference papers,
· Research reports,
· Book chapters bearing research output, and
· Other forms of research outputs, etc.

E-print archive
An e-print archive is simply an online repository of research output, in either preprint or post print form. It is a collection of digital documents. Eprint.org defines an ‘e-print archive’ as ‘an online archive of preprints and post prints. Possibly, but not necessarily, running on Eprints software’. Generally such archives are available free of cost over the web. OAI-compliant e-print archives share the same metadata, making their contents interoperable with one another. Their metadata can then be harvested into global “virtual” archives that are seamlessly navigable.
E-print archives may be institutionally located and administered, in which case they are usually called institutional e-print archives. Or they may be subject-specific archives physically located at a suitable site and commonly mirrored elsewhere. The content is open to access by all. They may be pre-print-only archives or contain both pre-prints and post prints.

Objectives of e-print archives
The main purpose of e-print archives is to provide access to the scholarly publications archived in them. Institutional e-print repositories provide scope for archiving:
· Animation;
· Article;
· Book;
· Book chapter;
· Course materials;
· Conference papers, posters, proceedings etc.
· Dataset;
· Learning Object;
· Image; Image, 3-D;
· Map;
· News letters;
· Plan, Blueprints etc.
· Preprint;
· Presentations;
· Recording, acoustical; Recording, musical; Recording, oral;
· Research reports;
· Software;
· Technical Report;
· Thesis and Dissertations;
· Video;
· Working Paper;
· Others;
It helps to:
· Preserve materials,
· Enable self-archiving,
· Increase the impact of research outputs,
· Show the productivity of the organization,
· Increase access to archived materials,
· Disseminate information faster,
· Provide scope for enhanced citation analysis, etc.
Thus, e-print archives may become another face of the institution, as well as of its research workers, on the web of scholarly publications.

Self-archiving is not self-publishing
We know that e-print = pre-print + post-print. Post-prints are articles that have been published in a peer-reviewed journal, or accepted for publication in one. That means the write-up has gone through a screening and reviewing process, so its information content is authentic and accepted by a group of peers. Researchers may rely on it without hesitation.
Pre-prints have not yet been published or accepted for publication in any peer-reviewed journal, so the authenticity of their content may be open to criticism. As the content has not been discussed among reviewers and peers have not commented on it, researchers hesitate to use the data, because doing so may raise questions about the authenticity and reliability of their own work. Thus pre-prints lose citations. To handle this problem, institutional repositories may follow a reviewing policy such as:
Setting up a review board/committee to review the thought content of pre-print articles before archiving them. A pre-print thus becomes a reviewed article and gains authenticity and reliability. The standard of archived materials is also maintained.
Marking the article heading as ‘pre-print’ or ‘non-reviewed’.
But this raises some questions, such as:
Who will be the members of the review committee?
They may be subject specialists or teachers of that subject in the institution. Specialists and educationalists from other institutions may be involved, if possible. This may be a totally voluntary service for the sake of knowledge enhancement.
What will be their guidelines for reviewing?
These should be prepared by a group of experts and experienced reviewers, and should be subject to revision in the course of time.
What will be the review policy? For example:
Ø Deadline for review
Ø Number of specialists engaged in the review process
Ø Impartiality of the reviewer (this question arises in the case of institutional repositories: a reviewer from the same institution will know each submitter personally, which may lead him to be lenient or harsh towards some submitters. Involving specialists from different institutions, or review by more than one specialist, can reduce this chance.)

Model centralized subject based vs institutional e-print archives
From the perspective of archiving agencies and materials, e-print archives may be distinguished as
1. Subject based archives/E-print archives,
2. Institution based e-print repositories.
In subject-based archives, only documents dealing with a particular subject are archived. Their model of collection is centralized, and they try to collect every document published on that subject. A good example is the Los Alamos arXiv database, a pre-print and post print repository of articles covering various branches of physics.
Institutional repositories, by contrast, are set up to archive and provide access to the publications of an institution's members. This is a way of measuring the total productivity of that institution. Distributed, institution-based self-archiving benefits institutions in different ways [p.28…..]:
a) It maximizes the visibility and impact of their refereed research output.
b) It maximizes researchers’ access to the fully peer-reviewed research output.
c) By providing such access, the library can reduce its subscriptions to serials to some extent.
d) It highlights the research activity of that institution at a glance.


Administrative Issues

An institutional repository (IR) is a digital archive of an academic institution's intellectual output. Institutional repositories adhere to an open access model by centralizing and preserving the knowledge of an academic institution and making it accessible to anyone with Internet access. Setting up an IR is not a very tough job. The most important part is preparing a foolproof plan and then executing it. This involves the steps listed below.
First, we have to decide its purpose. Institutional repositories are not discipline-specific, and aim to archive the entire range of a university's intellectual output. So the specific requirements have to be jotted down. Some of them are:
Ø Open access nature: the archive should be accessible to anyone with an Internet connection,
Ø Should form part of a larger global system of repositories,
Ø Should be indexed in a standardized way,
Ø Should be searchable using one interface and be user friendly,
Ø Should provide a foundation for a new model of scholarly publication,
Ø Should help to build a worldwide distributed database of scholarly publications,
Ø Should help the development of knowledge,
Ø Should accept submissions from remote places.
Software:
Software should be selected on the basis of all the needs of the institution. At present, different software packages are available for this purpose, e.g. CDSware, DSpace, EPrints, Fedora etc. Some priced, customized software is also available and is distributed by vendors on specific conditions. But during the last decade, a lot of open source software has become available free of cost over the internet. Its supporting software (web server, programming language, compiler, database builder etc.) is also available over the internet free of cost, and most of it is open source too. Most web browsers also support it. [A brief comparison of different institutional repository software is available in ‘OSI Guide to Institutional Repository Software v2.0’.] These packages work very well, and many different repositories and open access e-print archives have been running on them for a long time. As they are open source, one can customize them as per requirements. So the choice of software and the corresponding operating system should not be a very troublesome job.
Hardware:
Once the software and operating system are decided, the corresponding hardware requirements are to be checked from their web sites. It is found that, in general, no specific requirements are mentioned on the sites of the different software packages. But the service speed and reliability of the archive depend on the quality of the hardware. So, it can be opined that a recent configuration with a very high speed processor, a large volume of Random Access Memory (more than 1 GB) and a high capacity SCSI hard disk drive with high rotation speed (7200 R.P.M.) may ease its functioning. More than one physical hard disk drive is recommended for security purposes; in that case, the crash of one physical disk will not damage all the data. A good speed modem is also essential.
Network Infrastructure:
The existing network infrastructure should be considered. An IR server requires a 24 x 7 connection with high internet speed. The server also needs to be available over the LAN and the Storage Area Network (SAN) [OSI Guide to Institutional Repository Software v2.0].
Customizing Software:
While selecting software, some more points [OSI Guide to Institutional Repository Software v2.0] should also be considered, like:
Ø Programming language in which it has been developed: mostly Perl, PHP and Java,
Ø Staff requirements: UNIX systems administrator, Java programmer, Perl programmer, Python programmer, networking knowledge etc.,
Ø How many software components are to be installed,
Ø Available as a package or requires separate downloads,
Ø System registration,
Ø Allowing registration with specific interests,
Ø Help with password recovery for users who have forgotten their account password,
Ø E-mail alerts to registered members about newly archived material related to their interests,
Ø Requires a distribution license or not,
Ø Content submission, administration etc.,
Ø Submission support: through e-mail notification to the administrator, or personalized system access for registered users to submit, etc.,
Ø Ease of content export and import, restrictions on upload file size, file format restrictions, support for uploading multiple compressed files together, etc.,
Ø Metadata support,
Ø Indexing facilities: limited to selected words or supporting full-text indexing,
Ø Supports modification of the user interface or not,
Ø Supports a multiple-language interface or not,
Ø Discussion forum support,
Ø Search facilities like Boolean logic, truncation and wildcards, in metadata and in full texts, etc.,
Ø Browsing by author, title, subject, issue date, collection type etc.,
Ø Scope for the administrator to customize search facilities for the whole database,
Ø Cross searching in multiple databases at a time,
Ø Searching databases in more than one language at a time,
Ø Availability of a help desk, etc.
Availability of all of these is not mandatory, but they will help to provide the best possible services to users.
Metadata:
Institutional repositories contain various types of bibliographic materials, like articles, dissertations, theses, research reports or even study materials. To make them searchable, institutional repositories must incorporate, index, and search items from diverse collections in diverse formats. They have to deal with writings of different levels (e.g. dissertations for Master's degrees, M.Phil., PhD etc.). They have to deal with standard vocabularies from many different fields of study, and attach metadata to all types of content. Unqualified Dublin Core (http://www.dublincore.org/ ) is the minimum metadata required for OAI interoperability [A Guide to Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/projects/institutional_repositories/setup_guide-e.html]; however, depending on the type of content in the repository, other metadata sets may also be included.
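To illustrate what an unqualified Dublin Core record looks like in practice, the following is a minimal sketch in Python. It is not taken from any particular repository package; the function name build_dc_record and the sample values are assumptions made only for illustration. The fifteen element names and the two namespace URIs are those of the Dublin Core element set and the OAI oai_dc container.

```python
import xml.etree.ElementTree as ET

# Namespaces used for unqualified Dublin Core inside an OAI "oai_dc" container.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the unqualified Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_dc_record(fields):
    """Return an oai_dc XML string.

    `fields` maps a Dublin Core element name to a list of values
    (every element is optional and repeatable). This helper is hypothetical.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; a real record would come from the submission form.
record = build_dc_record({
    "title": ["An example dissertation title"],
    "creator": ["Author, Example"],
    "type": ["Dissertation"],
    "subject": ["Institutional repositories", "Open access"],
})
print(record)
```

Because every element is optional and repeatable, the same helper can describe an article, a thesis or a set of course materials; richer metadata sets would need additional namespaces.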
OAI is based on the exchange of metadata. So, to make the archive effectively OAI compliant, incorporating the right metadata is essential. Most repository software is OAI compliant, so the Dublin Core metadata element set can be used in general. But it may not work well for some types of publications like research papers, theses etc. In India, the UGC, in its Higher Education Information Systems Project (HISP) [available at http://www.ugc.ac.in/new_initiatives/hisp.html ], has planned to develop a knowledge repository. It proposes to develop a mechanism for tracking academic information resources such as learning resources, curricula, question banks, national theses etc., published in various formats, through a systematic, internationally used metadata framework for tagging such resources. It has taken the initiative to create a Knowledge Repository for communities of teachers and researchers in the country. This could help to overcome the problem, as it will implement one metadata set all over India.
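The exchange of metadata under OAI happens through simple HTTP requests made by a harvester (service provider) to the repository. A minimal sketch of how such a request might be composed is given below. The base URL is a made-up placeholder and the helper name list_records_url is an assumption; the verb ListRecords and the arguments metadataPrefix, from and set are defined by the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Compose an OAI-PMH ListRecords request URL (hypothetical helper).

    base_url  -- the repository's OAI base URL (placeholder in this sketch)
    from_date -- optional lower date bound, e.g. "2006-01-01"
    set_spec  -- optional set name, if the repository organizes records in sets
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder address; a real repository publishes its own base URL.
url = list_records_url("https://repository.example.edu/oai", from_date="2006-01-01")
print(url)
```

A service provider would fetch this URL at regular intervals and store the returned Dublin Core records, which is what makes cross searching among many repositories possible.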
Document Types To Be Archived:
Institutional archives may decide to incorporate documents like:
Ø E-prints,
Ø E-books,
Ø Working papers,
Ø Journal articles,
Ø Pre-prints and post prints,
Ø Theses and dissertations (of various levels),
Ø Research and technical reports,
Ø Departmental and news centers newsletters and bulletins,
Ø Project reports,
Ø Seminar volumes,
Ø Conference reports,
Ø Important guidelines/instructions,
Ø Committee reports and memoranda,
Ø Papers in support of grant applications,
Ø Surveys,
Ø Technical documentations,
Ø Study materials/course materials for different levels,
Ø Photographs,
Ø Audio/video recordings,
Ø Statistical reports,
Ø Supplementary information for University publications, etc.
All these may be collection types of the University's institutional repository.
But all of them should have the following characteristics [Crow, Raym. 2002. Institutional Repository: checklist & resource Guide. (Washington, DC: SPARC). Available from http://www.arl.org/sparc/ ]:
Ø Scholarly—the material is research- or teaching-oriented;
Ø Produced, submitted, or sponsored by an institution’s faculty (and, optionally, students) or other authorized agent;
Ø Non-ephemeral—the work must be in a complete form, ready for dissemination;
Ø Licensable in perpetuity—the author must be able and willing to grant the institution the right to preserve and distribute the work via the repository.
Materials that satisfy the above requirements might include working papers; conference presentations; monographs; course materials; annotated series of images; audio and video clips; published (or pre-published) peer-reviewed research papers; and supporting material for published or unpublished papers (for example, datasets, models, and simulations) etc. While repository content may thus be defined broadly, some repositories may elect to focus initially on text-based materials, even though they anticipate broadening coverage over time. Additionally, in the interest of encouraging participation and acquiring material to populate pilot and demonstration projects, some repositories may choose to adopt more relaxed (and possibly temporary) guidelines for content in the repository’s initial stages.
Document Format:
Generally it is found that archive software supports Postscript, PDF, ASCII, HTML, etc.
Ø PostScript: PostScript (PS) is a page description language used primarily in the electronic and desktop publishing areas. There are a number of advantages to using PS as the display system. It helps in printing the document and allows for the "dumbing down" of printers. But the main advantage in using PostScript as a windowing system is that it allows one to write desktop publishing (DTP) and other graphically-intensive applications with a single set of graphics routines. The same code that is drawing to the window can be used to draw to the printer without any translation. DTP applications on traditional systems require the programmer to construct the GUI editor in the platform's own graphics system (for example, QuickDraw on the Macintosh, or GDI on Microsoft Windows) and then write additional code to translate the graphics into proper PostScript for printing. This often takes up the majority of the programming effort on such projects and is a major source of bugs [PostScript, from Wikipedia, the free encyclopedia, available at http://en.wikipedia.org/wiki/PostScript.HTML ].
Ø PDF : Portable Document Format (PDF) is a file format developed by Adobe Systems for representing documents in a manner that is independent of the original application software, hardware, and operating system used to create those documents. A PDF file can describe documents containing any combination of text, graphics, and images in a device independent and resolution independent format. These documents can be one page or thousands of pages, very simple or extremely complex with a rich use of fonts, graphics, colour, and images. PDF is an open standard, and anyone may write applications that can read or write PDFs royalty-free.
In addition to encapsulating text and graphics, PDF files are most appropriate for encoding the exact look of a document in a device-independent way. Free readers for many platforms are available for download from the Adobe website (http://www.adobe.com/products/acrobat/ ). PDF is primarily the combination of three technologies: a cut-down form of PostScript for generating the layout and graphics, a font-embedding/replacement system to allow fonts to travel with the documents, and a structured storage system to bundle these elements into a single file, with data compression where appropriate. [Portable Document Format, from Wikipedia, the free encyclopedia, available at http://en.wikipedia.org/wiki/PostScript.HTML]
Ø ASCII: the term stands for American Standard Code for Information Interchange. It is independent of platforms and application software. Any piece of writing can be stored in this format, but it has some limitations too: it cannot embed images, graphics or links, and cannot be made to look attractive.
Ø HTML: stands for HyperText Markup Language. This is an open standard used to create web pages. Its major advantage is hyperlinks to connected pages. It is the most widely used web formatting language and is easy to use. But the way a page is displayed varies depending on the browser.
All these formats can be used to accept writings. XML or MS Word formats can also be accepted. PDF should be preferred and encouraged; otherwise, the administrator can convert other formats to PDF. This should be clearly mentioned in the guide to submission section.
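The format policy described above (accept several formats, prefer PDF, convert the rest) can be written down as a small check run on each submitted file. The following Python sketch is only an illustration under stated assumptions: the list of file extensions and the three outcome labels are invented here, and a real repository would set them in its own submission guide.

```python
from pathlib import Path

# Assumed mapping of the formats named in the text to common file extensions.
ACCEPTED_EXTENSIONS = {".pdf", ".ps", ".txt", ".html", ".htm", ".xml", ".doc"}
PREFERRED_EXTENSION = ".pdf"


def check_submission_format(filename):
    """Classify a submitted file by extension (hypothetical policy check).

    Returns "accept" for PDF, "convert_to_pdf" for other accepted formats,
    and "reject" for anything outside the accepted list.
    """
    extension = Path(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        return "reject"
    if extension == PREFERRED_EXTENSION:
        return "accept"
    return "convert_to_pdf"


for name in ["thesis.PDF", "working-paper.doc", "slides.exe"]:
    print(name, "->", check_submission_format(name))
```

The check looks only at the file name; the actual conversion to PDF would still be carried out by the administrator with whatever tool the institution chooses.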
Subject Headings:
Incorporation of subject headings is the most crucial job. The recall and precision of subject searches depend largely on how well it is done. So, identifying a useful set of subject headings is one of the major challenges for repository implementers. In institutional repositories, a variety of subjects will appear, depending on the nature of the publications. Moreover, the repository is going to archive research papers, which generally deal with very micro-level thoughts, sometimes in interdisciplinary subjects. So, no existing list of subject headings can exhaustively provide exact subject headings for them. Broad subject headings may be appropriate for a single institutional repository. In this case, LCSH can be used. However, as access to institutional repositories becomes federated, this becomes more problematic [A Guide to Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/%20projects/institutionalrepositories/setupguide-e.html]. A user cannot profitably browse papers from a variety of repositories that use very different subject terminologies to represent a single concept. So, when considering worldwide accessibility and cross-searching facilities, one has to think about internationally acceptable and widely used subject headings for each subject. International conferences or discussions should therefore be held on the matter.
Another way to reach uniformity is to develop an open subject heading list that cumulates widely used subject terms in standard forms at the international level. This should be accompanied by an exhaustive vocabulary control device (e.g. a thesaurus) containing all local variations of each standard term. The software should be compliant with that subject heading list and should include the thesaurus within it. Whenever a query comes in a nonstandard form, the software should convert it into the standard term and recall all the entries indexed under that term. An option to select from the list of standard terms should also be included in the software interface, so that one can pick terms from that list.
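The idea of converting a nonstandard query term into its standard heading before searching can be sketched in a few lines of Python. This is only a toy illustration: the thesaurus entries, the sample records and the function names are all invented for the example, and a real thesaurus would be far larger and maintained by subject specialists.

```python
# Hypothetical thesaurus: local or variant term (lower case) -> standard heading.
THESAURUS = {
    "library science": "Library and information science",
    "lis": "Library and information science",
    "e-print archives": "Institutional repositories",
    "digital repositories": "Institutional repositories",
}


def standardize(term):
    """Return the standard heading for a query term, or None if it is unknown."""
    key = " ".join(term.lower().split())  # normalize case and spacing
    standard_by_key = {heading.lower(): heading for heading in THESAURUS.values()}
    if key in standard_by_key:  # the query is already a standard heading
        return standard_by_key[key]
    return THESAURUS.get(key)


def search_by_subject(query, records):
    """Recall the titles of all records indexed under the standardized heading."""
    heading = standardize(query)
    if heading is None:
        return []
    return [rec["title"] for rec in records if heading in rec["subjects"]]


# Invented sample records, indexed with standard headings only.
records = [
    {"title": "Paper A", "subjects": ["Institutional repositories"]},
    {"title": "Paper B", "subjects": ["Library and information science"]},
]

print(search_by_subject("E-print  Archives", records))
print(search_by_subject("LIS", records))
```

Keeping the variant terms in a separate table means entries are always indexed under one standard heading, while users remain free to search with the term they know.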
In India, the UGC may take the initiative to prepare guidelines for incorporating subject heading lists in institutional repositories. Only in this way can uniformity be brought among the different institutional repositories spread over India. Another way is to incorporate a set of descriptors that are helpful both for specialists in the field and for new learners of the subject [A Guide to Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/%20projects/institutionalrepositories/%20setupguide-e.html]. But there is a chance of high recall in that situation. Until an initiative is taken to form a freely available, popular list of subject headings in all subjects, reflecting even micro-level thoughts, this seems to be the only acceptable solution.
Some software, like EPrints, loads its subject heading hierarchy into the database, and it is a very laborious job to alter it after some entries have been uploaded. So, before uploading starts, a good number of documents should be gathered together. This helps to avoid repetition and later corrections to the meta tags of entries. The subject heading list must therefore be selected, and any modification made, at the starting phase. Moreover, before selection, some existing archives should be checked for information on how they selected their subject heading lists. This can help a lot in developing policies.
Preparing Committees
Keeping the above reasons in view, the committee structure should be chalked out. The committees may be structured in two layers. The highest and most powerful body, the Executive Committee, should consist of the VC of the University, selected members of the faculties and union representatives of teachers, students and researchers. High level administrative officials, finance officials, the legal officer and other selected members of the administrative body should be included in this committee. The Chief Librarian will represent the library staff there.
In the working group also, some faculty members and legal officials have to be included. But this committee will be headed by the Chief Librarian and assisted by the experienced library staff who will operate the program. (Details are included in the committee structure recommended for KU.) [Crow, Raym. Institutional Repository: checklist & resource Guide. (Washington, DC: SPARC). Available from http://www.arl.org/sparc/ ].
Securing Faculty Participation & Administration Support
Institutional repositories offer considerable benefits to the institutions that sponsor them and to the faculty, researchers, students, librarians, and others that participate in them. At the same time, institutional repositories might encounter resistance from administrators, faculty, and others who fail to understand the benefits that such repositories can deliver. Equally, understanding and systematically addressing the objections raised to repositories will prove crucial to faculty participation and to the ultimate success of each repository implementation.
The perceptions and attitudes of university administrators are critical to gaining the support necessary to validate a repository’s standing within an institution. Even where a repository is implemented and managed entirely as a library initiative, the nature and extent of the efforts required to gain faculty awareness and participation in the repository presuppose the buy-in of an institution’s administration and its willingness to reallocate resources and/or provide additional funding. The rationale for universities and colleges implementing institutional repositories rests on two interrelated propositions (SPARC): one that supports a broad, future oriented benefit, and another that offers direct and immediate benefits to each institution that implements a repository. Administrators secure funds for any type of initiative. They decide whether to take up new proposals for the advancement of the institution, and can permit or deny them. They have the power to implement rules. The highest body of the University, in its court meeting, can modify rules according to requirements. So, their role in building an institutional repository in a University is very important. They may become interested in setting up an institutional repository in the University if the library can convince them of its advantages. Some of the issues may be:
Ø Increasing costs of journals: Libraries subscribe to different journals published on specific subjects. They want to provide researchers with the latest developments in their areas of interest. But the major barrier is the high cost of journal subscriptions. Both printed and online journals are used today, which demands a very large amount of spending every year. Thus, the high cost of journals forces libraries to restrict themselves to a very short list of choices. Even big libraries cannot subscribe to every journal in any specific discipline. Articles published in journals that are not purchased remain unavailable to researchers. Thus they lose a very large number of publications which may be relevant to them. If every institution sets up an institutional repository and makes it OAI-PMH compliant, so that every archive can be searched, then access to research output will be easier and almost free. This will help to reduce libraries’ journal budgets. It can act as a potential future cost saving as the marketplace responds to institutional initiatives, in addition to the direct benefits, both tangible and intangible, that a successful repository delivers to its host institution. It can help institutions reach the corresponding industries, which may come to them for help in R&D and recruit their students and scholars for troubleshooting. After all, the administrators have to pay something if the institution is to retain its high stature and reputation for innovation.
Governments and institutions fund research. Publishers publish it in journals and sell them to make a profit. In most cases, authors get no monetary benefit from their articles. But the library has to pay to purchase the journal (including that article). Thus the same agency that funded the researcher has to pay again for the same output in published form. Here publishers profit from just publishing and distributing the work on demand. This duplicate expenditure can be avoided if an institutional repository is set up. The researcher can publish the output in any journal, but has to submit one copy to the IR, which will be freely available to all; this reduces expenditure in the long run.
Ø Ensuring barrier free access: Since the IR will be OAI compliant, every person with an internet connection can access it. It will be indexed by web crawlers and will be accessible to everyone. Cross searching among the different repositories and databases connected through worldwide registration will be possible. Thus, everybody interested can access the resources archived in IRs. Moreover, the repository, as a long-term investment in changing the structure of scholarly communication, helps to change the current scholarly communication model and to weaken publisher monopolies on faculty-generated content. That can ensure barrier free access for members of the institution, and in future it can reduce restrictions on access to scholarly publications.
Ø Institutional visibility and prestige: As producers of primary research, it is only to be expected that academic institutions would take an interest in capturing, disseminating, and preserving the intellectual output of their faculty, students, and staff. Currently, much of each institution’s intellectual output is diffused through thousands of scholarly journals. While faculty publication in these journals reflects positively on the host university, an institutional repository concentrates the intellectual product created by a university’s researchers, making a clearer demonstration of its scientific, educational, social, and economic value. This brings the institution to the world. Universities that have IRs will be listed in the registration lists of repository software, like those running in developed countries. This will make everyone aware of the existence, productivity and relevance of the research work of different organizations. An institutional repository and supporting metrics provide university administrators with demonstrable evidence of the institution’s quality. Institutional repositories help university and college administrators, including Development and Marketing officers, to reinforce an institution’s brand position and prestige.
Ø New platform of getting to the world: While institutional repositories centralize, preserve, and make accessible an institution’s intellectual capital, at the same time they will—ideally—form part of a global system of distributed, interoperable repositories that provides the foundation for a new disaggregated model of scholarly publishing.
Ø Ultimate future of publications: Experts say that institutional repositories have a bright future. They are considered a well known platform for archiving research output and making it accessible, barrier free, to all interested for a long time. Institutional repositories will work as the bricks from which a global knowledge base is built.
Administrators can support IRs by:
Ø Funding the setting up of institutional repositories. This includes the startup cost and continuing expenditure for internet services and hardware, staff training, new recruitment (if necessary), digitization of older theses/dissertations, advertising, organizing seminars etc.
Ø Preparing new rules: this may be essential to gather all the scholarly publications of the authors. For example, submission of one e-copy of each article could be made mandatory for scholars to receive their next allotted research fund. This will compel them to submit one e-copy of their writing to the University’s institutional repository.
Ø Implementing those rules for everyone concerned (students, research scholars, teachers, other staff etc.), and watching whether anybody is avoiding submitting a copy.
Ø Modifying rules according to needs: if it is found that enough scope is left for anyone concerned to bypass the rules, then those rules may need to be modified. For example, a student writes an article during his course of study and publishes it in some journal, but is not willing to submit a copy to the IR. These types of situations can be avoided with well thought out rules, strong implementation of them, and very good user education. There are different ways to make people aware of the benefits of IRs.
Ø Helping to overcome intellectual property issues: This is a headache for a lot of people in this electronic era. Publication over the internet has become easier. So, one can prepare a document on any topic by copying from others without mentioning them in the references. This is simply theft; it can be avoided by making people aware of what references are and why they are to be added. But the most important question lies elsewhere. An author writes and sends the work to a publisher for publication. For publication, the author has to sign some sort of declaration. Publishers generally have authors sign a statement that they have not submitted any other copy for publication elsewhere, and cannot publish it anywhere else without the prior written permission of the publisher. This may stop authors from submitting the same piece of writing to IRs. But the fact is that an IR is an archive of the article, not a publication.
Otherwise, to avoid unwanted situations, preprints may be accepted. After review and publication in a journal, authors can modify their preprints or add some more relevant information and update the database. They may also add some sort of addenda to show the changes or corrections. Publishers cannot prevent this approach, and it seems to make the output freely available. Some publishers allow post-print articles to be submitted under specific conditions. So, while selecting a publisher for an article, one can check its policies. The library can compile a list of publishers that allow post-print submission.
v Software administration policies: these involve various aspects and policies, but depend largely upon the nature of the software used. As IR software is open source and permits customization, it can be adjusted to local requirements. So local variations, and the related policy changes, are possible.
Ø Author’s registration policy: every author has to register by filling in a simple form sent from the software. This is an authentic process of communicating with the person concerned. Here comes another question: who can register? In the case of institutional repositories, registration may be restricted
· to students, teachers, research scholars and staff of that university;
· may extend to former students or faculty members;
· may include members of colleges under the university;
· or may permit any person within a geographical area (say within the State) having a minimum specified qualification; etc.
For the first three cases, the university registration number may be asked for, along with the residential address and contact number. For the fourth case, an e-copy of any proven identity document may be asked for (like an electoral ID card or passport, or a letter from the person's employer or from where he got his PhD, proving the authenticity of his skills and qualifications). But this may seem to be a barrier to submission, and it is recommended not to impose such barriers.
Ø Submission policy: distributed submission and centralized uploading. This means registered authors can submit their copy, and the administrator will check the metadata incorporated, standardize the subject terms and change the file format if required (authors should permit this for the sake of technical ease and policy). The writing may then be uploaded to the database by the administrator. At the same time, the author should be informed through e-mail.
Ø Editing policy: Post prints of any type do not require any editing. Dissertations, theses etc. are presented to and verified by a well organized body of academicians, so they do not need to be edited either. But if authors want to modify some portions as recommended during the presentation and evaluation process, it could be done as errata or an additional chapter attached separately to the document. As the IR is proposed to incorporate preprints, editing becomes an issue for discussion. A preprint that has been accepted for publication, and is just a matter of time from coming out, again need not be edited, because it has gone through a screening process by an authorized body (in case it was submitted to a peer reviewed journal). But when it has just been submitted and has not gone through the review process, it requires the comments of an editorial board.
· Editorial board: The University should form an editorial board consisting of senior teachers, research guides, academicians with editing experience and subject specialists of the University. It may include specialists from outside the institution. But the work will be totally voluntary, and the interest of those concerned is highly expected.
· As various subjects are taught in a University, and research is a part of its activities, it will face many different types of writing in different subjects. It is simply not possible to form a body of editors large enough to include more than one subject specialist from every subject field. So, it is recommended to form a core editing committee of experienced editors and senior professors/deans of the faculties. They will send requests to others to help as guest editors when required. A list of potential editors/subject specialists has to be prepared for that purpose.
· Editing should be done by the core committee and at least two different specialists, one from the University and another from outside the University. This will make the process more acceptable, and avoid any clash of views between an in-house specialist and the author (as they might be known to each other and their views may not match; two is better than one). The committee may ask for some changes before uploading.
· This is a lengthy and tedious process. But it is important for maintaining the standard of materials in the database. Another question arises at the same time: whether the material will be considered a University publication or not, because it is edited by a body formed by the University authority, which can recommend changes in the writing.
· This whole process of editing can be avoided by simply refusing to accept preprints that have not been accepted for publication by the date of submission. But this will hinder the purpose of archiving. Moreover, publication is a very lengthy and time consuming process, and delay in publication may lead to duplication of work. Another way of bypassing this tedious process is to label such submissions as preprints. But then researchers may not rely upon their data, or may be misguided if the work is published somewhere.
Therefore, before selecting the types of materials to be accepted, the highest committee has to decide editorial policies and prepare a clear management policy for archiving. This should also include advice/conditions for authors about updating their submissions after publication. The committee also has to decide a weeding out policy. If a post print or updated version is archived, the preprint can be removed. It can also decide to remove publications that are no longer valid, e.g. older rules and regulations when newer ones come and are implemented.
Ø Metadata incorporation: who will enter the metadata is an important question.
o Authors can do it by simply filling in a standard form available in the database, and administrators will check the entries. This works once authors know how to do it. In the beginning, library personnel can do it for authors and show them how it is done.
o If enough library staff are available, the whole process can be done by them. They just collect the required information from authors and fill in the meta tags. This gives more authentic records and can remove the need for verification.
o Library staff can also act as a ‘proxy’ for authors in situations where meta tags are to be filled in by authors but they are unable to do so.
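The check that an administrator performs on an author-filled form can be sketched as follows. This is a minimal sketch under stated assumptions: the list of mandatory fields and the function names `check_submission` and `route` are hypothetical, and the sample record is made up.

```python
REQUIRED_FIELDS = ("title", "creator", "date", "type")  # assumed mandatory fields


def check_submission(form):
    """Return a list of problems an administrator should resolve."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = form.get(field, "")
        if not str(value).strip():
            problems.append(f"missing {field}")
    return problems


def route(form):
    """Decide who handles the record next."""
    problems = check_submission(form)
    if problems:
        return "return to author or library proxy", problems
    return "ready for administrator approval", []


# Made-up submission with an empty date and no type.
submission = {"title": "Example eprint", "creator": "A. Author", "date": ""}
status, problems = route(submission)
print(status, problems)
```

The same check applies whether the form was filled in by the author or by library staff acting as proxy; only the person to whom an incomplete record is returned differs.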

v Standardized indexing: this process is required to make the database effectively searchable. A good index can improve the recall-precision ratio. Indexing features come bundled with the software and can be customized to local requirements. As a library and information science student, I would not recommend free-text indexing, as it produces high recall at the cost of precision and removes the effectiveness of good subject headings.
v Searching: Searching can be broadly divided into two kinds: metadata searching and full-text searching. Metadata searching will be done by service providers to facilitate cross searching among all connected databases. Full-text searching will be done by users. Both should support Boolean, truncation, wildcard and any-term searching. Searching of meta tags that are not indexed should also be facilitated.
v Maintenance, regular backup creation, updating of backups, mirroring of sites, indexing by widely used search engines and directories such as Google and Yahoo, and listing in scholarly search services such as Google Scholar should be ensured.
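The Boolean, truncation and wildcard searching mentioned above can be illustrated with a small Python sketch. It is not how EPrints or DSpace implement searching; the function names and the three sample documents are invented for illustration, and the pattern syntax (`*` for truncation, `?` for a single-character wildcard) is an assumption.

```python
import re
from fnmatch import fnmatchcase


def tokens(text):
    """Lower-case word tokens of a metadata field or full text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_term(text, pattern):
    """True if any token matches the pattern.

    '*' gives truncation (librar* matches library, libraries);
    '?' is a single-character wildcard (wom?n matches woman, women).
    """
    pattern = pattern.lower()
    return any(fnmatchcase(tok, pattern) for tok in tokens(text))


def search(docs, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    hits = []
    for doc_id, text in docs.items():
        if not all(has_term(text, p) for p in all_of):
            continue
        if any_of and not any(has_term(text, p) for p in any_of):
            continue
        if any(has_term(text, p) for p in none_of):
            continue
        hits.append(doc_id)
    return hits


# Invented sample documents.
docs = {
    "d1": "Institutional repositories in university libraries",
    "d2": "Library budgets and the serials crisis",
    "d3": "Open access archives for women in science",
}
print(search(docs, all_of=["librar*"]))
print(search(docs, all_of=["librar*"], none_of=["serial*"]))
print(search(docs, any_of=["wom?n", "budget*"]))
```

A production system would search a prebuilt index rather than scan every document, but the matching rules are the same.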
Advocacy Methods
Advocacy methods may be divided into two areas.
A. Within Institution
B. Outside the Institution
Advocacy is not a one-time job. Libraries and institutions should do it continuously every year as part of their user education activities, so that every newcomer to the institution becomes aware of the repository. In universities, the freshers' welcome ceremony may serve as a platform for informing new students about the repository. Every departmental head may inform students about it in the first address to a new batch. The library may hand them a leaflet when they come for their user's card, and may include discussions about IRs in its user education.
The library may place a notice board above its computerized catalog or catalog cabinet, written in attractive colors, describing how to use and deposit in the IR.
In the beginning, the library should take the initiative to help and guide writers in posting their writings to the e-print archive.




Some relevant issues
Institutional E-Print Archive Ensures Enhanced Accessibility
Authors write to share their experience and knowledge of a particular issue or branch of knowledge, aiming to become known among their peer group. They want to be regarded as resource persons in that area of study, and they want their peers' opinions on their work. That is what leads them to the hall of fame. Today, an author's success is measured not only by the volume of work produced or the number of publications in peer-reviewed journals, but also by the number of citations received. An author writing a research paper draws on a number of documents and finally cites them with their bibliographic details. Citation implies a relationship between a part or the whole of the cited document and a part or the whole of the citing document. Thus a citation is the acknowledgement that one document receives from another. [Bibliometric studies : on Indian library & information science literature / Gayatri Mahapatra. – New Delhi : Crest, 2000 p7] Therefore, one of the aims of authors is to get more citations.
One of the primary conditions for getting more citations is to reach almost every person interested in the topic over a long period. If authors rely on the traditional print version only, limited circulation prevents them from reaching a major portion of their peer group. Subscription-based online publications also suffer from limited access. The rising cost of serials, databases and online journals has created the ‘serials crisis’. [Callan (Paula). The development and implementation of a university-wide self-archiving policy at Queensland University of Technology (QUT): Insights from the frontline. In Institutional Repositories: The Next Stage. Workshop presented by SPARC & SPARC EUROPE, November 18–19, 2004, Washington, D.C.]. Even large institutions cannot afford all the core journals of a specific subject, so they have to cut their list of journals to fit their budget. This creates an access barrier for scholars. Besides cost, limited circulation, restricted access, conditions of use, retrieval problems and time restrictions are other barriers to disseminating writings among peers. Moreover, because of limited archiving policies, such publications cannot satisfy the needs of the future user community. Open access journals are widely used, but they also have limited archiving policies. It is found that, after a certain period, they provide only the contents pages of back volumes instead of their full text. Thus the work loses some of its potential citations.
Readers, in turn, cannot access or even become aware of many publications of interest to them. The resulting risk of repeated work, with its loss of time, money and energy and its waste of manpower and intellect, slows down the development of society and knowledge. Thus readers also suffer a lot. Libraries cannot gather all publications, so their services too are confined to a very narrow range.
Open access journals have brought some fresh air into this restricted environment. But different Open Access Journals (OAJs) have their own policies, conditions and limitations, e.g. a limited archiving period. So it is next to impossible to publish all the works of an institution in any one OAJ. Moreover, they do not publish dissertations, theses, etc. Some official decisions, important work guides (e.g. guidelines for Ph.D.), tutorials, etc. may also need to be archived and made accessible to all.
Open Access Institutional e-print archives/ Repositories are the only solution in this situation. Open Access institutional repositories can:
Ø Collect all publications of the members of the institution,
Ø Organize and process them in a standard way,
Ø Show the total output of the institution,
Ø Make them accessible to all through the internet,
Ø Provide 24x7, unconditional, barrier-free access,
Ø Be crawled by search engine spiders and included in their web indexes,
Ø Be registered, allowing users to search them from any part of the world,
Ø Archive them for a longer period.
Thus, IRs ensure enhanced accessibility of publications. They help readers to find all relevant material together, and they increase authors' scope for getting more citations. Steve Lawrence [Lawrence(Steve). Online or invisible. Available at : http://www.neci.nec.com/~lawrence/papers/online-nature01/ ] investigated the impact of free online availability by analyzing citation rates. He observed that the more highly cited articles in computer science are more likely to be available online. He said that online articles may be more highly cited because they are easier to access and thus more visible and more likely to be read. He opined that free online availability facilitates access in multiple ways, including online archives, direct connections between scientists or research groups, hassle-free links from e-mails, discussion groups and other services, indexing by web search engines, and the creation of third-party search services. Free online availability of scientific literature offers substantial benefits to science and society. In IRs, all the work will be freely available to all searchers. This will also strengthen the ability of libraries and information centers to find scholarly publications from around the world and to provide tailored, personalized services to each individual user.
In future, when all institutions have set up their own repositories and enabled cross searching among them, the entire intellectual output of this planet will form a large and exhaustive bank of knowledge and make access totally barrier free.
Intellectual Property Rights
This is a major concern about IRs. Generally, the following issues arise:
· Publisher’s permission to submit a copy somewhere else;
· Copying, duplication or theft of thought content.
Generally, most publishers do not allow a copy to be submitted elsewhere. This creates a problem for authors. Sometimes publishers demand a fee from authors to make their work freely available.
Duplicating thought content without citing the work is simply theft of the original work. Since institutional e-print repositories will be available to everyone worldwide, they will increase the scope for such acts, known as plagiarism, which violate intellectual property rights. Only the awareness and honesty of users can solve the problem. Users have to be made aware that they can use data from these writings, provided they cite them.
Research institutions and governments fund R&D activities. So they may frame rules requiring that a copy be deposited in an institutional repository.
Quality Control
What essential quality a writing must have to be archived is a major issue. It may be:
Pre-prints: submitted before publication in any traditional or e-journal;
Post-prints: submitted after publication elsewhere;
Peer reviewed: reviewed by peers and published or accepted in a peer-reviewed journal;
Updated preprints: preprints submitted first and then updated, after publication elsewhere, to reflect the modifications.
For quality control of pre-print submissions, the university may form a committee of experts (senior teachers, readers and professors from the institution), or may take help from specialists of other organizations, to carry out a primary screening of those submissions.
This places a burden on the committee, but ensures the quality of the work.
It is recommended that the category of publication be mentioned at the head of each article or writing.
Preservation Policy
Preservation is another important question in archiving. Printed media have proven their ability to last over a long period of time. Digital media, by contrast, have existed for only a short period and have not been proven to be more reliable than printed media. Two factors matter when preserving in digital media: longevity and technological support. Longevity has not yet been proven, as the medium is new. File format is an important technological issue. The rapid growth of technology brings a newer version of the same software almost every year. It is feared that, after a decade or two, present formats may no longer be readable for lack of technological support. Hardware peripherals may also change to a large extent. Moreover, events like 9/11 and the tsunami have shown that the belief that everything on the web will last forever is mistaken. So the issue of remote backup also arises. Preservation through multiple copies in distant places is another approach. Some think that a large-scale power failure, virus activity, hacking or other intentional human action may destroy the database.
But most of these fears concern accidents or matters of chance. Even in the case of paper media, nobody can guarantee that they will last forever and provide access to all concerned. It is found that, after a long time, paper works require special preservation techniques and restricted, careful handling. This hinders one of the most important purposes of preservation: accessibility. If an interested person cannot use a document, what is the use of preserving it? Digital media are the better option in this case. The library may decide to preserve a document in digital media without restricting its accessibility, because it is easy to copy and circulate without affecting the original archive copy. Moreover, retrieval from archives is much easier than finding one printed article in a heap of back volumes.
In institutional repositories, data are stored in a software-independent format and migrated through successive hardware regimes, or are stored together with the hardware and software required to create or use them. So it is found that data preservation is easier in institutional repositories/archives than on individual digital media.
But there should be a strong backup policy. Backups should be taken at fixed intervals on a regular basis. They can rely on optical media, magnetic media, or a remote hard disk connected through the network. There should be good-quality antivirus software and firewalls to protect the data. To guard against power failure, a large inverter backup should be in place. If data is lost due to some unexpected situation, the server in-charge should try to restore it from the preserved backup or, if the backup too is lost (for example in a natural calamity), consult specialists to recover the data.
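A regular backup of the kind recommended above can be sketched with the Python standard library. This is a minimal sketch, not a recommended production tool: the folder layout, the `backup-YYYYMMDD-HHMMSS` naming and the function names are assumptions, and the demonstration uses temporary folders standing in for the archive and the backup disk.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def sha256(path):
    """Checksum of a file, used to verify that a copy is intact."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source_dir, backup_root):
    """Copy source_dir into a new timestamped folder and verify every file."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"backup-{stamp}"
    shutil.copytree(source_dir, target)
    for original in Path(source_dir).rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source_dir)
            if sha256(original) != sha256(copy):
                raise IOError(f"backup copy differs: {copy}")
    return target


# Demonstration on temporary folders; the file content is a placeholder.
with tempfile.TemporaryDirectory() as work:
    archive = Path(work) / "archive"
    archive.mkdir()
    (archive / "paper1.pdf").write_bytes(b"sample content")
    made = backup(archive, Path(work) / "backups")
    backed_up = sorted(p.name for p in made.iterdir())
    print(backed_up)
```

Running such a script at a fixed interval (for example through the operating system's scheduler) and pointing `backup_root` at a remote disk would cover the regular and remote backup requirements described above; the database itself would additionally need its own dump tool.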
Faculty Workload
Starting a repository is a vast job, and the workload needs to be distributed among faculty and library professionals, particularly when dealing with the backlog of authors' writings. Although the archiving software is designed for author self-archiving, self-posting through the system requires several steps. Given the significant disparity in technical proficiency among faculty, potential contributors might not have the expertise, or the inclination, to deposit materials themselves. Not surprisingly, then, early repository implementers consider library mediation of content submissions to be the only practical method of managing the archive, at least initially. This library management of the document contribution process typically includes:
Converting documents to allowed or preferred digital formats;
Assigning metadata and subject headings and/or reviewing author-assigned metadata or headings;
Providing faculty-authors with information regarding copyright and intellectual property issues. This can also involve providing information about the self-archiving policies of individual publishers, and even negotiating with individual publishers on behalf of contributing faculty; and
Quality control and other ingest-related and administrative processes.
Raym Crow [SPARC] opines that one way to ease and encourage faculty and departmental participation is to frame participation so that it addresses a problem the faculty wishes to solve. By helping collect and host papers for a university-sponsored conference, assuming responsibility for departmental working paper series, or taking on digital production and archiving responsibility for existing programs, repository implementers can lessen the workload of faculty while actively encouraging their participation. At the same time, such projects will have to be sensitive to the perceptions and apprehensions of the departmental support staff currently responsible for them. The user community orientation adopted by DSpace provides another alternative: each DSpace community designs a workflow process that accommodates the needs of its faculty and staff. In this way, administrative and technical responsibilities can be shared by the community’s resources, coordinated with the library.
Development & Operational Costs
Expenses arise from labor (and its equivalent if some skill requirements are met via outsourcing), software, hardware, network, etc.
The technical support costs of developing and operating an institutional repository will depend on the service level agreement the repository has with the institution’s technical support operations and, possibly, with third parties. Implementers of EPrints software indicate that the staff time required to install and configure the software is approximately four to five FTE days. While other library staff can perform much of the policy-based component of the repository, setting up the repository technical infrastructure (even using a largely turn-key solution such as the EPrints software) requires the assistance of a technical systems administrator. In KU, faculty staff from the Computer Application Section may take on this responsibility at the initial stage.
Software costs will depend on a basic “build or buy” (or “borrow”) decision, which has economic, strategic, and many practical considerations. At present, a number of proven, dependable, flexible, low-cost software solutions are available.
EPrints and DSpace have proven to work well for this purpose, and both are freely downloadable. They are open source and can be customized. Their supporting software is also open source, so no additional software expenditure is needed.
Hardware costs depend on the performance, storage, and other attributes of the configuration selected. EPrints can run on a basic hardware configuration, although disk storage, server capacity, and perhaps other specifications would need to be upgraded as the repository moved from a pilot stage into public operation and heavy use. Hardware specifications for DSpace are not yet available. However, system hardware costs for either system will vary with the fault tolerance that the repository is willing to accept (for example, low downtime tolerance might require an inventory of replacement drives, etc.), backup capabilities, and other requirements. The cost of such services will typically depend on the existing capabilities of such units and the extent to which the repository implementation can achieve operating efficiencies with existing technical operations. The same is true of networking, which should be a modest incremental expense to the institution’s existing network.

On-going technology labor costs, such as for system administration, are generally allocated as an increment of existing human resources and programs. Initially, non-technical staffing may also be handled via resource allocation, although larger initiatives will need to commit to staffing long-term program management positions.
Obviously, proponents of the new institutional repository will need to present a full budget and probably multi-year forecasts at some point in their interaction with university and library administration.
The software should support switching over to newer versions, or even to totally different software, through a common data structure. But in-house customization may require expert help, which may mean additional expenditure.



LIBRARY’S ROLE IN INSTITUTIONAL REPOSITORIES
In institutions like universities, libraries always play an important role in providing information services to everyone concerned: students, research scholars, staff, research guides and teachers. The library is thus the common place where everyone comes to fulfil their information needs. Libraries serve them through their resources. In this era of electronic resources, their performance has been enhanced through the use of such resources. They subscribe to e-journals and let users search those databases. Day by day, the cost of journals is rising, forcing libraries to trim their list of preferred journals to stay within budget. Thus the scope for providing services also decreases.
As an inevitable part of research assistance, libraries can take on most of the responsibility for setting up and maintaining an institutional e-print repository, and act as advocates of self-archiving. Stevan Harnad [HARNAD (Stevan). For whom the gate tolls?... Available at http://www.cogsi.soton.ac.uk/~harnad/ ] has divided the work of setup into two waves: first, to set up the archives and act as proxy for the author; and second, to maintain and popularize them.
The library could be the advocate for setting up an institutional e-print repository. This gives library professionals another opportunity to serve patrons using IT, and will enhance their image among all concerned. They can apply their collective, consortial strength to maintain the archives, deal with day-to-day problems, arrange the archives properly and prepare proposals to overcome problems. They can help authors to archive their writings at the first stage and, later, instruct them how to do it themselves. They may also act as proxy for authors who are very busy or elderly, in exchange for a minimal, negotiable charge. This is purely a policy matter, and the executive body should prepare clear instructions about it. It is expected that, with personalized, individual attention and easy forms and archiving steps, the task will come to seem as easy as writing an e-mail.
Library staff can administer the server. They have ample skill in the organization of knowledge, and with their professional skills they can manage metadata more easily. But at the same time, the work demands some additional skills, such as:
ü Maintaining the server,
ü Troubleshooting errors that arise while handling data,
ü Maintaining uniformity in standards and in the selection of metadata,
ü Providing enhanced scope for searching within the campus,
ü Making the archives searchable by search engine spiders,
ü Making them OAI-PMH compliant and providing cross-search facilities,
ü Educating users on how to get the maximum benefit from the archive,
ü Gathering information on publishers' policies and informing authors about them, as well as about how to overcome those problems and still submit to the archive,
ü Taking the initiative to gather copies of authors' earlier works and add them to the archive,
ü Keeping an eye on the regularity of submissions by authors, etc.
Besides professional knowledge, this also requires some other skills. Library professionals have the patience and ability to talk to every interested individual, and they can make others understand the utility of the archive. But maintaining the server needs some advanced IT skills. Here library staff may feel insecure while handling a server, but proper training at short intervals will help them grow confident. They have to acquire knowledge of installation, LAN operations, Internet connectivity, virus problems, access control and backup techniques, and a working knowledge of programming and server administration, etc.
Therefore, it is recommended that more than one member of the library staff (at least one assistant librarian) be trained at regular intervals in these matters. The Librarian is proposed as a member of both the Executive Committee and the working group, so that he can convince the Executive Committee of the need to train library staff for this purpose instead of appointing a computer professional for the work. Here, for K.U., working group members are proposed with an eye to troubleshooting needs that call for IT skills. Charge of server maintenance is proposed for someone other than the Librarian because he is not only a highly experienced professional but also acts as the administrative officer of the library, and is already burdened with library operations. He will function as a high-level supervisor, receive reports from the server in-charge and carry them to the Executive Committee. He will bring its decisions to the working group and supervise the implementation of policies. He may advise the server in-charge in critical situations, but should not take on the whole burden of maintaining the server.

Technical Aspects

Open Source software

Open source software is software that includes source code and is usually available at no charge [Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html ]. There are additional requirements besides the availability of source code that a program must meet before it is considered open source, including:
the software must be free to redistribute;
derivative works must be allowed;
the license cannot discriminate against any persons; and
the license cannot discriminate against any fields of endeavor.
Software that is licensed under an open source license allows for a community of developers from around the world to improve the software by providing enhancements and bug fixes.
Libraries can realize many advantages by using open source software. One of the most obvious advantages is the initial cost. Open source software is generally available for free (or at a minimal cost) and it is not necessary to purchase additional licenses for every computer that the program is to be installed on or for every person who is going to use the software. Open source software not only has a lower acquisition cost than proprietary software, it often has lower implementation and support costs as well.
Evaluating Open Source Software is easy
It is easier to evaluate open source software than proprietary software. Since open source software is typically freely available to download, librarians and systems administrators can install complete production-ready versions of software and evaluate competing packages. This can be done not only without any license fees, but also without having to stick to a vendor's trial period, evaluate a limited version of the software, or deal with the vendor's sales personnel. If the library likes an overall open source package but would like a few added features, it can add these features itself. This is possible because the source code is available. Even if a library does not have in-house expertise, it can benefit from source code availability because another library may be able to provide the fix, or it can hire a consultant to make the changes it desires. It is to be noted that if a proprietary program "is deficient in some way [the user] must wait until the vendor decides it is financially viable to develop the enhancement -- an event that may never occur." With open source software the user can develop the enhancement themselves.
Support options
Open source software allows for more support options. Proprietary software vendors often package service with the product. This is particularly true of proprietary library-specific software. When support from a vendor is inadequate it is an additional expense to purchase another tier of support, assuming that it is even available. Open source software allows for different vendors to compete for support contracts based on quality of service and on price. Access to the source code also allows for self-support when practical and desired.
Impact of Open source software
The amount of vendor lock-in is dramatically reduced with open source software. The large initial costs often associated with proprietary software make it difficult to reevaluate the choice of software when it does not live up to expectations. Proprietary software can lead to a single point of failure. If a vendor goes out of business or decides not to support a program anymore, there is often nothing a user can do. If the program were available as open source software, organizations using it could provide self-support, or other vendors could come in and fill the void left by the previous vendor.
For the present purpose, the library can rely entirely on open source software. A lot of software is available, but in practice EPrints and DSpace are found to be the most widely used and discussed for their functionality, so the discussion here is confined to these two. Both run on the operating systems Linux (7.2 - 9.3) or Fedora Core (1-4). EPrints (www.eprints.org/) and DSpace (http://libraries.mit.edu/dspace-mit/technology/ ) are themselves open source, and their supporting software can also be downloaded from their respective links.
The Ability to Migrate and Survive
When considering a technical implementation for an institutional repository, it is important to remember that the explicit expectation is that the content managed by the system will survive the system itself and can migrate as new technologies evolve. In any event, switching costs from one repository technical solution to another would typically be high. Also, switching systems and solutions can be quite risky. Therefore, institutions will want to select their implementation path carefully. Even though several of the solutions are open source, they still involve database mapping and other customizations that would require additional investment if the infrastructure were changed.
Therefore, the system must be content-centric: applying standards and protocols that facilitate ongoing access to the information itself must be central to the system’s conception. The design and implementation of both the EPrints software and the DSpace system have been based on such standards. EPrints can export the archive metadata in XML in a structured format that facilitates migrating to a subsequent system. Both EPrints and DSpace are based on open source software licensing principles.
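The idea of exporting metadata as structured XML so that a later system can read it back can be illustrated with a short Python sketch. This is not the actual EPrints export format: the element names `records` and `record`, the field names and the sample record are all hypothetical.

```python
import xml.etree.ElementTree as ET


def export_records(records):
    """Serialise a list of metadata dicts to an XML string."""
    root = ET.Element("records")  # hypothetical root element name
    for rec in records:
        node = ET.SubElement(root, "record", id=str(rec["id"]))
        for field, value in rec.items():
            if field == "id":
                continue
            # Repeatable fields (e.g. several authors) are given as lists.
            values = value if isinstance(value, list) else [value]
            for item in values:
                ET.SubElement(node, field).text = str(item)
    return ET.tostring(root, encoding="unicode")


def import_records(xml_text):
    """Read the same structure back, as a new system would during migration."""
    records = []
    for node in ET.fromstring(xml_text).findall("record"):
        rec = {"id": node.get("id")}
        for child in node:
            rec.setdefault(child.tag, []).append(child.text)
        records.append(rec)
    return records


# Made-up sample record.
sample = [{"id": 1, "title": "Example eprint", "creator": ["A. Author", "B. Author"]}]
xml_text = export_records(sample)
restored = import_records(xml_text)
print(xml_text)
print(restored)
```

Because the export is plain, self-describing text, the content can outlive the software that produced it, which is the content-centric property argued for above.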
EPrints and DSpace offer off-the-shelf systems that allow an institution to implement a complete framework for an OAI-compliant repository without resorting to in-house technical development. Both systems can be customized to meet local requirements, allowing an institution to configure metadata formats, design subject hierarchies, define acceptable file formats, and register with OAI.
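For an OAI-compliant repository, a service provider harvests metadata by sending simple HTTP requests. The sketch below only builds such request URLs; the base URL is a hypothetical placeholder, while the verbs `Identify` and `ListRecords`, the `metadataPrefix` and `from` arguments and the `oai_dc` prefix come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical endpoint


def oai_request(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return f"{BASE_URL}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = oai_request("Identify")
# Harvest all records in unqualified Dublin Core.
records_url = oai_request("ListRecords", metadataPrefix="oai_dc")
# Harvest only records changed since a date ("from" is a Python keyword,
# so it is passed through a dict).
dated_url = oai_request("ListRecords", metadataPrefix="oai_dc", **{"from": "2006-01-01"})
print(identify_url)
print(records_url)
print(dated_url)
```

Fetching these URLs from a real repository returns XML responses that a cross-search service can index.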

The Open Archives Forum provides the following comparison of the characteristics and basic features of EPrints and DSpace (http://www.oaforum.org/resources/tvtoolscomp.php ):


Feature
Eprints
Dspace
Installation
Eprints is easy to set up: An installation script automates most of the installation processes.
It is possible to chose between a source- or binary-installation. With the source one the software has to be compiled by the programmer. The binary one is precompiled for special architectures like Solaris Sparc systems. The programmer only needs to configure the software.
MySQL, Apache and mod_Perl, the components which are necessary for implementation are smooth installations - no matter if source- or binary-installation is chosen. The installation of additional required Perl modules need more time to resolve the dependencies.
There are two possibilities to support the system: One installation variant is a Solaris environment. The second variant, Linux, is easier to maintain.
If any installation problems are arising a comprehensive support is ensured. GNU Eprints has a separate website containing documentation, downloads, demonstration server and mailing lists: http://software.eprints.org/

The installation of DSpace requires a little more effort. But in fact DSpace is easy to run and maintain for any experienced systems engineer.
In order to run DSpace the following list of Software is necessary to be installed and configured before: Java 1.3, Tomcat 4.0+, Apache 1.3, PostgreSQL 7.3+, Ant 1.5. Details of the requirements can be viewed at: http://dspace.org/technology/system-docs/install.html#prerequisite
If the programmer follows step by step the installation documentation, Java, Ant and PostgreSQL are easy to install successfully.
To set up DSpace man needs to compile the DSpace source code with java tool Ant. The Tomcat server must be started by user "dspace" and user "dspace" should then create a database named "dspace".
With the installation some common problems arose, e.g. that Tomcat doesn't work when the DSpace is connected to Tomcat. Some changes in the configuration script solved that problem.
There is no support service for the DSpace installation. But there is detailed system documentation at: http://dspace.org/technology/system-docs/index.html. And also a public mailing list for the installation questions is supported.

Programminglanguage
Perl

Java

Operationsystem
Both environment variants had been tested: Solaris and Linux.
Furthermore it is also possible to install Eprints2 on any computer that is running with GNU/Linux or UNIX operating system.

DSpace had been tested on Linux Suse 7.3.
In general DSpace can run on Solaris, Linux and Windows systems

Functions
EPrints is free software for creating online archives.
Documents can be stored in any common format that the archive administrator has defined as acceptable. Each individual research paper or eprint can be stored in more than one document format.
The archive can use any metadata schema; the administrator decides which metadata fields are held about each eprint. This is specified in three or four stages:
Definition of the maximal set of metadata fields that should be stored (e.g. authors, title, journal, journal volume, etc.)
Definition of the different types of eprints (e.g. refereed journal article, thesis, technical report, unpublished preprint, etc.)
Specification, for each type, of which metadata fields should be stored and which of those fields are mandatory.
Decision on how these metadata fields should be projected into the Open Archives world. (If necessary, interoperability can be switched off, but this is strongly discouraged.)
More functions can be viewed at http://software.eprints.org/
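The four configuration stages described above can be pictured with a small data structure. This is an illustrative sketch only: it is not the EPrints configuration format, and all field names, type names and the mapping to Dublin Core elements are assumptions made for the example.

```python
# Stage 1: maximal set of metadata fields (names are illustrative).
FIELDS = {"authors", "title", "journal", "journal_volume", "institution", "year"}

# Stages 2 and 3: eprint types, the fields each type stores,
# and which of those fields are mandatory.
EPRINT_TYPES = {
    "journal_article": {
        "fields": {"authors", "title", "journal", "journal_volume", "year"},
        "mandatory": {"authors", "title", "journal"},
    },
    "thesis": {
        "fields": {"authors", "title", "institution", "year"},
        "mandatory": {"authors", "title", "institution"},
    },
}

# Stage 4: projection of local fields onto unqualified Dublin Core elements.
OAI_MAPPING = {
    "authors": "creator",
    "title": "title",
    "journal": "source",
    "institution": "publisher",
    "year": "date",
}


def config_problems():
    """Check that every type only uses defined fields and stores its mandatory ones."""
    problems = []
    for type_name, spec in EPRINT_TYPES.items():
        for field in sorted(spec["fields"] - FIELDS):
            problems.append(f"{type_name}: unknown field {field}")
        for field in sorted(spec["mandatory"] - spec["fields"]):
            problems.append(f"{type_name}: mandatory field {field} is not stored")
    return problems


def missing_mandatory(eprint_type, record):
    """Return the mandatory fields that are absent or empty in a record."""
    mandatory = EPRINT_TYPES[eprint_type]["mandatory"]
    return sorted(field for field in mandatory if not record.get(field))


def to_dublin_core(record):
    """Project a record onto (element, value) pairs; unmapped fields are dropped."""
    return [(OAI_MAPPING[f], v) for f, v in record.items() if f in OAI_MAPPING]
```

For instance, a thesis record with authors and title but no institution would be reported by missing_mandatory as lacking "institution".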

DSpace can be used for self-archiving by institutions and faculties. It provides long-term physical storage and management of digital items in a repository.
DSpace is organised into "Communities" and "Collections", each of which retains its identity within the repository. It supports a variety of digital formats and content types, including text, images, audio and video, and allows contributors to limit access to items in DSpace. All these items can be organised through an administration interface.
DSpace supports OAI-PMH 2.0 as a data provider. This OAI support was implemented using OCLC's OAICat open-source software to make DSpace item records available for harvesting.
Currently DSpace supports only the Dublin Core metadata element set, with a few qualifications conforming to the library application profile. There are plans, still under development, to support a subset of the IMS/SCORM element set (for describing educational material) in the coming year.
More details of DSpace functionality can be found at http://libraries.mit.edu/dspace-mit/technology/functionality.pdf
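A harvester talks to an OAI-PMH data provider through plain HTTP requests whose query string carries a verb and its arguments. The sketch below only builds such request URLs. The base URL is hypothetical; the verbs Identify and ListRecords and the metadataPrefix value oai_dc come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a repository's OAI-PMH interface.
BASE_URL = "http://repository.example.org/oai/request"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Ask the repository to describe itself.
    print(oai_request_url(BASE_URL, "Identify"))
    # Ask for all records in unqualified Dublin Core.
    print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```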

Usage
EPrints is in widespread use around the world. In August 2003, 72 archives worldwide running EPrints software were officially listed (http://software.eprints.org/).

It is not reported how many archives run DSpace software. One example of a European repository that has implemented DSpace is "Erasmus University: Research Online".

Technology
EPrints uses traditional technologies and runs on purely open-source systems: MySQL is the world's most popular open-source database, recognized for its speed and reliability, and Apache has been the most popular web server on the Internet since April 1996.
EPrints is written in the scripting language Perl, which is low level but powerful.

DSpace uses newer technologies: the PostgreSQL database, which is described as more advanced than MySQL, and Tomcat for JSP/Java web applications, which is said to offer higher performance than EPrints.
DSpace also includes and supports a Handle server, which ensures that each document has a unique and persistent URL.
Optionally, DSpace can be protected by Tomcat's security features (SSL). Apache's redirect function can also be used to forward requests to Tomcat, so that the port number can be omitted.

Interoperability
Eprints is freely distributable and subject to the GNU General Public License. This means that its source code is open and freely modifiable by any programmer who wishes to modify it (on condition that modifications are all free and open).
In principle, therefore, EPrints can be adapted to any environment, even one different from that recommended. Naturally this may involve substantial expenditure.
Although EPrints offers no supporting documents, mailing lists are nevertheless available for support.

The DSpace system is freely available as open-source software. This makes it possible to make any necessary changes to the downloaded copy. The system was designed to make adaptation for individual organisations as easy as possible.
In fact, several modules in DSpace will probably be customised by the organisations using it (e.g. it might be necessary to provide authorization and authentication for more than one person). Some organisations may also want to adopt a different environment from the one recommended (e.g. replace PostgreSQL with MySQL or Oracle). At the moment, substituting a relational database other than PostgreSQL will require just a few changes to the system's Browse module.
DSpace provides documented Java APIs that can be extended to allow interoperation with other systems that an institution might be running (e.g. auto-depositing in DSpace from a department's web document system, or the campus data warehouse).

Search
EPrints allows each of the metadata field types in the database to be searched by simple or advanced search. Any metadata field can be searched with fine granularity by querying the database with SQL.
DSpace offers two levels of text search: simple and advanced search. Its submission process also allows a qualified version of the Dublin Core metadata schema to be used for the description of each item. These descriptions are stored in a relational database, which is used by the search engine to retrieve items.
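Searching a metadata field by SQL, as described above, can be illustrated with a small in-memory database. This is a sketch under stated assumptions: the table layout, column names and sample rows are invented for the example and do not reflect the actual EPrints or DSpace schemas, and SQLite stands in for MySQL or PostgreSQL.

```python
import sqlite3

# Columns that may be searched; restricting them keeps the query safe.
SEARCHABLE = {"title", "creator", "type"}


def build_demo_db():
    """Create an in-memory table with two made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eprint ("
        "id INTEGER PRIMARY KEY, title TEXT, creator TEXT, year INTEGER, type TEXT)"
    )
    conn.executemany(
        "INSERT INTO eprint (title, creator, year, type) VALUES (?, ?, ?, ?)",
        [
            ("Sample article on open access", "A. Author", 2004, "article"),
            ("Sample thesis on metadata", "B. Scholar", 2005, "thesis"),
        ],
    )
    return conn


def search(conn, field, term):
    """Return (id, title) rows whose chosen field contains the search term."""
    if field not in SEARCHABLE:
        raise ValueError(f"field not searchable: {field}")
    sql = f"SELECT id, title FROM eprint WHERE {field} LIKE ? ORDER BY id"
    return conn.execute(sql, (f"%{term}%",)).fetchall()
```

The search term is passed as a bound parameter, while the column name is checked against a fixed list before it is placed in the query text.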



Open Standards
The term "open standard" means different things to different people. Three key characteristics of open standards [Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05spring/article2.html ] are:
1) Anyone can use the standard to develop software,
2) Anyone can acquire the standard for free or without significant cost, and
3) The standard has been developed in a way in which anyone can participate.
When a standard has the first two of these characteristics (the ability to use the standard and to obtain it without significant cost), it can be said to be an open standard in a utility sense. That is to say, an open standard is a standard that is not encumbered by a patent, does not require proprietary software, and can be utilized by anyone without cost.
Proprietary standards can sometimes be expensive, and it may be cost-prohibitive to purchase access to a proprietary standard if it is ever needed. Many people consider a standard to be sufficiently open as long as it is open in a utility sense. Others take this a step further and consider a standard to be open only if it is also created and modified through an open process. Dublin Core is a completely open standard that is open both in utility and in process. All one has to do is show up and participate in order to contribute to the development of Dublin Core.
It is important for libraries and other cultural institutions to ensure long-term access to digital information. The rapid growth in digital technologies has led to new and improved applications for digital preservation. At the same time, however, it has also led to some problems. Two of these are obsolescence and dependency. The obsolescence problem is caused by advances in hardware and software that make many computers obsolete within a very few years. Dependency problems can arise if tools that are needed to communicate between systems or to read file formats become unavailable. To cope with obsolescence and dependency problems, organizations must be able to migrate data into new systems. Data migration, however, cannot occur without access to data file formats.
Properly created open standards for file formats are less likely to become obsolete and are more reliable and stable than proprietary formats. In the event that an open standard file format does become obsolete, having access to the file format would allow anyone to easily, and legally, create a data conversion utility. File formats that use open standards can assist in long-term archiving because they allow for software and hardware independence. Open standards help alleviate issues caused by obsolescence or dependency problems, since files created in formats that adhere to open standards are more likely than proprietary formats to be readable twenty or fifty years from now. This allows for greater flexibility and easy migration to different systems in the future.
The use of open standards can help assure interoperability of diverse systems. There are various software packages that are being used to create digital libraries, online library catalogs, and other resources that libraries rely on. These various systems need to be able to interact in order to provide the best possible service to patrons. The way to make certain that these diverse systems, and any future systems, can communicate with each other is by using open standards to help achieve the "free flow of information through interoperability" (The Open Group. 2005. Developer Declaration of Independence. Available: http://www.opengroup.org/declaration/declaration.htm ).
Some library-centric initiatives, including the Open Archives Initiative (OAI), also support open standards. OAI's mission is to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content. The Dublin Core Metadata Element Set (DCMES) is an open standard supported by OAI.

Hardware and Operating System:
From the above study, it is found that hardware requirements are not specified in either case. To start with, the following may be used:
1. Processor: Intel P4 2.8 GHz
2. Corresponding original Intel motherboard, monitor, etc.
3. Hard disk: SCSI 120 GB HDD with 7200 RPM speed
4. RAM: 1 GB DDR RAM
5. Multimedia: 52X CD writer/DVD writer
6. High-speed Internet connection
7. A tape drive may be used for data backup.
8. An audio output facility is also required in case of audio data backup.
Using more than one physical hard disk drive may minimize data loss in case of a crash.

Operating System: Red Hat Linux 9.3 or Fedora Core 4

Present India
India is a vast country and is moving fast towards being a developed country. The UGC (http://www.ugc.ac.in/new_initiatives/hisp.htm) says that India has a large and complex higher education system. It comprises nearly 310 universities and a large group of research and development organizations. Universities in India are set up either by an Act of Parliament or by the State Legislatures. In addition, some institutions are conferred deemed-to-be-university status by the Central Government. Universities are either unitary or affiliating. 131 universities in the country are of the affiliating type; together they affiliate around 15,500 colleges. Total student enrolment is around 92 lakh. In addition, there are several professional councils that maintain standards in their respective fields. Some of these professional councils also maintain a Central Register of Professionals in their fields. In 1956, the Central Government set up the University Grants Commission (UGC) to discharge its constitutional mandate of coordination, determination, and maintenance of standards in higher education.
The higher education system in the country is a loose configuration of heterogeneous organizational units: universities, colleges, professional councils etc. This diversity is a source of excellence and makes the system vibrant. Coordinating such a diverse system of education is tricky, yet necessary to ensure its credibility. The UGC therefore intends to create a Knowledge Repository for communities of teachers and researchers in the country.
The UGC is developing a mechanism for tracking academic information resources such as learning resources, curricula, question banks, national theses etc., published in various formats, through a systematic, internationally used metadata framework for tagging such resources.
In this situation, an institutional repository would be a timely and practical effort for the University of Kalyani.




Proposed Line of Work
Proposed Administrative Committee Structure
Any undertaking needs a strong, well-thought-out policy to succeed. Setting up an institutional repository needs a well-organized decision-making structure so that policies can be implemented properly. So, before other factors are considered, a committee should be formed to study the feasibility, scope and coverage of the work, as well as to administer it. The committee should work at two levels at least: an Executive Committee and an advisory Working Group.
The Executive Committee should consist of at least:
Chairman: Vice Chancellor of the University,
Convener: Chief Librarian/Library in-charge of the Central Library,
Other members:
One member from the Teachers' Council,
One member from the non-teaching staff union,
One member from the research scholars' union,
One member from the students' union of the University,
Finance Officer of the University,
Chairman (or representative) of the Principals'/Teachers-in-charge committee (of colleges under the University's jurisdiction),
Deans of the faculties,
Registrar of the University,
Legal Officer of the University,
External member.
The Working Group may consist of:
Chairman : Chief Librarian
Convener : Legal Officer of the University,
Other members:
One teacher from the Computer Science/Application Department of the University,
One staff member of the Internet centre of the University (optional),
One library staff member with expertise in networking and operational knowledge of archiving,
One teacher from the Library and Information Science Department,
Librarian/senior officer of the library of any reputed organization who has experience of setting up an open archive (optional),
External member: any senior Professor/faculty member of the University with working experience as editor of a journal.
Purpose of the Executive Committee:
To decide the scope of the archive,
To discuss problems related to the serials crisis,
To discuss budgetary constraints relating to serials purchase,
To enhance the scope of access to scholarly publications,
To discuss the motto of the archive,
To discuss standards for archived materials,
To discuss the existing rules and regulations of the University relating to such archiving,
To take decisions on changing or modifying existing rules relating to the submission of papers, theses and dissertations,
To implement the rules for everyone concerned,
To discuss problems arising from the new rules,
To ensure budget/funding for the work,
To decide the level of work (as a project or on a large scale) and to set deadlines for evaluating its progress,
To ensure continuous advocacy,
To ensure service standards,
To assure contributors that their intellectual contributions will be safeguarded,
To discuss problems raised by authors,
To ensure worldwide accessibility,
To ensure standardized services to all concerned,
To evaluate the archive against existing ones in developed countries,
To discuss the advantages and disadvantages of this archive in comparison with existing archives worldwide,
To discuss legal issues,
To ensure assistance to authors/contributors,
To make policies relating to submission,
To select staff for working on and administering the server of the institutional repository,
To solve staff problems,
To prepare policies relating to training facilities for staff development (if necessary),
To meet at regular intervals to discuss the existing situation,
To keep an eye on the development of archiving worldwide and to develop the archive to keep pace with it,
To prepare future plans for the archive,
To set goals and timelines for the work, etc.
The Working Group will discuss:
Software selection criteria,
Existing hardware and hardware requirements,
Condition of the campus LAN and recommendations for any necessary modifications,
Server setup and maintenance issues,
Metadata selection criteria,
Technical problems related to archiving,
Methods of archiving,
Advocacy methods and their impact,
Troubleshooting of software/network problems,
Proposed modifications to the structure of the archive,
Format/way of archiving,
Impact of this repository on citations of scholarly publications,
Processing,
Minimum standard format of submitted materials,
How to overcome IPR problems,
Format of display,
File formats of submissions,
Ease of understanding the policies,
Ensuring that every staff member, research guide and scholar is aware of the rules.
This is a proposed structure and is subject to change as required. The Executive Council may take on these responsibilities instead of forming an Executive Committee on this matter. But formation of the Working Group is strongly recommended, as it will handle day-to-day activities and deal with problems on a regular basis.
The Executive Committee is mainly active at the beginning, while the policies are taking shape. After that, it may meet once or twice a year to discuss progress and suggest developments. The Working Group should meet at least once a month. This is important because it will be responsible for the repository's success or failure.
Proposed Workload Distribution in KU
Early repository implementers consider library mediation of content submissions to be the only practical method of managing the archive, at least initially. While dealing with the backlog of authors' writings, the workload needs to be distributed among faculty and library professionals. This library management of the document contribution process typically includes:
Converting documents to allowed or preferred digital formats;
Assigning metadata and subject headings and/or reviewing author-assigned metadata or headings;
Providing faculty-authors with information regarding copyright and intellectual property issues. This can also involve providing information about the self-archiving policies of individual publishers, and even negotiating with individual publishers on behalf of contributing faculty; and
Quality control and other ingest-related and administrative processes.
Although the archiving software is associated with author self-archiving, self-posting through the system requires several steps. Given the significant disparity in technical proficiency among faculty, some reluctance from potential contributors might be expected.
Raym Crow [SPARC] suggests that one way to ease and encourage faculty and departmental participation is to frame participation so that it addresses a problem the faculty wishes to solve.
In KU, students of the Department of LIS can be assigned project work, as part of their academic curriculum, such as collecting and hosting papers for a university-sponsored conference, taking responsibility for a departmental working paper series, or digitizing and archiving part of the backlog of bibliographic materials of various kinds waiting to go into the archive. In this way repository implementers can lessen the workload of faculty while actively encouraging their participation.
The user community orientation adopted by DSpace provides another alternative: each DSpace community designs a workflow process that accommodates the needs of its faculty and staff. In this way, administrative and technical responsibilities can be shared by the community’s resources, coordinated with the library.
Advocacy Methods
1. Within Institution
· By distributing literature: Distributing literature that describes the advantages of IRs and how they benefit contributors may encourage writers to submit to institutional repositories.
· By distributing leaflets: These can reach every member of the institution and will raise awareness as well as use of IRs.
· Through university magazines: These may work as a good platform to reach every member of the institution.
· Through library newsletters: These will encourage library users.
· Through notices on each department's notice board: Sending a notice to each department's notice board will make teachers and staff aware of the repository.
· Through amending university rules: The University should introduce regulations on the submission of articles to the institutional e-print repository, as researchers are funded by the university authority. Submission may be made an essential condition for getting research funds. This also applies to research guides, staff and other students of the institution.
· Through user education in libraries: This is a regular process and will make each member aware of the advantages of IRs.
· By explaining in researchers' meetings: This gives scope to discuss face-to-face with researchers and convince them that their work will be used more widely.
· By inspiring research guides/teachers to encourage students to submit e-prints to institutional repositories: Research guides can instruct scholars to submit an e-copy of their work to IRs. Teachers can encourage students to write on some topics and post a copy to IRs. Internal seminars also produce a lot of literature. Teachers can encourage students to publish it in a journal and send a copy to the institutional e-print repository. This will help students to establish an identity among their peers.
· Through internal seminars, departmental meetings etc.: These may be used to inform students, specialists and staff of the institution.
· By incorporating information in the library user's card, instruction sheet etc.: This will make users conscious of the issue.
· By organizing special advocacy events for university staff: For example at annual meetings, debating competitions, the annual sports day etc.
2. Outside The Institution
· Collaborating with other existing e-print archives,
· Registering in the OAI registration list,
· Sending information about the establishment of the e-print archive to discussion forums, newsgroups, specialist associations and research organizations,
· Posting newsletters in LIS forums to make other librarians aware of its existence,
· Reading papers about the establishment of institutional repositories at different seminars,
· Displaying banners about its existence at regional/national/international seminars,
· Sending information about it to professional associations etc.
This advocacy is not a one-time job. Libraries and institutions should do it continuously every year as a part of their user education activities. This will make every newcomer to the institution aware of it. In universities, the freshers' welcome ceremony may serve as a platform for informing new students about the repository. Every departmental head may inform students about it in their first address to a new batch. The library may hand new users a leaflet when they collect their user's card, and may include discussion of IRs in user education.
The library may place a notice board, written in attractive colors, above its computerized catalog/catalog cabinet, describing how to use and submit to IRs.
The library should take the initiative to help and guide writers in posting their writings as e-prints in the beginning.
Proposed Submission Process:
Distributed submission with centralized management is the recommended policy. Initially, to spare users from problems, the library may accept deposits on CD, convert the files into PDF and upload them after properly adding the metadata. In that case, the metadata of the articles should be collected in the same format that the author would have to fill in when posting to the archive himself. The library may also act as proxy for authors, for a nominal charge, for older people or those who face many problems and cannot solve them themselves. The charge should be very low, so that authors are not put off by the amount.
Proposed Preservation Policy
With long-term preservation in view, KU may follow the Open Archival Information System model (OAIS model, available at http://ssdoo.gsfc.nasa.gov/host/isoas/ ). Switchover policies are to be kept in mind.
There should be weeding-out policies. It is recommended that:
· If the postprint is deposited, the preprint should be removed.
· After a new version of rules and regulations has been issued and implemented, older versions can be removed.
· After a new edition comes out, old editions of books can also be removed.
· Theses and dissertations that have lost their relevance over time should be removed, but their metadata and abstracts must be retained as a record of their existence.
Documents removed from the archive because of space constraints must be preserved somewhere else for historical purposes and for service on demand.
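The weeding-out recommendations above amount to a few rules that could be applied mechanically to produce a list of candidates for staff review. The following is a minimal sketch only: the Item structure, its field names and the "still_relevant" flag (a judgement made by repository staff) are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    identifier: str
    work_id: str                 # groups the versions of one work
    kind: str                    # "preprint", "postprint", "regulation", "book", "thesis"
    version: int = 1
    still_relevant: bool = True  # assumed to be set by repository staff


def weeding_candidates(items):
    """Return identifiers that the recommended policy would flag for removal."""
    by_work = {}
    for item in items:
        by_work.setdefault(item.work_id, []).append(item)

    flagged = []
    for versions in by_work.values():
        kinds = {v.kind for v in versions}
        latest = max(v.version for v in versions)
        for v in versions:
            if v.kind == "preprint" and "postprint" in kinds:
                flagged.append(v.identifier)       # postprint supersedes preprint
            elif v.kind in ("regulation", "book") and v.version < latest:
                flagged.append(v.identifier)       # older version or edition
            elif v.kind == "thesis" and not v.still_relevant:
                flagged.append(v.identifier)       # keep metadata and abstract
    return flagged
```

Flagged items would be moved to separate storage rather than destroyed, and for theses the metadata and abstract would stay in the archive, in line with the policy above.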
Metadata Selections
The archive has to be OAI compliant. DCMES may be used for articles. For theses and dissertations, UGC norms should be followed.
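A DCMES description of an article is a flat set of elements such as title and creator. The sketch below serializes such a description in the oai_dc XML form used for OAI harvesting. The two namespace URIs and the fifteen element names are those of Dublin Core and OAI-PMH; the helper function and the sample values (taken from this dissertation's own title page) are only illustrative.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen elements of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Serialize (element, value) pairs as an oai_dc XML string."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, value in fields:
        if element not in DCMES_ELEMENTS:
            raise ValueError(f"not a DCMES element: {element}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record([
        ("title", "Issues involved in setting up an institutional e-print repository"),
        ("creator", "Saha, Chandan"),
        ("language", "en"),
    ]))
```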
Metadata Checking
Metadata entered by authors should be checked very carefully by the administrator, who can edit or change it, or even stop the upload at that stage. The administrator has to notify the author by e-mail about errors in the metadata entered.
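Part of this check can be automated before the administrator looks at a deposit. The sketch below is one possible approach, not a feature of EPrints or DSpace: the list of required fields, the date rule, the sender address and the wording of the message are all assumptions. It builds the notification but does not send it.

```python
from email.message import EmailMessage

# Assumed minimum set of fields every deposit must carry.
REQUIRED = ("title", "creator", "date")


def find_errors(metadata):
    """Return a list of human-readable problems found in the metadata."""
    errors = []
    for field in REQUIRED:
        if not str(metadata.get(field, "")).strip():
            errors.append(f"'{field}' is missing")
    date = str(metadata.get("date", "")).strip()
    if date and not (len(date) >= 4 and date[:4].isdigit()):
        errors.append("'date' should start with a four-digit year")
    return errors


def notification(author_email, errors, sender="repository@example.org"):
    """Compose (but do not send) an e-mail listing the metadata errors."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = author_email
    message["Subject"] = "Your deposit needs metadata corrections"
    lines = ["The following problems were found in the metadata you entered:", ""]
    lines += [f"- {error}" for error in errors]
    message.set_content("\n".join(lines))
    return message
```

The resulting message object could be handed to the standard smtplib module once a mail server is available on the campus network.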
Subject heading:
Many subjects are taught and discussed in KU. Researchers work on problems in different subjects as well as in interdisciplinary subjects. No single existing subject heading list can serve the purpose, and it is not possible for KU at present to collect special subject heading lists for every subject. So, until international open standards are formed and an international open access subject heading list covering every micro-subject becomes available (or one that permits forming new standard term lists for interdisciplinary subjects), LCSH may be a handy choice.
Who can Participate
As this is an institutional repository, only members of the institution (students, teachers, staff and research scholars) should have access to deposit. There are some colleges under the jurisdiction of KU. As members of KU's extended family, they will also have permission to submit here.
At present, very few repositories are at work. Many people who are not directly connected to the University at present write a good number of articles every year. I recommend permitting them to deposit in the archive. The University may ask for authentication of the author's qualifications and identity. It may even (though it should not) ask for a very small fee for each deposit.
Registration
Every user should register by filling in a very small form through e-mail. This will provide the user with an account for searching the archive.
Confirming Global Access
Again, the archive itself should be registered with Open Archives (http://www.openarchive.org/ ) / the EPrints archive list (http://www.eprints.org/ ). They will add it to their lists of archives. This will help metadata harvesters to harvest the archive and make it accessible. Links should also be submitted to search engine sites. DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp) is software that can translate OAI-compliant metadata into search-engine-friendly data.
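What a metadata harvester does with a registered archive can be shown on a small scale. The sketch below parses an OAI-PMH ListRecords response and pulls out identifiers, titles and creators. The XML sample is hand-written for illustration (the identifier and record values are made up), and no network request is made; only the two namespace URIs come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample of a ListRecords response (values are made up).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample e-print</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.findall(".//dc:title", NS)],
            "creators": [e.text for e in record.findall(".//dc:creator", NS)],
        })
    return records
```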
Costs
A server is essential to start the archive. Hardware is not going to be very costly. Manpower will be the largest cost, followed by advocacy. The software is freely downloadable and costs almost nothing. Backup and networking will also demand a considerable amount.
For Kalyani University, library professionals, with the help of computer experts from the Computer Application Department, may reduce labor costs. If rules are made to submit an e-copy in PDF (preferable) to the library/departments on CD, no conversion will be necessary in future. But retrospective conversion of the backlog will demand a large amount of money, time and manpower.
Handling Backlogs: a tricky proposal
If some students (especially students of Library and Information Science) are assigned this conversion as their project work, and it is made mandatory for every LIS student to upload a certain number of back volumes of theses and dissertations, then with time the workload can be reduced to a great extent.
Use of Hardware and Software:
An up-to-date machine that supports a SCSI bus HDD is recommended. The configuration may be as follows (as a test bed):
1. Processor: latest Intel processor
2. Hard disk: SCSI 120 GB HDD with 7200 RPM speed
3. RAM: 2 GB DDR RAM
4. Multimedia: 52X CD writer/DVD writer
5. Corresponding original Intel motherboard
6. Color monitor
7. Internet keyboard
8. Optical mouse etc.
9. High-speed Internet connection
10. Tape drive (optional)
(This is not a rigid configuration, and can be customized as per requirements.)
For software, I recommend open-source software for both the operating system and the archiving application. Red Hat Linux 9.3 or Fedora Core 4 may be used as the OS.
A brief comparison of the features and functionality of DSpace and EPrints is given below at a glance. Either of them may be selected.

Feature | DSpace | EPrints

Technical Specifications
OAI-PMH version supported | OAI-PMH 2.0 | OAI-PMH 2.0
Z39.50 protocol compliant | No | No
Open source license | BSD | GNU GPL

Hardware
Minimum hardware requirements | No specification | No specification
SAN support | Y | Y

Software
Operating system | Unix/MacOSX/Windows | Linux/Windows
Programming language | Java | Perl
Database | PostgreSQL | MySQL
Web server | Any | Apache 1.3
Java servlet engine | Any | N/A

Repository & System Administration
Set-up/Installation | Time consuming | Complicated
Automated installation script | Yes | Yes
System update script | Yes | Yes
Update system without overwriting | Yes | Yes
Customizing feature | Yes | Yes

User registration, authentication & password administration
Password administration | Y | Y
System-assigned passwords | Y | N
User selected passwords | Y | Y
Forgotten password function | Y | Y
Edit user profile | Y | Y
Limit Access by User Type | Y | Y
Multiple Authentication Methods | Y | N
Limit Access at File/Object Level | Y | Y
View pending content submissions | Y | Y
View approved content | Y | Y
View pending content administration tasks | Y | -


In practice, I tried both EPrints and DSpace. I faced many problems with the EPrints installation: I could not establish a connection between EPrints and the MySQL server. In the end, I had to move to DSpace. I used a script available from DRTC and the pg73jdbc2.jar file to install it.
First, I installed Red Hat Linux 9.3 in custom mode with the PostgreSQL server as an additional package.
I logged on as root and turned on the PostgreSQL server from 'server'-'services'.
Then I copied those two files into a folder under root.
Then I opened a new terminal from GUI mode.
I extracted the file (with tar -zxvf).
A new folder named dspace was created.
I changed to it.
I edited the setup script with the vi editor (vi setup), adding # before each line to be skipped in the following section:
if test ! -f /etc/rc.d/init.d/postgresql
then echo " postgreSQL not loaded, load from Linux CDs; Exiting ..."
exit
fi

#if test -f /usr/share/java/*jdbc1.jar
#then jdbcdir="/usr/share/java"
# else
#if test -f /usr/share/pgsql/*jdbc1.jar
#then jdbcdir="/usr/share/pgsql"
# else
# echo "postgreSQL jdbc drivers are not loaded"
# echo "Use Linux CDs to load them"
# exit
#fi
#fi
echo "JDBC drivers are in $jdbcdir"

echo
read -p "Enter your mail server hostname ($HOSTNAME): " mailhost

· Then I extracted the .jar file under '/root/dspace/dspace-1.3.1-source/lib'.
· Then I ran the setup (./setup) from the dspace folder.
· It asked for some parity checking, then for the mail server host name. Here I entered my machine's predefined IP address.
· I skipped the password protection token to avoid complexity.
· Then it created a user with the default password "dspace".
· Then it asked for the DNS server name. I entered the machine's pre-assigned name.
· It went through the different installation modules automatically, asked for the administrator's e-mail ID and password, and then processed them.
· Finally it provided two links to open in a browser: one for using DSpace as a user, and the other for working as administrator.
· The whole process was completed within a very few minutes (2 min. and 9 seconds here).
· I repeated it on different types of machines (PII, PIII and P4, with hard disk space from 10 GB to 40 GB assigned to Red Hat Linux, and with 128 MB and 256 MB RAM). They all worked well, and took less than 5 minutes.
· Access was tested through the LAN, and it worked. Due to lack of infrastructure, accessibility through the Internet could not be tested.
· Due to lack of time, its functionality and other activities could not be worked out. Clear instructions are available in the documentation bundled with it. Information on various aspects may also be collected from the DSpace System Documentation http://libraries.mit.edu/dspace-mit/technology/system-docs/
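The LAN access test mentioned above can be repeated from any other machine on the network with a short reachability check. This is a minimal sketch: it only confirms that a TCP connection to the server can be opened, not that DSpace itself responds correctly. The host address in the usage note is hypothetical; 8080 is Tomcat's default HTTP port.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("192.168.0.10", 8080) would be called from a client machine, with the address replaced by the server's actual LAN address.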





























*******************************************************************************************

N.B. This version is not exactly what I submitted to the University for examination. It is a preprint version; the submitted copy was edited in one or two places after this version was copied. For personal reasons, I could not post here the exact copy I submitted there. This version lacks the TOC and some other specific sections.



***********************************************************************

*****Do not hesitate to mail me if you have any query.*****


Visit my sites at
http://chandansaha.tripod.com/
http://chandans-ejournal.tripod.com/
http://chandansezone.tripod.com/
or
My blog at http://chandans-ejournal.blogspot.com/
