Optical Engineering Science
Stephen Rolt


A practical guide for engineers and students that covers a wide range of optical design and optical metrology topics

Optical Engineering Science offers a comprehensive and authoritative review of the science of optical engineering. The book bridges the gap between the basic theoretical principles of classical optics and the practical application of optics in the commercial world. Written by a noted expert in the field, the book examines a range of practical topics that are related to optical design, optical metrology and manufacturing. The book fills a void in the literature by covering all three topics in a single volume.

Optical engineering science is at the foundation of the design of commercial optical systems, such as mobile phone cameras and digital cameras, as well as highly sophisticated instruments for commercial and research applications. It spans the design, manufacture and testing of space and aerospace instrumentation through to the optical sensor technology used for environmental monitoring. Optical engineering science has a wide variety of applications, both commercial and research. This important book:

● Offers a comprehensive review of the topic of optical engineering
● Covers topics such as optical fibers, waveguides, aspheric surfaces, Zernike polynomials, polarisation, birefringence and more
● Targets engineering professionals and students
● Is filled with illustrative examples and mathematical equations

Written for professional practitioners, optical engineers, optical designers, optical systems engineers and students, Optical Engineering Science offers an authoritative guide that covers the broad range of optical design and optical metrology topics and their applications.








Optical Engineering Science



Stephen Rolt

University of Durham

Sedgefield, United Kingdom









This edition first published 2020

© 2020 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Stephen Rolt to be identified as the author of this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Rolt, Stephen, 1956- author.

Title: Optical engineering science / Stephen Rolt, University of Durham, Sedgefield, United Kingdom.

Description: First edition. | Hoboken, NJ : John Wiley & Sons, 2020. | Includes bibliographical references and index.

Identifiers: LCCN 2019032028 (print) | LCCN 2019032029 (ebook) | ISBN 9781119302803 (hardback) | ISBN 9781119302797 (adobe pdf) | ISBN 9781119302810 (epub)

Subjects: LCSH: Optical engineering. | Optics.

Classification: LCC TA1520 .R65 2019 (print) | LCC TA1520 (ebook) | DDC 621.36–dc23

LC record available at https://lccn.loc.gov/2019032028

LC ebook record available at https://lccn.loc.gov/2019032029

Cover Design: Wiley

Cover Images: Line drawing cover image courtesy of Stephen Rolt, Background: © AF-studio/Getty Images




Preface


The book is intended as a useful reference source in optical engineering for both advanced students and engineering professionals. Whilst grounded in the underlying principles of optical physics, the book ultimately looks toward the practical application of optics in the laboratory and in the wider world. As such, examples are provided in the book that will enable the reader to understand the principles and to apply them in practice. Useful exercises and problems are also included in the text. Knowledge of basic engineering mathematics is assumed, but the emphasis throughout is on an overall understanding of the underlying principles.

Although the text is wide ranging, the author is keenly aware of its omissions. In compiling a text of this scope, there is a constant pre-occupation of what can be omitted, rather than what is to be included. This tyranny is imposed by the manifest requirement of brevity. With this limitation in mind, choice of material is dictated by the author's experience and taste; the author fully accepts that the reader's taste may vary somewhat.

The evolution of optical science through the ages is generally seen as a progression of ideas, an intellectual journey culminating in the development of modern quantum optics. Although some in the ancient classical world thought that the sensation of vision actually originates in the eye, it was quickly accepted that vision arises, in some sense, from an external agency. From this point, it was easy to visualise light as beams, rays, or even particles that have a tendency to move from one point to another in a straight line before entering the eye. Indeed, it is this perspective that dominates geometric optics today and drives the design of modern optical systems.

The development of ideas underpinning modern optics is, to a large extent, attributed to the early modern age, most particularly the classical renaissance of the seventeenth century. However, many of these ideas have their origin much earlier in history. For instance, Euclid postulated laws of rectilinear propagation of light, as early as 300 BCE. Some understanding of the laws of propagation of light might have underpinned Archimedes' famous solar concentrator that (according to legend) destroyed the Roman fleet at the siege of Syracuse in 212 BCE. Whilst the law governing the refraction of light is famously attributed to Willebrord Snellius in the seventeenth century, many aspects of the phenomenon were understood much earlier. Refraction of light by water and glass was well understood by Ptolemy in the second century CE and, in the tenth century, Ibn Sahl and Ibn Al-Haytham (Alhazen) analysed the phenomenon in some detail.

From the early modern era, the intellectual progression in optics revolved around a battle between particle (corpuscular) or ray theory, as proposed by Newton, and wave theory, as proposed by Huygens. For a time, in the nineteenth century, the journey seemed to be at an end, culminating in the all-embracing description provided by Maxwell's wave equations. The link between wave and ray optics was provided by Fermat's principle, which dictates that light travels between two points by the path that takes the least time, and this could be clearly derived from Maxwell's equations. However, this clarity was removed in the twentieth century when the ambiguity between the wave and corpuscular (particle) properties of light was restored by the advent of quantum mechanics.

This progression provides an understanding of the history of optics in terms of an intellectual journey. This is the way the history of optics is often portrayed. However, there is another strand to the development of optics that is often ignored. When Isaac Newton famously procured his prism at the Stourbridge Fair in Cambridge in 1665, it is clear that the fabrication of optical components was a well-developed skill at the time. Indeed, the construction of the first telescope (attributed to Hans Lippershey) would not have been possible without the technology to grind lenses, previously mastered by skilled spectacle makers. The manufacture of lenses for spectacles had been carried out in Europe (Italy) from at least the late thirteenth century CE. However, the origins of this skill are shrouded in mystery. For instance, Marco Polo reported the use of spectacles in China in 1270 and these were said to have originated from Arabia in the eleventh century.

So, in parallel to the more intellectual journey in optics, people were exercising their practical curiosity in developing novel optical technologies. In many early cultures, polished mirrors feature as grave goods in the burials of high-status individuals. One example of this is a mirror found in the pyramid built for Sesostris II in Egypt in around 1900 BCE. The earliest known lens in existence is the Nimrud or Layard lens, attributed to the Assyrian culture (750–710 BCE). Nero is said to have watched gladiatorial contests through a shaped emerald, presumably to correct his myopic vision. Abbas Ibn Firnas, working in Andalucia in the ninth century CE, developed magnifying lenses or ‘reading stones’.

These two separate histories lie at the heart of the science of optical engineering. On the one hand, there is a desire to understand or analyse and on the other hand there is a desire to create or synthesise. An optical engineer must acquire a portfolio of fundamental knowledge and understanding to enable the creation of new optical systems. However, ultimately, optical engineering is a practical discipline and the motivation for acquiring this knowledge is to enable the design, manufacture, and assembly of better optical systems. For this knowledge to be fruitful, it must be applied to specific tasks. As such, this book focuses, initially, on the fundamental optics underlying optical design and fabrication. Notwithstanding the advent of powerful software and computational tools, a sound understanding and application of the underlying principles of optics is an essential part of the design and manufacturing process. An intuitive understanding greatly aids the use of these sophisticated tools.

Ultimately, preparation of an extensive text, such as this, cannot be a solitary undertaking. The author is profoundly grateful to a host of generous colleagues who have helped him in his long journey through optics. Naturally, space can only permit the mention of a few of these. Firstly, for a thorough introduction and grounding in optics and lasers, I am particularly indebted to my former DPhil Supervisor at Oxford, Professor Colin Webb. Thereafter, I was very fortunate to spend 20 years at Standard Telecommunication Laboratories in Harlow, UK (later Nortel Networks), home of optical fibre communications. I would especially like to acknowledge the help and support of my colleagues, Dr Ken Snowdon and Mr Gordon Henshall during this creative period. Ultimately, the seed for this text was created by a series of Optical Engineering lectures delivered at Nortel's manufacturing site in Paignton, UK. In this enterprise, I was greatly encouraged by the facility's Chief Technologist, Dr Adrian Janssen.

In later years, I have worked at the Centre for Advanced Instrumentation at Durham University, involved in a range of Astronomical and Satellite instrumentation programmes. By this time, the original seed had grown into a series of Optical Engineering graduate lectures and a wide-ranging Optical Engineering Course delivered at the European Space Agency research facility in Noordwijk, Netherlands. This book itself was conceived, during this time, with the encouragement and support of my Durham colleague, Professor Ray Sharples. For this, I am profoundly grateful. In preparing the text, I would like to thank the publishers, Wiley and, in this endeavour, for the patience and support of Mr Louis Manoharan and Ms Preethi Belkese and for the efforts of Ms Sandra Grayson in coordinating the project. Most particularly, I would like to acknowledge the contribution of the copy-editor, Ms Carol Thomas, in translating my occasionally wayward thoughts into intelligible text.

This project could not have been undertaken without the support of my family. My wife Sue and sons Henry and William have, with patience, endured the interruption of many family holidays in the preparation of the manuscript. Most particularly, however, I would like to thank my parents, Jeff and Molly Rolt. Although their early lives were characterised by adversity, they unflinchingly strove to provide their three sons with the security and stability that enabled them to flourish. The fruits of their labours are to be seen in these pages.

Finally, it remains to acknowledge the contributions of those giants who have preceded the author in the great endeavour of optics. In humility, the author recognises it is their labours that populate the pages of this book. On the other hand, errors and omissions remain the sole responsibility of the author. The petty done, the vast undone…




Glossary


AC Alternating current

AFM Atomic force microscope

AM0 Air mass zero

AM1 Air mass one (atmospheric transmission)

ANSI American national standards institute

APD Avalanche photodiode

AR Antireflection (coating)

AS Astigmatism

ASD Acceleration spectral density

ASME American society of mechanical engineers

BBO Barium borate

BRDF Bi-directional reflection distribution function

BS Beamsplitter

BSDF Bi-directional scattering distribution function

CAD Computer aided design

CCD Charge coupled device

CD Compact disc

CGH Computer generated hologram

CIE Commission Internationale de l'Eclairage

CLA Confocal length aberration

CMM Co-ordinate measuring machine

CMOS Complementary metal oxide semiconductor

CMP Chemical mechanical planarisation

CNC Computer numerical control

CO Coma

COTS Commercial off-the-shelf

CTE Coefficient of thermal expansion

dB Decibel

DC Direct current

DFB Distributed feedback (laser)

DI Distortion

E-ELT European extremely large telescope

EMCCD Electron multiplying charge coupled device

ESA European space agency

f# F number (ratio of focal length to aperture diameter)

FAT Factory acceptance test

FC Field curvature

FEA Finite element analysis

FEL Filament emission lamp

FEL Free electron laser

FFT Fast Fourier transform

FRD Focal ratio degradation

FSR Free spectral range

FT Fourier transform

FTIR Fourier transform infra-red (spectrometer)

FTR Fourier transform (spectrometer)

FWHM Full width half maximum

GRIN Graded index (lens or fibre)

HEPA High-efficiency particulate air (filter)

HST Hubble space telescope

HWP Half waveplate

IEST Institute of environmental sciences and technology

IFU Integral field unit

IICCD Image intensifying charge coupled device

IR Infrared

ISO International standards organisation

JWST James Webb space telescope

KDP Potassium dihydrogen phosphate

KMOS K-band multi-object spectrometer

LA Longitudinal aberration

LCD Liquid crystal display

LED Light emitting diode

LIDAR Light detection and ranging

MTF Modulation transfer function

NA Numerical aperture

NASA National Aeronautics and Space Administration

NEP Noise equivalent power

NIRSPEC Near infrared spectrometer

NIST National institute of standards and technology (USA)

NMI National measurement institute

NPL National physical laboratory (UK)

NURBS Non-uniform rational basis spline

OPD Optical path difference

OSA Optical society of America

OTF Optical transfer function

PD Photodiode

PMT Photomultiplier tube

PPLN Periodically poled lithium niobate

PSD Power spectral density

PSF Point spread function

PTFE Polytetrafluoroethylene

PV Peak to valley

PVA Polyvinyl alcohol

PVr Peak to valley (robust)

QMA Quad mirror anastigmat

QTH Quartz tungsten halogen (lamp)

QWP Quarter waveplate

RMS Root mean square

RSS Root sum square

SA Spherical aberration

SI Système Internationale

SLM Spatial light modulator

SNR Signal to noise ratio

TA Transverse aberration

TE Transverse electric (polarisation)

TGG Terbium gallium garnet

TM Transverse magnetic (polarisation)

TMA Three mirror anastigmat

TMT Thirty metre telescope

USAF United States Air Force

UV Ultraviolet

VCSEL Vertical cavity surface emitting laser

VPH Volume phase hologram

WDM Wavelength division multiplexing

WFE Wavefront error

YAG Yttrium aluminium garnet

YIG Yttrium iron garnet

YLF Yttrium lithium fluoride




About the Companion Website


This book is accompanied by a companion website:

www.wiley.com/go/Rolt/opt-eng-sci






The website includes:



● Problem Solutions

● Spreadsheet tools


Scan this QR code to visit the companion website.









1

Geometrical Optics





1.1 Geometrical Optics – Ray and Wave Optics


In describing optical systems, in the narrow definition of the term, we might only consider systems that manipulate visible light. However, for the optical engineer, the application of the science of optics extends well beyond the narrow boundaries of human vision. This is particularly true for modern instruments, where reliance on the human eye as the final detector is much diminished. In practice, the term optical might also be applied to radiation that is manipulated in the same way as visible light, using components such as lenses, mirrors, and prisms. Therefore, the word ‘optical’, in this context might describe electromagnetic radiation extending from the vacuum ultraviolet to the mid-infrared (wavelengths from ∼120 to ∼10 000 nm) and perhaps beyond these limits. It certainly need not be constrained to the narrow band of visible light between about 430 and 680 nm. Figure 1.1 illustrates the electromagnetic spectrum.

Geometrical optics is a framework for understanding the behaviour of light in terms of the propagation of light as highly directional, narrow bundles of energy, or rays, with ‘arrow like’ properties. Although this is an incomplete description from a theoretical perspective, the use of ray optics lies at the heart of much of practical optical design. It forms the basis of the optical design software used to design complex optical instruments; geometrical optics therefore underpins much of modern optical engineering.

Geometrical optics models light entirely in terms of infinitesimally narrow beams of light or rays. It would be useful, at this point, to provide a more complete conceptual description of a ray. Excluding, for the purposes of this discussion, quantum effects, light may be satisfactorily described as an electromagnetic wave. These waves propagate through free space (vacuum) or some optical medium such as water and glass and are described by a wave equation, as derived from Maxwell's equations:




$$\nabla^2 E = \frac{n^2}{c^2}\,\frac{\partial^2 E}{\partial t^2}\qquad(1.1)$$

E is a scalar representation of the local electric field; c is the velocity of light in free space, and n is the refractive index of the medium.

Of course, in reality, the local electric field is a vector quantity and the scalar theory presented here is a useful initial simplification. Breakdown of this approximation will be considered later when we consider polarisation effects in light propagation. If one imagines waves propagating from a central point, the wave equation offers solutions of the following form:




$$E(r, t) = \frac{A}{r}\cos(kr - \omega t)\qquad(1.2)$$

Equation (1.2) represents a spherical wave of angular frequency, ω, and spatial frequency, or wavevector, k. The velocity with which the wave disturbance propagates is ω/k, or c/n. In free space, light propagates at the speed of light, c, a fundamental and defined constant in the SI system of units. Thus, the refractive index, n, is the ratio of the speed of light in free space to that in the specified medium. All points lying at the same distance, r, from the source will oscillate at an angular frequency, ω, and in the same phase. Successive surfaces, where all points are oscillating entirely in phase, are referred to as wavefronts and can be viewed as the crests of ripples emanating from a point disturbance. This is illustrated in Figure 1.2. This picture provides us with a more coherent definition of a ray. A ray is represented by the vector normal to the wavefront surface in the direction of propagation. Of course, Figure 1.2 represents a simple spherical wave, with waves spreading from a single point. However, in practice, wavefront surfaces may be much more complex than this. Nevertheless, the precise definition of a ray remains clear:






Figure 1.1 The electromagnetic spectrum.






Figure 1.2 Relationship between rays and wavefronts.



At any point in space in an optical field, a ray may be defined as the unit vector perpendicular to the surface of constant phase at that point with its sense lying in the same direction as that of the energy propagation.




1.2 Fermat's Principle and the Eikonal Equation


Intuition tells us that light ‘travels in straight lines’. That is to say, light propagates between two points in such a way as to minimise the distance travelled. More generally, in fact, all geometric optics is governed by a very simple principle along similar lines. Light always propagates between two points in space in such a way as to minimise the time taken. If we consider two points, A and B, and a ray propagating between them within a medium whose refractive index is some arbitrary function, n(r), of position then the time taken is given by:




$$t = \frac{1}{c}\int_A^B n(\mathbf{r})\,\mathrm{d}s\qquad(1.3)$$

c is the speed of light in vacuo and ds is an element of path between A and B.

This is illustrated in Figure 1.3.






Figure 1.3 Arbitrary ray path between two points.



Fermat's principle may then be stated as follows:

Light will travel between two points A and B such that the path taken represents a local minimum in the total optical path between these points.

Fermat's principle underlies all ray optics. All laws governing refraction and reflection of rays may be derived from Fermat's principle. Most importantly, to demonstrate the theoretical foundation of ray optics and its connection with physical or wave optics, Fermat's principle may be directly derived from the wave equation. This proof demonstrates that the path taken represents, in fact, a stationary solution with respect to other possible paths. That is to say, technically, the optical path taken could represent a local maximum or inflexion point rather than a minimum. However, for most practical purposes it is correct to say that the path taken represents the minimum possible optical path.

Fermat's principle is more formally set out in the Eikonal equation. Referring to Figure 1.2, instead of describing the light in terms of rays, it may be described by the wavefront surfaces themselves. The function S(r) describes the phase of the wave at any point and the Eikonal equation, which is derived from the wave equation, is set out thus:




$$\left|\nabla S(\mathbf{r})\right|^2 = \left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 + \left(\frac{\partial S}{\partial z}\right)^2 = n^2(\mathbf{r})\qquad(1.4)$$

The important point about the Eikonal equation is not the equation itself, but the assumptions underlying it. Derivation of the Eikonal equation assumes that the optical field varies over scales that are large compared to the wavelength of light. That is to say, the radius of curvature of the wavefronts should be significantly larger than the wavelength of light. Outside this regime the assumptions underlying ray optics are not justified. This is where the effects of the wave nature of light (i.e. diffraction) must be considered and we enter the realm of physical optics. For the time being, however, in the succeeding chapters we may consider that all optical systems are adequately described by geometrical optics.

So, for the purposes of this discussion, it is one simple principle, Fermat's principle, that provides the foundation for all ray optics. For the time being, we will leave behind specific consideration of the detailed behaviour of individual optical surfaces. In the meantime, we will develop a very generalised description of an idealised optical system that does not attribute specific behaviours to individual components. Later on, this ‘black box model’ will be used, in conjunction with Gaussian optics to provide a complete first order description of complex optical systems.




1.3 Sequential Geometrical Optics – A Generalised Description



In applying geometrical optics to a real system, we are attempting to determine the path of rays through the system. There are a few underlying characteristics that underpin most optical systems and help to simplify analysis. First, most optical systems are sequential. An optical system might comprise a number of different elements or surfaces, e.g. lenses, mirrors, or prisms. In a sequential optical system, the order in which light propagates through these components is unique and pre-determined. Second, in most practical systems, light is constrained with respect to a mechanical or optical axis of symmetry, the optical axis, as illustrated in Figure 1.4. In real optical systems, light is constrained by the use of physical apertures or ‘stops’; this will be discussed in more detail later.

Of course, in practice, the optical axis need not be a continuous, straight line through an optical system. It may be bent, or folded by mirrors or prisms. Nevertheless, there exists an axis throughout the system with respect to which the rays are constrained.






Figure 1.4 Constraint of rays with respect to optical axis.






Figure 1.5 Generalised optical system and conjugate points.






1.3.1 Conjugate Points and Perfect Image Formation


We consider an ideal optical system which consists of a point source of light, the object, and an optical system that collects the light and re-directs all rays emanating from this point source or object, such that the rays converge onto a single point, the image point. At this stage, the interior workings of the optical system are undefined; the system behaves as a ‘black box’. The object is said to be located in object space and the image in image space, and the pair of points are said to be conjugate points. This is illustrated in Figure 1.5.

In Figure 1.5, the two points P₁ and P₂ are conjugate. The optical system can be simple, for example a single lens, or it can be complex, containing many optical elements. The description above is entirely generalised. Where the object point lies on the optical axis, its image or conjugate point also lies on the optical axis. In Figure 1.5, the object point has a height of h₁ with respect to the optical axis and its corresponding image point has a height of h₂ with respect to the same axis. The ratio of these two heights gives the system (transverse) magnification, M:

$$M = \frac{h_2}{h_1}\qquad(1.5)$$

Points occupying a plane perpendicular to the optical axis are conjugate to points lying on another plane perpendicular to the optical axis. These planes are known as conjugate planes.




1.3.2 Infinite Conjugate and Focal Points


Where an image or object is located at infinity, all rays emerging from or travelling to these locations will be parallel with respect to each other. In this instance, the point located at infinity is said to be at an infinite conjugate. The corresponding conjugate point to the infinite conjugate is known as a focal point. There are two focal points. The first focal point is located in the object space with the corresponding image located at the infinite conjugate. The second focal point is located in the image space with the object placed at the infinite conjugate. Figure 1.6 depicts the first focal point:






Figure 1.6 Location of first focal point.



As well as focal points, there are two corresponding focal planes. The two focal planes are planes perpendicular to the optical axis that contain the relevant focal point. For all points lying on the relevant focal plane, the conjugate point will lie at the infinite conjugate. In other words, all rays will be parallel with respect to each other. In general, the rays will not be parallel to the optic axis. This would only be the case for a conjugate point lying on the optical axis.




1.3.3 Principal Points and Planes


All points lying on a particular conjugate plane are associated with a specific transverse magnification, M, which is equal to the ratio of the image and object heights. For an ideal system, there exist two conjugate planes where the magnification is unity. These are known as the principal planes. Thus, there are two principal planes and the points where the optical axis intersects the principal planes are known as principal points. The first principal point (plane) is located in object space and the second principal point (plane) is located in image space. The arrangement is illustrated schematically in Figure 1.7.






Figure 1.7 Principal points and principal planes.






1.3.4 System Focal Lengths


The reader might be used to ascribing a single focal length to an optical system, such as for a magnifying lens or a camera lens. However, in this general description, the system has two focal lengths. The first focal length, f₁, is the distance from the first focal plane (or point) to the first principal plane (or point) and the second focal length, f₂, is the distance from the second principal plane to the second focal plane. In many cases, f₁ and f₂ are identical. In fact, the ratio f₁/f₂ is equal to n₁/n₂, the ratio of the refractive indices of the media associated with the object and image spaces. However, this need not concern us at this stage, as the treatment presented here is entirely general and independent of the specific attributes of components or media.

In classical geometrical optics, the object location is denoted by the object distance, u, and the image location by the image distance, v. In the context of this general description, the object distance is simply the distance from the object to the first principal plane. Correspondingly, the image distance, v, is the distance from the second principal plane to the image. In addition, the object location can be described by the distance, x₁, separating the object from the corresponding focal plane. Similarly, x₂ represents the distance from the image to the second focal plane. This is illustrated in Figure 1.8.




1.3.5 Generalised Ray Tracing


This general description of an optical system is very economical in that the definition of conjugate points, focal planes, and principal planes provides sufficient information to determine the path of a ray in the image space, given the path of the ray in the object space. No assumptions are made about the internal workings of the optical system; it is merely a ‘black box’.

We have seen how input rays originating in the object space are mapped onto the image space for specific scenarios where the object is located at the first focal plane, the infinite conjugate, or the first principal plane. How can this be extended to determine the output path of any input ray? The general principle is set out in Figure 1.9. First, the input ray is traced from point P₁ as far as its intersection with the first principal plane at A₁. We know that this point, A₁, is conjugate to the point A₂, lying at the same height on the second principal plane. This follows directly from the definition of principal planes. Second, we draw a dummy ray originating from the first focal point, parallel to the input ray, and trace it to where it intersects the first principal plane at B₁. We know that B₁ is conjugate to the point B₂, lying at the same height on the second principal plane. Since this ray originated from the first focal point, its path must be parallel to the optical axis in image space and thus we can trace it as far as the second focal plane at P₂. Finally, since the object ray and dummy ray are parallel in object space, they must meet at the second focal plane in the image space. Therefore, we can trace the image ray to point P₂, providing a complete definition of the path of the ray in image space.






Figure 1.8 System focal lengths.






Figure 1.9 Tracing of arbitrary ray.






1.3.6 Angular Magnification and Nodal Points


The angular magnification of an optical system is the ratio of the angle (with respect to the optical axis) of a ray in image space and that of its conjugate in object space. There exists a pair of conjugate points lying on the optical axis where, for all possible rays, the angular magnification is unity. These are the nodal points. The first nodal point is located in object space and the second nodal point is located in image space. This is set out in Figure 1.10, where for a general conjugate pair, the angular magnification, α, is equal to θ₂/θ₁. For the nodal points, θ₂ = θ₁; that is to say, the angular magnification is unity. Where the two focal lengths are identical, or the object and image spaces are within media of the same refractive index, the nodal points are co-located with the principal points.






Figure 1.10 Angular magnification and nodal points.






1.3.7 Cardinal Points


This brief description has provided a complete definition of an ideal optical system. No matter how complex (or simple) the optical system, this analysis defines the complete end-to-end functionality of an ideal system. On this basis, an optical designer will specify the six cardinal points of a system to describe the ideal behaviour of a design. These six cardinal points are:



First Focal Point

Second Focal Point

First Principal Point

Second Principal Point

First Nodal Point

Second Nodal Point


The principal and nodal points are co-located if the two system focal lengths are identical.




1.3.8 Object and Image Locations – Newton's Equation


The location of the cardinal points has given us a complete description of a generalised optical system. Given that the function of an optical system might be to produce an image of an object located at a specific point, we might want to know the location of that image. Figure 1.11 shows the relationship between a generalised object and image.

Referring to Figure 1.11 and by using similar triangles, it is possible to derive two separate relations for the magnification h₂/h₁:

$$M = \frac{h_2}{h_1} = \frac{f_1}{x_1}\quad\text{and}\quad M = \frac{h_2}{h_1} = \frac{x_2}{f_2}$$

Figure 1.11 Generalised object and image.

And:

$$x_1 x_2 = f_1 f_2\qquad(1.6)$$

The above equation is Newton's Equation and may be re-cast into a more familiar form using the definitions of object and image distances, u and v, as previously set out.

$$\frac{f_1}{u} + \frac{f_2}{v} = 1\qquad(1.7)$$

If f₁ = f₂ = f, we are left with the more familiar lens equation. However, Eq. (1.7) is generally applicable to all optical systems. Most importantly, Eq. (1.7) will give the locations of the object and image in systems of arbitrary complexity. Many readers might have encountered Eq. (1.7) in the context of a simple lens, where object and image distances are obvious and easy to determine. For a more complex system, one must also know the location of the principal planes in order to determine the object and image distances.
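As a brief illustration (a minimal sketch with assumed values, not drawn from the text), Eq. (1.7) can be solved directly for the image distance:

```python
def image_distance(u, f1, f2):
    """Solve f1/u + f2/v = 1 (Eq. 1.7) for the image distance v.

    u  -- object distance from the first principal plane
    f1 -- first system focal length
    f2 -- second system focal length
    """
    # Rearranging: f2/v = 1 - f1/u, hence v = f2 / (1 - f1/u)
    return f2 / (1.0 - f1 / u)

# Example: a system in air (f1 = f2 = 100 mm) with the object 250 mm
# from the first principal plane.
v = image_distance(250.0, 100.0, 100.0)
print(f"Image distance: {v:.1f} mm")  # ~166.7 mm, as 1/u + 1/v = 1/f
```

Note that both distances are referenced to the principal planes, not to any physical surface of the system.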




1.3.9 Conditions for Perfect Image Formation – Helmholtz Equation


Thus far, we have presented a description of an idealised optical system. Is there a simple condition that needs to be fulfilled in order to generate such an ideal image? It is easy to see from Figure 1.11 that the following relations apply:

$$\tan\theta_1 = \frac{h_2}{f_1}\quad\text{and}\quad\tan\theta_2 = \frac{h_1}{f_2}$$

Therefore:

$$h_1 f_1\tan\theta_1 = h_2 f_2\tan\theta_2$$

As we will be able to show later, the ratio f₁/f₂ is equal to the ratio of the refractive indices, n₁/n₂, in the two media (object and image space). Therefore it is possible to cast the above equation in its more usual form, the Helmholtz equation:

$$n_1 h_1\tan\theta_1 = n_2 h_2\tan\theta_2\qquad(1.8)$$

One important consequence of the Helmholtz equation is that there is a clear, inextricable linkage between transverse and angular magnification: angular magnification is inversely proportional to transverse magnification. For small θ, tan θ and θ are approximately equal. So, in the small signal approximation, the angular magnification, α, is given by:

$$\alpha = \frac{\theta_2}{\theta_1} = \frac{n_1 h_1}{n_2 h_2}$$

Hence:

$$\alpha = \frac{n_1}{n_2}\,\frac{1}{M}\qquad(1.9)$$

We have, thus far, introduced two different types of optical magnification – transverse and angular. There is a third type of magnification that we need to consider: longitudinal magnification. Longitudinal magnification, L, is defined as the shift in the axial image position for a unit shift in the object position, i.e.:

$$L = \frac{dx_2}{dx_1}\qquad(1.10)$$

From Newton's Eq. (1.6):

$$x_2 = \frac{f_1 f_2}{x_1}$$

And:

$$L = -\frac{f_1 f_2}{x_1^2} = -\frac{f_2}{f_1}M^2 = -\frac{n_2}{n_1}M^2\qquad(1.11)$$

Thus, the longitudinal magnification is proportional to the square of the transverse magnification.
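The three magnifications are thus linked. As a small illustration (the function and values below are the author's own, not from the text), Eqs. (1.9) and (1.11) can be evaluated together:

```python
def magnifications(M, n1=1.0, n2=1.0):
    """Return (transverse, angular, longitudinal) magnification.

    alpha = (n1/n2) / M      (Eq. 1.9)
    L     = -(n2/n1) * M**2  (Eq. 1.11)
    """
    alpha = (n1 / n2) / M
    L = -(n2 / n1) * M ** 2
    return M, alpha, L

# A system in air with a transverse magnification of 2:
print(magnifications(2.0))  # (2.0, 0.5, -4.0)
# Doubling the image height halves the ray angles, while a small axial
# shift of the object produces a four-fold shift of the image.
```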




1.4 Behaviour of Simple Optical Components and Surfaces





1.4.1 General


The analysis presented thus far is entirely independent of the optical components that might populate the idealised optical system. In this section we will begin to consider, from the perspective of ray optics, the behaviour of real elements that make up this generalised system. At a basic level, only a few behaviours need to be considered in order to understand the propagation of rays through a real optical system. These are:



Propagation through a homogeneous medium

Refraction at a planar surface

Refraction at a curved (spherical) surface

Refraction through lenses

Reflection at a planar surface

Reflection at a curved (spherical) surface


As previously set out, the path of rays through a system is governed entirely by Fermat's principle. From this point, we will apply the simplest definition of Fermat's principle and assume that the time or optical path of rays is minimised. As far as propagation through a homogeneous medium is concerned, this leads to the perhaps obvious and trivial conclusion that light travels in straight lines. In fact, this describes a specific application of Fermat's principle, known as Hero's principle, namely that light follows the path of minimum distance between two points within a homogeneous medium.




1.4.2 Refraction at a Plane Surface and Snell's Law


The law governing refraction at a planar surface is universally attributed to Willebrord Snellius and referred to as Snell's law. This states that both incident and refracted rays lie in the same plane and their angles of incidence and refraction (with respect to surface normal) are given by:




$$n_1\sin\theta_1 = n_2\sin\theta_2\qquad(1.12)$$

This is illustrated in Figure 1.12.

The refractive indices of some optical materials (at 550 nm) are listed below:



Glass (BK7): 1.52

Plastic (Acrylic): 1.48

Water: 1.33

Air: 1.00027


Snell's law is, in fact, a direct consequence of Fermat's principle. The reader may wish to derive this through the application of differential calculus. In finding the optimum path from a point in one medium to a point in another medium, the ray will attempt, as far as possible, to minimise its path through the higher index medium. Snell's law thus represents the minimum optical path condition in this instance. Where the ray passes from a high index material to a low index material, there exists an angle of incidence where the angle of refraction is 90°. This angle is known as the critical angle and, for angles of incidence beyond this, the ray is totally internally reflected. The critical angle, θc, is given by:

Figure 1.12 Refraction at a plane surface.

$$\sin\theta_c = \frac{n_2}{n_1}\qquad(1.13)$$

A single refractive surface is an example of an afocal system, where both focal lengths are infinite. Although it does not bring a parallel beam of light to a focus, it does form an image that is a geometrically true representation of the object.
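The connection between Fermat's principle and Snell's law is easily verified numerically. The following sketch (the author's illustration, assuming a flat interface at y = 0 and arbitrary endpoints) minimises the optical path of a ray crossing the interface and recovers Eq. (1.12):

```python
import math

# Verify Snell's law (Eq. 1.12) numerically from Fermat's principle.
# A ray travels from A in medium n1 to B in medium n2, crossing a flat
# interface at y = 0; we minimise the optical path over the crossing
# point x with a simple golden-section search.
n1, n2 = 1.0, 1.5
A = (0.0, 1.0)   # start point, in medium 1 (y > 0)
B = (2.0, -1.0)  # end point, in medium 2 (y < 0)

def optical_path(x):
    # n1 * |A -> (x, 0)| + n2 * |(x, 0) -> B|
    d1 = math.hypot(x - A[0], A[1])
    d2 = math.hypot(B[0] - x, B[1])
    return n1 * d1 + n2 * d2

lo, hi = 0.0, 2.0
g = (math.sqrt(5) - 1) / 2
for _ in range(100):
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    if optical_path(x1) < optical_path(x2):
        hi = x2
    else:
        lo = x1
x = 0.5 * (lo + hi)

theta1 = math.atan2(x - A[0], A[1])    # angle of incidence to the normal
theta2 = math.atan2(B[0] - x, -B[1])   # angle of refraction to the normal
print(n1 * math.sin(theta1), n2 * math.sin(theta2))  # equal, per Snell's law
```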




1.4.3 Refraction at a Curved (Spherical) Surface


Most, if not all, curved optical surfaces are at least approximately spherical and are widely employed in the fabrication of lens components. Figure 1.13 illustrates refraction at a spherical surface.

As before, the special case of refraction at a spherical surface may be described by Snell's law:

$$n_1\sin\theta_1 = n_2\sin\theta_2$$

If we now wish to calculate the angle φ in terms of θ, this process is, in principle, straightforward. We need also to take into account the angle the surface normal makes with the optical axis, Δ, and the radius of curvature, R, of the spherical surface. However, the calculation is a little unwieldy, so we make the simplifying assumption that all angles are small and:

$$\sin\theta \approx \theta\quad\text{and}\quad\Delta \approx \frac{h}{R}$$

Figure 1.13 Refraction at a spherical surface.

The angles of incidence and refraction with respect to the local surface normal are then θ₁ = θ + Δ and θ₂ = φ + Δ. Hence:

$$n_1\left(\theta + \frac{h}{R}\right) = n_2\left(\varphi + \frac{h}{R}\right)$$

We can finally calculate φ in terms of θ:

$$\varphi = \frac{n_1}{n_2}\theta - \frac{(n_2 - n_1)}{n_2}\frac{h}{R}\qquad(1.14)$$

There are two terms on the RHS of Eq. (1.14). The first term, depending on the input angle θ, is of the same form as Snell's law (for small angles) for a plane surface. The second term, which gives an angular deflection proportional to the height, h, and inversely proportional to the radius of curvature, R, provides a focusing effect. That is to say, rays further from the optic axis are bent inward to a greater extent and have a tendency to converge on a common point. The sign convention used here assumes that positive height is vertically upward, as displayed in Figure 1.13, and a positive spherical radius corresponds to a scenario in which the centre of the sphere lies to the right of the point where the surface intersects the optical axis. Finally, a positive angle is consistent with an increase in ray height as the ray propagates from left to right, as in Figure 1.13.

Equation (1.14) can be used to trace any ray that is incident upon a spherical refractive surface. If this surface is deemed to comprise ‘the optical system’ in its entirety, then one can use Eq. (1.14) to calculate the location of all Cardinal Points, expressed as a displacement, z, along the optical axis. Positive z is to the right and the origin lies at the intersection of the optical axis and the surface. The Cardinal Points are listed below.

Cardinal points for a spherical refractive surface:

First focal point: z = −n₁R/(n₂ − n₁)    First focal length: f₁ = n₁R/(n₂ − n₁)
Second focal point: z = n₂R/(n₂ − n₁)    Second focal length: f₂ = n₂R/(n₂ − n₁)
First principal point: z = 0             Second principal point: z = 0
First nodal point: z = R                 Second nodal point: z = R

In this instance, the two focal lengths, f₁ and f₂, are different, since the object and image spaces are in different media. If we take the first focal length as the distance from the first focal point to the first principal point, then the first focal length is positive. Similarly, the second focal length, the distance from the second principal point to the second focal point, is also positive. The principal points are both located at the surface vertex and the nodal points at the centre of curvature of the sphere. It is important to note that, in this instance, the principal and nodal points do not coincide. Again, this is because the refractive indices of object and image space differ.
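The tabulated locations translate directly into code. A minimal sketch (the author's illustration, assuming an air-to-glass surface with R = +50 mm) follows:

```python
def spherical_surface_cardinal_points(n1, n2, R):
    """Cardinal points of a single refracting sphere (z measured from vertex).

    Formulae follow the table above: f1 = n1*R/(n2 - n1), f2 = n2*R/(n2 - n1),
    principal points at the vertex, nodal points at the centre of curvature.
    """
    f1 = n1 * R / (n2 - n1)
    f2 = n2 * R / (n2 - n1)
    return {
        "first_focal_point": -f1,   # to the left of the vertex
        "second_focal_point": f2,   # to the right of the vertex
        "principal_points": 0.0,    # both at the vertex
        "nodal_points": R,          # both at the centre of curvature
        "f1": f1,
        "f2": f2,
    }

# Air-to-glass surface, R = +50 mm:
cp = spherical_surface_cardinal_points(1.0, 1.5, 50.0)
print(cp["f1"], cp["f2"])  # 100.0, 150.0; note f1/f2 = n1/n2
```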




1.4.4 Refraction at Two Spherical Surfaces (Lenses)


Figure 1.14 shows a lens made up of two spherical surfaces, of radius R₁ and R₂. Once again, the convention is that the spherical radius is positive if the centre of curvature lies to the right of the relevant vertex.

So, in the biconvex lens illustrated in Figure 1.14, the first surface has a positive radius of curvature and the second surface has a negative radius of curvature. The lens is made from a material of refractive index n₂ and is bounded by two surfaces with radii of curvature R₁ and R₂ respectively. It is immersed totally in a medium of refractive index n₁ (e.g. air). In addition, it is assumed that the lens has negligible thickness (the thin lens approximation). Of course, as for the treatment of the single curved surface, we assume all angles are small and θ ∼ sin θ. First, we might calculate the angle of refraction, φ₁, produced by the first curved surface, R₁. This can be calculated using Eq. (1.14):

$$\varphi_1 = \frac{n_1}{n_2}\theta - \frac{(n_2 - n_1)}{n_2}\frac{h}{R_1}$$

Figure 1.14 Refraction by two spherical surfaces (lens).

Of course, the final angle, φ, can be calculated from φ₁ by another application of Eq. (1.14), this time passing from the medium of index n₂ to that of index n₁ at the second surface, R₂:

$$\varphi = \frac{n_2}{n_1}\varphi_1 - \frac{(n_1 - n_2)}{n_1}\frac{h}{R_2}$$

Substituting for φ₁ we get:

$$\varphi = \theta - \frac{(n_2 - n_1)}{n_1}\left(\frac{1}{R_1} - \frac{1}{R_2}\right)h\qquad(1.15)$$

As for Eq. (1.14), there are two parts to Eq. (1.15). First, there is an angular term that is equal to the incident angle. Second, there is a focusing contribution that produces a deflection proportional to ray height. Equation (1.15) allows the tracing of all rays in a system containing the single lens and it is straightforward to calculate the Cardinal Points of the thin lens.

Cardinal points for a thin lens:

First focal point: z = −f        First focal length: f₁ = f
Second focal point: z = +f       Second focal length: f₂ = f
Principal points: z = 0 (both at the lens)
Nodal points: z = 0 (both at the lens)

where

$$\frac{1}{f} = \frac{(n_2 - n_1)}{n_1}\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$$

Since both object and image spaces are in the same medium, both focal lengths are equal and the principal and nodal points are co-located. One can take the above expression for focal length and cast it in a more conventional form as a single focal length, f. This gives the so-called Lensmaker's Equation, where it is assumed that the surrounding medium (air) has a refractive index of one (i.e. n₁ = 1) and we substitute n for n₂:

$$\frac{1}{f} = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)\qquad(1.16)$$
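As a minimal sketch (assumed values; the function name is the author's own), Eq. (1.16) in code:

```python
def thin_lens_focal_length(n, R1, R2):
    """Lensmaker's Equation (Eq. 1.16) for a thin lens in air.

    Sign convention: a radius is positive if its centre of curvature
    lies to the right of the surface vertex.
    """
    return 1.0 / ((n - 1.0) * (1.0 / R1 - 1.0 / R2))

# Biconvex lens with |R| = 100 mm on each side and n = 1.5:
print(thin_lens_focal_length(1.5, 100.0, -100.0))  # 100.0 (mm)
```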




1.4.5 Reflection by a Plane Surface


Figure 1.15 shows the process of reflection at a plane surface. As in the previous case of refraction, the reflected ray lies in the same plane as the incident ray and the angle of reflection is equal and opposite to the angle of incidence.






Figure 1.15 Reflection at a plane surface.



The virtual projected ray shown in Figure 1.15 illustrates an important point about reflection. If one considers the process as analogous to refraction, then a mirror behaves as a refractive material with an index of −1. This, in itself, has an important consequence: the image produced is inverted in space. As such, there is no combination of positive magnification and pure rotation that will map the image onto the object. That is to say, a right-handed object will be converted into a left-handed image. More generally, if an optical system contains an odd number of reflective elements, the parity of the image will be reversed. So, for example, if a complex optical system were to contain nine reflective elements in the optical path, then the resultant image could not be generated from the object by rotation alone. Conversely, if the optical system were to contain an even number of reflective surfaces, then the parity between the object and image geometries would be conserved.

Another way in which a plane mirror differs from a plane refractive surface is that a plane mirror is the one (and perhaps only) example of a perfect imaging system. Regardless of the small angle approximations discussed previously, following reflection at a planar surface, all rays diverging from a single object point will, when projected as in Figure 1.15, be seen to emerge exactly from a single image point.




1.4.6 Reflection from a Curved (Spherical) Surface


Figure 1.16 illustrates the reflection of a ray from a curved surface.

The incident ray is at an angle, θ, with respect to the optical axis and the reflected ray is at an angle, φ, to the optical axis. If we designate the incident angle as θ₁ and the reflected angle as θ₂ (both with respect to the local surface normal), then the following apply, assuming all relevant angles are small:

$$\theta_1 = \theta + \frac{h}{R}\quad\text{and}\quad\theta_2 = -\theta_1$$


Figure 1.16 Reflection from a curved surface.



We now need to calculate the angle, φ, the reflected ray makes to the optical axis:

$$\varphi = \theta_2 - \frac{h}{R} = -\theta - \frac{2h}{R}\qquad(1.17)$$

In form, Eq. (1.17) is similar to Eq. (1.14), with a linear dependence of the reflected ray angle on both incident ray angle and height. The two equations may be made to correspond exactly if we make the substitution n₁ = 1, n₂ = −1. This runs in accord with the empirical observation made previously that a reflective surface acts like a medium with a refractive index of −1. Once more, the sign convention observed dictates that positive axial displacement, z, is in the direction from left to right and positive height is vertically upwards. A ray with a positive angle, θ, has a positive gradient in h with respect to z.

As with the curved refractive surface, a curved mirror is image forming. It is therefore possible to set out the Cardinal Points, as before.

Cardinal points for a spherical mirror:

First focal point: z = R/2       First focal length: f₁ = −R/2
Second focal point: z = R/2      Second focal length: f₂ = R/2
Principal points: z = 0 (both at the vertex)
Nodal points: z = R (both at the centre of curvature)

The focal length of a curved mirror is half the base radius, with both focal points co-located. In fact, the two focal lengths are of opposite sign. Again, this fits in with the notion that reflective surfaces act as media with a refractive index of −1. Both nodal points are co-located at the centre of curvature and the principal points are also co-located at the surface vertex.




1.5 Paraxial Approximation and Gaussian Optics


Earlier, in order to make our lens and mirror calculations simple and tractable, we introduced the following approximation:

$$\sin\theta \approx \tan\theta \approx \theta$$

That is to say, all rays make a sufficiently small angle to the optical axis to make the above approximation acceptable in practice. When this approximation is applied more generally to an entire optical system, it is referred to as the paraxial approximation (i.e. ‘almost axial’). If the same consideration is applied to ray heights as well as angles, the paraxial approximations lead to a series of equations describing the transformation of ray heights and angles that are linear in both ray height and angle. This first order theory is generally referred to as Gaussian optics, named after Carl Friedrich Gauss.
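To give a feel for the scale of this approximation, the short fragment below (the author's illustration) tabulates the relative error of sin θ ≈ θ for a few ray angles:

```python
import math

# How good is the paraxial approximation? Tabulate the relative error
# of sin(theta) ~ theta for a few ray angles.
for deg in (1, 5, 10, 20):
    theta = math.radians(deg)
    err = (theta - math.sin(theta)) / math.sin(theta)
    print(f"{deg:>3} deg: relative error {err:.2%}")
# ~0.005% at 1 deg, ~0.5% at 10 deg, ~2% at 20 deg: the approximation
# degrades quickly for rays making large angles with the optical axis.
```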

If we now assume that all rays are confined to a single plane containing the optical axis, then we can describe all rays by two parameters: θ, the angle the ray makes to the optical axis, and h, the height above the optical axis. If, after transformation by an optical surface, these parameters change to θ′ and h′, it is possible to write down a series of linear equations describing all transformations. These are set out in Eqs. (1.18)–(1.21).

Refraction at a plane surface:

$$\theta' = \frac{n_1}{n_2}\theta;\qquad h' = h\qquad(1.18)$$

Refraction at a spherical surface of radius R:

$$\theta' = \frac{n_1}{n_2}\theta - \frac{(n_2 - n_1)}{n_2}\frac{h}{R};\qquad h' = h\qquad(1.19)$$

Reflection at a plane mirror:

$$\theta' = -\theta;\qquad h' = h\qquad(1.20)$$

Reflection at a spherical mirror of radius R:

$$\theta' = -\theta - \frac{2h}{R};\qquad h' = h\qquad(1.21)$$

Even the most complex optical system may be described as a combination of all the above elements. At first sight, therefore, it would seem that this provides a complete description of the first order behaviour of an optical system. However, there is one important, but seemingly trivial, aspect that is not considered here. This is the case of ray propagation through space. The equations are, of course, simple and obvious, but we include them for completeness. For propagation through a distance d along the optical axis:

$$h' = h + d\theta;\qquad \theta' = \theta\qquad(1.22)$$

Equation (1.8) introduced the Helmholtz equation, a necessary condition for perfect image formation in an ideal system. It is clear that Gaussian optics represents a mere approximation to the ideal of the Helmholtz equation. The contradiction between the two suggests that there may be imperfections in the ideal treatment of Gaussian optics. This will be considered later when we look at optical imperfections or aberrations. In the meantime, we will consider a very powerful realisation of Gaussian optics that takes the basic linear equations previously set out and expresses them in terms of matrix algebra. This is the so-called Matrix Ray Tracing technique.




1.6 Matrix Ray Tracing





1.6.1 General


In Section 1.4 we looked at the behaviour of some very simple components, mirrors and lenses, deriving the locations of the Cardinal Points. As discussed previously, the Cardinal Points provide a complete description of the first order properties of an optical system, no matter how complex.

The question then is how do we calculate the properties of a more complex optical system, such as the camera lens depicted in Figure 1.17? It is not immediately obvious where the Cardinal Points lie or what the focal length is. However, we can combine the generalised description of an optical system with the treatment of Gaussian optics to produce a model that describes the entire system as a black box acting on rays with a simple linear transformation. The black box may be visualised as in Figure 1.18.

Following the basic premise of Gaussian optics, we can relate the input and output rays using a set of linear equations:




(1.23) (#x8_x_8_i262)




(1.24) (#x8_x_8_i262)






Figure 1.17 Complex optical system.






Figure 1.18 Modelling of complex systems.



Equations (1.23) and (1.24) may be combined in a matrix representation:

$$\begin{pmatrix}\theta'\\ h'\end{pmatrix} = \begin{pmatrix}A & B\\ C & D\end{pmatrix}\begin{pmatrix}\theta\\ h\end{pmatrix}\qquad(1.25)$$

Equation (1.25) sets out the Matrix Ray Tracing convention used in this book. The reader should be aware that other conventions are used, but this is the most widely used. Equation (1.25) can be used to describe the overall system matrix or that of individual components. The question is how to build up a complex system from a large number of optical elements. The camera lens shown in Figure 1.17 has six lenses and we might represent each lens as a single matrix, i.e. M₁, M₂, ….., M₆. Each matrix describes the relationship between rays incident upon the lens and those leaving. The impact of successive optical elements is determined by successive matrix multiplication. So the system matrix for the lens as a whole is given by the matrix product of all elements:

$$M_{system} = M_6 \times M_5 \times M_4 \times M_3 \times M_2 \times M_1\qquad(1.26)$$

Note the order of the multiplication; this is important. M₁ represents the first optical element seen by rays incident upon the system and the multiplication procedure then works through elements 2–6 successively. For purposes of illustration, each lens has been treated as being represented by a single matrix element. In practice, it is likely that the lens would be reduced to its basic building blocks, namely the two curved surfaces plus the propagation (thickness) between the two surfaces. We also must not forget the propagation through the air between the lens elements.

Representation of the key optical surfaces can be determined by casting Eqs. (1.18)–(1.22) in matrix format.

Translation through distance d:

$$\begin{pmatrix}1 & 0\\ d & 1\end{pmatrix}\qquad(1.27a)$$

Refraction at a plane surface:

$$\begin{pmatrix}n_1/n_2 & 0\\ 0 & 1\end{pmatrix}\qquad(1.27b)$$

Refraction at a spherical surface of radius R:

$$\begin{pmatrix}n_1/n_2 & -\dfrac{(n_2 - n_1)}{n_2 R}\\ 0 & 1\end{pmatrix}\qquad(1.27c)$$

Thin lens of focal length f:

$$\begin{pmatrix}1 & -1/f\\ 0 & 1\end{pmatrix}\qquad(1.27d)$$

Reflection at a plane mirror:

$$\begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix}\qquad(1.27e)$$

Reflection at a spherical mirror of radius R:

$$\begin{pmatrix}-1 & -2/R\\ 0 & 1\end{pmatrix}\qquad(1.27f)$$

n₁ and n₂ represent the refractive index of the first and second media respectively.
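These component matrices translate directly into code. The sketch below (the author's illustration of the above convention, using NumPy) builds a thin biconvex lens from its two surfaces and composes the system matrix by left-multiplication, per Eq. (1.26):

```python
import numpy as np

# Component matrices in the (theta, h) convention of Eq. (1.25).
def translation(d):
    return np.array([[1.0, 0.0], [d, 1.0]])

def refraction_sphere(n1, n2, R):
    return np.array([[n1 / n2, -(n2 - n1) / (n2 * R)], [0.0, 1.0]])

# System matrix: left-multiply in the order light meets each element.
# Example: a thin biconvex lens modelled as two surfaces in contact.
elements = [
    refraction_sphere(1.0, 1.5, 100.0),   # first surface, R1 = +100 mm
    refraction_sphere(1.5, 1.0, -100.0),  # second surface, R2 = -100 mm
]
M = np.eye(2)
for element in elements:
    M = element @ M

f = -1.0 / M[0, 1]  # second focal length from the B element (Section 1.6.2)
print(f)  # 100.0 mm, matching the Lensmaker's Equation
```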




1.6.2 Determination of Cardinal Points


It is very straightforward to calculate the Cardinal Points of a system from the system matrix:

$$\begin{pmatrix}\theta'\\ h'\end{pmatrix} = \begin{pmatrix}A & B\\ C & D\end{pmatrix}\begin{pmatrix}\theta\\ h\end{pmatrix}$$

The matrix above represents the system matrix after propagating through all optical elements, as shown in Figure 1.17. However, the convention adopted here is that an additional transformation is added after the final surface. This additional transformation is free space propagation back to the original starting point. It must be emphasised that this is merely a convention and that the final step traces a dummy ray as opposed to a real ray. That is to say, in reality, the light does not propagate backwards to this point. In fact, this step is a virtual back-projection of the real ray which preserves the original ray geometry. The logic of this, as will be seen, is that in any subsequent analysis, the location of all cardinal points is referenced with respect to a common starting point. If this step were dispensed with, then the three first Cardinal Points would be referenced to the start point and the three second Cardinal Points to the end point. With this in mind, the Cardinal Points, as referenced to the common start point, are set out below; the reader might wish to confirm this.

First focal point: z = A/B
Second focal point: z = −D/B
First principal point: z = [A − (AD − BC)]/B
Second principal point: z = (1 − D)/B
First nodal point: z = (A − 1)/B
Second nodal point: z = [(AD − BC) − D]/B
First focal length: f₁ = −(AD − BC)/B
Second focal length: f₂ = −1/B

The determinant of the matrix, (AD − BC), is a key parameter: the ratio of the two focal lengths of the system is simply given by the determinant. That is to say:




f1/f2 = AD − BC (1.28)

Inspecting all matrix expressions in Eqs. (1.27a)–(1.27f), the determinant of the matrix is simply n1/n2, the ratio of the indices in the two media, for all possible scenarios. Since the determinant of a matrix product is simply the product of the individual determinants, the determinant of the overall system matrix is simply the ratio of the refractive indices in object and image space. Thus:

AD − BC = n_object/n_image (1.29)

This relationship was anticipated in the more generalised discussion in Section 1.3.9. Looking at the relationships for the principal and nodal points, it is clear that, when the determinant of the system matrix is unity, i.e. object and image space indices are the same, the principal and nodal points are co-located.
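These formulae translate directly into a short helper. The function below is a minimal sketch that simply evaluates the expressions tabulated above for a given 2 × 2 system matrix; all returned locations are measured from the common origin:

def cardinal_points(M):
    # M = [[A, B], [C, D]], input and output referenced to a common origin.
    (A, B), (C, D) = M
    det = A * D - B * C
    return {
        "F1": D / C, "F2": -A / C,
        "P1": (D - det) / C, "P2": (1.0 - A) / C,
        "N1": (D - 1.0) / C, "N2": (det - A) / C,
        "f1": -det / C, "f2": -1.0 / C,
    }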

In addition to the principal and nodal points, anti-principal points and anti-nodal points are sometimes (rarely) specified. Anti-principal points are conjugate points where the magnification is −1. Similarly, anti-nodal points are conjugate points where the angular magnification is −1.




1.6.3 Worked Examples



We can now use the foregoing analysis to see how matrix ray tracing might be used in practice. Here we focus on a number of useful practical examples.






Figure 1.19 Thick lens.






Worked Example 1.1 Thick Lens


The matrix for the system is simply as below – note the order:

M = [1, −t; 0, 1] [1, 0; (n − 1)/R2, n] [1, t; 0, 1] [1, 0; −(n − 1)/(nR1), 1/n]

We have two translations. The first translation represents the thickness of the lens and the second translation, by convention, traces the refracted rays back to the origin in z. This is so that, in interpreting the formulae for Cardinal points, we can be sure that they are all referenced to a common origin, located as in Figure 1.19. Positive axial displacement (z) is to the right and a positive radius, R, is where the centre of curvature lies to the right of the vertex. The final matrix is as below:








As both object and image space are in the same media, there is a common focal length, f, i.e. f1 = f2 = f. All relevant parameters are calculated from the above matrix using the formulae tabulated in Section 1.6.2.

The focal length, f, is given by:

1/f = (n − 1)[1/R1 − 1/R2 + (n − 1)t/(nR1R2)]
The formula above is similar to the simple 'Lensmaker' formula for a thin lens, with an additional term, linear in the thickness, t, which accounts for the finite thickness of the lens.

The focal positions are as follows:

z(F1) = D/C = −f[1 + (n − 1)t/(nR2)]

z(F2) = −A/C = f[1 − (n − 1)t/(nR1)] + t

The principal points are as follows:

z(P1) = −f(n − 1)t/(nR2)

z(P2) = t − f(n − 1)t/(nR1)





Figure 1.20 Hubble space telescope schematic.



Of course, since the refractive indices of the object and image spaces are identical, the nodal points are located in the same place as the principal points. If we take the example of a biconvex lens where R1 = −R2 = R, then:

z(P1) ≈ t/(2n) and z(P2) ≈ t − t/(2n)

(taking the thin lens approximation, f ≈ R/[2(n − 1)]).
So, for a biconvex lens with a refractive index of 1.5, the principal points lie about one third of the thickness from their respective vertices.
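The thick lens example is easily checked numerically. The following sketch reuses the element matrix forms outlined earlier and assumes an illustrative biconvex lens with n = 1.5, R1 = +50 mm, R2 = −50 mm and t = 6 mm (my own example values, not from the text):

import numpy as np

n, R1, R2, t = 1.5, 50.0, -50.0, 6.0
S1 = np.array([[1.0, 0.0], [-(n - 1.0) / (n * R1), 1.0 / n]])  # air -> glass
S2 = np.array([[1.0, 0.0], [(n - 1.0) / R2, n]])               # glass -> air
T = np.array([[1.0, t], [0.0, 1.0]])                           # lens thickness
T_back = np.array([[1.0, -t], [0.0, 1.0]])                     # dummy back-trace

M = T_back @ S2 @ T @ S1
A, C, D = M[0, 0], M[1, 0], M[1, 1]
print(-1.0 / C)        # focal length, ~51.0 mm
print((D - 1.0) / C)   # P1, ~2.0 mm, i.e. about t/3 from the front vertex
print((1.0 - A) / C)   # P2, ~4.0 mm, i.e. about t/3 inside the back vertex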




Worked Example 1.2 Hubble Space Telescope


The telescope part of the Hubble Space Telescope instrument is made up of two mirrors, a primary and a secondary. Characteristics of the telescope are shown in Figure 1.20. Data is courtesy of the National Aeronautics and Space Administration.

There are four matrix elements to consider here. First, there is a mirror with a radius of −11.04 m (note sign), followed by a translation of −4.905 m (again note sign). The third matrix element is a mirror (M2) of radius −1.359 m. Finally, we translate by +4.905 m, so that both the input and output co-ordinates are referenced with respect to the same origin. The matrices are as below:

M1 = [1, 0; 2/11.04, −1]; T1 = [1, −4.905; 0, 1]; M2 = [1, 0; 2/1.359, −1]; T2 = [1, 4.905; 0, 1]

M_system = T2 M2 T1 M1 = [0.0271, 45.2; −0.0172, 8.219]

The focal positions are:

z(F1) = D/C = −478 m; z(F2) = −A/C = 1.57 m

The principal points are at:

z(P1) = [D − (AD − BC)]/C = −420 m; z(P2) = (1 − A)/C = −56.6 m

The focal length is f = −1/C = 58.2 m.
Since object and image space are in the same media, the two focal lengths are the same. In addition, the nodal and principal points are co-located. However, when dealing with mirrors, one must be a little cautious. Each reflection is equivalent to a medium with a refractive index of −1, so that the matrix of a reflective surface will always have a determinant of −1. Therefore, for any system having an even number of reflective surfaces, as in this example, the matrix will have a determinant of 1. As such, the two focal lengths will be the same and the principal and nodal points co-located. However, where there is an odd number of reflective surfaces, assuming object and image spaces are surrounded by the same media, then f1 = −f2. In this instance, principal and nodal points are separated by twice the focal length.

Although, in terms of overall length, the telescope is compact, with a ∼5 m primary–secondary separation, the focal length, at 58 m, is long. The focal length of the instrument is fundamental in determining the 'plate scale', i.e. the separation of imaged objects (stars, galaxies) at the (second) focal plane as a function of their angular separation. As such, a long focal length, of the order of 60 m, may have been a requirement at the outset. At the same time, for practical reasons, a compact design may also have been desired. One may begin to glimpse, therefore, the significance of these very basic calculations, at the very outset, in the design of complex optical instruments.
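The calculation is easily reproduced numerically. A minimal sketch (Python with numpy, using the mirror matrix form discussed above) recovers the quoted focal length:

import numpy as np

def mirror(R):
    # Reflection: determinant -1, as discussed above.
    return np.array([[1.0, 0.0], [-2.0 / R, -1.0]])

def translation(d):
    return np.array([[1.0, d], [0.0, 1.0]])

# Primary R = -11.04 m, secondary R = -1.359 m, separation 4.905 m.
M = translation(4.905) @ mirror(-1.359) @ translation(-4.905) @ mirror(-11.04)
print(-1.0 / M[1, 0])      # ~58 m focal length
print(np.linalg.det(M))    # +1: an even number of reflections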




1.6.4 Spreadsheet Analysis


For the examples previously set out, matrix multiplication is a quick and convenient method for calculating the first order parameters of an optical system. Nonetheless, it must be recognised that, as systems become more complex, with more optical surfaces, these calculations can become quite tedious. However, these matrix calculations are easy to embed in spreadsheet tools, enabling the automatic computation of all cardinal points. By way of example, the previous calculation is set out and automated using a simple spreadsheet tool.








In the exercises that follow, the reader may choose to use this method to simplify calculations.




Further Reading


Born, M. and Wolf, E. (1999). Principles of Optics, 7e. Cambridge: Cambridge University Press. ISBN: 0-521-64222-1.

Haija, A.I., Numan, M.Z., and Freeman, W.L. (2018). Concise Optics: Concepts, Examples and Problems. Boca Raton: CRC Press. ISBN: 978-1-1381-0702-1.

Hecht, E. (2017). Optics, 5e. Harlow: Pearson Education. ISBN: 978-0-1339-7722-6.

Keating, M.P. (1988). Geometric, Physical, and Visual Optics. Boston: Butterworths. ISBN: 978-0-7506-7262-7.

Kidger, M.J. (2001). Fundamental Optical Design. Bellingham: SPIE. ISBN: 0-8194-3915-0.

Kloos, G. (2007). Matrix Methods for Optical Layout. Bellingham: SPIE. ISBN: 978-0-8194-6780-5.

Longhurst, R.S. (1973). Geometrical and Physical Optics, 3e. London: Longmans. ISBN: 0-582-44099-8.

Riedl, M.J. (2009). Optical Design: Applying the Fundamentals. Bellingham: SPIE. ISBN: 978-0-8194-7799-6.

Saleh, B.E.A. and Teich, M.C. (2007). Fundamentals of Photonics, 2e. New York: Wiley. ISBN: 978-0-471-35832-9.

Smith, F.G. and Thompson, J.H. (1989). Optics, 2e. New York: Wiley. ISBN: 0-471-91538-1.

Smith, W.J. (2007). Modern Optical Engineering. Bellingham: SPIE. ISBN: 978-0-8194-7096-6.

Walker, B.H. (2009). Optical Engineering Fundamentals, 2e. Bellingham: SPIE. ISBN: 978-0-8194-7540-4.




2

Apertures, Stops, and Simple Instruments





2.1 Function of Apertures and Stops


In the previous chapter, we were introduced to sequential geometric optics. The simple analysis presented there is contingent upon the paraxial approximation. It is assumed that all rays in their sequential progress through the optical system always subtend a negligibly small angle with respect to the optical axis. In this scenario, the effect of all optical elements may be described in terms of a simple set of linear (in ray height and angle) equations leading to perfect image formation. This analysis, as previously outlined, is referred to as Gaussian optics.

Of course, for real, non-ideal imaging systems, the assumptions underlying the paraxial approximation break down. An inevitable consequence of this is the creation of imperfections or aberrations in the formation of images. A full treatment of these optical aberrations forms the subject of succeeding chapters. In the meantime, consideration of the paraxial approximation might suggest that these imperfections or aberrations would be enhanced for rays that make a large angle with respect to the optical axis. It seems sensible, therefore, to restrict rays emanating from an object to a specific, restricted range of angles. In practice, for most systems, this is done by inserting an opaque obstruction with a circular aperture. This circular aperture, centred on the optical axis, is known as the aperture stop and restricts the rays emanating from an object. To further control scattered light, the aperture stop is usually blackened in some manner.

In addition to selecting rays close to the optical axis and thus reducing imperfections, aperture stops also control and define the amount of light entering an optical system. This will be explored in more detail in the chapters relating to radiometry or the study of the analysis and measurement of optical flux. Naturally, the larger the aperture, then the more light is passed through the system. Most usually, the system aperture is formed by a purpose made mechanical aperture that is distinct from the optical elements themselves. However, on occasion, the system aperture may be formed by the physical boundary of an optical component, such as a lens or a mirror. This is true, for example, for a reflecting or refracting telescope, where the boundary of the first, or primary mirror, forms the aperture stop.




2.2 Aperture Stops, Chief, and Marginal Rays


This principle is illustrated in Figure 2.1 which shows an object together with a corresponding aperture stop. Note that the centre of the aperture stop corresponds to the intersection of its plane with the optical axis.

The aperture stop plays an important role in image formation and the analysis of optical systems. There are a number of important definitions relating to the aperture stop and its location. Of key significance is the chief ray, which is a ray that emanates from the object and intersects the plane of the aperture stop at its centre, located on the optical axis. The angle, θ, that this ray makes with respect to the optical axis is known as the field angle. Another ray of critical importance is the marginal ray, which emanates from the point where the object plane intersects the optic axis and strikes the edge of the aperture. The angle, Δ, the marginal ray makes with the axis effectively defines the half angle of the cone of light emerging from a single on-axis point at the object plane and admitted by the aperture stop. The size of the aperture stop may be described either by its physical size or by the angle subtended. In the latter case, one of the most common ways of describing the aperture of an optical system is in terms of the numerical aperture (NA). The numerical aperture is the product of the local refractive index, n, and the sine of the marginal ray angle, Δ.






Figure 2.1 Aperture stop.






NA = n sin Δ (2.1)

A system with a large numerical aperture allows more light to be collected. Such a system is said to be 'fast'. This terminology has its origins in photography, where the efficient collection of light using wide apertures enabled the use of short exposure times. An alternative convention exists for describing the relative size of the aperture, namely the f-number. For a lens system, the f-number, N, is given as the ratio of the lens focal length, f, to the aperture diameter, D:




N = f/D (2.2)

The f-number is conventionally written as f/N. That is to say, a lens with a focal ratio of 10 is written as f/10. The f-number has an inverse relationship to the numerical aperture and is based on the stop diameter rather than its radius. For small angles, where sin Δ ≈ Δ, the following relationship between the f-number and numerical aperture applies:




NA ≈ 1/(2N) (2.3)
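Conversion between the two descriptions is a one-liner. A minimal Python sketch, assuming the small-angle form of Eq. (2.3):

import math

def f_number(na):
    # Eq. (2.3): N = 1/(2 NA) in the small-angle limit.
    return 1.0 / (2.0 * na)

def numerical_aperture(delta_deg, n=1.0):
    # Eq. (2.1): NA = n sin(marginal ray angle).
    return n * math.sin(math.radians(delta_deg))

print(f_number(numerical_aperture(2.86)))   # ~f/10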

In this narrative, it is assumed that the aperture is circular, with an entire, unobstructed circular area providing access for the rays. In the majority of cases, this description is entirely accurate. However, in certain cases, this circular aperture may be partly obscured by physical or mechanical hardware supporting the optics or by holes in reflective optics. Such features are referred to as obscurations.

At this stage, it is important to emphasise the tension between fulfilment of the paraxial approximation and collection of more light. A ‘fast’ lens design naturally collects more light, but compromises the paraxial approximation and adds to the burden of complexity in lens and optical design. This inherent contradiction is explored in more detail in subsequent chapters.






Figure 2.2 Location of entrance and exit pupils.






2.3 Entrance Pupil and Exit Pupil



The physical aperture stop may not actually be located conveniently in object space as shown in Figure 2.1. On the other hand, it may be located anywhere within the sequential train of optical components that make up the optical system. An example of this is shown in Figure 2.2, a situation that is true of many camera lenses, where the physical stop is located between lenses.

In the situation described, the entrance pupil is the image of the physical aperture as projected into object space. Correspondingly, the exit pupil is the image of the physical aperture as projected into image space. The exit pupil is located in the conjugate plane to the entrance pupil and may be regarded as the image of the entrance pupil. Along with the cardinal points of a system, the location of the entrance and exit pupils are key parameters that describe an optical system. Most particularly, the numerical aperture in object space is defined by the angle of the marginal ray that intersects the edge of the entrance pupil.




Worked Example 2.1 Cooke Triplet


Figure 2.3 shows a simplified illustration of an early type of camera lens, the Cooke triplet.

By convention, object space is assumed to be on the left-hand side of the illustration. All lenses are assumed to have no tangible thickness (thin lens approximation) and the axial origin lies at the first lens. Positive axial displacement is to the right.






Figure 2.3 Cooke triplet.





i. Position and Size of Exit Pupil


It is easiest, first of all, to calculate the position of the exit pupil, as this is the stop imaged by a single lens (the third lens) of focal length 32.8 mm. The position of the aperture stop, the object in this instance, is 6.4 mm to the left of this lens. The distance, v, of the exit pupil from the third lens is therefore given by:

1/v = 1/f + 1/u, with f = 32.8 mm and u = −6.4 mm, giving v = −7.95 mm
Thus, the exit pupil is 7.95 mm to the left of the third lens and 8.05 mm from the origin. The magnification is given by (minus) the ratio of image and object distances and so it is easy to calculate the size of the exit pupil:

m = −(−7.95)/6.4 = 1.24
ii. Cardinal Points of the Lens


The distance between the first and second lenses is 6.8 mm and between the second and third lenses is 9.2 mm. By convention, we retrace dummy rays −16 mm back to the origin at the first lens, so that all matrix ray tracing formulae are referred to a common origin. The matrix for the system is given below:








To calculate the position of the exit pupil we need to know the focal length of the system and the positions of the two focal points. Following the matrix relations set out in Chapter 1, we can calculate the following:

Focal length: 52.3 mm

Location of First Focal Point: −41.2 mm

Location of Second Focal Point: 57.7 mm

All distances are referenced to the axial origin at the first lens. There is, of course, a single effective focal length as both object and image spaces are considered to lie within media of the same refractive index.



iii. Position and Size of the Entrance Pupil


The imaged pupil or exit pupil lies in image space, 8.05 mm from the origin. This is 49.65 mm to the left of the second focal point. In applying Newton's formula, the second focal distance, x2, is then equal to −49.65 mm. We can now calculate the first focal distance to determine the position of the entrance pupil.

x1 x2 = −f², and so x1 = −f²/x2 = −(52.3)²/(−49.65) = 55.1 mm
The object or entrance pupil therefore lies 55.1 mm to the right of the first focal point and 13.9 mm (−41.2 + 55.1) to the right of the first lens.

The location of the entrance pupil expressed as an object distance is 52.3 − 55.1 or −2.8 mm. Similarly the location of the exit pupil expressed as an image distance is equal to −49.65 + 52.3 or +2.65 mm. The magnification (image/object) is, in this instance equal to 2.65/2.8 or 0.946. Therefore we have:








The diameter of the entrance pupil is, therefore, 15.1 mm
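The pupil positions in this example can be checked with a few lines of arithmetic. The sketch below assumes the thin-lens imaging relation and Newton's equation in the forms used above; distances are in mm, positive to the right, with the origin at the first lens:

f3, u = 32.8, -6.4                  # third lens focal length; stop position
v = 1.0 / (1.0 / f3 + 1.0 / u)      # image of the stop in the third lens
print(16.0 + v)                     # exit pupil: ~8.05 mm from the origin

f, x2 = 52.3, -49.65                # system focal length; exit pupil from F2
x1 = -f * f / x2                    # Newton's equation: x1*x2 = -f^2
print(x1)                           # ~55.1 mm right of the first focal point
print(-41.2 + x1)                   # ~13.9 mm right of the first lens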

So, in summary we have:











Figure 2.4 Optical system with a telecentric output.






2.4 Telecentricity


In the previous example, both entrance and exit pupils were located at finite conjugates. However, a system is said to be telecentric if the exit pupil (or entrance pupil) is located at infinity. In the case of a telecentric output, this will occur where the entrance pupil is located at the first focal point. In this instance, all chief rays will, in image space, be parallel. This is shown in Figure 2.4 which illustrates a telecentric output for two different field positions.

A telecentric output, as represented in Figure 2.4, is characterised by a number of converging ray bundles, each emanating from a specific field location, whose central or chief rays are parallel. There are a number of instances where optical systems are specifically designed to be telecentric. Telecentric lenses, for instance, have application in machine vision and metrology, where non-telecentric output can lead to measurement errors for varying (object) axial positions.




2.5 Vignetting


The aperture stop is the principal means for controlling the passage of rays through an optical system. Ideally, this would be the only component that controls the admission of light to the optical system. In practice, other optical surfaces located away from the aperture stop may also have an impact on the admission of light into the system. This is because these optical components, for reasons of economy and other optical design factors, have a finite aperture. As a consequence, some rays, particularly those for larger field angles, may miss the lens or component aperture altogether. So, in this case, for field positions furthest from the optical axis, some of the rays will be clipped. This process is known as vignetting. This is shown in Figure 2.5.

Vignetting tends to darken the image for objects further away from the optical axis. As such, it is an undesirable effect. At the same time, it can be used to control optical imperfections or aberrations by deliberately removing more marginal rays.






Figure 2.5 Vignetting.






2.6 Field Stops and Other Stops


In addition to the aperture stop, an optical system might also contain a field stop. This is an aperture located in a plane that is conjugate with the image plane. Its first purpose is to provide a crisp (often circular) boundary to the viewable image. Secondly, it excludes light from object locations lying outside the area of interest. In so doing, the field stop reduces the level of unwanted light that might otherwise be scattered into the image plane and so reduce image contrast. For the same reason, other, intermediate stops may be introduced into an optical design in order to further reduce the level of scattered light.




2.7 Tangential and Sagittal Ray Fans


The analysis pursued hitherto has considered the propagation of rays in a single plane. From an analytical perspective, for ray tracing in an ideal system and determining the cardinal points of that system, this is a perfectly acceptable approach. However, in reality, rays are not necessarily confined to the plane containing the object and the optical axis. With the selection of rays delineated by a two-dimensional, circular aperture, we must expect some rays to be out of this plane. A group of co-planar rays, emanating from a single object point and bounded by the entrance pupil is referred to as a ray fan. A ray fan that lies in the plane defined by the object and optical axis is known as the tangential ray fan. The sagittal ray fan emanates from the same object point and lies in a plane that is perpendicular to that of the tangential ray fan. This is illustrated in Figure 2.6.

The tangential ray fan is also referred to as the meridional ray fan; the two terms are equivalent. In general any ray that is not in the tangential plane, i.e. not a tangential ray, is referred to as a skew ray. A skew ray will never cross the optic axis.




2.8 Two Dimensional Ray Fans and Anamorphic Optics


The introduction of two distinct sets of ray fans, tangential and sagittal, together with the inclusion of skew rays confirms that sequential ray propagation in an axial geometry is essentially a two-dimensional problem. Hitherto, all discussion and, in particular, the matrix analysis, has been presented in a strictly one-dimensional form. However, the strict description of a ray in two dimensions requires the definition of four parameters, two spatial and two angular. In this more complete description, a ray vector would be written as:

r = [hx; θx; hy; θy]

Figure 2.6 (a) Tangential ray fan; (b) Sagittal ray fan.
hx is the x component of the distance of the ray from the optical axis

θx is the x component of the angle of the ray to the optical axis

hy is the y component of the distance of the ray from the optical axis

θy is the y component of the angle of the ray to the optical axis


In this two dimensional representation, the matrix representing each optical element would be a 4 × 4 matrix instead of a 2 × 2 matrix. However, the matrix is not fully populated in any realistic scenario. For a rotationally symmetric optical system, as we have been considering thus far, there can only be four independent elements:

M = [A, B, 0, 0; C, D, 0, 0; 0, 0, A, B; 0, 0, C, D]
That is to say, the impact of each optical surface is identical in both the x and y directions in this instance. However, there are optical components where the behaviour is different in the x and y directions. An example of this might be a cylindrical lens, whose curvature in just one dimension produces focusing only in one direction. The two dimensional matrix for a cylindrical lens (taking the focusing direction as x) would look as follows:

M = [1, 0, 0, 0; −1/f, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1]
A component that possesses different paraxial properties in the two dimensions is said to be anamorphic. A more general description of an anamorphic element is illustrated next:

M = [Ax, Bx, 0, 0; Cx, Dx, 0, 0; 0, 0, Ay, By; 0, 0, Cy, Dy]

Note there are no non-zero elements connecting ray properties in the different dimensions, x and y. This would require the surfaces to produce some form of skew behaviour, and this is not consistent with ideal paraxial behaviour. Since this is the case, the two orthogonal components, x and y, can be separated out, presented as two sets of 2 × 2 matrices, and analysed as previously set out. All relevant optical properties and cardinal points are then calculated separately for the x and y components. Even if the focal points are identical for the two dimensions, the principal planes may not be co-located. This gives rise to different focal lengths for the x and y dimensions and potentially differential image magnification. This differential magnification is referred to as anamorphic magnification. Significantly, in a system possessing anamorphic optical properties, the exit pupil may not be co-located in the two dimensions.
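The block structure is easy to see when the 4 × 4 matrices are written out explicitly. A minimal sketch follows (the axis assignment for the cylindrical lens is an arbitrary choice of mine):

import numpy as np

def symmetric_element(A, B, C, D):
    # Rotationally symmetric: identical 2x2 blocks for x and y, with the
    # ray vector ordered as (hx, theta_x, hy, theta_y).
    return np.array([[A, B, 0, 0],
                     [C, D, 0, 0],
                     [0, 0, A, B],
                     [0, 0, C, D]], dtype=float)

def cylindrical_lens(f):
    # Focuses in x only; the y block is the identity.
    M = np.eye(4)
    M[1, 0] = -1.0 / f
    return M

# Because the x and y blocks never couple, each pair of rows and columns
# can be analysed separately as an ordinary 2x2 problem.
print(cylindrical_lens(100.0) @ symmetric_element(1.0, 20.0, 0.0, 1.0))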




2.9 Optical Invariant and Lagrange Invariant


The field angle, i.e. the angle of the chief ray, and the marginal ray angle will change as the rays propagate through an optical system. The relationship between these angles is inherently constrained by the magnification properties of the optical system in the paraxial approximation. The optical invariant is a parameter that, in the paraxial approximation, constrains the relationship between any two rays that propagate through an optical system. We now have two general rays as described by their ray vectors:

r1 = [h1; θ1] and r2 = [h2; θ2]
The optical invariant, O, is given by:

O = n(θ1h2 − θ2h1) (2.4)

The optical invariant is, in the paraxial approximation, preserved on passage through an optical system. That is to say:




n′(θ1′h2′ − θ2′h1′) = n(θ1h2 − θ2h1) (2.5)

n′, h′, θ′, etc. are ray parameters following propagation.

Derivation of the above invariant is straightforward using matrix analysis:

h1′ = Ah1 + Bθ1, θ1′ = Ch1 + Dθ1; h2′ = Ah2 + Bθ2, θ2′ = Ch2 + Dθ2

Hence:

θ1′h2′ − θ2′h1′ = (Ch1 + Dθ1)(Ah2 + Bθ2) − (Ch2 + Dθ2)(Ah1 + Bθ1) = (AD − BC)(θ1h2 − θ2h1)

From (1.29) we know that the determinant of the matrix is given by the ratio of the refractive indices in the relevant media, so:

θ1′h2′ − θ2′h1′ = (n/n′)(θ1h2 − θ2h1)
Finally we arrive at Eq. (2.5):

n′(θ1′h2′ − θ2′h1′) = n(θ1h2 − θ2h1)
The optical invariant is a generalised constraint that relates system lateral and angular magnification and applies to any arbitrary pair of rays. A very specific descriptor is created when the ray pair consists of the chief ray and the marginal ray. This special case of the optical invariant is known as the Lagrange invariant. The Lagrange invariant, H is given by:




H = n(θc hm − θm hc) (2.6)

(the subscripts c and m denote the chief and marginal rays respectively)

If we now simply evaluate H at the entrance and exit pupils where, by definition, the chief ray height, hc, is zero, then the product n hm θc is constant. The Lagrange invariant then simply articulates the fact that the angular and lateral magnifications are inversely related. In fact, the Lagrange invariant captures a more fundamental constraint to an optical system. If the object plane is uniformly illuminated, then the total light flux emanating from the plane is proportional to the square of the maximum field angle. The proportion of that flux that is admitted by the entrance pupil is itself proportional to the square of the marginal ray height. Therefore, the total flux passing through an optical system is proportional to the square of the Lagrange invariant, H². Thus the Lagrange invariant is an expression of the conservation of energy as light propagates through an optical system. This will become of paramount significance when, in later chapters, we consider source brightness or radiance and the impact of the optical system on optical flux flowing through it.




2.10 Eccentricity Variable


The eccentricity variable, E, is a measure of how far an axial location in an optical system is from the stop. It is expressed as the ratio of the chief ray height to the marginal ray height at that particular location. Of course, at the pupil (entrance or exit) itself, the eccentricity variable will be zero. The eccentricity variable is defined as:

E = hc/hm (2.7)

E is of course infinite at the focal point of a system. The variable is of great significance in the analysis of optical imperfections or aberrations where the distance of a component from the aperture stop is of critical importance.




2.11 Image Formation in Simple Optical Systems



These introductory chapters provide a complete description of ideal optical systems. That is to say, in the paraxial approximation, where imaging imperfections, or aberrations may be ignored, the analysis presented is substantially complete. Some very simple optical instruments are introduced at this point; their deficiencies are discussed later.




2.11.1 Magnifying Glass or Eye Loupe


The magnifying glass or eye loupe is perhaps the simplest optical system conceivable, in that it consists of a single lens that is intended to be used with the eye to magnify close objects. Our ability to resolve small, close objects is limited by our ability to focus at close quarters. Typically, although this varies with age and other factors, a comfortable distance for viewing near objects is about 250 mm. If the eye can resolve an angle of 1 arcminute, then this corresponds to a resolution of somewhat under 0.1 mm. Addition of a simple lens allows the eye to view objects at a much shorter distance. This is shown in Figure 2.7.

For the two cases illustrated in Figure 2.7, the eye's focussing power remains the same. Therefore, addition of a lens of focal length f will change the closest approach distance, d0, to:

d = f d0/(f + d0)



Figure 2.7 Simple magnifying lens.



If the magnification, M, provided by the lens is defined as the ratio of the final image sizes in the two scenarios, the magnification is given by:




M = d0/f + 1 (2.8)

In describing magnifying lenses, as suggested earlier, d0 is defined to be 250 mm. Thus, a lens with a focal length of 250 mm would have a magnification of ×2 and a lens with a focal length of 50 mm would have a magnification of ×6. In practice, simple lenses are only useful up to a magnification of ×10. This is partly because of the introduction of unacceptable aberrations, but also because of the impractically short working distances introduced by lenses with a focal length of a few mm. For higher magnifications, the compound microscope must be used.

Naturally, the pupil of this simple system is defined by the pupil of the eye itself. The size of the eye's pupil varies from about 3 mm in bright light, to about 7 mm under dim lighting conditions, although this varies with individuals.




2.11.2 The Compound Microscope


In the preceding subsection, the limitations of a simple magnifying lens were made clear. Overall, its functionality, in delivering a high magnification, is to convey an intermediate image, located at the infinite conjugate, to the human eye. Furthermore, to provide maximum magnification, the focal length of this lens must be as small as possible. For practical reasons, a magnification of greater than ×10 cannot be delivered. This difficulty is solved by the compound microscope where a two-lens system is used to provide a system focal length that is considerably shorter than would be afforded by a single lens. In essence, a compound microscope consists of two lenses, or lens groups. The first lens is the objective lens that lies close to the object and the second lens, in the traditional microscope, is the eyepiece. Of course, in many modern instruments, the eye is replaced by a pixellated detector chip. Nonetheless the logic followed here still applies. Figure 2.8 shows the general set up.

The two lenses are separated by a distance, s, and an intermediate image is formed by the objective lens within the tube. The eyepiece then presents the final image to the eye at the infinite conjugate. In other words, the intermediate image is designed to be located at a distance fe (the eyepiece focal length) from the eyepiece. If the objective lens focal length is fo, then the matrix of the system is:

M = [1, −s; 0, 1] [1, 0; −1/fe, 1] [1, s; 0, 1] [1, 0; −1/fo, 1]

The entire co-ordinate system is referenced to the position of the objective lens. Of particular relevance here is the first focal length. From the above matrix we have the following equation for the system focal length:




1/f_system = 1/fo + 1/fe − s/(fo fe) (2.9)

The logic of Eq. (2.9) is that a shorter system focal length can be created than would be reasonably practical with a single lens. Using the same definition as used for the simple magnifying lens, the effective system magnification, M_system, is given by the ratio of the closest approach distance, d0 (250 mm), and the system focal length. The system magnification is given by:

M_system = d0/f_system = d0(s − fo − fe)/(fo fe) (2.10)

The bracketed quantity, (s − fo − fe), i.e. the lens separation minus the sum of the lens focal lengths, is known as the optical tube length of the microscope, and this will be denoted as d. Generally, for optical microscopes, this tube length is standardised across many commercial instruments, with the standard values being 160 or 200 mm. Equation (2.10) may be rewritten as:






Figure 2.8 Compound microscope.

M_system = (d/fo)(d0/fe) (2.11)

The above formula gives the total magnification of the instrument as the product of the individual magnifications of the objective lens and eyepiece. In this context, these individual magnifications are defined as in Eqs. (2.12a) and (2.12b):




M_objective = d/fo (2.12a)




M_eyepiece = d0/fe (2.12b)

The equations above establish the standard definitions for microscope lens powers. For example, the magnification of microscope objectives is usually in the range of ×10 to ×100. For a standard tube length, d, of 160 mm, this corresponds to an objective focal length ranging from 16 to 1.6 mm. A typical eyepiece, with a magnification of ×10, has a focal length of 25 mm (d0 = 250 mm). By combining a ×100 objective lens with a ×10 eyepiece, a magnification of ×1000 can be achieved. This illustrates the power of the compound microscope.

The entrance pupil is defined by the aperture of the objective lens. This entrance pupil is re-imaged by the eyepiece to create an exit pupil that is close to the eyepiece. Ideally, this should be co-incident with the pupil of the eye. The distance of the exit pupil from the final mechanical surface of the eyepiece is known as the eye relief. Placing the exit pupil further away from the physical eyepiece provides greater comfort for the user, hence the term 'eye relief'. Objective lens apertures tend to be defined by numerical aperture, rather than f-number, and range from 0.1 to 1.3 (for oil immersion microscopes).






Figure 2.9 Basic optical telescope.






2.11.3 Simple Telescope


A classical optical telescope is an example of an afocal system. That is to say, no clearly defined focus is presented either in object or image space. As the name suggests, the telescope views distant objects, nominally at the infinite conjugate and provides a collimated output for ocular viewing in the case of a traditional instrument. As far as the instrument is concerned, both object and image are located at the infinite conjugate. Of course, this narrative does assume that the instrument is designed for ocular viewing as opposed to image formation at a detector or photographic plate. In any case, the design principles are similar. Fundamentally, the telescope provides angular magnification of a distant object, and this angular magnification is a key performance attribute.

The basic layout of a simple telescope is shown in Figure 2.9. Light from the distant object is collected by an objective lens whose focal length is f1 and then collimated by an eyepiece with a focal length of f2. These two lenses are separated by the sum of their focal lengths, thus creating an afocal system with an angular magnification given by the ratio of the lens focal lengths.

The matrix of the telescope is similar to that of the compound microscope, with an objective lens and eyepiece separated by some fixed distance, s:

M = [1, 0; −1/f2, 1] [1, s; 0, 1] [1, 0; −1/f1, 1]

The separation, s, is simply the sum of the two focal lengths and the system matrix is given by:

M = [−f2/f1, f1 + f2; 0, −f1/f2] (2.13)

The angular magnification (the D value of the matrix) is simply −f1/f2. It is important to note the sign of the magnification, so that for two positive lenses, the magnification is negative. In line with the previous discussion with regard to the optical invariant, the linear magnification (given by matrix element A) is the inverse of the angular magnification. Also, the C element of the matrix, attesting to the focal power of the system, is actually zero; this is characteristic of an afocal system.

As in the case of the microscope, the objective lens forms the system entrance pupil. The exit pupil is formed by the eyepiece imaging the objective lens. This is located a short distance, approximately f2, from the eyepiece, this distance determining the 'eye relief'. Ideally, for ocular viewing, the pupil of the eye should be co-incident with the exit pupil. Unlike the compound microscope, the exit pupil of a simple (ocular) telescope is relatively large, about the size of the pupil of the eye. Clearly, if the exit pupil were significantly larger than the pupil of the eye, then any light falling outside the ocular pupil would be wasted. In fact, in a typical telescope, where f1 ≫ f2, the size of the exit pupil is approximately given by the diameter of the objective lens multiplied by the ratio of the focal lengths.

As an example, a small astronomical refracting telescope might comprise a 75 mm diameter objective lens with a focal length of 750 mm (f/10) and might use a ×10 eyepiece. Eyepiece magnification is classified in the same way as for microscope eyepieces and so the focal length of this eyepiece would be 25 mm, as derived from Eq. (2.12b). The angular magnification (f1/f2) would be ×30 and the size of the exit pupil about 3 mm, which is smaller than the pupil of the eye.

In the preceding discussion, the basic description of the instrument function assumes ocular viewing, i.e. viewing through an eyepiece. However, increasingly, across a range of optical instruments, the eye is being replaced by a detector chip. This is true of microscope, telescope, and camera instruments.




2.11.4 Camera


In essence, the function of a camera is to image an object located at the infinite conjugate and to form an image on a light sensitive planar surface. Of course, traditionally, this light sensitive surface consisted of a film or a plate upon which a silver halide emulsion had been deposited. This allowed the recording of a latent image which could be chemically developed at a later stage. Depending upon the grain size of the silver halide emulsion, feature sizes of around 10–20 μm or so could be resolved. That is to say, the ultimate system resolution is limited by the recording media as well as the optics. For the most part, this photographic film has now been superseded by pixelated silicon detectors, allowing the rapid and automatic capture and processing of images. These detectors are composed of a rectangular array of independent sensor areas (usually themselves rectangular) that each produce a charge in proportion to the amount of light collected. Resolution of these detectors is limited by the pixel size which is analogous to the grain size in photographic film. Pixel size ranges from about one micron to a few microns.

Optically from a paraxial perspective, the camera is an exceptionally simple instrument. Its purpose is simply to image light from an object located at the infinite conjugate onto the focal plane, where the sensor is located. As such, from a system perspective one might regard the camera as a single lens with the sensor located at the second focal point. This is illustrated in Figure 2.10.

If this system is the essence of simplicity, then the Pinhole Camera, a very early form of camera, takes this further by dispensing with the lens altogether! A pinhole camera relies on a very small system aperture (a pinhole) defining the image quality. In this embodiment of the camera, all rays admitted by the entrance pupil follow closely the chief ray. However, light collection efficiency is low. Whilst in the paraxial approximation, the camera presents itself as a very simple instrument, as indeed early cameras were, the demands of light collection efficiency require the use of a large aperture which results in the breakdown of the paraxial approximation. As we shall see in later chapters, this leads to the creation of significant imperfections, or aberrations, in image formation which can only be combatted by complex multi-element lens designs. Thus, in practice, a modern camera, i.e. its lens, is a relatively complex optical instrument.






Figure 2.10 Basic camera.



In defining the function of the camera, we spoke of the imaging of an object located at infinity. In this context, 'infinity' means a substantially greater object distance than the lens focal length. For the traditional 35 mm format photographic camera, a typical standard lens focal length would be 50 mm. The '35 mm' format refers to the film frame size, which was 36 mm × 24 mm (horizontal × vertical). As mentioned in Chapter 1, the focal length of the camera lens determines the 'plate scale' of the detector, or the field angle subtended per unit displacement of the detector. Overall, for this example, the plate scale is 1.15° mm⁻¹. The total field covered by the frame size is ±20° (horizontal) × ±13.5° (vertical). 'Wide angle' lenses with a shorter focal length (e.g. 28 mm) have a larger plate scale and, naturally, a wider field angle. By contrast, telephoto lenses with longer focal lengths (e.g. 200 mm) have a smaller plate scale, thus producing a greater magnification, but a smaller field of view.

Modern cameras with silicon detector technology are generally significantly more compact instruments than traditional cameras. For example, a typical digital camera lens might have a focal length of about 8 mm, whereas a mobile phone camera lens might have a focal length of about half of this. The plate scale of a digital camera is thus considerably larger than that of the traditional camera. Overall, as dictated by the imaging requirements, the field of view of a digital camera is similar to its traditional counterpart, although, in practice, equivalent to that of a wide field lens. Therefore, in view of the shorter focal length, the detector size in a digital camera is considerably smaller than that of a traditional film camera, typically a few mm. Ultimately, the miniaturisation of the digital camera is fundamentally driven by the resolution of the detector, with the pixel size of a mobile phone camera being around 1 μm. This is over an order of magnitude superior to the resolution, or ‘grain size’ of a high specification photographic film.




Further Reading


Haija, A.I., Numan, M.Z., and Freeman, W.L. (2018). Concise Optics: Concepts, Examples and Problems. Boca Raton: CRC Press. ISBN: 978-1-1381-0702-1.

Hecht, E. (2017). Optics, 5e. Harlow: Pearson Education. ISBN: 978-0-1339-7722-6.

Keating, M.P. (1988). Geometric, Physical, and Visual Optics. Boston: Butterworths. ISBN: 978-0-7506-7262-7.

Kidger, M.J. (2001). Fundamental Optical Design. Bellingham: SPIE. ISBN: 0-8194-3915-0.

Kloos, G. (2007). Matrix Methods for Optical Layout. Bellingham: SPIE. ISBN: 978-0-8194-6780-5.

Longhurst, R.S. (1973). Geometrical and Physical Optics, 3e. London: Longmans. ISBN: 0-582-44099-8.

Smith, F.G. and Thompson, J.H. (1989). Optics, 2e. New York: Wiley. ISBN: 0-471-91538-1.




3

Monochromatic Aberrations





3.1 Introduction


In the first two chapters, we have been primarily concerned with an idealised representation of geometrical optics involving perfect or Gaussian imaging. This treatment relies upon the paraxial approximation where all rays present a negligible angle with respect to the optical axis. In this situation, all primary optical ray behaviour, such as refraction, reflection, and beam propagation, can be represented in terms of a series of linear relationships involving ray heights and angles. The inevitable consequence of this paraxial approximation and the resultant linear algebra is apparently perfect image formation. However, for significant ray angles, this approximation breaks down and imperfect image formation, or aberration, results. That is to say, a bundle of rays emanating from a single point in object space does not uniquely converge on a single point in image space.

This chapter will focus on monochromatic aberrations only. These aberrations occur where there is departure from ideal paraxial behaviour at a single wavelength. In addition, chromatic aberration can also occur where first order paraxial properties of a system, such as focal length and cardinal point locations, vary with wavelength. This is generally caused by dispersion, or the variation in the refractive index of a material with wavelength. Chromatic aberration will be considered in the next chapter.

A simple scenario is illustrated in Figure 3.1 where a bundle of rays originating from an object located at the infinite conjugate is imaged by a lens. Figure 3.1a presents the situation for perfect imaging and Figure 3.1b illustrates the impact of aberration.

In Figure 3.1b, those rays that are close to the axis are brought to a focus at the paraxial focus. This is the ideal focus. However, those rays that are further from the axis are brought to a focus at a point closer to the lens than the paraxial focus. In fact, the behaviour illustrated in Figure 3.1b is representative of a simple lens; marginal rays are brought to a focus closer to the lens than the chief ray. However, in general terms, the sense of the aberration could be either positive or negative, with the marginal rays coming to a focus either before or after the paraxial focus.




3.2 Breakdown of the Paraxial Approximation and Third Order Aberrations


In formulating perfect or Gaussian imaging we assumed all relationships are linear. For example, Snell's law of refraction was reduced in the following way:




n1 sin θ1 = n2 sin θ2 → n1θ1 = n2θ2 (3.1)

In making the paraxial approximation, we are considering just the first or linear term in the Taylor series. The next logical stage in the process is to consider higher order terms in the Taylor series.




sin θ = θ − θ³/3! + θ⁵/5! − θ⁷/7! + … (3.2)






Figure 3.1 (a) Gaussian imaging. (b) Impact of aberration.



Following the term that is linear in θ, we have terms that are cubic or third order in θ. Of course, these third order terms are followed by fifth and seventh order terms etc. in succession. Third order aberration theory deals exclusively with those imperfections associated with the third order departure from ideal behaviour, as illustrated in Eq. (3.2). Much of classical aberration theory is restricted to consideration of these third order terms and is, in effect, a refinement or successive approximation to paraxial theory. Higher order (≥5) terms can be important in practical design scenarios. However, these are generally dealt with by numerical computation, rather than by a simple generically applicable theory.

Third order aberration theory forms the basis of the classical treatment of monochromatic aberrations. Unless specific steps are taken to correct third order aberrations in optical systems, then third order behaviour dominates. That is to say, error terms in the ray height or angle (compared to the paraxial) have a cubic dependence upon the angle or height. As a simple illustration of this, Figure 3.1b shows rays originating from a single object (at the infinite conjugate). For perfect image formation, the height of all rays at the paraxial focus should be zero, as in Figure 3.1a. However, the consequence of third order aberration is that the ray height at the paraxial focus is proportional to the third power of the original ray height (at the lens).

In dealing with third order aberrations, the location of the entrance pupil is important. Let us assume, in the example set out in Figure 3.1b, that the pupil is at the lens. If the radius of the entrance pupil is r0 and the height of a specific ray at this point is h, then we may define a new parameter, the normalised pupil co-ordinate, p, in the following way:

p = h/r0 (3.3)

The normalised pupil co-ordinate can have values ranging from −1 to +1, with the extremes representing the marginal ray. The chief ray corresponds to p = 0. At this stage, it is useful to provide a specific and quantifiable definition of aberration. The quantity, transverse aberration, is defined as the difference in height of a specific ray and the corresponding chief ray, as measured at the paraxial focus. The 'corresponding chief ray' emanates from the same object point as the ray under consideration. In addition, the term longitudinal aberration is also used to describe aberration. Longitudinal aberration (LA) is the axial distance between the point at which the ray in question intersects the chief ray and the location of the paraxial focus. The transverse aberration (TA) and longitudinal aberration definitions are illustrated in Figure 3.2.

In keeping with the previous arguments, the TA has a third order dependence upon the pupil function. This is illustrated in Eq. (3.4):

TA(p) ∝ p³ (3.4)

Transverse aberration has dimensions of length, whereas the pupil function is a dimensionless ratio. Geometrically, the LA is approximately equal to the transverse aberration divided by the ray angle which itself is proportional to the pupil function. Therefore, the longitudinal aberration has a quadratic dependence upon the pupil function. This is illustrated in Eq. (3.5).






Figure 3.2 Transverse and longitudinal aberration.

LA(p) ∝ p² (3.5)

In fact, if the radius of the pupil aperture is r0 and the lens focal length is f, then the longitudinal and transverse aberration are related in the following way:

LA(p) = TA(p)/(p NA) (3.6)

NA is the numerical aperture of the lens.

A plot of the transverse aberration against the pupil function is referred to as a 'ray fan'. Ray fans are widely used to provide a simple description of the fidelity of optical systems. If one views the transverse aberration at the paraxial focus, then the transverse aberration should show a purely cubic dependence upon the pupil function. This is illustrated in Figure 3.3a which shows the aberrated ray fan. If, on the other hand, the transverse aberration is plotted away from the paraxial focus, then an additional linear term is present in the plot. This is because pure defocus (i.e. without third order aberration) produces a transverse aberration that is linear with respect to pupil function. This is illustrated in Figure 3.3b which shows a ray fan where both the linear defocus and third order aberration terms are present.

The underlying amount of third order aberration is the same in both plots. However, the overall transverse aberration in Figure 3.3b (plotted on the same scale) is significantly lower than that seen in Figure 3.3a. This is because defocus can, to some extent, be used to 'balance' the original third order aberration. As a result, by moving away from the paraxial focus, the size of the blurred spot is reduced. In fact, there is a point at which the size (root mean square radius) of the spot is minimised. This optimum focal position is referred to as the circle of least confusion. This is illustrated in Figure 3.4.

Most generally, the transverse aberration where third order aberration is combined with defocus can be represented as:




TA(p) = TA0(p³ + αp) (3.7)

TA0 is the nominal third order aberration and α represents the defocus.






Figure 3.3 (a) Ray fan for pure third order aberration. (b) Ray fan with third order aberration and defocus.



Since the geometry is assumed to be circular, to calculate the rms (root mean square) aberration, one must introduce a weighting factor that is proportional to the pupil function, p. The mean squared transverse aberration is thus:




⟨TA²⟩ ∝ ∫₀¹ (p³ + αp)² p dp (3.8)






Figure 3.4 Balancing defocus against aberration – optimal focal position.



The expression is minimised where α = −2/3. To understand the significance of this, examination of Eq. (3.6) suggests that, without defocus, the marginal ray (p = 1) has a longitudinal aberration of TA0/NA. The defocus term itself produces a constant longitudinal aberration or defocus of αTA0/NA. Therefore, the optimum defocus is equivalent to placing the adjusted focus at 2/3 of the distance between the paraxial and marginal focus, as shown in Figure 3.4. Without this focus adjustment, with the third order aberration viewed at the paraxial focus, the rms aberration is TA0/4. However, adding the optimum defocus reduces the rms aberration to TA0/12, a reduction by a factor of 3.

This analysis provides a very simple introduction to the concept of third order aberrations. In the basic illustration so far considered, we have looked at the example of a simple lens focussing an on-axis object located at infinity. In the more general description of monochromatic aberrations that we will come to, this simple, on-axis aberration is referred to as spherical aberration. In developing a more general treatment of aberration in the next sections, we will introduce the concept of optical path difference (OPD).




3.3 Aberration and Optical Path Difference


In the preceding section, we considered the impact of optical imperfections on the transverse aberration and the construction of ray fans. Unfortunately, this treatment, whilst providing a simple introduction, does not lead to a coherent, generalised description of aberration. At this point, we introduce the concept of optical path difference (OPD). For a perfect imaging system, with no aberration, if all rays converge onto the paraxial focus, then all ray paths must have the same optical path length from object to image. This is simply a statement of Fermat's principle. We now consider an aberrated system where we accurately (not relying on the paraxial approximation) trace all rays through the system from object to image. However, at the last surface, we (hypothetically) force all rays to converge onto the paraxial focus. For all rays, we compute the optical path from object to image. The OPD is the difference between the integrated optical path of a specified ray with respect to the optical path of the chief ray. Of course, if there were no aberration present, the OPD would be zero. Thus, the OPD represents a quantitative description of the violation of Fermat's principle.

The general concept is shown in Figure 3.5. Rays are accurately traced from the object through the system, emerging into image space. That is to say, ray tracing proceeds until the last optical surface, mirror or lens etc. Following the preceding discussion, at some point, we force all rays to converge upon the paraxial focus. However, the convention for computing OPD is that all rays are traced back to a spherical surface centred on the paraxial focus and which lies at the exit pupil of the system. Of course, it must be emphasised that the real rays do not actually follow this path. In the generic system illustrated, the real ray is traced to point P located in image space and the optical path length computed. Thereafter, instead of tracing the real ray onwards into image space, a dummy ray is traced, as shown by the dotted line. This dummy ray is traced from point P to point Q that lies on the reference surface – a sphere located at the exit pupil and centred on the paraxial focus. The optical path length of this segment is then added to the total.






Figure 3.5 Illustration of optical path difference.



After calculating the optical path length for the dummy ray OPQ, we need to calculate the OPD with respect to the chief ray. The chief ray path is calculated from the object to its intersection with the reference sphere at the pupil, represented, in this instance, by the path OR. In calculating the OPD, the convention is that the OPD is the chief ray optical path (OR) minus the dummy ray optical path (OPQ). Note the sign convention.

OPD = [OR] − [OPQ]
Having established an additional way of describing aberrations in terms of the violation of Fermat's principle, the question is what is the particular significance and utility of this approach? The answer is that, when expressed in terms of the OPD, aberrations are additive through a system. As a consequence of this, this treatment provides an extremely powerful general description of aberrations and, in particular, third order aberrations. Broadly, aberrations can be computed for individual system elements, such as surfaces, mirrors, or lenses and applied additively to the system as a whole. This generality and flexibility are not provided by a consideration of transverse aberrations.

There is a correspondence between transverse aberration and OPD. This is illustrated in Figure 3.6. At this point, we introduce a concept that is related to that of OPD, namely the wavefront error (WFE). We must remember that, according to the wave description, the rays we trace through the system represent normals to the relevant wavefront. The wavefront itself originates from a single object point and represents a surface of equal phase. As such, the wavefront represents a surface of equal optical path length. For an aberrated optical system, the surface normals (rays) do not converge on a single point. In Figure 3.6, this surface is shown as a solid line. A hypothetical spherical surface, shown as a dashed line, is now added to represent rays converging on the paraxial focus. This surface intersects the real surface at the chief ray position. The distance between these two surfaces is the WFE.

In terms of the sign convention, the wavefront error, WFE, is given by:

$$ \mathrm{WFE} = n\,(z_{ref} - z_{wav}) $$

where $z_{ref}$ and $z_{wav}$ are the positions of the reference sphere and the real wavefront, measured along the ray in the direction of propagation, and n is the refractive index of the medium.








The sign convention is important, as it now concurs with the definition of OPD. As the wavefronts form surfaces of constant optical path length, there is a direct correspondence between OPD and WFE. A positive OPD indicates the optical path of the ray at the reference sphere is less than that of the chief ray. Therefore, this ray has to travel a small positive distance to ‘catch up’ with the chief ray to maintain phase equality. Hence, the WFE is also positive.






Figure 3.6 Wavefront representation of aberration.






Figure 3.7 Simplified wavefront and ray geometry.



Both OPD and WFE quantify the violation of Fermat's principle in the same way. OPD is generally used to describe the path length difference of a specific ray. WFE tends to be used when describing OPD variation across an assembly of rays, specifically across a pupil. The concept of WFE enables us to establish the relationship between OPD and transverse aberration, in that it helps define the link between wave (phase and path length) geometry and ray geometry. This is shown in Figure 3.7. It is clear that the transverse aberration is related to the angular difference between the wavefront and reference sphere surfaces.

We now describe the WFE, Φ, as a function of the reference sphere (paraxial ray) angle, θ. The radius of the reference sphere (distance to the paraxial focus) is denoted by f. This allows us to calculate the difference in angle, Δθ, between the real and paraxial rays. This is simply equal to the difference in local slope between the two surfaces.




$$ \Delta\theta = \frac{1}{n}\frac{d\Phi}{dr}, \qquad r = f\theta $$ (3.9)

n is the medium refractive index.

In this analysis, the WFE represents the difference between the real and reference surfaces with the positive axial direction represented by the propagation direction (from object to image). In this convention, the WFE has the opposite sign to the OPD. The transverse aberration, t, can be derived from simple trigonometry.




$$ t = f\,\Delta\theta = \frac{f}{n}\frac{d\Phi}{dr} = \frac{1}{n}\frac{d\Phi}{d\theta} $$ (3.10)

If θ describes the angle the ray makes with the chief ray, then Eq. (3.10) may be reformulated in terms of the numerical aperture, NA. The numerical aperture is equal to n sinθ, or approximately nθ for small angles, and Eq. (3.10) may be recast as:




$$ t = \frac{d\Phi}{d(\mathrm{NA})} $$ (3.11)

So, the transverse aberration may be represented by the first differential of the WFE with respect to the numerical aperture. In terms of third order aberration theory, the numerical aperture of an individual ray is directly proportional to the normalised pupil function, p. If the overall system, or marginal ray, numerical aperture is NA₀, then the individual ray numerical aperture is simply NA₀p. The transverse aberration is then given by:




$$ t = \frac{1}{\mathrm{NA}_0}\frac{d\Phi}{dp} $$ (3.12)

Equation (3.12) provides a simple direct relationship between OPD and transverse aberration. Of course, we know that, for third order aberration, the transverse aberration is proportional to the third power of the pupil function, p. If this is the case, then it is apparent, from Eq. (3.12), that the OPD is proportional to the fourth power of the pupil function. So, for third order aberration, the transverse aberration shows a third power dependence upon the pupil function whereas the OPD shows a fourth power dependence.
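This relationship is easy to verify numerically. The minimal Python sketch below (the values of Φ₀ and NA₀ are arbitrary illustrations) differentiates a quartic WFE, Φ = Φ₀p⁴, according to Eq. (3.12) and recovers the expected cubic transverse aberration:

import numpy as np

Phi0, NA0 = 1.0, 0.1                    # arbitrary illustrative values
p = np.linspace(0.0, 1.0, 101)
t = np.gradient(Phi0 * p**4, p) / NA0   # t = (1/NA0) dPhi/dp, Eq. (3.12)
print(t[50], 4 * Phi0 * 0.5**3 / NA0)   # both ~5.0: t varies as p**3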

Applying these arguments to the analysis of the simple on-axis example illustrated earlier, with the object placed at the infinite conjugate, the WFE can be represented by the following equation:




$$ \Phi = \Phi_0\,p^4 $$ (3.13)

p is the normalised pupil function.

Figure 3.8 shows a plot of the OPD against the normalised pupil function; such a plot is referred to as an OPD fan.

Despite the fact that this simple aberration has a quartic dependence on the pupil function, it is still referred to as third order aberration, after the transverse aberration dependence. As with the optimisation of transverse aberration, the OPD can be balanced by applying defocus to offset the aberration. We saw earlier that a simple defocus produces a linear term in the transverse aberration. Referring to Eq. (3.12), it is clear that defocus may be represented by a quadratic term in the OPD. Equation (3.14) describes the OPD when some defocus has been added to the initial aberration.




$$ \Phi = \Phi_0\,(p^4 + \alpha p^2) $$ (3.14)

where the parameter α describes the amount of added defocus.

An OPD fan with aberration plus balancing defocus is shown in Figure 3.9.

In this instance, the plot has a characteristic ‘W’ shape, with the curve in the vicinity of the origin dominated by the quadratic defocus term. As with the case for transverse aberration, the defocus can be optimised to produce the minimum possible OPD value when taken as a root mean squared value over the circular pupil. Again, using a weighting factor that is proportional to the pupil function, p (to take account of the circular geometry), the mean squared OPD is given by:




$$ \langle\Phi^2\rangle = \Phi_0^2\left(\frac{1}{5} + \frac{\alpha}{2} + \frac{\alpha^2}{3}\right) $$ (3.15)






Figure 3.8 Quartic OPD fan.






Figure 3.9 OPD fan with balancing defocus.



The above expression has a minimum where α = −¾. To understand the magnitude of this defocus, it is useful first to convert the new OPD expression into a transverse aberration using Eq. (3.12).




$$ t = \frac{1}{\mathrm{NA}_0}\frac{d\Phi}{dp} = \frac{4\Phi_0}{\mathrm{NA}_0}\left(p^3 + \frac{\alpha}{2}p\right) = \frac{4\Phi_0}{\mathrm{NA}_0}\left(p^3 - \frac{3}{8}p\right)\ \ \text{for}\ \alpha = -\tfrac{3}{4} $$ (3.16)

From Eq. (3.16), it can be seen that the optimum defocus is 3/8 of the distance between the paraxial and marginal ray foci. This value is different from that derived for the optimisation of the transverse aberration itself. It should be understood that the optimisation of the transverse aberration and that of the OPD, although having the same ultimate purpose in minimising the aberration, nonetheless produce different results. Indeed, in the optimisation of optical designs, one is faced with a choice of minimising either the geometrical spot size (transverse aberration) or the OPD in the form of rms WFE. The rationale behind this selection will be considered in later chapters, when we examine measures of image quality as applied to optical design.

The balanced defocus, as illustrated in Eq. (3.15), does significantly reduce the rms OPD. In fact, it reduces the rms OPD by a factor of four. The resultant rms values are set out in Eq. (3.17).




$$ \Phi_{rms} = \frac{\Phi_0}{\sqrt{5}} \approx 0.45\,\Phi_0 \ \ (\alpha = 0); \qquad \Phi_{rms} = \frac{\Phi_0}{4\sqrt{5}} \approx 0.11\,\Phi_0 \ \ (\alpha = -\tfrac{3}{4}) $$ (3.17)
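These results are easily confirmed numerically. The short Python sketch below (a discretised pupil, with Φ₀ = 1 as an arbitrary scale) evaluates the area-weighted rms OPD of Eq. (3.15) and reproduces both the optimum α = −¾ and the factor-of-four improvement of Eq. (3.17):

import numpy as np

p = np.linspace(0.0, 1.0, 200001)

def rms_opd(alpha, Phi0=1.0):
    phi = Phi0 * (p**4 + alpha * p**2)
    # Weight proportional to p accounts for the annular area of the pupil.
    return np.sqrt(np.sum(phi**2 * p) / np.sum(p))

print(rms_opd(0.0))     # ~0.447 = 1/sqrt(5): no defocus
print(rms_opd(-0.75))   # ~0.112 = 1/(4*sqrt(5)): balanced, four times smaller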




3.4 General Third Order Aberration Theory


Armed with a simple understanding of the basic concepts that lie behind the description of third order aberration, we can proceed to a more general and more powerful analysis. This analysis relies on a theoretical treatment of OPD as a measure of aberration. As pointed out earlier, although the lowest order aberration (beyond the paraxial approximation) has a fourth order dependence upon pupil function, this theory is still referred to as third order aberration theory. In the example we have hitherto considered, we analysed an on-axis object located at the infinite conjugate. For the more general treatment, we must consider off-axis objects with the chief ray having some non-zero field angle with respect to the optical axis. In addition, the object may have an arbitrary axial location and we must also consider the axial position of the pupil.

This third order theory is referred to as Gauss-Seidel aberration theory and is of general applicability to optical systems of arbitrary complexity. There is, however, one important constraint. The theory assumes that the entire geometry, component surfaces and so on, is circularly symmetric about the optical axis. In formulating the theory, we assume that the object presents a non-zero field angle, θ, with respect to the optic axis, which is assumed to be oriented along the z axis. The chief ray is tilted by rotation about the x axis, so the object is offset from the optical axis in the y direction. The third order aberrations are to be expressed in terms of the field angle, θ, and the normalised pupil function, p. However, in this instance, because of the non-zero field angle, the rotational symmetry of the pupil is removed, so that separate x and y components of the pupil function, p_x and p_y, must be introduced.

The assumption in the Gauss-Seidel theory is that the underlying third order aberrations in a symmetrical optical system are themselves symmetrical and proportional to the fourth power of the pupil function. However, the finite field angle will effectively introduce an offset in the effective y component of the pupil function, Δp_y, at some arbitrary optical component. This is illustrated in Figure 3.10, which shows generically how such an offset may be visualised.

What is suggested by Figure 3.10 is that if a co-ordinate transformation is applied in y that is proportional to the field angle, θ, then the ray fan can be made symmetrical about this new optical axis. That is to say, in Figure 3.10b, any aberration generated would, in terms of OPD, simply be proportional to p⁴ with respect to the new axis. In arguing that the required offset is proportional to θ, rather than some other trigonometrical function, we are making an approximation based on linearisation in θ. This is justified for third order analysis, since any error produced would only be visible in aberration terms of higher order than third. In Figure 3.10, the pupil is shown at the optical surface under consideration. However, this is not a necessary condition; wherever the pupil is located, a symmetrical ray fan may be produced by a simple offset of the co-ordinate system in the y axis.

Thus, by the argument presented here, any third order aberration may be represented by a pupil dependence of p⁴ augmented by a shift in the y component of the pupil function, Δp_y, that is proportional to the field angle, θ. This is set out in Eq. (3.18), which describes the WFE, Φ, in terms of θ, p, and p_y. From this point, we use WFE, rather than OPD, as the key descriptor, as we are describing OPDs across the entire pupil. The offset pupil is now denoted by p′.






Figure 3.10 (a) Generic layout. (b) Layout with y co-ordinate transformation.






$$ \Phi = \Phi_0\,(p')^4 = \Phi_0\left[p_x^2 + (p_y + c\theta)^2\right]^2 $$ (3.18)

c is a constant of proportionality for the pupil offset.

Equation (3.18) may be expanded as follows:




$$ \Phi = \Phi_0\left[p^2 + 2c\theta p_y + c^2\theta^2\right]^2, \qquad p^2 = p_x^2 + p_y^2 $$ (3.19)

Finally, expanding Eq. (3.19) gives an expression for all third order aberrations:




$$ \Phi = \Phi_0\left[p^4 + 4c\theta p^2 p_y + 2c^2\theta^2 p^2 + 4c^2\theta^2 p_y^2 + 4c^3\theta^3 p_y + c^4\theta^4\right] $$ (3.20)

Equation (3.20) contains six distinct terms describing the WFE across the pupil. However, the final term, c⁴θ⁴, for a given field position, simply describes a constant offset in the optical path or phase of the rays originating from a particular point. That is to say, for a specific ray bundle, no OPD or violation of Fermat's principle could be ascribed to this term, when the difference with respect to the chief ray is calculated. Therefore, the final term in Eq. (3.20) cannot describe an optical aberration. We are thus left with five distinct terms describing third order aberration, each with a different dependence upon pupil function and field angle. These are the so-called five third order Gauss-Seidel aberrations. Of course, in terms of the WFE dependence, all terms show a fourth order dependence with respect to a combination of pupil function and field angle. That is to say, the exponents in p and in θ must always sum to 4.
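The expansion leading to Eq. (3.20) is easily checked symbolically. The short sketch below (using sympy; the symbol names are illustrative) expands the shifted-pupil quartic of Eq. (3.18); grouping p_x² + p_y² = p² recovers the six terms quoted above:

import sympy as sp

px, py, c, theta = sp.symbols('p_x p_y c theta', real=True)
p2 = px**2 + py**2                               # p**2
Phi = (p2 + 2*c*theta*py + c**2*theta**2)**2     # Eq. (3.18) with Phi0 = 1
print(sp.expand(Phi))                            # collect in p2 to see six terms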




3.5 Gauss-Seidel Aberrations





3.5.1 Introduction


In this section we will describe each of the fundamental third order aberrations in turn. Re-iterating Eq. (3.20) below, it is possible to highlight each of the aberration terms:

$$ \Phi = \Phi_0\big[\underbrace{p^4}_{\text{spherical aberration}} + \underbrace{4c\theta p^2 p_y}_{\text{coma}} + \underbrace{2c^2\theta^2 p^2}_{\text{field curvature}} + \underbrace{4c^2\theta^2 p_y^2}_{\text{astigmatism}} + \underbrace{4c^3\theta^3 p_y}_{\text{distortion}}\big] $$

We will now describe each of these five terms in turn.




3.5.2 Spherical Aberration


The first term, spherical aberration, has a simple fourth order dependence upon pupil function and no dependence upon field. This is illustrated in Eq. (3.21):




$$ \Phi_{SA} = \Phi_0\,p^4 $$ (3.21)

This aberration shows no dependence upon field angle and no dependence upon the orientation of the ray fan. Since, in the current analysis and for a non-zero field angle, the object is offset along the y axis, the pupil orientation corresponding to p_y defines the tangential ray fan and the pupil orientation corresponding to p_x defines the sagittal ray fan. This is according to the nomenclature set out in Chapter 2. So, the aberration is entirely symmetric and independent of field angle. In fact, the opening discussion in this chapter was based upon an illustration of spherical aberration.

Spherical aberration characteristically produces a circular blur spot. The transverse aberration may, of course, be derived from Eq. (3.21) using Eq. (3.12). For completeness, this is re-iterated below:




$$ t = \frac{1}{\mathrm{NA}_0}\frac{d\Phi}{dp} = \frac{4\Phi_0}{\mathrm{NA}_0}\,p^3 $$ (3.22)

A 2D geometrical plot of ray intersections at the paraxial focal plane, as produced by an evenly illuminated entrance pupil, is referred to as a geometrical point spread function. Due to the symmetry of the aberration, this spot is circular. However, since the transformation in Eq. (3.22) is non-linear, the blur spot associated with spherical aberration is non-uniform. For spherical aberration alone (no defocus or other aberrations), the density of the geometrical point spread function is inversely proportional to the pupil function, p. That is to say, spherical aberration manifests itself as a blur spot with a pronounced peak at the centre, with the density declining towards the periphery. This is illustrated in Figure 3.11. The spot, with its pronounced central maximum, is characteristic of spherical aberration and should be recognised as such by the optical designer.

As suggested earlier, the size of this spot can be minimised by moving away from the paraxial focus position. The ray fan and OPD fan for this aberration look like those illustrated in Figures 3.3 and 3.8. Overall, the characteristics of spherical aberration, and the balancing of this aberration, are very much as described in the treatment of generic third order aberration set out earlier.






Figure 3.11 Geometrical spot associated with spherical aberration.






3.5.3 Coma


The second term, coma, has a WFE that is proportional to the field angle. Its pupil dependence is third order, but it is not symmetrical with respect to the pupil function. The WFE associated with coma is as below:




$$ \Phi_{CO} = 4\Phi_0\,c\,\theta\,p^2 p_y $$ (3.23)

In the preceding discussions, the transverse aberration has been presented as a scalar quantity. This is not strictly true, as the ray position at the paraxial focus is a vector quantity that can only be described completely by an x component, t_x, and a y component, t_y. Equation (3.12) should strictly be rendered in the following vectorial form:




$$ t_x = \frac{1}{\mathrm{NA}_0}\frac{\partial\Phi}{\partial p_x}, \qquad t_y = \frac{1}{\mathrm{NA}_0}\frac{\partial\Phi}{\partial p_y} $$ (3.24)

The transverse aberration relating to coma may thus be written out as:




$$ t_x = \frac{8\Phi_0 c\theta}{\mathrm{NA}_0}\,p_x p_y, \qquad t_y = \frac{4\Phi_0 c\theta}{\mathrm{NA}_0}\,(p_x^2 + 3p_y^2) $$ (3.25)

From the perspective of both the OPD and ray fans, the behaviour of the tangential (y) and sagittal (x) ray fans is entirely different. As an optical designer, the reader should ultimately be familiar with the form of these fans and learn to recognise the characteristic third order aberrations. For a given field angle, the tangential OPD fan (p_x = 0) shows a cubic dependence upon pupil function, whereas, for the sagittal ray fan (p_y = 0), the OPD is zero. The OPD fan for coma is shown below in Figure 3.12.

The picture for the ray fans is a little more complicated. For both the tangential and sagittal ray fans, there is no component of transverse aberration in the x direction. On the other hand, for both ray fans, there is a quadratic dependence of the y component of the transverse aberration with respect to pupil function. The problem, in essence, is that transverse aberration is a vector quantity. However, when ray fans are computed for optical designs, they are presented as scalar plots for each (tangential and sagittal) ray fan. The convention, therefore, is to plot only the y (tangential) component of the aberration in a tangential ray fan, and only the x (sagittal) component of the aberration in a sagittal ray fan. With this convention in mind, the tangential ray fan shows a quadratic variation with respect to pupil function, whereas there is no transverse aberration for the sagittal ray fan. Tangential and sagittal ray fan behaviour is shown in Figure 3.13, which shows the relevant plots for coma.






Figure 3.12 OPD fan for coma.






Figure 3.13 Ray fan for coma.



Since the (vector) transverse aberration for coma is non-symmetric, the blur spot relating to coma has a distinct pattern. The blur spot is produced by filling the entrance pupil with an even distribution of rays and plotting their transverse aberration at the paraxial focus. If we imagine the pupil to be composed of a series of concentric rings from the centre to the periphery, these will produce a series of overlapping rings that are displaced in the y direction.

Figure 3.14 shows the characteristic geometrical point spread function associated with coma, clearly illustrating the overlapping circles corresponding to successive pupil rings. These overlapping rings produce the characteristic comet tail appearance from which the aberration derives its name. The overlapping circles produce two asymptotes, with a characteristic angle of 60°, as shown in Figure 3.14.






Figure 3.14 Geometrical spot for coma.



To see how these overlapping circles are formed, we introduce an additional angle, the ray fan angle, φ, which describes the angle that the plane of the ray fan makes with respect to the y axis. For the tangential ray fan, this angle is zero. For the sagittal ray fan, this angle is 90°. We can now describe the individual components of the pupil function, p_x and p_y, in terms of the magnitude of the pupil function, p, and the ray fan angle, φ:




$$ p_x = p\sin\varphi, \qquad p_y = p\cos\varphi $$ (3.26)

From Eq. (3.25) we can express the transverse aberration components in terms of p and φ. This gives:




$$ t_x = A\,p^2\sin 2\varphi, \qquad t_y = A\,p^2(2 + \cos 2\varphi), \qquad A = \frac{4\Phi_0 c\theta}{\mathrm{NA}_0} $$ (3.27)

A is a constant

It is clear from Eq. (3.27) that the pattern produced is a series of overlapping circles of radius Ap², offset in y by 2Ap². Coma is not an aberration that can be ameliorated or balanced by defocus. When analysing transverse aberration, the impact of defocus is to produce an odd (anti-symmetrical) additional contribution with respect to pupil function. The transverse aberration produced by coma is, of course, even with respect to pupil function, as shown in Figure 3.13. Therefore, any deviation from the paraxial focus will only increase the overall aberration.

Another important consideration with coma is the location of the geometrical spot centroid. This represents the mean ray position at the paraxial focus for an evenly illuminated entrance pupil, taken with respect to the chief ray intersection. The centroid locations in x and y, C_x and C_y, may be defined as follows.




$$ C_x = \frac{\iint t_x\,dA}{\iint dA}, \qquad C_y = \frac{\iint t_y\,dA}{\iint dA} $$ (3.28)

where the integration runs over the area of the pupil.

By symmetry considerations, the coma centroid is not displaced in x, but it is displaced in y. Integrating over the whole of the pupil function, p (from 0 to 1), and allowing for a weighting proportional to p (the area of each ring), the centroid location in y, C_y, may be derived from Eq. (3.27):




$$ C_y = \frac{\int_0^1 2Ap^2\cdot p\,dp}{\int_0^1 p\,dp} = A $$ (3.29)

(the term cos2φ is ignored as its average is zero)

So, coma produces a spot centroid that is displaced in proportion to the field angle. The constant A is, of course, proportional to the field angle.
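The centroid result can also be checked by brute force. The following Python sketch (a Monte Carlo average, with A = 1 as an arbitrary illustrative constant) fills the pupil uniformly and applies the transverse aberration of Eq. (3.27) as reconstructed above; the mean spot position comes out at (0, A):

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
p = np.sqrt(rng.uniform(0.0, 1.0, n))    # uniform sampling over the unit disc
phi = rng.uniform(0.0, 2 * np.pi, n)
A = 1.0
tx = A * p**2 * np.sin(2 * phi)
ty = A * p**2 * (2 + np.cos(2 * phi))
print(tx.mean(), ty.mean())              # ~ (0.0, 1.0): centroid displaced by A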




3.5.4 Field Curvature


The third Gauss-Seidel term produced is known as field curvature. The OPD associated with field curvature is second order in both field angle and pupil function. Furthermore, there is no dependence upon ray fan angle, as the WFE is circularly symmetric. Unlike in the case for coma, behaviour is identical for the tangential and sagittal ray fans.




$$ \Phi_{FC} = 2\Phi_0\,c^2\theta^2 p^2 $$ (3.30)

From Eq. (3.30), in the case of a single field point, the effect of a quadratic dependence of WFE on pupil function is to produce a uniform defocus. That is to say, a uniform defocus produces a characteristic quadratic pupil dependence in the WFE. The extent of this defocus is proportional to the square of the field angle, producing a curved focal surface that intersects the paraxial focal plane at zero field angle – on the optical axis. If this field curvature were the only aberration, then this curved surface would produce a perfectly sharp image for all field points. That is to say, in the presence of field curvature, the ideal focal surface is a curved surface or sphere, rather than a plane. This is illustrated in Figure 3.15.

Figure 3.15 shows both the tangential and sagittal focal surfaces (T and S), with the optimum focal surface lying between the two. Ideally, for field curvature, the imaging surface should be curved, following the ideal focal surface. If, for instance, only a plane imaging surface is available, then this need not be located at the paraxial focus. This surface can, in principle, be located at an offset, such that the rms WFE is minimised across all fields. In calculating the rms WFE, this would be weighted according to area across all object space, as represented by a circle centred on the optical axis whose radius is the maximum object height.






Figure 3.15 Field curvature.






Figure 3.16 Ray fan plots illustrating field curvature.



Clearly, the OPD fan for field curvature is a series of parabolic curves whose height is proportional to the square of the field angle. There is no distinction between the sagittal and tangential fans. Similarly, the ray fans show a series of linear plots whose magnitude is also proportional to the square of the field angle. A series of ray fan plots for field curvature is shown in Figure 3.16.

In view of the symmetry associated with field curvature, the geometrical spot consists of a uniform blur spot whose size increases in proportion to the square of the field angle. In addition, this spot is centred on the chief ray; unlike in the case for coma, there is no centroid shift with respect to the chief ray.




3.5.5 Astigmatism


The fourth Gauss-Seidel term produced is known as astigmatism, literally meaning ‘without a spot’. Like field curvature, the WFE associated with astigmatism is second order in both field angle and pupil function. It differs from field curvature in that the WFE is non-symmetric and depends upon the ray fan angle as well as the magnitude of the pupil function. That is to say, the behaviour of the tangential and sagittal ray fans is markedly different.




$$ \Phi_{AS} = 2\Phi_0\,c^2\theta^2 p^2\cos 2\varphi $$ (3.31)

In some respects, the OPD behaviour is similar to field curvature, in that, for a given ray fan, the quadratic dependence upon pupil function implies a uniform defocus. However, the degree of defocus is proportional to cos2φ. Note that the raw fourth term of Eq. (3.20), 4Φ₀c²θ²p_y² = 2Φ₀c²θ²p²(1 + cos2φ), contains a mean (orientation-independent) defocus; conventionally this part is assigned to the field curvature, leaving the cos2φ dependence shown in Eq. (3.31). Thus, the defocus for the tangential ray fan (cos2φ = 1) and that for the sagittal ray fan (cos2φ = −1) are equal and opposite. Clearly, the tangential and sagittal foci are separate and displaced, and this displacement is proportional to the square of the field angle. The displacement of the ray fan focus is set out in Eq. (3.32):




$$ \Delta f = \pm A\,\theta^2 \quad (+\ \text{tangential},\ -\ \text{sagittal}) $$ (3.32)

A is a constant

As suggested previously, for a given field angle, the OPD fan would be represented by a series of quadratic curves whose magnitude varies with the ray fan angle. Similarly, the ray fan itself is represented by a series of linear plots whose magnitude is dependent upon the ray fan angle. This is shown in Figure 3.17, which shows the ray fan for a given field angle for both the tangential and sagittal ray fans.

For a general ray, it is possible to calculate the two components of the transverse aberration as a function of the pupil co-ordinates.




$$ t_x = -\frac{4\Phi_0 c^2\theta^2}{\mathrm{NA}_0}\,p_x, \qquad t_y = +\frac{4\Phi_0 c^2\theta^2}{\mathrm{NA}_0}\,p_y $$ (3.33)






Figure 3.17 Ray fan for astigmatism showing tangential and sagittal fans.






Figure 3.18 Geometric spot vs. defocus for astigmatism.



According to Eq. (3.33), the blur spot produced by astigmatism (at the paraxial focus) is simply a uniform circular disc. Each point in the uniform pupil function simply maps onto a similar point on the blur spot, but with its x value reversed. However, when a uniform defocus is added, similar linear terms (in p) will be added to both t_x and t_y, having both the same magnitude and sign. As a consequence, the relative magnitudes of t_x and t_y will change, producing a uniform elliptical pattern. Indeed, as mentioned earlier, there are distinct and separate tangential and sagittal foci. At these foci, the blur spot is effectively transformed into a line, with the focus along one axis being perfect and the other axis in defocus. This is shown in Figure 3.18.
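The transition from circular spot through the two line foci is easy to visualise numerically. In the sketch below, B stands for the common magnitude of the astigmatic terms in Eq. (3.33) and d for an added defocus contribution (both are illustrative constants, not symbols from the text):

import numpy as np

B = 1.0
phi = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
px, py = np.sin(phi), np.cos(phi)            # rim of the pupil (p = 1)
for d in (-B, 0.0, B):
    tx, ty = (d - B) * px, (d + B) * py      # astigmatism plus uniform defocus
    print(d, np.ptp(tx).round(2), np.ptp(ty).round(2))
# spans (4, 0), (2, 2), (0, 4): line focus, circular spot, line focus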

Due to the even (second order) dependence of OPD upon pupil function, there is no centroid shift evident for astigmatism. For Gauss-Seidel astigmatism, its magnitude is proportional to the square of the field angle. Thus, for an on-axis ray bundle (zero field angle) there can be no astigmatism. This Gauss-Seidel analysis, however, assumes all optical surfaces are circularly symmetric with respect to the optical axis. In the important case of the human eye, the validity of this assumption is broken by the fact that the shape of the human eye, and in particular the cornea, is not circularly symmetrical. The slight cylindrical asymmetry present in all real human eyes produces a small amount of astigmatism, even at zero field angle. That is to say, even for on-axis ray bundles, the tangential and sagittal foci are different for the human eye. For this reason, spectacle lenses for vision correction are generally required to compensate for astigmatism as well as defocus (i.e. short-sightedness or long-sightedness).




3.5.6 Distortion


The fifth and final Gauss-Seidel aberration term is distortion. The WFE associated with this aberration is third order in field angle, but linear in pupil function. In fact, a linear variation of WFE with pupil function implies a flat, but tilted, wavefront surface. Therefore, distortion merely produces a tilted wavefront, without any apparent blurring of the spot. The WFE variation is set out in Eq. (3.34).




$$ \Phi_{DI} = 4\Phi_0\,c^3\theta^3 p_y = 4\Phi_0\,c^3\theta^3 p\cos\varphi $$ (3.34)

Thus, the only effect produced by distortion is a shift (in the y direction) in the geometric spot centroid; this shift is proportional to the cube of the field angle. However, this shift is global across the entire pupil, so the image remains entirely sharp. The shift is radial in direction, in the sense that the centroid shift is in the same plane (tangential) as the field offset. So, the OPD fan for the tangential fan is linear in pupil function and zero for the sagittal fan. The ray fan is zero for both tangential and sagittal fans, emphasising the lack of blurring.

Taken together with the linear (paraxial) magnification produced by a perfect Gaussian imaging system, distortion introduces another cubic term. That is to say, the relationship between the transverse image and object locations is no longer a linear one; the magnification varies with field angle. If the height of the object is h₀ and that of the image is h_i, then the two quantities are related as follows:




$$ h_i = M_0\,h_0\,(1 + \zeta h_0^2) $$ (3.35)






Figure 3.19 (a) Pincushion (positive) distortion. (b) Barrel (negative) distortion.



M₀ is the paraxial magnification; ζ is a constant quantifying the distortion.

If we denote the x and y components of the object and image locations by x₀, y₀ and x_i, y_i respectively, then we obtain:




$$ x_i = M_0 x_0\left[1 + \zeta(x_0^2 + y_0^2)\right], \qquad y_i = M_0 y_0\left[1 + \zeta(x_0^2 + y_0^2)\right] $$ (3.36)

From Eq. (3.35), it is clear that an object represented by a straight line that is offset from the optical axis in object space will be rendered as a parabolic line in image space. As such, the image is clearly distorted. The sense and character of the distortion is governed by the sign and magnitude of ζ. This is shown in Figures 3.19a,b.

Where ζ, and hence the distortion, is positive, the distortion is referred to as pincushion distortion, as suggested by the form shown in Figure 3.19a. On the other hand, if ζ is negative, the resultant image is distended in the form suggested by Figure 3.19b; this is referred to as barrel distortion.

Worked Example 3.1 The distortion of an optical system is given, as a WFE, by the expression 4Φ₀c³pcosφθ³, where Φ₀ is equal to 50 μm and c = 1. The radius of the pupil, r₀, is 10 mm. What is the distortion, expressed as a deviation in percent from the paraxial angle, at a field angle of 15°? From Eq. (3.12), and when expressed as an angle, the transverse aberration generated is given by:

$$ \Delta\theta = \frac{t}{f} = \frac{1}{f\,\mathrm{NA}_0}\frac{\partial\Phi}{\partial p} = \frac{4\Phi_0 c^3\theta^3\cos\varphi}{r_0} \qquad (f\,\mathrm{NA}_0 = r_0) $$








The cosφ term expresses the fact that the direction of the transverse aberration lies in the same plane as that containing the object and the optical axis. The proportional distortion is therefore given by:

$$ \frac{\Delta\theta}{\theta} = \frac{4\Phi_0\theta^2}{r_0} = \frac{4\times 0.05\times(0.2618)^2}{10} = 0.00137 $$








(dimensions in mm; angles in radians)

The proportional distortion is therefore 0.137%, or approximately 0.14%.
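The arithmetic of this worked example can be restated in a few lines of Python (a sketch assuming the angular aberration Δθ = 4Φ₀θ³cosφ/r₀ derived above, with cosφ = 1 in the tangential plane):

import numpy as np

Phi0 = 0.050                        # WFE coefficient, mm (50 um)
r0 = 10.0                           # pupil radius, mm
theta = np.radians(15.0)
dtheta = 4 * Phi0 * theta**3 / r0   # angular transverse aberration, rad
print(100 * dtheta / theta)         # proportional distortion, % (~0.137)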




3.6 Summary of Third Order Aberrations



At this stage it will be useful to summarise the five Gauss-Seidel aberrations in terms of the pupil and field dependence of their OPD and ray fans. It should be noted that, for all Gauss-Seidel aberrations, the order of the pupil dependence and the order of the field angle dependence sum to four (for the OPD). In particular, it is important for the reader to understand how the different types of aberration vary with both pupil size and field angle. For example, in many optical systems, such as telescopes and microscopes, the range of field angles tends to be significantly smaller than the angles subtended at the pupil. Therefore, for such instruments, those aberrations with a higher order pupil dependence, such as spherical aberration (fourth order in p) and coma (third order in p), will predominate.




3.6.1 OPD Dependence


The list below sets out the WFE dependence of the five Gauss-Seidel aberrations on pupil function, p, and field angle, θ.



● Spherical Aberration: Φ_SA ∝ p⁴
● Coma: Φ_CO ∝ p³θ
● Field Curvature: Φ_FC ∝ p²θ²
● Astigmatism: Φ_AS ∝ p²θ²
● Distortion: Φ_DI ∝ pθ³

To quantify each aberration, we can define a coefficient, K, which describes the magnitude (in units of length) of the aberration. In addition, as well as normalising the pupil function, we can also normalise the field angle by introducing the quantity h, which represents the ratio θ/θ₀, the ratio of the field angle to the maximum field angle.




$$ \Phi_{SA} = K_{SA}\,p^4 $$ (3.37)




$$ \Phi_{CO} = K_{CO}\,p^3 h\cos\varphi $$ (3.38)




$$ \Phi_{FC} = K_{FC}\,p^2 h^2 $$ (3.39)




$$ \Phi_{AS} = K_{AS}\,p^2 h^2\cos 2\varphi $$ (3.40)




$$ \Phi_{DI} = K_{DI}\,p\,h^3\cos\varphi $$ (3.41)

The reader should take particular note of the form of Eq. (3.40). The description of astigmatism here is such that the mean defocus over all orientations of the ray fan is taken to be zero. However, other representations adopt the convention that the defocus is zero for the sagittal ray and the balance of the astigmatism is incorporated into the field curvature. That is to say, in these conventions, the astigmatism is taken to be proportional to cos²φ, rather than cos2φ, as in Eq. (3.40). Of course, in using cos²φ, an average defocus of the same form as field curvature is introduced; hence the reason for adopting the convention used here. If the field curvature and astigmatism were redefined according to that convention, then the following revised description would apply:




$$ \Phi'_{FC} = (K_{FC} - K_{AS})\,p^2 h^2 $$ (3.42)




$$ \Phi'_{AS} = 2K_{AS}\,p^2 h^2\cos^2\varphi $$ (3.43)




3.6.2 Transverse Aberration Dependence


The ray fan or transverse aberration dependence upon pupil function and field angle is such that the orders of the two variables sum to three, as opposed to four for the OPD. The dependence of the transverse aberration is listed below:



● Spherical Aberration: t_SA ∝ p³
● Coma: t_CO ∝ p²θ
● Field Curvature: t_FC ∝ pθ²
● Astigmatism: t_AS ∝ pθ²
● Distortion: t_DI ∝ θ³





3.6.3 General Representation of Aberration and Seidel Coefficients


The analysis presented in this chapter has demonstrated the power of using the OPD as a way of describing aberrations. More generally, when expressed as a WFE, it can be used to describe the deviation of a specific wavefront from an ideal wavefront that converges on a specific reference point. As such, this deviation can be used to describe defocus, which shows a quadratic dependence on pupil function, and tilt, where the WFE is a plane surface that is tilted about the x or y axis (the optical axis being the z axis). The standard representation for describing and quantifying generic WFE and aberration behaviour is shown in Eq. (3.44).




$$ \Phi = W_{020}\,p^2 + W_{111}\,h\,p\cos\varphi + \sum_{a,b,c} W_{abc}\,h^a p^b\cos^c\varphi $$ (3.44)

p is the pupil function and h is the object height (proportional to field angle θ); φ is the ray fan angle.

In the general term, W_abc, ‘a’ describes the order of the object height (field angle) dependence, ‘b’ describes the order of the pupil function dependence, and ‘c’ describes the dependence on the ray fan angle. The defocus and tilt are, of course, paraxial terms. Overall, the dependence of each coefficient is given by Eq. (3.45):




$$ \Phi_{abc} = W_{abc}\,h^a p^b\cos^c\varphi $$ (3.45)

It should be noted that this convention incorporates powers of cosφ, so the astigmatism term contains some average field curvature. Describing each of the aberration coefficients introduced earlier in terms of these coefficients gives the following:




$$ W_{040} = K_{SA} $$ (3.46)




$$ W_{131} = K_{CO} $$ (3.47)




$$ W_{220} = K_{FC} - K_{AS} $$ (3.48)




$$ W_{222} = 2K_{AS} $$ (3.49)




$$ W_{311} = K_{DI} $$ (3.50)

Another convention exists of which the reader should be aware. These are the so-called Seidel coefficients, named after the nineteenth century mathematician Philipp Ludwig von Seidel, who first elucidated the five monochromatic aberrations. The coefficients are usually denominated S_I, S_II, S_III, S_IV, and S_V, referring to spherical aberration, coma, astigmatism, field curvature, and distortion respectively. They nominally quantify the WFE, as the other coefficients do, but their magnitude is determined by the size of the blur spot that the aberration creates. The correspondence of these terms is as follows:




$$ S_I = 8\,W_{040} $$ (3.51)




$$ S_{II} = 2\,W_{131} $$ (3.52)




$$ S_{III} = 2\,W_{222} $$ (3.53)




$$ S_{IV} = 4\,W_{220} - 2\,W_{222} $$ (3.54)




$$ S_V = 2\,W_{311} $$ (3.55)

The form of Eq. (3.54) is interesting. When compared to the definition of W₂₂₀ in Eq. (3.48), an additional amount of astigmatism has been compounded with the field curvature. As such, this new representation of field curvature, S_IV, represents a fundamental and important property of an aberrated optical system and is referred to as the Petzval curvature. Its significance will be discussed more fully in the next chapter.
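For convenience, the correspondence may be wrapped in a small helper. The sketch below assumes the Welford-style relations given in Eqs. (3.51)–(3.55); conventions differ between texts and software packages, so the numerical factors should always be checked against whichever reference is in use:

def seidel_from_wavefront(W040, W131, W222, W220, W311):
    """Seidel sums (S_I..S_V) from the wavefront coefficients W_abc."""
    S1 = 8 * W040              # spherical aberration
    S2 = 2 * W131              # coma
    S3 = 2 * W222              # astigmatism
    S4 = 4 * W220 - 2 * W222   # field (Petzval) curvature
    S5 = 2 * W311              # distortion
    return S1, S2, S3, S4, S5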

The treatment of aberrations, thus far, has been entirely generic. We have introduced the five Gauss-Seidel aberrations without specific reference to how they are generated at specific optical surfaces and by individual optical components. This will be discussed in detail in the next chapter. The most important feature of this treatment is that the third order aberrations are additive through a system when described in terms of OPD. That is to say, the five aberrations may be calculated independently at each optical surface and summed over the entire optical system. This analysis is an extremely powerful tool for characterisation of aberration in a complex system.




Further Reading


Born, M. and Wolf, E. (1999). Principles of Optics, 7e. Cambridge: Cambridge University Press. ISBN: 0-521-64222-1.

Hecht, E. (2017). Optics, 5e. Harlow: Pearson Education. ISBN: 978-0-1339-7722-6.

Kidger, M.J. (2001). Fundamental Optical Design. Bellingham: SPIE. ISBN: 0-8194-3915-0.

Kidger, M.J. (2004). Intermediate Optical Design. Bellingham: SPIE. ISBN: 978-0-8194-5217-7.

Longhurst, R.S. (1973). Geometrical and Physical Optics, 3e. London: Longmans. ISBN: 0-582-44099-8.

Mahajan, V.N. (1991). Aberration Theory Made Simple. Bellingham: SPIE. ISBN: 0-819-40536-1.

Mahajan, V.N. (1998). Optical Imaging and Aberrations: Part I. Ray Geometrical Optics. Bellingham: SPIE. ISBN: 0-8194-2515-X.

Mahajan, V.N. (2001). Optical Imaging and Aberrations: Part II. Wave Diffraction Optics. Bellingham: SPIE. ISBN: 0-8194-4135-X.

Slyusarev, G.G. (1984). Aberration and Optical Design Theory. Boca Raton: CRC Press. ISBN: 978-0852743577.

Smith, F.G. and Thompson, J.H. (1989). Optics, 2e. New York: Wiley. ISBN: 0-471-91538-1.




4

Aberration Theory and Chromatic Aberration





4.1 General Points


In the previous chapter, we developed a generalised description of third order aberration, introducing the five Gauss-Seidel aberrations. The motivation for this is to give the reader a fundamental understanding and a feel for the underlying principles. At the same time, it is fully appreciated that optical system design and detailed analysis of aberrations is underpinned by powerful optical software tools. Nevertheless, a grasp of the underlying principles, including an appreciation of the form of ray fans and optical path difference (OPD) fans, greatly facilitates the application of these sophisticated tools.

The treatment presented here is restricted to consideration of third order aberrations. Before the advent of powerful software analysis tools, the designer was compelled to resort to a much more elaborate and complex analysis, in particular introducing an analytical treatment of higher order aberrations. For all the labour that this would involve, the reader would gain little in terms of a useful understanding that could be applied to current design tools. As the third order aberrations are third order in transverse aberration and fourth order in OPD, so succeeding higher order aberrations are fifth, seventh, etc. order in transverse aberration, but sixth, eighth order in OPD. That is to say, aberrations, whose order is expressed conventionally in terms of the transverse aberration, can only be odd. One can re-iterate the analysis of Section 3.4 to generate the form and number of terms involved in the higher order aberrations. This is left to the reader, but it is straightforward to derive the number of distinct terms, N_n, as a function of aberration order, n:




$$ N_n = \frac{(n+3)(n+5)}{8} - 1 $$ (4.1)
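A one-line function reproduces the count implied by Eq. (4.1) as reconstructed here, giving the familiar 5, 9, and 14 distinct terms at third, fifth, and seventh order:

def n_terms(n):
    # Number of distinct aberration terms at (odd, transverse) order n.
    return (n + 3) * (n + 5) // 8 - 1

print([n_terms(k) for k in (3, 5, 7)])   # [5, 9, 14]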

In concentrating on third order aberrations, we shall, in the remainder of this chapter, seek to determine the impact of refractive surfaces, mirrors, and lenses on all the Gauss-Seidel aberrations. This analysis will proceed, initially, on the assumption that the surface in question lies at the pupil position. Subsequently, the impact of changing the position of the stop will be analysed. Manipulation of the stop position is an important variable in the optimisation of an optical design. The concept of the aplanatic geometry will be introduced, whereby specific, simple optical geometries may be devised that are wholly free from both spherical aberration (SA) and coma (CO). These aplanatic building blocks feature in many practical designs and are significant because, in many instruments, such as telescopes and microscopes, there is a tendency for spherical aberration and coma to dominate the other aberrations. The elimination of spherical aberration and coma is thus a priority. Furthermore, by the same token, astigmatism (AS) and field curvature (FC) are more difficult to control. In particular, the control of field curvature is fundamentally limited by the Petzval curvature, as alluded to in the previous chapter.






Figure 4.1 Calculation of OPD for refractive surface.






4.2 Aberration Due to a Single Refractive Surface



The analysis of the aberrations of a single refractive surface is based on the computation of the OPD of a generalised field point to the appropriate (fourth) order in terms of field angle, θ, and ray height, r, at the pupil. For this analysis, we will assume that the pupil is located at the lens surface. In calculating the OPD, we force all rays to go to the paraxial focus and compute the OPD with respect to the chief ray. Figure 4.1 shows an object with a field angle, θ, located at a distance, u, from a spherical refractive surface of radius R. It must be emphasised, in this instance, that this analysis applies specifically to a spherical surface. In this geometry, it is assumed that the object is displaced from the optical axis in the y direction. The paraxial image is itself located at a distance v from the surface and the position of a ray at the surface (and stop) is described by its components in x and y – h_x and h_y.

The image in this case is the paraxial image and from the paraxial theory, the angle φ may be expressed in terms of θ as θ/n. To compute the optical path of a general ray as it passes from object to paraxial image, we need to define the ray co-ordinates at three points:


















The z co-ordinate of the stop position is derived from the binomial expansion for the axial sag of a sphere, including terms up to the fourth power (z = r²/2R + r⁴/8R³ + …). In making this approximation, it is assumed that h is significantly less than R. If we were to adopt the paraxial approximation, we would only consider the first, r², term in the expansion. In the case of third order aberration, we need to consider the next term. It is then very straightforward to calculate the total optical path, Φ, for a general ray in passing from object to paraxial image:




(4.2)

The two square root terms represent the optical path of the two ‘legs’ of the journey, with the path through the glass adding a multiplicative factor of n. The next stage of the process is an extension of the paraxial theory. It is assumed that r_x, r_y, and uθ are all significantly less than u. We can now approximate Φ from Eq. (4.2) using the binomial theorem, collecting terms as we go:








Before deriving the third order aberration terms, we examine the paraxial contribution, which contains terms up to second order in the ray height, r.




$$ \frac{n}{v} + \frac{1}{u} = \frac{n-1}{R} $$ (4.3)

As one would expect, in the paraxial approximation, the optical path length is identical for all rays. However, for third order aberration, terms of up to order h⁴ must be considered. Expanding Eq. (4.2) to consider all relevant terms, we get:




(4.4)

Four of the five Gauss-Seidel terms are present – spherical aberration, coma, astigmatism, and field curvature. However, clearly there is no distortion. In fact, as will be seen later, distortion can only occur where the stop is not at the surface, as it is here. Of course, Eq. (4.4) can be simplified if one considers that u, v, and R are dependent variables, as related in Eq. (4.3). Substituting for v in terms of u and R, we can express the OPD in terms of u and R alone. Furthermore, it is useful, at this stage, to split the OPD contributions in Eq. (4.4) into spherical aberration (SA), coma (CO), astigmatism (AS), and field curvature (FC). With a little algebraic manipulation this gives:




(4.5a)




(4.5b)




(4.5c)




(4.5d)




4.2.1 Aplanatic Points



It is worthwhile, at this juncture, to examine the four expressions in Eqs. (4.5a)–(4.5d) in some detail and, in particular, those for spherical aberration and coma. Before examining these expressions further, it is useful to cast them in the form outlined in Chapter 3:




(4.6a)




(4.6b)






Figure 4.2 Aplanatic points for refraction at single spherical surface.



There is a clear pattern in these expressions, in that both spherical aberration and coma can be reduced to zero for specific values of the object distance, u. Examining Eqs. (4.6a) and (4.6b), it is evident that this condition is met where u = −R. That is to say, where the object is located at the centre of the spherical surface. However, this is a somewhat trivial condition, where rays are undeviated by the surface and where the surface would not provide any useful additional refractive power to the system. Most significantly, another condition exists for u = −(n + 1)R. Here, for this non-trivial case, both third order spherical aberration and coma are absent. This is the so-called aplanatic condition and the corresponding conjugate points are referred to as aplanatic points (Figure 4.2). From Eq. (4.3) we can derive the image distance, v, as (n + 1)R/n. That is to say, the object is virtual and the image is real if R is positive, and vice-versa if R is negative.

To be a little more rigorous, we might suppose that the refractive index in object space is n₁ and that in image space is n₂. The location of the aplanatic points is then given by:




$$ u = -\frac{(n_1 + n_2)}{n_1}R, \qquad v = +\frac{(n_1 + n_2)}{n_2}R $$ (4.7)

Fulfilment of the aplanatic condition is an important building block in the design of many optical systems and so is of great practical significance. As pointed out in the introduction, for those systems where the field angles are substantially less than the marginal ray angles, such as microscopes and telescopes, the elimination of spherical aberration and coma is of primary importance. Most significantly, not only does the aplanatic condition eliminate third order spherical aberration, but it also provides theoretically perfect imaging for on-axis rays.




Worked Example 4.1 Microscope Objective


The ‘front end’ of many high power microscope objectives exploits the principle of single surface aplanatic points through the use of a hyperhemisphere co-located with the object. The hyperhemisphere consists of a sphere that has been truncated at one of the aplanatic points, which also coincides with the object location, as illustrated in Figure 4.3.

Using the hyperhemisphere, we wish to create a ×20 microscope objective for a standard optical tube length of 200 mm. In this example, it is assumed that two thirds of the optical power resides in the hyperhemisphere itself; other components collimate the beam. In other words:

$$ P_{hyper} = \tfrac{2}{3}\,P_{obj} $$











Figure 4.3 Hyperhemisphere objective.



The refractive index of the hyperhemisphere is 1.6. What is the radius, R, of the hyperhemisphere and what is its thickness?

For a tube length of 200 mm, a ×20 magnification corresponds to an objective focal length of 10 mm. As two thirds of the power resides in the hyperhemisphere, then the focal length of the hyperhemisphere must be 15 mm. Inspecting Figure 4.2, it is clear that the thickness of the hyperhemisphere is −R × (n + 1)/n, or −1.625 × R. To calculate the value of R, we set up a matrix for the system. The first matrix corresponds to refraction at the planar air/glass boundary, the second to translation to the spherical surface, and the final matrix to the refraction at that surface. On this occasion, translation to the original reference is not included.








From the above matrix, the focal length is −R/0.6 and hence R = −9.0 mm. The thickness, t, we know is −1.625 × R, i.e. 14.625 mm. In this sign convention, R is negative, as the sense of its sag is opposite to the direction of travel from object to image space.

The (virtual) image is at (n + 1) × R from the sphere vertex or 2.6 × 9 = 23.4 mm.
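The matrix arithmetic can be checked with a short sketch. It assumes a reduced-angle ray-matrix convention in which refraction at a plane surface is the identity, a thickness t in index n contributes t/n, and a refracting surface contributes power (n₂ − n₁)/R; the variable names are illustrative:

import numpy as np

n = 1.6
R = -9.0                              # mm
t = -1.625 * R                        # thickness = -R*(n+1)/n = 14.625 mm

flat = np.eye(2)                      # plane air/glass face: no power
translate = np.array([[1.0, 0.0],
                      [t / n, 1.0]])  # reduced thickness t/n
P = (1.0 - n) / R                     # glass-to-air surface power, -0.6/R
sphere = np.array([[1.0, -P],
                   [0.0, 1.0]])

M = sphere @ translate @ flat
print(-1.0 / M[0, 1])                 # focal length ~ 15.0 mm, as required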

In summary: R = −9.0 mm; hyperhemisphere thickness t = 14.625 mm; the virtual image lies 23.4 mm from the sphere vertex.

4.2.2 Astigmatism and Field Curvature


Unlike spherical aberration and coma, there is less scope for correction of astigmatism and field curvature. In Eqs. (4.5c) and (4.5d), astigmatism is corrected at the aplanatic point and field curvature at the radial points. However, the convention used in Eq. (4.5c) to describe astigmatic correction corresponds to zero sagittal ray defocus. On the other hand, using the alternative convention set out in Chapter 3, we have:




(4.8a)




(4.8b)

From Eq. (4.8a), it is evident that at the aplanatic condition, where u = −(n + 1)R, the astigmatism vanishes, as do the spherical aberration and coma. It is interesting to see what happens to the field curvature when this condition is fulfilled:




(4.9)

This is related to the Petzval field curvature, which, by definition, is the field curvature that arises when the astigmatism in the system is zero. Relating this to Eq. (4.8b), the field curvature may be expressed as:




(4.10)






Figure 4.4 Field curvature for single refraction.



It is clear that Eq. (4.9) represents, with its quadratic dependence upon the pupil location, r, a degree of defocus, Δf, or longitudinal aberration, that is quadratic in the field angle. This defocus is given by:




(4.11)

The systematic field-dependent defocus can be represented as a spherical surface on which each field point is in focus. The curvature of this surface, C_PETZ, equal to 1/R_PETZ, where R_PETZ is the Petzval radius, is given by:




$$ C_{PETZ} = \frac{1}{R_{PETZ}} = -\frac{(n-1)}{nR} $$ (4.12)

The sign is important, in that the Petzval curvature is in the opposite sense to that of the surface itself. This point is illustrated in Figure 4.4.

The most significant point about Petzval curvature is, in common with the underlying wavefront error, that it is additive through a system. To illustrate this, we might consider a system with N surfaces with radii of curvature R_i. The material that follows each surface has a refractive index of n_i. The Petzval curvature associated with the system is simply the sum of the individual curvatures and is referred to as the Petzval sum. This is given by:




$$ C_{PETZ} = -\sum_{i=1}^{N}\frac{(n_i - n_{i-1})}{n_i\,n_{i-1}\,R_i} $$ (4.13)

where n₀ is the refractive index of the object space.

The practical implication of Eq. (4.13) is that if a system consists of elements of entirely positive or entirely negative focal power, then that system will always exhibit field curvature. To achieve a flat field, or a zero Petzval sum, any positive optical elements must be balanced by negative elements elsewhere in the system.
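A minimal sketch of the bookkeeping, assuming the Petzval sum takes the form given in Eq. (4.13) (each surface is described by its radius and the indices on either side of it):

def petzval_curvature(surfaces):
    """surfaces: iterable of (R, n_before, n_after); returns C_PETZ = 1/R_PETZ."""
    return -sum((n2 - n1) / (n1 * n2 * R) for R, n1, n2 in surfaces)

# Example: a single air-to-glass surface, n 1.0 -> 1.5, R = +100 mm.
print(petzval_curvature([(100.0, 1.0, 1.5)]))   # ~ -0.0033: opposite sense to R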

It must be emphasised that the condition for perfect image formation on the Petzval surface applies specifically to the scenario where astigmatism has been removed.




4.3 Reflection from a Spherical Mirror


The third order analysis for a spherical mirror proceeds in very much the same way as for the single refractive surface. That is to say, a ray is traced from the object location to the mirror and thence to the paraxial focus, regardless of whether the real ray actually terminates there. The general layout is shown in Figure 4.5. The sign convention used here is the same as applied in all previous analyses. That is to say, a positive image distance places the image to the right, and the image distance, as shown in Figure 4.5, is actually negative. However, it must be accepted that, as rays physically converge on this image point, the image is actually real, despite v being negative. In addition, the same convention is applied to the mirror curvature; the mirror depicted in Figure 4.5 has negative curvature.






Figure 4.5 Reflection at spherical mirror.



The analysis proceeds as previously. Firstly, we set out the object and image positions and the ray intercept at the stop.


















The optical path is given by:




(4.14)

Rearranging:








In applying the binomial approximation, one needs to be careful with regard to the sign convention. It should be accepted that each of the square root terms in Eq. (4.14) is positive for a real object and real image. That is to say, all rays are physically traced to the appropriate location. In the case of a mirror surface, the definition of a real image corresponds to a negative image distance, v. Once again, we examine the paraxial terms:




$$ \frac{1}{v} + \frac{1}{u} = \frac{2}{R} $$ (4.15)

As for the refractive surface, we expand Eq. (4.14) using the binomial theorem to give terms up to fourth order in the OPD.




(4.16)

As with the refractive case, four of the five Gauss-Seidel terms are present – spherical aberration, coma, astigmatism, and field curvature. There is also no distortion. As previously, Eq. (4.16) can be simplified by considering u, v, and R as dependent variables, as related in Eq. (4.15). We can, once more, express the OPD in terms of u and R alone. Splitting the OPD contributions in Eq. (4.16) into spherical aberration (SA), coma (CO), astigmatism (AS), and field curvature (FC), and with a little algebraic manipulation, we have:




(4.17a)




(4.17b)




(4.17c)




(4.17d)

Equations (4.17a)–(4.17c) bear some striking similarities to those for the refractive surface. In fact, if one substitutes n = −1 into the corresponding refractive formulae, one obtains expressions similar to those listed above. Thus, in some ways, a mirror behaves as a refractive surface with a refractive index of minus one. Once again, there are aplanatic points where both spherical aberration and coma are zero. This occurs only where both object and image are co-located at the centre of the spherical surface. The apparent absence of field curvature may appear somewhat surprising. However, the Petzval curvature is non-zero, as will be revealed. We can now cast all terms in the form set out in Chapter 3 and introduce the Lagrange invariant, which is equal to the product of r₀ and θ₀ (the maximum field angle):




(4.18a)




(4.18b)




(4.18c)




(4.18d)

The Petzval curvature is simply given by subtracting twice the K_AS term in Eq. (4.18c) from the field curvature term in Eq. (4.18d). This gives:




(4.19)






Figure 4.6 Petzval curvature for mirror.



In this instance, the Petzval surface has the same sense as that of the mirror itself. However, the radius of the Petzval surface is actually half that of the original surface. This is illustrated in Figure 4.6.

Calculation of the Petzval sum proceeds more or less as in the refractive case. However, there is one important distinction in the case of a mirror system. For a system comprising N mirrors, each successive mirror surface inverts the sense of the wavefront error imparted by the previous mirrors.




$$ C_{PETZ} = \sum_{i=1}^{N}(-1)^{i+1}\frac{2}{R_i} $$ (4.20)




4.4 Refraction Due to Optical Components





4.4.1 Flat Plate



Equations (4.5a)–(4.5d) give the Gauss-Seidel aberration terms for refraction at a single spherical surface. However, for a flat surface, where 1/R = 0, the aberration is non-zero.




(4.21)

If we now make the approximation that r₀/u ∼ NA₀ and express all wavefront errors in terms of the normalised pupil function, we obtain the following expressions.











(4.22)

In all expressions, the wavefront error is proportional to the object distance. Equation (4.22) only considers refraction at a single surface. For a flat plate whose thickness is vanishingly small, it is clear that refraction at the second (glass–air) boundary will produce a wavefront error that is equal and opposite to that induced at the first surface. Furthermore, it is also clear that the form of the wavefront error contribution will be identical to Eq. (4.22), but reversed in sign. For a glass plate of finite thickness, t, the effective object distance at the second surface, expressed as the object distance in air, will be given by u + t/n. Therefore, the relevant wavefront error contributions at the second surface are given by:











(4.23)

The total wavefront error is then simply given by the sum of the two contributions. This is expressed in standard format, as below:




(4.24)

The important conclusion here is that a flat plate will add to the system aberration, unless the optical beam is collimated (object at the infinite conjugate). This is of great practical significance in microscopy, as a thin flat plate, or ‘cover slip’, is often used to contain a specimen. A standard cover slip has a thickness, typically, of 0.17 mm. Examination of Eq. (4.24) suggests that this cover slip will add significantly to the system aberration. In practice, it is the spherical aberration that is of the greatest concern, as θ₀ is generally much smaller than NA₀ in most practical applications. As a consequence, some microscope objectives are specifically designed for use with cover slips and have built-in aberration that compensates for that of the cover slip. Naturally, a microscope objective designed for use with a cover slip will not produce satisfactory imaging when used without a cover slip.




Worked Example 4.2 Microscope Cover Slip


A microscope cover slip 0.17 mm thick is to be used with a microscope objective with a numerical aperture of 0.8. The refractive index of the cover slip is 1.5. What is the root mean square (rms) spherical aberration produced by the cover slip? The aberration is illustrated in Figure 4.7.

From Eq. (4.24):

$$K_{SA} = \frac{(n^2-1)}{8n^3}\,t\,NA^4 = \frac{(1.5^2-1)}{8\times 1.5^3}\times 0.17\times 0.8^4$$

Figure 4.7 Spherical aberration in cover slip.

Substituting the above values we get: $K_{SA}$ = 0.00322 mm, or 3.22 μm.

The wavefront error (in microns) is thus given by:

$$\Phi(p) = 3.22\,p^4$$

where p is the normalised pupil function. Balancing this wavefront error with an optimum defocus, the rms wavefront error is $K_{SA}/(6\sqrt{5})$, or approximately 0.24 μm.

For reasons that will become apparent later, in practice, wavefront errors are usually expressed as a fraction of some standard wavelength, for example 589 nm. The above wavefront error represents about 0.4 × λ when expressed in this way. An rms wavefront error of about λ/14 is considered consistent with good image quality. This level of aberration is, therefore, significant and measures must be taken (within the objective) to correct for it.
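A minimal numerical sketch of this worked example follows; the rms conversion assumes the standard result that defocus-balanced spherical aberration has an rms wavefront error of $K_{SA}/(6\sqrt{5})$:

```python
# Spherical aberration of a plane-parallel cover slip (Worked Example 4.2).
# Assumes the plate formula K_SA = (n^2 - 1) t NA^4 / (8 n^3) and the
# standard balanced-rms factor of 6*sqrt(5) for spherical aberration.
import math

n, t, NA = 1.5, 0.17, 0.8                  # index, thickness (mm), numerical aperture
K_SA = (n**2 - 1) * t * NA**4 / (8 * n**3)
rms = K_SA / (6 * math.sqrt(5))            # best-focus rms wavefront error
print(f"K_SA = {K_SA * 1000:.2f} um")      # ~3.22 um
print(f"rms  = {rms * 1000:.2f} um = {rms * 1e6 / 589:.2f} waves at 589 nm")
```

Running this reproduces the figures quoted above: roughly 3.22 μm of raw spherical aberration and an rms of about 0.4 waves at 589 nm.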




4.4.2 Aberrations of a Thin Lens



We extend the treatment already outlined to analyse a thin lens. A thin lens can be considered as a combination of two refractive surfaces, where the distance between the two surfaces is ignored. In practice, this is a reasonable assumption, provided the thickness is much less than the radii of the surfaces in question. Of course, the wavefront error produced by the two surfaces is simply the sum of the aberrations of the individual surfaces. A schematic for the analysis is shown in Figure 4.8.

The wavefront error contribution for the first surface is very easy to compute; it is simply that set out in Eqs. (4.5a)–(4.5d). To compute the contribution for the second surface, one can analyse this using the same methodology as in Section 4.2, but exploiting natural symmetry. That is to say, one can analyse the second surface by rotating the whole surface about the y axis, such that z → −z and x → −x. In this event, for the second surface, $R \to -R_2$, $u \to v$, $\theta \to -\theta$. It is then simply a case of substituting these values into the formulae in Eqs. (4.5a)–(4.5d) and adding the wavefront error contribution of the first surface. The total wavefront error for the thin lens is then:






Figure 4.8 Aberration analysis for thin lens.






(4.25a)
(4.25b)
(4.25c)
(4.25d)




4.4.2.1 Conjugate Parameter and Lens Shape Parameter


In terms of gaining some insight into the behaviour of a thin lens, the formulae in Eqs. (4.25a)–(4.25d) are a little opaque. It would be useful to express the aberrations of a thin lens directly in terms of its focusing power and some other parameters. The first of these other parameters is the so-called conjugate parameter, t. The conjugate parameter is defined as below:




$$t = \frac{u+v}{v-u}\qquad(4.26)$$

As we are dealing with a thin lens, we can use the thin lens formula to calculate the focal length, f, of the lens:

$$\frac{1}{v} - \frac{1}{u} = \frac{1}{f}$$
This, in turn, leads to expressions for u and v:




$$u = -\frac{2f}{1+t},\qquad v = \frac{2f}{1-t}\qquad(4.27)$$

Figure 4.9 illustrates the conjugate parameter schematically. The infinite conjugate is represented by a conjugate parameter of ±1. If the conjugate parameter is +1, then the image is at infinity. Conversely, a conjugate parameter of −1 is associated with an object located at the infinite conjugate. In the symmetric scenario where object and image distances are identical, then the conjugate parameter is zero. As illustrated in Figure 4.9, where the conjugate parameter is greater than 1, then the object is real and the image is virtual. Finally, where the conjugate parameter is less than −1, then the object is virtual and the image is real.






Figure 4.9 Conjugate parameter.






Figure 4.10 Coddington lens shape parameter.



We have thus described object and image location in terms of a single parameter. By analogy, it is also useful to describe a lens in terms of its focal power and a single parameter that describes the shape of the lens. The lens, of course, is assumed to be defined by two spherical surfaces, with radii $R_1$ and $R_2$ defining the first and second surfaces respectively. The shape of a lens is defined by the so-called Coddington lens shape factor, s, which is defined as follows:




$$s = \frac{R_1 + R_2}{R_2 - R_1}\qquad(4.28)$$

As before, the power of the lens may be expressed in terms of the lens radii:

$$P = \frac{1}{f} = (n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$$
where n is the lens refractive index.

As with the conjugate parameter and the object and image distances, the two lens radii can be expressed in terms of the lens power and the shape factor, s.




$$R_1 = \frac{2f(n-1)}{s+1},\qquad R_2 = \frac{2f(n-1)}{s-1}\qquad(4.29)$$

Figure 4.10 illustrates the lens shape parameter for a series of lenses with positive focal power. For a symmetric, bi-convex lens, the shape factor is zero. In the case of a plano-convex lens, the shape factor is 1 where the plane surface faces the image and is −1 where the plane surface faces the object. A shape factor of greater than 1 or less than −1 corresponds to a meniscus lens. Here, both radii have the same sense, i.e. they are either both positive or both negative. For a shape parameter of greater than 1, the surface with the greater curvature faces the object and for a shape parameter of less than −1, the surface with the greater curvature faces the image. Of course, this applies to lenses with positive power. For (diverging) lenses with negative power, then the sign of the shape factor is opposite to that described here.
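Both parameterisations are simple to capture in code; the following is a minimal sketch, assuming the sign conventions used in the reconstructed Eqs. (4.26)–(4.29):

```python
# Helpers for the conjugate parameter t and the Coddington shape factor s:
#   t = (u + v)/(v - u),  u = -2f/(1 + t),  v = 2f/(1 - t)
#   s = (R1 + R2)/(R2 - R1),  R1 = 2f(n-1)/(s + 1),  R2 = 2f(n-1)/(s - 1)

def conjugates_from_t(f, t):
    """Object and image distances for a thin lens of focal length f."""
    u = -2 * f / (1 + t) if t != -1 else float('-inf')
    v = 2 * f / (1 - t) if t != 1 else float('inf')
    return u, v

def radii_from_shape(f, n, s):
    """Surface radii of a thin lens of focal length f, index n, shape s."""
    R1 = 2 * f * (n - 1) / (s + 1) if s != -1 else float('inf')
    R2 = 2 * f * (n - 1) / (s - 1) if s != 1 else float('inf')
    return R1, R2

print(conjugates_from_t(100, 0))        # symmetric 2f-2f case: (-200.0, 200.0)
print(radii_from_shape(100, 1.5, 1.0))  # plano-convex, flat side to image: (50.0, inf)
```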




4.4.2.2 General Formulae for Aberration of Thin Lenses


Having parameterised the object and image distances and the lens radii in terms of the conjugate parameter, shape parameter, and lens power, we can recast the expressions in Eqs. (4.25a)–(4.25d) in a more generic form. With a little algebraic manipulation, we obtain the following expressions for the Gauss-Seidel aberration of a lens with the stop at the lens surface:




(4.30a)
(4.30b)
(4.30c)
(4.30d)

Again, casting all expressions in the form set out in Chapter 3, as for the expressions for the mirror, we have:




(4.31a)
(4.31b)
(4.31c)
(4.31d)

Once again, the Petzval curvature is simply given by subtracting twice the $K_{AS}$ term in Eq. (4.31c) from the field curvature term in Eq. (4.31d). This gives:




$$R_{Petz} = n\,f\qquad(4.32)$$

That is to say, a single lens will produce a Petzval surface whose radius of curvature is equal to the lens focal length multiplied by its refractive index. Once again, the Petzval sum may be invoked to give the Petzval curvature for a system of lenses:




$$\frac{1}{R_{Petz}} = \sum_{k}\frac{1}{n_k f_k}\qquad(4.33)$$

It is important here to re-iterate the fact that for a system of lenses, it is impossible to eliminate Petzval curvature where all lenses have positive focal lengths. For a system with positive focal power, i.e. with a positive effective focal length, there must be some elements with negative power if one wishes to ‘flatten the field’.

Before considering the aberration behaviour of simple lenses in a little more detail, it is worth reflecting on some attributes of the formulae in Eqs. (4.30a)–(4.30d). Both spherical aberration and coma are dependent upon the lens shape and conjugate parameters. In the case of spherical aberration there are second order terms present for both shape and conjugate parameters, whereas the behaviour for coma is linear. However, the important point to recognise is that the field curvature and astigmatism are independent of both lens shape and conjugate parameter and only depend upon the lens power. Once again, it must be emphasised that this analysis applies only to the situation where the stop is situated at the lens.
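Since the full forms of Eqs. (4.30a)–(4.30d) are not reproduced here, it is worth sketching the standard thin-lens, stop-at-lens structure that this discussion assumes. The bracketed expressions below are the classical Coddington results; the overall prefactors vary with convention and are left symbolic:

```latex
% Assumed standard thin-lens third-order structure (stop at the lens);
% only the bracketed terms matter for the zero conditions used later.
K_{SA} \propto -\frac{1}{f^{3}}
  \left[\frac{n+2}{n(n-1)^{2}}\,s^{2}
      + \frac{4(n+1)}{n(n-1)}\,s\,t
      + \frac{3n+2}{n}\,t^{2}
      + \frac{n^{2}}{(n-1)^{2}}\right]
\qquad
K_{CO} \propto -\frac{1}{f^{2}}
  \left[\frac{n+1}{n(n-1)}\,s + \frac{2n+1}{n}\,t\right]
```

The quadratic dependence of spherical aberration on s and t, the linear dependence of coma, and the absence of s and t from the astigmatism and field curvature terms are all evident in this structure.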




4.4.2.3 Aberration Behaviour of a Thin Lens at Infinite Conjugate


We will now look at a simple special case applying to a thin lens with the stop at the lens. This is the common situation where a lens is used to focus an object located at the infinite conjugate, such as a telescope objective or a lens focusing a parallel laser beam. From Eq. (4.26), the conjugate parameter, t, is equal to −1. Substituting t = −1 into Eq. (4.31a) gives the spherical aberration as:




(4.34)

The important point to note about Eq. (4.34) is that the spherical aberration can never be equal to zero and that, for a positive lens, $K_{SA}$ is always negative. This means that the longitudinal aberration for a positive lens is also negative and that, for all single lenses, the more marginal rays (those further from the axis) are brought to a focus closer to the lens. Whilst Eq. (4.34) asserts that the spherical aberration in this case can never be zero, its magnitude can be minimised for a specific lens shape. Inspection of Eq. (4.34) reveals that this condition is met where:




$$s = \frac{2(n^2-1)}{n+2}\qquad(4.35)$$

This optimum shape factor corresponds to the so-called ‘best form singlet’ and is generally available from optical component suppliers, particularly with regard to applications in the focusing of laser beams. For a refractive index of 1.5, the optimum shape factor is around 0.7. This is close in shape to a plano-convex lens. However, it is important to emphasise that optimum focusing is obtained where the more steeply curved surface faces the infinite conjugate. Generally, also, where a plano-convex lens is used to focus a collimated beam, the curved surface should face the infinite conjugate. This behaviour is shown in Figure 4.11, which emphasises the quadratic dependence of spherical aberration on lens shape factor.

Coma for the infinite conjugate also depends upon the shape factor. However, in this instance, the dependence is linear. Once more, substituting t = −1 into Eq. (4.31b), we get:




(4.36)

Unlike in the case for spherical aberration, there exists a shape factor for which the coma is zero. This is simply given by:




$$s = \frac{(2n+1)(n-1)}{n+1}\qquad(4.37)$$

For a refractive index of 1.5, this minimum condition is met for a shape factor of 0.8. This is similar, but not quite the same as the optimum for spherical aberration. Again, the most curved surface should face the infinite conjugate. Overall behaviour is illustrated in Figure 4.12.






Figure 4.11 Spherical aberration vs. shape parameter for a thin lens.






Figure 4.12 Coma vs. lens shape for various conjugate parameters.



Once again, this specifically applies to the situation where the stop is at the lens surface. Of course, as stated previously, neither astigmatism nor field curvature is affected by shape or conjugate parameter.

Although it is impossible to reduce spherical aberration for a thin lens to zero at the infinite conjugate, it is possible for other conjugate values. In fact, the magnitude of the conjugate parameter must be greater than a certain specific value for this condition to be fulfilled. This magnitude is always greater than one for reasonable values of the refractive index, and so either the object or the image must be virtual. It is easy to see from Eq. (4.31a) that this threshold value should be:




$$t^2 > \frac{n(n+2)}{(n-1)^2}\qquad(4.38)$$

For n = 1.5, this threshold value is 4.58. That is to say, for there to be a shape factor where the spherical aberration is reduced to zero, the conjugate parameter must either be less than −4.58 or greater than 4.58. Another point to note is that, since spherical aberration exhibits a quadratic dependence on shape factor, where this condition is met there are two values of the shape factor at which the spherical aberration is zero. This behaviour is set out in Figure 4.13, which shows spherical aberration as a function of shape factor for a number of different conjugate parameters.




Worked Example 4.3 Best form Singlet



A thin lens is to be used to focus a Helium-Neon laser beam. The focal length of the lens is to be 20 mm and the lens is required to be ‘best form’ to minimise spherical aberration. The refractive index of the lens is 1.518 at the laser wavelength of 633 nm. Calculate the required shape factor and the radii of both lens surfaces. From Eq. (4.35) we have:

$$s = \frac{2(n^2-1)}{n+2} = \frac{2\times(1.518^2-1)}{1.518+2} = 0.742$$


Figure 4.13 Spherical aberration vs. shape factor for various conjugate parameter values.



The optimum shape factor is 0.742 and we can use this to calculate both radii given knowledge of the required focal length. Rearranging Eq. (4.29) we have:

$$R_1 = \frac{2f(n-1)}{s+1},\qquad R_2 = \frac{2f(n-1)}{s-1}$$
This gives:

$$R_1 = \frac{2\times 20\times 0.518}{1.742} = 11.9\ \mathrm{mm},\qquad R_2 = \frac{2\times 20\times 0.518}{-0.258} = -80.2\ \mathrm{mm}$$
It is the surface with the greatest curvature, i.e. $R_1$, that should face the infinite conjugate (the parallel laser beam).
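The same calculation, as a short numerical sketch using the relations quoted above:

```python
# Best-form singlet (Worked Example 4.3): optimum shape factor from
# Eq. (4.35) and the surface radii from Eq. (4.29).
n, f = 1.518, 20.0                       # index at 633 nm, focal length (mm)
s = 2 * (n**2 - 1) / (n + 2)             # optimum shape factor
R1 = 2 * f * (n - 1) / (s + 1)           # steeper surface, faces the laser
R2 = 2 * f * (n - 1) / (s - 1)
print(f"s = {s:.3f}, R1 = {R1:.1f} mm, R2 = {R2:.1f} mm")
# -> s = 0.742, R1 = 11.9 mm, R2 = -80.2 mm
```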




4.4.2.4 Aplanatic Points for a Thin Lens


Just as in the case of a single surface, it is possible to find a conjugate and lens shape pair that produce neither spherical aberration nor coma. For reasons outlined previously, it is not possible to eliminate astigmatism or field curvature for a lens of finite power. If the spherical aberration is to be zero, it is clear that, for the aplanatic condition to apply, either the object or the image must be virtual. Equations (4.31a) and (4.31b) provide two conditions that uniquely determine the two parameters, s and t. Firstly, the requirement for coma to be zero clearly relates s and t in the following way:

$$s = -\frac{(2n+1)(n-1)}{n+1}\,t$$
Setting the spherical aberration to zero and substituting for t, we have the following expression given entirely in terms of s:

$$\frac{n+2}{n(n-1)^2}\,s^2 - \frac{4(n+1)}{n(n-1)}\cdot\frac{n+1}{(2n+1)(n-1)}\,s^2 + \frac{3n+2}{n}\cdot\frac{(n+1)^2}{(2n+1)^2(n-1)^2}\,s^2 + \frac{n^2}{(n-1)^2} = 0$$

Finally, this gives the solution for s as:

$$s = \pm(2n+1)\qquad(4.39a)$$

Accordingly the solution for t is




$$t = \mp\frac{n+1}{n-1}\qquad(4.39b)$$

Of course, since the equation for spherical aberration gives quadratic terms in s and t, it is not surprising that two solutions exist. Furthermore, it is important to recognise that the sign of t is the opposite to that of s. Referring to Figure 4.10, it is clear that the form of the lens is that of a meniscus. The two solutions for s correspond to a meniscus lens that has been inverted. Of course, the same applies to the conjugate parameter, so, in effect, the two solutions are identical, except that the whole system has been inverted, swapping the object for image and vice-versa.

An aplanatic meniscus lens is an important building block in an optical design, in that it confers additional focusing power without incurring further spherical aberration or coma. This principle is illustrated in Figure 4.14, which shows a meniscus lens with positive focal power.

It is instructive, at this point, to quantify the increase in system focal power provided by an aplanatic meniscus lens. Effectively, as illustrated in Figure 4.14, it increases the system numerical aperture by (minus) the ratio of the object and image distances. For the positive meniscus lens in Figure 4.14, the conjugate parameter is negative and equal to −(n + 1)/(n − 1). From Eq. (4.27) the ratio of the object and image distances is given by:

$$\frac{u}{v} = -\frac{1-t}{1+t} = n$$
As previously set out, the increase in numerical aperture of an aplanatic meniscus lens is equal to minus the ratio of the object and image distances. Therefore, the aplanatic meniscus lens increases the system power by a factor equal to the refractive index of the lens. This principle is of practical consequence in many system designs. Of course, if we reverse the sense of Figure 4.14 and substitute the image for the object and vice versa, then the numerical aperture is effectively reduced by a factor of n.
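The factor of n can be verified in a couple of lines from the conjugate relations assumed in Eq. (4.27); a short sketch:

```latex
% Ratio of image to object distance at the aplanatic conjugate
% t = -(n+1)/(n-1), using u = -2f/(1+t) and v = 2f/(1-t):
\frac{v}{u} = -\frac{1+t}{1-t}
            = -\frac{1-\dfrac{n+1}{n-1}}{1+\dfrac{n+1}{n-1}}
            = \frac{2/(n-1)}{2n/(n-1)}
            = \frac{1}{n}
```

The image therefore lies n times closer to the lens than the object, so the numerical aperture, and with it the effective system power, rises by the factor n.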






Figure 4.14 Aplanatic meniscus lens.






Worked Example 4.4 Microscope Objective – Hyperhemisphere Plus Meniscus Lens


We now wish to add some power to the microscope objective hyperhemisphere set out in Worked Example 4.1. We are to do so with an extra meniscus lens situated at the vertex of the hyperhemisphere with a negligible separation. As with the hyperhemisphere, the meniscus lens is in the aplanatic arrangement. The meniscus lens is made of the same material as the hyperhemisphere, that is, with a refractive index of 1.6. All properties of the hyperhemisphere are as set out in Worked Example 4.1.

What are the radii of curvature of the meniscus lens and what is the location of the (virtual) image for the combined system? The system is as illustrated below.








We know from Worked Example 4.1 that the original image distance produced by the hyperhemisphere is −23.4 mm. The object distance for the meniscus lens is thus 23.4 mm. From Eq. (4.39a) we have:

$$s = \pm(2n+1) = \pm(2\times 1.6 + 1) = \pm 4.2$$
There remains the question of the choice of the sign for the conjugate parameter. If one refers to Figure 4.14, it is clear that the sense of the object and image location is reversed. In this case, therefore, the value of t is equal to +4.33 and the numerical aperture of the system is reduced by a factor of 1.6 (the refractive index). In that case, the image distance must be equal to minus 1.6 times the object distance. That is to say:

$$v = -1.6\times 23.4\ \mathrm{mm} = -37.44\ \mathrm{mm}$$
We can calculate the focal length of the lens from:

$$\frac{1}{f} = \frac{1}{v} - \frac{1}{u} = \frac{1}{-37.44} - \frac{1}{-23.4} = \frac{1}{62.4}$$
Therefore, the focal length of the meniscus lens is 62.4 mm. If the conjugate parameter is +4.33, then the shape factor must be −(2n + 1), or −4.2 (note the sign). It is a simple matter to calculate the radii of the two surfaces from Eq. (4.29):

$$R_1 = \frac{2f(n-1)}{s+1} = \frac{2\times 62.4\times 0.6}{-3.2}\ \mathrm{mm},\qquad R_2 = \frac{2f(n-1)}{s-1} = \frac{2\times 62.4\times 0.6}{-5.2}\ \mathrm{mm}$$
Finally, this gives $R_1$ as −23.4 mm and $R_2$ as −14.4 mm. The signs should be noted. This follows the convention that positive displacement follows the direction from object to image space.

If the microscope objective is ultimately to provide a collimated output – i.e. with the image at the infinite conjugate – the remainder of the optics must have a focal length of 37.44 mm (i.e. 23.4 × 1.6). This exercise illustrates the utility of relatively simple building blocks in more complex optical designs. This revised system has a focal length of 9 mm. However, the ‘remainder’ optics have a focal length of 37.4 mm, representing only a quarter of the overall system power. Spherical aberration increases as the fourth power of the numerical aperture, so the ‘slower’ remainder will intrinsically give rise to much less aberration and will, as a consequence, be much easier to design. The hyperhemisphere and meniscus lens combination confers much greater optical power on the system without any penalty in terms of spherical aberration and coma. Of course, in practice, the picture is complicated by chromatic aberration caused by variations in the refractive properties of optical materials with wavelength. Nevertheless, the underlying principles outlined are very useful.
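A short numerical sketch confirms the self-consistency of this example, again assuming the conjugate and shape relations of Eqs. (4.27) and (4.29):

```python
# Worked Example 4.4 check: aplanatic meniscus added to the hyperhemisphere.
# Distances in mm; u is the (signed) object distance for the meniscus.
n, u = 1.6, -23.4
t = (n + 1) / (n - 1)                   # +4.33, aplanatic branch chosen above
f = -u * (1 + t) / 2                    # from u = -2f/(1+t): 62.4 mm
v = 2 * f / (1 - t)                     # image distance: -37.4 mm (virtual)
s = -(2 * n + 1)                        # shape factor: -4.2
R1 = 2 * f * (n - 1) / (s + 1)          # -23.4 mm
R2 = 2 * f * (n - 1) / (s - 1)          # -14.4 mm
print(f"f = {f:.1f}, v = {v:.1f}, R1 = {R1:.1f}, R2 = {R2:.1f}")
```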




4.5 The Effect of Pupil Position on Element Aberration


In all previous analysis, it is assumed that the stop is located at the optical surface in question. This is a useful starting proposition. However, in practice, this is usually not the case. With the stop located at a spherical surface, by definition, the chief ray will pass directly through the vertex of that surface. If, however, the surface is at some distance from the stop, then the chief ray will, in general, intersect the surface at some displacement from the surface vertex. This displacement is, in the first approximation, proportional to the field angle of the object in question. The general concept is illustrated in Figure 4.15.

Instead of the stop being located at the surface in question, the stop is displaced by a distance, s, from the surface. The chief ray, passing through the centre of the stop, defines the field angle, θ. In addition, the pupil co-ordinates defined at the stop are denoted by $r_x$ and $r_y$. However, if the stop were located at the optical surface, then the field angle would be θ′, as opposed to θ, and the pupil co-ordinates would be given by $r_x'$ and $r_y'$. Computing the revised third order aberrations proceeds upon the following lines. All the previous analysis, e.g. as per Eqs. (4.31a)–(4.31d), has enabled us to express all aberrations as an OPD in terms of θ′, $r_x'$, and $r_y'$. It is clear that to calculate the aberrations for the new stop location, one must do so in terms of the new parameters θ, $r_x$, and $r_y$. This is done by effecting a simple linear transformation between the two sets of parameters. Referring to Figure 4.15, it is easy to see:




(4.40a)




(4.40b)






Figure 4.15 Impact of stop movement.






(4.40c)

The effective size of the pupil at the optic is magnified by a quantity $M_p$, and the pupil offset set out in Eq. (4.40b) is directly related to the eccentricity parameter, E, described in Chapter 2. Indeed, the product of the eccentricity parameter and the Lagrange invariant, H, is simply equal to the ratio of the marginal and chief ray height at the pupil. That is to say:




(4.41)

In this case, $r_0$ refers to the pupil radius at the stop and $r_0'$ to the effective pupil radius at the surface in question. As a consequence, we can re-cast all three equations in a more convenient form.




(4.42)

The angle $\theta_0$ is representative of the maximum system field angle and helps to define the eccentricity parameter and the Lagrange invariant. We already know the OPD when cast in terms of $r_x'$, $r_y'$, and θ′, as this is as per the analysis for the case where the stop is at the optic itself. That is to say, the expression for the OPD is as given in the preceding analysis, with the aberrations defined in terms of $K_{SA}'$, $K_{CO}'$, $K_{AS}'$, $K_{FC}'$, and $K_{DI}'$. Therefore, the total OPD attributable to the five Gauss-Seidel aberrations is given by:

(4.43)

To determine the aberrations as expressed by the pupil co-ordinates for the new stop location, it is a simple matter of substituting Eq. (4.42) into Eq. (4.43). This results in the so-called stop shift equations:




(4.44a)
(4.44b)
(4.44c)
(4.44d)
(4.44e)

What this set of equations reveals is that there exists a ‘hierarchy’ of aberrations. Spherical aberration may be transmuted into coma, astigmatism, field curvature, and distortion by shifting the stop position. Similarly, coma may be transformed into astigmatism, field curvature, and distortion, and both astigmatism and field curvature may produce distortion. However, coma can never produce spherical aberration, and neither astigmatism nor field curvature is capable of generating spherical aberration or coma. Equation (4.44e) reveals, for the first time, that it is possible to generate distortion by shifting the stop. Our previous idealised analysis clearly suggested that distortion is not produced where the lens or optical surface is located at the stop.

Another important conclusion relating to Eqs. (4.44a)–(4.44e) is the impact of stop shift on the astigmatism and field curvature. Inspection of Eqs. (4.44c) and (4.44d) reveals that the change in field curvature produced by stop shift is precisely double that of the change in astigmatism in all cases. Therefore, the Petzval curvature, which is given by $K_{FC} - 2K_{AS}$, remains unchanged by stop shift. This further serves to demonstrate the fact that the Petzval curvature is a fundamental system attribute and is unaffected by changes in stop location and, indeed, component location. Petzval curvature only depends upon the system power. Thus, it is important to recognise that the quantity $K_{FC} - 2K_{AS}$ is preserved in any manipulation of existing components within a system. If we express the Petzval curvature in terms of the tangential and sagittal curvature we find:




$$K_{Petz} = \frac{3K_{sag} - K_{tan}}{2}\qquad(4.45)$$

Since $K_{Petz}$ is not changed by any manipulation of component or stop positions, Eq. (4.45) implies that any change in the sagittal curvature is accompanied by a change three times as large in the tangential curvature. This is an important conclusion.

For small shifts in the position of the stop, the eccentricity parameter is proportional to that shift. Based on this, and examining Eqs. (4.44a)–(4.44e), one can come to some general conclusions. For a system with pre-existing spherical aberration, additional coma will be produced in linear proportion to the stop shift. Similarly, the same spherical aberration will produce astigmatism and field curvature proportional to the square of the stop shift. The amount of distortion produced by pre-existing spherical aberration is proportional to the cube of the displacement. Naturally, for pre-existing coma, the additional astigmatism and field curvature produced is in proportion to the shift in the stop position. Additional distortion is produced according to the square of the stop shift. Finally, with pre-existing astigmatism and field curvature, only additional distortion may be produced, in direct proportion to the stop shift.
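The hierarchy can be made concrete with a one-dimensional symbolic sketch (an illustration only: the real pupil is two-dimensional, and E below simply stands in for the eccentricity term). Substituting a shifted pupil coordinate into a pure spherical aberration term generates all the lower aberrations, in exactly the linear, quadratic, and cubic powers of the shift described above:

```python
# 1D sketch of the stop-shift hierarchy: the shift p -> p + E*theta
# models moving the stop; E stands in for the eccentricity term.
import sympy as sp

p, theta, E, K = sp.symbols('p theta E K')
W_shifted = K * (p + E * theta)**4      # pure spherical aberration, shifted
print(sp.expand(W_shifted).collect(p))
# -> K*p**4 + 4*E*K*theta*p**3 + 6*E**2*K*theta**2*p**2
#    + 4*E**3*K*theta**3*p + E**4*K*theta**4
# i.e. coma-like (linear in E), astigmatism/field-curvature-like (quadratic)
# and distortion-like (cubic) terms appear; the reverse never happens.
```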

As an example, a simple scenario is illustrated in Figure 4.16. This shows a symmetric system with a biconvex lens used to image an object in the 2f–2f configuration. That is to say, the conjugate parameter is zero. In this situation, the coma may be expected, by virtue of symmetry, to be zero. For a simple lens, the distortion is also zero. The spherical aberration is, of course, non-zero, as are both the astigmatism and field curvature.

Using basic modelling software, it is possible to analyse the impact of small stop shifts on system aberration. The results are shown in Figure 4.17.

Clearly, according to Figure 4.17, the spherical aberration remains unchanged, as predicted by Eq. (4.44a). For small shifts, the amount of coma produced is in proportion to the shift. Since there is no coma initially, the only aberration that can influence the astigmatism and field curvature is the pre-existing spherical aberration. As indicated in Eqs. (4.44c) and (4.44d), there should be a quadratic dependence of the astigmatism and field curvature on stop position. This is indeed borne out by the analysis in Figure 4.17. Similarly, the distortion shows a linear trend with stop position, mainly influenced by the initial astigmatism and field curvature that is present.

Although, in practice, these stop shift equations may not find direct use in optimising real designs, the underlying principles embodied are, nonetheless, important. Manipulation of the stop position is a key part of the optimisation of complex optical systems and, in particular, multi-element camera lenses. In these complex systems, the pupil is often situated between groups of lenses. In this case, the designer needs to be aware also of the potential for vignetting, should individual lens elements be incorrectly sized.






Figure 4.16 Simple symmetric lens system with stop shift.






Figure 4.17 Impact of stop shift for simple symmetric lens system.



The stop shift equations provide a general insight into the impact of stop position on aberration. Most significant is the hierarchy of aberrations. For example, no fundamental manipulation of spherical aberration may be accomplished by moving the stop position. Otherwise, there are some special circumstances it would be useful for the reader to be aware of. For example, in the case of a spherical mirror, with the object or image lying at the infinite conjugate, the placement of the stop at the mirror's centre of curvature altogether removes its contribution to coma and astigmatism; the reader may care to verify this.




4.6 Abbe Sine Condition


Long before the advent of powerful computer ray tracing models, there was a strong incentive to develop simple rules of thumb to guide the optical design process. This was particularly true for the complex task of ameliorating system aberrations. Working in the nineteenth century, Ernst Abbe set out the Abbe sine condition, which directly relates the object and image space numerical apertures for a ‘perfect’, unaberrated system. Essentially, the Abbe sine condition articulates a specific requirement for a system to be free of spherical aberration and coma, i.e. aplanatic. The Abbe sine condition is expressed for an infinitesimal object and image height and its justification is illustrated in Figure 4.18.

In the representation in Figure 4.18 we trace a ray from the object to a point, P, located on a reference sphere whose centre lies on axis at the axial position of the object and whose vertex lies at the entrance pupil. At the same time, we also trace a marginal ray from the object location to the entrance pupil. The conjugate point to P, designated P′, is located nominally at the exit pupil and on a sphere whose centre lies at the paraxial image location. For there to be perfect imaging, then the OPD associated with the passage of the marginal ray must be zero. Furthermore, the OPD of the ray from object to image must also be zero. It is also further assumed that the relative OPD of the object to image ray when compared to the marginal ray is zero on passage from points P to P′. This assumption is justified for an infinitesimal object height. Therefore, it is possible to compute the total object to image OPD by simply summing the path differences relative to the marginal ray between the object and point P and between the image and point P′. For there to be perfect imaging this difference must, of course, be zero.






Figure 4.18 Abbe sine condition.






$$n\,h\sin\theta = n'\,h'\sin\theta'\qquad(4.46)$$

n is the refractive index in object space and n′ is the refractive index in image space.

Equation (4.46) is one formulation of the Abbe sine condition which, nominally, applies for all values of θ and θ′, including paraxial angles. If we represent the relevant paraxial angles in object and image space as $\theta_p$ and $\theta_p'$, then the Abbe sine condition may be rewritten as:

$$\frac{\sin\theta}{\theta_p} = \frac{\sin\theta'}{\theta_p'}\qquad(4.47)$$

One specific scenario occurs where the object or image lies at the infinite conjugate. For example, one might imagine an object located on axis at the first focal point. In this case, the height of any ray within the collimated beam in image space is directly proportional to the numerical aperture associated with the input ray.

Figure 4.19 illustrates the application of the Abbe sine condition for a specific example. As highlighted previously, the sine condition effectively seeks out the aplanatic condition in an optical system. In this example, a meniscus lens is to be designed to fulfil the aplanatic condition. However, its conjugate parameter is adjusted around the ideal value and the spherical aberration and coma plotted as a function of the conjugate parameter. In addition, the departure from the Abbe sine condition is also plotted in the same way. All data are derived from detailed ray tracing and are presented as relative values to fit reasonably into the graphical presentation. It is clear that elimination of spherical aberration and coma corresponds closely to the fulfilment of the Abbe sine condition.

The form of the Abbe sine condition set out in Eq. (4.46) is interesting. It may be compared directly to the Helmholtz equation, which has a similar form. However, instead of a relationship based on the sine of the angle, the Helmholtz equation is defined by a relationship based on the tangent of the angle:

$$n\,h\tan\theta = n'\,h'\tan\theta'$$
It is quite apparent that the two equations present something of a contradiction. The Helmholtz equation sets the condition for perfect imaging in an ideal system for all pairs of conjugates. However, the Abbe sine condition relates to aberration free imaging for a specific conjugate pair. This presents us with an important conclusion. It is clear that aberration free imaging for a specific conjugate (Abbe) fundamentally denies the possibility for perfect imaging across all conjugates (Helmholtz). Therefore, an optical system can only be designed to deliver aberration free imaging for one specific conjugate pair.






Figure 4.19 Fulfilment of Abbe sine condition for aplanatic meniscus lens.






4.7 Chromatic Aberration





4.7.1 Chromatic Aberration and Optical Materials


Hitherto, we have only considered the classical monochromatic aberrations. At this point, we must introduce the phenomenon of chromatic aberration where imperfections in the imaging of an optical system are produced by significant variation in optical properties with wavelength. All optical materials are dispersive to some degree. That is to say, their refractive indices vary with wavelength. As a consequence, all first order properties of an optical system, such as the location of the cardinal points, vary with wavelength. Most particularly, the paraxial focal position of an optical system with dispersive components will vary with wavelength, as will its effective focal length. Therefore, for a given axial position in image space, only one wavelength can be in focus at any one time.

Dispersion is a property of transmissive optical materials, i.e. glasses. On the other hand, mirrors show no chromatic variation and their incorporation is favoured in systems where chromatic variation is particularly unwelcome. Such a system, where the optical properties do not vary with wavelength, is said to be achromatic. As argued previously, a mirror behaves as an optical material with a refractive index of minus one, a value that is, of course, independent of wavelength. In general, the tendency in most optical materials is for the refractive index to decrease with increasing wavelength. This behaviour is known as normal dispersion. In certain very specific situations, for certain materials at particular wavelengths, the refractive index actually increases with wavelength; this phenomenon is known as anomalous dispersion.

Although dispersion is an issue of concern covering all wavelengths of interest from the ultraviolet to the infrared, for obvious reasons, historically, there has been particular focus on this issue within the visible portion of the spectrum. Across the visible spectrum, for typical glass materials, the refractive index variation might amount to 0.7–2.5%. This variation in the dispersive properties of different materials is significant, as it affords a means to reduce the impact of chromatic aberration, as will be seen shortly. Figure 4.20 shows a typical dispersive plot for the glass material SCHOTT BK7®.






Figure 4.20 Refractive index variation with wavelength for SCHOTT BK7 glass material.



Because of the historical importance of the visible spectrum, glass materials are typically characterised by their refractive properties across this portion of the spectrum. More specifically, glasses are catalogued in terms of their refractive indices at three wavelengths, nominally ‘blue’, ‘yellow’, and ‘red’. In practice, there are a number of different conventions for choosing these reference wavelengths, but the most commonly applied uses two hydrogen spectral lines – the ‘Balmer-beta’ line at 486.1 nm and the ‘Balmer-alpha’ line at 656.3 nm – plus the sodium ‘D’ line at 589.3 nm. The refractive indices at these three standard wavelengths are symbolised as $n_F$, $n_C$, and $n_D$ respectively. At this point, we introduce the Abbe number, $V_D$, which characterises a glass through the ratio of its refractive power to its dispersion:




$$V_D = \frac{n_D - 1}{n_F - n_C}\qquad(4.48)$$

The numerator in Eq. (4.48) represents the effective optical or focusing power at the ‘yellow’ wavelength, whereas the denominator describes the dispersion of the glass as the difference between the ‘blue’ and the ‘red’ indices. It is important to recognise that the higher the Abbe number, the less dispersive the glass, and vice versa. Abbe numbers vary, typically, between about 20 and 80. Broadly speaking, these numbers express the ratio of the glass's focusing power to its dispersion. Hence, for a material with an Abbe number of 20, the focal length of a lens made from this material will differ by approximately 5% (1/20) between 486.1 and 656.3 nm.
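As an illustration, the Abbe number can be computed directly from catalogue-style indices; the three index values below are approximate figures for SCHOTT N-BK7 and are quoted purely for illustration:

```python
# Abbe number from catalogue-style refractive indices (approximate
# N-BK7 values, for illustration only).
nF, nD, nC = 1.5224, 1.5168, 1.5143    # 486.1 nm, 589.3 nm, 656.3 nm
V_D = (nD - 1) / (nF - nC)             # Eq. (4.48)
print(f"V_D = {V_D:.1f}")              # ~64: a low-dispersion crown glass
```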




4.7.2 Impact of Chromatic Aberration


The most obvious effect of chromatic aberration is that light is brought to a different focus for different wavelengths. This effect is known as longitudinal chromatic aberration and is illustrated in Figure 4.21.

As can be seen from Figure 4.21, light at the shorter, ‘blue’ wavelengths is focused closer to the lens, leading to an axial (longitudinal) shift in the paraxial focus for the different wavelengths. In summary, longitudinal chromatic aberration is associated with a shift in the paraxial focal position as a function of wavelength. Thus the effect of longitudinal chromatic aberration is to produce a blur spot or transverse aberration whose magnitude is directly proportional to the aperture size, but is independent of field angle. However, there are situations where, to all intents and purposes, all wavelengths share the same paraxial focal position, but the principal points are not co-located. That is to say, whilst all wavelengths are focused at a common point, the effective focal length corresponding to each wavelength is not identical. This scenario is illustrated in Figure 4.22.






Figure 4.21 Longitudinal chromatic aberration.






Figure 4.22 Transverse chromatic aberration.



The effect illustrated is known as transverse chromatic aberration or lateral colour. Whilst no distinct blurring is produced by this effect, the fact that different wavelengths have different focal lengths inevitably means that system magnification varies with wavelength. As a result, the final image size or height of a common object depends upon the wavelength. This produces distinct coloured fringing around an object and the size of the effect is proportional to the field angle, but independent of aperture size.

Hitherto, we have cast the effects of chromatic aberration in terms of transverse aberration. However, to understand the effect on the same basis as the Gauss-Seidel aberrations, it is useful to express chromatic aberration in terms of the OPD. When applied to a single lens, longitudinal chromatic aberration simply produces defocus that is equal to the focal length divided by the Abbe number. Therefore, the longitudinal chromatic aberration is given by:




$$\Phi_{LC} = \frac{f\,NA^2}{2V_D}\,p^2\qquad(4.49a)$$

f is the focal length of the lens and p the normalised pupil coordinate.






Figure 4.23 Huygens eyepiece.



Similarly, the transverse chromatic aberration can be expressed as an OPD:




$$\Phi_{TC} = \frac{f\,NA\,\theta}{2V_D}\,p\cos\varphi\qquad(4.49b)$$

Examining Eqs. (4.49a) and (4.49b) reveals that the ratio of transverse to longitudinal aberration is given by the ratio of the field angle to the numerical aperture. In practice, for optical elements, such as microscope and telescope objectives, the field angle is very much smaller than the numerical aperture and thus longitudinal chromatic aberration may be expected to predominate. For eyepieces, the opposite is often the case, so the imperative here is to correct lateral chromatic aberration.




Worked Example 4.5 Lateral Chromatic Aberration and the Huygens Eyepiece


A practical example of the correction of lateral chromatic aberration is in the Huygens eyepiece. This very simple, early, eyepiece uses two plano-convex lenses separated by a distance equivalent to half the sum of their focal lengths. This is illustrated in Figure 4.23.








Since we are determining the impact of lateral chromatic aberration, we are only interested in the effective focal length of the system comprising the two lenses. Using simple matrix analysis as described in Chapter 1, the system focal length is given by:

$$P = \frac{1}{f} = P_1 + P_2 - d\,P_1P_2$$
If we assume that both lenses are made of the same material, then their focal power will change as a function of wavelength by a common proportion, α. In that case, the system focal power at the new wavelength would be given by:

$$P' = (1+\alpha)P_1 + (1+\alpha)P_2 - (1+\alpha)^2\,d\,P_1P_2$$

For small values of α, we can ignore terms of second order in α, so the change in system power may be approximated by:

$$\Delta P \approx \alpha\left(P_1 + P_2 - 2d\,P_1P_2\right)$$

The change in system power should be zero and this condition unambiguously sets the lens separation, d, for no lateral chromatic aberration:




$$d = \frac{P_1 + P_2}{2P_1P_2} = \frac{f_1 + f_2}{2}\qquad(4.50)$$

If this condition is fulfilled, then the Huygens eyepiece will have no transverse chromatic aberration. However, it must be emphasised that this condition does not provide immunity from longitudinal chromatic aberration.
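A minimal numerical sketch of this result (the focal lengths below are illustrative, not taken from any particular eyepiece): with d = (f₁ + f₂)/2, a common fractional change α in both element powers leaves the system power unchanged to first order.

```python
# Huygens eyepiece sketch: system power P = P1 + P2 - d*P1*P2 is stationary
# with respect to a common fractional change alpha in both element powers
# when d = (f1 + f2)/2, Eq. (4.50).
f1, f2 = 30.0, 15.0                    # illustrative focal lengths (mm)
d = (f1 + f2) / 2

def system_power(alpha):
    P1, P2 = (1 + alpha) / f1, (1 + alpha) / f2
    return P1 + P2 - d * P1 * P2

for alpha in (-0.01, 0.0, 0.01):       # +/-1% change in element powers
    print(f"alpha = {alpha:+.2f}: P = {system_power(alpha):.6f} /mm")
# The first-order variation vanishes; only a tiny alpha**2 residual remains.
```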






Figure 4.24 Abbe diagram.






4.7.3 The Abbe Diagram for Glass Materials


For visible applications, the Abbe number for a glass is of as much practical importance as the refractive index itself. The Abbe diagram is a simple graphic tool that captures the basic refractive properties of a wide range of optical glasses. It comprises a simple 2D map with the horizontal axis corresponding to the Abbe number and the vertical axis to the glass index. A representative diagram is shown in Figure 4.24.

By referring to this diagram, the optical designer can make appropriate choices for specific applications in the visible. In particular, it helps select combinations of glasses leading to a substantially achromatic design. One special and key application is the achromatic doublet. This lens is composed of two elements, one positive and one negative. The positive lens is a high power (short focal length) element with low dispersion and the negative lens is a low power element with high dispersion. Materials are chosen in such a way that the net dispersion of the two elements cancel, but the powers do not. This will be considered in more detail in the next section.

The different zones highlighted in the Abbe diagram replicated in Figure 4.24 refer to the elemental composition of the glass. For example, ‘Ba’ refers to the presence of barium and ‘La’ to the presence of lanthanum. Originally, many of the dense, high index glasses used to contain lead, but these are being phased out due to environmental concerns. The Abbe diagram reveals a distinct geometrical profile, with a tendency for high dispersion to correlate strongly with refractive index. In fact, it is the presence of absorption features within the glass (at very much shorter wavelengths) that gives rise to the phenomenon of refraction, and these features also contribute to dispersion.




4.7.4 The Achromatic Doublet


As introduced previously, the achromatic doublet is an extremely important building block in a transmissive (non-mirror) optical design. The function of an achromatic doublet is illustrated in Figure 4.25.






Figure 4.25 The achromatic doublet.



The first element, often (on account of its shape) referred to as the ‘crown element’, is a high power positive lens with low dispersion. The second element is a low power negative lens with high dispersion. The focal lengths of the two elements are $f_1$ and $f_2$ respectively and their Abbe numbers $V_1$ and $V_2$. Since the intention is that the dispersions of the two elements should entirely cancel, this condition constrains the relative power of the two elements. Individually, the dispersion, as measured by the difference in optical power between the red and blue wavelengths, is proportional to the reciprocal of both the focal length and the Abbe number of each element. Therefore:




$$\frac{1}{f_1 V_1} + \frac{1}{f_2 V_2} = 0\qquad(4.51)$$

From Eq. (4.51), it is clear that the ratio of the two focal lengths should be minus the inverse of the ratio of their respective Abbe numbers. In other words, the ratio of their powers should be minus the ratio of their Abbe numbers. The power of the system comprising the two lenses is, in the thin lens approximation, simply equal to the sum of their individual powers. Therefore, it is possible to calculate these individual focal lengths, $f_1$ and $f_2$, in terms of the desired system focal length of f:

$$\frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2},\qquad \frac{1}{f_2} = -\frac{V_2}{V_1 f_1}$$
Thus, the two focal lengths are simply given by:




$$f_1 = f\,\frac{V_1 - V_2}{V_1},\qquad f_2 = -f\,\frac{V_1 - V_2}{V_2}\qquad(4.52)$$

In the thin lens approximation, therefore, light will be focused at the same point for the red and blue wavelengths. Consequently, in this approximation, this system will be free from both longitudinal and transverse chromatic aberration. The simplicity of this approach may be illustrated in a straightforward worked example.




Worked Example 4.6 Simple Achromatic Doublet


We wish to construct an achromatic doublet with a focal length of 200 mm. The two glasses to be used are: SCHOTT N-BK7 for the positive crown lens and SCHOTT SF2 for the negative lens. Both these glasses feature on the Abbe diagram in Figure 4.24 and the Abbe numbers for these glasses are 64.17 and 33.85 respectively. The individual focal lengths may be calculated using Eq. (4.52):

$$f_1 = 200\times\frac{64.17 - 33.85}{64.17}\ \mathrm{mm} = 94.5\ \mathrm{mm}$$

$$f_2 = -200\times\frac{64.17 - 33.85}{33.85}\ \mathrm{mm} = -179.1\ \mathrm{mm}$$
Therefore, the focal length of the first ‘crown lens’ should be 94.5 mm and the focal length of the second diverging lens should be −179 mm.
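The calculation reduces to two lines of code; a minimal sketch using the values above:

```python
# Element focal lengths for an achromatic doublet, Eq. (4.52)
# (Worked Example 4.6: f = 200 mm, N-BK7 crown + SF2 flint).
f, V1, V2 = 200.0, 64.17, 33.85
f1 = f * (V1 - V2) / V1                # +94.5 mm (crown element)
f2 = -f * (V1 - V2) / V2               # -179.1 mm (flint element)
print(f"f1 = {f1:.1f} mm, f2 = {f2:.1f} mm")
```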

Thus far, the design analysis of an achromatic doublet has been fairly elementary. In the previous worked example, we have constrained the focal lengths of the two lens elements to specific values. However, we are still free to choose the shape of each lens. That is to say, there are two further independent variables that can be adjusted. Achromatic doublets can either be cemented or air spaced. In the case of the cemented doublet, as presented in Figure 4.25, the second surface of the first lens must have the same radius as the first surface of the second lens. This provides an additional constraint; thus, for the cemented doublet, there is only one additional free variable to adjust. However, introduction of an air space between the two lenses removes this constraint and gives the designer an extra degree of freedom to play with. That said, the cemented doublet does offer greater robustness and reliability with respect to changes in alignment and finds very wide application as a standard optical component.

As a ‘stock component’, achromatic doublets are generally designed for the infinite conjugate. For cemented doublets, with the single additional degree of design freedom, these components are optimised to have zero spherical aberration at the central wavelength. This is an extremely important consideration, for not only are these doublets free of chromatic aberration, but they are also well optimised for other aberrations. Commercial doublets are thus extremely powerful optical components.




4.7.5 Optimisation of an Achromatic Doublet (Infinite Conjugate)


An air spaced achromatic doublet may be optimised to eliminate both spherical aberration and coma. The fundamental power of the wavefront approach in describing third order aberration is reflected in the ability to calculate the total system aberration as the sum of the aberration of the two lenses. In the thin lens approximation, we may simply use Eqs. (4.30a) and (4.30b) to express the spherical aberration and coma contribution of each lens element. We simply ascribe a variable shape parameter, $s_1$ and $s_2$, to each of the two lenses. The two conjugate parameters are fixed. In the particular case of a doublet designed for the infinite conjugate, the conjugate parameter for the first lens, $t_1$, is −1. In the case of the second lens, the conjugate parameter, $t_2$, is determined by the relative focal lengths of the two lenses and thus fixed by the ratio of the two Abbe numbers; from Eq. (4.52), we get:




$$t_2 = \frac{2V_1 - V_2}{V_2}\qquad(4.53)$$

Without going through the algebra in detail, it is clear that having determined both $t_1$ and $t_2$, Eqs. (4.30a) and (4.30b) give us two expressions solely in terms of $s_1$ and $s_2$. These expressions for the spherical aberration and coma must be set to zero and can be solved for both $s_1$ and $s_2$. The important point to note about this procedure is that because Eq. (4.30a) contains terms that are quadratic in shape factor, this is also reflected in the final solution. Therefore, in general, we might expect to find two solutions to the equation, and this is indeed the case.




Worked Example 4.7 Detailed Design of 200 mm Focal Length Achromatic Doublet


At this point we illustrate the design of an air spaced achromat by looking more closely at the previous example where we analysed a 200 mm achromat design. We are to design an achromat with a focal length of 200 mm working at the infinite conjugate, using SCHOTT N-BK7 and SCHOTT SF2 as the two glasses, with the less dispersive N-BK7 used as the positive ‘crown’ element. Again, the Abbe numbers for these glasses are 64.17 and 33.85 respectively and the $n_D$ values (refractive index at 589.6 nm) are 1.5168 and 1.64769. From the previous example, we know that the focal lengths of the two lenses are:

$$f_1 = 94.5\ \mathrm{mm};\qquad f_2 = -179.1\ \mathrm{mm}$$
The two conjugate parameters are straightforward to determine. The first conjugate parameter, $t_1$, is naturally −1. Eq. (4.53) can be used to determine the second conjugate parameter, $t_2$. This gives:

$$t_2 = \frac{2\times 64.17 - 33.85}{33.85} = 2.79$$
We now substitute the conjugate parameter values together with the refractive index values ($n_D$) into Eq. (4.30a). We sum the contributions of the two lenses, giving the total spherical aberration, which we set to zero. Calculating all coefficients, we get a quadratic equation in terms of the two shape factors, $s_1$ and $s_2$.




(4.54)

We now repeat the same process for Eq. (4.30b), setting the total system coma to zero. This time we get a linear equation involving $s_1$ and $s_2$.




(4.55)

Substituting Eq. (4.55) into Eq. (4.54) gives the desired quadratic equation:




(4.56)

There are, of course, two sets of solutions to Eq. (4.56), with the following values:



Solution 1: $s_1$ = −0.194; $s_2$ = 1.823

Solution 2: $s_1$ = 3.198; $s_2$ = 2.929


There now remains the question as to which of these two solutions to select. Using Eq. (4.29) to calculate the individual radii of curvature from the lens shapes and focal length we get:



Solution 1: $R_1$ = 121.25 mm; $R_2$ = −81.78 mm; $R_3$ = −81.29 mm; $R_4$ = −281.88 mm

Solution 2: $R_1$ = 23.26 mm; $R_2$ = 44.43 mm; $R_3$ = −58.91 mm; $R_4$ = −119.68 mm


The radii $R_1$ and $R_2$ refer to the first and second surfaces of lens 1 and $R_3$ and $R_4$ to the first and second surfaces of lens 2. It is clear that the first solution contains less steeply curved surfaces and is likely to be the better solution, particularly for relatively large apertures. In the case of the second solution, whilst the solution to the third order equations eliminates third order spherical aberration and coma, higher order aberrations are likely to be enhanced.

The first solution to this problem comes under the generic label of the Fraunhofer doublet, whereas the second is referred to as a Gauss doublet. It should be noted that for the Fraunhofer solution, $R_2$ and $R_3$ are almost identical. This means that, should we constrain the two surfaces to have the same curvature (in the case of a cemented doublet) and just optimise for spherical aberration, the solution will be close to that of the ideal aplanatic lens. To do this, we would simply use Eq. (4.29), forcing $R_2$ and $R_3$ to be equal; this replaces Eq. (4.55), the constraint on the total coma, providing an alternative relation between $s_1$ and $s_2$. However, the fact that the cemented doublet is close to fulfilling the zero spherical aberration and coma condition further illustrates the usefulness of this simple component.

The analysis presented applies strictly only in the thin lens approximation. In practice, optimisation of a doublet such as presented in the previous example would be accomplished with the aid of ray tracing software. However, the insights gained by this exercise are particularly important. For instance, in carrying out a computer-based optimisation, it is critically important to understand that two solutions exist. Furthermore, in setting up a computer-based optimisation, an exercise such as this provides a useful ‘starting point’.
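As a closing check, the Fraunhofer solution above can be tested against the standard thin-lens third-order brackets sketched in Section 4.4.2.2. This is an assumed form, not the book's exact Eqs. (4.54)–(4.56): only the zeros of the sums matter, so the convention-dependent prefactors are dropped.

```python
# Verify that the Fraunhofer solution of Worked Example 4.7 zeroes the
# assumed thin-lens spherical aberration and coma sums.
def sa_bracket(n, s, t):
    return ((n + 2) / (n * (n - 1)**2) * s**2
            + 4 * (n + 1) / (n * (n - 1)) * s * t
            + (3 * n + 2) / n * t**2
            + n**2 / (n - 1)**2)

def coma_bracket(n, s, t):
    return (n + 1) / (n * (n - 1)) * s + (2 * n + 1) / n * t

n1, n2 = 1.5168, 1.64769               # N-BK7 and SF2 indices
f1, f2, t1, t2 = 94.5, -179.1, -1.0, 2.79
s1, s2 = -0.194, 1.823                 # Fraunhofer solution (Solution 1)
SA = sa_bracket(n1, s1, t1) / f1**3 + sa_bracket(n2, s2, t2) / f2**3
CO = coma_bracket(n1, s1, t1) / f1**2 + coma_bracket(n2, s2, t2) / f2**2
print(f"total SA ~ {SA:.2e}, total coma ~ {CO:.2e}")  # both approximately 0
```

Both sums come out several orders of magnitude below the individual lens contributions, confirming that Solution 1 simultaneously cancels third order spherical aberration and coma.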




4.7.6 Secondary Colour


The previous analysis of the achromatic doublet provides a means of ameliorating the impact of glass dispersion and of providing correction at two wavelengths. In the case of the standard visible achromat, correction is provided at the F and C wavelengths, the two hydrogen lines at 486.1 and 656.3 nm. Unfortunately, however, this does not guarantee correction at other, intermediate wavelengths. If one views dispersion of optical materials as a ‘small signal’ problem, in which any difference in refractive index is small across the region of interest, then correction of the chromatic focal shift with a doublet may be regarded as a ‘linear process’. That is to say, we might approximate the dispersion of an optical material by some pseudo-linear function of wavelength, ignoring higher order terms. However, by ignoring these higher order terms, some residual chromatic aberration remains. This effect is referred to as secondary colour. The effect is illustrated schematically in Figure 4.26, which shows the shift in focus as a function of wavelength.






Figure 4.26 Secondary colour.



Figure 4.26 clearly shows the effect as a quadratic dependence of focal shift on wavelength, with the ‘red’ and ‘blue’ wavelengths in focus, but the central wavelength exhibiting significant defocus. In line with the notion that we are seeking to quantify a quadratic effect, we can define the partial dispersion coefficient, P, as:




$$P = \frac{n_F - n_D}{n_F - n_C}\qquad(4.57)$$

If we measure the impact of secondary colour as the difference in focal length, Δf, between the ‘blue’ and ‘red’ and the ‘yellow’ focal lengths for an achromatic doublet corrected in the conventional way we get:




$$\frac{\Delta f}{f} = \frac{P_1 - P_2}{V_1 - V_2}\qquad(4.58)$$

where f is the lens focal length.

The secondary colour is thus proportional to the difference between the two partial dispersions. For simplicity, we have chosen to represent the partial dispersion in terms of the same set of wavelengths as used in the Abbe number. However, whilst the same central ($n_D$) wavelength might be used, some wavelength other than the $n_F$ hydrogen line might be chosen for the partial dispersion. Nevertheless, this does not alter the logic presented in Eq. (4.58). Correcting secondary colour is thus less straightforward when compared to the correction of primary colour. Unfortunately, in practice, there is a tendency for the partial dispersion to follow a linear relationship with the Abbe number, as illustrated in the partial dispersion diagram shown in Figure 4.27, illustrating the performance of a range of glasses.

Thus, in the case of the achromatic doublet, judicious choice of glass pairs can minimise secondary colour, but without eliminating it. In principle, secondary colour can be entirely corrected in a triplet system employing lenses of different materials. More formally, if we describe the three lenses as having focal powers of $P_1$, $P_2$, and $P_3$, with the Abbe numbers represented as $V_1$, $V_2$, and $V_3$ and the partial dispersions as $\alpha_1$, $\alpha_2$, and $\alpha_3$, then the lens powers may be uniquely determined from the following set of equations:




(4.59a) (#x11_c04_para_0166)




(4.59b)




(4.59c) (#x11_c04_para_0166)

As indicated previously, Figure 4.27 exemplifies the close link between primary and secondary dispersion, with a linear trend observed linking the partial dispersion and the Abbe number for most glasses. By presenting Eqs. (4.59a)–(4.59c) in matrix form, it is easy to demonstrate that, if a wholly linear relationship exists between partial dispersion and Abbe number, then the matrix determinant will be zero. In this instance, a triplet solution is therefore impossible. Furthermore, the same analysis suggests that a set of glasses lying close to a straight line on the partial dispersion plot will necessitate the deployment of lenses with very high countervailing powers. It is clear, therefore, that an optimum triplet design is afforded by selection of glasses that depart as far as possible from a straight-line plot on the partial dispersion diagram. In this context, the isolated group of glasses that appears in Figure 4.27, the fluorite glasses, is especially useful in correcting for secondary colour. These glasses lie particularly far from the general trend line for the 'main series' of glasses. Lenses which are corrected for both primary and secondary colour are referred to as apochromatic lenses. These lenses invariably incorporate fluorite glasses.
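To make the matrix argument concrete, the sketch below solves Eqs. (4.59a)–(4.59c) numerically for a hypothetical glass triplet. The glass values are illustrative placeholders, not a real catalogue selection; the point is that as the three glasses approach a straight line on the partial dispersion diagram, the determinant tends to zero and the solved powers diverge.

```python
import numpy as np

def apochromat_powers(P0, V, alpha):
    """Solve Eqs. (4.59a)-(4.59c) for the three lens powers.

    P0    : total power of the triplet
    V     : Abbe numbers (V1, V2, V3)
    alpha : partial dispersions (alpha1, alpha2, alpha3)
    """
    V = np.asarray(V, dtype=float)
    a = np.asarray(alpha, dtype=float)
    # Rows: total power, primary colour, and secondary colour conditions
    M = np.array([
        np.ones(3),    # P1 + P2 + P3 = P0
        1.0 / V,       # sum(Pi / Vi) = 0
        a / V,         # sum(alpha_i * Pi / Vi) = 0
    ])
    print(f"determinant = {np.linalg.det(M):.3e}")
    return np.linalg.solve(M, np.array([P0, 0.0, 0.0]))

# Illustrative (hypothetical) glass data: the third glass lies well off
# the linear trend of partial dispersion against Abbe number.
P = apochromat_powers(P0=1 / 200.0,            # 200 mm focal length
                      V=(64.0, 36.0, 95.0),
                      alpha=(0.69, 0.59, 0.54))
print("lens powers:", P, "-> focal lengths (mm):", 1.0 / P)
```

Note how strong the individual element powers are compared with the total power; this is the numerical expression of the 'high countervailing powers' remarked upon above.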






Figure 4.27 Plot of partial dispersion against Abbe number.






4.7.7 Spherochromatism


In the previous analysis we learned that the basic design of simple doublet lenses allowed for the correction of both chromatic aberration and spherical aberration. Furthermore, this flexibility for correction could be extended to coma for an air spaced lens. However, since the refractive index of the two glasses in a doublet lens varies with wavelength, then inevitably, so does the spherical aberration. As such, spherical aberration can only be corrected at one wavelength, e.g. at the ‘D’ wavelength. This means that there will be some uncorrected spherical aberration at the extremes of the spectrum. This effect is known as spherochromatism. It is generally less significant in magnitude when compared with secondary colour.




4.8 Hierarchy of Aberrations


For some specific applications, such as telescope and microscope objective lenses, the field angles tend to be very much smaller than the angles associated with the system numerical aperture. In these instances, the off-axis aberrations, such as coma, are much less significant than the on-axis aberrations. Therefore, as far as the Gauss-Seidel aberrations are concerned, there exists a hierarchy of aberrations that can be placed in order of their significance or importance:



i. Spherical Aberration

ii. Coma

iii. Astigmatism and Field Curvature

iv. Distortion


That is to say, it is of the greatest importance to correct spherical aberration and then coma, followed by astigmatism, field curvature, and distortion. This emphasises the significance and use of aplanatic elements in optical design.

Of course, for certain optical systems, this logic is not applicable. For instance, in both camera lenses and in eyepieces, the field angles are very substantial and comparable to the angles associated with the numerical aperture. Indeed, in systems of this type, greater emphasis is placed upon the correction of astigmatism, field curvature, and distortion than in other systems.

With these comments in mind, it would be useful to summarise all the aberrations covered in this chapter and to classify them by virtue of their pupil and field angle dependence. Table 4.1 sets out the wavefront error dependence upon pupil and field angle for each of the aberrations.

It would be instructive, at this point, to take the example of the 200 mm doublet and to plot the wavefront aberrations attributable to some of the aberrations listed in Table 4.1 against numerical aperture. Spherochromatism is expressed as the difference in spherical aberration wavefront error between the n_F and n_C wavelengths (486.1 and 656.3 nm). Secondary colour is expressed as the wavefront error attributable to the difference in defocus between the n_F and n_D wavelengths (486.1 and 589.3 nm). A plot is shown in Figure 4.28.

It is clear that for the simple achromat under consideration, at least for modest lens apertures, the impact of secondary colour predominates. If a wavefront error of about 50 nm is consistent with 'high quality' imaging, then secondary colour has a significant impact for numerical apertures in excess of 0.05, or f#10. With numerical apertures in excess of 0.2 (f#2.5), higher order spherical aberration starts to make a significant contribution. On the other hand, the effect of spherochromatism is more modest throughout. In this context, the impact of spherochromatism would only be a significant issue if secondary colour were first corrected.



Table 4.1 Pupil and field dependence of principal aberrations.









Figure 4.28 Contribution of different aberrations vs. numerical aperture for 200 mm achromat.



Of course, in practice, the design of such lens systems will be accomplished by means of ray tracing software or similar. Nonetheless, an understanding of the basic underlying principles involved in such a design would be useful in the initiation of any design process.




Further Reading


Born, M. and Wolf, E. (1999). Principles of Optics, 7e. Cambridge: Cambridge University Press. ISBN: 0-521-64222-1.

Hecht, E. (2017). Optics, 5e. Harlow: Pearson Education. ISBN: 978-0-1339-7722-6.

Kidger, M.J. (2001). Fundamental Optical Design. Bellingham: SPIE. ISBN: 0-8194-3915-0.

Kidger, M.J. (2004). Intermediate Optical Design. Bellingham: SPIE. ISBN: 978-0-8194-5217-7.

Longhurst, R.S. (1973). Geometrical and Physical Optics, 3e. London: Longmans. ISBN: 0-582-44099-8.

Mahajan, V.N. (1991). Aberration Theory Made Simple. Bellingham: SPIE. ISBN: 0-8194-0536-1.

Mahajan, V.N. (1998). Optical Imaging and Aberrations: Part I. Ray Geometrical Optics. Bellingham: SPIE. ISBN: 0-8194-2515-X.

Mahajan, V.N. (2001). Optical Imaging and Aberrations: Part II. Wave Diffraction Optics. Bellingham: SPIE. ISBN: 0-8194-4135-X.

Slyusarev, G.G. (1984). Aberration and Optical Design Theory. Boca Raton: CRC Press. ISBN: 978-0852743577.

Smith, F.G. and Thompson, J.H. (1989). Optics, 2e. New York: Wiley. ISBN: 0-471-91538-1.

Welford, W.T. (1986). Aberrations of Optical Systems. Bristol: Adam Hilger. ISBN: 0-85274-564-8.




5

Aspheric Surfaces and Zernike Polynomials





5.1 Introduction


The previous chapters have provided a substantial grounding in geometrical optics and aberration theory that will provide the understanding required to tackle many design problems. However, there are two significant omissions.

Firstly, all previous analysis, particularly with regard to aberration theory, has assumed the use of spherical surfaces. This, in part, reflects a historical perspective: spherical surfaces are exceptionally easy to manufacture when compared to other forms and so enjoy the most widespread use in practical applications. Modern design and manufacturing techniques have permitted the use of more exotic shapes. In particular, conic surfaces are used in a wide variety of modern designs.

The second significant omission is the use of Zernike circle polynomials in describing the mathematical form of the wavefront error across a pupil. Zernike polynomials are an orthonormal set of polynomials that are bounded by a circular aperture and, as such, are closely matched to the geometry of a circular pupil. There are, of course, many different sets of orthonormal functions, the best known being the Fourier series, which, in two dimensions, might be applied to a rectangular aperture. As the wavefront pattern associated with defocus forms one specific Zernike polynomial, the orthonormal property of the series means that all other terms are effectively optimised with respect to defocus. This topic was touched on in Chapter 3 when seeking to minimise the wavefront error associated with spherical aberration by providing balancing defocus. The optimised form that was derived effectively represents a Zernike polynomial.




5.2 Aspheric Surfaces





5.2.1 General Form of Aspheric Surfaces


In this discussion, we will restrict ourselves to surfaces that are symmetric about a central axis. Although more exotic surfaces are used, such symmetric surfaces predominate in practical applications. The most general embodiment of this type of surface is the so-called even asphere. Its general form is specified by its surface sag, z, which represents the axial displacement of the surface with respect to the axial position of the vertex, located at the axis of symmetry. The surface sag of an even asphere is given by the following formula:

z = \frac{cr^2}{1 + \sqrt{1 - (1 + k)c^2r^2}} + \alpha_1 r^2 + \alpha_2 r^4 + \alpha_3 r^6 + \ldots (5.1)

c = 1/R is the surface curvature (R is the base radius); k is the conic constant; α_n is the nth even polynomial coefficient.

The curvature parameter, c, essentially describes the underlying spherical form of the surface. The conic constant, k, is a parameter that describes the shape of a conic surface. For k = 0, the surface is a sphere. More generally, the conic shapes are as set out in Table 5.1.



Table 5.1 Form of conic surfaces.

k > 0: oblate ellipsoid
k = 0: sphere
−1 < k < 0: prolate ellipsoid
k = −1: paraboloid
k < −1: hyperboloid

Without the further addition of the even polynomial coefficients, α_n, the surfaces are pure conics. Historically, the paraboloid, as a parabolic mirror shape, has found application as an objective in reflective telescopes. As will be seen subsequently, use of a parabolic mirror shape entirely eliminates spherical aberration for the infinite conjugate. The introduction of the even aspheric terms adds further useful variables in the optimisation of a design. However, this flexibility comes at the cost of an increase in manufacturing complexity and cost. Strictly speaking, at the first approximation, the terms α_1 and α_2 are redundant for a general conic shape. Adding the conic term, k, to the surface prescription and optimising effectively allows local correction of the wavefront to the fourth order in r. In this context, the first two even polynomial terms are, to a significant degree, redundant.
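As a minimal sketch of Eq. (5.1), the following code evaluates the sag of an even asphere. The prescription values used in the example are arbitrary illustrations, not taken from the text.

```python
import numpy as np

def even_asphere_sag(r, R, k, alphas=()):
    """Surface sag z(r) of an even asphere, per Eq. (5.1).

    r      : radial distance(s) from the axis
    R      : base radius of curvature (c = 1/R)
    k      : conic constant (k = 0 gives a sphere)
    alphas : even polynomial coefficients (alpha1, alpha2, ...),
             multiplying r^2, r^4, r^6, ...
    """
    r = np.asarray(r, dtype=float)
    c = 1.0 / R
    z = c * r**2 / (1.0 + np.sqrt(1.0 - (1.0 + k) * c**2 * r**2))
    for i, a in enumerate(alphas, start=1):
        z += a * r**(2 * i)
    return z

# Example: compare a sphere, a paraboloid, and a hyperboloid of the same
# base radius at the edge of a 25 mm semi-aperture.
r = 25.0
for k, name in [(0.0, "sphere"), (-1.0, "paraboloid"), (-2.0, "hyperboloid")]:
    print(f"{name:12s} sag at r = {r} mm: {even_asphere_sag(r, R=100.0, k=k):.4f} mm")
```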




5.2.2 Attributes of Conic Mirrors



There is one important attribute of conic surfaces that lies in their mathematical definition. To illustrate this, a section of an ellipsoid, i.e. an ellipse, is shown in Figure 5.1. An ellipse is defined by its two foci and has the property that a line drawn from one focus to any point on the ellipse and thence to the other focus has the same total length, regardless of which point on the ellipse is chosen.

The ellipsoid is defined by its two foci, F_1 and F_2. In this instance, the shape of the ellipsoid is defined by its semi-major distance, a, and its semi-minor distance, b. As suggested, the key point about the ellipsoid shape sketched in Figure 5.1 is that the aggregate distance F_1P + PF_2 is always constant. By virtue of Fermat's principle, this inevitably implies that, since the optical path is the same in all cases, F_1 and F_2, from an optical perspective, represent perfect focal points with no aberration whatsoever generated by reflection from the ellipsoidal surface. In describing the ellipsoid above, it is useful to express it in terms of polar coordinates defined with respect to the focal points. If we label the distance F_1P as d, then this distance may be expressed in the following way in terms of the polar angle, θ:






Figure 5.1 Ellipsoid of revolution.






d = \frac{d_0}{1 + \varepsilon\cos\theta} (5.2)

The parameter, ε, is the so-called eccentricity of the ellipse and is related to the conic constant, k. In addition, the parameter, d_0, is related to the base radius, R, as defined in the conic section formula in Eq. (5.1). The connection between the parameters is as set out in Eq. (5.3):

k = -\varepsilon^2 \quad\text{and}\quad d_0 = R (5.3)

From the perspective of image formation, the two focal points, F_1 and F_2, represent the ideal object and image locations for this conic section. If x_1 in Figure 5.1 represents the object distance u, i.e. the distance from the object to the nearest surface vertex, then it is also possible to calculate the distance, v, to the other focal point. These distances are presented below in the form of Eq. (5.4):

u = -a(1 - \varepsilon) \quad\text{and}\quad v = -a(1 + \varepsilon) (5.4)

From the above, it is easy to calculate the conjugate parameter for this conjugate pair:

t = \frac{u + v}{v - u} = \frac{1}{\varepsilon}

In fact, object and image conjugates are reversible, so the full solution for the conic constant is as in Eq. (5.5):

k = -\left(\frac{1}{t}\right)^2 = -\varepsilon^2 (5.5)

Thus, it is straightforward to demonstrate that for a conic section, there exists one pair of conjugates for which perfect image formation is possible. Of course, the best known of these is where k = −1, which defines the paraboloidal shape. From Eq. (5.5), the corresponding conjugate parameter is −1 and relates to the infinite conjugate. This forms the basis of the paraboloidal mirror used widely (at the infinite conjugate) in reflecting telescopes and other imaging systems.

As for the spherical mirror, the effective focal length of the mirror remains the same as for the paraxial relationship:

\frac{1}{v} + \frac{1}{u} = \frac{2}{R} (5.6)

More generally, the spherical aberration produced by a conic mirror is of a similar form as for the spherical mirror but with an offset:




(5.7)




Worked Example 5.1 Simple Mirror-Based Magnifier


We wish to construct a simple magnification system with a simple conic mirror. The system magnification is to be two and the object distance 100 mm. There is to be no on-axis aberration. What is the prescription of the mirror, i.e. base radius and conic constant?

It is assumed that object and image are located the same side of the mirror, so that, in this context, the image distance is −200 mm. The overall set up is illustrated in the diagram:








The base radius of the conic mirror is very simple to calculate, as it follows the simple paraxial formula replicated in Eq. (5.6):

\frac{1}{v} + \frac{1}{u} = \frac{2}{R} \quad\Rightarrow\quad \frac{1}{(-200)} + \frac{1}{(-100)} = \frac{2}{R}

This gives R = −133 mm.

We now need to calculate the conjugate parameter, t:

t = \frac{u + v}{v - u} = \frac{(-100) + (-200)}{(-200) - (-100)} = 3

From Eq. (5.5) it is straightforward to see that k = −(1/t)^2 and thus k = −0.1111. The shape is that of a slightly prolate ellipsoid.

The practical significance of the perfect on-axis set-up described in this example is that it forms the basis of an ideal manufacturing test for such a conic surface. This will be described in more detail later in this text.
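The worked example translates directly into a few lines of code. The sketch below reproduces the calculation for arbitrary conjugates, using the sign convention of the example (object and image distances negative, on the same side of the mirror).

```python
def conic_mirror_prescription(u, v):
    """Base radius and conic constant of a mirror giving perfect on-axis
    imaging between conjugates u (object) and v (image).

    Uses 1/u + 1/v = 2/R (Eq. 5.6), t = (u + v)/(v - u), and
    k = -(1/t)**2 (Eq. 5.5).
    """
    R = 2.0 / (1.0 / u + 1.0 / v)
    t = (u + v) / (v - u)
    k = -(1.0 / t) ** 2
    return R, k

# Worked Example 5.1: object at -100 mm, image at -200 mm (magnification 2)
R, k = conic_mirror_prescription(u=-100.0, v=-200.0)
print(f"R = {R:.1f} mm, k = {k:.4f}")   # R = -133.3 mm, k = -0.1111
```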




5.2.3 Conic Refracting Surfaces


There is no generic rule for conic refracting surfaces that generate perfect image formation for an arbitrary conjugate. However, there is a special condition for the infinite conjugate where perfect image formation results, as illustrated in Figure 5.2.

If the refractive index of the surface is n, assuming that the object is in air or vacuum, then the conic constant of the ideal surface is −n^2. In fact, the shape is that of a hyperboloid. The asymptotes of the hyperboloid effectively produce grazing incidence for rays originating from the object. By definition, therefore, the angle that the surface normal makes with the optical axis at the asymptotes is equal to the critical angle. This restricts the maximum numerical aperture that can be collected by the system. With this constraint, it is clear that the maximum numerical aperture is equal to 1/n. In summary, therefore:

k = -n^2; \quad NA_{max} = \frac{1}{n} (5.8)

Unfortunately, no other general condition for perfect image formation exists for a conic refracting surface. Note that perfect image formation requires all orders of (on-axis) aberration to be corrected. Thus, although no condition for perfect image formation is possible at a general conjugate, it is still possible to correct for third order spherical aberration with a single refractive surface.






Figure 5.2 Single refractive surface at infinite conjugate.






5.2.4 Optical Design Using Aspheric Surfaces


The preceding discussion largely focused on perfect imaging in specific and restricted circumstances. However, even where perfect imaging is not theoretically possible, aspheric surfaces are extremely useful in the correction of system aberrations with a minimum number of surfaces. For more general design problems, therefore, even asphere terms may be added to the surface prescription. With the stop located at a specific surface, adding aspheric terms to the form of that surface can only control the spherical aberration at that surface. One perspective on the form of a surface is that second order terms only add to the power of that surface, whereas fourth order terms control the third order (in transverse aberration) aberrations. The reasoning behind this assertion may be viewed a little more clearly by expanding the sag of a conic surface in terms of even polynomial terms:

z = \frac{c}{2}r^2 + \frac{(1 + k)c^3}{8}r^4 + \frac{(1 + k)^2c^5}{16}r^6 + \ldots (5.9)

Adding a conic term to the surface, in addition to defining the curvature of the surface by its base radius, effectively adds an independent term to Eq. (5.9), controlling two polynomial orders in Eq. (5.9). To this extent, adding separate additional second order and fourth order terms to the even asphere expansion in Eq. (5.1) is redundant. From the perspective of controlling third order aberrations, Eq. (5.9) confirms the utility of a conic surface in adding a controlled amount of fourth order optical path difference (OPD) to the system. In fact, the amount of OPD added to the system, to fourth order, is simply given by the change in sag produced by the conic surface multiplied by the difference in refractive indices. If the refractive index of the first medium is n_1, and that of the second medium, n_2, then the change in OPD produced by introducing a conic parameter of k is given by:

\Delta OPD = (n_2 - n_1)\frac{kc^3r^4}{8} (5.10)

Equation (5.10) allows estimation of the spherical aberration produced by a conic surface introduced at the stop position. However, by virtue of the stop shift equations introduced in the previous chapter, providing fourth order sag terms at a surface remote from the stop influences not only spherical aberration, but also the other third order aberrations as well. In principle, therefore, by using aspheric surfaces, it is possible to eliminate all third order aberrations with fewer surfaces than would be possible using spherical surfaces alone. In fact, assuming that a system has been designed with zero Petzval curvature, it is only necessary to eliminate spherical aberration, coma, and astigmatism. Therefore, only three surfaces are strictly necessary. This represents a considerable improvement over a system employing only spherical surfaces. Notwithstanding the difficulties in manufacturing aspheric surfaces, some commercial camera systems are designed with this principle in mind.
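A short sketch illustrates Eq. (5.10). It compares the exact OPD contribution, computed from the difference in sag between a conic and its base sphere, against the fourth order approximation; the surface values are arbitrary illustrations.

```python
import numpy as np

def sag(r, c, k):
    """Sag of a pure conic surface with curvature c and conic constant k."""
    return c * r**2 / (1.0 + np.sqrt(1.0 - (1.0 + k) * c**2 * r**2))

def conic_opd(r, c, k, n1, n2):
    """OPD introduced by changing a spherical surface (k = 0) into a conic:
    exact, and to fourth order per Eq. (5.10)."""
    exact = (n2 - n1) * (sag(r, c, k) - sag(r, c, 0.0))
    fourth_order = (n2 - n1) * k * c**3 * r**4 / 8.0
    return exact, fourth_order

# Illustrative surface: R = 100 mm, k = -0.5, air to glass (n = 1.52)
r = np.array([5.0, 10.0, 15.0])
exact, approx = conic_opd(r, c=1.0 / 100.0, k=-0.5, n1=1.0, n2=1.52)
for ri, e, a in zip(r, exact, approx):
    print(f"r = {ri:5.1f} mm: exact OPD = {e:.6f} mm, 4th order = {a:.6f} mm")
```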

Having introduced the underlying principles, it must be stated that design using aspheric surfaces is not especially amenable to analytical solution. In principle, of course, Eq. (5.10) could be used together with the relevant stop shift equations to compute analytically all third order aberrations. However, in practice, this is a rather cumbersome procedure and the design of such systems proceeds largely by computer optimisation. Nevertheless, a clear understanding of the underlying principles is of invaluable help in the design process. An example of a simple two lens system employing aspheric surfaces is sketched in Figure 5.3. This lens system replicates the performance of a three lens Cooke triplet with an aperture of f#5 and a field of view of 40°. Figure 5.3 is not intended to present a realistic and competitive design; it merely illustrates the flexibility introduced by the incorporation of aspheric surfaces. In particular, it offers the potential to achieve the same performance with fewer surfaces.

Whilst aspheric components represent a significant enhancement to the toolkit of an optical designer, they represent something of a headache to the component manufacturer. As will be revealed later, in general, aspheric components are more difficult to manufacture and test and hence more costly. As such, their use is restricted to those situations where the advantage provided is especially salient. At the same time, advanced manufacturing techniques have facilitated the production of aspheric surfaces and their application in relatively commonplace designs, such as digital cameras, is becoming a little more widespread. Of course, the presence of conic and aspheric surfaces in large reflecting telescope designs is, by comparison, relatively well established.






Figure 5.3 Simple two lens system employing aspheric components.






5.3 Zernike Polynomials





5.3.1 Introduction


In describing wavefront aberrations at any surface in a system, it is convenient to do so by expressing their value in terms of the two components of the normalised pupil function, P_x and P_y. Where the magnitude of the pupil function is equal to unity, this describes the position of a ray at the edge of the pupil. With this description in mind, we now proceed to describe the normalised pupil position in terms of the polar co-ordinates, ρ and θ. This is illustrated in Figure 5.4.






Figure 5.4 Polar pupil coordinates.



The wavefront error across the pupil can now be expressed in terms of ρ and θ. What we are seeking is a set of polynomials that is orthonormal across the circular pupil described. Any continuous function may be represented in terms of this set of polynomials as follows:

\Phi(\rho, \theta) = \sum_i A_i f_i(\rho, \theta) (5.11)

The individual polynomials are described by the term f_i(ρ, θ), and their magnitude by the coefficient, A_i. The property of orthonormality is significant and may be represented in the following way:

\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 f_i(\rho, \theta)\,f_j(\rho, \theta)\,\rho\,d\rho\,d\theta = \delta_{ij} (5.12)

The symbol δ_ij is the Kronecker delta. That is to say, when i and j are identical, i.e. the two polynomials in the integral are identical, then the integral is exactly one. Otherwise, if the two polynomials in the integral are different, then the integral is zero. The first property is that of normality, i.e. the polynomials have been normalised to one, and the second is that of orthogonality; hence their designation as an orthonormal polynomial set.

Equations (5.11) and (5.12) give rise to a number of important properties of these polynomials. Initially we might be presented with a problem as to how to represent a known but arbitrary wavefront error, Φ(ρ,θ), in terms of the orthonormal series presented in Eq. (5.11). For example, this arbitrary wavefront error may have been computed as part of the design and analysis of a complex optical system. The question that remains is how to calculate the individual polynomial coefficients, A_i. To calculate an individual term, one simply takes the cross integral of the function, Φ(ρ,θ), with respect to an individual polynomial, f_i(ρ,θ):

\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 \Phi(\rho, \theta)\,f_i(\rho, \theta)\,\rho\,d\rho\,d\theta = \sum_j A_j\left[\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 f_j(\rho, \theta)\,f_i(\rho, \theta)\,\rho\,d\rho\,d\theta\right]

By definition we have:

A_i = \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 \Phi(\rho, \theta)\,f_i(\rho, \theta)\,\rho\,d\rho\,d\theta (5.13)

So, any coefficient may be determined from the integral presented in Eq. (5.13). The coefficients, A_i, clearly express, in some way, the magnitude of the contribution of each polynomial term to the general wavefront error. In fact, the magnitude of each component, A_i, represents the root mean square (rms) contribution of that component. More specifically, the total rms wavefront error is given by the square root of the sum of the squares of the individual coefficients. That this is so is clearly evident from the orthonormal property of the series:

\Phi_{rms} = \sqrt{\sum_i A_i^2} (5.14)
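The coefficient extraction of Eq. (5.13) is easy to demonstrate numerically. The sketch below projects a simple test wavefront onto the orthonormal defocus (n = 2, m = 0) and primary spherical aberration (n = 4, m = 0) polynomials by evaluating the integral on a polar grid; the grid resolution is an arbitrary choice.

```python
import numpy as np

# Polar grid over the unit pupil (resolution is an arbitrary choice)
n_r, n_t = 500, 500
rho = (np.arange(n_r) + 0.5) / n_r                  # radial midpoints
theta = (np.arange(n_t) + 0.5) * 2.0 * np.pi / n_t  # angular midpoints
RHO, THETA = np.meshgrid(rho, theta)
dA = (1.0 / n_r) * (2.0 * np.pi / n_t) * RHO        # area element rho*drho*dtheta

def project(phi, f):
    """Numerically evaluate A_i per Eq. (5.13):
    (1/pi) * integral of phi * f_i * rho over the unit circle."""
    return np.sum(phi * f * dA) / np.pi

# Orthonormal defocus and primary spherical aberration polynomials
defocus = np.sqrt(3.0) * (2.0 * RHO**2 - 1.0)
spherical = np.sqrt(5.0) * (6.0 * RHO**4 - 6.0 * RHO**2 + 1.0)

# Test wavefront: pure rho^4, i.e. raw fourth order spherical aberration
phi = RHO**4

print(f"defocus coefficient:   {project(phi, defocus):.5f}")   # ~1/(2*sqrt(3)) = 0.28868
print(f"spherical coefficient: {project(phi, spherical):.5f}") # ~1/(6*sqrt(5)) = 0.07454
```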




5.3.2 Form of Zernike Polynomials


Following this general discussion about the useful properties of orthonormal functions, we can move on to a description of the Zernike circle polynomials themselves. They were initially investigated and described by Fritz Zernike in 1934 and are admirably suited to a solution space defined by a circular pupil. We will suppose, initially, that the polynomial may be described by a component, R(ρ), that is dependent exclusively upon the normalised pupil radius, and a component, G(φ), that is dependent upon the polar angle, φ. That is to say:




Z(\rho, \varphi) = R(\rho)G(\varphi) (5.15)

We can make the further assumption that R(ρ) may be represented by a polynomial series in ρ. The form of G(φ) is easy to deduce. For physically realistic solutions, G(φ) must repeat identically every 2π radians. Therefore G(φ) must be represented by a periodic function of the form:




G(\varphi) = e^{\pm im\varphi} (5.16)

where m is an integer.

This part of the Zernike polynomial clearly conforms to the desired form, since not only does it have the desired periodicity, but it also possesses the desired orthogonality. The parameter, m, represents the angular frequency of the polar dependence.

Having dealt with the polar part of the Zernike polynomial, we turn to the radial portion, R(ρ). The radial part of the Zernike polynomial, R(ρ), comprises a series of polynomials in ρ. The form of these polynomials depends upon the angular parameter, m, and the maximum radial order of the polynomial, n. Furthermore, considerations of symmetry dictate that the Zernike polynomials must be either wholly symmetric or wholly anti-symmetric about the centre. That is to say, the operation r → −r is equivalent to φ → φ + π. For the Zernike polynomial to be identical under both (equivalent) transformations, only even polynomial terms in R(ρ) can be accepted for even values of m. Similarly, exclusively odd polynomial terms are associated with odd values of m.

Overall, the entire set of Zernike polynomials is continuous and may be represented in powers of P_x and P_y, or ρcos(φ) and ρsin(φ). It is not possible to construct trigonometric expressions of order m, i.e. cos(mφ) and sin(mφ), where the order of the corresponding polynomial in ρ is less than m. Therefore, the polynomial, R(ρ), cannot contain terms in ρ that are of lower order than the angular parameter, m.

To describe each polynomial, R(ρ), it is customary to define it in terms of the maximum order of the polynomial, n, and the angular parameter, m. For all values of m (and n), the polynomial, R(ρ), may be expressed as per Eq. (5.17):

R_{n,m}(\rho) = N_{n,m}\sum_{i = m,\, m+2,\ldots}^{n} C_{n,m,i}\,\rho^i (5.17)

C_{n,m,i} represents the value of a specific coefficient.

The parameter, N_{n,m}, is a normalisation factor. Of course, any arbitrary scaling factor may be applied to the coefficients, C_{n,m,i}, provided it is compensated by the normalisation factor. By convention, the base polynomial has a value of unity for ρ = 1. With this in mind, the purpose of the normalisation factor is to ensure that, in all cases, the rms value of the polynomial is normalised to one. It now remains only to calculate the values of the coefficients, C_{n,m,i}. These are determined from the condition of orthogonality, which applies separately to each R_{n,m}(ρ), and may be set out as follows:




\int_0^1 R_{n,m}(\rho)\,R_{n',m}(\rho)\,\rho\,d\rho = \frac{\delta_{nn'}}{2(n + 1)} (5.18)

The general formula for the coefficients, C_{n,m,i}, is set out in Eq. (5.19):

C_{n,m,i} = \frac{(-1)^{(n-i)/2}\left(\frac{n+i}{2}\right)!}{\left(\frac{n-i}{2}\right)!\left(\frac{i+m}{2}\right)!\left(\frac{i-m}{2}\right)!} (5.19)
For i = n = 0, the value of the coefficient, C_{n,m,i}, as prescribed for the piston term, is unity. The value of the normalisation factor, N_{n,m}, is given in Eq. (5.20):

N_{n,m} = \sqrt{\frac{2(n + 1)}{1 + \delta_{m0}}} (5.20)

More completely, we can express the entire polynomial:

Z_n^m(\rho, \varphi) = N_{n,m}R_{n,m}(\rho)\cos(m\varphi) \quad (m \ge 0) (5.21a)

Z_n^{-m}(\rho, \varphi) = N_{n,m}R_{n,m}(\rho)\sin(m\varphi) \quad (m > 0) (5.21b)

The parameter, m, can take on positive or negative values, as can be seen from Eq. (5.16). Of course, Eq. (5.16) gives the complex trigonometric form. However, by convention, negative values of the parameter m are ascribed to terms involving sin(mφ), whilst positive values are ascribed to terms involving cos(mφ).

Zernike polynomials are widely used in the analysis of optical system aberrations. Because of the fundamental nature of these polynomials, all the Gauss-Seidel wavefront aberrations map onto specific Zernike polynomials. For example, spherical aberration has no polar angle dependence, but does have a fourth order dependence upon pupil function. This suggests that this aberration has a radial order, n, of 4 and a polar dependence, m, of zero. Similarly, coma has a radial order of 3 and a polar dependence of one. Table 5.2 provides a list of the first 28 Zernike polynomials.

In Table 5.2, each Zernike polynomial has been assigned a unique number. This is the 'Standard' numbering convention adopted by the American National Standards Institute (ANSI). It has the benefit of following the Born and Wolf notation logically, starting from the piston term, which is denominated the zeroth term. If the ANSI number is represented as j, and the Born and Wolf indices as n, m, then the ANSI number may be derived as follows:

j = \frac{n(n + 2) + m}{2} (5.22)

Unfortunately, a variety of different numbering conventions prevail, leading to significant confusion. This will be explored a little later in this chapter. As a consequence, the reader is advised to be cautious in applying any single-index numbering convention to Zernike polynomials. By contrast, the n, m numbering convention used by Born and Wolf is unambiguous and should be used where there is any possibility of confusion.
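Eq. (5.22) is trivially checked in code. The sketch below computes the ANSI number for the low-order terms and recovers, for example, j = 12 for primary spherical aberration (n = 4, m = 0). Only the ANSI convention is implemented here; as noted, other conventions order the terms differently.

```python
def ansi_index(n, m):
    """ANSI 'Standard' single-index number, per Eq. (5.22):
    j = (n(n + 2) + m) / 2."""
    if abs(m) > n or (n - m) % 2 != 0:
        raise ValueError("require |m| <= n and n - m even")
    return (n * (n + 2) + m) // 2

# First few radial orders: piston, tilts, defocus/astigmatism, coma, spherical
for n in range(5):
    for m in range(-n, n + 1, 2):
        print(f"n = {n}, m = {m:+d} -> j = {ansi_index(n, m)}")
```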




5.3.3 Zernike Polynomials and Aberration


As outlined previously, there is a strong connection between Zernike polynomials and the primary aberrations when expressed in terms of wavefront error. Table 5.2 clearly shows the correspondence between the polynomials and the Gauss-Seidel aberrations, with the third order Gauss-Seidel aberrations, such as spherical aberration and coma, clearly visible.

The power of the Zernike polynomials, as an orthonormal set, lies in their ability to represent any arbitrary wavefront aberration. Using the approach set out in Eq. (5.13), it is possible to compute the magnitude of any Zernike term by the cross integral of the relevant polynomial and the wavefront disturbance. Furthermore, the total root mean square (rms) wavefront error, as per Eq. (5.14), may be calculated from the RSS (root sum square) of the individual Zernike magnitudes. That is to say, the Zernike magnitude of each term represents its contribution to the rms wavefront error, as averaged over the whole pupil.

The use of defocus to compensate spherical aberration was explored in Chapters 3 and 4. In this instance, for a given amount of fourth order wavefront error, we sought to minimise the rms wavefront error by applying a small amount of defocus. Expressing a raw fourth order wavefront error, Aρ^4, in terms of the polynomials of Table 5.2:

A\rho^4 = \frac{A}{6}(6\rho^4 - 6\rho^2 + 1) + \frac{A}{2}(2\rho^2 - 1) + \frac{A}{3}

Hence, without defocus adjustment, the raw spherical aberration produced in a system may be expressed as the sum of three Zernike terms: one spherical aberration, one defocus, and one piston term. The total aberration for an uncompensated system is simply given by the RSS of the individual terms. However, for a compensated system, only the Zernike n = 4, m = 0 term need be considered. This then gives the following fundamental relationship:



Table 5.2 First 28 Zernike polynomials.









\Phi_{rms}(\text{uncompensated}) = \frac{A}{\sqrt{5}}; \quad \Phi_{rms}(\text{compensated}) = \frac{A}{\sqrt{180}} (5.23)

The rms wavefront error has thus been reduced by a factor of six by the focus compensation process. Furthermore, this analysis feeds into the discussion in Chapter 3 on the use of balancing aberrations to minimise wavefront error. For example, if we have succeeded in eliminating third order spherical aberration and are presented with residual fifth order spherical aberration, we can minimise the rms wavefront error by balancing this aberration with a small amount of third order aberration in addition to defocus. Analysis using Zernike polynomials is extremely useful in resolving this problem:

A\rho^6 = \frac{A}{20}(20\rho^6 - 30\rho^4 + 12\rho^2 - 1) + \frac{A}{4}(6\rho^4 - 6\rho^2 + 1) + \frac{9A}{20}(2\rho^2 - 1) + \frac{A}{4}
As previously outlined, the uncompensated rms wavefront error may be calculated from the RSS of all four Zernike terms. Naturally, for the compensated system, we need only consider the first term.

\Phi_{rms}(\text{uncompensated}) = \frac{A}{\sqrt{7}}; \quad \Phi_{rms}(\text{compensated}) = \frac{A}{20\sqrt{7}} (5.24)
For the fifth order spherical aberration, the rms wavefront error has been reduced by a factor of 20 through the process of aberration balancing. In terms of the practical application of this process, one might wish to optimise an optical design by minimising the rms wavefront error. Although, in practice, the process of optimisation will be carried out using software tools, it is nonetheless useful to recognise some key features of an optimised design. By virtue of the previous example, optimisation of spherical aberration should lead to an OPD profile that is close to the relevant fifth order Zernike term. This is shown in Figure 5.5, which illustrates the profile of an optimised OPD based entirely on that term. The graph plots the nominal OPD against the normalised pupil function, with the form given by the Zernike polynomial n = 6, m = 0.

In the optimisation of an optical design, it is important to understand the form of the OPD fan displayed in Figure 5.5 in order to recognise the desired endpoint of the optimisation process. It displays three minima and two maxima (or vice versa), whereas the unoptimised OPD fan has one fewer maximum and minimum. Thus, although the design optimisation process itself might be computer based, understanding and recognising how the process works and its end goal will be of great practical use. That is to say, as the computer-based optimisation proceeds, one might expect the OPD fan to acquire a greater number of maxima and minima. A numerical sketch of this aberration balancing process follows below.
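The sketch below demonstrates the balancing numerically: it takes a raw ρ^6 wavefront, fits and removes the best piston, defocus, and fourth order terms by least squares over the pupil, and recovers the factor of 20 reduction in rms error quoted above. The grid density is an arbitrary choice.

```python
import numpy as np

# Radial pupil samples, weighted so that sums approximate pupil averages
rho = np.linspace(1e-6, 1.0, 2000)
w = rho / np.sum(rho)                   # area weight for an annulus at rho

def pupil_rms(phi):
    """rms over the pupil (about zero) of a rotationally symmetric OPD."""
    return np.sqrt(np.sum(w * phi**2))

phi_raw = rho**6                        # raw fifth order spherical aberration

# Balance with piston, defocus (rho^2) and third order SA (rho^4) terms
basis = np.stack([np.ones_like(rho), rho**2, rho**4], axis=1)
coeffs, *_ = np.linalg.lstsq(basis * np.sqrt(w)[:, None],
                             phi_raw * np.sqrt(w), rcond=None)
phi_balanced = phi_raw - basis @ coeffs

print(f"uncompensated rms: {pupil_rms(phi_raw):.5f}")       # ~1/sqrt(7)
print(f"balanced rms:      {pupil_rms(phi_balanced):.5f}")  # ~1/(20*sqrt(7))
print(f"reduction factor:  {pupil_rms(phi_raw) / pupil_rms(phi_balanced):.1f}")
```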






Figure 5.5 Fifth order Zernike polynomial and aberration balancing.



One can apply the same analysis to all the Gauss-Seidel aberrations and calculate their associated rms wavefront errors.




(5.25a)

(5.25b)

(5.25c)

(5.25d)

θ represents the field angle

Equations (5.25a)–(5.25d) are of great significance in the analysis of image quality, as the rms wavefront error is a key parameter in the description of the optical quality of a system. This will be discussed in more detail in the next chapter.

Worked Example 5.2 A plano-convex lens, with a focal length of 100 mm is used to focus a collimated beam; the refractive index of the lens material is 1.52. It is assumed that the curved surface faces the infinite conjugate. The pupil diameter is 12.5 mm and the aperture is situated at the lens. What is the rms spherical aberration produced by this lens – (i) at the paraxial focus; (ii) at the compensated focus? What is the rms coma for a similar collimated beam with a field angle of one degree?

Firstly, we calculate the spherical aberration of the single lens. With the object at infinity and the image at the first focal point, the conjugate parameter, t, is equal to −1. The shape parameter, s, for the plano-convex lens is equal to 1, since the curved surface is facing the object. From Eq. (4.30a), the spherical aberration of the lens is given by:

\Phi_{SA} = -\frac{r_{max}^4\rho^4}{128f^3}\left[\frac{n^2}{(n-1)^2} + \frac{(n+2)}{n(n-1)^2}s^2 - \frac{4(n+1)}{n(n-1)}st + \frac{(3n+2)}{n}t^2\right]

r_max = 6.25 mm (12.5/2); f = 100 mm; n = 1.52; s = 1; t = −1

By substituting these values into the above equation, the spherical aberration may be directly calculated:

\Phi_{SA} = A\rho^4 \quad\text{where } A = 4.13\times10^{-4}\ \text{mm and } \rho = r/r_{max}

From Eq. (5.23), the uncompensated rms wavefront error is A/√5 and the compensated error is A/√180. Therefore, the rms values are given by:

Φ_rms(paraxial) = 185 nm; Φ_rms(compensated) = 30.8 nm

Secondly, we calculate the coma. From Eq. (4.30b), the coma of the lens is given by:

\Phi_{coma} = -\frac{r_{max}^3\rho^3\,\theta}{4f^2}\left[\frac{(n+1)}{n(n-1)}s + \frac{(2n+1)}{n}t\right]\sin\varphi
Again, substituting the relevant values for f, n, r_max, s, and t, we get:








\Phi_{coma} = A\rho^3\sin\varphi\,\theta \quad\text{where } A = 3.24\times10^{-3}\ \text{mm};\ \rho = r/r_{max};\ r_y = r\sin\varphi

From Eq. (5.25b):

\Phi_{rms} = \frac{A\theta}{\sqrt{72}}
We are told that θ = 1°, or 0.0174 rad. Therefore, Φ_rms = 6.66 × 10⁻⁶ mm, or 6.66 nm.




5.3.4 General Representation of Wavefront Error


We have emphasised the synergy between Zernike polynomials and the classical treatment of aberrations in an axially symmetric optical system, i.e. the Gauss-Seidel aberrations. However, in real optical systems, these axial symmetries are often compromised, either by accident or by design. Some systems are deliberately designed such that not all optical surfaces are aligned to a common axis. These will inevitably introduce non-standard wavefront aberrations into the system. Most significantly, even with a symmetrical design, component manufacturing errors and system alignment may introduce more complex wavefront errors into the system. Naturally, alignment errors create an off-axis optical system 'by accident'. Manufacturing or polishing errors might produce an optical surface whose shape departs from that of an ideal sphere or conic in a somewhat complex fashion. For example, the effect of these errors may be to introduce a trefoil term (n = 3, m = 3) into the wavefront error; this is not a standard Gauss-Seidel term.

As argued, Zernike polynomials are widely used in the analysis of wavefront error both in the design and testing of optical systems. From a strictly analytical and theoretical point of view the description of wavefront error in terms of its rms value is the most meaningful. However, for largely historical reasons, wavefront error is often presented as a ‘peak to valley’ error. That is to say, the value presented is the difference between the maximum and minimum OPD across the pupil. Historically, the wavefront error for a system might have been derived from a visual inspection of a fringe pattern in an interferogram. The maximum deviation of fringes is relatively straightforward to estimate visually from a fringe pattern which might have been produced photographically. However, the rms wavefront error is more directly related to system performance. Calculation of the rms wavefront error across a pupil is a mathematical process that requires computational data acquisition and analysis and has only been universally available in more recent times. Therefore, the use of the peak to valley description still persists.

One particular disadvantage of the peak to valley description is that it is unusually responsive to large, but highly localised, excursions in the wavefront error. More generally, as a rule of thumb, the peak to valley is considered to be 3.5 times the rms value. Of course, this does depend upon the form of the wavefront error. Table 5.3 sets out this relationship for the first 11 Zernike terms (apart from piston). For comparison, a standard statistical measure is also presented – namely, for a normally distributed wavefront error profile, the limits containing 95% of the wavefront error distribution (±1.96 standard deviations).

The values presented in Table 5.3 are simply the ratio of the peak to valley (p-to-v) error to the rms error for that particular distribution. To overcome the principal objection to the p-to-v measure, namely its heightened sensitivity to local variation, a new peak to valley measure has been proposed by the Zygo Corporation. This measure is known as P to Vr, or peak to valley robust. In this measure, the wavefront error is fitted to a set of 36 Zernike polynomials. Although this process is carried out by computational analysis, the procedure is very simple. Essentially, the calculation process exploits the orthonormal properties of the polynomial set and calculates the contribution of each Zernike term using the relation set out in Eq. (5.13). Following this process, the maximum and minimum of the fitted surface are calculated and the revised peak to valley figure derived. Of course, the reduced set of 36 polynomials cannot possibly replicate localised asperities with a high spatial frequency content. Therefore, the fitted surface is effectively a smoothed version of the original, and the peak to valley value derived is more representative of the underlying physics.
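The p-to-v to rms ratios of Table 5.3 can be reproduced numerically. The sketch below evaluates two orthonormal terms from Table 5.2 on a pupil grid; for an orthonormal polynomial the rms is unity, so the ratio is simply the maximum minus the minimum.

```python
import numpy as np

# Cartesian pupil grid, masked to the unit circle
x = np.linspace(-1.0, 1.0, 801)
X, Y = np.meshgrid(x, x)
RHO = np.hypot(X, Y)
mask = RHO <= 1.0

def ptv_over_rms(z):
    """Peak-to-valley divided by rms over the pupil."""
    v = z[mask]
    return (v.max() - v.min()) / np.sqrt(np.mean(v**2))

# Orthonormal defocus and primary spherical aberration
defocus = np.sqrt(3.0) * (2.0 * RHO**2 - 1.0)
spherical = np.sqrt(5.0) * (6.0 * RHO**4 - 6.0 * RHO**2 + 1.0)

print(f"defocus:   p-to-v / rms = {ptv_over_rms(defocus):.2f}")    # ~3.46 (2*sqrt(3))
print(f"spherical: p-to-v / rms = {ptv_over_rms(spherical):.2f}")  # ~3.35 (1.5*sqrt(5))
```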



Table 5.3 Peak to valley: root mean square (rms) ratios for different wavefront error forms.






Table 5.4 Comparison of Zernike numbering systems.






It must be stated, at this point, that the 36 polynomials used in this instance are not those that would be ordered as in Table 5.2. That is to say, they are not the first 36 ANSI standard polynomials. As mentioned earlier, there are, unfortunately, a number of competing conventions for the numbering of Zernike polynomials. The convention used in determining the P to Vr figure is the so-called Zernike Fringe polynomial convention. The logic of ordering the polynomials in a different way is that, in the case of the fringe polynomial set, this better reflects the spatial frequency content of the polynomial and its practical significance in real optical systems.




5.3.5 Other Zernike Numbering Conventions


The ordering convention adopted by the Fringe polynomials expresses, to a significant degree, the spatial frequency content of the polynomial. As a consequence, the polynomials are ordered by the sum of their radial and polar orders, rather than primarily by the radial order. That is to say, the polynomials are ordered by the sum n + m, as opposed to n alone. For polynomials of equal ‘fringe order’ they are then ordered by descending values of the modulus of m, i.e. |m|, with the positive or cosine term presented first.

Another convention that is very widely used is the Noll convention. The Noll convention proceeds in a broadly similar way to the ANSI convention, in that it uses the radial order, n, as the primary parameter for sorting. However, there are a number of key differences. Firstly, the sequence starts with the number one, as opposed to zero, as is the case for the other conventions. Secondly, the ordering convention for the polar order, m, as in the case of the fringe polynomials, is based upon the modulus of m, |m|, rather than its signed value. However, the ordering is in ascending sequence of |m|, unlike the fringe polynomials. The sine and cosine terms are ordered in such a way that all positive m (cosine) terms are allocated an even number. In consequence, sometimes the sine term occurs before the cosine term in the sequence and sometimes after. Table 5.4 shows a comparison of the different numbering systems up to ANSI number 65.




Further Reading


American National Standards Institute (2017). Methods for Reporting Optical Aberrations of Eyes, ANSI Z80.28:2017. Washington DC: ANSI.

Born, M. and Wolf, E. (1999). Principles of Optics, 7e. Cambridge: Cambridge University Press. ISBN: 0-521-64222-1.

Fischer, R.E., Tadic-Galeb, B., and Yoder, P.R. (2008). Optical System Design, 2e. Bellingham: SPIE. ISBN: 978-0-8194-6785-0.

Hecht, E. (2017). Optics, 5e. Harlow: Pearson Education. ISBN: 978-0-1339-7722-6.

Noll, R. (1976). Zernike polynomials and atmospheric turbulence. J. Opt. Soc. Am. 66 (3): 207.

Zernike, F. (1934). Beugungstheorie des Schneidenverfahrens und Seiner Verbesserten Form, der Phasenkontrastmethode. Physica 1 (8): 689.




6

Diffraction, Physical Optics, and Image Quality





6.1 Introduction


Hitherto, we have presented optics purely in terms of the geometrical interpretation provided by the propagation and tracing of rays. Notwithstanding this rather simplistic foundation, this conveniently simple picture is ultimately derived from an understanding of the wave nature of light. More specifically, Fermat's principle, which underpins geometrical optics, is itself ultimately derived from Maxwell's famous wave equations, as introduced in Chapter 1. However, in this chapter, we shall focus on the circumstances where the assumptions underlying geometrical optics break down and this convenient formulation is no longer tractable. Under these circumstances, we must look to another approach, more explicitly tied to the wave nature of light: the study of physical optics. To look at this a little more closely, we must further examine Maxwell's equations. The ubiquitous vector form in which Maxwell's equations are now cast is actually due to Oliver Heaviside, and these are set out below:




\nabla\cdot\mathbf{D} = \rho (6.1a)

\nabla\cdot\mathbf{B} = 0 (6.1b)

\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} (6.1c)

\nabla\times\mathbf{H} = \mathbf{J} + \frac{\partial\mathbf{D}}{\partial t} (6.1d)

D, B, E, H, and J are all vector quantities, where D is the electric displacement, B the magnetic field, E the electric field, H the magnetic field strength, and J the current density; ρ is the charge density.

The quantities D and E, and B and H, are themselves interrelated:

\mathbf{D} = \varepsilon\varepsilon_0\mathbf{E} \quad\text{and}\quad \mathbf{B} = \mu\mu_0\mathbf{H} (6.2)

The quantities ε_0 and μ_0 are the permittivity and magnetic permeability of free space respectively. These quantities are associated specifically with free space (vacuum). The quantities ε and μ are the relative permittivity and relative permeability of a specific medium or substance.

These equations may be greatly simplified if we assume that the local current and charge density is zero; we are then ultimately presented with the classical wave equation:

\nabla^2\mathbf{E} = \varepsilon\varepsilon_0\mu\mu_0\frac{\partial^2\mathbf{E}}{\partial t^2} (6.3)

The next stage in this critique of geometrical optics is to use Maxwell's equations to derive the Eikonal equation, which was briefly introduced in Chapter 1.




6.2 The Eikonal Equation


In Eq. (6.3), we have presented the wave equation in its true vector format. That is to say, the equation describes the electric field, E, as a vector quantity. However, much of what we will present in this chapter is a simplification of the wave equation, known as scalar theory. In this case, it is assumed that the electric field may be represented as a pseudo-scalar quantity. That is to say, the electric field, although varying in magnitude, is confined to one specific orientation and may be treated as if it were a scalar quantity. In fact, this approximation is reasonable where light is closely confined to some axis of propagation, i.e. consistent with the paraxial approximation. Thus, we are to understand that there are some limitations to this treatment.

In presenting the Eikonal equation according to the scalar view, we assume that solutions to the wave equation are of the form:

E(x, y, z, t) = E_0(x, y, z)\,e^{i(kS(x, y, z) - \omega t)} (6.4)

E_0(x, y, z) is a slowly varying envelope function and S(x, y, z) is the spatially varying phase of the wave. In fact, S(x, y, z) has dimensions of length, and when it is equal to the wavelength, the phase term it describes is equal to 2π. The angular frequency is denoted by ω and the spatial frequency by k.

The scalar form of the wave equation may be written as:

\nabla^2 E + n^2k^2E = 0

From the above we can derive the Eikonal equation, but we must assume that E_0(x, y, z) and the first differential of S(x, y, z) vary slowly with respect to position. The classical Eikonal equation is set out in Eq. (6.5):

\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 + \left(\frac{\partial S}{\partial z}\right)^2 = n^2 (6.5)

It is clear, by differentiating Eq. (6.4) twice with respect to x, y, and z, that in deriving Eq. (6.5) we are neglecting terms containing the second differential of S. We are also ignoring changes in the envelope function. Thus it is clear that, in deriving Eq. (6.5), we are making the following assumptions:

|\nabla^2 E_0| \ll k^2E_0 (6.6a)

and

|\nabla^2 S| \ll k (6.6b)

What Eq. (6.6a) suggests is that the envelope function must vary slowly compared to the wavelength. In addition, Eq. (6.6b) suggests that the curvature of the wavefront must be small when compared to the spatial frequency, k. In other words, the assumptions underlying the Eikonal equation are only justified where the radius of any wavefront is much greater than the wavelength. As the Eikonal equation underpins geometrical optics, this sets the limits on the applicability of this methodology, and we must then seek other, more general, means to describe the behaviour of light. These methods are, of course, based on a more rigorous application of Maxwell's equations and are generally categorised under the heading of physical optics.




6.3 Huygens Wavelets and the Diffraction Formulae


Although Maxwell's equations form the rigorous description of electromagnetic wave propagation, we will first proceed from the rather more intuitive description provided by Huygens' principle. Huygens' principle states that, given a known wave disturbance described by a continuous surface of equal phase – the wavefront – the amplitude of the wave at any point in space may be determined as the sum of the amplitudes of forward propagating wavelets from that surface. This is illustrated in Figure 6.1.






Figure 6.1 Conceptual illustration of Huygens' principle.



The amplitude of the wave represents the strength of the local electric or magnetic field. In this case, in our scalar representation, we consider the amplitude as the magnitude of the vector electric field. The flux or power per unit area transmitted by the wave is determined by the Poynting vector, which is the cross product of the electric and magnetic fields. In the context of this scalar treatment, the flux density is proportional to the square of the electric field. In the Huygens' representation, the amplitude of the secondary waves emerging from some point on the original wavefront is inversely proportional to the distance from that point. It follows, therefore, that the flux density associated with that secondary wave follows an inverse square dependence with distance. This is further illustrated in Figure 6.2, which summarises the geometry.








Figure 6.2 describes the contribution to the wave amplitude at point P′ made by a single point, P, on the original wavefront. The original wavefront has an amplitude, A(x, y, z), which may be complex. The angle, χ, is the angle the line from P to P′ makes to the normal to the wavefront. As indicated in Figure 6.2, there is some dependence of the secondary wave amplitude upon this angle, in the form of f(χ). There is no intuitive process that can shed further light on the precise form of this function. Elucidation of this can only be provided by a proper application of Maxwell's equations. The Huygens' representation in Figure 6.2 can be described more formally, as in Eq. (6.7).






Figure 6.2 Huygens secondary wave geometry.






Figure 6.3 Geometry for Rayleigh diffraction equation of the first kind.






A(x', y', z') \propto \int\!\!\int A(x, y, z)\,f(\chi)\,\frac{e^{iks}}{s}\,dS (6.7)

Proper application of Maxwell's equations gives rise to a series of equations that are similar in form to the Huygens' representation shown in Eq. (6.7). These include the so-called Rayleigh diffraction formulae of the first and second kinds. In the first case, it is assumed that the amplitude of the wave disturbance, A(x, y, z), is known across some semi-infinite plane. We now seek to determine the amplitude, A(x′, y′, z′), at some other point in space. The geometry of this is illustrated in Figure 6.3.

Equation (6.8) (#x13_c06_disp_0014) shows the Rayleigh diffraction formula of the first kind.




(6.8) (#x13_x_13_i54)

Equation (6.8) (#x13_c06_disp_0014) is referred to as the Rayleigh diffraction formula of the first kind. In form, Eq. (6.8) (#x13_c06_disp_0014) is very similar to what one might expect from the summation of an expression of the form shown in the Huygens' representation in Eq. (6.7) (#x13_c06_disp_0013). We have formally expressed the summation of the Huygens wavelets as a surface integral over the plane, as shown in Figure 6.3 (#x13_x_13_i51). Note, however, instead of the decay of the wavelet amplitude with distance being expressed as in Eq. (6.7) (#x13_c06_disp_0013), a differential with respect to the axial distance is added. This is crucial, since it gives an insight into the formulation of the inclination term f(χ) which will be explored further a little later.

The other condition covered by the Rayleigh formulae occurs where the axial gradient of the amplitude is known, rather than the amplitude itself. In this instance, we have the Rayleigh diffraction formula of the second kind:

A(x', y', z') = -\frac{1}{2\pi}\int\!\!\int \frac{\partial A(x, y, 0)}{\partial z}\,\frac{e^{iks}}{s}\,dx\,dy (6.9)

If we combine these two solutions and make the qualifying assumption that k ≫ 1/s, then we obtain the so-called Kirchhoff diffraction formula, which is replicated in Eq. (6.10):

A(x', y', z') = -\frac{ik}{4\pi}\int\!\!\int A(x, y, 0)\,\frac{e^{iks}}{s}\,(1 + \cos\chi)\,dx\,dy (6.10)

The Kirchhoff diffraction formula lacks the generality of the Rayleigh formulae, as it only applies where the secondary wave propagation distance is much greater than the wavelength. However, it provides a useful reference point for comparison with the Huygens approach. The factor, 1 + cosχ, is the inclination factor that was alluded to previously. A further approximation may be made where the system is paraxial, i.e. where cosχ ∼ 1. In this case, there is no inclination factor to speak of. Furthermore, if the axial displacement, s, is very much larger than the lateral extent of the illuminated area defined by A(x, y, z), then for all intents and purposes s is constant, and the inverse term may be taken outside the integral. This is the so-called Fraunhofer approximation and may be written as:

A(x', y', z') = -\frac{ik}{2\pi s}\int\!\!\int A(x, y, 0)\,e^{iks}\,dx\,dy (6.11)




6.4 Diffraction in the Fraunhofer Approximation


The assumptions underlying the Fraunhofer approximation are relevant to a wealth of problems in optical engineering. In particular, the approximation relates the behaviour and distribution of electromagnetic radiation in two distinct zones, the so-called near field and far field. Separation of these two zones must be such that the preceding approximations apply, i.e. the axial displacement is much larger than the lateral extent of the radiation field and, of course, much greater than the wavelength. We now wish to calculate the amplitude on a sphere whose vertex is located at z′ = z_0 and whose centre is located at z = 0, where the near field is located. Figure 6.4 shows the general scheme.

Choice of the reference sphere centred on the near field location places the following constraint upon the variables x, y, x′, y′, and z′:

x'^2 + y'^2 + z'^2 = z_0^2

Expanding the propagation distance, s, in terms of the variables x, y and x′, y′, we can re-write Eq. (6.11):

s = \sqrt{z_0^2 + x^2 + y^2 - 2xx' - 2yy'} \approx z_0 + \frac{x^2 + y^2 - 2xx' - 2yy'}{2z_0}








In the Fraunhofer approximation, we are seeking to calculate the amplitude in the limit where z_0 tends to infinity; we wish to know the far field distribution at some angle, θ. Therefore, we can assume that x′ ≫ x and y′ ≫ y. Hence, the diffraction formula may be recast in the following form:

A(x', y') = -\frac{ik\,e^{ikz_0}}{2\pi z_0}\int\!\!\int A(x, y)\,e^{-ik(xx' + yy')/z_0}\,dx\,dy (6.12)






Figure 6.4 Far field diffraction.






Figure 6.5 Far field diffraction of laser beam emerging from fibre.



Equation (6.12) (#x13_c06_disp_0020) has the form of a Fourier transform. So, the far field diffraction pattern of a near field amplitude distribution is simply given by the Fourier transform of that near field distribution. Of course, we must understand all the caveats that apply to this treatment, namely that the far field distribution must imply that the distance of the ‘far field’ location from the near field location must be sufficiently great. Finally, we might like to cast Eq. (6.12) (#x13_c06_disp_0020) more conveniently in terms of the angles involved:




(6.13) (#x13_x_13_i141)

NAxand NAyare the numerical apertures (sine of the angles) in the x and y directions respectively.

A typical example of the application of Fraunhofer diffraction might be the emergence of a laser beam from a very small, single mode optical fibre a few microns across. As the beam emerges from the fibre, it will have some near field distribution. In fact, the spatial variation of this amplitude may be approximated by a Gaussian distribution. In the light of the previous analysis, the angular distribution of the emitted radiation far from the fibre will be the Fourier transform of the near field distribution. This is shown in Figure 6.5.
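This Fourier relationship is easily demonstrated numerically. The sketch below propagates a Gaussian near field to the far field with a discrete Fourier transform and confirms the expected Gaussian divergence; the mode radius and wavelength are arbitrary illustrative values.

```python
import numpy as np

# Near field: Gaussian mode (illustrative values)
wavelength = 1.55e-6            # m
w0 = 5.0e-6                     # m (field ~ exp(-x^2/w0^2))
n_pts, span = 4096, 400e-6      # grid resolution and physical width (m)

x = np.linspace(-span / 2, span / 2, n_pts)
near_field = np.exp(-x**2 / w0**2)

# Far field amplitude = Fourier transform of the near field (Eq. 6.13);
# spatial frequency fx maps to angle via sin(theta) = lambda * fx
far_field = np.fft.fftshift(np.fft.fft(np.fft.fftshift(near_field)))
fx = np.fft.fftshift(np.fft.fftfreq(n_pts, d=x[1] - x[0]))
sin_theta = wavelength * fx

# Half-angle at which the far field intensity falls to 1/e^2
intensity = np.abs(far_field)**2
idx = np.argmin(np.abs(intensity / intensity.max() - np.exp(-2.0)))
print(f"numerical divergence half-angle: {abs(sin_theta[idx]):.4f} rad")
print(f"analytic lambda/(pi*w0):         {wavelength / (np.pi * w0):.4f} rad")
```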

We will be returning to the subject of diffraction and laser beam propagation later in this chapter. A more traditional concern is the impact of diffraction upon image formation in real systems. As far as the design of optical systems is concerned, hitherto we have only been concerned with the impact of aberrations in limiting optical performance. In the next section, we will examine the application of Fraunhofer diffraction to the study of image formation in an optical system and the way in which the presence of diffraction limits optical resolution.




6.5 Diffraction in an Optical System – the Airy Disc



In the Fraunhofer approximation, we considered the effect of diffraction by considering a near field amplitude distribution and a far field, nominally located at infinity. However, it is not necessary for the far field to be physically located at infinity. For example, the (second) focal point of an optical system is conjugated to an object plane located at infinity. In this instance, the relation of the two planes is perfectly described by the Fraunhofer approximation. This is illustrated schematically in Figure 6.6, which shows the realisation of the far field of a laser source. The focus of the lens in Figure 6.6 is conjugate to infinity in object space and, assuming the lens aberration is not significant, the Fraunhofer diffraction pattern would be imaged at this location.

In the case of the near field distribution associated with the laser, the far field distribution will be given by the Fourier transform of the near field distribution, but mediated by the focal length of the lens. In other words, the spatial distribution, A(x′, y′), of the far field at the lens focus is given by:

A(x′, y′) ∝ ∫∫A(x, y)exp(ik(xx′ + yy′)/f) dx dy (6.14)

In practice, the quantity, x′/f, can be regarded as equivalent to the numerical aperture (NA) associated with the far field distribution.






Figure 6.6 Imaging of a Fraunhofer diffraction pattern by a simple lens.






Figure 6.7 Diffraction of evenly illuminated pupil.



In terms of a real optical system, the greatest practical interest is invested in the diffraction produced by the pupil. For an object located at infinity and the physical stop located in object space, the far field diffraction pattern of the pupil will be formed at the focal point of the system. Of course, the pupil, or its image, the exit pupil, is of great significance in the analysis of an optical system as the system optical path difference (OPD) is, by convention, referenced to a sphere whose vertex lies at the exit pupil. As such, the diffraction pattern produced by a uniformly illuminated circular disc is of prime importance in the analysis of optical systems.

We will now assume that an optical system is seeking to image a point object and that the exit pupil delivers an even cone of light with a numerical aperture, NA. It will produce a diffraction pattern at the focus of the system, whose extent and form we wish to elucidate. A schematic of this scenario is shown in Figure 6.7.

We are now simply required to determine the Fourier transform of a circular disc. In fact, the Fourier transform of a circular disc is described in terms of J1(x), a Bessel function of the first kind. Proceeding along the lines set out in Eq. (6.14), we find that the far field distribution at the system focus is given by:

A(r′) ∝ 2J1(kr′NA)/(kr′NA) (6.15)

It is natural, of course, that the far field distribution retains the circular symmetry of the near field. We have to remember that, in this analysis, we have calculated the amplitude (electric field) of the far field distribution. The flux density, I(r′), is proportional to the square of the electric field and this is given by:

I(r′) = [2J1(kr′NA)/(kr′NA)]² (6.16)

The pattern produced at the far field location, as defined by Eq. (6.16), is known as the Airy disc. For r′ → 0, Eq. (6.16) tends to one. Thus, all values computed by Eq. (6.16) represent the local flux taken in ratio to the central maximum. The form of the Airy disc consists of a bright central region surrounded by a number of weaker rings. This is shown in Figure 6.8.






Figure 6.8 Airy disc.



The importance of the Airy disc lies in the fact that it represents the ideal replication of a point source in a totally unaberrated system. Hitherto, in the idealised geometrical optics representation, a point source would be replicated as a point image. The presence of diffraction, therefore, critically compromises resolution. That is to say, even in a perfect optical system, the lateral resolution of the system is limited by the extent of the Airy disc. At this point it is useful to examine the form of the Airy disc in more detail. Figure 6.9 shows a graphical trace of the Airy disc, expressed in terms of the ratio r′/r0, where r0 = 1/(kNA) = λ/2πNA.






Figure 6.9 Graphical trace of Airy disc.






Figure 6.10 The Rayleigh criterion and ideal diffraction limited resolution.



As illustrated in Figure 6.9, the full width half maximum (FWHM) is equal to 3.233r0. Equally significant is the presence of local minima at 3.832r0 and 7.016r0. It is more informative to express these values in terms of the wavelength and numerical aperture. This gives the FWHM as 0.514λ/NA and the locations of the minima as 0.610λ/NA and 1.117λ/NA. At first sight, the FWHM may seem a useful indication of the ideal optical system resolution. In practice, it is the location of the first minimum that forms the basis for the conventional definition of ideal resolution. The rationale for this is shown in Figure 6.10.

Considering two adjacent point sources, these are said to be resolved when the maximum of one Airy disc lies at the minimum of the other. Therefore the separation of the two images must be equal to 0.610λ/NA. This is the so-called Rayleigh criterion for diffraction limited imaging. Under the Rayleigh criterion, two separated and resolved peaks are seen with a local minimum between them at 73.5% of the maximum. This is illustrated in Figure 6.11.
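The numerical values quoted above are easily checked. The following Python sketch (the grid resolution is an arbitrary choice) evaluates the Airy pattern of Eq. (6.16) using scipy's Bessel function and locates its half-maximum point and first minimum:

import numpy as np
from scipy.special import j1

# Airy flux pattern (Eq. (6.16)) as a function of u = k * r' * NA,
# normalised to unity at u = 0
def airy(u):
    u = np.asarray(u, dtype=float)
    out = np.ones_like(u)
    nz = u != 0
    out[nz] = (2 * j1(u[nz]) / u[nz]) ** 2
    return out

u = np.linspace(1e-6, 10.0, 200001)
I = airy(u)

# First local minimum: expected at u = 3.832, i.e. r' = 0.610 lambda/NA
interior_min = (I[1:-1] < I[:-2]) & (I[1:-1] < I[2:])
first_min = u[1 + np.argmax(interior_min)]

# Half-maximum point: I = 0.5 at u ~ 1.616, so the FWHM is 3.233 in u,
# i.e. 0.514 lambda/NA  (u < 3 is a prefix of u, so indices coincide)
u_half = u[np.argmin(np.abs(I[u < 3.0] - 0.5))]

print(first_min, first_min / (2 * np.pi))     # ~3.832 and ~0.610
print(2 * u_half, 2 * u_half / (2 * np.pi))   # ~3.233 and ~0.514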

At this point, we will re-iterate the formula describing diffraction limited resolution under the Rayleigh criterion, as it is fundamental to the enumeration of resolution in a perfect optical system. This is set out in Eq. (6.17):

Δr = 0.61λ/NA (6.17)






Figure 6.11 Profile of two point sources just resolved under Rayleigh criterion.






Worked Example 6.1 Microscope Objective


A microscope objective has a numerical aperture of 0.8. What is the diffraction limited resolution at the D wavelength of 589.3 nm?

Calculation is very straightforward; we simply need to substitute the relevant values into Eq. (6.17):

Δr = 0.61λ/NA = 0.61 × 589.3 nm/0.8 = 449 nm

The resolution is 0.45 μm.

This figure only applies to ‘perfect’ or diffraction limited imaging in an aberration-free system. The presence of aberrations will affect the resolution, as will be considered in the next section.




6.6 The Impact of Aberration on System Resolution





6.6.1 The Strehl Ratio


In the preceding analysis we examined the diffraction pattern produced by a circular disc – namely the pupil. This produced the Airy diffraction pattern. In this analysis, we ignored the impact of phase, i.e. the possibility that the amplitude across the pupil might have a complex component. In fact, for a point source, the phase across the pupil is, by definition, directly related to the OPD. That is to say, if we assume that the modulus of the near field amplitude, A(x, y), is unity, the complex amplitude is given by:

A(x, y) = exp(ikΦ(x, y))

where Φ(x, y) is the wavefront error across the pupil.

The final diffraction pattern is given by the Fourier transform of the above which, from Eq. (6.13), is given by:

A(NAx, NAy) ∝ ∫∫exp(ikΦ(x, y))exp(ik(xNAx + yNAy)) dx dy (6.18)

We now wish to compute the amplitude at the central location of the far field pattern, i.e. where NAx and NAy = 0. In this case the Fourier transform can be further simplified:

A(0, 0) ∝ ∫∫exp(ikΦ(x, y)) dx dy ∝ ⟨exp(ikΦ)⟩ (6.19)

For an optical system that is close to perfection, or almost diffraction limited, we can make the further assumption that kΦ ≪ 1 at all locations across the pupil. We find that the ratio of the amplitude with the presence of aberration to that without is approximately given by:

A/A0 ≈ 1 − (k²/2)⟨Φ²⟩ + ik⟨Φ⟩ (6.20)

The expressions in the pointed brackets in Eq. (6.20) represent the mean square wavefront error and the mean wavefront error respectively.

However, the expression in Eq. (6.20) is merely the amplitude of the disturbance and not the flux. To calculate the flux density at the centre of the diffraction pattern, we need to multiply Eq. (6.20) by its complex conjugate. This gives:

I/I0 ≈ 1 − k²(⟨Φ²⟩ − ⟨Φ⟩²) (6.21)

The expression contained within the brackets is merely the variance of the wavefront error taken across the pupil. If we define the root mean square (rms) wavefront error, Φrms, as the rms value computed under the assumption that the average wavefront error has been normalised to zero, we get the following fundamental relationship:

I/I0 ≈ 1 − (kΦrms)² = 1 − (2πΦrms/λ)² (6.22)

Equation (6.22) is of great significance. The ratio expressed in Eq. (6.22), the ratio of the aberrated to the unaberrated flux density, is referred to as the Strehl ratio. The Strehl ratio is a measure of the degradation produced by the introduction of system imperfections. Of course, Eq. (6.22) only applies where kΦrms ≪ 1. The fact that the peak flux of a diffraction pattern is reduced by the introduction of aberration necessarily implies that the distribution is in some way broadened, i.e. the resolution is reduced. For example, if the Strehl ratio is 0.8, then the area associated with the diffraction pattern is likely to have increased by about 20% and the linear dimension by about 10%.




6.6.2 The Maréchal Criterion


The Strehl ratio is of great practical significance. It affords a useful, practical, but somewhat arbitrary definition of ‘diffraction limited’ imaging. Where the Strehl ratio is 0.8 or greater, the image is said to be diffraction limited. This is the so-called Maréchal criterion. It can be expressed as follows:

1 − (2πΦrms/λ)² ≥ 0.8, i.e. Φrms ≤ (√0.2/2π)λ ≈ λ/14 (6.23)

This condition is widely accepted as the basis for the definition of diffraction limited imaging. As a measure of system wavefront error, the peak to valley wavefront error, ΦPV < λ/4, is often preferred in practice. This is very much a traditional description of system wavefront error, preserved for historical reasons, primarily on account of the ease of reckoning peak to valley fringe displacements on visual images of interferograms. This consideration has, of course, been displaced by the ubiquitous presence of computer-based interferometry, which has rendered the calculation of rms wavefront errors a trivial process. Nonetheless, the peak to valley description remains prevalent. Although dependent upon the precise distribution of wavefront error, as a rule of thumb, the peak to valley wavefront error is about 3.5 times the standard deviation or rms value. Therefore, we can set out another condition for diffraction limited imaging:

ΦPV < λ/4 (6.24)

Worked Example 6.2 A simple ×10 microscope objective is to be made from a BK7 plano-convex lens and is to operate at the single wavelength of 589.3 nm. The refractive index of BK7 at 589.3 nm is 1.518 and the assumed microscope tube length is 160 mm. We also assume that only the microscope objective contributes to system aberration. What is the maximum objective numerical aperture consistent with diffraction limited performance, assuming on-axis spherical aberration as the dominant aberration?

Firstly, for a wavelength of 589.3 nm, from Eq. (6.22):

Φrms ≤ (√0.2/2π)λ = (√0.2/2π) × 589.3 nm = 41.94 nm

For a ×10 objective, the focal length should be 160/10 = 16 mm for a 160 mm tube length. The tube length is much longer than the objective focal length, so, for all intents and purposes, the image is at the infinite conjugate, as illustrated.








Note that the curved surface faces the infinite conjugate in order to minimise spherical aberration. Thus, in this instance, the conjugate parameter is 1 and the lens shape factor is −1. From Eq. (4.30a):








Substituting the values of s and t:








Substituting the values of n (1.518) and f (16 mm) into the equation, we get:








Further assuming that we are using defocus to ‘balance’ and minimise aberrations, we can relate the rms wavefront error to the maximum OPD:













For the ‘diffraction limited’ condition to be fulfilled the rms wavefront error must be less than 41.94 nm.








The minimum numerical aperture is thus 0.107. Substituting this value into Eq. (6.17), it is clear that the resolution of the objective is 3.37 μm. The design of complex objectives does, of course, generally proceed by virtue of ray tracing. However, this illustration does provide some insight into the limitations of simple optics and the value added by more complex designs.




6.6.3 The Huygens Point Spread Function


The Huygens point spread function is the diffraction pattern produced in the image plane by a point source located at the object plane. Determination of the point spread function proceeds as per Eq. (6.18) and so is fundamentally influenced by the system wavefront error across the pupil. Of course, as previously argued, any wavefront error reduces the flux at the axial location in proportion to the Strehl ratio. In concert with this, the width of the diffraction pattern increases with increasing wavefront error.

As will be discussed in more detail later, where the wavefront error is small, purely geometrical ray tracing produces a very different spatial distribution when compared to the point spread function. Naturally, in the limit where the wavefront error becomes large, the Huygens point spread function tends to the geometrical spot distribution. This is quite an important consideration, as it impacts how optical systems are optimised with regard to optical performance. If the intention is to design a diffraction limited system, the most efficient optimisation proceeds by minimising the wavefront error. However, where the intended wavefront error is large when compared to the diffraction limit, it is best to optimise the system by minimising the geometrical spot size.




6.7 Laser Beam Propagation





6.7.1 Far Field Diffraction of a Gaussian Laser Beam



The far field divergence of a laser beam may be accounted for by the Fraunhofer approximation according to Eq. (6.13). In the treatment of a laser beam, we consider the beam to have a ‘near field’ location, or beam waist, where the phase is uniform across a plane perpendicular to the propagation direction. That is to say, the wavefronts are planar at this beam waist location. In addition, it must be assumed that the wavefront is ‘coherent’, i.e. that an unambiguous phase relation exists across the wavefront at all times. We now wish to know the disturbance produced by this near field distribution in the far field. To make the problem tractable, we assume that the near field distribution of the laser beam waist may be described by a Gaussian function. This is a useful approximation; however, it must be emphasised that it is only an approximation. In practice, the profile of a laser beam is not quite Gaussian, with more flux in the wings, far from the centre, than a Gaussian distribution would suggest.

In the Gaussian approximation, we may, for example, describe the laser beam emerging from the end of a single mode fibre, from the output of a semiconductor laser facet, or from a helium neon laser. The size of the beam waist is described by the parameter, w0, namely the radial distance at which the amplitude, A(r), falls to 1/e, or 37%, of the peak value. When expressed in terms of flux, the beam waist, w0, defines the radius at which the flux, I(r), falls to (1/e)², or 13%, of the peak value. For example, in the case of a single mode telecoms fibre, a laser beam emerging from the end of the fibre might have a beam waist, w0, of 5.5 μm. Equation (6.25) expresses the form of the laser beam profile:

A(r) = A0exp(−r²/w0²) (6.25)

Having characterised the beam waist in this way, it is useful to relate it to both the FWHM, dFWHM, and the rms radius, rrms.




(6.26)

In the Fraunhofer approximation, the far field may be derived from the Fourier transform of the near field. At this point, the significance of the Gaussian representation becomes clear, as the Fourier transform of a Gaussian is another Gaussian. The far field representation is thus given by another Gaussian, with the divergence expressed by a characteristic numerical aperture, NA0:

A(θ) ∝ exp(−sin²θ/NA0²) (6.27)

From the Fourier transform, it is possible to derive a clear relationship between the near field beam waist, w0, and the far field divergence, NA0. This is given by Eq. (6.28).

NA0 = λ/πw0 (6.28)

As one might expect, the far field divergence is inversely proportional to the size of the beam waist, a smaller beam waist effecting a larger divergence.




Worked Example 6.3 – Beam Divergence of a Fibre Laser


A laser beam with a wavelength of 1.55 μm emerges from a single mode fibre. The laser beam, at this point, can be characterised as a beam with a waist size of 5.25 μm. What is the characteristic numerical aperture, NA0, associated with the far field divergence?

Substituting the relevant values into Eq. (6.28), we get:

NA0 = λ/πw0 = 1.55/(π × 5.25) = 0.094
The numerical aperture is thus 0.094 and this corresponds to a divergence angle of 5.39°.

Expressed as a FWHM, the near field beam width is 6.18 μm, and the rms radius is 4.37 μm. Similarly in the far field the FWHM divergence angle is 6.35° and the rms divergence angle is 3.81°.
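The headline figures follow directly from Eq. (6.28); a minimal Python check (units in μm):

import numpy as np

# Far field divergence of the Gaussian beam (Eq. (6.28)): NA0 = lambda/(pi w0)
wavelength, w0 = 1.55, 5.25          # um
na0 = wavelength / (np.pi * w0)
print(na0)                           # ~0.094
print(np.degrees(np.arcsin(na0)))    # ~5.39 degrees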




6.7.2 Gaussian Beam Propagation



Thus far, we have analysed the relationship between the near field and far field dispositions of a Gaussian beam. We now turn to the more general case of the propagation of a Gaussian beam. As in the previous analysis, the laser beam is defined by a characteristic Gaussian radius. In this case, the characteristic radius, w(z), is a function of the axial propagation distance, z. To describe the laser beam, we introduce an envelope function, A(x, y, z), that describes the radial profile of a propagating laser beam:

E(x, y, z) = A(x, y, z)exp(−ikz) (6.29)

In practice, in most cases, the size, w(z), of the laser beam is much larger than the wavelength, and we can use the slowly varying envelope approximation. This is, in effect, a paraxial approximation, where far field divergence angles are necessarily small and can be formally expressed as:

|∂²A/∂z²| ≪ k|∂A/∂z| (6.30)




Applying Eq. (6.29) to the scalar version of Maxwell's equations and taking into account the above approximation, we obtain the so-called paraxial Helmholtz equation:

∂²A/∂x² + ∂²A/∂y² − 2ik(∂A/∂z) = 0 (6.31)

As suggested, in the case of a Gaussian beam, the amplitude may be expressed in terms of a characteristic width, w(z), which varies slowly with respect to z. Most significantly, w(z) can be complex, giving rise to wavefront curvature. This may be easily seen if the complex part of the envelope is subsumed within the sinusoidal propagation term, leading to a quadratic phase variation across the beam. In the Fresnel and paraxial approximations, these quadratic wavefronts may be viewed as approximately spherical. This is illustrated in Figure 6.12.

Mathematically, the Gaussian beam envelope is expressed as follows:

A(x, y, z) = A0(z)exp(−r²/w²(z)), where r² = x² + y² (6.32)

The component A0(z) represents the peak on axis amplitude and this would be expected to diminish as the beam expands. To make the analysis more tractable, we subsume this axial variation into the exponential function as a complex function of z, β(z). Furthermore, both real and imaginary parts of the Gaussian are combined into a single complex function, α(z). Hence, to solve the paraxial Helmholtz equation in this instance we re-cast Eq. (6.32) in the following form:

A(x, y, z) = exp(−r²/α(z))exp(−iβ(z))



Figure 6.12 Gaussian beam.



And substituting this into the paraxial Helmholtz equation:

−4/α + 4r²/α² − 2ik(dα/dz)(r²/α²) − 2k(dβ/dz) = 0
This equation must hold for all values of r. Both the quadratic terms and those with no dependence on r must sum to zero. Taking the quadratic element only, we may derive a very simple relationship for α(z):

dα/dz = −2i/k, and hence α(z) = C − (2i/k)z (6.33)

By viewing Eq. (6.32) it is straightforward to relate α(z) to the beam size, w(z), and the radius, R(z):

1/α(z) = 1/w²(z) + ik/2R(z) (6.34)

To interpret the constant C, we assume that there exists a minimum beam size, the beam waist, where the wavefront curvature is zero. We denote this beam waist by the symbol, w0. It is clear, then, that the constant C is equal to the square of the beam waist. Eliminating the constant, C, gives:

w²(z) = w0²[1 + (2z/kw0²)²] and R(z) = z[1 + (kw0²/2z)²] (6.35)

We note that there is some characteristic distance, ZR, over which the beam expands around the beam waist. This distance is known as the Rayleigh distance and Eq. (6.35) may be finally re-cast to give the following:

w(z) = w0√(1 + (z/ZR)²); R(z) = z(1 + (ZR/z)²); ZR = πw0²/λ (6.36)

The Rayleigh distance is of particular significance. In effect, it sets the demarcation between the near field and the far field. In the case of the far field, z ≫ ZR, the expressions in Eq. (6.36) revert to the Fraunhofer diffraction pattern of a Gaussian beam:

w(z) ≈ (λ/πw0)z = NA0z (6.37)

This may be compared with Eq. (6.28) and is very similar in form. Thus Eq. (6.37) represents the far field diffraction pattern of a Gaussian beam. In the near field, where z ≪ ZR, the beam is parallel and its size constant at w0 and, of course, the radius tends to infinity. At the Rayleigh distance, the beam size is increased by a factor corresponding to the square root of two and the wavefront radius is equal to twice the Rayleigh distance. This is illustrated more formally in Figure 6.13.

For values of z that are of a similar magnitude to the Rayleigh distance, then the beam is in an intermediate zone between the near and far fields.
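Equation (6.36) translates directly into code. The following Python functions are a minimal sketch (units are arbitrary but must be consistent); with the fibre laser values they reproduce the figures of the worked example that follows:

import numpy as np

def rayleigh_distance(w0, wavelength):
    # Rayleigh distance Z_R = pi w0^2 / lambda (Eq. (6.36))
    return np.pi * w0**2 / wavelength

def beam_size(z, w0, wavelength):
    # Gaussian beam radius w(z) = w0 sqrt(1 + (z/Z_R)^2)
    zr = rayleigh_distance(w0, wavelength)
    return w0 * np.sqrt(1 + (z / zr) ** 2)

def wavefront_radius(z, w0, wavelength):
    # Wavefront radius R(z) = z (1 + (Z_R/z)^2); tends to infinity at the waist
    zr = rayleigh_distance(w0, wavelength)
    return z * (1 + (zr / z) ** 2)

# Fibre laser example: 1.55 um wavelength, 5.25 um waist, z = 50 um
print(rayleigh_distance(5.25, 1.55))        # ~55.9 um
print(beam_size(50.0, 5.25, 1.55))          # ~7.05 um
print(wavefront_radius(50.0, 5.25, 1.55))   # ~112.4 um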




Worked Example 6.4 – Rayleigh Distance of Fibre Laser


In the previous worked example, we introduced a fibre laser with a wavelength of 1.55 μm, with a Gaussian beam size of 5.25 μm. If we assume that the beam waist is located at the exit from the fibre, what is the Rayleigh distance of the laser beam? In addition, what is the beam size, w(z), 50 μm from the exit from the fibre and what is the wavefront radius at that location?

From Eq. (6.36):

ZR = πw0²/λ = π × (5.25)²/1.55 μm = 55.9 μm

The Rayleigh distance is 55.9 μm.






Figure 6.13 Form of expanding Gaussian beam and beam waist.



Calculation of the beam size and the wavefront radius also proceeds from Eq. (6.36):

w(z) = w0√(1 + (z/ZR)²) = 5.25 × √(1 + (50/55.9)²) μm = 7.05 μm

The beam size, w(z), is 7.05 μm.

R(z) = z(1 + (ZR/z)²) = 50 × (1 + (55.9/50)²) μm = 112.4 μm

The wavefront radius is 112.4 μm.




6.7.3 Manipulation of a Gaussian Beam



The preceding sections have analysed the propagation of a Gaussian beam through free space. However, in most practical instances, the beam will be manipulated by optical components: lenses, mirrors, and so on. Therefore it would be useful to be able to understand the impact of an individual optical component or system on the propagation of a Gaussian beam. In the analysis presented here, the component or system is simply represented in its paraxial form by a ray tracing matrix, as presented in Chapter 1. The matrix is populated by four elements, A, B, C, and D, and these transform the wavefront radius according to the following equation:

R2 = (AR1 + B)/(CR1 + D) (6.38)

(R1 and R2 are the input and output radii respectively.)

For a Gaussian beam, the wavefront curvature can be represented by the complex quantity, q, where q = z + iZR. The distance from the beam waist is represented by z and ZR is the Rayleigh distance. Expression (6.38) may be re-cast in the following form:

q2 = (Aq1 + B)/(Cq1 + D) (6.39)

where q = z + iZR.
This is the so-called ABCD law for the manipulation of Gaussian beams.




Worked Example 6.5 – Gaussian Beam Manipulation


A helium neon laser beam at 633 nm has a beam waist of 0.6 mm, located 80 mm from a positive lens of focal length 60 mm. Calculate the size of the beam waist following the lens and its location with respect to the lens.

If we take the origin of the co-ordinate system to be at the original beam waist (z = 0), then the system matrix consists of a translation by 80 mm followed by a 60 mm focal length lens:

(A B; C D) = (1 0; −1/60 1)(1 80; 0 1) = (1 80; −0.0167 −0.333)

A = 1; B = 80; C = −0.0167; D = −0.333

The Rayleigh distance of the original beam is given by:

ZR = πw0²/λ = π × (0.6)²/0.000633 mm = 1787 mm
Using the ABCD law:

q2 = (Aq1 + B)/(Cq1 + D) = (q1 + 80)/(−0.0167q1 − 0.333), with q1 = 1787i mm
The lens lies at −59.9 mm from the beam waist and hence the beam waist is 59.9 mm in advance of the lens. The Rayleigh distance is 2.201 mm. This corresponds to a beam waist size (from Eq. (6.36) (#x13_c06_disp_0056)) of 0.02 mm or 20 μm.

The new beam waist lies approximately at the focus of the lens. Since the lens lies much closer to the original beam waist than the corresponding Rayleigh distance, then the beam is almost parallel and the new beam waist should lie very close to the focal position.
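The ABCD manipulation lends itself to a compact numerical treatment. A minimal Python sketch (units in mm; the printed values are approximate) reproduces the waist location of about 60 mm beyond the lens and the waist size of about 20 μm using the complex q parameter:

import numpy as np

def abcd_q(q_in, m):
    # Apply the ABCD law (Eq. (6.39)): q' = (A q + B) / (C q + D)
    (a, b), (c, d) = m
    return (a * q_in + b) / (c * q_in + d)

wavelength = 633e-6   # mm
w0 = 0.6              # mm
zr = np.pi * w0**2 / wavelength             # Rayleigh distance, ~1787 mm

# System matrix: 80 mm translation followed by a 60 mm focal length lens
translate = np.array([[1.0, 80.0], [0.0, 1.0]])
lens = np.array([[1.0, 0.0], [-1.0 / 60.0, 1.0]])
system = lens @ translate                   # A=1, B=80, C=-0.0167, D=-0.333

q_out = abcd_q(0 + 1j * zr, system)         # q = z + i Z_R, waist at z = 0
print(q_out.real)                           # ~-60: waist ~60 mm after the lens
w_new = np.sqrt(q_out.imag * wavelength / np.pi)
print(q_out.imag, w_new)                    # Z_R' ~2 mm, waist ~0.02 mm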




6.7.4 Diffraction and Beam Quality


All the analysis presented thus far has assumed a Gaussian beam that possesses perfect spatial coherence. Perfect spatial coherence implies that an unambiguous phase relationship exists between all points across the wavefront. A less than perfect wave disturbance is composed of a number of different components whose phase relationship is entirely random. As such, spatial coherence is defined more formally as the correlation between the wave disturbance at two points. The complex amplitude of a wave at a point, A(t), may be expressed in Fourier space in terms of a frequency distribution, S(f). The coherence between two points, x, y is simply given by the cross correlation of the two disturbances:

γxy = ⟨Ax(t)Ay*(t)⟩/√(⟨|Ax(t)|²⟩⟨|Ay(t)|²⟩) (6.40)

Perfect coherence is represented by a correlation of one; complete incoherence is represented by a correlation of zero. In many practical problems in Gaussian beam propagation, it may be assumed that the coherence of the laser beam is one. However, this is dependent upon the number of independent ‘modes’ that characterise the laser beam. A single mode is effectively one unique solution to the wave equation and laser devices are often engineered in such a way that only one of these modes is allowed to propagate. The extent to which this is true is a measure of the laser's beam quality. The beam quality of a laser is generally expressed by the parameter M², or ‘M squared’, and is indicative of the number of modes supported by the beam. If this value is one or close to one, then the beam quality is high and any propagation analysis will proceed as previously described. For a beam with a beam quality defined by the M² parameter, the spatial coherence is given by:

γ = 1/M² (6.41)

Returning to the practical question of Gaussian beam propagation, the beam propagation may be expressed entirely in the original form given by Eq. (6.36), except with a revised Rayleigh distance, Z′R.

Z′R = πw0²/M²λ (6.42)

It is clear from Eq. (6.42) that, where M² is significantly greater than one, the divergence of the laser beam in the far field is greater than would be expected from a perfect beam. The revised equivalent of Eq. (6.28), giving the beam divergence, is set out in Eq. (6.43):

NA0 = M²λ/πw0 (6.43)

In practice, the M² value for a laser beam is measured and then analysed using the relationships set out in Eqs. (6.42) and (6.43). The parameter is generally specified for many commercial laser systems.




6.7.5 Hermite Gaussian Beams


Further exploring the theme of multiple modes as individual solutions to the wave equation, we must recognise that the simple Gaussian beam previously defined is not the only solution to the paraxial Helmholtz equation. In fact, a complete set of orthonormal solutions exists, of which the simple Gaussian solution is the first member. Orthonormal, in this context, means that the cross-integral of two different solutions is always zero and that involving the same solution is always unity. This is an important property, as will be seen later. This set of solutions is defined by the Hermite-Gaussian polynomials, where the original Gaussian amplitude envelope is multiplied by a unique Hermite polynomial in x and y. Each Hermite polynomial solution is defined by its maximum order in x and y, which we will refer to as l and m respectively. Overall, the solution may be represented as:

Gl,m(x, y, z) = A[w(z)]Hl(√2x/w(z))Hm(√2y/w(z))exp(−(x² + y²)/w²(z)) (6.44)

The expression w(z) is simply the Gaussian beam radius for any specific value of z, as given in Eq. (6.36), and A[w(z)] is a normalising factor. The first few polynomials are set out in Table 6.1.

Figure 6.14 shows graphically the form of some low order Hermite polynomials.

The orthogonal nature of the polynomials provides a suggestion as to their utility. As with a Fourier series, any arbitrary beam profile may be represented as a summation of a series of Hermite polynomials. If we represent the full expression contained in Eq. (6.44) as Gl,m(x, y, z), then the series may be represented as:

A(x, y, z) = Σl,m Cl,mGl,m(x, y, z) (6.45)

Cl,m are coefficients describing the amplitude of each term.



Table 6.1 Low order Hermite polynomials.

n Hn(x)
0 1
1 2x
2 4x² − 2
3 8x³ − 12x
4 16x⁴ − 48x² + 12


Figure 6.14 A selection of low order Hermite polynomials.



Assuming the beam profile is known at some plane, z0, then each coefficient may be calculated by exploiting the orthonormal property of the series:

Cl,m = ∫∫A(x, y, z0)G*l,m(x, y, z0) dx dy (6.46)

Thus, Gaussian-Hermite polynomials represent a powerful tool for physical optics propagation. Assuming a beam profile is known at some point, the relevant coefficients may be calculated according to Eq. (6.46), summed according to Eq. (6.45), and then propagated in free space according to Eq. (6.44).
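As an illustration of this decomposition-and-resynthesis procedure, the Python sketch below (one-dimensional for brevity; the displaced-Gaussian test profile and the six-mode truncation are arbitrary choices) builds normalised Hermite-Gaussian modes, verifies their orthonormality, and reconstructs a profile from its coefficients:

import numpy as np
from scipy.special import eval_hermite, factorial

def hg_mode(n, x, w):
    # Normalised 1-D Hermite-Gaussian mode of order n and width w
    norm = (2 / (np.pi * w**2)) ** 0.25 / np.sqrt(2.0**n * factorial(n))
    return norm * eval_hermite(n, np.sqrt(2) * x / w) * np.exp(-(x / w) ** 2)

w = 1.0
x = np.linspace(-8, 8, 8001)
dx = x[1] - x[0]

# Orthonormality check: cross-integrals ~0, self-integrals ~1
modes = np.array([hg_mode(n, x, w) for n in range(6)])
gram = modes @ modes.T * dx
print(np.round(gram, 6))                    # ~identity matrix

# Decompose an arbitrary (here: displaced Gaussian) profile as in Eq. (6.46),
# then resynthesise it from the series as in Eq. (6.45)
profile = np.exp(-((x - 0.7) / w) ** 2)
coeffs = modes @ profile * dx               # c_n = integral of profile * mode
resynth = coeffs @ modes
print(np.max(np.abs(resynth - profile)))    # small residual with six modes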




6.7.6 Bessel Beams


An interesting solution to the paraxial Helmholtz equation, Eq. (6.31), is the so-called Bessel beam. Generally, Bessel functions of the first kind form a series of solutions to the equation. Most specifically, for an axially symmetric form, the solution is represented by a Bessel function of the first kind and zeroth order. The unique feature of this solution is that the wavefronts are planar and the amplitude envelope of the beam does not change as it propagates. That is to say, the beam appears to be diffractionless and does not diverge. The solution is of the form given by:

A(r, z) = A0J0(krr)exp(−iβz) (6.47)

J0 is a Bessel function of the first kind and zeroth order; kr is the effective transverse wavevector; β is the propagation wavevector, given by β² = k² − kr².
Another interesting type of beam is the Talbot beam. Rather than retaining a constant profile as it propagates, the Talbot beam replicates itself at specific propagation distances. For further details, the reader is advised to consult the Further Reading section at the end of the chapter.




6.8 Fresnel Diffraction


The study of Gaussian beam propagation has provided us with a more quantitative description of near field and far field propagation and where the boundary between the two zones occurs. In our analysis of Fraunhofer diffraction, we considered only the far field approximation. Related to the concept of the Rayleigh distance for Gaussian beam propagation is a dimensionless parameter called the Fresnel number. If the near field is defined by some aperture with a radial dimension of a, then the Fresnel number, F, at a propagation distance of L from the aperture is given by:

F = a²/Lλ (6.48)

Referring to Eq. (6.36), the Gaussian beam equivalent of the Fresnel number is the ratio of the Rayleigh distance, ZR, to the beam propagation distance. For Fresnel numbers much less than one, the diffraction pattern may be considered as a far field pattern and the Fraunhofer approximation applies. Where the Fresnel number is much greater than one, then one is in the near field.

The analysis of Fresnel diffraction is derived from the Rayleigh diffraction formulae (Eqs. (6.8) and (6.9)). The key assumption in the Fresnel analysis relates to an approximation of the propagation distance, s. If one assumes that the near field object is located at z = 0, then the propagation distance may be approximated in the following manner:

s ≈ z + [(x − x′)² + (y − y′)²]/2z (6.49)

In making the above approximation, based on a Taylor series expansion, we are choosing to ignore terms of fourth order in x and y. These terms cannot be permitted to make a significant contribution to the phase when the approximation is applied to the Rayleigh formulae. Setting out the fourth order terms more explicitly, it is straightforward to delineate the approximation more clearly:

s = z + [(x − x′)² + (y − y′)²]/2z − [(x − x′)² + (y − y′)²]²/8z³ + …
If we re-cast z as the propagation distance, L, represent the ratio x/z as θ, the angular size of the near field, and denominate the near field radius as a, then the Fresnel condition is given by:

Fθ² ≪ 1 (6.50)

The value of the Fresnel approximation is that it now permits us to treat the axial propagation distance, z, as a constant and to remove it from the integral in the diffraction equation, producing a simpler expression involving integration with respect to x and y. This is the so-called Fresnel integral and it is set out below:

A(x′, y′) = (i/λz)exp(−ikz)∫∫A(x, y)exp(−ik[(x − x′)² + (y − y′)²]/2z) dx dy (6.51)

It would be useful, at this point, to illustrate the assumptions underlying Fresnel diffraction with a practical example. An optical system populated with components with a standard diameter of 25 mm would have an effective radius of 12.5 mm. For a wavelength of 500 nm, the Fresnel approximation applies to distances much greater than 250 mm. At that distance, the Fresnel number is about 1000, so we are clearly in the near field zone.

To illustrate the application of Fresnel diffraction, we might now apply it to a uniformly illuminated slit of width w. Without loss of generality, this provides a simple illustration of the application of Fresnel diffraction in one dimension. For a given source point, e.g. x = 0, the phase of the sinusoidal component of Eq. (6.51) is of critical interest. In particular, we are concerned with points where the phase expressed in Eq. (6.51) is a half period number of waves. That is to say:

k(x − x′)²/2z = nπ, i.e. (x − x′)² = nλz (6.52)

The effect of diffraction at an edge or an aperture is to produce an alternating series of light and dark rings. The disposition of these rings is affected by the relative phases of contributions from the source. As such, Eq. (6.52) provides some indication of the location of these rings. The locations of these points, as set out in Eq. (6.52), are referred to as the Fresnel zones. Based on application of Eq. (6.51), for the one dimension, the diffracted amplitude from the slit is proportional to:

A(x′) ∝ ∫[−w/2, w/2] exp(−ik(x − x′)²/2z) dx (6.53)

We make the substitution s = x − x′ and make the further assumption that the diffraction pattern is symmetrical about the centre of the slit. In doing this, we may be permitted, without loss of generality, to assume that x′ > 0. The integral now becomes:

A(x′) ∝ ∫[−w/2 − x′, w/2 − x′] exp(−iks²/2z) ds (6.54)

We will now refer to the quantity w/2 − x′ as Δ. The quantity Δ now represents the distance in x from the positive edge of the slit.




(6.55)

The sign of the first and third terms in Eq. (6.55) is dependent upon the sign of Δ. If Δ is greater than 0, then the sign is negative and vice versa. The structure of the integral above is of great importance, as it can be decomposed into two relatively simple integrals of the form:

F(u) = ∫[0, u] cos(2πt²) dt + i∫[0, u] sin(2πt²) dt (6.56)

The above integral is of great importance and is known as the Fresnel integral. Plotting both components of amplitude in Figure 6.15, we produce the familiar form of the Cornu spiral.

Progression around the Cornu spiral in Figure 6.15 is marked by increasing values of Δ, the distance from the slit boundaries. Each successive Fresnel zone is marked in Figure 6.15 and the numbering of the zones is as per Eq. (6.52). Most importantly, it is clear from Figure 6.15 that an asymptote is reached for large values of Δ. At large values of Δ, the integral tends to 0.25 + 0.25i. If, in Eq. (6.55), one assumes that w − Δ is large, then this asymptotic value must be added to the integral. In this case, we can now reasonably approximate the integral expression in Eq. (6.55) in the following manner:




(6.57)

When Δ is large and positive, the integral part of Eq. (6.57) cancels out the constant asymptotic values, so the amplitude is zero. Of course, that the amplitude is zero away from the illuminated portion of the slit is to be expected. In the opposite scenario, where a position within the illuminated area is viewed, then the flux levels tend to a uniform value. Around the edge position, and towards the illuminated area, a series of light and dark bands emerge. One can see from the disposition of the Cornu spiral, that the contrast of these bands diminishes as the effective Fresnel zone number increases and, from Eq. (6.52), they also become more tightly packed.

We can now illustrate this process by considering a slit with a width of 2 mm which is illuminated by a 500 nm source. By reference to the example of Gaussian beam propagation, we assume in this analysis that the illumination is significantly spatially coherent. This does not necessarily imply the use of a laser beam; in practice it means that the slit is illuminated by a parallel beam with very small angular divergence. We now view the slit at a distance of 100 mm. The Fresnel number is 20 and, as the effective angle, θ, is 0.01 rad, the Fresnel approximation is clearly justified by applying Eq. (6.50). Applying the Fresnel integral to this specific problem, we obtain the flux distribution described by Figure 6.16.






Figure 6.15 Fresnel integral and Cornu spiral.



As previously indicated, the illuminated portion is described by a series of fringes characterised by the spacing of the Fresnel zones. In the obscured region, the flux tails off to zero. At the slit boundary, the flux is one half of the nominal value. Of course, if the set up were reversed, and an obscuration substituted for the slit, then the pattern in Figure 6.16 would be reversed.

Generally, in problems associated with Fresnel diffraction, the diffraction pattern produced by sharp edges broadly follows that illustrated in Figure 6.16. The characteristic diffraction pattern away from the sharp edge feature consists of a series of ripples denominated by the relevant Fresnel zone.
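The entire slit calculation is conveniently carried out with tabulated Fresnel integrals. The Python sketch below is an illustrative reconstruction (it uses scipy's Fresnel-integral convention, with the phase of Eq. (6.51) rescaled accordingly) that reproduces a flux profile of the kind shown in Figure 6.16:

import numpy as np
from scipy.special import fresnel

def slit_flux(x, width, wavelength, distance):
    # Fresnel diffraction from a uniformly illuminated slit. scipy's
    # fresnel(u) returns (S(u), C(u)), the integrals of sin/cos(pi t^2 / 2);
    # u = s * sqrt(2/(lambda z)) maps the phase k s^2 / 2z onto pi u^2 / 2.
    scale = np.sqrt(2.0 / (wavelength * distance))
    s1, c1 = fresnel(scale * (x + width / 2))
    s2, c2 = fresnel(scale * (x - width / 2))
    amplitude = (c1 - c2) + 1j * (s1 - s2)
    return 0.5 * np.abs(amplitude) ** 2   # normalised: unit flux far inside

# 2 mm slit, 500 nm light, viewed at 100 mm (all lengths in mm);
# Fresnel number F = a^2/(L lambda) = 1/(100 x 5e-4) = 20 -> near field
x = np.linspace(-2.0, 2.0, 4001)
I = slit_flux(x, 2.0, 500e-6, 100.0)
print(I[len(x) // 2])   # near the slit centre: ripples about unit flux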




6.9 Diffraction and Image Quality





6.9.1 Introduction


The analysis of image quality is central to any analysis of an imaging system. Where the wavefront error of a system is rather larger than the operating wavelengths of the system, the performance of the system may be adequately described by geometrical optics. Metrics such as geometrical spot size, as derived directly from ray tracing, prevail in this instance. However, where the wavefront errors are very much less than this, then diffraction effects prevail. Indeed, where effects other than diffraction may be legitimately ignored, the image is said to be diffraction limited. Overall, there are a number of metrics that quantitatively describe image quality and these are summarised below:



● Geometric spot size (rms spot size, 90% encircled energy etc.) – Geometric optics

● Point spread function (rms spot size, 90% encircled energy etc.) – Wave optics

● Strehl Ratio – Wave optics

● Modulation Transfer Function (MTF)





Figure 6.16 Fresnel diffraction at 100 mm from 2 mm slit – λ = 500 nm.






6.9.2 Geometric Spot Size


This is perhaps the most straightforward of the image quality metrics to visualise. By virtue of ray tracing, for example, using ray tracing software, a number of representative rays that uniformly illuminate the entrance pupil are traced to the image plane. Of course, in an ideal image formation system, all rays would be traced to a common image point. However, deviation from this ideal behaviour is a measure of the image quality. Furthermore, this process would be attempted for a number of different field positions and, inevitably, for a dispersive system, for a number of different wavelengths. An example geometric spot is shown in Figure 6.17, illustrating the impact of spherical aberration and coma.

In order to quantify the data depicted in Figure 6.17, a number of different measures may be adopted. Measurements are characterised typically with respect to some central location. This central location may either be the intersection of the chief ray at the image plane or the weighted mean location of all intersecting rays – the centroid. That the two conventions might produce different answers is evident from the depiction of the comatic spot diagram in Figure 6.17, where the chief ray intersection corresponds to the apex at the bottom of the spot. Whichever convention is used, the size of the spot may be described in the following ways:






Figure 6.17 Geometric spots for spherical aberration and coma.





● Full width half maximum (FWHM) – the physical width in one dimension at which the flux density falls to half of the maximum

● Root mean square (rms) spot size

● Encircled energy – the physical radius within which some fixed proportion (e.g. 50% or 80%) of all rays lie.

● Ensquared energy – the size of the square within which some fixed proportion (e.g. 50% or 80%) of all rays lie

● Enslitted energy – the width of the slit within which some fixed proportion (e.g. 50% or 80%) of all rays lie


The FWHM is a useful description of the width of a sharp geometrical peak. The rms spot size, on the other hand, is more mathematically tractable, but not universally applicable: in the case of an Airy disc the rms spot size is actually infinite, and the measure is of little use where an image is attended by a large background signal. Encircled energy is useful to gauge the amount of light passing through a small circular aperture. Its equivalent for a rectangular geometry, the ensquared energy, is particularly useful for pixelated detectors whose sensor elements are naturally either square or rectangular. Similarly, for slitted instruments, such as spectrometers, enslitted energy is a useful metric.
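These metrics reduce to simple statistics over the ray intersection coordinates. A minimal Python sketch (the synthetic ray cloud is purely illustrative) computing the centroid-referenced rms spot size and an encircled-energy radius:

import numpy as np

def spot_metrics(xs, ys, frac=0.8):
    # Geometric spot metrics from ray intersections at the image plane:
    # centroid-referenced rms radius and the radius containing 'frac' of rays
    xs, ys = np.asarray(xs), np.asarray(ys)
    r = np.hypot(xs - xs.mean(), ys - ys.mean())
    rms = np.sqrt(np.mean(r ** 2))
    encircled = np.quantile(r, frac)
    return rms, encircled

# Illustrative use with a synthetic, slightly asymmetric ray cloud (mm)
rng = np.random.default_rng(1)
xs = rng.normal(0, 5e-3, 5000) + 2e-3 * rng.random(5000) ** 2
ys = rng.normal(0, 5e-3, 5000)
print(spot_metrics(xs, ys, frac=0.8))   # rms and 80% encircled radii (mm)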

Where the overall wavefront error is significantly larger than the wavelength, this geometric description of image quality is perfectly adequate. However, where this is not the case, we must look to a new approach.




6.9.3 Diffraction and Image Quality


In Section 6.5 we examined briefly the diffraction pattern produced by a uniformly illuminated pupil. This is the so-called Airy disc. The Airy disc is the diffraction pattern that one would obtain in the absence of any system aberration. In Section 6.6, we included the effect of system aberration, introducing the Huygens point spread function. The Huygens point spread function is the flux distribution produced at the image plane by a point source at the object plane. Figure 6.18 shows an example of an aberrated pupil, where the OPD is mapped in two dimensions across a circular pupil.

The Huygens point spread function (PSF) of the same system is shown in Figure 6.19.

The PSF shown in Figure 6.19 shows much deeper rippling when compared to the Airy distribution and, unlike the geometrical analysis, represents an accurate solution for the local flux distribution at the image. In analysing the PSF, one can use similar metrics as for the geometric spot size, with the addition of the Strehl ratio:






Figure 6.18 OPD map across pupil.






Figure 6.19 Huygens point spread function.





● Strehl Ratio

● Full width half maximum

● Root mean square (rms) spot size

● Encircled energy

● Ensquared energy

● Enslitted energy


As mentioned previously, the Strehl ratio describes the ratio of the aberrated peak flux to the unaberrated peak flux. A ratio of 0.8 or greater, by virtue of the Maréchal criterion, is considered to be ‘diffraction limited’. This is consistent with an rms wavefront error of λ/14 or a peak to valley wavefront error of λ/4. This measure was introduced earlier in Section 6.6, and is an exceptionally important metric to keep in mind when designing a system that is diffraction limited or near diffraction limited.
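Numerically, a point spread function of this kind may be approximated by Fourier transforming the complex pupil function, in the manner of Eq. (6.18). The Python sketch below is illustrative (the grid size and the spherical-aberration-like OPD map of roughly λ/22 rms are assumptions); it also extracts the Strehl ratio as the ratio of aberrated to unaberrated peak flux:

import numpy as np

wavelength = 500e-9
k = 2 * np.pi / wavelength

n = 1024
y, x = np.mgrid[-1:1:n*1j, -1:1:n*1j]     # pupil coordinates, unit radius
pupil = (x**2 + y**2) <= 1.0

# Illustrative balanced-spherical-aberration OPD map, 50 nm coefficient
rho2 = x**2 + y**2
phi = 50e-9 * (6 * rho2**2 - 6 * rho2 + 1)
field = pupil * np.exp(1j * k * phi)

# PSF = squared modulus of the Fourier transform of the pupil function
psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
psf0 = np.abs(np.fft.fftshift(np.fft.fft2(pupil.astype(complex)))) ** 2

# Strehl ratio: aberrated peak over unaberrated peak (~0.92 here)
print(psf.max() / psf0.max())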




6.9.4 Modulation Transfer Function


The MTF expresses the ability of an imaging system to replicate the contrast of a specific object pattern. In the case of the MTF, the object is represented by a sinusoidally varying pattern of light and dark, described by some spatial frequency, kx. That is to say, the spatial variation of the object illumination is represented by:

I(x) = (I0/2)[1 + cos(kxx)] (6.58)

The contrast ratio of the illumination is defined as the ratio of the difference of the maximum and minimum fluxes to the sum of those fluxes. The illumination pattern represented in Eq. (6.58) has a contrast ratio of unity, with the minimum flux being zero. However, because of imaging imperfections, this is not fully represented at the image plane and the contrast is somewhat reduced. Assuming the system magnification is M, then:

I′(x′) = (I0′/2)[1 + C cos(kxx′/M)], where C < 1 is the image contrast (6.59)

For an input contrast ratio of unity, the MTF is defined as the contrast ratio of the image:

MTF = (Imax − Imin)/(Imax + Imin) (6.60)






Figure 6.20 MTF pattern.



A typical example of an MTF pattern is shown in Figure 6.20.

The MTF response shown in Figure 6.20 illustrates different final contrast levels, varying from 2% to 100%. In addition, a range of input spatial frequencies is shown. In practice, there is a tendency for the contrast ratio to reduce at higher spatial frequencies; a typical imaging system has a reduced capacity for replicating fine details. A typical MTF plot against spatial frequency is shown in Figure 6.21.






Figure 6.21 Typical MTF plot.



It is evident, from Figure 6.21, that the MTF declines with spatial frequency. Also included in the plot is the MTF of the diffraction limited system. In fact, the MTF is the absolute value of the complex optical transfer function (OTF). The OTF of a system is related to the Fourier transform of the point spread function. In fact, for the diffraction limited system, the MTF follows a fairly simple mathematical prescription. There is some maximum spatial frequency, υmax, above which the MTF is zero; this cut-off is set by the system numerical aperture, NA, as υmax = 2NA/λ. In this case, the diffraction limited MTF is simply given by:

MTF(υ) = (2/π)[cos⁻¹(υ/υmax) − (υ/υmax)√(1 − (υ/υmax)²)] for υ ≤ υmax (6.61)

The MTF is widely used in the testing and analysis of camera systems. One particular attribute of the MTF is especially useful. For a system composed of a number of subsystems, the MTF of the system is simply given by the product of the individual MTFs:

MTFsystem = MTF1 × MTF2 × MTF3 × … (6.62)

Analysis of the MTF is also useful in incorporating the behaviour of the detector. In a traditional context, where photographic film had been used, the contrast provided by the film media would be defined by the spatial frequency at which its effective MTF fell to 50%. For high contrast black and white film, this spatial frequency might have been of the order of 100 cycles per mm, although this would vary with film type and sensitivity. On the whole, colour film had poorer contrast with the equivalent spatial frequency being less than 50 cycles per mm. Of course, modern cameras base their detection upon pixelated sensors. In this instance, the characteristic spatial frequency is defined by Nyquist sampling where the equivalent spatial frequency covers two whole pixels. That is to say, for a pixel spacing of 5 μm, the equivalent spatial frequency is 100 cycles per mm.

Worked Example 6.6 We are designing a camera system to give an MTF of 0.5 at 100 cycles per mm. The camera has a pixelated detector with a pixel spacing of 5 μm. It may be assumed that the effective MTF of this detector is 0.75. The remainder of the system may be assumed to be diffraction limited. For a working wavelength of 500 nm, what is the minimum numerical aperture that the system needs to have to fulfil its requirement?

From Eq. (6.62), the MTF of the remainder of the system is equal to 0.5/0.75 = 0.67.

Using this MTF figure we can calculate υ/υmax from Eq. (6.61); this amounts to 0.265, giving υmax as 377 cycles per mm. Again from Eq. (6.61), given a wavelength of 500 nm, we can calculate the minimum numerical aperture of the system as 0.095, or about f#5.




6.9.5 Other Imaging Tests


The MTF provides a clearly mathematically defined test pattern for testing and subsequent analysis. However, there are other image resolution tests based upon the replication of reticulated patterns, often consisting of sharply delineated features, such as lines or line pairs. One example of this is the 1951 USAF resolution test chart, which is a standard reticle placed at the object location. Broadly speaking, this consists of a set of line features whose characteristic size reduces by the sixth root of two when progressing from feature to feature. Visual inspection of the final image enables determination of the minimum line spacing resolution. The standard USAF pattern is illustrated in Figure 6.22.

Although these types of test are inherently simpler and less capital intensive, the reliance on human visual inspection is, in itself, a weakness. Where analytical complexity once precluded the widespread use of the MTF and other more abstruse measures, the ready availability of high performance computing has now removed this obstacle.






Figure 6.22 1951 USAF resolution test chart.

Source: Image Provided by Thorlabs Inc.






Further Reading


Bleaney, B.I. and Bleaney, B. (1976). Electricity and Magnetism, 3e. Oxford: Oxford University Press. ISBN: 978-0-198-51141-0.

Born, M. and Wolf, E. (1999). Principles of Optics, 7e. Cambridge: Cambridge University Press. ISBN: 0-521-64222-1.

Lipson, A., Lipson, S.G., and Lipson, H. (2011). Optical Physics. Cambridge: Cambridge University Press. ISBN: 978-0-521-49345-1.

Saleh, B.E.A. and Teich, M.C. (2007). Fundamentals of Photonics, 2e. New York: Wiley. ISBN: 978-0-471-35832-9.

Wolf, E. (2007). Introduction to the Theory of Coherence and Polarisation of Light. Cambridge: Cambridge University Press. ISBN: 978-0-521-82211-4.

Yariv, A. (1989). Quantum Electronics, 3e. New York: Wiley. ISBN: 978-0-471-60997-1.




7

Radiometry and Photometry





7.1 Introduction


In the preceding chapters, we have been concerned with the general behaviour of light in an optical system, as described by ray and wave propagation. Hitherto, there has been no interest in the absolute magnitude of the wave disturbance. On the other hand, radiometry and photometry are intimately concerned with the absolute flux of light within a system, its analysis and, above all, its measurement.

At this point we will make a distinction between the two terms, radiometry and photometry. Radiometry relates to the analysis of the absolute magnitude of optical flux, as defined by the relevant SI units, e.g. watts or watts per square metre. In contrast, photometry is concerned with the measurement of flux as mediated by the sensitivity of some detector. Most notably, although not exclusively, the detector in question might be the human eye. So, from a radiometric perspective, 1 W of ultraviolet or infrared emission is worth 1 W of visible emission. However, from a photometric view (as referenced to the human eye) the ultraviolet and infrared emissions are worthless.

In the study of radiometry, we are interested in the emission of light from a physical source that might have some area dS and subtend some solid angle, dΩ. The light may either be directly emitted from a luminous source, such as a lamp filament, or scattered indirectly. The generic geometry for this is illustrated in Figure 7.1.

The geometry above may be applied both to the emission of light from a surface or to the absorption/scattering of light at a surface. The distinction between these two scenarios simply implies a reversal of the direction of travel of the rays.




7.2 Radiometry





7.2.1 Radiometric Units


For the purposes of this introduction, we will confine the initial discussion to radiometry, as opposed to photometry, where we are able to quantify the optical power of a source simply in terms of its output in watts. Fundamental to the analysis of radiometry are the radiometric quantities and their associated radiometric units. The most basic measure of an optical source is its radiant flux, Φ, measured in watts. Associated with the radiant flux is the radiant flux density, E. This refers to the total flux per unit area that is incident upon or is leaving a surface element and is measured in watts per square metre. If the radiant flux is incident upon a surface, then the radiant flux density is more usually referred to as the irradiance. If, on the other hand, the flux is emitted from the surface, it is referred to as exitance. It is of the utmost importance to apprehend that, according to the strict definitions of radiometry, flux per unit area is never described as intensity. There is often a ‘colloquial’ tendency to describe flux per unit area as intensity. However, this term is reserved rather for flux per unit solid angle. As such, the radiant intensity of a (point) source, I, is defined as its flux per unit solid angle and is measured in watts per steradian.






Figure 7.1 Emission from a generic source.





Table 7.1 Radiometric units.

Quantity Symbol Unit
Radiant flux Φ W
Radiant flux density (irradiance or exitance) E W m⁻²
Radiant intensity I W sr⁻¹
Radiance L W m⁻² sr⁻¹

Radiance, L, is the flux arriving at or leaving a surface associated with a pencil of rays, per unit solid angle per unit surface area projected onto a plane normal to the direction of travel of those rays. It is measured in watts per square metre per steradian. Radiance is intimately related to ‘how bright’ an extended object appears and is not affected by distance from the object. For example, in the case of the sun, as one moves away from the sun, the irradiance of the solar illumination inevitably diminishes. However, the angle subtended by the solar disc reduces proportionally and the smaller solar disc would appear just as bright if one were so ill advised as to view it.

All the radiometric quantities and associated units are summarised in Table 7.1.




7.2.2 Significance of Radiometric Units


The radiant flux density can be taken as the differential of the flux with respect to area. Expressing this mathematically:

E = dΦ/dS (7.1)

The important point to recognise about Eq. (7.1) is that the area, dS, is not only described by a scalar area, but also by a vector that defines the surface normal. Thus, orientation of the surface is of importance, and this will be described in more detail presently. Radiant intensity may also be expressed mathematically, in this case as the differential of the flux with respect to the solid angle:

I = dΦ/dΩ (7.2)

Most usually, intensity is used to describe the output of a point source. Simple geometry may be used to establish the relationship between the intensity of a point source and the irradiance it produces at a surface located at some distance from the source. This gives rise to the so-called inverse square law. The inverse square law states that the irradiance delivered by a point source to a distant object is inversely proportional to the square of the separation. Operation of the inverse square law is illustrated in Figure 7.2.






Figure 7.2 Operation of the inverse square law.






Figure 7.3 Radiance and exitance from a surface.



In the geometry illustrated in Figure 7.2, the irradiated surface is situated at a distance r from the source and its normal is at an angle θ with respect to the line joining the source to the surface. As alluded to earlier, the orientation of the surface is of some relevance. Assuming that the radiant intensity of the source is I, then the irradiance at the surface is given by:




(7.3) E = I cos θ/r²
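As a quick numerical illustration of Eq. (7.3), the short Python sketch below evaluates the irradiance delivered by a point source; the function name and the numbers are purely illustrative.

import math

def irradiance_point_source(intensity_w_sr, distance_m, tilt_rad=0.0):
    # Eq. (7.3): E = I cos(theta) / r^2, theta being the tilt of the
    # surface normal with respect to the line joining source and surface.
    return intensity_w_sr * math.cos(tilt_rad) / distance_m ** 2

# A 10 W/sr source viewed from 2 m, with the surface tilted at 30 degrees:
print(irradiance_point_source(10.0, 2.0, math.radians(30.0)))  # ~2.17 W/m^2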

Radiance is the flux arriving at a surface or leaving a surface per unit area, per unit solid angle. The area, in this case, is the projected area whose normal is aligned with the ray pencil, rather than the surface normal. This is illustrated schematically in Figure 7.3.

Expressing the intensity in terms of the area of an element of surface, dS, we obtain the following:




(7.4) dI = L cos θ dS

L is the radiance and I the radiant intensity.




7.2.3 Ideal or Lambertian Scattering


An ideal or Lambertian scatterer scatters light from a surface with uniform radiance irrespective of the scattering angle, θ. An imperfect approximation to a Lambertian surface might be a blank sheet of paper or a matt surface, such as a painted wall. In practice, for most surfaces, the radiance has a tendency to decline with θ. However, for a Lambertian surface, the radiant intensity of the scattered light is (from Eq. (7.4)) given by:




(7.5) I(θ) = L dS cos θ = I₀ cos θ

Hence for a Lambertian scatterer, the radiant intensity emitted from a surface element is proportional to the cosine of the angle with respect to the surface normal. In many instances, we are interested in the total amount of light scattered from a surface. This is the total hemispherical scatter or total hemispherical exitance from a surface. At first sight, this would seem to amount to the product of the solid angle, 2π, and the radiance, L. However, the radiant intensity declines according to the cosine law of Eq. (7.5) and the total hemispherical exitance may be derived from the following integral, based on spherical polar co-ordinates:




(7.6) M = 2π ∫₀^(π/2) L cos θ sin θ dθ = πL

Hence, the total hemispherical exitance is half what would be expected if the radiant intensity were constant as a function of polar angle.
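The factor of π (rather than 2π) in Eq. (7.6) is easily verified numerically. The following sketch, with illustrative names, integrates the cosine-weighted radiance over the hemisphere using a simple midpoint rule.

import math

def hemispherical_exitance(radiance, n_steps=100000):
    # Eq. (7.6): M = 2*pi * integral of L cos(theta) sin(theta) dtheta
    # over theta = 0..pi/2; the result tends to pi * L, not 2*pi * L.
    d_theta = (math.pi / 2) / n_steps
    total = 0.0
    for i in range(n_steps):
        theta = (i + 0.5) * d_theta
        total += radiance * math.cos(theta) * math.sin(theta) * d_theta
    return 2 * math.pi * total

print(hemispherical_exitance(1.0))  # ~3.14159, i.e. pi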




7.2.4 Spectral Radiometric Units


In most instances, the output of an optical source varies very significantly with wavelength. As such, we are generally interested in the radiometric flux within a very narrow band of wavelengths and how this quantity varies with wavelength. In this case, flux becomes spectral flux and radiance becomes spectral radiance, and so on. If λ is the wavelength of interest, the corresponding spectral quantities may be defined as follows:




(7.7a) Φλ = dΦ/dλ




(7.7b) Eλ = dE/dλ




(7.7c) Iλ = dI/dλ




(7.7d) Lλ = dL/dλ

By way of illustration, we will examine the spectral intensity produced by a commonly used illumination source. The xenon arc lamp is extensively used in commercial and laboratory applications as a 'point source', with a spectrum similar to that of the sun. Such (nominally) point sources are generally described by their radiant intensity, which gives a useful measure of the overall output of the source. In the case of the spectral measure, spectral radiant intensity is measured in watts per steradian per nm. Figure 7.4 shows a plot of spectral radiant intensity versus wavelength for a 1000 W xenon lamp.

Similarly, the solar flux arriving at the earth's surface may be denominated in terms of the spectral irradiance – that is the solar flux per unit area of the earth's surface per unit bandwidth. In the case of Figure 7.5, the data presented represents the spectral irradiance of the sun above the earth's atmosphere, as signified by the parameter 'AM0' or air mass zero.

Of course, Figure 7.5 does not present the solar irradiance as it would be at the sun's surface; this would be very much greater and would fall off according to the inverse square law, Eq. (7.3). When calculating the spectral radiance associated with the data in Figure 7.5, one would have to divide the irradiance by the solid angle subtended by the 0.5° solar disc, i.e. 6.8 × 10⁻⁵ sr. Peak solar spectral radiance (at ∼500 nm) would be about 30 000 W m⁻² sr⁻¹ nm⁻¹.
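The conversion quoted above is easily reproduced. In the sketch below, the AM0 spectral irradiance of about 2 W m⁻² nm⁻¹ at 500 nm is an approximate value read from Figure 7.5, and an angular diameter of 0.53° is assumed for the solar disc.

import math

half_angle = math.radians(0.53 / 2)       # solar angular radius in radians
omega_sun = math.pi * half_angle ** 2     # solid angle of the disc, ~6.8e-5 sr

e_500 = 2.0                               # approximate AM0 irradiance, W m^-2 nm^-1
l_500 = e_500 / omega_sun                 # spectral radiance, W m^-2 sr^-1 nm^-1
print(f"{omega_sun:.1e} sr, {l_500:.0f} W m^-2 sr^-1 nm^-1")  # ~6.7e-05 sr, ~30000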




7.2.5 Blackbody Radiation


Thermal radiation is associated with the thermal emission of electromagnetic radiation from an incandescent source. In particular, blackbody emission occurs when a solid surface is in thermal equilibrium with the surrounding electromagnetic radiation. The exitance associated with a blackbody emitter at an absolute temperature of T is proportional to the fourth power of the temperature and given by the well-known Stefan's law:




(7.8) M = εσT⁴

σ is Stefan's constant (5.67 × 10⁻⁸ W m⁻² K⁻⁴); ε is the surface emissivity (1 for a perfect black body).






Figure 7.4 Xenon arc lamp spectral intensity.






Figure 7.5 Solar spectral irradiance.

Source: NASA SORCE Satellite Data – Courtesy University of Colorado.



Most importantly, blackbody emission has a characteristic spectral distribution, quantified by its spectral radiance which depends only upon the wavelength and the surface temperature. The spectral radiance of blackbody emission is defined by Planck's law:




(7.9) Lλ = (2hc²/λ⁵) × 1/[exp(hc/λkT) − 1]

Lλ is expressed in SI units, W m⁻³ sr⁻¹; h is Planck's constant; c is the speed of light; k is the Boltzmann constant.

To convert Eq. (7.9) to spectral exitance from a surface, one assumes Lambertian emission and the spectral radiance is multiplied by a factor of π to give the exitance, as per Eq. (7.6). Indeed, the overall radiance and exitance can be obtained by integrating Eq. (7.9) with respect to wavelength. This implies that Stefan's constant is not actually a fundamental constant and can be expressed in terms of more fundamental constants, as follows:




(7.10) σ = 2π⁵k⁴/(15h³c²)

Taking the data from Figure 7.5, and using the angular size of the sun, we can plot the data as spectral radiance rather than spectral irradiance. This is illustrated in Figure 7.6. It is quite apparent that the spectral distribution of solar radiance conforms quite closely to that of blackbody emission. For reference, Figure 7.6 shows a plot of 5800 K blackbody emission generated using Eq. (7.9). Thus, to a reasonable approximation, solar radiation can be described as blackbody emission with a characteristic temperature of 5800 K. As stated previously, radiance describes the effective brightness of a surface and, for blackbody emission, is purely related to the physical characteristics of the source (temperature, and so on) and not to geometry. So, as stated earlier, although the spectral irradiance of solar emission is reduced as one moves away from the sun, the corresponding reduction in the angular size of the sun maintains the spectral radiance at a constant level.
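The comparison in Figure 7.6 may be reproduced with a short Python sketch of Eq. (7.9); the same function, integrated over wavelength and multiplied by π, also recovers Stefan's law, confirming Eqs. (7.8) and (7.10). The constants and step counts below are illustrative.

import math

H = 6.62607015e-34   # Planck's constant, J s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K

def spectral_radiance(lam, t):
    # Eq. (7.9): blackbody spectral radiance in W m^-3 sr^-1
    return (2 * H * C ** 2 / lam ** 5) / (math.exp(H * C / (lam * K * t)) - 1.0)

def exitance(t, lam_min=10e-9, lam_max=100e-6, n=50000):
    # Total exitance: pi times the integral of L_lambda over wavelength
    d_lam = (lam_max - lam_min) / n
    total = sum(spectral_radiance(lam_min + (i + 0.5) * d_lam, t) for i in range(n))
    return math.pi * total * d_lam

T = 5800.0
SIGMA = 5.670374e-8  # Stefan's constant, W m^-2 K^-4
print(exitance(T) / (SIGMA * T ** 4))       # ~1.0: Stefan's law recovered numerically
print(spectral_radiance(500e-9, T) * 1e-9)  # ~2.7e4 W m^-2 sr^-1 nm^-1, near the solar peak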






Figure 7.6 Solar spectral radiance and 5800 K blackbody radiance.






Figure 7.7 Étendue of a pencil of rays.






7.2.6 Étendue



Étendue is the product of the area and solid angle of a pencil of rays in an optical system. The concept of étendue is central to the understanding of the radiometry of an optical system, together with many other aspects of optical system performance. As applied to an optical system, its étendue may be represented as the product of the entrance pupil area and the solid angle of the input field. A critical aspect of the behaviour of étendue in an optical system is the operation of the Lagrange invariant. Effectively, the Lagrange invariant and the inverse relationship between linear and angular magnification imply that étendue must be preserved in an ideal optical system. That is to say, for a perfect paraxial system, as the imaged (exit) pupil size is increased, the corresponding field angle will be reduced proportionately. Of course, this only applies to a perfect optical system and any image degradation due to aberration has a tendency to increase the étendue. The concept of étendue is illustrated in Figure 7.7.

More formally, as illustrated in Figure 7.7, the étendue of a pencil of rays is given by:




(7.11) G = S Ω cos θ

G is the étendue and θ is the tilt of the surface normal with respect to the ray pencil.

As outlined earlier, for a generalised and perfect optical system, its étendue is a system invariant. Describing the pupil size of a generalised optical system by its numerical aperture, NA and the field by its total area, S, then the system étendue is given by:




(7.12) G = πS(NA)²

The similarity of Eqs. (7.11) and (7.4), which expresses the connection between radiance and flux, brings us to the fundamental utility of étendue in radiometric calculations. It is easy to appreciate that, for an optical system, the radiance associated with a pencil of rays is the derivative of the flux with respect to the étendue.




(7.13) L = dΦ/dG

If the étendue of a pencil of rays is invariant through an ideal system, then the implication of Eq. (7.13) is that the radiance associated with the object and image must be identical. This is very important, as it conveys a fundamental thermodynamic truth. If one considers a blackbody object, any reduction in étendue through the system would imply that the radiance of the image is higher than that of the object. In the context of blackbody radiation, the associated temperature of the image would be higher than that of the object. Therefore, the effect of this would be to take energy from the lower temperature source (the object) and convey it to a higher temperature body (the image) without doing work. This is in violation of the second law of thermodynamics. Any imperfections in the optical system (aberrations) tend to increase the étendue and so reduce the radiance at the image.

The practical utility of étendue lies in its assistance in expediting radiometric calculations in complex optical systems. If one has a source with some known spectral radiance, L, a system with étendue, G, and a system throughput of ξ, then the flux, Φ, arriving at the image is simply given by:




(7.14) Φ = ξGL

The throughput, ξ, is simply a measure of how much light is transmitted through an optical system as mediated by any scattering, absorption, or reflection that occurs. If, as in the case of an ideal system, none of the optical surfaces were to absorb, scatter or reflect any light, then the throughput would be 100%.




Worked Example 7.1 Flux Calculation


To illustrate the power of the foregoing analysis, we will now examine a practical example. An optical system is designed to view the filament of a tungsten halogen lamp. A camera with an aperture of f#2 images the filament onto the square pixels of a detector; the size of the pixels is 10 μm. For a single pixel of interest, only a small part of the incandescent filament is imaged, and this fills the entire pixel. The filament itself may be regarded as a blackbody emitter with a temperature of 3000 K. A narrowband filter is included in the optical train which only admits light in a 5 nm wide band around 500 nm. With the exception of the filter, the system throughput, ξ, is 80%. What is the flux arriving at a single pixel?








We are told that the source is a 3000 K blackbody emitter; therefore we should be able to calculate the spectral radiance from Eq. (7.9). In fact, we are interested in the spectral radiance at 500 nm. From Eq. (7.9), the spectral radiance is 2.6 × 10¹¹ W m⁻³ sr⁻¹, or 260 W m⁻² sr⁻¹ nm⁻¹, at 500 nm. The radiance transmitted by the 5 nm bandwidth filter is 5 × 260 or 1300 W m⁻² sr⁻¹. We now need to calculate the system étendue from Eq. (7.12). The numerical aperture of the system in image space is 0.25 (for f#2) and the area, S, of a single pixel is 10⁻⁵ × 10⁻⁵ = 10⁻¹⁰ m². The étendue is therefore:

G = πS(NA)² = π × 10⁻¹⁰ × 0.25² = 1.96 × 10⁻¹¹ m² sr








The solution is now almost complete; we only need to apply Eq. (7.14), making an allowance for the throughput, ξ, of 80%:

Φ = ξGL = 0.8 × 1.96 × 10⁻¹¹ × 1300 = 2.04 × 10⁻⁸ W

Thus, the power arriving at a single pixel is 2.04 × 10⁻⁸ W.

The essential point of the previous analysis is that the same fundamental logic and analysis applies, irrespective of the complexity of the optical system under investigation. In this example, we are not given any details of the optical design, only the pupil and field size. Nevertheless, we are able to estimate the flux arriving at the detector pixel. Of course, we are assuming that system aberrations do not play a significant role.
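Worked Example 7.1 may be reproduced in a few lines by chaining Eqs. (7.9), (7.12), and (7.14); the variable names below are illustrative only.

import math

H, C, K = 6.62607015e-34, 2.99792458e8, 1.380649e-23

def planck(lam, t):
    # Eq. (7.9): spectral radiance, W m^-3 sr^-1
    return (2 * H * C ** 2 / lam ** 5) / (math.exp(H * C / (lam * K * t)) - 1.0)

l_band = planck(500e-9, 3000.0) * 5e-9   # radiance within the 5 nm filter band, W m^-2 sr^-1
na = 1.0 / (2 * 2.0)                     # NA = 1/(2 x f#) for an f#2 camera
g = math.pi * (10e-6) ** 2 * na ** 2     # etendue of one 10 um square pixel, Eq. (7.12)
flux = 0.8 * g * l_band                  # Eq. (7.14), with 80% throughput
print(f"{flux:.2e} W")                   # ~2.0e-08 W, as in the worked example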




7.3 Scattering of Light from Rough Surfaces


Much of the preceding analysis has focused on self-luminous sources. These sources, such as blackbody emitters, more or less emit light in a random fashion. In the study of radiometry, we are also interested in surfaces that scatter light in a more or less random fashion. This is distinct from specular surfaces, i.e. mirrors, which reflect light in a deterministic, ordered fashion. The topic was raised briefly earlier when we introduced ideal or Lambertian scattering, where the radiance of the scattered light is independent of the scattered angle. Unfortunately, this condition is not realised in real materials, so an alternative approach is needed. We shall now describe a more generalised treatment, which describes the scattering from a surface using the so-called bi-directional reflection distribution function (BRDF).






Figure 7.8 Illustration of BRDF.



Light that is incident upon a surface is described by its irradiance and its incident angle, θi, as depicted in Figure 7.8. The scattered light is described by its radiance and its output polar angle, θs. Significantly, since the incident light breaks the symmetry of the scattering surface about the normal, the azimuthal angle, φ, of the scattered light also needs to be described. The BRDF is simply the derivative of the output radiance with respect to the input irradiance.

Naturally, the BRDF is a function of wavelength, so the input irradiance might be defined as E(θi, λ) and the output radiance as L(θs, φ, λ). In this case, the BRDF is given by:




(7.15) BRDF(θs, φ, λ) = dL(θs, φ, λ)/dE(θi, λ)

Units for BRDF are sr⁻¹ and a perfect Lambertian scatterer, with a total hemispherical reflectance of unity, would have a uniform BRDF of 1/π. Interest in the radiometry of scattering arises from two principal practical considerations. Firstly, in many applications in optical imaging, there is a requirement to provide uniform illumination over a specific input field. Secondly, and conversely, the optical designer is keen to avoid the deleterious impact of scattered light on image contrast. Therefore, it is important to understand not only the impact of the optical components themselves in manipulating light, but also the effect of the optical mounts, surrounding enclosures, and other non-optical surfaces in scattering light.

The preceding chapters have given a clear understanding as to the underlying principles of optical design in so far as the optical components and surfaces are concerned. Ultimately, as will be discussed in detail later, in contemporary design, this proceeds by the use of optical modelling software. For the optical components themselves, the process is referred to as sequential modelling where rays progress in a deterministic fashion and in a clear sequence from one optical surface to the next. In contrast, scattering is an inherently stochastic process, with the scattered distribution described by the BRDF which is essentially a probability distribution for scattering. In the light of these random processes, there is no inherent, ordered sequence of surfaces through which the light progresses. As such, any modelling in this scenario must account for the non-sequential nature of light propagation. Such modelling, of course, must account for the geometrical distribution of any scattering and the study of BRDF distributions is of considerable practical utility.

An example of the BRDF of a real material, Spectralon®, is shown in Figure 7.9. The data is for a wavelength of 900 nm and normal incidence, i.e. θi = 0. Spectralon is based upon sintered polytetrafluoroethylene (PTFE) and represents the closest approximation to an ideal scatterer of any material. Even so, there is a tendency for the BRDF to decline with increasing polar angle.




7.4 Scattering of Light from Smooth Surfaces


The foregoing analysis is entirely appropriate for the scattering of light from matt or rough surfaces. However, polished surfaces, such as those in lenses and mirrors, can contribute to the unintended scattering of light, even though their roughness is very low. Analysis of this type of scattering is of exceptional importance where low levels of stray light might degrade faint images. For such surfaces, it is useful to quantify the roughness of the surface in terms of the root mean square roughness, σrms, which expresses the rms departure of the surface from the ideal surface, whether that be a plane, spherical, or aspherical surface. This is illustrated in Figure 7.10, showing the high spatial frequency departure from the nominal shape.






Figure 7.9 BRDF of Spectralon at 900 nm for normal illumination.






Figure 7.10 Surface roughness.



For polished optical surfaces, such as mirrors and lenses, σrms is very low, typically a fraction of a nanometre. The surface roughness is thus a very small fraction of the wavelength of light and, in this case, surface scattering may be presented as a diffraction problem. That is to say, a perfect surface would produce the reflection of a perfect wavefront and the surface roughness imposes a wavefront error equal to twice the surface roughness (due to the reflective double pass). In classical diffraction analysis, we would analyse the additional wavefront error induced in terms of the image quality degradation. That is to say the scattered light caused by the departure from nominal surface shape would cause some kind of change in the clarity of the image itself. However, in the case of surface roughness, the scattered light is considered entirely separately from image degradation. In terms of the departure of the surface from the nominal shape, only high spatial frequency variations are considered to contribute to scattering and are included in the definition of surface roughness. If Fraunhofer diffraction is considered, then the high spatial frequency components of surface roughness scatter the light far away from the nominal image. As such, this produces an irradiance distribution that is clearly separated from the imaged spot at the image focal plane. In practice, spatial wavelengths of less than 0.1–1.0 mm are considered as surface roughness; longer wavelength departures are analysed as 'form error' and contribute to image degradation. The analysis of scattering proceeds in a similar way to the calculation of the Strehl ratio (Chapter 6) for small system wavefront errors and gives a total hemispherical reflection of:




(7.16) R = (4πσrms/λ)²

It is tempting to proceed with an analysis of scattering on the assumption that this 'small signal' scattering is Lambertian in character. However, this is very far from the truth. The angle of scattering, from simple Fraunhofer diffraction analysis, is proportional to the spatial frequency of the surface roughness component. Of course, Fourier analysis may be used to express the roughness deviation of any surface in terms of the sum or integral of a series of sinusoidal terms of varying frequency. The random surface roughness of the type depicted in Figure 7.10 may be thus analysed and its power spectrum (i.e. square of the amplitude) may be expressed as a power spectral density (PSD) as a function of spatial frequency. As such, the PSD represents surface deviation power per unit spatial frequency bandwidth. The 'power' of a surface deviation is proportional to the square of the amplitude and might be measured in mm²; since the surface is represented by Fourier components in two dimensions (x and y), spatial frequency bandwidth might be measured in mm⁻². Therefore, for an area-based description, as opposed to a linear one, PSD has dimensions of length⁴, e.g. mm⁴. The relevance of this discussion is that, for all polished surfaces, the PSD falls off very rapidly with spatial frequency and, as a consequence, the scattering amplitude or BRDF diminishes rapidly with angle (with respect to the main beam).

To a reasonable approximation, the PSD follows an inverse power law dependence upon spatial frequency. For a two dimensional Fourier description, for typical polished surfaces, this power law exponent is around −3. In the corresponding linear Fourier description, which is sometimes used, this exponent is around −2 and the PSD dimensions are mm³, rather than mm⁴. However, in this text we will retain the two dimensional description. Figure 7.11 shows an idealised PSD spectrum for a polished surface with nominal frequency exponent of −3. The total integrated surface roughness for the plot in Figure 7.11 is 5 nm rms. Apart from the simple exponent in Figure 7.11, we have introduced a 'corner frequency', fc, where the PSD reaches a maximum value. Without the introduction of a corner frequency, the integrated roughness would tend to infinity when the integral proceeds to zero spatial frequency. In the context of our discussion on scattering, this corner frequency relates more to the somewhat arbitrary demarcation between scattering and image degradation, as previously outlined. This boundary may typically lie between spatial frequencies of 1 and 10 mm⁻¹, or spatial wavelengths between 0.1 and 1 mm.






Figure 7.11 PSD for idealised polished surface (note units are in microns).



With the introduction of the corner frequency, fc, the surface roughness power dependence upon spatial frequency may be modelled in a very specific way, as set out in Eq. (7.17):




(7.17) PSD(f) = PSD₀/[1 + (f/fc)²]^(3/2)

A more generalised formulation of Eq. (7.17) is the so-called k correlation model which introduces the ABC parameters:




(7.18) PSD(f) = A/[1 + (Bf)²]^(C/2)

In our specific model, as outlined in Eq. (7.17), the C parameter in Eq. (7.18) is three. The parameter B is effectively the inverse of the corner frequency. In terms of the utility of this model with regard to scattering, the spatial frequencies may be directly translated into scattering angles or, more strictly, the sine of the scattering angles. As a consequence, the ABC model may be re-cast to give an explicit solution for the BRDF in terms of the scattering angle, θ:




(7.19) BRDF(θ) = A/[1 + (B sin θ)²]^(C/2)

Of course, the ABC coefficients in Eq. (7.19) are not the same as those in Eq. (7.18). Equation (7.19) may be integrated across all polar angles to give the total hemispherical reflection. This gives:




(7.20) R = (2πA/B²)[1 − 1/√(1 + B²)] ≈ 2πA/B² (for C = 3 and B ≫ 1)

Equations (7.19) and (7.20) give us the ability to model scattering from mirror surfaces. However, when modelling the direct scattering from lens surfaces, we must replace Eq. (7.16) for the hemispherical scattering with the following equation:




(7.21) R = (2π(n − 1)σrms/λ)²

In Eq. (7.21), for a lens surface, the optical path difference is represented by the product of (n − 1) and the form error, as opposed to twice the form error, as in a mirror. As such, Eq. (7.21) gives a clear indication that the scattering from lens surfaces is much less than that from mirrors. For example, for a lens material with a refractive index of 1.5, the total scattering is diminished by a factor of 16 when compared to a mirror.
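A two-line comparison of Eqs. (7.16) and (7.21) makes the factor of 16 explicit; the function names are illustrative.

import math

def tis_mirror(sigma_rms, wavelength):
    # Eq. (7.16): total hemispherical scattering from a mirror surface
    return (4 * math.pi * sigma_rms / wavelength) ** 2

def tis_lens(sigma_rms, wavelength, n):
    # Eq. (7.21): as Eq. (7.16), but with an OPD of (n - 1) x roughness
    return (2 * math.pi * (n - 1) * sigma_rms / wavelength) ** 2

sigma, lam = 1.5e-9, 633e-9
print(tis_mirror(sigma, lam) / tis_lens(sigma, lam, 1.5))  # 16.0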

Worked Example 7.2 A polished mirror has a surface roughness of 1.5 nm rms. We are interested in its scattering at a wavelength of 633 nm. For the purposes of subsequent analysis, we may assume that the C exponent has a value of 3. In addition, the corner frequency may be assumed to be 4 mm⁻¹. What is the total hemispheric reflection at the designated wavelength? Calculate the A and B coefficients.

The total hemispheric reflection is given by Eq. (7.16). We are told that σrms = 1.5 nm and λ = 633 nm.

R = (4π × 1.5/633)² = 8.86 × 10⁻⁴

The corner frequency, fc, is 4 mm⁻¹ and, following Eq. (7.19), the B coefficient is simply the inverse of the corner frequency expressed in angle space:

B = 1/(λfc) = 1/(6.33 × 10⁻⁴ mm × 4 mm⁻¹) = 395

Finally, from (7.20) we have:

A = RB²/(2π) = 8.86 × 10⁻⁴ × 395²/(2π) = 22.0
Thus, in the full representation, A = 22.0, B = 395, and C = 3.
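The same numbers emerge from the brief sketch below, which follows the sequence of the worked example; spatial frequencies are expressed in mm⁻¹ and lengths in mm.

import math

sigma_rms = 1.5e-6   # 1.5 nm, expressed in mm
lam = 633e-6         # 633 nm, expressed in mm
f_c = 4.0            # corner frequency, mm^-1

r_tis = (4 * math.pi * sigma_rms / lam) ** 2  # Eq. (7.16)
b = 1.0 / (lam * f_c)                         # B coefficient of Eq. (7.19)
a = r_tis * b ** 2 / (2 * math.pi)            # from Eq. (7.20), assuming B >> 1
print(f"R = {r_tis:.2e}, A = {a:.1f}, B = {b:.0f}")  # R ~ 8.9e-04, A ~ 22.0, B ~ 395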

In terms of practical application, models such as the ABC model are extremely useful in the validation of designs, such as cameras and telescopes where restriction of scattered light is of paramount importance. This topic will be considered further when we look in more detail at the optical design process in later chapters.




7.5 Radiometry and Object Field Illumination





7.5.1 Köhler Illumination


Hitherto, in all discussions of image formation, no attention has been paid to the illumination of the object. It is assumed, quite arbitrarily, that the object spontaneously emits rays. This may be perfectly proper for a self-luminous object. However, in many cases, the object is not luminous and needs to be illuminated evenly across the entire field. The earliest investigation of this problem is due to August Köhler, resulting in the development of the Köhler illumination system, still in use today. Most light sources, such as filament lamps or arc sources, have a highly spatially non-uniform irradiance. Traditionally, Köhler illumination was developed with a filament lamp source in mind. In this scheme, the light from the filament is collected by two lenses, the collector lens and the condenser lens, and presented to the object. However, instead of imaging the filament at the object, which would produce uneven illumination, the filament is imaged at the nominal pupil location which it overfills. The Köhler illumination scheme is shown in Figure 7.12.

The field stop is located close to the collector lens which images the filament onto the aperture stop location. The condenser lens is separated from the aperture stop by its focal length and thus images the filament at the infinite conjugate. In this way, the object plane is uniformly illuminated. Of course, the pupil itself is not uniformly illuminated. However, this is not an impediment to image formation, provided the pupil is well filled. Uniform illumination of both image and pupil conjugates from an uneven source, such as a filament can only be achieved through division of amplitude, e.g. by scattering. This will be dealt with in the next section.




7.5.2 Use of Diffusers


The problem with using an imaging system for illumination, as in Köhler illumination, is that the uneven illumination source must be imaged at some conjugate in the system. This problem may be circumvented by use of a diffusing component within an optical system. Diffusers scatter light in a random but controlled fashion and take the form of transmissive components, such as ground glass screens and opal diffusers, and reflective screens, such as Spectralon diffusers. Reflective materials can approach Lambertian behaviour, but transmissive materials such as ground glass scatter light into relatively narrow angles. Ground glass screens produce a broadly Gaussian BRDF distribution with a full-width half-maximum scattering angle of between 5° and 20° depending upon the coarseness of the ground surface. 'Engineered diffusers', based on diffractive surfaces, can be used to create tailor-made scattering profiles, such as a top hat profile, where the scattered flux is constant up to a specific scattering angle, whereupon it falls to zero. Figure 7.13 shows the scattering profile of some diffusers.






Figure 7.12 Köhler illumination.






Figure 7.13 Diffuser scattering profile.



Overall, diffusers are very useful in re-arranging light by division of amplitude to promote even illumination. However, it must be understood, in a radiometric context, that diffusers inevitably increase system étendue and their use is necessarily accompanied by a significant reduction in radiance at the final image plane.




7.5.3 The Integrating Sphere





7.5.3.1 Uniform Illumination


Some exacting technical applications require the creation of highly uniform illumination across a field. This is particularly the case in instrument calibration, where even illumination to better than ±1% might be required. Such even illumination may be provided by an integrating sphere. An integrating sphere consists of a spherical cavity coated with some high reflectivity, diffusing material. The sphere is provided with a number of ports, which are apertures in the spherical shell and significantly smaller than the sphere diameter. One of these ports is designated as the input port and one as the output port. The design of the integrating sphere is such that input and output ports are not intervisible and light can only reach the output port by scattering off the internal walls of the integrating sphere. This is shown in Figure 7.14.

The internal coating of the integrating sphere is made of some nominally white coating that scatters efficiently. Traditionally, classic white paint pigments, such as titania (TiO₂) and barium sulphate (BaSO₄), were used. More recently, these have been replaced by Spectralon (sintered PTFE) for ultraviolet and visible applications and gold coating for infrared applications. These materials have a hemispherical reflectance of over 99% over wide regions of the spectrum. The integrating sphere is designed with a combined port area much smaller than the internal surface area of the sphere. In this way, before exiting the output port, the light must undergo a large number of scattering events. For Lambertian scattering at some point on the internal surface of the sphere, it can be demonstrated, for a spherical geometry, that the irradiance produced at other points of the sphere is entirely uniform.






Figure 7.14 Integrating sphere.



However, in practice, no real material is perfectly Lambertian. Nevertheless, in theory, for an infinite number of scattering events, the radiance distribution of the light exiting the output port tends to the Lambertian distribution, even if the internal coating is non-Lambertian. Therefore, as the area of the ports is reduced, as a fraction of the sphere area, then the emission from the output port becomes more Lambertian. As a rule of thumb, the port area should make up no more than 5% of the total sphere area. Thus, for a reasonably small port fraction, the integrating sphere has the property of considerably enhancing the Lambertian quality of the emission from the output port, over and above that of the reflective coating of the sphere itself.

If a light source injects a specific flux into the integrating sphere, as indicated in Figure 7.14, then the irradiance seen at a point on the sphere's surface is not merely the flux divided by the internal area of the sphere. By making the assumption that the integrating sphere is effective in promoting uniform internal illumination, the internal irradiance may be calculated by assuming the flux input is balanced by flux loss from the ports and absorption in the sphere coating. If the internal area of the sphere, including port area, is A, the fractional area occupied by the ports, f, the reflectivity of the sphere coating, R, and the flux input, Φ, then the internal irradiance, E, is given by:




(7.22) E = (Φ/A) × M, where M = R/[1 − R(1 − f)]

The quantity M is the so-called ‘multiplier’. In practice, for many integrating sphere applications, R > 0.99 and thus M is approximately the inverse of the port fraction. Thus, with a port fraction of 5%, the multiplier is 20. That is to say, the internal irradiance is 20 times greater than would be expected from dividing the flux by the sphere area. The 5% port area restriction means that the port diameter should be smaller than 45% of the sphere diameter and less than a third of the sphere diameter for two ports, and so on.
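A minimal sketch of Eq. (7.22), using the formula as reconstructed above, computes the multiplier and the internal irradiance for representative values; the figures chosen are illustrative.

def multiplier(reflectivity, port_fraction):
    # Eq. (7.22): M = R / (1 - R(1 - f))
    return reflectivity / (1.0 - reflectivity * (1.0 - port_fraction))

def internal_irradiance(flux, sphere_area, reflectivity, port_fraction):
    # E = (flux / A) x M, in W/m^2
    return flux / sphere_area * multiplier(reflectivity, port_fraction)

print(multiplier(0.99, 0.05))                     # ~16.6; tends to 1/f = 20 as R -> 1
print(internal_irradiance(1.0, 0.1, 0.99, 0.05))  # ~166 W/m^2 for 1 W into a 0.1 m^2 sphere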

It is clear that the integrating sphere delivers uniform radiance at the output port. By providing the integrating sphere with a calibrated source, or by calibration of its output radiance and irradiance, it can provide a standard calibrated (spectral) radiance.




7.5.3.2 Integrating Sphere Measurements


By integrating flux uniformly over a large solid angle, integrating spheres can provide an unbiased measurement of flux for diverging sources. That is to say, the integrating sphere integrates emission from these sources across all angles. Examples of such sources might include light emitting diodes (LEDs) and incandescent lamps, and so on. In such measurements, radiation from a lamp is directed into the input port with a photodetector situated at an exit port. This setup is illustrated in Figure 7.15a. In this example, the source is placed at the input port, although for lamps, the source is often placed inside the sphere. A detector placed at the output port is used to monitor integrating sphere radiance. By calibrating the detector using a source (e.g. laser) of known flux, the absolute flux may be calculated. Figure 7.15b illustrates the principle of (total) reflectance measurement. A source irradiates a sample situated opposite the input port. Again, a detector at the output port monitors the integrating sphere radiance. Reference reflectors of known reflectivity are available and such reflectors may be substituted for the sample for calibration purposes. Comparison of the two measurements will give the reflectivity of the sample.




7.5.4 Natural Vignetting


In many respects, the Lambertian illumination of an entrance pupil as would be provided by an integrating sphere represents an ideal situation. However, the irradiance produced at an image plane is actually non-uniform, assuming perfect imaging. In this context, 'perfect imaging' means perfect replication of the entrance pupil at the exit pupil. The effect described is known as natural vignetting. In the perfect realisation of this phenomenon, the irradiance produced at the image plane is proportional to the fourth power of the cosine of the field angle. The logic of this is illustrated in Figure 7.16.






Figure 7.15 (a) Flux measurement. (b) Reflectance measurement.






Figure 7.16 Natural vignetting.



If the (Lambertian) radiance at the exit pupil is L, then from Eq. (7.5), the radiant intensity emerging at a normal angle of θ from an area element, dS, of the pupil is given by:

dI = L dS cos θ
However, from the inverse square law, Eq. (7.3), we know that the irradiance produced at the image plane is equal to:

E = dI cos θ/r² = L dS cos²θ/r²
θ is the angle of the ray to the image plane normal (the same as the angle to the pupil normal).

Since r = x/cos θ, we finally arrive at the following relationship for natural vignetting:




(7.23) E(θ) = (L dS/x²) cos⁴θ

Equation (7.23) summarises the phenomenon of natural vignetting. The reason for the term 'natural vignetting' is that this effect replicates artificial vignetting, i.e. the darkening of an image towards the edges of a wide field caused by the obstruction of light by physical apertures other than the main stop. In the case of natural vignetting, however, there is no physical obstruction of the light path.
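The cos⁴θ roll-off of Eq. (7.23) is easily tabulated; the snippet below prints the relative image plane irradiance at a few representative field angles.

import math

def relative_irradiance(field_angle_deg):
    # Eq. (7.23): irradiance relative to the on-axis value, cos^4(theta)
    return math.cos(math.radians(field_angle_deg)) ** 4

for angle in (0, 10, 20, 30, 40):
    print(f"{angle:2d} deg: {relative_irradiance(angle):.3f}")
# 0 deg: 1.000 ... 40 deg: 0.344 -- wide fields darken appreciably at the edges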




7.6 Radiometric Measurements





7.6.1 Introduction


In any real application, we are interested in the measurement of radiometric quantities, such as irradiance and radiant intensity. However, absolute measurements of these quantities are, in practice, extremely challenging. As an example, absolute measurement of flux or irradiance, and so on, to ±1% represents a high precision measurement. Although calibration plays an important role in any measurement, this is especially true for radiometric measurements. Absolute radiometric measurement generally proceeds by the use of calibrated detectors. These detectors convert the optical flux into an electrical or thermal signal which can be directly monitored. Critically, the sensitivity of these detectors has been carefully calibrated using a reference source providing a known spectral output. Hence, the signal can be directly converted into flux or radiance, and so on. The reference sources are generally maintained by, or derived from, National Measurement Institutes (NMIs), such as the National Physical Laboratory (NPL) or the National Institute of Standards and Technology (NIST).






Figure 7.17 Substitution radiometer.






7.6.2 Radiometric Calibration





7.6.2.1 Substitution Radiometry


Calibrated measurements of optical flux are ultimately derived from the principle of substitution radiometry. In this measurement, optical radiation is wholly absorbed in a specially designed black cavity and the temperature increase measured by a thermal transducer. Thereafter, the optical power is substituted by electrical input derived from a resistance heater. The original optical flux is given by the electrical input required to produce the same temperature change. The principle is illustrated in Figure 7.17.

The optical beam in Figure 7.17 may, for example, be derived from a stabilised laser beam. This laser beam, thus characterised, may then be used to calibrate the sensitivity of a detector. Ultimately, the temperature rise with respect to the surroundings provides the signal for this measurement. As a consequence, any drift in the ambient temperature interferes with the fidelity of the measurements. For this reason, the highest precision measurements are obtained with a cryogenic radiometer, where the cavity, sensor, and heater are enclosed within a vacuum and cooled to a few kelvin.




7.6.2.2 Reference Sources


The primary reference source for flux, or rather for spectral irradiance, is a carefully maintained blackbody source. For the ultraviolet, visible, and near infrared spectral regions, the blackbody source is based upon a pyrolytic graphite cavity. Such sources can operate up to a temperature of 3500 K. In order to capture the spectral irradiance, the output from the blackbody is characterised by a number of filtered detectors previously calibrated by a substitution radiometer. A filtered detector comprises a sensor with a bandpass filter which only admits radiation within a narrow range of wavelengths. The general setup is shown in Figure 7.18.

The pyrolytic graphite discs that comprise the cavity are heated electrically, as indicated. Fully calibrated, this is a precision, broadband radiometric source. However, it is not practical for use in a standard laboratory setting. Therefore, practical calibration is generally carried out using transfer standards. These are simpler light sources whose spectral irradiance has been calibrated (ultimately) against a primary source at an NMI. One very commonly used example of a transfer standard is a filament emission lamp or FEL. This lamp is simply a well characterised and calibrated quartz halogen lamp. Generally the FEL is a 1000 W lamp whose irradiance at a nominal distance of 500 mm has been measured and calibrated at an NMI. These emission lamps approximate to a 3200 K blackbody source. Table 7.2 shows calibrated spectral irradiance levels for a typical lamp.

The process of transferring the standard from the primary to the transfer standard does increase the uncertainty of the calibration of the FEL lamp, and the calibration uncertainty for this type of lamp is of the order of 1–2%, depending upon the wavelength. Any subsequent use of the FEL in laboratory calibration of photodetector sensitivity must faithfully replicate the NMI calibration setup. The laboratory setup might look like that shown in Figure 7.19.






Figure 7.18 Blackbody radiometric source.





Table 7.2 Spectral irradiance for typical calibrated FEL lamp.






Great care must be taken to minimise the contribution from scattered light from the surroundings. The original calibration is based entirely on direct radiation from the lamp; any contribution from scattered light would compromise this.




7.6.2.3 Other Calibration Standards


Measurement, characterisation, and modelling of reflection represent an important part of radiometry, as the preceding discussions illustrate. Measurement of reflectivity might be on either polished (specular) or diffuse surfaces. In the former case, the reflectivity is a simple function of the incident angle as, for specular reflection, the reflected angle is pre-determined. For laboratory measurements of specular reflectance, reference standards may be obtained that have been calibrated at NMIs. These might be aluminised mirrors or polished glass blanks with low, but measurable reflectivity. For diffuse reflection, the interest is not only in total reflection (total hemispherical reflectivity) but also in its distribution with angle. In this case, full characterisation of BRDF is of interest. Again, routine laboratory measurements are facilitated by the provision of calibrated artefacts. These might include ∼100% reflectance standards in addition to matt black standards to provide a nominal zero reference.






Figure 7.19 FEL lamp calibration.






7.7 Photometry





7.7.1 Introduction


Radiometry is concerned with the measurement of absolute flux levels of optical radiation. However, in many practical instances, we are rather concerned with the effect of these flux levels on detection systems, most notably the human eye. For instance, real, tangible radiometric fluxes in the infrared are of no relevance to human vision. Therefore, photometry is concerned with optical fluxes as mediated by some detection sensitivity, most particularly of the human eye. Naturally most of the discussion here relates to visual photometry, although there are other areas of photometry, such as astronomical photometry.




7.7.2 Photometric Units


For visual photometry, each radiometric unit has its corresponding photometric unit. The photometric equivalent of radiant flux is the luminous flux expressed in lumens and the equivalent of radiant intensity is luminous intensity whose base unit is the candela. Similarly, the radiometric quantities, irradiance and radiance, correspond to illuminance and luminance respectively in photometry. The base unit for illuminance is lux and that for luminance is candela per square metre. Units for luminance are occasionally referred to as nits. Comparison of the radiometric and photometric quantities is set out in Table 7.3.

Each photometric quantity is derived from the respective radiometric quantity by integration across the visible spectrum using a spectrally dependent weighting function V(λ). This weighting function is a standardised representation of the sensitivity of the human eye. Normally, this standard weighting function is taken to represent photopic (daytime) vision as opposed to scotopic (dark adapted) vision. This standard weighting function, V(λ), or luminous efficiency function, was originally established by the Commission Internationale de l'Éclairage (CIE) in 1924. By definition, V(λ) has a maximum value of unity and, for the photopic function, this occurs at a wavelength of 555 nm, corresponding to the peak sensitivity of the human eye. The function has since been revised slightly on a number of occasions, most notably in 1978 and 2005. Figure 7.20 shows the plot for both photopic and scotopic sensitivity.



Table 7.3 Photometric quantities.

Radiometric quantity (unit)           Photometric quantity (unit)
Radiant flux (W)                      Luminous flux (lm)
Radiant intensity (W sr⁻¹)            Luminous intensity (cd)
Irradiance (W m⁻²)                    Illuminance (lx)
Radiance (W m⁻² sr⁻¹)                 Luminance (cd m⁻²)



Figure 7.20 Luminous efficiency function.



However, V(λ) is only a relative measure of luminous efficiency. To link photometric units to their corresponding radiometric quantities, a constant of proportionality, Km, must be added to relate the two. That is to say, if the radiometric spectral flux is Φe(λ) and the corresponding luminous flux is Φv(λ), then the two may be linked by the following equation:




(7.24) Φv(λ) = KmV(λ)Φe(λ)

The value of Km is defined as 683.002 lm W⁻¹. That is to say, an optical beam with a wavelength of 555 nm (actually 5.4 × 10¹⁴ Hz or 555.17 nm) and having a luminous flux of 1 lm actually has a radiant flux of 1/683.002 W. At first sight, this might seem a rather curious definition. The reason for this is essentially historical. It is the candela, rather than the lumen, that forms the base SI photometric unit. All other photometric units are derived from the candela. As such, the candela is defined as the luminous intensity of a source of monochromatic radiation of frequency 5.4 × 10¹⁴ Hz having a radiant intensity of 1/683.002 W sr⁻¹. However, originally the definition of luminous intensity was related directly to the output of a standard hydrocarbon-burning lamp. In fact, the candela was historically related to an earlier unit of luminous flux, candlepower. So, for historical consistency, a radiometric intensity of 1/683 W sr⁻¹ at 555 nm is broadly related to the output of a 'standard candle'. Attempts were made to produce reference sources of luminous intensity using standard blackbody emitters. However, these proved to be unreliable and were superseded by the current radiometric definition.
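For a monochromatic beam, Eq. (7.24) collapses to a simple product. The sketch below uses a small lookup table of approximate photopic V(λ) values read from the standard CIE curve; the values are approximate and for illustration only.

K_M = 683.002  # lm/W, maximum luminous efficacy (photopic)

# Approximate photopic V(lambda) values at selected wavelengths (nm)
V_APPROX = {450: 0.038, 500: 0.323, 555: 1.000, 589: 0.769, 650: 0.107}

def luminous_flux(radiant_flux_w, wavelength_nm):
    # Monochromatic form of Eq. (7.24): Phi_v = K_m * V(lambda) * Phi_e
    return K_M * V_APPROX[wavelength_nm] * radiant_flux_w

print(luminous_flux(1.0, 555))  # 683 lm per optical watt: the theoretical maximum
print(luminous_flux(1.0, 589))  # ~525 lm per optical watt: why sodium lamps are so efficient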



Table 7.4 Typical illuminance levels for different environments.

Environment                            Illuminance (lx)
Domestic interior                      ~100
Office                                 300–500
Visual inspection / critical tasks     500–2000
Overcast daylight                      ~1000
Full sunshine                          ~25 000

7.7.3 Illumination Levels


Since optical photometry is fundamentally connected to light levels as mediated by the sensitivity of the human eye, levels of illuminance are intimately related to the ability to perform visually based tasks. For the indoor environment, lighting levels may be designed for specific areas. A generally comfortable level of illuminance for a domestic environment is around 100 lx. For an office environment, where moderately demanding visual tasks are to be performed, a level of 300–500 lx is acceptable. For more critical tasks, such as visual inspection, a higher level of 500–2000 lx may be called for. Of course, daylight illumination levels are very much higher, ranging from 1000 lx on an overcast day to 25 000 lx for full sunshine. Table 7.4 sets out some typical illumination levels for different environments.

Another important consideration in illumination sources is their efficiency. The efficiency of domestic and industrial light sources is measured in lumens per watt. From that perspective, the ideal light source is a monochromatic source with a wavelength of 555 nm, giving a maximum efficiency of 683 lm W⁻¹ of optical output. A reasonable approximation to this is the sodium vapour street lamp, providing virtually monochromatic light at 589 nm with an electrical efficiency of 200 lm W⁻¹. However, such a highly coloured source is not acceptable for domestic and industrial applications where broadband or nominally 'white' sources are preferred. The least efficient sources are incandescent tungsten sources, which are being replaced in domestic and industrial applications due to their poor energy efficiency. Their immediate successors, fluorescent mercury lamps, create broadband emission from fluorescent phosphor coatings irradiated by ultraviolet emission from mercury spectral lines. More recently, these are being replaced by white light LEDs, which rely on ultraviolet emission from gallium nitride diodes to create broadband fluorescence from phosphors. Efficiencies of these sources are set out in Table 7.5.



Table 7.5 Luminous efficiencies of different sources.









Figure 7.21 Luminous efficiency vs. blackbody temperature.



In fact, the luminous efficiency of a blackbody source may be calculated directly from the Planck distribution set out in Eq. (7.9) and the luminous efficiency function, V(λ). The plot is shown in Figure 7.21. The peak efficiency occurs around 6000 K and it is, of course, no coincidence that this is close to the solar blackbody temperature of 5800 K. Clearly, the human eye has been 'designed' to efficiently harvest light from its primary illumination source.

The brightness of different sources (to the human eye), or luminance, is expressed in candelas per square metre or nits. Representative values range from 80 cd m⁻² for a typical cinema screen to 7 × 10⁶ cd m⁻² for the filament of an incandescent lamp and 1.6 × 10⁹ cd m⁻² for the solar disk. As for the luminous efficiency plot, the luminance of a blackbody source may be derived directly from the Planck distribution and the luminous efficiency curve, V(λ). This plot is shown in Figure 7.22.




7.7.4 Colour





7.7.4.1 Tristimulus Values


The preceding discussion has been wholly concerned with the level of illumination rather than (human) perception of the spectral distribution. This spectral distribution is described by the notion of colour, as perceived by humans. From the perspective of human vision, colour is discerned by the relative stimulation of three types of colour receptors (the cones). To model this process, the CIE, in 1931, proposed a set of colour matching functions, effectively mimicking the relative sensitivity of each type of sensor. The colour matching functions are represented as three separate curves, x(λ), y(λ), and z(λ), and operate, in principle, in a similar manner to the V(λ) curve for photopic efficiency. However, each curve is shifted with respect to the others. The form of these curves is illustrated in Figure 7.23.






Figure 7.22 Luminance vs. blackbody temperature.






Figure 7.23 Colour matching curves.



It must be emphasised that the colour matching curves and the luminous efficiency curve are merely representative of human visual perception. These curves represent the fruits of sustained efforts to find a representative average of human perception. However, not surprisingly, there are considerable variations in spectral sensitivity between individuals.

Quite significantly, the y(λ) curve follows that of the standard V(λ) curve. As for the basic photometric quantities, an input spectral radiance is transformed by integrating across the spectral range using the colour matching functions. However, instead of producing a single luminous flux value, three separate tristimulus values, X, Y, and Z are derived, as below.




(7.25) X = ∫x(λ)L(λ)dλ; Y = ∫y(λ)L(λ)dλ; Z = ∫z(λ)L(λ)dλ

From the preceding arguments, the Y tristimulus value is a measure of the luminance of the source. Normalisation of the tristimulus values provides a two dimensional description of colour.




(7.26) x = X/(X + Y + Z); y = Y/(X + Y + Z)

Only the x and y ordinates are used in the standard CIE chromaticity diagram which provides a standardised quantification of the human perception of colour. The chromaticity diagram provides a plot in these two dimensions, with the third degree of freedom effectively corresponding to the luminous flux or intensity. Although it is perhaps obvious from the preceding discussion, this tripartite description of colour is purely an artefact of human vision and in no sense related to any property of light. Indeed, in recording any manifestly complex or subtle spectral distribution, the human eye can only, in effect, describe these by three independent parameters. It is clear that it is very possible for different spectral distributions to produce the same X, Y, Z stimulus values. This effect is known as metamerism. This highlights the limited spectral information that is provided by the three different sensor types. Indeed, two surface coatings (e.g. painted) can appear to be the same colour under one illumination (e.g. fluorescent) but different under another illumination source (e.g. tungsten) because of this effect.
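Given tristimulus values from Eq. (7.25), the chromaticity coordinates of Eq. (7.26) follow directly; the example below uses an equal-energy stimulus, which by construction lands at (1/3, 1/3).

def chromaticity(x_tri, y_tri, z_tri):
    # Eq. (7.26): normalised chromaticity coordinates (x, y)
    total = x_tri + y_tri + z_tri
    return x_tri / total, y_tri / total

print(chromaticity(1.0, 1.0, 1.0))  # (0.333..., 0.333...)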




7.7.4.2 RGB Colour


In many instances in describing colour we are interested in the effect of adding or blending colours. As a result of the three sensor types it is clear, in principle, that a linear combination of three different colours may be used to create a wide range of colour sensations. The three colours themselves are described by a linear combination of the tristimulus values, X, Y, and Z and are known as primary colours. Definition of the suite of primary colours is arbitrary and established by virtue of convention. The guiding principle is that these three colours must be capable of being admixed to create as wide as possible a range of colours without recourse to negative coefficients in the linear combination. This range is referred to as a gamut. The original standard (1931 CIE) colour representation is the so-called RGB system (Red – Green – Blue) of primary colours using three standard monochromatic stimuli at 700 nm (R), 546.1 nm (G), and 435.8 nm (B). In this scheme, definition of the RGB primary colours from the tristimulus values is given by the following (X, Y, Z) vectorial representation:




(7.27)

The inverse transformation between the two representations is effected by the following matrix:




(7.28)

Presentation of this RGB colour convention is intended for illustrative purposes only. This simple scheme has been largely superseded. In reality, there are a plethora of different primary colour conventions designed with specific applications, such as computer screen rendition and so on, in mind. Some conventions take account of the non-linearity of the eye's response. That is to say, we abandon the linear convention hitherto prescribed. Other conventions ensure that uniform movements across colour space correspond to uniform changes in human perception of colour. These are called perceptually uniform colour spaces.

In principle, an equal admixture of primary colour components leads to some form of standard white colouration or 'white point'. The concept of whiteness as a chromatic descriptor is purely associated with the human perception of colour, rather than a fundamental property of a source. However, the definition of whiteness is convention dependent. Rather than defining a colour sensation by virtue of the admixture of RGB, it may also be defined by another three parameter set, HSL or hue, saturation, and luminosity. Hue is a measure of the undiluted colour, loosely corresponding to the equivalent monochromatic wavelength of stimulation. Saturation describes the purity of the colour, or the extent to which white must be admixed with a pure monochromatic colour to achieve the desired colour. The final degree of freedom is provided by luminosity which correlates to the brightness of the sensation, effectively the sum of the RGB components.

Colour difference, ΔE, is a measure of the absolute difference in colour between two different colours. It is generally expressed as the root sum of squares of the difference in each of the three colour ordinates and is dependent upon the convention adopted. With this in mind, the concept of colour temperature describes the temperature of the blackbody radiator that most closely matches the colour of interest, i.e. with the smallest colour difference. This is particularly associated with the characterisation of light sources. Somewhat ironically, the term 'cool' describes a source with a bluer spectral distribution whereas a 'warm' light source refers to illumination with a larger red contribution. This is rather based on human perception and psychology; a 'cool', bluer blackbody source is, of course, hotter than a redder 'warm' source.

Much of the preceding discussion introduces the topic of colour with treatment of just one, antecedent, colour convention. As such, this provides a useful description of the basic underlying principles. However, the topic in itself is much too broad to provide any comprehensive treatment here and the reader is referred to specialist texts for further study. Some guidance is provided in the short bibliography at the end of the chapter.




7.7.5 Astronomical Photometry


Astronomical photometry is concerned with the measurement of the magnitude of electromagnetic flux from stellar objects: stars, galaxies, and so on. For modern observations, these measurements are almost exclusively dependent upon semi-conductor detectors. For a given stellar object, the ideal measurement might involve the high resolution capture of its spectral irradiance at the earth across a wide range of wavelengths. That is to say, a detailed spectrum of each object should be obtained that can be related to absolute spectral irradiance. However, as stellar objects of interest are almost invariably faint, the amount of flux that is captured in any given wavelength band is necessarily very small. For the majority of measurements, therefore, this approach is impractical. A more practical solution is to use a number of spectrally filtered detectors to monitor flux from the star (via a telescope). Each filter has a relatively broad passband, e.g. 100 nm, centred on some specific wavelength, e.g. 555 nm. Using a small number of these filtered detectors across the ultraviolet, visible, and infrared spectral ranges provides what amounts to low resolution spectral information for the source. However, since interpretation of the spectral quality of a stellar object is based on a limited number (e.g. 3) of spectrally filtered measurements, there is a clear correlation with visual photometry.




