The HTML Tags that Deal with Text

HTML Tutorial Chapter 2


The HTML Tags that Deal with Text (HTML Tutorial Chapter 2)

by Christopher Heng, thesitewizard.com

In this chapter of the HTML tutorial series, we will learn more about the HTML tags, entities and other paraphernalia that deal with text.

If you have only just arrived at this site with the intention of learning HTML, please start at the first chapter of the tutorial. (This article you're looking at is actually chapter 2.) Those who do not know anything about creating websites should read How to Make / Create Your Own Website: The Beginner's A-Z Guide before anything else.

Language and Your Website: Character Encoding, Character Sets and Content Type

When you type some text characters into a file on your computer (any file, including things like HTML files), the characters are not saved as letters of the alphabet in the file but as a series of numbers. (It's actually more technical than that, if we go down to the electronic level, but the above loose description will do for our purposes.) For example, if you type the letter "a" into a plain text file, your editor actually saves the number 97 into the file. When you display that file, the text editor or web browser (or any other program) reads the number and interprets it as "a" and displays it as such.

The problem is that 97 does not intrinsically mean "a" or any other thing on a computer. It means "a" only if the program reading it expects a text file using a particular language, encoded using a specific system that is pre-agreed by everyone in the computer world. In this system, 97 happens to mean the small letter (lowercase) "a". In a different system, or character set, 97 may mean some other letter of some other alphabet.

Since a web page can contain any language in the world, encoded using any character set, a web browser needs to know both the language your page uses as well as the character set it is using in order for it to display the text correctly.

One way to specify the language that your web page uses is to add an attribute to the <html> tag. If you will recall from chapter 1 of this HTML tutorial, your entire web page proper is enclosed between <html> and </html>. The addition of a language attribute specifying that the content of the page is in English will result in the HTML code from the previous chapter looking like the following.

Notice that instead of a plain <html> tag, we now have <html lang="en">. I added a space after the "html" part of the tag, and added additional words. As is probably obvious to you, the attribute "lang" specifies the language the page is written in. The portion after the equals sign ("="), enclosed in quotation marks, specifies the language. Since our sample web page uses English, I have specified "en" as the language. If your page uses German, you will have to use "<html lang="de">", and if it uses French, you will need to specify "<html lang="fr">". As you probably have already noticed, you can't just specify any old language name after "lang=". You need to find out the actual code for that language. A list of language names can be found here. Remember to enclose the name in quotation marks before placing it after "lang=".

Apart from the language attribute to the HTML tag, you also need to specify the character set that your web page uses. This can be done by adding a new tag into your HEAD section. If your web page uses English and only the basic English alphabet, numbers and punctuation marks, you can specify the ISO-8859-1 character set by inserting the following line into your HEAD section. (If you don't know what I mean by "inserting a line into the HEAD section", you should review chapter 1 again, where I taught you how to insert the TITLE tag into that section. The procedure is the same.)

<meta http-equiv="Content-type" content="text/html;charset=ISO-8859-1">

Alternatively, as preferred by many people, you can also specify the UTF-8 Unicode character set instead.

<meta http-equiv="Content-type" content="text/html;charset=utf-8">

The UTF-8 character set supports all languages, including English. In fact, the ISO-8859-1 character set is just a subset of UTF-8. If you don't know which to choose, specify the UTF-8 character set. You'll have less trouble that way, especially if at some indeterminate point in the future, you suddenly decide to use some accented character or a character that does not occur in the basic English alphabet. (Remember that there are actually some words in the English language that use these accented characters.) Before you ask, most of thesitewizard.com's pages currently use ISO-8859-1 as its character set, but that's due to historical reasons. "Historical reasons" in this context is my way of saying that I created my site this way many years ago, when I didn't know better, and kept everything as it was through sheer inertia (ie, laziness).

There are other ways to specify your language and character set besides inserting them into your web page. For example, it is also possible to set it as a configuration option in your web server. (The web server is the computer program that your web host uses to run your website.) However, since this is an HTML tutorial, I will not deal with that method here. In any case, it's actually good practice to specify your language on your page. This way, your page will still be valid if you change web hosts and forget to configure the new server.

Note that the <meta> tag does not have a closing tag.

With the addition of the attribute specifying your web page's language as well as the META tag specifying the character set, your web page is now complete as a minimal web page. That is to say, it has all the minimum elements that are required on a valid web page. (That of course doesn't mean that it has what you want on your website. It simply means that the page you now have is considered a valid HTML page with all the required elements. There's obviously still a lot more to learn.)

Reprise: The <p> Tag

Now that we've declared the language and character set that our web page is going to use, we can start using the HTML tags that pertain to the manipulation of text.

We have already met <p>, or the paragraph tag, in chapter one. If you're not already familiar with this tag, perhaps because you skipped chapter 1, please return to that chapter and read up on it. I mention this tag here only for completeness.

If you remember, the <p> is used to start a new paragraph on your web page. Web browsers, when encountering the tag, usually start the text following your tag on a new line on the web page. Since <p> always starts a new paragraph, you cannot nest <p> tags inside each other. That is, if you want a new paragraph, close off the existing paragraph with </p> before using a new <p> tag.

In addition, if you have carried out the experiments I mentioned in the previous chapter, you'd have noticed that if your page had other content just above your paragraph tag, browsers usually leave a blank line before starting your new paragraph. This brings us to the question of how you can place some text directly on the line beneath the current one, without leaving the blank line that you get between paragraphs when you use <p>. For example, if you are a poet, and want to publish your poetry on your page, how do you put your verses on the page without leaving huge gaps between each line? This is the function of the next tag that we're going to learn.

The <br> Tag

The <br> tag is used when you want to put what follows on a new line without actually starting a paragraph. Take a look at the following HTML code as an example. To simplify my examples from now on, I will only show the code relevant to the tags I'm explaining. That is, I won't show things like the <head> section, the <body> tags and the other tags that are understood to be on every web page. If you're not sure what these other tags are, please read chapter 1 and the section above.

<p>
Had I the heavens' embroided cloths,<br>
Enwrought with golden and silver light,<br>
The blue and the dim and the dark cloths<br>
Of night and light and the half-light,<br>
I would spread the cloths under your feet:<br>
But I, being poor, have only my dreams;<br>
I have spread my dreams under your feet;<br>
Tread softly because you tread on my dreams.
</p>

Insert the above poem by William Butler Yeats into your web page (the one you started in chapter 1), and reload the page in your web browser to see the effect. Notice that each line in the stanza occupies its own line on the page, the way you would expect in a poem. The <br> tag, which I like to think of as being short for "line break", forces everything that follows to a new line.

Notice that there is no closing tag for <br>. Since the <br> tag merely creates the end of a line, and does not signal the start of something (the way a paragraph tag does), it does not need (nor does it have) a closing tag.

Page and Section Headings

What if you wanted to put a heading on your web page, for example, to put a visible title for the above poem? HTML provides numerous tags for this purpose, namely, <h1>, <h2>, <h3>, <h4>, <h5> and <h6>.

To see the tag in action, add the following line to your web page, in a blank line just above the opening <p> tag of the poem you inserted above.

<h1>Cloths of Heaven, by W. B. Yeats</h1>

Reload the page in your browser and examine the results.

These tags have the following characteristics:

  1. Use the Lower-Numbered Header Tags First

    Basically, a <h1> tag signals the highest level header on a web page. Many webmasters use it to display the title of a web page. If you have divided your web page into sections, with some paragraphs containing their own title, those paragraph titles may use <h2>. If those paragraphs have sub-paragraphs, and you need to put a title to them, use <h3>. And so on.

    In other words, you should use the header tag with the smallest number first, before using one with a larger number. For example, all of thesitewizard.com's web pages use the <h1> tag for the title of the web page. The <h2> tag is used for the titles of each section on the page, such as the words "Page and Section Headings" you find at the beginning of this section. The titles of sub-sections within these sections use the <h3> tag. The heading "Use the Lower-Numbered Header Tags First" that you see above is enclosed within the <h3> tag pair.

  2. Put Your Header Text Within the Tag Pair

    As you will have probably observed from my example, the header tags come in pairs, like the paragraph tags. For example, the <h1> signals the start of the header text to the web browser, and the </h1> tag signals the end of it.

  3. Put the Header Outside the Paragraph Tag

    Notice that I told you to put the header tag outside the paragraph block. The header is not part of the paragraph. Header and paragraph tags are what is known in HTML as block-level tags. The browser formats the content you place within such tags in its own block. For example, each paragraph stands on its own, as a distinct block, on your web page. The header tag, likewise, places the text enclosed within it in a separate block with its own formatting (which is the topic of the next point).

  4. Appearance of the Header in Web Browsers

    Browsers typically display the text contained between header tags in a way so that it stands out in some way on the page. Most browsers do this by putting the headers in bold. The smaller numbered headers, like <h1>, are usually displayed with larger fonts. To distinguish between the smaller numbered headers and those with bigger numbers, browsers may render the larger numbered headers using smaller fonts.

    Before you ask, control of the appearance of the headers, like the rest of your web page, is done using Cascading Style Sheets (CSS). As such, fine-tuning of the appearance of the header tags will be only taught in the later chapters of this tutorial series, in the CSS chapters. Do not choose which header tags to use based on its current appearance, for example, don't choose a particular header level because it has the size you want. Sizes of the text and all such things can be changed later. Choose the header tag according to the level the header appears in. Top level headers should use <h1>. Headers one level lower should use <h2>. And so on. The appearance of the headers can always be changed later.

    By way of reassurance, so that you know that you can really change the text enclosed in header tags so that they appear in any way you want, take a look at thefreecountry.com's pages, for example, the Budget Web Hosting page. The words at the top "Budget Web Hosts (Page 1)" are actually enclosed in <h1></h1> tags. They are orange on a purple background because I added CSS rules to specify that colour ("color" if you use a different variant of English) combination for that set of header tags. The words underneath, "Cheap, affordable, commercial web hosts", were simply enclosed in <h2></h2> tags. No special HTML tags were used for those effects.

    To reiterate, choose the appropriate header tags for your titles according to the function those titles play on your page, not according to the appearance those tags currently confer. The appearance can be changed later, using CSS.

Emphasizing Text: Putting Text in Bold or Italics

There are occasions where you may want to emphasize a particular word or group of words on your web page. Emphasis can be achieved using one of the following HTML tags:

There are, however, some things you should note about the use of these tags.

  1. Do not use them for stylistic purposes

    Do not use these tags for the purpose of changing the appearance of words so that they look prettier on your web page. For example, don't put text in these tags just so that you can get the higher numbered header tags like <h6></h6> to be displayed in bold. As I mentioned before, things that pertain to the appearance of your web page should be modified using CSS, not by choosing HTML tags that happen to cause your text to take on a certain appearance.

    Use the tags only if you really want certain things emphasized. For example, I put the words "not" in the first line of the paragraph above in <strong></strong> to draw attention to it. The principle to learn here is that you should use HTML tags for the function those tags suggest. Appearance changes for the sake of making the web page look good are to be done in CSS. Be patient. We'll get to dealing with CSS eventually. You need to learn HTML first because CSS assumes a prior knowledge of HTML.

  2. Do not put block level tags within these tags

    If you have an entire paragraph that you want to emphasize, for example, to give a warning about the consequences of taking some action, you should always put your <strong></strong> and/or <em></em> tags inside your paragraph tags.

    For example, the following code illustrates the correct use of these tags.

    <p>
    <strong>The "strong" tag is placed within the "p" tag in this example.
    This is as it should be.</strong>
    </p>

    The code below is incorrect, because it places paragraph tags inside the <strong></strong> tags.

    <strong><p>This code is invalid. Do NOT use it!</p></strong>

    The <strong></strong> and <em></em> tags are inline tags. The <p></p> tags are block tags. Block tags can contain any number of inline tags, but inline tags cannot contain a block tag.

How to Insert Angle Brackets and Other Special Characters

Since angle brackets like "<" and ">" are used to flag the start and end of HTML tags, you might be wondering how you can actually insert such characters into your web page without the web browser mistaking it for a tag. For example, how do you insert the "less than" sign in a mathematical expression like "x < 5"?

To insert a left angle bracket, "<", you have to type "&lt;" (without the quotes) into your web page. A right angle bracket, ">", is inserted by typing "&gt;" (without the quotes) into your web page.

Like the angle brackets, the ampersand, "&", is a special character treated differently from other characters by all web browsers. It flags the start of a sequence of characters known in the HTML standard as a character entity reference. The most commonly used character entity references are as follows:

As you may have guessed, once we defined the ampersand ("&") as a special character that begins a character entity reference, we need a way to tell the web browser when we really need an ampersand, hence the creation of another new character entity reference, "&amp;", to represent the ampersand itself. In other words, when you need to type an ampersand on your web page, don't type "&". Type "&amp;" (without the quotes).

The "&nbsp;" character entity reference represents a non-breaking space. As I'm sure you've noticed when you carried out your experiments in chapter 1, when you put a space between 2 words in your web page, the browser may wrap the second word onto a new line if it runs out of space on the current line. If you have a pair of words that must always be placed together on the same line, put "&nbsp;" (without the quotes) between those 2 words instead of the normal space character to force the browser to always keep those two words together. This character entity reference is also useful when you need to put more than 1 space between two words. Again, as you found out in chapter 1, whenever you put more than one space character between two words, web browsers collapse the additional space characters so that you always end up with only one space displayed. If you really need the extra space characters for whatever reason, you will need to insert a "&nbsp;" for every space character you want displayed, instead of multiple space characters.

Other (Less Used) HTML Tags for Presenting Text

There are other HTML tags pertaining to text that can be used in your web documents. Most webmasters don't use them, but here are a few that you may (or may not) find useful.

There are a few other HTML tags that are rarely used by webmasters. They are listed below for completeness' sake, although I doubt most of you will ever use them.

After looking through the list, some of you are probably thinking, "But Chris, what does the browser show when I use these tags?" A simple way to find out is to try it out on your web page and load it in your browser to see the results. However, I would suggest that the question actually misses the point of these tags. HTML tags are meant to describe the type of content you're typing on your page. They are not meant to prescribe appearance. Appearance on a web page is not meant to be controlled by HTML tags, but by Cascading Style Sheets, or CSS. You can cause those HTML tags to make your text take on any appearance you like with CSS. Its default appearance is not really relevant. (For the curious but lazy reader: some of the above tags simply put the text in a monospace (fixed pitch) font, while others don't change the appearance of your text at all.)

Do you have to memorise all these tags so that if you happen to use things like abbreviations, you can remember to tag them appropriately? In practice, few webmasters bother. For example, words like "HTML" and "CSS" are used liberally throughout thesitewizard.com, but I don't recall tagging even a single instance of them with the <abbr></abbr> tag pair (other than in the example above). In fact, it's quite possible to get by without knowing any of the tags mentioned in this entire subsection. Of course if you want all your abbreviations to be displayed in (say) brown, then it's simplest to just tag them all and create a CSS rule that tells the browser to display all abbreviations in brown.

How About Fonts and Other Snazzy Design Features?

For those of you holding your breath, waiting for me to finally get round to talking about how you can change the fonts on your web page, you'll have to wait a bit longer. Font changes are considered as one of the things that govern the appearance of your web page, and are thus done using Cascading Style Sheets, not HTML. Although there are ways to change fonts using HTML, those methods are deprecated, which, in the world of computers, basically means that they are outdated, and will eventually be removed from the standard.

As such, we'll talk about fonts when we get to the CSS part of this tutorial series.

Do Not Interweave Different Tags

Now that you know enough to cause problems, it's important that you learn an important rule in writing HTML code, otherwise you'll end up creating invalid HTML.

The easiest way to explain this rule is to use an example. Consider the following HTML code snippet.

<p>
<strong><em>thesitewizard.com</em></strong> is a tutorial website for webmasters.
</p>

Notice that "thesitewizard.com" is enclosed within both <strong> and <em> tags. As such, in most (or all) browsers, that word will be displayed in both bold and italics. Observe carefully the order of the opening and closing tags. The innermost starting tag (that is, the tag closest to "thesitewizard.com") is <em>. It is matched by the innermost closing tag of </em>. Just outside that pair of tags is the <strong></strong> tag pair. The opening <strong> tag is matched by the closing </strong> tag on the same (outer) level.

Now look at the following invalid HTML code.

<p>
This code has <strong><em>errors</strong></em>! Can you spot the problem?
</p>

The innermost <em> tag is not matched by an innermost closing </em> tag. Instead, the web browser parsing that sentence encounters an unexpected </strong> tag. Although you might think that you managed to get all the closing tags into your web page, the opening and closing tags are not really at the same level. The mixing up of opening and closing tags in random order causes the code to be invalid. It's like writing a mathematical expression with the following bracket structure:

{[(2+5] X [7-8}) + 1]

Even if you assume that any type of bracket can be used for a mathematical expression, the above expression still makes no sense because the brackets are all jumbled up instead of matching each other at the correct level. In the same way, the HTML code with opening and closing tags jumbled up in a mess also makes no sense.

Note that just because web browsers are often able to decipher the mess that some webmasters create with interwoven opening and closing tags of different types doesn't mean that such code is valid. Remember that browsers are not required to parse and display invalid code in the way you want. They cannot be expected to read your mind, after all. Do not depend on browsers to be telepathic. For all you know, it may be a bug in the browser that caused it to display the bad code in the way you think it should be displayed. When that bug is fixed (and browser vendors fix bugs in their browsers all the time), your web page will no longer appear in the way you intended it.

But how do you spot errors on your web page? After all, it's not like you deliberately set out to create an invalid page. Obviously, if you made an error, it must have been through either ignorance or carelessness. As such, they may be very hard to spot, especially if you don't even know they were errors in the first place. This is the topic of the next section.

How to Check Your Web Page for Errors: Validating Your Code

The easiest way to get your HTML code checked is to run it through something known as a validator. An HTML validator is a computer program that looks at your web pages for errors that violate the HTML standard. It will spot things like the jumbled tags that I mentioned above, as well as other types of errors. There are many free HTML validators around, some of which you can download and install on your computer, others where you can access only when you're connected to the Internet.

Note that HTML validators only check for HTML errors. They will not check your English spelling, your English grammar, or whether your page looks nice. They merely check if you have things like an opening HTML tag without a matching closing HTML tag, non-existent HTML tags (something that happens when you make a spelling error in your tag, such as using a "strnog" tag when you mean a "strong" tag), and other things that will cause your HTML structure not to conform to the HTML rules.

For now, open a browser window to the W3 Consortium's HTML Validator. This validator gives you 3 ways to check your web page for errors.

  1. If you have already uploaded your web page to your website, you can simply type the URL to your page into the "Address" field of the validator for it to check. Click the "Check" button when you're done.

  2. Otherwise, if you have not yet published your web page to the Internet (as is probably the case for the majority of you), look for the words "Validate by File Upload" somewhere near the top of the page and click it. Click the "Choose..." button next to the "File" field. A dialog box should appear. This dialog box shows you the files on your computer. Navigate to the HTML file that you've been working on, select it and click "Open" (or "OK" or whatever the button on your dialog box says; the exact words depend on your web browser and your operating system). The complete file path of your web page should appear in the "File" field. ("File path" here means the file name, folder and drive [if you use Windows] of a particular file.) Click the "Check" button when you're done. Your browser will upload your file to the validator for checking.

  3. Another way to get your page checked is to click the "Validate by Direct Input" words at the top of the validator page. You can then copy and paste your entire HTML page into the box and click the "Check" button to get the page checked.

(If you can't decide, and you haven't uploaded your page to your web host yet, just use the "Validate by File Upload" option. If you have already published your web page, it's probably easiest to use "Validate by URI".)

If your page has no errors, the validator should say something like "The document was successfully checked as HTML 4.01 Transitional!" followed by a line that gives the results, such as "Result: Passed". Otherwise, the results line will tell you the number of errors and warnings your page produced. In such a case, scroll down the results page to read more about the errors the validator found on your page. The detailed information gives you the exact line number (and column number) of the error in your page along with a description of what the error is.

Incidentally, if you used the "Validate by Direct Input", you will always get one warning telling you that the validator assumed that your page uses "UTF-8" as the character encoding. Since you copied and pasted your code directly into the validator using this method, and the validator's page itself uses UTF-8 for its character encoding, your copied-and-pasted text will always end up using UTF-8 whether or not your real page actually uses this encoding. (The browser transparently converts your text into UTF-8 as specified by the validator's own UTF-8 encoded page.) If this bothers you because you think it may mask an error on your page, just use the "Validate by File Upload" or "Validate by URI" options instead.

Next Chapter

In the next chapter, we will deal with how you can create bullet point lists as well as numbered lists.

Copyright © 2011-2013 Christopher Heng. All rights reserved.
Get more free tips and articles like this, on web design, promotion, revenue and scripting, from http://www.thesitewizard.com/.

You are here:

Top » HTML Tutorials »

Other articles on: HTML, CSS, Getting Started

thesitewizard™ News Feed (RSS Site Feed)  Subscribe to thesitewizard.com newsfeed

Do you find this article useful? You can learn of new articles and scripts that are published on thesitewizard.com by subscribing to the RSS feed. Simply point your RSS feed reader or a browser that supports RSS feeds at http://www.thesitewizard.com/thesitewizard.xml. You can read more about how to subscribe to RSS site feeds from my RSS FAQ.

Please Do Not Reprint This Article

This article is copyrighted. Please do not reproduce this article in whole or part, in any form, without obtaining my written permission.

Related Pages

New Articles

Popular Articles

How to Link to This Page

It will appear on your page as:

The HTML Tags that Deal with Text





Home
Donate
Contact Us
Link to Us
Topics
Site Map

Getting Started
Web Design
Search Engines
Revenue Making
Domains
Web Hosting
Blogging
JavaScripts
PHP
Perl / CGI
HTML
CSS
.htaccess / Apache
Newsletters
General
Seasonal
Reviews
FAQs
Wizards

 

 
Free webmasters and programmers resources, scripts and tutorials
 
HowtoHaven.com: Free How-To Guides
 
Site Design Tips at thesitewizard.com
Find this site useful?
Please link to us.