
Codes and Tags

Reading the content of a web page with Java

Tags: Blog, Programming

Below is an example of how to read a web page using Java. Reading a page basically takes three steps:

  1. Create a URL object, passing it the “url” of the page to read.
  2. Create a URLConnection object to open a connection to that “url”.
  3. Read the content with a BufferedReader that wraps an InputStreamReader, which in turn is built from the InputStream returned by the getInputStream() method of the connection object.
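
The three steps can be condensed into one small helper. This is a minimal sketch, not the author's code: the class PageReader and its methods readPage and charsetFrom are hypothetical names. Instead of hard-coding UTF-8, it tries to take the charset from the Content-Type header and falls back to UTF-8 when none is declared (an assumption that suits most modern pages). It uses BufferedReader.lines(), so it needs Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class PageReader {

    // Reads the resource at `url` and returns its lines joined with '\n'.
    static String readPage(String url) throws IOException {
        // Step 1: build the URL object from the address string
        URL address = URI.create(url).toURL();

        // Step 2: open a connection to that URL
        URLConnection connection = address.openConnection();
        connection.connect();
        Charset charset = charsetFrom(connection.getContentType());

        // Step 3: wrap the connection's InputStream in an InputStreamReader
        // (which decodes bytes with `charset`) and a BufferedReader (line by line);
        // try-with-resources closes the stream when done
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; falls back to UTF-8 (assumed default).
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Because URLConnection also handles other URL schemes, the same helper can read a local file: URL, which is a convenient way to try it without network access.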

In this example, on top of reading the page, I add a search for matches as a bonus :)

Target page: http://projecteuler.net
Word to search for: “and”

Code


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/**
 * LeerPaginaHtml with Java: reads a web page and searches it for a word.
 *
 * @author codesandtags
 *
 */
public class LeerPaginaHtml {

	public static void main(String[] args) {
		// If the server answers with a redirect to HTTPS (projecteuler.net appears
		// to do this nowadays), URLConnection does not follow redirects that switch
		// protocol, so you would only get the redirect response; in that case use
		// "https://projecteuler.net" instead.
		LeerPaginaHtml ejemplo = new LeerPaginaHtml();
		ejemplo.leerPagina("http://projecteuler.net");
		ejemplo.buscarTextoPagina("http://projecteuler.net", "and");
	}

	/**
	 * Reads the page passed as a parameter and prints its content.
	 *
	 * @param url address of the page to read
	 * @return the page content, one line per source line
	 */
	public String leerPagina(String url) {
		System.out.println("Leyendo Pagina : " + url);
		StringBuilder resultado = new StringBuilder();

		try {
			URL urlPagina = URI.create(url).toURL();
			URLConnection urlConexion = urlPagina.openConnection();
			urlConexion.connect();

			// Create the reader; try-with-resources closes the stream when done
			try (BufferedReader lector = new BufferedReader(new InputStreamReader(
					urlConexion.getInputStream(), StandardCharsets.UTF_8))) {
				String linea;
				while ((linea = lector.readLine()) != null) {
					resultado.append(linea).append("\n");
				}
			}

		} catch (IOException | IllegalArgumentException e) {
			e.printStackTrace();
		}

		System.out.println("Contenido : \n\n" + resultado);
		return resultado.toString();
	}

	/**
	 * Reads the page passed as a parameter and counts the occurrences of the
	 * given text in each line (case-sensitive).
	 *
	 * @param url   address of the page to read
	 * @param texto text to search for
	 * @return the matching lines, each prefixed with its number of matches
	 */
	public String buscarTextoPagina(String url, String texto) {
		System.out.println("Leyendo Pagina : " + url);
		StringBuilder resultado = new StringBuilder();
		int veces = 0;

		try {
			URL pagina = URI.create(url).toURL();
			URLConnection uc = pagina.openConnection();
			uc.connect();

			// Create the reader; try-with-resources closes the stream when done
			try (BufferedReader lector = new BufferedReader(new InputStreamReader(
					uc.getInputStream(), StandardCharsets.UTF_8))) {
				String linea;
				while ((linea = lector.readLine()) != null) {
					int coincidencias = contarCoincidencias(linea, texto);
					if (coincidencias > 0) {
						resultado.append(coincidencias).append(" coincidencia(s) =>")
								.append(linea).append("\n");
						veces += coincidencias;
					}
				}
			}

		} catch (IOException | IllegalArgumentException e) {
			e.printStackTrace();
		}

		System.out.println("Se encontro un total de : " + veces
				+ " coincidencias con la expresion [" + texto + "] ");
		System.out.println(resultado);
		return resultado.toString();
	}

	/**
	 * Counts the non-overlapping occurrences of texto in linea.
	 */
	private static int contarCoincidencias(String linea, String texto) {
		if (texto.isEmpty()) {
			return 0; // an empty search string would otherwise loop forever
		}
		int coincidencias = 0;
		// indexOf returns -1 when there is no match; position 0 is a valid match
		int idx = linea.indexOf(texto);
		while (idx >= 0) {
			coincidencias++;
			// resume the search right after the end of the current match
			idx = linea.indexOf(texto, idx + texto.length());
		}
		return coincidencias;
	}
}
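
On Java 11 and later, java.net.http.HttpClient is an alternative to URLConnection that can be told to follow redirects, including an HTTP to HTTPS redirect like the one projecteuler.net appears to use today. This is a sketch, not the author's code: ModernPageReader, CLIENT, buildRequest and readPage are hypothetical names. Redirect.NORMAL follows every redirect except a downgrade from HTTPS to HTTP, and BodyHandlers.ofString() decodes the body with the charset from the Content-Type header, using UTF-8 when none is given.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ModernPageReader {

    // One shared client; follows redirects such as http -> https
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Builds a plain GET request for the given address
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    // Sends the request, blocks until the full response arrives and returns its body
    static String readPage(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 400) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

send() blocks until the whole response is available; if you prefer not to block, sendAsync() returns a CompletableFuture with the same HttpResponse.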


Output

Leyendo Pagina : http://projecteuler.net
Contenido :

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="author" content="Colin Hughes" />
<meta name="description" content="A website dedicated to the fascinating world of mathematics and programming" />
<meta name="keywords" content="programming,mathematics,problems,puzzles" />

<title>Project Euler</title>
<link rel="shortcut icon" href="http://projecteuler.net/favicon.ico" />
<link rel="stylesheet" type="text/css" href="style_main.css" />
<link rel="stylesheet" type="text/css" href="style_light.css" /></head>

<body>
<div id="container">

<div id="nav" class="noprint">
<ul>
<li id="current"><a href="about" title="About" accesskey="h">About</a></li>
<li><a href="register" title="Register" accesskey="1">Register</a></li>
<li><a href="problems" title="Problems" accesskey="2">Problems</a></li>
<li><a href="login" title="Login" accesskey="3">Login</a></li>
</ul>
</div>
<div id="info_panel"><a href="rss2_euler.xml"><img src="images/icon_rss.png" alt="RSS Feed" title="RSS Feed" /></a><a href="secure=yes"><img src="images/icon_lock.png" alt="Use secure connection" title="Use secure connection" /></a></div>

<div id="logo" class="noprint">
<img src="images/pe_banner_light.png" alt="Project Euler .net" />
</div>

<div id="content">
<h2>About Project Euler</h2>
<img src="images/euler_portrait.png" alt="Leonhard Euler (1707-1783)" style="margin:10px;float:right;" />
<h3>What is Project Euler?</h3>
<p>Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.<br /><br />
The motivation for starting Project Euler, and its continuation, is to provide a platform for the inquiring mind to delve into unfamiliar areas and learn new concepts in a fun and recreational context.</p>
<br />
<h3>Who are the problems aimed at?</h3>
<p>The intended audience include students for whom the basic curriculum is not feeding their hunger to learn, adults whose background was not primarily mathematics but had an interest in things mathematical, and professionals who want to keep their problem solving and mathematics on the edge.</p>
<br />
<h3>Can anyone solve the problems?</h3>
<p>The problems range in difficulty and for many the experience is inductive chain learning. That is, by solving one problem it will expose you to a new concept that allows you to undertake a previously inaccessible problem. So the determined participant will slowly but surely work his/her way through every problem.</p>
<br />
<h3>What next?</h3>
<p>In order to track your progress it is necessary to setup an account and have Cookies enabled. If you already have an account then <a href="login">Login</a>, otherwise please <a href="register">Register</a> - it's completely free!</p>
<p>However, as the problems are challenging then you may wish to view the <a href="problems">Problems</a> before registering.</p><br /><br /><p style="text-align:center;font-size:125%;font-style:italic;color:#999;">&quot;Project Euler exists to encourage, challenge, and develop the skills and enjoyment of anyone with an interest in the fascinating world of mathematics.&quot;</p></div>

<div id="footer" class="noprint">
<a href="copyright">Project Euler Copyright Information</a>
<!--/Creative Commons License--><!-- <rdf:RDF xmlns="http://web.resource.org/cc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<Work rdf:about="">
<license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/" />
<dc:type rdf:resource="http://purl.org/dc/dcmitype/Text" />
</Work>
<License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.0/uk/"><permits rdf:resource="http://web.resource.org/cc/Reproduction"/><permits rdf:resource="http://web.resource.org/cc/Distribution"/><requires rdf:resource="http://web.resource.org/cc/Notice"/><requires rdf:resource="http://web.resource.org/cc/Attribution"/><prohibits rdf:resource="http://web.resource.org/cc/CommercialUse"/><permits rdf:resource="http://web.resource.org/cc/DerivativeWorks"/><requires rdf:resource="http://web.resource.org/cc/ShareAlike"/></License></rdf:RDF> -->
</div>
</div>
<div style="height:1px;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </div></body>
</html>

Leyendo Pagina : http://projecteuler.net
Se encontro un total de : 12 coincidencias con la expresion [and]
1 coincidencia(s) =><meta name="description" content="A website dedicated to the fascinating world of mathematics and programming" />
2 coincidencia(s) =><p>Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.<br /><br />
3 coincidencia(s) =>The motivation for starting Project Euler, and its continuation, is to provide a platform for the inquiring mind to delve into unfamiliar areas and learn new concepts in a fun and recreational context.</p>
2 coincidencia(s) =><p>The intended audience include students for whom the basic curriculum is not feeding their hunger to learn, adults whose background was not primarily mathematics but had an interest in things mathematical, and professionals who want to keep their problem solving and mathematics on the edge.</p>
1 coincidencia(s) =><p>The problems range in difficulty and for many the experience is inductive chain learning. That is, by solving one problem it will expose you to a new concept that allows you to undertake a previously inaccessible problem. So the determined participant will slowly but surely work his/her way through every problem.</p>
1 coincidencia(s) =><p>In order to track your progress it is necessary to setup an account and have Cookies enabled. If you already have an account then <a href="login">Login</a>, otherwise please <a href="register">Register</a> - it's completely free!</p>
2 coincidencia(s) =><p>However, as the problems are challenging then you may wish to view the <a href="problems">Problems</a> before registering.</p><br /><br /><p style="text-align:center;font-size:125%;font-style:italic;color:#999;">&quot;Project Euler exists to encourage, challenge, and develop the skills and enjoyment of anyone with an interest in the fascinating world of mathematics.&quot;</p></div>
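
The search above counts substrings, so “and” inside a word such as “hand” or “understand” would also be counted, while “And” at the start of a sentence would not. If whole-word, case-insensitive matches are what you want, a regular expression with word boundaries is one option. This is a sketch using java.util.regex; WordMatcher and countWholeWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcher {

    // Counts whole-word, case-insensitive occurrences of `word` in `line`.
    static int countWholeWord(String line, String word) {
        // \b is a word boundary; Pattern.quote escapes regex metacharacters in `word`
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

If the same word is searched across many lines, compiling the Pattern once and reusing it avoids recompiling it for every line.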

About Edwin Torres
